Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: altairexpress (Python) #3

Open
10 of 18 tasks
tejasph opened this issue Mar 16, 2020 · 5 comments
Open
10 of 18 tasks

Submission: altairexpress (Python) #3

tejasph opened this issue Mar 16, 2020 · 5 comments
Assignees

Comments

@tejasph
Copy link

tejasph commented Mar 16, 2020


Submitting Authors: Tejas Phaterpekar(@tejasph), Lesley Miller (@aromatic-toast), Jack Tan (@thejacktan), Wenjiao Zou(@zouwenjiao)
Package Name: altairexpress
One-Line Description of Package: Provides efficient EDA plots in Altair
Repository Link: https://github.com/UBC-MDS/altairexpress
Version submitted: V1.1.16
Editor: Varada Kolhatkar (@kvarada)
Reviewer 1: Evhen Dytyniak (@evhend)
Reviewer 2: Katherine Birchard (@katieb1)
Archive: TBD
Version accepted: TBD


Description

  • Include a brief paragraph describing what your package does:

This package simplifies the process of conducting Exploratory Data Analysis (EDA) on new datasets. It is designed to allow the user to explore the data graphically as well as obtain some basic summary statistics, all by writing only one line of code. Plots are produced using the Altair package under the hood. As Jenny Bryan once said: “Someone has to write for-loops, but it doesn’t have to be you!”. This sentiment has been implemented here for EDA analysis. The user is able to spend more time on analyzing the dataset and less time on configuring complex Altair plot settings.

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

  • Explain how the and why the package falls under these categories (briefly, 1-2 sentences):

Our package provides users with the ability to efficiently visualize their data with scatterplots, histograms, time-series, and Fourier transforms. These plots are also accompanied with axis transformation options and summary statistics.

  • Who is the target audience and what are scientific applications of this package?

altairexpress is meant for machine learning data scientists of all levels who want convenient tools to visualize their data both effectively and efficiently. The goal is to enable them to allocate more time to model planning/engineering.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

From our knowledge, there has yet to be a popular, well-known Altair EDA package. There may be other packages out there, but our package offers a variety of plots that can fit different data needs.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an OSI approved license
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

No

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

@katieb1
Copy link

katieb1 commented Mar 17, 2020

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • [] Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • [] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: ~ 3 hours


Review Comments

Overall, very interesting idea for a package and really well done! Just a few notes below:

Documentation
  • Although the usage section has comprehensive examples, I think you need to provide the output (i.e. the resulting plot) to satisfy the "Vignette" criteria.

  • I enjoyed the "Package Walkthrough" section that documents each function contained in the altairexpress package. However, the README seemed a little repetitive in that there were multiple sections where there were usage examples, as you have both the package walkthrough demos and the usage section. I'm not sure how these differ (the same time series example is even used), so I would either amalgamate them or differentiate them more.

  • I did not put a check beside the example section because when trying to implement the examples on my local machine I could not run the examples from just copying and pasting (see further notes in functionality).

  • For the 'Metadata' section, I would just include the emails beside each package developer in the README and CONTRIBUTORS

  • All of the important elements of the README are present, although the order is not quite the same as outlined above.

Functionality
  • Installation worked!

  • When I tried to run the usage examples in my jupyter lab they did not work from just copying and pasting. I think they just had a few typos - such as not putting gdpPercap in quotes and capitalizating the 'c' in gdpPercap. Also, when importing altairexpress.hist as hist the function should be hist(), rather than altairexpress.hist() or else it throws an error. There were a lot of typos in the other functions as well, so I would just go through and proofread these. The docstrings were very helpful in figuring out how to use the hist and fourier_transform functions, though. However, I had to look at the function code to learn how to run the scatter and ts functions.

  • Once I managed to run the functions, they worked really well and did everything they said they would! This is a bit nitpicky, but when running the different functions I noticed a few inconsistencies between the styles of visualizations. For instance, some plots had titles while others did not. I would just do a little touch up to ensure that all plots contain the same elements.

  • All tests passed!

@evhend
Copy link

evhend commented Mar 21, 2020

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3


Review Comments

  • Read the Docs

  • "From Sources" installation links are broken (not certain if the links provided here are correct).

  • "Usage" contains no examples.

  • README

  • "Usage" is a bit confusing - I'd like to see some explanations in this section.

    • "Package Walkthrough"
  • no fourier_transform working example

    • Function names in descriptions do not match function calls in usage examples:
  • gghist() but hist() is called

  • scatter_express() but scatter() is called

  • ts_plot() but ts_alt() is called

  • It may be useful to provide an example of the outputs of each function in the "Usage" or "Package Walkthrough"

  • Include a description of the returns of each function in the function descriptions

    • Dependencies:
  • not all package dependencies are included; my installation failed because I did not have statsmodels, vega-datasets, and gapminder installed

  • altair dependency version not correct (>4 required)

  • Installation

  • installation failed when running the command in the README, should instead be:

    • pip install -i https://test.pypi.org/simple/ altairexpress
  • Functionality

    • hist
  • Package Walk Through has incorrect call after import, should be:

    - `hist(gapminder, 'lifeExp')`      
    
  • the text output of certain "variables" are overlaid and are illegible

    • make_scatter
  • Package Walk Through has incorrect import call and function call, should be:

    - `... import make_scatter`    
    - `make_scatter(...)`    
    
  • Package documentation on Read the Docs indicates a log transformation parameters of x_logtransform() and y_logtransform(), however, actual function parameters do not include the "log"

    • ts_alt
  • Package Walk Through has incorrect import call, should be:

  • from altairexpress.ts import ts_alt

    • fourier_transform performs as documented
  • Testing

    • Coverage is excellent.
    • The test functions, in general, are difficult to follow.
  • Some of your test functions have quite a bit of commented out code - if this is unnecessary or broken code, there is no need to include it.

  • General Comments

  • I like the idea of streamlining some of EDA process in some easy to implement functions. The make_scatter function is particularly effective and makes the hist function somewhat redundant (instead of both functions, perhaps overlay the summary statistics from hist on to make_scatter). Further, the ability to log transform the data is quite useful.

  • The ts_alt function is an effective visual, but if the data is being decomposed, I would like to have access to this transformed data. In addition to returning the plots, perhaps consider also returning a dictionary of trend, seasonal, and residual components.

Overall, I enjoyed working my way through your package; I think there are some improvements that can be made, some of which are suggested, but the practical use of the package is evident.

@tejasph
Copy link
Author

tejasph commented Mar 24, 2020

Thanks for the insightful feedback @katieb1 and @evhend. We will work through all your suggestions.

@tejasph
Copy link
Author

tejasph commented Mar 26, 2020

From Evhen's comments I have implement the following changes in our latest version:

  1. Corrected the hist() import call in the Readme Package Walkthrough

  2. Corrected the make_scatter import call in the Readme Package Walkthrough Section

  3. In Read the docs, corrected a miss-specified in the make_scatter function. The defined variables now line up accurately with variables found inside the function.

  4. Corrected incorrect import, changing ts to ts_alt

  5. fixed installation command; the install still fails for me though with the following errors:

ERROR: Could not find a version that satisfies the requirement datetools<2.0,>=1.1 (from altairexpress) (from versions: none)
ERROR: No matching distribution found for datetools<2.0,>=1.1 (from altairexpress)

According to someone on Slack, we can't account for these errors unless we have a setup.py or a requirements.txt file. Instead, the user must upgrade the packages that are being referred to in the error. We are still trying to find a fix for this.

And the following is our latest release which has addressed the feedback above:
v 1.1.24

@aromatic-toast
Copy link

All comments that have been checked off under Katie's review that has been addressed. Please see her review comments on details of the action items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants