Submission: altairexpress (Python) #3

tejasph · 2020-03-16T19:42:48Z

Submitting Authors: Tejas Phaterpekar(@tejasph), Lesley Miller (@aromatic-toast), Jack Tan (@thejacktan), Wenjiao Zou(@zouwenjiao)
Package Name: altairexpress
One-Line Description of Package: Provides efficient EDA plots in Altair
Repository Link: https://github.com/UBC-MDS/altairexpress
Version submitted: V1.1.16
Editor: Varada Kolhatkar (@kvarada)
Reviewer 1: Evhen Dytyniak (@evhend)
Reviewer 2: Katherine Birchard (@katieb1)
Archive: TBD
Version accepted: TBD

Description

Include a brief paragraph describing what your package does:

This package simplifies the process of conducting Exploratory Data Analysis (EDA) on new datasets. It is designed to allow the user to explore the data graphically as well as obtain some basic summary statistics, all by writing only one line of code. Plots are produced using the Altair package under the hood. As Jenny Bryan once said: “Someone has to write for-loops, but it doesn’t have to be you!”. This sentiment has been implemented here for EDA analysis. The user is able to spend more time on analyzing the dataset and less time on configuring complex Altair plot settings.

Scope

Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data munging
- Data deposition
- Reproducibility
- Geospatial
- Education
- Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

Explain how the and why the package falls under these categories (briefly, 1-2 sentences):

Our package provides users with the ability to efficiently visualize their data with scatterplots, histograms, time-series, and Fourier transforms. These plots are also accompanied with axis transformation options and summary statistics.

Who is the target audience and what are scientific applications of this package?

altairexpress is meant for machine learning data scientists of all levels who want convenient tools to visualize their data both effectively and efficiently. The goal is to enable them to allocate more time to model planning/engineering.

Are there other Python packages that accomplish the same thing? If so, how does yours differ?

From our knowledge, there has yet to be a popular, well-known Altair EDA package. There may be other packages out there, but our package offers a variety of plots that can fit different data needs.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has an OSI approved license
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

Do you wish to automatically submit to the Journal of Open Source Software? If so:

No

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

The text was updated successfully, but these errors were encountered:

katieb1 · 2020-03-17T04:14:44Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all user-facing functions
[] Examples for all user-facing functions
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
Installation instructions
Any additional setup required (authentication tokens, etc)
Brief demonstration usage
Direction to more detailed documentation (e.g. your documentation files or website).
If applicable, how the package compares to other similar packages and/or how it relates to other packages
Citation information

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: ~ 3 hours

Review Comments

Overall, very interesting idea for a package and really well done! Just a few notes below:

Documentation

Although the usage section has comprehensive examples, I think you need to provide the output (i.e. the resulting plot) to satisfy the "Vignette" criteria.
I enjoyed the "Package Walkthrough" section that documents each function contained in the altairexpress package. However, the README seemed a little repetitive in that there were multiple sections where there were usage examples, as you have both the package walkthrough demos and the usage section. I'm not sure how these differ (the same time series example is even used), so I would either amalgamate them or differentiate them more.
I did not put a check beside the example section because when trying to implement the examples on my local machine I could not run the examples from just copying and pasting (see further notes in functionality).
For the 'Metadata' section, I would just include the emails beside each package developer in the README and CONTRIBUTORS
All of the important elements of the README are present, although the order is not quite the same as outlined above.

Functionality

Installation worked!
When I tried to run the usage examples in my jupyter lab they did not work from just copying and pasting. I think they just had a few typos - such as not putting gdpPercap in quotes and capitalizating the 'c' in gdpPercap. Also, when importing altairexpress.hist as hist the function should be hist(), rather than altairexpress.hist() or else it throws an error. There were a lot of typos in the other functions as well, so I would just go through and proofread these. The docstrings were very helpful in figuring out how to use the hist and fourier_transform functions, though. However, I had to look at the function code to learn how to run the scatter and ts functions.
Once I managed to run the functions, they worked really well and did everything they said they would! This is a bit nitpicky, but when running the different functions I noticed a few inconsistencies between the styles of visualizations. For instance, some plots had titles while others did not. I would just do a little touch up to ensure that all plots contain the same elements.
All tests passed!

evhend · 2020-03-21T19:47:56Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all user-facing functions
Examples for all user-facing functions
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
Installation instructions
Any additional setup required (authentication tokens, etc)
Brief demonstration usage
Direction to more detailed documentation (e.g. your documentation files or website).
If applicable, how the package compares to other similar packages and/or how it relates to other packages
Citation information

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software
Authors: A list of authors with their affiliations
A statement of need clearly stating problems the software is designed to solve and its target audience.
References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3

Review Comments

Read the Docs
"From Sources" installation links are broken (not certain if the links provided here are correct).
"Usage" contains no examples.
README
"Usage" is a bit confusing - I'd like to see some explanations in this section.
- "Package Walkthrough"
no fourier_transform working example
- Function names in descriptions do not match function calls in usage examples:
gghist() but hist() is called
scatter_express() but scatter() is called
ts_plot() but ts_alt() is called
It may be useful to provide an example of the outputs of each function in the "Usage" or "Package Walkthrough"
Include a description of the returns of each function in the function descriptions
- Dependencies:
not all package dependencies are included; my installation failed because I did not have statsmodels, vega-datasets, and gapminder installed
altair dependency version not correct (>4 required)
Installation
installation failed when running the command in the README, should instead be:
- pip install -i https://test.pypi.org/simple/ altairexpress
Functionality
- hist
Package Walk Through has incorrect call after import, should be:
```
- `hist(gapminder, 'lifeExp')`      
```
the text output of certain "variables" are overlaid and are illegible
- make_scatter
Package Walk Through has incorrect import call and function call, should be:
```
- `... import make_scatter`    
- `make_scatter(...)`    
```
Package documentation on Read the Docs indicates a log transformation parameters of x_logtransform() and y_logtransform(), however, actual function parameters do not include the "log"
- ts_alt
Package Walk Through has incorrect import call, should be:
from altairexpress.ts import ts_alt
- fourier_transform performs as documented
Testing
- Coverage is excellent.
- The test functions, in general, are difficult to follow.
Some of your test functions have quite a bit of commented out code - if this is unnecessary or broken code, there is no need to include it.
General Comments
I like the idea of streamlining some of EDA process in some easy to implement functions. The make_scatter function is particularly effective and makes the hist function somewhat redundant (instead of both functions, perhaps overlay the summary statistics from hist on to make_scatter). Further, the ability to log transform the data is quite useful.
The ts_alt function is an effective visual, but if the data is being decomposed, I would like to have access to this transformed data. In addition to returning the plots, perhaps consider also returning a dictionary of trend, seasonal, and residual components.

Overall, I enjoyed working my way through your package; I think there are some improvements that can be made, some of which are suggested, but the practical use of the package is evident.

tejasph · 2020-03-24T16:17:38Z

Thanks for the insightful feedback @katieb1 and @evhend. We will work through all your suggestions.

tejasph · 2020-03-26T14:59:01Z

From Evhen's comments I have implement the following changes in our latest version:

Corrected the hist() import call in the Readme Package Walkthrough
Corrected the make_scatter import call in the Readme Package Walkthrough Section
In Read the docs, corrected a miss-specified in the make_scatter function. The defined variables now line up accurately with variables found inside the function.
Corrected incorrect import, changing ts to ts_alt
fixed installation command; the install still fails for me though with the following errors:

ERROR: Could not find a version that satisfies the requirement datetools<2.0,>=1.1 (from altairexpress) (from versions: none)
ERROR: No matching distribution found for datetools<2.0,>=1.1 (from altairexpress)

According to someone on Slack, we can't account for these errors unless we have a setup.py or a requirements.txt file. Instead, the user must upgrade the packages that are being referred to in the error. We are still trying to find a fix for this.

And the following is our latest release which has addressed the feedback above:
v 1.1.24

aromatic-toast · 2020-03-26T22:57:08Z

All comments that have been checked off under Katie's review that has been addressed. Please see her review comments on details of the action items.

tejasph assigned katieb1 and evhend Mar 16, 2020

SamEdwardes mentioned this issue Mar 16, 2020

Submission: pylaundry (Python) #14

Closed

22 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Submission: altairexpress (Python) #3

Submission: altairexpress (Python) #3

tejasph commented Mar 16, 2020 •

edited

katieb1 commented Mar 17, 2020 •

edited by aromatic-toast

evhend commented Mar 21, 2020 •

edited by aromatic-toast

tejasph commented Mar 24, 2020

tejasph commented Mar 26, 2020 •

edited by zouwenjiao

aromatic-toast commented Mar 26, 2020

Submission: altairexpress (Python) #3

Submission: altairexpress (Python) #3

Comments

tejasph commented Mar 16, 2020 • edited

Description

Scope

Technical checks

Publication options

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

Code of conduct

Editor and Review Templates

katieb1 commented Mar 17, 2020 • edited by aromatic-toast

Package Review

Documentation

Functionality

For packages co-submitting to JOSS

Final approval (post-review)

Review Comments

Documentation

Functionality

evhend commented Mar 21, 2020 • edited by aromatic-toast

Package Review

Documentation

Functionality

For packages co-submitting to JOSS

Final approval (post-review)

Review Comments

tejasph commented Mar 24, 2020

tejasph commented Mar 26, 2020 • edited by zouwenjiao

aromatic-toast commented Mar 26, 2020

tejasph commented Mar 16, 2020 •

edited

katieb1 commented Mar 17, 2020 •

edited by aromatic-toast

evhend commented Mar 21, 2020 •

edited by aromatic-toast

tejasph commented Mar 26, 2020 •

edited by zouwenjiao