coRPysprofiling (Python) #38

Open
11 of 22 tasks
ssyayayy opened this issue Mar 19, 2021 · 2 comments

ssyayayy commented Mar 19, 2021

Submitting Author: Anita Li (@AnitaLi-0371), Elanor Boyle-Stanley (@eboylestanley), Junghoo Kim (@jkim222383), Ivy Zhang (@ssyayayy)
Package Name: coRPysprofiling
One-Line Description of Package: coRPysprofiling performs EDA and EDV on text
Repository Link: https://github.com/UBC-MDS/coRPysprofiling/tree/0.1.6
Version submitted: 0.1.6
Editor: Tiffany Timbers (@ttimbers)
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Description

  • coRPysprofiling is an open-source library designed to bring exploratory data analysis and visualization to the domain of natural language processing. Functions in the package will be used to provide some elementary statistics and visualizations for a single text corpus or provide functions to compare multiple corpora with each other.

Scope

  • Please indicate which [category or categories][PackageCategories] this package falls under:

    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The core functions of coRPysprofiling provide elementary statistics and visualizations for a single text corpus and compare multiple corpora with each other. The package can also download and load pretrained word2vector models from a GitHub repository.

  • Who is the target audience and what are scientific applications of this package?

The target audience can be talent acquisition specialists who want to quickly retrieve valuable information from resume texts or compare text from two resumes.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

To our knowledge, while the wordcloud library generates word cloud visualizations for a given corpus, there is no general-purpose library for exploratory analysis and visualization of a text corpus in the Python ecosystem. There are several advanced libraries for comparing similarities between different corpora: most notably, gensim provides similarity comparison between large corpora using word embeddings. We believe that coRPysprofiling will provide useful functionality for exploratory analysis and visualization and help bridge the gap between elementary text analysis and more sophisticated approaches that use word embeddings.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

For details about the pyOpenSci packaging requirements, see our [packaging guide][PackagingGuide]. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an [OSI approved license][OsiApprovedLicense].
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

  • Do you wish to automatically submit to the [Journal of Open Source Software][JournalOfOpenSourceSoftware]? If so:
JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

  • I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package should it be accepted.

vigneshRajakumar commented Mar 25, 2021

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

I have worked with one of the team members on a project in the past but I do not have any conflicts of interest.

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
    • Creative!
  • Badges for continuous integration and test coverage, a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
    • This might need an update though, see below
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
    • This part is excellent
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider:

  • The documentation is easy to find and understand
  • The need for the package is clear
  • All functions have documentation and associated examples for use

Functionality

  • Installation: Installation succeeds as documented.
    • I made some suggestions to the installation guide below!
  • Functionality: Any functional claims of the software have been confirmed.
    • This works but you will have to update the README a bit
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Hey Team!

Overall, great job! Lots of tests, good documentation, and a comprehensive README. I had a couple of issues getting it to work out of the box, but these are pretty quick fixes. I've added some recommended changes to my review comments. The requested changes are all in the README and should be simple to incorporate.

Installation

I had some issues with the installation:

You might need to include the main PyPI index as an extra index so that all of your dependencies can be installed. Some packages don't seem to publish the same versions to both TestPyPI and PyPI.
I suggest editing the install instructions to this:

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple corpysprofiling

Even after I'd done this, I had to update my Visual C++ installation to get the install working. Maybe you could list Visual C++ 14.0 as a hard dependency?

Functionality

I tested the functionality and it works as detailed. You might need to update the usage to:

from corpysprofiling import corpysprofiling

and update the function calls to use corpysprofiling.method_name() instead of just method_name().

As it stands, a user who copy-pastes from the usage section will run into errors when running the examples.
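To illustrate why the import form matters, here is a minimal sketch using a throwaway package. The name demo_pkg, its empty __init__.py, and the function method_name are all hypothetical; the sketch assumes a layout where the functions live in a submodule that shares the package's name, which may or may not match coRPysprofiling exactly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package named demo_pkg (hypothetical) whose functions
# live in a submodule that shares the package's name.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # the package itself exports nothing
(pkg_dir / "demo_pkg.py").write_text(
    "def method_name(text):\n"
    "    return len(text.split())\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Importing only the package does not expose the submodule's functions.
import demo_pkg

try:
    demo_pkg.method_name("a b c")
    bare_import_works = True
except AttributeError:
    bare_import_works = False

# Importing the submodule explicitly makes module.method_name() available.
from demo_pkg import demo_pkg as module

word_count = module.method_name("a b c")
```

With this layout, the bare package import fails on the function call, while the explicit submodule import works. Re-exporting the functions from __init__.py would be another way to make the shorter usage valid.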

Testing

Good coverage!

You're using try-except to check error handling. Consider replacing it with the recommended pytest approach.
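As a rough sketch of what I mean, the two styles are shown side by side below. The function word_count is a hypothetical stand-in, not one of the package's functions; only pytest.raises is real pytest API.

```python
import pytest


def word_count(text):
    """Hypothetical stand-in for a package function that validates its input."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    return len(text.split())


def test_word_count_try_except():
    # try-except style: without the explicit else branch, this test would
    # pass silently when no error is raised at all.
    try:
        word_count(123)
    except TypeError:
        pass
    else:
        raise AssertionError("TypeError was not raised")


def test_word_count_pytest_raises():
    # pytest style: the test fails automatically if TypeError is not raised,
    # and match checks the error message against a regular expression.
    with pytest.raises(TypeError, match="must be a string"):
        word_count(123)
```

The pytest.raises version is shorter and cannot pass by accident when the expected exception is missing.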


huan-ds commented Mar 25, 2021

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
    • I made a review comment
  • Badges for continuous integration and test coverage, a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
    • I made a review comment
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider:

  • The documentation is easy to find and understand
  • The need for the package is clear
  • All functions have documentation and associated examples for use

Functionality

  • Installation: Installation succeeds as documented.
    • I made a review comment
  • Functionality: Any functional claims of the software have been confirmed.
    • I made a review comment
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3 hours


Review Comments

Hi, team,

Good job on this Python package! It is really useful, clear, detailed, and well organized. There are just a few comments that you might want to consider:

Package name:
You use the same package name for the R and Python packages. I find this a bit confusing.

Usage:
As Vignesh mentioned above, the documented statement import corpysprofiling does not work. You might need to update it to from corpysprofiling import corpysprofiling.

Functionality:
I tested the functions corpus_analysis and corpus_viz. They work well. Unfortunately, I got an error message when testing the other functions:

AttributeError: 'Logger' object has no attribute 'getLogger'
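
I have not traced where this comes from in the package or its dependencies. For reference, the sketch below shows one generic way this message can arise with the standard library: getLogger is a function of the logging module, so calling it on a Logger instance fails. The logger names are hypothetical.

```python
import logging

# logging.getLogger is a module-level function that returns a Logger instance.
logger = logging.getLogger("demo")

try:
    # Logger instances do not have a getLogger method, so this raises.
    logger.getLogger("demo.child")
except AttributeError as err:
    message = str(err)

# getChild is the instance-level way to obtain a descendant logger.
child = logger.getChild("child")
```

A name that is expected to be the logging module but is actually bound to a Logger object would produce the same error.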

Test:
There are a lot of test cases, which is great!

Nice work on this project. I enjoyed reviewing it. Let me know if you have any questions about this feedback.

Huanhuan
