Build stoplists from document collections #686

Merged
kylepjohnson merged 25 commits into cltk:master from diyclassics:stops-dev on Feb 22, 2018

diyclassics commented Feb 15, 2018

Second round of code for a generalized (i.e. non-language-specific) Stop module, building on the work from #600. This PR adds code for building a stoplist from a "corpus", i.e. a document collection. Like the StringStoplist class, the new CorpusStoplist class is customizable through parameters such as length and preprocessing options.

Also includes tests for CorpusStoplist and improved coverage of StringStoplist.
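
For readers unfamiliar with the idea, here is a minimal sketch of building a stoplist from a document collection by raw token frequency. It is not the CorpusStoplist implementation from this PR, whose code is not shown in this conversation. The function name `build_stoplist_sketch` and the parameters `size`, `lower`, and `remove_punctuation` are hypothetical stand-ins for the "length and preprocessing options" mentioned above, and it uses only the standard library rather than sklearn.

```python
import re
from collections import Counter


def build_stoplist_sketch(documents, size=10, lower=True, remove_punctuation=True):
    """Return the `size` most frequent tokens in `documents`, sorted alphabetically.

    Hypothetical illustration only; not the CLTK API.
    """
    counts = Counter()
    for doc in documents:
        if lower:
            doc = doc.lower()
        if remove_punctuation:
            # Replace anything that is neither a word character nor whitespace.
            doc = re.sub(r"[^\w\s]", " ", doc)
        counts.update(doc.split())
    # Rank by descending frequency; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return sorted(token for token, _ in ranked[:size])


if __name__ == "__main__":
    docs = [
        "Arma virumque cano, Troiae qui primus ab oris.",
        "et in Arcadia ego et in terra",
        "in principio et in fine",
    ]
    print(build_stoplist_sketch(docs, size=2))
```

Raw frequency is only one possible basis for ranking. A collection-based approach could equally weight terms by how many documents they appear in.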

@diyclassics diyclassics changed the title Build stopwords lists from document collections Build stoplists from document collections Feb 15, 2018

codecov-io commented Feb 15, 2018

Codecov Report

Merging #686 into master will increase coverage by 0.93%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##           master     #686      +/-   ##
==========================================
+ Coverage   86.65%   87.59%   +0.93%     
==========================================
  Files         145      145              
  Lines        8612     8689      +77     
==========================================
+ Hits         7463     7611     +148     
+ Misses       1149     1078      -71

| Impacted Files | Coverage Δ | |
|---|---|---|
| cltk/tests/test_stop.py | 100% <100%> (+1.12%) | ⬆️ |
| cltk/stop/stop.py | 92.16% <93.02%> (+0.86%) | ⬆️ |
| cltk/prosody/latin/Verse.py | 75.86% <0%> (-2.27%) | ⬇️ |
| cltk/tests/test_corpus.py | 99.16% <0%> (+0.2%) | ⬆️ |
| cltk/corpus/greek/tlg/parse_tlg_indices.py | 81.59% <0%> (+0.34%) | ⬆️ |
| cltk/tests/test_phonology.py | 100% <0%> (+0.36%) | ⬆️ |
| cltk/tests/test_stem.py | 100% <0%> (+0.38%) | ⬆️ |
| cltk/corpus/utils/importer.py | 58.95% <0%> (+0.4%) | ⬆️ |
| cltk/ir/query.py | 62.77% <0%> (+0.63%) | ⬆️ |
| cltk/tests/test_tag.py | 98% <0%> (+0.63%) | ⬆️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 87f5f01...21663b2.

diyclassics and others added some commits Feb 15, 2018

kylepjohnson left a comment

Looks really clean, I got it right away.

kylepjohnson commented Feb 22, 2018

@diyclassics I'll wait for the build server to finish (updated the branch) and will merge.

Just to be sure, the entire project won't fail if sklearn is not installed, right? I can pull and check myself, too.

diyclassics commented Feb 22, 2018

It shouldn't fail—there are try blocks for the numpy & sklearn imports. Wouldn't hurt to test on another machine though.
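
The guarded-import pattern described above can be sketched roughly as follows. This is an assumption about the general shape, not a copy of what cltk/stop/stop.py does: the flag name `SKLEARN_AVAILABLE` and the helper `require_sklearn` are hypothetical, and `CountVectorizer` is used only as an example of an sklearn import.

```python
# Optional dependencies: importing this module must succeed even when
# numpy or sklearn is missing.
try:
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    SKLEARN_AVAILABLE = True
except ImportError:
    np = None
    CountVectorizer = None
    SKLEARN_AVAILABLE = False


def require_sklearn():
    """Raise a clear error only when an sklearn-backed feature is actually used.

    Hypothetical helper for illustration.
    """
    if not SKLEARN_AVAILABLE:
        raise ImportError(
            "This feature needs numpy and scikit-learn; "
            "install them to build stoplists from a corpus."
        )


if __name__ == "__main__":
    print("sklearn available:", SKLEARN_AVAILABLE)
```

With this shape, the rest of the package imports cleanly without sklearn, and the failure is deferred to the point where a corpus-based stoplist is actually requested.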

kylepjohnson commented Feb 22, 2018

Just checked locally, works just like you say.

@kylepjohnson kylepjohnson merged commit f4070dd into cltk:master Feb 22, 2018

3 checks passed

codecov/patch 95.45% of diff hit (target 86.65%)
codecov/project 87.59% (+0.93%) compared to 87f5f01
continuous-integration/travis-ci/pr The Travis CI build passed

@diyclassics diyclassics deleted the diyclassics:stops-dev branch Feb 23, 2018
