Build stoplists from document collections (#686)

* Add different bases to corpus-based stop module

* Update .coveragerc

* Update stop tests

* Update target lists

* Add frequency basis to stop

* Update corpus stop for basis frequency

* Add numpy, sklearn, and scipy to requirements

* Fix pip install name

* Update stops; increase test coverage

* Remove set union

* Add tests for checking package imports

* Remove old test strings

* Cleanup test

* Fix to have correct license

* Add else to basis check

* Add docstring

* Add option for inc_counts for string

* Clean up stop.py

* Update docs

* Cleanup stoplist docs

* Fix typo in stoplist docs

* Fix typo

* Add tf-idf to CorpusStoplist bases

* Add test for tfidf; fix test for zou
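The commit message mentions a frequency basis and a tf-idf basis for building a stoplist from a document collection. Here is a minimal plain-Python sketch of those two ideas. It is not the CorpusStoplist code from cltk/stop/stop.py, which this page does not show. The function names (tokenize, frequency_scores, tfidf_scores, build_stoplist) are hypothetical, and so is the exact tf-idf ranking: summed tf times unsmoothed idf, lowest score first. The sketch uses only the standard library, even though the commit adds numpy, scipy, and scikit-learn as dependencies.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical helpers for illustration only; not the CLTK API.


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (assumption; real text needs better).
    return text.lower().split()


def frequency_scores(docs: List[str]) -> Dict[str, int]:
    # Frequency basis: total count of each token across the whole collection.
    counts: Counter = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return dict(counts)


def tfidf_scores(docs: List[str]) -> Dict[str, float]:
    # Assumed tf-idf variant: sum over documents of (count / doc length) * idf,
    # with unsmoothed idf = log(N / df). A token present in every document
    # gets idf 0, so its summed score is 0, the most "stop-like" value.
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each doc once
    scores: Dict[str, float] = {}
    for tokens in tokenized:
        tf = Counter(tokens)
        length = len(tokens)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] = scores.get(term, 0.0) + (count / length) * idf
    return scores


def build_stoplist(docs: List[str], size: int = 10, basis: str = "frequency") -> List[str]:
    freq = frequency_scores(docs)
    if basis == "frequency":
        # Most frequent tokens first; ties broken alphabetically.
        ranked = sorted(freq, key=lambda t: (-freq[t], t))
    elif basis == "tfidf":
        scores = tfidf_scores(docs)
        # Lowest summed tf-idf first; ties broken by higher frequency, then alphabetically.
        ranked = sorted(scores, key=lambda t: (scores[t], -freq[t], t))
    else:
        raise ValueError(f"unknown basis: {basis!r}")
    return ranked[:size]
```

For example, with docs = ["the cat and the cat and the cat", "the dog sat", "the bird sang"], build_stoplist(docs, size=2) returns ["the", "cat"], while build_stoplist(docs, size=2, basis="tfidf") returns ["the", "and"]. The frequency basis promotes "cat" because it is frequent in a single document. The tf-idf variant penalizes tokens confined to few documents, so "cat" drops out while "the", which appears everywhere, stays first.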
diyclassics authored and kylepjohnson committed Feb 22, 2018
1 parent 87f5f01 commit f4070dd2a65706874c4bcf4f0fe3c31a0138f93f
Showing with 454 additions and 142 deletions.
  1. +9 −1 .coveragerc
  2. +1 −0 .travis.yml
  3. +260 −139 cltk/stop/stop.py
  4. +123 −2 cltk/tests/test_stop.py
  5. +61 −0 docs/multilingual.rst
.coveragerc
@@ -1,4 +1,12 @@
[report]
omit =
*/python?.?/*
*/site-packages/nose/*
*/site-packages/nose/*
exclude_lines =
pragma: no cover
def __repr__
if self\.debug
if 0:
if __name__ == .__main__.:
# raise AssertionError
# raise NotImplementedError
.travis.yml
@@ -20,6 +20,7 @@ before_script:
- pip install fuzzywuzzy
- pip install python-Levenshtein
- pip install gensim # for word2vec.py
- pip install numpy scipy scikit-learn
script:
# Notes on nose: