Calling nlp() on an article causes 'tokenizers/punkt/english.pickle' Not Found Error #1

codelucas · 2013-12-21T09:38:32Z

I know the fix for this, but it's late, so I'll implement it tomorrow. I'll have setup.py install the required nltk tokenizers.

codelucas · 2013-12-29T02:09:02Z

Closing. Users who want the nlp() features can run one extra line to download the corpus files specified in the README.
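
For readers hitting the same error, here is a minimal sketch of a check-then-download step for the missing tokenizer. It is an assumption about what that extra line accomplishes, not the command from the README. `nltk.data.find` and `nltk.download` are real NLTK calls; the helper name `ensure_punkt` and its `find`/`download` parameters are hypothetical, added only so the logic can be exercised without network access. Newer NLTK releases may package the tokenizer under a different resource name.

```python
def ensure_punkt(find=None, download=None):
    """Download the punkt tokenizer only if it is missing.

    Returns True if a download was triggered, False if the data was found.
    `find` and `download` are hypothetical hooks; they default to NLTK's own.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the hooks can be swapped out
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # Resource path taken from the error message in the issue title.
        find("tokenizers/punkt/english.pickle")
        return False
    except LookupError:
        # nltk.data.find raises LookupError when the resource is absent.
        download("punkt")
        return True
```

Calling `ensure_punkt()` once before `article.nlp()` should be enough on a machine with network access. If the README specifies a different download command, prefer that one.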

codelucas closed this as completed Dec 29, 2013

codelucas pushed a commit that referenced this issue Dec 17, 2014

Merge pull request #1 from queenvictoria/master

60a9536

Require BeautifulSoup4 so that pip3 install works.

codelucas added a commit that referenced this issue Jan 15, 2015

Merge pull request #106 from codelucas/fulltext-improvement-1

c5eefcb

Fulltext extraction improvement #1

codelucas added a commit that referenced this issue Jan 22, 2015

Improve full-text extraction #1

7a6afc2

post_cleanup more lenient, `<li>` => newlines, less strict outputformatting, remove trailing media after article

hartym added a commit to hartym/newspaper that referenced this issue Jan 3, 2017

Merge pull request codelucas#1 from kjam/master

4891a39

updating with exceptions for top_node = None
