Stop Words suggestion #5

Open
DruidSmith opened this issue Aug 9, 2016 · 5 comments

Comments

@DruidSmith

There isn't much documentation on how to use the stop-words list. Would it make sense to add support for a custom stop-word list, rather than having to modify an existing one? Or does that capability already exist?

@davidmogar
Owner

You are more than right about the custom lists. At the moment there is nothing like that, but it could be added easily. I'll find some time to do it. Thanks for your suggestion ;)

@davidmogar
Owner

I should make this easier, but for now you can find the path to the stop-words files and create a file there named stop-custom. After that, you only need to set the language to custom when initialising Normalizr:

from normalizr import Normalizr

normalizr = Normalizr(language='custom')

I'm leaving this issue open till I decide what to do ;)
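
The workaround described above could be sketched roughly as follows. This is only an illustration under assumptions: the thread does not say where the stop-words files live, so `stop_words_dir` is a placeholder for a directory you would have to locate in your own installation; `write_custom_stop_words` is a hypothetical helper, not part of normalizr; and the one-word-per-line layout is assumed to match the bundled files.

```python
import tempfile
from pathlib import Path


def write_custom_stop_words(stop_words_dir, words, name="stop-custom"):
    """Write a stop-word file called `name` into `stop_words_dir`.

    Hypothetical helper: the one-word-per-line format is an assumption,
    so compare it with an existing stop-words file before relying on it.
    """
    path = Path(stop_words_dir) / name
    # Lowercase, strip, drop blanks and duplicates, then sort for a stable file.
    cleaned = sorted({w.strip().lower() for w in words if w.strip()})
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


# Demo against a temporary directory, standing in for the real
# stop-words directory (whose location is not given in this thread).
demo_dir = tempfile.mkdtemp()
custom_file = write_custom_stop_words(demo_dir, ["Foo", " bar ", "foo", ""])
print(custom_file.read_text(encoding="utf-8"))
```

With the file written to the real stop-words directory, the `Normalizr(language='custom')` call from the comment above would be the next step.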

@DruidSmith
Author

DruidSmith commented Sep 6, 2016

Thanks, will give it a try.

@JasonCrowe

Adding my 2 cents...

I don't have a use for my own custom stop-word list, but it would be nice to be able to add words to the stop list through the normalization settings, e.g.:

normalizations = [
    'remove_extra_whitespaces',
    'remove_stop_words',
    ('add_stopwords', ['stopword_1', 'stopword_2', 'stopword_3', 'stopword_4'])
]
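
To make the proposal concrete, here is a minimal sketch of how such a settings list might be interpreted. It is not the library's implementation: `apply_normalizations` and its `base_stop_words` parameter are hypothetical, and only the entry names are taken from the list above. The sketch collects the `add_stopwords` entries in a first pass, so they take effect even when they appear after `remove_stop_words`, as they do in the example.

```python
def apply_normalizations(text, normalizations, base_stop_words=()):
    """Apply a list of normalization settings to `text` (illustrative only).

    Entries are either a name (str) or a (name, argument) tuple.
    """
    stop_words = {w.lower() for w in base_stop_words}

    # First pass: gather extra stop words so entry order does not matter.
    for entry in normalizations:
        if isinstance(entry, tuple) and entry[0] == "add_stopwords":
            stop_words.update(w.lower() for w in entry[1])

    # Second pass: run the actual text transformations in order.
    for entry in normalizations:
        name = entry[0] if isinstance(entry, tuple) else entry
        if name == "remove_extra_whitespaces":
            text = " ".join(text.split())
        elif name == "remove_stop_words":
            text = " ".join(
                w for w in text.split() if w.lower() not in stop_words
            )
        elif name == "add_stopwords":
            continue  # already handled in the first pass
        else:
            raise ValueError(f"unknown normalization: {name!r}")
    return text


normalizations = [
    "remove_extra_whitespaces",
    "remove_stop_words",
    ("add_stopwords", ["quick"]),
]
print(apply_normalizations("the  quick   brown fox", normalizations,
                           base_stop_words=["the"]))
```

Handling `add_stopwords` up front is a design choice of this sketch; an implementation could equally require it to precede `remove_stop_words`.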

@davidmogar
Owner

davidmogar commented Apr 17, 2018

Makes sense. I'll think about it and come up with something. I still need to find time to implement some much-needed changes in cucco.

EDIT: Thank you btw!

3 participants