Migrate to sklearn #52

marksverdhei · 2022-04-02T22:50:35Z

Replace the nltk-based Bayes model with sklearn.
This also changes the model's parameters so that its vocabulary is limited to 5000 tokens.
The unbounded vocabulary is most likely the reason the nltk Bayes model blew up in size.
With the new model and vectorizer, it's about 25 MB. Is that ok?
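
For context, a minimal sketch of what a vocabulary-capped sklearn Bayes model can look like. The choice of `CountVectorizer` and `MultinomialNB`, and the toy data, are assumptions for illustration; the actual vectorizer, classifier, and corpus are in the diff, not in this description.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the project's real training corpus is not shown here.
texts = [
    "great movie, loved it",
    "wonderful acting and a great story",
    "terrible plot, hated it",
    "boring and terrible acting",
]
labels = ["pos", "pos", "neg", "neg"]

# max_features keeps only the 5000 most frequent tokens, which bounds
# the size of the fitted vectorizer and of the model built on top of it.
model = make_pipeline(
    CountVectorizer(lowercase=True, max_features=5000),
    MultinomialNB(),
)
model.fit(texts, labels)

# make_pipeline names each step after its lowercased class name.
vocabulary_size = len(model.named_steps["countvectorizer"].vocabulary_)
prediction = model.predict(["a great story"])[0]
```

Capping the vocabulary trades a little accuracy on rare tokens for a much smaller pickle, since both the vectorizer's vocabulary and the classifier's per-class feature counts scale with the number of features.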

I also think the scripts should be rewritten in a separate PR. I can clearly see that I have learned a lot about readable and idiomatic Python code
since I wrote this lol

This closes issue #35

LBlend

Code looks good. Just some nitpicking about typing and import sorting. No worries though. I'll take care of it!

I'll take care of the pep8 issues as well while I'm at it

LBlend · 2022-04-03T19:46:04Z

src/scripts/train_bayes.py

-    tokens: list[str], trigrams: bool = False, use_stopwords: bool = False
-) -> dict[str, bool | None]:
-    normalized = (token.lower() for token in tokens if token not in punctuation)
+def read_file(path):


needs typing
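
Something along these lines would do. This is only a sketch: the body and the `list[str]` return type are assumptions, since the snippet above shows just the signature.

```python
from __future__ import annotations

from pathlib import Path


def read_file(path: str | Path) -> list[str]:
    """Return the non-empty, stripped lines of a UTF-8 text file.

    The body is hypothetical; only the annotated signature is the point.
    """
    with open(path, encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]
```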

LBlend · 2022-04-03T19:46:45Z

src/scripts/train_bayes.py

 import nltk
 import pickle
 from string import punctuation
+from numpy import vectorize


sort imports alphabetically

Migrate to sklearn

6e94a59

LBlend self-requested a review April 3, 2022 16:16

LBlend added this to In progress in Website via automation Apr 3, 2022

LBlend linked an issue Apr 3, 2022 that may be closed by this pull request

Replace all NLTK with scikit-learn #35

Closed

LBlend moved this from In progress to Ready for review in Website Apr 3, 2022

LBlend requested changes Apr 3, 2022

LBlend added 4 commits April 3, 2022 21:53

typing

f72af26

sort imports

d492941

remove unused dependencies

90d836d

pep8

17a5641

LBlend self-requested a review April 3, 2022 20:39

Website automation moved this from Ready for review to Dev - Testing stage Apr 3, 2022

LBlend approved these changes Apr 3, 2022

LBlend merged commit a380e9d into dev Apr 3, 2022

Website automation moved this from Dev - Testing stage to Done Apr 3, 2022

LBlend deleted the markus/migrate-sklearn branch April 3, 2022 20:40
