ignore stop words in TFIDF canopy creation #59

Closed
derekeder opened this issue Dec 17, 2012 · 1 comment

Comments

@derekeder

Tokens that account for more than x% of the total number of tokens should be ignored when creating TFIDF canopies.
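
A rough sketch of what this filter could look like, not the project's actual code. It assumes the corpus is a dict mapping record ids to token lists (the name `token_vector` comes from the commit message, but this structure is a guess), and it reads "x%" as a token's share of all token occurrences in the corpus. `find_stop_words`, `remove_stop_words`, and `threshold` are hypothetical names.

```python
from collections import Counter


def find_stop_words(token_vector, threshold=0.05):
    """Return tokens whose share of all token occurrences exceeds threshold.

    token_vector: dict of record id -> list of tokens (assumed structure).
    threshold: the "x%" from the issue, expressed as a fraction.
    """
    counts = Counter(
        token for tokens in token_vector.values() for token in tokens
    )
    total = sum(counts.values())
    if total == 0:
        return set()
    return {token for token, n in counts.items() if n / total > threshold}


def remove_stop_words(token_vector, stop_words):
    """Drop stop words from every record before canopies are built."""
    return {
        record_id: [t for t in tokens if t not in stop_words]
        for record_id, tokens in token_vector.items()
    }


records = {
    1: "the acme hardware store".split(),
    2: "the corner bakery".split(),
    3: "the acme bakery".split(),
}
# "the" is 3 of 10 tokens (30%), so it crosses a 25% threshold;
# "acme" and "bakery" are 2 of 10 each (20%) and are kept.
stop_words = find_stop_words(records, threshold=0.25)
filtered = remove_stop_words(records, stop_words)
```

Filtering once up front means the canopy step never has to compare records that only share a very common token.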

@derekeder

resolved by f5c5145

derekeder added a commit that referenced this issue Dec 18, 2012
+ storing idf and occurrences in memory (inverted_index dict)
+ storing raw tokens for corpus in token_vector
+ removed select_function from createCanopies (reading this info from memory now)
+ ignoring high occurrence stop words in TFIDF canopy creation - resolves #59
+ creating canopies for 100,000 records in ~300 seconds
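
For readers unfamiliar with the structure named in the commit message, here is a minimal sketch of an in-memory inverted index that keeps an idf value and the occurrences for each token. The layout of the dict, the function name `build_inverted_index`, and the idf formula `log(N / df)` are assumptions for illustration; the commit itself may store things differently.

```python
import math
from collections import defaultdict


def build_inverted_index(token_vector):
    """Map each token to its idf and the ids of records containing it.

    token_vector: dict of record id -> list of tokens (assumed structure).
    """
    occurrences = defaultdict(set)
    for record_id, tokens in token_vector.items():
        for token in tokens:
            occurrences[token].add(record_id)

    num_records = len(token_vector)
    return {
        token: {
            # Assumed idf definition: log(total records / records with token).
            "idf": math.log(num_records / len(ids)),
            "occurrences": sorted(ids),
        }
        for token, ids in occurrences.items()
    }


index = build_inverted_index({1: ["acme", "hardware"], 2: ["acme", "bakery"]})
```

With everything held in a dict like this, looking up which records share a token is a single key access, which is presumably why the commit could drop the `select_function` argument from `createCanopies`.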
derekeder added a commit that referenced this issue Jan 28, 2013
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2022