Strange words - bug? #59
Comments
@NadavB I suspect the apostrophes may have been removed by mistake. Feel free to re-add them in. PR gladly accepted. |
How? |
@NadavB As far as I can tell, it looks like the process would be ad hoc for this project. Open |
But where is the current code that generated words_alpha.txt from words.txt so we can modify it? |
I don't think it was ever committed. What I see in the history is that someone just added a words_alpha file, and other people modified it directly. |
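Since the original generation script does not appear in the history, here is a minimal sketch of how a words_alpha-style list could be regenerated from words.txt. It is not the authors' method. It assumes one entry per line and keeps only entries made up entirely of ASCII letters; the function names `is_alpha_word` and `filter_alpha` are hypothetical.

```python
import string

# Assumption: "alpha" means ASCII letters only (no digits, hyphens, apostrophes).
ALLOWED = set(string.ascii_letters)


def is_alpha_word(word):
    """Return True if the word is non-empty and contains only ASCII letters."""
    return bool(word) and all(ch in ALLOWED for ch in word)


def filter_alpha(lines):
    """Keep purely alphabetic entries, preserving order and dropping duplicates."""
    seen = set()
    kept = []
    for line in lines:
        word = line.strip()
        if is_alpha_word(word) and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


if __name__ == "__main__":
    # File names are taken from the thread; adjust paths as needed.
    with open("words.txt", encoding="utf-8") as src:
        words = filter_alpha(src)
    with open("words_alpha.txt", "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

Note that this drops entries such as "isn't" entirely rather than stripping the apostrophe, so it cannot by itself create fragments like "isn". It also does not remove fragments that are already separate entries in words.txt.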
Also is "giggish" actually a word? |
@LameLemon I couldn't find a definition for "giggish," and it looks like it came from the original infochimps dataset. You can probably remove it. To address the original issue of "are strange words a bug," I think we should say no and close the thread. The underlying reason for the presence of nonwords is the choice of data sources. More carefully curated corpora either cost more or have fewer words. |
@dbrakman so can you commit it please? Otherwise people can't contribute to it... |
@NadavB I understand why it should be committed, but I don't have that script. I didn't make these lists. |
Ahh, I understand. So if one of the authors sees this thread, please commit it, thanks... |
@dbrakman It won't help. The word "aren" is found in words.txt as well. So unless someone shows how the file words.txt was extracted from the corpus, I don't think this whole repository is usable at all. |
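Because the fragments are already present in words.txt, one ad hoc option is to flag them for manual review. The sketch below is only an illustration: the stem list is an assumption made for this example (it is not from the repository), and `find_contraction_fragments` is a hypothetical name. Stems that are also ordinary words, such as "don", "won", and "can", are deliberately left out.

```python
# Assumed list of "n't" contraction stems that are not ordinary words on their own.
# This list is illustrative and incomplete; review results before deleting anything.
CONTRACTION_STEMS = {
    "isn", "aren", "wasn", "weren", "hasn", "haven", "hadn",
    "doesn", "didn", "wouldn", "couldn", "shouldn", "mustn",
}


def find_contraction_fragments(words):
    """Return the sorted, de-duplicated entries that match a known contraction stem."""
    return sorted({w for w in words if w.lower() in CONTRACTION_STEMS})


if __name__ == "__main__":
    # File name is taken from the thread; adjust the path as needed.
    with open("words_alpha.txt", encoding="utf-8") as src:
        entries = [line.strip() for line in src]
    for fragment in find_contraction_fragments(entries):
        print(fragment)
```

The output is a candidate list only. Whether a flagged entry should be removed is still a judgment call for whoever opens the PR.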
'aaa' isn't a word either |
Yes it is: https://www.wordnik.com/words/giggish |
There are words like "isn" "aren" "wouldn" - smells like a bug?