feat: use wordnet instead of spell to find English words #3242
Ok, I went with another tool for the Python version: https://pypi.org/project/spylls/. Do we want to use the same tool for both? E.g. https://pypi.org/project/wn/ instead.
@theseion What's the advantage of WordNet over the spell/ispell/aspell family for us? @fzipi are you set on using Python for the PHP stuff, or can we switch to shell and use WordNet there too? I don't see anything out of the ordinary in the Python scripts. If we keep Python, I think gen.sh should be ported to Python as well.
I don't mind using other scripting options, shell included (if someone wants to port it).
WordNet is a corpus with additional linguistic information. The other programs use dictionaries to correct spelling. These dictionaries also contain words that aren't really English words, like "xterm" and "comm" (because users would probably still want spelling correction for them). In addition, they weren't built for finding English words, so their interfaces are not suited to our problem. I spent some time with
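To make the difference concrete, here is a minimal sketch. The two word sets are hypothetical stand-ins, not the real contents of any spell dictionary or of WordNet: one mimics a spell-checker dictionary that also accepts tool names like "xterm" and "comm", the other mimics a lexicon of English lemmas only. The helper `english_words` and the command list are likewise hypothetical.

```python
from typing import Iterable


def english_words(candidates: Iterable[str], lexicon: set[str]) -> list[str]:
    """Return the candidates that the given lexicon accepts as English words.

    Lookup is case-insensitive because WordNet lemma names are lowercase.
    """
    return [word for word in candidates if word.lower() in lexicon]


# Hypothetical stand-ins, NOT the real contents of either word list.
SPELL_DICTIONARY = {"cat", "xterm", "comm", "sort"}
WORDNET_LIKE = {"cat", "sort"}

# Hypothetical command names to screen for English words.
commands = ["cat", "xterm", "comm", "sort", "wget"]

print(english_words(commands, SPELL_DICTIONARY))  # ['cat', 'xterm', 'comm', 'sort']
print(english_words(commands, WORDNET_LIKE))      # ['cat', 'sort']
```

With the spell-style list, "xterm" and "comm" would be reported as English words; a lexicon of real English lemmas only flags "cat" and "sort".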
WordNet isn't perfect, of course, but it easily beat the other choices. Added benefit: WordNet is very easy to use in Python (e.g. with
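Primitive example of Python with

    # don't store data in home
    export NLTK_DATA=/tmp/nltk_corpi
    # nltk.download() only writes to NLTK_DATA if the directory already exists
    mkdir -p "$NLTK_DATA"

then, in Python:

    import nltk
    nltk.download('wordnet')

    from nltk.corpus import wordnet
    # set of all WordNet lemma names
    words = set(wordnet.words())
    # membership test: is 'xterm' a WordNet lemma?
    'xterm' in words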
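Building on that primitive example, a slightly more reusable sketch could load the lemma set once and expose a lookup helper. It assumes NLTK is installed and the `wordnet` corpus has already been downloaded as above; the function names are hypothetical, not part of this PR. WordNet lemma names are lowercase and join multi-word expressions with underscores (e.g. `hot_dog`), so the helper normalizes case and spaces before the lookup. The lexicon loader is injectable so the logic can be exercised without the corpus.

```python
from functools import lru_cache
from typing import Callable, Iterable


@lru_cache(maxsize=1)
def wordnet_lexicon() -> frozenset[str]:
    """Load all WordNet lemma names once (needs NLTK and the 'wordnet' corpus)."""
    from nltk.corpus import wordnet  # lazy import: module loads without NLTK
    return frozenset(wordnet.words())


def is_english_word(
    word: str, lexicon: Callable[[], frozenset[str]] = wordnet_lexicon
) -> bool:
    """Check one word, lowercasing it and joining multi-word input with '_'."""
    key = word.strip().lower().replace(" ", "_")
    return key in lexicon()


def filter_english(
    words: Iterable[str], lexicon: Callable[[], frozenset[str]] = wordnet_lexicon
) -> list[str]:
    """Keep only the words the lexicon recognizes as English."""
    return [w for w in words if is_english_word(w, lexicon)]
```

With the corpus available, `filter_english(["cat", "xterm"])` uses the real WordNet lemma set; caching it with `lru_cache` avoids rebuilding the set on every call.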
WordNet There does appear to be a Debian package, though, and having one might be a good rule for what tooling we do or don't include and rely on.
That is a point. But the features @theseion described make it very worthwhile in my eyes. Thus: importance of avoiding FPs (false positives) > ease of installing a developer tool.
Fair point, @RedXanadu. I did check for a Debian package. And since it's in Homebrew, it's available on macOS and Linux. There's also a Windows binary on the WordNet page.
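Since availability across platforms came up, a script could check up front which WordNet backend is present before doing any work. This is a hypothetical helper, not part of the PR: it assumes the WordNet command-line tool is on `PATH` under the name `wn`, and treats an importable `nltk` package as the Python alternative.

```python
import importlib.util
import shutil


def available_wordnet_backends() -> dict[str, bool]:
    """Report which WordNet backends appear to be usable on this machine."""
    return {
        # Assumption: the WordNet CLI is installed as "wn" on PATH.
        "cli": shutil.which("wn") is not None,
        # find_spec only locates the package; it does not import it.
        "nltk": importlib.util.find_spec("nltk") is not None,
    }


backends = available_wordnet_backends()
if not any(backends.values()):
    print("WordNet not found: install the WordNet CLI or NLTK")
```

Checking with `shutil.which` and `find_spec` keeps the probe cheap: nothing is executed or imported just to test for presence.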
Looked over this and it looks good. spell.sh is not especially pretty, but improving it was not the goal of this PR. So all OK, and I'll merge now. (@fzipi: I know you were assigned during the meeting, but I had a few spare cycles.)