How to add new language #26

Closed
MukhtarShaima opened this issue Nov 17, 2018 · 5 comments
Labels
question Further information is requested

Comments

@MukhtarShaima

Could you please give me clear instructions or steps so that I can add the Urdu language? I am not able to download the Urdu file from the link you mentioned.

@barrust
Owner

barrust commented Nov 17, 2018

Sure, the steps for generating a new language are fairly straightforward:

  1. Download the set of words that should be added to the dictionary
  2. Load that file into a dictionary; how you do this will depend on your source. The key is to turn it into a dictionary of the form: key=word, val=frequency as an int.

If your data is already in dictionary form, you can load it like so:

from spellchecker import SpellChecker
spell = SpellChecker(language=None)
spell.word_frequency.load_dictionary(file_to_dictionary)
spell.export(location_for_export)
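
As a rough illustration of the "key=word, val=frequency as an int" form described above, here is a minimal sketch that counts words and writes them out as JSON using only the standard library. The helper name `build_frequency_file`, the sample Urdu words, and the output file name are all hypothetical, and it assumes that `load_dictionary` accepts a plain JSON file of this shape.

```python
import json
import os
import tempfile
from collections import Counter


def build_frequency_file(words, out_path):
    """Count the words and write them as a JSON object of word -> int count."""
    counts = Counter(words)
    with open(out_path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-Latin words readable in the file
        json.dump(counts, fh, ensure_ascii=False)
    return dict(counts)


# Hypothetical sample data; replace with the words from your own corpus.
sample_words = ["کتاب", "قلم", "کتاب", "گھر", "کتاب", "قلم"]
out_path = os.path.join(tempfile.mkdtemp(), "ur_frequency.json")
frequencies = build_frequency_file(sample_words, out_path)
print(frequencies)
```

The resulting file path could then be passed as `file_to_dictionary` in the snippet above.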

If you only have plain-text files containing words, you can load those files directly and have spellchecker build the word frequency for you:

from spellchecker import SpellChecker
spell = SpellChecker(language=None)
spell.word_frequency.load_text_file(path_to_text_file)
spell.export(location_for_export)

Once you have exported the dictionary (really a word frequency list), you can then load that dictionary when you wish to use spellchecker:

from spellchecker import SpellChecker
spell = SpellChecker(language=None, local_dictionary=location_from_export)

@barrust barrust added the question Further information is requested label Nov 19, 2018
@MukhtarShaima
Author

Thanks for the clear instructions. I successfully loaded my text file.
Now the problem is that it does not give me correct answers, e.g.:
for word in misspelled:
    # Get the one most likely answer
    print(spell.correction(word))
It should return the correct or most likely word, but sometimes it returns a wrong word, or it returns the misspelled string unchanged.
Thank you.

@barrust
Owner

barrust commented Nov 20, 2018

That is likely due to one of a few possible issues.

  1. You do not have frequency information, i.e., every word has a count of 1 (or the same value), so there is nothing to rank the candidates by. To see which candidates are being considered, try something like:
# return those that are within the specified distance
print(spell.candidates(word))
  2. The edit distance between the word you are trying to correct and the nearest dictionary word is greater than 2; in that case no correction is found and the word is returned as is.

Honestly, I have never tried this with non-Latin-script languages, so I am unsure how it will perform.
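
To make the two points above concrete, here is a small standalone sketch. It is not the library's implementation: `edit_distance` and `pick_correction` are hypothetical helpers, the word counts are made up, and it uses plain Levenshtein distance (a swap of two adjacent letters costs 2 here, whereas the library may count it as a single edit). It only shows why a flat frequency list gives arbitrary picks and why a word with nothing within distance 2 comes back unchanged.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pick_correction(word, frequencies, max_distance=2):
    """Return the most frequent known word within max_distance, else the word."""
    if word in frequencies:
        return word
    nearby = [w for w in frequencies if edit_distance(word, w) <= max_distance]
    if not nearby:
        return word  # nothing close enough: the input comes back as is
    return max(nearby, key=lambda w: frequencies[w])


# Made-up counts, for illustration only.
weighted = {"cat": 1, "car": 10, "can": 1}
flat = {"cat": 1, "car": 1, "can": 1}

print(pick_correction("caz", weighted))   # frequency decides between candidates
print(pick_correction("caz", flat))       # all tied: the pick is arbitrary
print(pick_correction("xyzzyq", weighted))  # no candidate within distance 2
```

With real counts the most common nearby word wins; with every count set to the same value, whichever candidate happens to come first is returned.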

@barrust
Owner

barrust commented Jan 6, 2019

@MukhtarShaima Let me know if you are still having issues, otherwise, I am going to close this one!

Thanks!

@barrust barrust closed this as completed Jan 6, 2019
@ryuzakinho

Hi,

From my understanding, we can load JSON formatted dictionaries or text documents that will be used for building the frequency list.

I would like to directly use the word frequency lists available here (Word Frequency): https://github.com/hermitdave/FrequencyWords/tree/master/content/2018/fi

These are txt files containing frequencies. Is there a way to directly load such files or do I need to convert them to JSON first?

Thanks for your help!
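
One possible way to do the conversion, sketched with the standard library only: it assumes each line of the frequency file has the form "word count" separated by whitespace, which should be checked against the actual files. The helper name `parse_frequency_lines` and the sample counts are hypothetical.

```python
import json


def parse_frequency_lines(lines):
    """Turn 'word count' lines into a dict of word -> int count."""
    frequencies = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        word, count = parts[0].lower(), int(parts[1])
        # merge counts if the same word appears more than once
        frequencies[word] = frequencies.get(word, 0) + count
    return frequencies


# Hypothetical sample lines; in practice, iterate over the opened txt file.
sample = ["on 1500", "ei 1200", "", "että 900", "broken-line"]
frequencies = parse_frequency_lines(sample)
as_json = json.dumps(frequencies, ensure_ascii=False)
print(as_json)
```

Writing `as_json` to a file should give something that can be loaded with `spell.word_frequency.load_dictionary(...)` as described earlier in this thread, assuming that method reads a JSON object of word to count.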
