KeyError: 'hinglish' #200

Closed
upasana-mittal opened this issue Jul 7, 2022 · 7 comments

Comments


upasana-mittal commented Jul 7, 2022

I am getting this error while importing pke

get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l]
KeyError: 'hinglish'

  File "/app/model/src/analysis/AnalysisService.py", line 6, in <module>
    from pke.unsupervised import TextRank, TopicRank, SingleRank
  File "/usr/local/lib/python3.7/site-packages/pke/__init__.py", line 5, in <module>
    from pke.base import LoadFile
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 31, in <module>
    lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 31, in <dictcomp>
    lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
  File "/usr/local/lib/python3.7/site-packages/pke/base.py", line 29, in <lambda>
    get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l]
KeyError: 'hinglish'

atabas commented Jul 11, 2022

I'm getting the same error. Does anyone know what's wrong?

@ajithb073

Reason for the KeyError: pke relies on NLTK's stopword files and maps each file name to a language code. pke's "langcodes.py" has no language code for 'hinglish', so the lookup fails as soon as pke is imported.
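
To make the failure concrete, here is a minimal sketch of the same pattern as the dict comprehension in the traceback. The two-entry code table and the list of file names are hypothetical stand-ins, not pke's or NLTK's real data, and the tolerant variant only illustrates one way to avoid the error. It is not necessarily how later pke versions handle it.

```python
# Hypothetical stand-ins for pke's code table and NLTK's stopword file names;
# the real tables are much longer.
LANGUAGE_CODE_BY_NAME = {"english": "en", "french": "fr"}
stopword_fileids = ["english", "french", "hinglish"]


def strict_mapping(fileids):
    """Mirror the failing pattern: every file name must have a language code."""
    return {LANGUAGE_CODE_BY_NAME[name]: name for name in fileids}


def tolerant_mapping(fileids):
    """Skip file names that have no language code instead of raising KeyError."""
    return {
        LANGUAGE_CODE_BY_NAME[name]: name
        for name in fileids
        if name in LANGUAGE_CODE_BY_NAME
    }


try:
    strict_mapping(stopword_fileids)
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'hinglish'

print(tolerant_mapping(stopword_fileids))  # prints: {'en': 'english', 'fr': 'french'}
```

The strict version fails on the first file name without a code, which is why a single extra stopword file is enough to break the import.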

Solution: The "nltk_data" folder is normally in your home directory. Inside nltk_data/corpora/stopwords there is a file named 'hinglish'. Remove that file and the error should go away.
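
If you prefer to do this from Python, a small helper along these lines should work. It is a sketch: the function name remove_stopword_file is made up for illustration, and it assumes the stopword file sits at corpora/stopwords/hinglish under whatever nltk_data directory you pass in.

```python
from pathlib import Path


def remove_stopword_file(nltk_data_dir, name="hinglish"):
    """Delete corpora/stopwords/<name> under nltk_data_dir.

    Returns True if a file was removed, False if it was not there.
    """
    target = Path(nltk_data_dir) / "corpora" / "stopwords" / name
    if target.is_file():
        target.unlink()
        return True
    return False
```

For the default location you would call remove_stopword_file(Path.home() / "nltk_data"). The return value tells you whether the file was actually there, which helps when the data lives in a different directory than expected.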

@aradhana298

Where can I find the "nltk_data" folder in Colab?

@hammadmukhtar21

Where can I find the "nltk_data" folder in Colab?

Check the path that NLTK downloads to. Normally the data is stored under the /root/ directory. You can reach the root directory from the file pane on the left side of Colab by clicking "..." (more options), which is visible beside the sample.
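
If clicking through the file pane is awkward, you can also look for the folder from a notebook cell. The sketch below checks a few likely locations for a corpora/stopwords directory. The helper name find_stopwords_dirs and the candidate list are my assumptions (the NLTK_DATA environment variable if set, the home directory, and /root as mentioned above). With NLTK installed, nltk.data.path holds the list of directories NLTK actually searches.

```python
import os
from pathlib import Path


def find_stopwords_dirs(candidates):
    """Return the corpora/stopwords directories that exist under candidates."""
    found = []
    for base in candidates:
        stopwords_dir = Path(base) / "corpora" / "stopwords"
        # os.path.isdir returns False (rather than raising) on unreadable paths.
        if os.path.isdir(stopwords_dir):
            found.append(stopwords_dir)
    return found


# Assumed candidate locations; adjust them to your environment.
candidates = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
candidates += [Path.home() / "nltk_data", Path("/root/nltk_data")]

for directory in find_stopwords_dirs(candidates):
    print(directory)
```

Any directory it prints is a place where a 'hinglish' stopword file could be sitting.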


talhaanwarch commented Aug 17, 2022

You can simply run:
!rm /root/nltk_data/corpora/stopwords/hinglish

By the way, removing the file did not work for me.

I also did not face the issue with the latest version.

@upasana-mittal
Author

I had the issue because I was installing from a specific commit hash. Since I switched to installing directly from the git repository, it works fine, with no more errors.

pip install git+https://github.com/boudinfl/pke.git

Collaborator

ygorg commented Sep 30, 2022

As said earlier in the thread, please update to the latest version.
If you are using pke with an unsupported language, please provide custom stopwords using the stoplist argument, as follows:

import pke

# Custom stoplist for a language pke does not support out of the box
shadok_stoplist = ['ga', 'zo']
preprocessed_document = [  # Obtained via custom pos tagging tool or manual annotation
    [('ga', 'DET'), ('bu', 'NOUN'), ('zo', 'AUX'), ('meu', 'ADJ'), ('.', 'PUNCT')]
]
e = pke.unsupervised.MultipartiteRank()
# Pass the stoplist explicitly, since 'shadok' has no built-in stopwords
e.load_document(
    preprocessed_document, language='shadok',
    stoplist=shadok_stoplist, normalization=None)
