OSError after a few translations #8

Closed
jonas-nothnagel opened this issue Feb 2, 2021 · 3 comments

jonas-nothnagel commented Feb 2, 2021

Hi and thanks for the cool library!

I want to include the translation function in one of my data pipelines that loops over thousands of text snippets. I am on Windows without GPU support; following the instructions in the other issue, I successfully added the function.

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

and I translate with:

language = detect_langs(text)
for each_lang in language:
   if (each_lang.lang != "en"):
      translated_text = model.translate(text, target_lang='en')

where text is a string.
However, after a few translations (2-3), I always run into this error:

OSError: Can't load tokenizer for 'Helsinki-NLP/opus-mt-ia-en'. Make sure that:
- 'Helsinki-NLP/opus-mt-ia-en' is a correct model identifier listed on 'https://huggingface.co/models'

Any idea what the problem could be?

nreimers (Member) commented Feb 2, 2021

One of your sentences was detected as language ia (Interlingua), but there is no translation model for ia -> en.
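One way to guard against this in a pipeline is to check each detected language code against the set of source languages you know have a model, and skip the snippet otherwise. The sketch below is only an illustration: `Candidate`, `first_supported_lang`, and `SUPPORTED_SOURCES` are hypothetical names, and the set contents are placeholders that you would need to fill from the models actually available to you.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Candidate(NamedTuple):
    """Stand-in for one language-detection result (code plus probability)."""
    lang: str
    prob: float


def first_supported_lang(candidates: Iterable[Candidate],
                         supported_sources: Set[str]) -> Optional[str]:
    """Return the first detected code that has a translation model, else None."""
    for candidate in candidates:
        if candidate.lang in supported_sources:
            return candidate.lang
    return None


# Hypothetical set of source languages with an "-> en" model (assumption).
SUPPORTED_SOURCES = {"es", "fr", "de"}

# "ia" has no model here, so the helper falls through to the next candidate.
detected = [Candidate("ia", 0.71), Candidate("es", 0.29)]
source_lang = first_supported_lang(detected, SUPPORTED_SOURCES)
```

Falling back to a lower-ranked candidate is a design choice; returning None as soon as the top candidate is unsupported, and skipping the snippet, is an equally reasonable policy.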

jonas-nothnagel (Author) commented Feb 2, 2021

My mistake, my apologies!
If I may ask a follow-up question: when I change the code to:

language = detect_langs(text)
for each_lang in language:
   if (each_lang.lang == "es" or each_lang.lang == "fr"):
      translated_text = model.translate(text, target_lang='en')

I still run into translation errors, such as no model being available for Portuguese to English (pt -> en). Does model.translate() detect the language again on its own? In those cases its result seems to contradict detect_langs().

nreimers (Member) commented Feb 2, 2021

Yes, if you don't specify the source_lang, it detects the language again automatically.

You can fix it like this:

translated_text = model.translate(text, source_lang=your_detected_source_lang, target_lang='en')
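Applied to the loop from the question, this could look roughly like the sketch below: detect once, filter on the top candidate, and pass that same code on as `source_lang` so the two steps cannot disagree. The helper name `translate_if_allowed` and its parameters are hypothetical, and the sketch assumes the detector returns candidates ordered from most to least probable, each with a `.lang` attribute.

```python
from typing import Any, Callable, Collection, Optional


def translate_if_allowed(text: str,
                         model: Any,
                         detect: Callable[[str], list],
                         allowed: Collection[str] = ("es", "fr"),
                         target_lang: str = "en") -> Optional[str]:
    """Detect the language once and hand the result to model.translate()."""
    candidates = detect(text)
    if not candidates:
        return None  # nothing detected, so nothing to translate
    top = candidates[0]  # assumption: most probable candidate comes first
    if top.lang not in allowed:
        return None  # skip languages outside the allowed set
    # Passing source_lang keeps translate() from running its own detection.
    return model.translate(text, source_lang=top.lang, target_lang=target_lang)
```

With the objects from this thread, a call would be `translate_if_allowed(text, model, detect_langs)`. Unlike the original loop, this looks only at the top candidate, so a low-probability "es" or "fr" guess no longer triggers a translation.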
