Add glossary to DeepL API #1346

Open
Tracked by #1928
ulliholtgrave opened this issue Apr 6, 2022 · 28 comments
Labels
💡 feature New feature or request 😱 effort: high Big change, which requires >12h
@ulliholtgrave
Member

ulliholtgrave commented Apr 6, 2022

Motivation

We want to add and use a glossary for certain words. This should be maintained via some private area/file.

Additional Context

API docs: https://www.deepl.com/de/docs-api/glossaries/

Python client docs: https://github.com/DeepLcom/deepl-python#glossaries

@ulliholtgrave ulliholtgrave added the 💡 feature New feature or request label Apr 6, 2022
@ulliholtgrave ulliholtgrave added this to the Version 1.2 milestone Apr 6, 2022
@svenseeberg svenseeberg modified the milestones: Version 1.2, Version 1.3 Apr 6, 2022
@ulliholtgrave ulliholtgrave added the 😅 effort: medium Should be doable in <12h label May 16, 2022
@dkehne

dkehne commented May 17, 2022

We have decided that there is only one Integreat-wide glossary

@svenseeberg svenseeberg added the ⁉️ prio: low Not urgent, can be resolved in the distant future. label Jun 7, 2022
@ulliholtgrave ulliholtgrave modified the milestones: 22Q3, 22Q4 Oct 2, 2022
@timobrembeck timobrembeck added ‼️ prio: high Needs to be resolved ASAP. and removed ⁉️ prio: low Not urgent, can be resolved in the distant future. labels Nov 4, 2022
@charludo
Contributor

Is the goal that municipalities can add/edit glossary entries on their own?

Or should only staff roles be able to do that?

@ulliholtgrave
Member Author

This should only be done by the staff roles. However, I am not really sure about the way we want to provide it to them.

@osmers Can you bring this up in your team call and come up with an idea about how you want to edit these files?

My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

@timobrembeck
Member

> My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

I would definitely prefer the database! 😅

@ulliholtgrave
Member Author

> > My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.
>
> I would definitely prefer the database! 😅

I definitely agree that the database would be the better solution, but I am a little afraid of the number of phrases we would have to show in the UI. If we end up with >200 entries with input, we really need a decent UI to manage this, and a JSON in a TXT file might be easier to implement and, with Ctrl+F, the more direct way 🤷‍♂️

@timobrembeck
Member

We could implement a csv or json import/export?
And we could intercept Ctrl + F to directly jump into our own search input field? 😄

@timobrembeck
Member

I mean, in theory we can also outsource the storing of information to DeepL itself.
And only query the stored entries from time to time and store them in our cache (if even necessary)...
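
A minimal sketch of the "query from DeepL and keep it in our cache" idea, assuming a small in-memory cache is good enough for illustration. `GlossaryListCache` and its parameters are hypothetical names, not part of the CMS or of the DeepL client. The `fetch` callable could be something like `translator.list_glossaries` from the Python client, and `invalidate()` would be called whenever a glossary is created or deleted.

```python
import time
from typing import Any, Callable, List, Optional


class GlossaryListCache:
    """Keep the result of an expensive listing call for a limited time."""

    def __init__(
        self,
        fetch: Callable[[], List[Any]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # e.g. translator.list_glossaries (assumption)
        self._ttl = ttl_seconds      # how long a cached list stays valid
        self._clock = clock          # injectable so the expiry can be tested
        self._value: Optional[List[Any]] = None
        self._fetched_at = 0.0

    def get(self) -> List[Any]:
        """Return the cached list, refreshing it when missing or expired."""
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = list(self._fetch())
            self._fetched_at = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached list, e.g. after creating or deleting a glossary."""
        self._value = None
```

In a Django project the same pattern would more likely sit on top of the framework's cache backend; the class above only shows the refresh and invalidation logic.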

@timobrembeck
Member

Ah ok, just read the API docs, and apparently glossaries can only be created and deleted, not modified. So modification only works via retrieving, then modifying locally, then deleting the old entry and uploading the new entry.
And the API accepts entries via csv. So the most simple solution would probably be the following:

  • Add a simple csv upload, and just validate the file and pass the entries on to DeepL via the python client
  • List all existing glossaries in a simple UI (we don't store the information in our database, but just query it from DeepL each time and store it in our cache which we should invalidate each time we're creating or deleting a glossary)
  • Provide a simple "download" and "delete" functionality for existing glossaries
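
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the CMS implementation: the helper names are hypothetical, the CSV is assumed to have exactly two columns (source term, target term), and `translator` is expected to behave like `deepl.Translator`, whose `create_glossary()` and `delete_glossary()` methods are used. Since glossaries cannot be modified, `replace_glossary()` emulates an update; it creates the new glossary before deleting the old one so that a failed upload does not lose the existing entries, which assumes both may briefly coexist in the account.

```python
import csv
import io
from typing import Any, Dict


def parse_glossary_csv(csv_text: str) -> Dict[str, str]:
    """Validate two-column CSV text and return {source term: target term}."""
    entries: Dict[str, str] = {}
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        source, target = (cell.strip() for cell in row)
        if not source or not target:
            raise ValueError(f"line {line_no}: empty term")
        if source in entries:
            raise ValueError(f"line {line_no}: duplicate source term {source!r}")
        entries[source] = target
    if not entries:
        raise ValueError("the file contains no glossary entries")
    return entries


def upload_glossary(
    translator: Any, name: str, source_lang: str, target_lang: str, csv_text: str
) -> Any:
    """Validate the CSV and create a glossary via the DeepL client."""
    entries = parse_glossary_csv(csv_text)
    return translator.create_glossary(
        name, source_lang=source_lang, target_lang=target_lang, entries=entries
    )


def replace_glossary(translator: Any, old_glossary: Any, csv_text: str) -> Any:
    """Emulate an update: create the new glossary, then delete the old one."""
    new_glossary = upload_glossary(
        translator,
        old_glossary.name,
        old_glossary.source_lang,
        old_glossary.target_lang,
        csv_text,
    )
    translator.delete_glossary(old_glossary)
    return new_glossary
```

Validating locally before calling the API keeps error messages specific to the uploaded file (line numbers, duplicate terms) instead of relying on whatever the API reports.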

What do you think?

@ulliholtgrave
Member Author

Yes, I agree. That sounds good 👍

@osmers

osmers commented Nov 19, 2022

I think we can currently download the glossary as an excel file - so we could provide that?
Since we don't need to change it constantly, an option to download, amend and then upload again would be sufficient.
Not sure if that is what Timo was referring to...

@timobrembeck
Member

Wait, we already have a DeepL Pro Advanced account?
Then the basic functionality we're talking about here should already be offered by the DeepL web UI (see here)? I think there is no need to implement this in the CMS if we're just implementing exactly the same functionality as DeepL itself... 🤔

@osmers

osmers commented Nov 19, 2022

Not sure if we do - I assume so, yes, because otherwise we would not have enough translation budget. The glossary is currently implemented in MemoQ.
I will check DeepL for you - one sec.

@timobrembeck
Member

Indeed:
[Screenshot, 2022-11-19: DeepL Translate ("The world's most accurate translator")]

@osmers

osmers commented Nov 19, 2022

So you already checked our account? Then this should be easy enough, right?

@osmers

osmers commented Nov 19, 2022

But it seems that glossaries don't work for most of our language pairs...

@osmers

osmers commented Nov 19, 2022

Just English and French are possible...

@timobrembeck
Member

Oh, and I noticed another problem: we use two different accounts. The glossary can only be uploaded via the UI for the "DeepL Pro" account, while we perform our automated translations with the "DeepL API Free" account. There is probably no completely trivial way of copying the glossaries... We could ask DeepL support whether it's possible to transfer glossaries between accounts, but they will probably refuse to do so.

So back to the drawing board: we probably need to replicate the basic upload in our CMS to be able to pass the glossaries to the API account. But yes, let's talk about whether the effort is justified when only two languages are supported with German as the source language...

@osmers

osmers commented Nov 19, 2022

So if they don't support glossaries for more languages, it does not matter what we build into our system? Couldn't we still use it and somehow enforce certain translations? I don't know, maybe we could collect the alternatives that DeepL returns when you translate just the word, and tell the system that if it finds one of those, it should replace it with our term from the glossary?

@osmers

osmers commented Dec 1, 2022

Do you need any more input from our side on this?

@timobrembeck
Member

> Do you need any more input from our side on this?

Probably yes: So as far as I understood it, we sadly cannot use the DeepL Pro account glossary for our DeepL API account requests.
So this would be a bit of work to do, not sure if worth the effort if it can only be used for two languages.

> So if they don't support glossaries for more languages, it does not matter what we build into our system? Couldn't we still use it and somehow enforce certain translations? I don't know, maybe we could collect the alternatives that DeepL returns when you translate just the word, and tell the system that if it finds one of those, it should replace it with our term from the glossary?

You mean, like, completely implementing our own glossary? This would definitely be a lot of effort. Maybe even more effort than manually fixing machine translations in cases where potential glossary terms have been translated incorrectly.
But yes, in theory it's doable.

@osmers

osmers commented Dec 1, 2022

> Probably yes: So as far as I understood it, we sadly cannot use the DeepL Pro account glossary for our DeepL API account requests. So this would be a bit of work to do, not sure if worth the effort if it can only be used for two languages

Ditto - just for two languages it does not make sense - we would need to check the terms for all the languages we have a glossary for.

> You mean, like, completely implementing our own glossary? This would definitely be a lot of effort. Maybe even more effort than manually fixing machine translations in cases where potential glossary terms have been translated incorrectly.

Not sure how feasible and realistic manual fixing is - but yes, that is essentially what I meant. But I can see how it is very difficult. Another idea I had was that we compile a list of alternative words, like the suggestions you always get in dictionaries (e.g. Straße can be road or street in English). With this list, we could at least tell the system that if it finds any of those words, it should replace it with the correct one from our glossary?

I am not sure if this is feasible though, due to the case declension of words (dative, genitive, etc. adjustments...)
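
To make the concern concrete, here is a naive sketch of the "list of alternatives" idea, assuming plain whole-word replacement on the translated text. The function name and the example word list are made up for illustration; nothing like this exists in the CMS.

```python
import re
from typing import Dict, List


def enforce_glossary(text: str, alternatives: Dict[str, List[str]]) -> str:
    """Replace whole-word alternatives with the preferred glossary term."""
    for preferred, variants in alternatives.items():
        if not variants:
            continue  # nothing to replace for this term
        pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
        text = re.sub(pattern, preferred, text, flags=re.IGNORECASE)
    return text


# Hypothetical glossary: "street" is the preferred term for both alternatives.
alternatives = {"street": ["road", "lane"]}

print(enforce_glossary("Turn left at the next road.", alternatives))
# -> Turn left at the next street.
print(enforce_glossary("Both roads are closed.", alternatives))
# -> Both roads are closed.   (the plural form is not matched)
print(enforce_glossary("Road works ahead.", alternatives))
# -> street works ahead.      (capitalisation is lost)
```

Even in English the simple version misses inflected forms and breaks capitalisation; for languages with case endings, every inflected form would have to be listed or generated, and the replacement would need the matching ending as well.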

@timobrembeck
Member

> I am not sure if this is feasible though, due to the case declension of words (dative, genitive, etc. adjustments...)

Hmm, in my opinion we're opening Pandora's Box here 😅
I guess that's just one of the limitations of machine translations - there is always some margin of error which can either be accepted or fixed by humans.
I doubt that any manual string replacement on our side is good enough to fix more problems than it causes.
So at the moment, I'd suggest putting this on hold until DeepL supports more languages for the glossary. As soon as that is the case, I think the effort of implementing support for DeepL's glossary mechanism is justified.

@timobrembeck timobrembeck added ⛔ blocked Blocked by external dependency and removed ‼️ prio: high Needs to be resolved ASAP. labels Dec 1, 2022
@timobrembeck timobrembeck modified the milestones: 22Q4, Backlog Dec 1, 2022
@osmers

osmers commented Dec 1, 2022

Agreed!

@osmers

osmers commented Oct 31, 2023

Just saw that DeepL now supports more languages -
image

Question remains whether we can use it - would it help if we switched to the DeepL Pro Account to use the API and Glossary? Or are we by now using Pro anyways?

Edit: For DeepL API Free and DeepL API Pro subscribers

You can create glossaries with your DeepL API (Free and Pro) subscription. Please consult this article and our API documentation to learn how you can manage glossaries with the DeepL API.

If you use the DeepL API (Free and Pro) in third-party software, please note that plug-ins are not developed by DeepL SE. DeepL supports glossary functionality via the API, but your plug-in provider might require some time to implement this functionality in their plug-in. For more information, please contact the provider of your plug-in.

@timobrembeck timobrembeck added 😱 effort: high Big change, which requires >12h and removed ⛔ blocked Blocked by external dependency 😅 effort: medium Should be doable in <12h labels Oct 31, 2023
@timobrembeck
Member

@osmers good catch!

> Question remains whether we can use it - would it help if we switched to the DeepL Pro Account to use the API and Glossary? Or are we by now using Pro anyways?

We can only use the API account for the CMS because it would be way too complicated to program any kind of interaction between the CMS and the DeepL UI – it only makes sense to interact via the API, which is only possible with an API account.
Fortunately, this feature was enabled for the API as well, and with more languages 🎉
So I think this issue is no longer blocked and can be prioritized (although keep in mind that I estimate the effort to be high despite the new feature).

Supported Languages
In [1]: import deepl

In [2]: from django.conf import settings

In [3]: glossary_languages = deepl.Translator(settings.DEEPL_AUTH_KEY).get_glossary_languages()
Oct 31 17:11:34 INFO deepl - Request to DeepL API method=GET url=https://api-free.deepl.com/v2/glossary-language-pairs
Oct 31 17:11:34 INFO deepl - DeepL API response status_code=200 url=https://api-free.deepl.com/v2/glossary-language-pairs

In [4]: for language_pair in glossary_languages:
 ...:     print(f"{language_pair.source_lang} to {language_pair.target_lang}")
 ...:
de to en
de to es
de to fr
de to ja
de to it
de to pl
de to nl
de to zh
de to ru
de to pt
en to de
en to es
en to fr
en to ja
en to it
en to pl
en to nl
en to zh
en to ru
en to pt
es to de
es to en
es to fr
es to ja
es to it
es to pl
es to nl
es to zh
es to ru
es to pt
fr to de
fr to en
fr to es
fr to ja
fr to it
fr to pl
fr to nl
fr to zh
fr to ru
fr to pt
ja to de
ja to en
ja to es
ja to fr
ja to it
ja to pl
ja to nl
ja to zh
ja to ru
ja to pt
it to de
it to en
it to es
it to fr
it to ja
it to pl
it to nl
it to zh
it to ru
it to pt
pl to de
pl to en
pl to es
pl to fr
pl to ja
pl to it
pl to nl
pl to zh
pl to ru
pl to pt
nl to de
nl to en
nl to es
nl to fr
nl to ja
nl to it
nl to pl
nl to zh
nl to ru
nl to pt
zh to de
zh to en
zh to es
zh to fr
zh to ja
zh to it
zh to pl
zh to nl
zh to ru
zh to pt
ru to de
ru to en
ru to es
ru to fr
ru to ja
ru to it
ru to pl
ru to nl
ru to zh
ru to pt
pt to de
pt to en
pt to es
pt to fr
pt to ja
pt to it
pt to pl
pt to nl
pt to zh
pt to ru
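
Building on the list above, a rough sketch of how a translation call could pass a glossary only when the language pair supports one. The helper names are hypothetical; `translator` is assumed to behave like `deepl.Translator`, whose `translate_text()` accepts `source_lang`, `target_lang` and `glossary`. The Python client requires `source_lang` whenever a glossary is passed. It is also assumed here that regional target codes such as `EN-GB` map to the base code (`en`) shown in the list, and that the caller passes the glossary belonging to the requested pair.

```python
from typing import Any, Dict, Iterable, Optional, Set, Tuple


def supported_pairs(language_pairs: Iterable[Any]) -> Set[Tuple[str, str]]:
    """Collect (source, target) codes from get_glossary_languages() results."""
    return {(p.source_lang.lower(), p.target_lang.lower()) for p in language_pairs}


def base_code(lang: str) -> str:
    """Reduce a code such as 'EN-GB' to its base language 'en'."""
    return lang.split("-")[0].lower()


def translate(
    translator: Any,
    text: str,
    source_lang: str,
    target_lang: str,
    glossary: Optional[Any],
    pairs: Set[Tuple[str, str]],
) -> Any:
    """Translate text, passing the glossary only for supported language pairs."""
    kwargs: Dict[str, Any] = {"source_lang": source_lang, "target_lang": target_lang}
    if glossary is not None and (base_code(source_lang), base_code(target_lang)) in pairs:
        kwargs["glossary"] = glossary
    return translator.translate_text(text, **kwargs)
```

For unsupported pairs the text is still translated, just without glossary enforcement, so languages outside the list above would behave as they do today.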

@osmers

osmers commented Oct 31, 2023

@timobrembeck yup, I found the info as well that we can use the API Free Account to implement the glossary :) nice!!
It's something we need to do in order to make automatic translations better. So I think even though the effort is high, it is something we should do soonish :) like next quarter

@timobrembeck timobrembeck modified the milestones: Backlog, 24Q1 Oct 31, 2023
@dkehne dkehne modified the milestones: 24Q1, Backlog Dec 11, 2023
@dkehne

dkehne commented Dec 11, 2023

Push to backlog. This is not as urgent as other tickets.


6 participants