OpenAI GPT-4 MT #9009

Closed
AsmoKoskinen opened this issue Mar 29, 2023 · 30 comments · Fixed by #10479
Labels
enhancement Adding or requesting a new feature. good first issue Opportunity for newcoming contributors. hacktoberfest This is suitable for Hacktoberfest. Don’t try to spam. help wanted Extra attention is needed.

Comments

@AsmoKoskinen

Describe the problem

Many automatic machine translation services are available on Weblate.

Describe the solution you'd like

How about adding a machine translation service for OpenAI GPT-4?

https://openai.com/waitlist/gpt-4-api

Describe alternatives you've considered

No response

Screenshots

No response

Additional context

I already use this API with GPT-4. GPT-4 translates very well; I use it to translate from English to Finnish.

@ilocit
Contributor

ilocit commented Mar 29, 2023

I'd be interested in exploring this too, @AsmoKoskinen

I already created a little Python script to help me check out the possibilities.
image
If you change the "Style" (or persona, or instructions, whatever you would like to call it), the result obviously changes.

image

Maybe an add-on with a simple UI would work, where you can specify for each component or project which role / persona / target audience GPT should assume and which terminology to respect. (Inject terms from the Glossary component!)

Maybe we can build a team, spec out the desired features and the UI, and work on the implementation.

👉🏼 Adding Finnish as an example too 😉
image

@AsmoKoskinen
Author

@ilocit Well, I am no developer at all, just a (quite heavy) user of Weblate (private Docker site). Weblate 4.16 added support for IBM Watson Language Translator. I would just like to see GPT-4 in Weblate, too.

@nijel nijel added enhancement Adding or requesting a new feature. hacktoberfest This is suitable for Hacktoberfest. Don’t try to spam. help wanted Extra attention is needed. good first issue Opportunity for newcoming contributors. labels Mar 30, 2023
@github-actions

This issue seems to be a good fit for newbie contributors. You are welcome to contribute to Weblate! Don't hesitate to ask any questions you would have while implementing this.

You can learn about how to get started in our contributors documentation.

@nijel
Member

nijel commented Mar 30, 2023

Adding support for another MT is simple, see how it was done for IBM Watson: #8594

@ilocit
Contributor

ilocit commented Mar 31, 2023

Thanks for pointing to the IBM Watson implementation as reference, @nijel .
But a ChatGPT integration that goes beyond "yet another MT engine" and makes full use of the possibilities is a bit more complex. It would require some brainstorming about a possible UI for this service, with settings and prompt enhancements.

If there is anybody interested in making this happen, I'd be happy to participate.
I will also give a simple implementation a shot. As a first draft.

@nijel
Member

nijel commented Mar 31, 2023

More options can be added later, the form can be customized, and the MT can be installed per project, so that each project can have different settings.

@ilocit
Contributor

ilocit commented Mar 31, 2023

You are right, of course. I will see what I can do.

@AsmoKoskinen
Author

AsmoKoskinen commented Apr 9, 2023

Well, after reading about IBM Watson MT, reading about the OpenAI API, and chatting with GPT-3.5/4, I managed to figure out something that works on Weblate 4.16.4 (macOS).

URL: https://api.openai.com/v1/completions

URL: https://api.openai.com/v1/models/text-davinci-003
MODEL: text-davinci-003

Hopefully someone will pick this up and do it the right way.

[Screenshot 2023-4-9 at 17.29.39]

I touched models.py, tests.py and openai.py.

models.py

[...]
"weblate.machinery.ibm.IBMTranslation",
"weblate.machinery.openai.OpenAITranslation",
[...]

tests.py

[...]
from weblate.machinery.ibm import IBMTranslation
from weblate.machinery.openai import OpenAITranslation
[...]

openai.py is attached:
openai.py.zip

@AsmoKoskinen
Author

Here is a working attempt with the model "gpt-4"; openai.py is attached:
openai.py.zip

@AsmoKoskinen
Author

AsmoKoskinen commented Apr 12, 2023

(I will try to open a PR for this issue.)

Well, too much for me. I'll leave it here.

@nijel nijel added this to the 4.18 milestone Apr 13, 2023
@AsmoKoskinen
Author

I encountered this error (machinery failed: not supported language pair: en - fi).

This seems to fix it:

[Screenshot 2023-4-14 at 9.08.42]

    def map_code_code(self, code):
        return code.replace("_", "-").split("-")[0].lower()

    def is_supported(self, source, language):
        return True

Now it works again.
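
To see what the mapping above does to typical locale codes, here is a minimal standalone sketch. The function name `map_code` is made up for this illustration; only the one-line expression comes from the snippet above. Note that region and script are dropped, so for example "zh-Hans" and "zh-Hant" both collapse to "zh".

```python
def map_code(code: str) -> str:
    """Reduce a locale code to its bare, lowercased language part.

    Hypothetical standalone helper; the expression mirrors the
    one-liner posted in the comment above.
    """
    # Normalize the separator, keep only the first subtag, lowercase it.
    return code.replace("_", "-").split("-")[0].lower()


if __name__ == "__main__":
    for sample in ("fi", "pt_BR", "zh-Hans", "EN-us"):
        print(sample, "->", map_code(sample))
```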

But I can't do a proper PR the right way.

However, here are the four files that I have changed.
machine.rst.zip
tests.py.zip
models.py.zip
openai.py.zip

@nijel
Member

nijel commented Apr 18, 2023

A more complex persona will be needed to make this work reliably. Otherwise, the output ends up including remarks about some strings not being fully translatable between languages, transliteration, and other extra content. So far, I ended up with:

You are a highly skilled translation assistant, adept at translating text between languages with precision and nuance. You always reply with translated string only. You do not include transliteration.

@ilocit
Contributor

ilocit commented Apr 18, 2023

The real advantage of GPT over other nMT engines is the customization via e.g. personas and instructions.
So, I think it would be important to have this as a customizable UI field.

Like this perhaps (this is how CustomMT has built it!):

image

Definitely (1) and (2); the option to inject a project glossary (3) would be super cool, of course!

The caption for this section, to make it clear what happens here, could be called Prompt Engineering...
image
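
As a rough illustration of the idea in this comment, the sketch below assembles a system prompt from a persona, an optional style instruction, and optional glossary terms. Everything here is an assumption made for illustration: the function name `build_system_prompt`, its parameters, and the prompt wording are not taken from Weblate or CustomMT.

```python
from typing import Dict, Optional


def build_system_prompt(
    persona: str,
    style: str = "",
    glossary: Optional[Dict[str, str]] = None,
) -> str:
    """Combine persona, style, and glossary into one system prompt.

    Hypothetical helper; the section wording is only an example.
    """
    parts = [persona.strip()]
    if style:
        parts.append(f"Style: {style.strip()}")
    if glossary:
        # One "source -> target" line per glossary term.
        terms = "\n".join(f"- {src} -> {dst}" for src, dst in glossary.items())
        parts.append("Use the following glossary terms:\n" + terms)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(
        build_system_prompt(
            "You always reply with translated string only.",
            style="informal, addressed to game players",
            glossary={"string": "merkkijono"},
        )
    )
```

Keeping the three inputs separate would let each one map to its own form field, per project or per component.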

@nijel nijel removed this from the 4.18 milestone May 15, 2023
@Grigler

Grigler commented May 31, 2023

Is this something that is still being worked on? We're hoping to introduce this to our flow for rapid deployments, with human validation following it.

@ilocit
Contributor

ilocit commented May 31, 2023

For that (rapid deployments), you don't necessarily need GPT, no? You can use any suitable nMT and human validation.
The real benefit of a GPT integration (as opposed to an MT integration) would be to use more context than a single string/segment and to play with prompts (different use cases, context, glossary, style, etc.).
Also consider that GPT is more expensive than nMT.

@Grigler

Grigler commented May 31, 2023

It's true that we could technically get by with nMT. However, we think that being able to supply context to GPT on small strings (a common case in our game) would give us a higher-quality initial set of translations.

@nijel
Member

nijel commented May 31, 2023

As a quick and dirty solution, you can try what has been posted in #9009 (comment). For full integration, I'd like to have this included in the 5.0 release in August, but the feature has not yet been fixed on the roadmap.

@ilocit
Contributor

ilocit commented May 31, 2023

@Grigler

It's true that we could technically get by with nMT. However, we think that being able to supply context to GPT on small strings (a common case in our game) would give us a higher-quality initial set of translations.

And that is precisely the challenge and why the above-mentioned "quick and dirty" integration won't bring you much benefit.
We would need to look at how to build the "context".
Is it the 5, 10, or 15 strings before and/or after the current one?
Is it developer comments that are added and made usable in Weblate and in the prompt?
Is it a dialog with prompt details for Role, Target Audience, Style, Temperature, ...?
Or something totally different?

In a game you might want to include the character to add specific tone and slang of that particular character of the game.

Lots of options. ;-)
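
One of the options above, taking the strings before and after the current one as context, could be sketched like this. The function name `context_window` and the default window sizes are assumptions for illustration only; how the strings would actually be fetched from a component is left out.

```python
from typing import List


def context_window(
    strings: List[str], index: int, before: int = 5, after: int = 5
) -> List[str]:
    """Return the neighbours of strings[index], excluding the string itself.

    Hypothetical helper: `before` and `after` cap how many preceding and
    following strings are included; the window is clipped at the list ends.
    """
    start = max(0, index - before)
    end = min(len(strings), index + after + 1)
    return strings[start:index] + strings[index + 1:end]


if __name__ == "__main__":
    source = ["New game", "Load game", "Fire", "Reload", "Quit"]
    # Context for "Fire": one string before, two after.
    print(context_window(source, 2, before=1, after=2))
```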

@dxdx123

dxdx123 commented Jul 5, 2023

May I ask if there has been any progress on this issue?

@nijel
Member

nijel commented Jul 5, 2023

AFAIK, nobody is working on this issue. I'm still stuck in the API waitlist…

@Grigler

Grigler commented Jul 5, 2023

Locally, in our organisation's fork, we have integrated the 3.5-Turbo API call, though the results are relatively poor initially, as we haven't spent enough time tuning the 'System' string given to the AI. Thankfully, the model is the only difference in the API usage, so work could begin while we also sit stuck on the waitlist.

In our current usage, we are using the "Explanation" field of the source string as a Context input but I expect we'll be iterating on this for a long time before we get satisfactory results.
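
A minimal sketch of what passing an explanation as context might look like, assuming the chat message format with "system" and "user" roles. The persona text is the one posted earlier in this thread; the function name `build_messages`, the target-language sentence, and the "Context for the string:" wording are assumptions for illustration, not the fork's actual code.

```python
from typing import Dict, List

# Persona text as posted earlier in this thread.
PERSONA = (
    "You are a highly skilled translation assistant, adept at translating "
    "text between languages with precision and nuance. You always reply "
    "with translated string only. You do not include transliteration."
)


def build_messages(text: str, language: str, explanation: str = "") -> List[Dict[str, str]]:
    """Build chat messages, appending the explanation as context if present.

    Hypothetical helper; the extra sentences are illustrative wording.
    """
    system = f"{PERSONA} Translate to {language}."
    if explanation:
        system += f" Context for the string: {explanation}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]


if __name__ == "__main__":
    for message in build_messages("Fire", "Finnish", "Button that shoots the weapon"):
        print(message["role"], ":", message["content"])
```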

@tipa
Contributor

tipa commented Oct 26, 2023

Based on the work here #9009 (comment) I have fixed and adapted the integration for my needs:

  • Removed unneeded URL form in configuration
  • Added checkbox in configuration to switch between gpt-4 and gpt-3.5-turbo (as the latter is significantly cheaper and might be sufficient for certain use-cases)
  • Adjusted the prompt:
messages = [
    {"role": "system", "content": f"You are a highly skilled translation assistant, adept at translating text to language '{language}' with precision and nuance. You always reply with translated string only. You do not include transliteration."},
    {"role": "user", "content": text}
]

openai.zip

(Installation hint for noobs like me: copy the openai.py file to the /weblate/machinery/ folder and add this line to the WEBLATE_MACHINERY list in the models.py file in the same directory: "weblate.machinery.openai.OpenAITranslation",
Also run: pip install openai)

@jeremi

jeremi commented Nov 5, 2023

What would be interesting is to use embedding to integrate the glossary and/or other similar translations
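
One way to read this suggestion: embed the source string and the glossary terms, then inject only the closest terms into the prompt. The sketch below shows just the ranking step with cosine similarity over precomputed vectors. The function names and the toy two-dimensional vectors are assumptions; real vectors would come from whichever embedding model is used.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_glossary_terms(
    query_vec: Sequence[float],
    term_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Return the k glossary terms whose vectors are closest to the query."""
    ranked = sorted(
        term_vecs,
        key=lambda term: cosine(query_vec, term_vecs[term]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy vectors for illustration only.
    vectors = {"save": [1.0, 0.0], "load": [0.0, 1.0], "store": [0.9, 0.1]}
    print(top_glossary_terms([1.0, 0.0], vectors, k=2))
```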

@nijel nijel added this to the 5.3 milestone Nov 9, 2023
@tipa
Contributor

tipa commented Nov 22, 2023

OpenAI recently released GPT-4 Turbo, and the Python library was updated to v1.3.5 with some breaking changes to the API, so I updated the openai.py file for Weblate:

openai.zip
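
For readers who do not want to open the attachment, here is a minimal sketch of a translation call in the client-object style of the 1.x Python library. It is not the content of the attached file. The function name `translate` and the default model are assumptions for illustration; the client is passed in as a parameter so the function itself has no dependency on network access.

```python
def translate(client, text: str, language: str, model: str = "gpt-3.5-turbo") -> str:
    """Translate `text` into `language` via a chat completion.

    `client` is expected to be an `openai.OpenAI` instance (library 1.x).
    Hypothetical helper; the prompt is the one adjusted earlier in this thread.
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a highly skilled translation assistant, adept at "
                f"translating text to language '{language}' with precision "
                "and nuance. You always reply with translated string only. "
                "You do not include transliteration."
            ),
        },
        {"role": "user", "content": text},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    # The content may be None, so fall back to an empty string.
    return (response.choices[0].message.content or "").strip()


# Usage (needs network access and the OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   print(translate(OpenAI(), "Save changes", "Finnish"))
```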

@nijel
Member

nijel commented Nov 24, 2023

@tipa Can you please share the changes you've made to forms.py as well? I will create a PR for that if you are not interested in doing so.

@nijel nijel self-assigned this Nov 24, 2023
nijel added a commit to nijel/weblate that referenced this issue Nov 24, 2023
- supports model selection
- supports persona customization
- prepared for glossary integration

TODO:

- add documentation
- add tests

Fixes WeblateOrg#9009
@nijel
Member

nijel commented Nov 24, 2023

WIP integration is here: #10479

nijel added a commit to nijel/weblate that referenced this issue Nov 24, 2023
- supports model selection
- supports persona customization
- prepared for glossary integration

TODO:

- add tests

Fixes WeblateOrg#9009
@tipa
Contributor

tipa commented Nov 24, 2023

@nijel I will add my forms.py changes to the linked PR

nijel added a commit to nijel/weblate that referenced this issue Nov 27, 2023
- supports model selection
- supports persona customization
- prepared for glossary integration

Fixes WeblateOrg#9009
nijel added a commit that referenced this issue Nov 27, 2023
- supports model selection
- supports persona customization
- prepared for glossary integration

Fixes #9009

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.


8 participants