
Kaneda-1/gptpo

 
 


PO File Translation Tool for ChatGPT


A command-line tool for translating gettext (PO) files. It supports pre-translation from previously translated PO files, and ChatGPT translation either with an OpenAI assistant or with fine-tuned models.

Features:

  • preprocess a PO file using previously translated PO files.
  • fine-tune an OpenAI model to get more accurate translations.
  • translate a PO file into a target language using standard or fine-tuned models.
  • translate a PO file with an OpenAI assistant.

A paid OpenAI API account is recommended for faster translation: the free tier is slower (only 3 translations per minute) and has usage restrictions.
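To put that limit in perspective: assuming one request per PO entry, a file with 300 untranslated entries would take about 100 minutes at 3 requests per minute. If you script your own calls against a rate-limited account, a simple way to stay under the limit is to space requests out evenly. The sketch below is a hypothetical helper (not a gptpo option) that starts each call at least 60 / perMinute seconds after the previous one finished.

```javascript
// Hypothetical throttle, not part of gptpo: run `fn` over `items` one at a
// time, waiting 60 / perMinute seconds between calls (3/min -> 20 s apart).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function throttledMap(items, perMinute, fn) {
  const intervalMs = 60_000 / perMinute;
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(intervalMs); // spacing between consecutive requests
    results.push(await fn(items[i]));
  }
  return results;
}
```

For example, `throttledMap(entries, 3, translateOne)` would translate entries sequentially, where `translateOne` stands for your own (hypothetical) request function.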

Installation

1.a Install it as an end user, as a global binary:

sudo npm install -g @sitamet/gptpo

Then the new binary can be called as gptpo.

or

1.b Install it as a developer:

git clone https://github.com/sitamet/gptpo.git
cd gptpo
npm install

2. Set the environment variables

Set the OPENAI_API_KEY environment variable before using this tool.

export OPENAI_API_KEY=sk-F***********************************vGL

# optional: set a default OpenAI model
export OPENAI_API_MODEL=gpt-3.5-turbo-1106

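In the Windows Command Prompt (cmd.exe):

set "OPENAI_API_KEY=sk-F***********************************vGL"
set "OPENAI_API_MODEL=gpt-3.5-turbo-1106"

In Windows PowerShell:

$env:OPENAI_API_KEY = "sk-F***********************************vGL"
$env:OPENAI_API_MODEL = "gpt-3.5-turbo-1106"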
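The help output of the commands documents these variables as fallbacks for the --key and --model options. A minimal sketch of that lookup, assuming the usual precedence (command-line flag first, then the environment variable, then the built-in default); resolveOpenAIConfig is a hypothetical name, not gptpo's code:

```javascript
// Hypothetical sketch, not gptpo's source: resolve the API key and model
// with the precedence CLI flag > environment variable > built-in default.
const DEFAULT_MODEL = "gpt-3.5-turbo-1106";

function resolveOpenAIConfig(flags = {}, env = process.env) {
  // `||` also treats empty strings (e.g. OPENAI_API_MODEL="") as unset.
  const apiKey = flags.key || env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("Missing API key: pass --key or set OPENAI_API_KEY");
  }
  const model = flags.model || env.OPENAI_API_MODEL || DEFAULT_MODEL;
  return { apiKey, model };
}

// Example: the env var overrides the default, and a --model flag overrides both.
const exampleEnv = {
  OPENAI_API_KEY: "sk-example",
  OPENAI_API_MODEL: "ft:gpt-3.5-turbo-1106:wetopi:cat01:8e7AgzNb",
};
console.log(resolveOpenAIConfig({}, exampleEnv).model); // → ft:gpt-3.5-turbo-1106:wetopi:cat01:8e7AgzNb
console.log(resolveOpenAIConfig({ model: "gpt-3.5-turbo-1106" }, exampleEnv).model); // → gpt-3.5-turbo-1106
```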

Usage Scenarios

Show help:

gptpo --help

# or if you are in a development environment:
npm run start -- --help
# or:
node index.js --help

Preprocess to incorporate translations from previous po files:

gptpo preprocess --help

Usage: gptpo preprocess [options]

update po from previous translated po files (origin po will incorporate translations from previous po files)

Options:
  -f, --force    Override origin translations
  --po <file>    po file path
  --prev <file>  po file (or dir) where previous po file(s) are placed (default: "./previous")
  -h, --help     display help for command

preprocess demo:

cd demo
gptpo preprocess --po test.po --prev ./previous
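Conceptually, preprocessing copies known translations into the origin file. The sketch below assumes entries are matched by msgid (and msgctxt) and that existing translations are kept unless --force is given; mergePrevious and the plain-object entry shape are hypothetical, and reading or writing real .po files is left out.

```javascript
// Hypothetical sketch of the preprocess step (not gptpo's actual code).
// PO entries are modelled as plain objects { msgctxt?, msgid, msgstr }.

// gettext identifies a message by context + msgid; "\u0004" is the separator
// gettext itself uses between msgctxt and msgid in compiled .mo files.
const keyOf = (e) => `${e.msgctxt ?? ""}\u0004${e.msgid}`;

function mergePrevious(entries, previousEntries, { force = false } = {}) {
  // Index previous translations, ignoring untranslated ones.
  // If several previous files translate the same message, the last one wins.
  const known = new Map();
  for (const prev of previousEntries) {
    if (prev.msgstr) known.set(keyOf(prev), prev.msgstr);
  }

  let filled = 0;
  const merged = entries.map((entry) => {
    const previous = known.get(keyOf(entry));
    // Without --force, keep translations the origin file already has.
    if (previous === undefined || (entry.msgstr && !force)) return entry;
    filled += 1;
    return { ...entry, msgstr: previous }; // copy, so the input is not mutated
  });
  return { merged, filled };
}

const origin = [
  { msgid: "checkbox.", msgstr: "" },
  { msgid: "Accessed a banned URL", msgstr: "URL prohibit" },
];
const previous = [
  { msgid: "checkbox.", msgstr: "casella de selecció." },
  { msgid: "Accessed a banned URL", msgstr: "S'ha accedit a un URL prohibit" },
];
console.log(mergePrevious(origin, previous).filled); // → 1
console.log(mergePrevious(origin, previous, { force: true }).filled); // → 2
```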

Translate a po file using completions with a fine-tuned model

IMPORTANT: Before translating with GPT, we need to establish a system prompt. This prompt will be reused for every translation task.

PROBLEM 1: The system prompt is sent again with every API request, so it must be brief to keep the token cost of each request low. SOLUTION: Fine-tune a model to act as a translator, training it to respond well to succinct prompts.

PROBLEM 2: Standard GPT models are not tailored to our "language domain", such as WordPress plugin translations. SOLUTION: Use the GPT fine-tuning process to teach the model these translation details.

Create the system prompt.

In your working directory, save the system prompt text in a file named systemprompt.txt.

cd demo
echo 'You are a machine translating English to Catalan text.' > systemprompt.txt
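Each translation request pairs this system prompt with one source string, so its length is paid once per entry (with illustrative numbers, a 50-token prompt over 1,000 entries adds 50,000 prompt tokens). A minimal sketch of how the prompt file and a msgid could become the chat messages array, assuming the standard OpenAI chat message format shown in the dataset section; buildMessages is a hypothetical name, not gptpo's code:

```javascript
// Hypothetical sketch, not gptpo's source: combine systemprompt.txt with one
// PO msgid into the messages array of a Chat Completions request.
const fs = require("node:fs");

function buildMessages(systemPrompt, msgid) {
  return [
    // trim() drops the trailing newline that `echo` writes into the file
    { role: "system", content: systemPrompt.trim() },
    { role: "user", content: msgid },
  ];
}

// Read the prompt once and reuse it for every entry. Falls back to the demo
// prompt when the file is absent, so the sketch runs from any directory.
const systemPrompt = fs.existsSync("systemprompt.txt")
  ? fs.readFileSync("systemprompt.txt", "utf8")
  : "You are a machine translating English to Catalan text.";

console.log(JSON.stringify(buildMessages(systemPrompt, "checkbox.")));
```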

In the next section we will fine-tune a model to assume this role.

Preparing your dataset to tune the model

In the previous section we defined the prompt to use for our system. Now it's time to train the assistant to respond to our users' requests following our language requirements.

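The file format is JSONL: each line holds one example, a JSON object whose "messages" array contains 3 objects (system, user and assistant). The example below is wrapped across several lines for readability; in the file itself each example must be on a single line: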

{"messages": [{"role": "system", "content": "You are a machine translating English to Catalan text."}, 
              {"role": "user", "content": "checkbox."}, 
              {"role": "assistant", "content": "casella de selecció."}]}

Example:

cat fine-tuning-ca.jsonl
{"messages": [{"role": "system", "content": "You are a machine translating English to Catalan text."}, {"role": "user", "content": "checkbox."}, {"role": "assistant", "content": "casella de selecció."}]}
{"messages": [{"role": "system", "content": "You are a machine translating English to Catalan text."}, {"role": "user", "content": "This plugin is compatible with your PHP version"}, {"role": "assistant", "content": "Aquesta extensió és compatible amb la versió de PHP"}]}
{"messages": [{"role": "system", "content": "You are a machine translating English to Catalan text."}, {"role": "user", "content": "Accessed a banned URL"}, {"role": "assistant", "content": "S'ha accedit a un URL prohibit"}]}
{"messages": ...
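If you already have translations you trust (for example in the PO files under ./previous), you can turn them into examples instead of writing each line by hand. A minimal sketch, assuming you have already extracted msgid/msgstr pairs; toJsonl is a hypothetical helper, not part of gptpo:

```javascript
// Hypothetical helper, not part of gptpo: turn msgid/msgstr pairs you
// already trust into fine-tuning examples in the JSONL chat format above.
const SYSTEM_PROMPT = "You are a machine translating English to Catalan text.";

function toJsonl(pairs, systemPrompt = SYSTEM_PROMPT) {
  const lines = pairs
    .filter(([source, target]) => source && target) // skip untranslated entries
    .map(([source, target]) =>
      // JSON.stringify never emits raw newlines (they become "\n" escapes),
      // so each example stays on a single line as JSONL requires.
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: source },
          { role: "assistant", content: target },
        ],
      })
    );
  return lines.length ? lines.join("\n") + "\n" : "";
}

const pairs = [
  ["checkbox.", "casella de selecció."],
  ["Accessed a banned URL", "S'ha accedit a un URL prohibit"],
  ["Untranslated string", ""], // dropped: no translation yet
];
// Print to stdout; redirect to a file of your choice, e.g. > my-examples.jsonl
process.stdout.write(toJsonl(pairs));
```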

OpenAI requires at least 10 examples to fine-tune a model. According to OpenAI, clear improvements are typically seen from fine-tuning on 50 to 100 training examples with gpt-3.5-turbo, but the right number varies greatly with the exact use case.
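Before uploading, it can help to check the file locally. A minimal sketch of such a check, assuming the 3-message layout used in this README (OpenAI also accepts other layouts, such as multi-turn conversations, which this sketch would reject); validateJsonl and the script name in the usage comment are hypothetical:

```javascript
// Hypothetical pre-upload check, not a gptpo command. It enforces the
// system/user/assistant layout used in this README and OpenAI's
// 10-example minimum.
const fs = require("node:fs");

const MIN_EXAMPLES = 10;
const EXPECTED_ROLES = "system,user,assistant";

function validateJsonl(text) {
  const errors = [];
  const lines = text.split(/\r?\n/);
  if (lines[lines.length - 1] === "") lines.pop(); // allow one trailing newline

  lines.forEach((line, i) => {
    const where = `line ${i + 1}`;
    let example;
    try {
      example = JSON.parse(line);
    } catch {
      errors.push(`${where}: invalid JSON (each example must be on one line)`);
      return;
    }
    const messages = Array.isArray(example?.messages) ? example.messages : [];
    if (messages.map((m) => m?.role).join(",") !== EXPECTED_ROLES) {
      errors.push(`${where}: expected roles ${EXPECTED_ROLES}`);
    } else if (messages.some((m) => typeof m.content !== "string" || !m.content.trim())) {
      errors.push(`${where}: empty message content`);
    }
  });

  if (lines.length < MIN_EXAMPLES) {
    errors.push(`only ${lines.length} examples; at least ${MIN_EXAMPLES} are required`);
  }
  return errors;
}

// Usage: node check-jsonl.js fine-tuning-ca.jsonl
const file = process.argv[2];
if (file) {
  const errors = validateJsonl(fs.readFileSync(file, "utf8"));
  console.log(errors.length ? errors.join("\n") : "OK");
  process.exitCode = errors.length ? 1 : 0;
}
```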

Get detailed information in OpenAI's guide: https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset

Start a job to fine-tune a model

cd demo
gptpo fine-tuning --help


Usage: gptpo fine-tuning [options]

launch a new fine-tune task to customize your OpenAI model

Options:
  -k, --key <key>    openai api key (env: OPENAI_API_KEY)
  --model <model>    openai model to use as starting point or your previous fine-tuned model (default: "gpt-3.5-turbo-1106", env: OPENAI_API_MODEL)
  --suffix <suffix>  fine-tuned model suffix
  --file <file>      jsonl file path to the fine-tuning messages
  -h, --help         display help for command

NOTE: In our experience, starting from gpt-3.5-turbo-1106 gives the best balance of cost and results.

fine-tuning demo:

Let's fine-tune our starting model, OPENAI_API_MODEL=gpt-3.5-turbo-1106:

cd demo
gptpo fine-tuning --suffix cat01 --file ./fine-tuning-ca.jsonl
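The command presumably launches an OpenAI fine-tuning job (note the ftjob-... ids in the job listing). As a rough illustration of the two API calls involved, here is a sketch using the official openai Node SDK (v4); it is an assumption about the mechanics, not gptpo's source, and launchFineTune is a hypothetical name. The client is passed in so the flow can be tried against a stub without network access.

```javascript
// Hypothetical sketch with the `openai` Node SDK v4 (not gptpo's source).
async function launchFineTune(client, { file, model = "gpt-3.5-turbo-1106", suffix }) {
  // 1. Upload the JSONL training file for fine-tuning.
  const uploaded = await client.files.create({ file, purpose: "fine-tune" });
  // 2. Start a job on top of a base model or a previously fine-tuned one.
  return client.fineTuning.jobs.create({
    training_file: uploaded.id,
    model,
    suffix,
  });
}

// Real usage (requires `npm install openai` and OPENAI_API_KEY to be set):
//   const fs = require("node:fs");
//   const { OpenAI } = require("openai");
//   const job = await launchFineTune(new OpenAI(), {
//     file: fs.createReadStream("./fine-tuning-ca.jsonl"),
//     suffix: "cat01",
//   });
//   console.log(job.id); // poll the job until its status is "succeeded"
```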

List your fine-tuning jobs and the fine-tuned models they produced:

gptpo fine-tuning-jobs --help


list fine-tune jobs

Options:
  -k, --key <key>     openai api key (env: OPENAI_API_KEY)
  -l --limit <limit>  max items listed (default: 10)
  -h, --help          display help for command

fine-tuning-jobs demo:

gptpo fine-tuning-jobs

[
   {
    object: 'fine_tuning.job',
    id: 'ftjob-2u8sMZaK82tKXpmIr9Rt1iXM',
    model: 'gpt-3.5-turbo-1106',
    created_at: 1704570847,
    finished_at: 1704571377,
    fine_tuned_model: 'ft:gpt-3.5-turbo-1106:wetopi:cat01:8e7AgzNb',
    organization_id: 'org-iMMKhtTcRklst7guwU9LzPr6',
    result_files: [ 'file-bg27iX6hYBIN2rCtDlkkZon6' ],
    status: 'succeeded',
    validation_file: null,
    training_file: 'file-7vDQosGoZ48fni1ekm0TBJE3',
    hyperparameters: { n_epochs: 3, batch_size: 1, learning_rate_multiplier: 2 },
    trained_tokens: 12555,
    error: null
  }
]

The name of your fine-tuned model is in the fine_tuned_model attribute of the JSON output.
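When the list grows, picking the right model by hand gets error-prone. A minimal sketch that selects the model from the most recent successful job in output shaped like the listing above, plus a parser for the name pattern that the example name suggests (ft:<base model>:<organization>:<suffix>:<id>); both helpers are hypothetical, not gptpo features.

```javascript
// Hypothetical helpers, not part of gptpo. With the `openai` Node SDK v4,
// (await client.fineTuning.jobs.list({ limit: 10 })).data yields job objects
// with the same fields as the listing above.
function latestFineTunedModel(jobs) {
  const done = jobs.filter((j) => j.status === "succeeded" && j.fine_tuned_model);
  if (done.length === 0) return null;
  // finished_at is a Unix timestamp in seconds; the newest job wins.
  done.sort((a, b) => b.finished_at - a.finished_at);
  return done[0].fine_tuned_model;
}

// Split a name such as ft:gpt-3.5-turbo-1106:wetopi:cat01:8e7AgzNb.
function parseFineTunedModel(name) {
  const [prefix, base, org, suffix, id] = name.split(":");
  if (prefix !== "ft" || !id) return null; // not a fine-tuned model name
  return { base, org, suffix, id };
}
```

The selected name is what you would export as OPENAI_API_MODEL in the next step.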

Translate a PO file using our latest fine-tuned model:

Set the OPENAI_API_MODEL environment variable to point to the newly trained model:

export OPENAI_API_MODEL=ft:gpt-3.5-turbo-1106:wetopi:cat01:8e7AgzNb
gptpo translate --help

Usage: gptpo translate [options]

translate po file with completions (default command)

Options:
  -k, --key <key>      openai api key (env: OPENAI_API_KEY)
  --model <model>      openai model (default: "gpt-3.5-turbo-1106", env: OPENAI_API_MODEL)
  --po <file>          po file path
  --dir <dir>          po file directory
  --verbose            print verbose log
  -o, --output <file>  output file path, overwrite po file by default
  -h, --help           display help for command

Translate the demo file test.po:

cd demo
gptpo translate --po ./test.po --verbose

(node:58702) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
░░░░░░░░░░░░░░░░░░░░ 0% 0/1 
==> Someone tried to recover the password for user with email address: %s
==> Algú ha intentat recuperar la contrasenya de l'usuari amb l'adreça de correu electrònic: %s
████████████████████ 100% 1/1 done.
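Conceptually, "translate po file with completions" means sending each source string to the Chat Completions API together with the system prompt and storing the reply as its translation. A minimal sketch of such a pass, assuming the official openai Node SDK v4 client shape and plain { msgid, msgstr } entries; translateEntries is a hypothetical name, and this sketch only fills entries whose msgstr is still empty, which may differ from gptpo's behaviour.

```javascript
// Hypothetical sketch of a completions-based translation pass (not gptpo's
// implementation). The client is injected, so a stub can stand in for
// `new OpenAI()` from the `openai` package. Entries are updated in place.
async function translateEntries(client, entries, { model, systemPrompt, verbose = false }) {
  for (const entry of entries) {
    if (entry.msgstr) continue; // keep existing (e.g. preprocessed) translations
    const res = await client.chat.completions.create({
      model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: entry.msgid },
      ],
    });
    // message.content can be null in the SDK types; treat that as no answer.
    entry.msgstr = (res.choices[0].message.content ?? "").trim();
    if (verbose) console.log(`==> ${entry.msgid}\n==> ${entry.msgstr}`);
  }
  return entries;
}
```

Requests are sent one at a time, which keeps the sketch simple and gentle on rate limits at the cost of speed.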

Acknowledgments

This project was greatly inspired by the pioneering work of ryanhex53 on gpt-po (https://github.com/ryanhex53/gpt-po). The foundational ideas and codebase of ryanhex53's gpt-po have been instrumental in shaping the direction and development of this project.
