Getting machine-translated prompts of xP3mt #24

Closed
dptam opened this issue Nov 19, 2023 · 4 comments

@dptam

dptam commented Nov 19, 2023

Hi,

Thank you for the very interesting work and releasing the code. It is very helpful!

Is there a way I can get the machine-translated prompts per task?

For example, how would I get the Spanish (es) prompt for Paws-x only?

  • The HuggingFace repo bigscience/xP3mt seems to contain the input/output pairs in Spanish for all the training tasks. Is there a way to get the input/output pairs for Paws-x only?
  • In the creation script data/xp3/prepare_xp3_train.py, setting USE_ENGLISH_PROMPTS to False seems to load prompts in different languages from PromptSource, but PromptSource only has English prompts for Paws-x (https://github.com/bigscience-workshop/promptsource/tree/main/promptsource/templates/paws-x).

Also, more generally, how do you machine-translate prompts when the language is written right-to-left instead of left-to-right, or has a different word order, such as subject-object-verb instead of subject-verb-object? Would the target come before the input, or would you reorder the sentences in the input (i.e. the premise or hypothesis) within the prompt? And if the target comes before the input, how would the model work, since it generates from left to right?

Thank you,
Derek

@Muennighoff
Collaborator

Is there a way I can get the machine-translated prompts per task? / For example, how would I get the Spanish (es) prompt for Paws-x only?

The prompts are here: https://github.com/Muennighoff/promptsource/blob/xp3mt/promptsource/templates/paws-x/es/templates.yaml
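That templates.yaml uses promptsource's custom `!Template` and `!TemplateMetadata` YAML tags, which plain `yaml.safe_load` rejects unless the promptsource package is installed. As a minimal sketch, a catch-all multi-constructor can turn those tagged nodes into ordinary dicts; the inline sample below only mimics the file's layout (the exact field names are assumptions based on upstream promptsource, so check against the real file):

```python
import yaml

def _ignore_tag(loader, tag_suffix, node):
    # Treat any custom-tagged node (!Template, !TemplateMetadata, ...)
    # as its plain YAML equivalent.
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node, deep=True)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node, deep=True)
    return loader.construct_scalar(node)

class PermissiveLoader(yaml.SafeLoader):
    pass

# An empty tag prefix matches every tag that SafeLoader does not already handle.
yaml.add_multi_constructor("", _ignore_tag, Loader=PermissiveLoader)

# Stand-in for the contents of paws-x/es/templates.yaml (layout assumed).
SAMPLE = """\
dataset: paws-x
templates:
  some-uuid: !Template
    answer_choices: "No ||| Yes"
    jinja: "{{sentence1}} {{sentence2}} ..."
    name: task_description-no-label
"""

data = yaml.load(SAMPLE, Loader=PermissiveLoader)
for uid, tmpl in data["templates"].items():
    print(uid, tmpl["name"])
```

The same loader should work on the downloaded file; alternatively, installing the xp3mt branch of the promptsource fork lets you load the templates through its own API.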

Is there a way I can get the input, output pairs for Paws-x only?

You can just download the paws-x files: https://huggingface.co/datasets/bigscience/xP3mt/tree/main/es e.g. https://huggingface.co/datasets/bigscience/xP3mt/blob/main/es/xp3_paws-x_es_train_task_description-no-label_esmt.jsonl
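Once a per-task file is downloaded (via the blob URL above or `huggingface_hub`), each line is one JSON record. A minimal parsing sketch, assuming the records use `inputs`/`targets` field names as in the xP3 releases (verify against one real line first):

```python
import json

def load_pairs(jsonl_text):
    """Parse jsonl text into a list of (input, target) pairs."""
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Field names are an assumption based on the xP3 data format.
        pairs.append((record["inputs"], record["targets"]))
    return pairs

# Synthetic two-record stand-in for the real paws-x es file contents.
sample = "\n".join([
    json.dumps({"inputs": "Oración 1 ... Oración 2 ...", "targets": "Sí"}),
    json.dumps({"inputs": "Oración 3 ... Oración 4 ...", "targets": "No"}),
])
pairs = load_pairs(sample)
print(len(pairs), pairs[0][1])
```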

Also see the usage guidelines here that may help: https://huggingface.co/datasets/Muennighoff/xP3x#usage

Also, more generally, how do you machine-translate prompts when the language is written right-to-left instead of left-to-right, or has a different word order, such as subject-object-verb instead of subject-verb-object? Would the target come before the input, or would you reorder the sentences in the input (i.e. the premise or hypothesis) within the prompt? And if the target comes before the input, how would the model work, since it generates from left to right?

We use Google Translate to translate the prompts and then put them in the same place for all languages. For right-to-left languages like Arabic everything is the same, i.e. the text is processed from the beginning of the sentence to the end. Browsers usually handle displaying it right-to-left, so we can treat it as left-to-right in the modelling phase.
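This works because Unicode text is stored in logical (reading) order; the right-to-left reordering happens only at display time. A tiny illustration in Python:

```python
# RTL text is stored in logical order: index 0 is the first character a
# reader would read, regardless of display direction.
text = "مرحبا"   # "hello" in Arabic
first = text[0]  # first character in reading order: م (meem)
print(first)
```

So from the model's point of view, an Arabic prompt is just a token sequence from sentence start to sentence end, exactly like an English one.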

@dptam
Author

dptam commented Nov 19, 2023

Thank you for the quick response and the pointer. It is very helpful.

In the templates at https://github.com/Muennighoff/promptsource/blob/xp3mt/promptsource/templates/paws-x/es/templates.yaml, I saw the metric field was null, but the templates have answer choices, and the original prompt https://github.com/Muennighoff/promptsource/blob/xp3mt/promptsource/templates/paws/labeled_final/templates.yaml has Accuracy as a metric. Could I still use Accuracy as a metric for the machine-translated prompts?

@Muennighoff
Collaborator

Yes, you can use accuracy. The metric field in that file is never used.

@dptam
Author

dptam commented Nov 19, 2023

Great. Thank you for all your help!

@dptam dptam closed this as completed Nov 19, 2023