Getting machine-translated prompts of xP3mt #24

Closed
dptam opened this issue Nov 19, 2023 · 4 comments

@dptam

dptam commented Nov 19, 2023

Hi,

Thank you for the very interesting work and releasing the code. It is very helpful!

Is there a way I can get the machine-translated prompts per task?

For example, how would I get the Spanish (es) prompt for Paws-x only?

  • The HuggingFace repo bigscience/xP3mt seems to contain the input/output pairs in Spanish for all the training tasks. Is there a way to get the input/output pairs for Paws-x only?
  • In the creation script data/xp3/prepare_xp3_train.py, setting USE_ENGLISH_PROMPTS to False seems to load prompts in different languages from PromptSource, but PromptSource only has English prompts for Paws-x (https://github.com/bigscience-workshop/promptsource/tree/main/promptsource/templates/paws-x).

Also, more generally, how do you machine-translate prompts when the language is written right-to-left instead of left-to-right, or has a different word order, such as subject-object-verb instead of subject-verb-object? Would the target come before the input, or would you reorder the sentences in the input (i.e. the premise or hypothesis) within the prompt? And if the target comes before the input, how would the model work, since it generates from left to right?

Thank you,
Derek

@Muennighoff
Collaborator

Is there a way I can get the machine-translated prompts per task? / For example, how would I get the Spanish (es) prompt for Paws-x only?

The prompts are here: https://github.com/Muennighoff/promptsource/blob/xp3mt/promptsource/templates/paws-x/es/templates.yaml
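That templates.yaml uses promptsource's custom `!Template` and `!TemplateMetadata` YAML tags, which plain `yaml.safe_load` rejects unless the promptsource package is installed. As a minimal sketch, a catch-all multi-constructor can turn those tagged nodes into ordinary dicts; the inline sample below only mimics the file's layout (the exact field names are assumptions based on upstream promptsource, so check against the real file):

```python
import yaml

def _ignore_tag(loader, tag_suffix, node):
    # Treat any custom-tagged node (!Template, !TemplateMetadata, ...)
    # as its plain YAML equivalent.
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node, deep=True)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node, deep=True)
    return loader.construct_scalar(node)

class PermissiveLoader(yaml.SafeLoader):
    pass

# An empty tag prefix matches every tag that SafeLoader does not already handle.
yaml.add_multi_constructor("", _ignore_tag, Loader=PermissiveLoader)

# Stand-in for the contents of paws-x/es/templates.yaml (layout assumed).
SAMPLE = """\
dataset: paws-x
templates:
  some-uuid: !Template
    answer_choices: "No ||| Yes"
    jinja: "{{sentence1}} {{sentence2}} ..."
    name: task_description-no-label
"""

data = yaml.load(SAMPLE, Loader=PermissiveLoader)
for uid, tmpl in data["templates"].items():
    print(uid, tmpl["name"])
```

The same loader should work on the downloaded file; alternatively, installing the xp3mt branch of the promptsource fork lets you load the templates through its own API.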

Is there a way I can get the input, output pairs for Paws-x only?

You can just download the paws-x files: https://huggingface.co/datasets/bigscience/xP3mt/tree/main/es e.g. https://huggingface.co/datasets/bigscience/xP3mt/blob/main/es/xp3_paws-x_es_train_task_description-no-label_esmt.jsonl
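Once a per-task file is downloaded (via the blob URL above or `huggingface_hub`), each line is one JSON record. A minimal parsing sketch, assuming the records use `inputs`/`targets` field names as in the xP3 releases (verify against one real line first):

```python
import json

def load_pairs(jsonl_text):
    """Parse jsonl text into a list of (input, target) pairs."""
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Field names are an assumption based on the xP3 data format.
        pairs.append((record["inputs"], record["targets"]))
    return pairs

# Synthetic two-record stand-in for the real paws-x es file contents.
sample = "\n".join([
    json.dumps({"inputs": "Oración 1 ... Oración 2 ...", "targets": "Sí"}),
    json.dumps({"inputs": "Oración 3 ... Oración 4 ...", "targets": "No"}),
])
pairs = load_pairs(sample)
print(len(pairs), pairs[0][1])
```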

Also see the usage guidelines here that may help: https://huggingface.co/datasets/Muennighoff/xP3x#usage

Also, more generally, how do you machine-translate prompts when the language is written right-to-left instead of left-to-right, or has a different word order, such as subject-object-verb instead of subject-verb-object? Would the target come before the input, or would you reorder the sentences in the input (i.e. the premise or hypothesis) within the prompt? And if the target comes before the input, how would the model work, since it generates from left to right?

We use Google Translate to translate the prompts and then put them in the same place for all languages. For right-to-left languages like Arabic everything is the same, i.e. the text is processed from the beginning of the sentence to the end. Browsers usually handle displaying it right-to-left, so we can treat it as left-to-right in the modelling phase.
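This works because Unicode text is stored in logical (reading) order; the right-to-left reordering happens only at display time. A tiny illustration in Python:

```python
# RTL text is stored in logical order: index 0 is the first character a
# reader would read, regardless of display direction.
text = "مرحبا"   # "hello" in Arabic
first = text[0]  # first character in reading order: م (meem)
print(first)
```

So from the model's point of view, an Arabic prompt is just a token sequence from sentence start to sentence end, exactly like an English one.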

@dptam
Author

dptam commented Nov 19, 2023

Thank you for the quick response and the pointer. It is very helpful.

In the templates at https://github.com/Muennighoff/promptsource/blob/xp3mt/promptsource/templates/paws-x/es/templates.yaml, I saw the metric field was null, but the templates have answer choices, and the original prompt https://github.com/Muennighoff/promptsource/blob/xp3mt/promptsource/templates/paws/labeled_final/templates.yaml has Accuracy as a metric. Could I still use Accuracy as a metric for the machine-translated prompts?

@Muennighoff
Collaborator

Yes, you can use accuracy. The metric field in that file is never used.

@dptam
Author

dptam commented Nov 19, 2023

Great. Thank you for all your help!

@dptam dptam closed this as completed Nov 19, 2023