<img align="left" style="padding-right:10px;" width="100" height="100" src="https://ldsl.rub.de/assets/images/brand/logo.svg">
<br></br>
<br></br>

# Linguistic Data Science Lab advanced course
## Good Practices for Annotation (050831-WS 23/24)
Instructor: Dr. Claudia Roch M.A. // winter term 2023/2024

31. Jan, 2024.

# Translation Quality Assessment Experiment

## Prompting GPT

In this notebook, we prompt GPT to rate translation quality.

The concept of translation quality chosen in this study is the degree of meaning correspondence, or equivalence, between a pair of sentences, as defined by the Cross-Lingual Semantic Textual Similarity (XSTS) metric proposed by Licht et al. (2022). The metric uses a five-point scale ranging from not equivalent (1) to completely equivalent (5).
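
For a later comparison with human ratings, the five labels can be kept in one place and mapped back to their numeric score. The following is a minimal sketch: `XSTS_LABELS` and `label_to_score` are illustrative names that are not part of the original notebook, and the label wording follows the prompt used in this notebook rather than the exact wording of the XSTS protocol.

```python
# Five-point scale as worded in the prompt of this notebook (illustrative helper).
XSTS_LABELS = {
    1: "not equivalent",
    2: "little equivalent",
    3: "mostly equivalent",
    4: "nearly equivalent",
    5: "completely equivalent",
}


def label_to_score(label):
    """Return the numeric score for a label such as '3 mostly equivalent'.

    Returns None if the label does not match the scale.
    """
    if not isinstance(label, str):
        return None
    parts = label.strip().split(" ", 1)
    if len(parts) != 2 or parts[0] not in {"1", "2", "3", "4", "5"}:
        return None
    score, text = int(parts[0]), parts[1].strip().lower()
    # accept the label only if number and wording belong together
    return score if XSTS_LABELS[score] == text else None


print(label_to_score("3 mostly equivalent"))  # 3
print(label_to_score("6 perfect"))            # None
```

Returning `None` for anything off the scale makes unexpected model output visible instead of silently turning it into a score.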

The prompt template shown below is inspired by Kocmi & Federmann (2023), who claim that Large Language Models are "State-of-the-Art Evaluators of Translation Quality". Their templates, proposed for other metrics, were adapted to the task at hand:



```
Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly
equivalent", "5 completely equivalent".

Classify the quality of translation in terms of semantic equivalence from {source_lang} to {target_lang} into one of following classes on a five-point scale: "1 not equivalent", "2 little equivalent", "3 mostly equivalent",
"4 nearly equivalent", "5 completely equivalent". Provide the chain of reasoning and explain the choice of the label in a maximum of 100 words.

{source_lang} source: "{source_seg}"
{target_lang} translation: "{target_seg}"

Return the label and reasoning in the output as valid JSON.

label:
reasoning:

```




Note: The instruction does not describe the classes or illustrate them with examples, although doing so may yield more accurate results. The prompt was shortened to use fewer tokens in the experiment.
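
The placeholders in the template above map directly onto Python's `str.format`. The following is a minimal sketch with a shortened template: `TEMPLATE` and `fill_template` are hypothetical names, and the example sentences are made up for illustration.

```python
# Shortened version of the template; the placeholder names match the ones above.
TEMPLATE = (
    "Classify the quality of translation in terms of semantic equivalence "
    "from {source_lang} to {target_lang}.\n"
    '{source_lang} source: "{source_seg}"\n'
    '{target_lang} translation: "{target_seg}"\n'
    "Return the label and reasoning in the output as valid JSON."
)


def fill_template(source_lang, target_lang, source_seg, target_seg):
    """Fill all four placeholders of the template for one translation pair."""
    return TEMPLATE.format(
        source_lang=source_lang,
        target_lang=target_lang,
        source_seg=source_seg,
        target_seg=target_seg,
    )


print(fill_template("English", "German", "Good morning.", "Guten Morgen."))
```

Because the segments are passed as arguments, curly braces that occur inside a sentence are not interpreted as placeholders.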

# Libraries

In [None]:
import pandas as pd
import json

In [None]:
# Install openai for access to API
# !pip install openai

!pip install --upgrade openai
# some initial problems with dependencies of typing-extensions
# uninstalling did not solve it

"""
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.9.0 which is incompatible.
"""

# restart runtime


# Defining function to generate a prompt schema

The function takes the names of the source and target languages.

Two further arguments are the individual sentence segments of a translation pair.

The instruction is close to the template proposed above and explicitly asks for a reason for the chosen label. One sentence is added to the instruction asking to return the output as JSON. Strictly speaking, this is only required when the response format is set to a JSON object, where the model may otherwise generate output without end (cf. https://platform.openai.com/docs/guides/text-generation/json-mode).
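
JSON mode is switched on through the `response_format` parameter of the chat completions endpoint. The sketch below only assembles the keyword arguments, so it runs without an API key. `build_request` is a hypothetical helper, and the default model name is an assumption: JSON mode requires a model version that supports it, which may differ from the one used in this notebook.

```python
def build_request(prompt_text, model="gpt-3.5-turbo-1106", json_mode=True):
    """Assemble keyword arguments for client.chat.completions.create().

    With json_mode=True the prompt itself must also ask for JSON output,
    which the last sentence of the instruction does.
    """
    kwargs = {
        "model": model,
        "messages": [{"role": "system", "content": prompt_text}],
        "max_tokens": 600,
        "temperature": 0.2,
    }
    if json_mode:
        # constrain the model to emit a syntactically valid JSON object
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs


request = build_request("... Return the label and reasoning in the output as valid JSON.")
print(request["response_format"])
```

The dictionary could then be passed on as `client.chat.completions.create(**request)`.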


In [None]:
# the function takes the source and target language names and the source and target segments
def prompt(source_lang, target_lang, source_seg, target_seg):
    # source_lang is inserted into the text instead of a hard-coded "English"
    prompt_template = """Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent".
    Classify the quality of translation in terms of semantic equivalence from """+source_lang+""" to """+target_lang+"""
    into one of following classes on a five-point scale: "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent" Provide the chain of reasoning and explain the choice of the label in a maximum of 100 words.
    """+source_lang+""" source: """+source_seg+"\n"+ target_lang +" translation: "+target_seg+"""
    Return the label and reasoning in the output as valid JSON."""
    return prompt_template


# Read translation dataset


The English source sentences were collected from the [Flores-200 devtest dataset](https://github.com/facebookresearch/flores/blob/main/flores200/README.md), while the corresponding [machine translations](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/evaluation#downloading-the-translations) of the NLLB-200 (MoE, 54.5B) model into Bengali, German, Russian and Spanish are taken from the FLORES-200 benchmark dataset released by facebookresearch. Of the language pairs involved in the study, only the EN-DEU translation pairs have previously been assessed by the NLLB Team (2022).

We read the data from a previously curated Excel file into a pandas DataFrame with columns for the English source sentences and the machine translations into the four target languages German, Spanish, Bengali and Russian. The DataFrame also has pre-existing columns for the GPT labels and reasoning.

In [None]:
# use only the columns with source and target segments and the columns for GPT's scoring results
tq_data = pd.read_excel("/content/TQA_ChatGPT.xlsx", header=0, usecols=list(range(9, 22)))
tq_data

Unnamed: 0,eng [eng_Latn.devtest],eng-deu [flores200-eng_Latn-deu_Latn-devtest],GPT_DE_XSTS-score,GPT_DE_Reasoning,eng-spa [flores200-eng_Latn-spa_Latn-devtest],GPT_SP_XSTS-score,GPT_SP_Reasoning,eng-ben [flores200-eng_Latn-ben_Beng-devtest],GPT_BEN_XSTS-score,GPT_BEN_Reasoning,eng-rus [flores200-eng_Latn-rus_Cyrl-devtest],GPT_RU_XSTS-score,GPT_RU_Reasoning
0,U.S. President George W. Bush arrived in Singa...,US-Präsident George W. Bush kam am Morgen des ...,3 mostly equivalent,,El presidente de los Estados Unidos George W. ...,5 completely equivalent,,মার্কিন প্রেসিডেন্ট জর্জ ডব্লিউ বুশ ১৬ই নভেম্ব...,3 mostly equivalent,,Президент США Джордж Буш прибыл в Сингапур утр...,5 completely equivalent,
1,He was greeted by Singapore's Deputy Prime Min...,Er wurde vom stellvertretenden Premierminister...,3 mostly equivalent,,Fue recibido por el viceprimer ministro de Sin...,5 completely equivalent,,সিঙ্গাপুরের উপ-প্রধানমন্ত্রী ওং কান সেং তাঁকে ...,3 mostly equivalent,,Его приветствовал заместитель премьер-министра...,3 mostly equivalent,
2,After a week of losses in the midterm election...,Nach einer Woche der Niederlage bei den Zwisch...,3 mostly equivalent,,Después de una semana de pérdidas en las elecc...,5 completely equivalent,,"মধ্যবর্তী নির্বাচনে পরাজয়ের এক সপ্তাহ পর, বুশ...",3 mostly equivalent,,После недели поражений на промежуточных выбора...,3 mostly equivalent,
3,"After the dam was built in 1963, the seasonal ...","Nachdem der Damm 1963 gebaut worden war, wurde...",,,"Después de que se construyó la presa en 1963, ...",,,"১৯৬৩ সালে বাঁধটি নির্মিত হওয়ার পরে, মৌসুমী বন...",,,"После того, как в 1963 году была построена пло...",,
4,This sediment was necessary for creating sandb...,"Dieses Sediment war notwendig, um Sandbänke un...",,,Este sedimento fue necesario para crear bancos...,,,এই পলল বন্যপ্রাণীর আবাসস্থল হিসেবে কাজ করে এমন...,,,Этот осадок был необходим для создания песчаны...,,
5,"As a result, two fish species have become exti...",Infolgedessen sind zwei Fischarten ausgestorbe...,1 not equivalent,,"Como resultado, dos especies de peces se han e...",3 mostly equivalent,,"ফলস্বরূপ, দুটি মাছের প্রজাতি বিলুপ্ত হয়ে গেছে...",3 mostly equivalent,,"В результате два вида рыб вымерли, а два други...",5 completely equivalent,
6,Although the water level will only rise a few ...,Obwohl der Wasserspiegel nach der Flut nur ein...,,,Aunque el nivel del agua solo aumentará unos p...,,,যদিও বন্যার পর পানির স্তর মাত্র কয়েক ফুট বৃদ্...,,,Хотя после наводнения уровень воды поднимется ...,,
7,Before The Simpsons Simon had worked on severa...,Vor den Simpsons hatte Simon in verschiedenen ...,,,Antes de The Simpsons Simon había trabajado en...,,,দ্য সিম্পসনসের আগে সাইমন বিভিন্ন পজিশনে বেশ কয...,,,"До ""Симпсонов"" Саймон работал на нескольких шо...",,
8,During the 1980s he worked on shows such as Ta...,In den 1980er Jahren arbeitete er an Shows wie...,,,Durante la década de 1980 trabajó en programas...,,,"১৯৮০ এর দশকে তিনি ট্যাক্সি, চিয়ার্স এবং দ্য ট...",,,"В 1980-х годах он работал в таких шоу, как Tax...",,
9,In 1989 he helped create The Simpsons with Bro...,1989 half er bei der Erstellung der Simpsons m...,,,En 1989 ayudó a crear Los Simpson con Brooks y...,,,১৯৮৯ সালে তিনি ব্রুকস এবং গ্রোনিংয়ের সাথে দ্য...,,,В 1989 году он помог создать Симпсонов с Брукс...,,


# Generating the prompts for all language pairs

In [None]:
# Generate one prompt column per language pair by calling the prompt function

# the English source sentences are shared by all language pairs
source_seg = tq_data['eng [eng_Latn.devtest]']

# target language name -> (column with the machine translation, new prompt column)
language_pairs = {
    "German": ('eng-deu [flores200-eng_Latn-deu_Latn-devtest]', 'eng-deu-prompt'),
    "Spanish": ('eng-spa [flores200-eng_Latn-spa_Latn-devtest]', 'eng-spa-prompt'),
    "Bengali": ('eng-ben [flores200-eng_Latn-ben_Beng-devtest]', 'eng-ben-prompt'),
    "Russian": ('eng-rus [flores200-eng_Latn-rus_Cyrl-devtest]', 'eng-rus-prompt'),
}

for target_lang, (target_col, prompt_col) in language_pairs.items():
    # loop through the zipped source and target segments, generate the prompt
    # for each row and assign the list as a new column in the DataFrame
    tq_data[prompt_col] = [
        prompt("English", target_lang, src, trg)
        for src, trg in zip(source_seg, tq_data[target_col])
    ]


In [None]:
# check the generated prompts for a language pair
for t in tq_data['eng-deu-prompt']:
    print(t)

(Write to excel)

In [None]:
# write prompts to excel
tq_data.to_excel("/content/TQA_ChatGPT_prompts_sec.xlsx")


For intermediate testing, create a very small test set.

In [None]:
# create a small test set with one prompt per language by extending a list
test_data = tq_data['eng-deu-prompt'][5:6].tolist()
test_data.extend(tq_data['eng-spa-prompt'][5:6].tolist())
test_data.extend(tq_data['eng-ben-prompt'][5:6].tolist())
test_data.extend(tq_data['eng-rus-prompt'][5:6].tolist())
test_data

['Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent".\n    Classify the quality of translation in terms of semantic equivalence from English to German\n    into one of following classes on a five-point scale: "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent" Provide the chain of reasoning and explain the choice of the label in a maximum of 100 words.\n    English source: As a result, two fish species have become extinct, and two others have become endangered, including the humpback chub.\nGerman translation: Infolgedessen sind zwei Fischarten ausgestorben und zwei weitere gefährdet, darunter der Buckelwale.\n    Return the label and reasoning in the output as valid JSON.',
 'Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent".\n    Classify the quality of tr

For the full run, use one prompt list per language.

In [None]:
# create subsets for each language pair with the prompts as list
deu_data = tq_data['eng-deu-prompt'].tolist()
spa_data = tq_data['eng-spa-prompt'].tolist()
ben_data = tq_data['eng-ben-prompt'].tolist()
rus_data = tq_data['eng-rus-prompt'].tolist()
rus_data

['Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent".\n    Classify the quality of translation in terms of semantic equivalence from English to Russian\n    into one of following classes on a five-point scale: "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent" Provide the chain of reasoning and explain the choice of the label in a maximum of 100 words.\n    English source: U.S. President George W. Bush arrived in Singapore the morning of November 16, beginning a week-long tour of Asia.\nRussian translation: Президент США Джордж Буш прибыл в Сингапур утром 16 ноября, начав недельный тур по Азии.\n    Return the label and reasoning in the output as valid JSON.',
 'Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent".\n    Classify the quality of translation in t

# OPEN AI


# Calling OpenAI "Chat Completion" API

You first need to create an account and get your [API keys](https://platform.openai.com/api-keys), and you may need to switch to a [paid plan](https://platform.openai.com/account/billing/overview). Consider the [pricing](https://openai.com/pricing) and the [usage policies](https://openai.com/policies/usage-policies).

[The whole experiment with these data cost me about $0.35.]


> Chat models take a list of messages as input and return a model-generated message as output.

Cf. docs of [chat completion API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api).
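
Each message is a dictionary with a `role` and a `content` key. The following is a minimal sketch of how such a list could be assembled, here splitting the fixed instruction (system message) from the sentence pair (user message). This split is an assumption for illustration and differs from the single system message used in this notebook. `build_messages` is a hypothetical helper, and the example sentences are made up.

```python
def build_messages(instruction, source_lang, source_seg, target_lang, target_seg):
    """Return a chat message list: general instruction plus one sentence pair."""
    pair = (
        f'{source_lang} source: "{source_seg}"\n'
        f'{target_lang} translation: "{target_seg}"'
    )
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": pair},
    ]


messages = build_messages(
    "Classify the translation on the five-point scale and answer as JSON.",
    "English", "Good morning.",
    "German", "Guten Morgen.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```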

Applied to a language subset of 50 prompts, the code below runs for about 3 minutes.

In [1]:
# @title Insert your API KEY
OPENAI_API_KEY = ""

In [None]:
from openai import OpenAI
import json
# pass the key to the client
client = OpenAI(api_key=OPENAI_API_KEY)

# result list with one dictionary (prompt, label, reasoning) per prompt
results = []

# for the full run, loop over one of the language-specific lists instead
# (deu_data / spa_data / ben_data / rus_data) and keep the results of each run
# under its own name (deu_results, spa_results, ben_results, rus_results),
# as expected by the cells that feed the results back to the DataFrame
for tq_prompt in test_data:
    # completion task
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        # the system role carries instructions for the conversation;
        # here the whole prompt is sent as a single system message
        messages=[
            {"role": "system", "content": tq_prompt}
        ],
        # maximum number of tokens in the response
        max_tokens=600,
        # number of chat completion choices to generate for each input message
        n=1,
        # sampling temperature controlling the randomness of the generated text:
        # higher values are more random, lower values more focused -> here more focused
        temperature=0.2
    )
    content = completion.choices[0].message.content
    # decode the JSON object;
    # strict=False allows control characters inside strings
    try:
        js_obj = json.loads(content, strict=False)
        # create a dictionary with prompt, label and reasoning
        output = {"prompt": tq_prompt, "label": js_obj['label'], "reasoning": js_obj['reasoning']}
    # if the maximum number of tokens is exceeded, the truncated message raises
    # a JSON decoding error; an answer without the expected keys raises a KeyError
    except (json.JSONDecodeError, KeyError, TypeError):
        print("Exception\n", tq_prompt)
        print(content)
        output = {"prompt": tq_prompt, "label": pd.NA, "reasoning": pd.NA}

    # append to result list
    results.append(output)
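
The decoding step can also be isolated in a small helper that never raises and always returns both keys, so that a single malformed answer does not stop a longer run. The following is a minimal sketch using only the standard library: `parse_response` is a hypothetical name, `None` stands in for the `pd.NA` used above, and the sample answers are made up.

```python
import json


def parse_response(content):
    """Parse a model answer into {'label': ..., 'reasoning': ...}.

    Missing keys yield None for that key; truncated JSON or a
    non-object answer yields None for both keys.
    """
    empty = {"label": None, "reasoning": None}
    try:
        # strict=False allows control characters such as newlines inside strings
        obj = json.loads(content, strict=False)
    except (json.JSONDecodeError, TypeError):
        return empty
    if not isinstance(obj, dict):
        return empty
    return {"label": obj.get("label"), "reasoning": obj.get("reasoning")}


good = '{"label": "4 nearly equivalent", "reasoning": "Minor\nloss of detail."}'
truncated = '{"label": "3 mostly equivalent", "reasoning": "The translation'
print(parse_response(good)["label"])
print(parse_response(truncated))
```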


In [None]:
#print(len(results))
for r in results:
    print(r['prompt'],"\n")
    print(r['label'],"\n")
    print(r['reasoning'],"\n")


Output label one of "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent".
    Classify the quality of translation in terms of semantic equivalence from English to German
    into one of following classes on a five-point scale: "1 not equivalent", "2 little equivalent", "3 mostly equivalent", "4 nearly equivalent", "5 completely equivalent" Provide the chain of reasoning and explain the choice of the label in a maximum of 100 words.
    English source: As a result, two fish species have become extinct, and two others have become endangered, including the humpback chub.
German translation: Infolgedessen sind zwei Fischarten ausgestorben und zwei weitere gefährdet, darunter der Buckelwale.
    Return the label and reasoning in the output as valid JSON. 

3 mostly equivalent 

The translation captures the main idea of the English source accurately. It correctly states that two fish species have become extinct and two others have

# Feeding results back to DataFrame

In [None]:
# work on a copy of the original DataFrame
result_df = tq_data.copy()
# deu_results: the results list obtained by running the API loop over deu_data
# assign labels to the existing column
result_df['GPT_DE_XSTS-score'] = [r['label'] for r in deu_results]
# assign reasoning to the existing column
result_df['GPT_DE_Reasoning'] = [r['reasoning'] for r in deu_results]

result_df


In [None]:
# get a copy
result_df_1 = result_df.copy()
# assign labels to existing columns
result_df_1['GPT_SP_XSTS-score'] = [r['label'] for r in spa_results]
result_df_1['GPT_SP_Reasoning'] = [r['reasoning'] for r in spa_results]

result_df_1


In [None]:
# get a copy
result_df_2 = result_df_1.copy()
# assign labels to existing columns
result_df_2['GPT_BEN_XSTS-score'] = [r['label'] for r in ben_results]
result_df_2['GPT_BEN_Reasoning'] = [r['reasoning'] for r in ben_results]

result_df_2

In [None]:
# get a copy
result_df_3 = result_df_2.copy()
# assign labels to existing columns
result_df_3['GPT_RU_XSTS-score'] = [r['label'] for r in rus_results]
result_df_3['GPT_RU_Reasoning'] = [r['reasoning'] for r in rus_results]

result_df_3

Unnamed: 0,eng [eng_Latn.devtest],eng-deu [flores200-eng_Latn-deu_Latn-devtest],GPT_DE_XSTS-score,GPT_DE_Reasoning,eng-spa [flores200-eng_Latn-spa_Latn-devtest],GPT_SP_XSTS-score,GPT_SP_Reasoning,eng-ben [flores200-eng_Latn-ben_Beng-devtest],GPT_BEN_XSTS-score,GPT_BEN_Reasoning,eng-rus [flores200-eng_Latn-rus_Cyrl-devtest],GPT_RU_XSTS-score,GPT_RU_Reasoning,eng-deu-prompt,eng-spa-prompt,eng-ben-prompt,eng-rus-prompt
0,U.S. President George W. Bush arrived in Singa...,US-Präsident George W. Bush kam am Morgen des ...,4 nearly equivalent,The translation is nearly equivalent because i...,El presidente de los Estados Unidos George W. ...,4 nearly equivalent,The translation is nearly equivalent because i...,মার্কিন প্রেসিডেন্ট জর্জ ডব্লিউ বুশ ১৬ই নভেম্ব...,5 completely equivalent,The translation is completely equivalent to th...,Президент США Джордж Буш прибыл в Сингапур утр...,4 nearly equivalent,The translation is nearly equivalent because i...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
1,He was greeted by Singapore's Deputy Prime Min...,Er wurde vom stellvertretenden Premierminister...,4 nearly equivalent,The translation is nearly equivalent because i...,Fue recibido por el viceprimer ministro de Sin...,4 nearly equivalent,The translation is nearly equivalent because i...,সিঙ্গাপুরের উপ-প্রধানমন্ত্রী ওং কান সেং তাঁকে ...,5 completely equivalent,The Bengali translation accurately conveys the...,Его приветствовал заместитель премьер-министра...,4 nearly equivalent,The translation is nearly equivalent because i...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
2,After a week of losses in the midterm election...,Nach einer Woche der Niederlage bei den Zwisch...,3 mostly equivalent,The translation captures the main idea of the ...,Después de una semana de pérdidas en las elecc...,3 mostly equivalent,The translation is mostly equivalent because i...,"মধ্যবর্তী নির্বাচনে পরাজয়ের এক সপ্তাহ পর, বুশ...",5 completely equivalent,The Bengali translation accurately conveys the...,После недели поражений на промежуточных выбора...,2 little equivalent,The translation is not completely accurate. Th...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
3,"After the dam was built in 1963, the seasonal ...","Nachdem der Damm 1963 gebaut worden war, wurde...",4 nearly equivalent,The translation accurately captures the meanin...,"Después de que se construyó la presa en 1963, ...",4 nearly equivalent,The translation accurately conveys the meaning...,"১৯৬৩ সালে বাঁধটি নির্মিত হওয়ার পরে, মৌসুমী বন...",4 nearly equivalent,The translation accurately captures the main i...,"После того, как в 1963 году была построена пло...",4 nearly equivalent,The translation accurately conveys the meaning...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
4,This sediment was necessary for creating sandb...,"Dieses Sediment war notwendig, um Sandbänke un...",4 nearly equivalent,The translation accurately conveys the meaning...,Este sedimento fue necesario para crear bancos...,4 nearly equivalent,The translation is nearly equivalent because i...,এই পলল বন্যপ্রাণীর আবাসস্থল হিসেবে কাজ করে এমন...,4 nearly equivalent,The translation accurately captures the meanin...,Этот осадок был необходим для создания песчаны...,4 nearly equivalent,The translation accurately conveys the meaning...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
5,"As a result, two fish species have become exti...",Infolgedessen sind zwei Fischarten ausgestorbe...,3 mostly equivalent,The translation is mostly equivalent because i...,"Como resultado, dos especies de peces se han e...",4 nearly equivalent,The translation accurately conveys the meaning...,"ফলস্বরূপ, দুটি মাছের প্রজাতি বিলুপ্ত হয়ে গেছে...",4 nearly equivalent,The Bengali translation accurately conveys the...,"В результате два вида рыб вымерли, а два други...",4 nearly equivalent,The translation accurately conveys the meaning...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
6,Although the water level will only rise a few ...,Obwohl der Wasserspiegel nach der Flut nur ein...,3 mostly equivalent,The translation captures the main idea of the ...,Aunque el nivel del agua solo aumentará unos p...,3 mostly equivalent,The translation accurately conveys the main id...,যদিও বন্যার পর পানির স্তর মাত্র কয়েক ফুট বৃদ্...,4 nearly equivalent,The Bengali translation captures the main idea...,Хотя после наводнения уровень воды поднимется ...,3 mostly equivalent,The translation accurately conveys the main id...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
7,Before The Simpsons Simon had worked on severa...,Vor den Simpsons hatte Simon in verschiedenen ...,4 nearly equivalent,The translation is nearly equivalent because i...,Antes de The Simpsons Simon había trabajado en...,4 nearly equivalent,The translation is nearly equivalent because i...,দ্য সিম্পসনসের আগে সাইমন বিভিন্ন পজিশনে বেশ কয...,4 nearly equivalent,The translation accurately conveys the meaning...,"До ""Симпсонов"" Саймон работал на нескольких шо...",4 nearly equivalent,The translation is nearly equivalent because i...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
8,During the 1980s he worked on shows such as Ta...,In den 1980er Jahren arbeitete er an Shows wie...,4 nearly equivalent,The translation is nearly equivalent because i...,Durante la década de 1980 trabajó en programas...,4 nearly equivalent,The translation is nearly equivalent because i...,"১৯৮০ এর দশকে তিনি ট্যাক্সি, চিয়ার্স এবং দ্য ট...",4 nearly equivalent,The translation accurately conveys the meaning...,"В 1980-х годах он работал в таких шоу, как Tax...",4 nearly equivalent,The translation is nearly equivalent because i...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
9,In 1989 he helped create The Simpsons with Bro...,1989 half er bei der Erstellung der Simpsons m...,4 nearly equivalent,The translation accurately conveys the main id...,En 1989 ayudó a crear Los Simpson con Brooks y...,5 completely equivalent,The translation accurately conveys the meaning...,১৯৮৯ সালে তিনি ব্রুকস এবং গ্রোনিংয়ের সাথে দ্য...,5 completely equivalent,The Bengali translation accurately conveys the...,В 1989 году он помог создать Симпсонов с Брукс...,4 nearly equivalent,The translation accurately conveys the meaning...,"Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit...","Output label one of ""1 not equivalent"", ""2 lit..."
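
Assigning a list to a DataFrame column only works if the list has exactly one entry per row, so an interrupted API run surfaces as a length error at this point. The following is a minimal sketch of a check that could precede the assignment: `columns_from_results` is a hypothetical helper and the sample results are made up.

```python
def columns_from_results(results, n_rows):
    """Split result dictionaries into a label list and a reasoning list.

    Raises ValueError if the number of results does not match n_rows.
    """
    if len(results) != n_rows:
        raise ValueError(f"expected {n_rows} results, got {len(results)}")
    labels = [r["label"] for r in results]
    reasonings = [r["reasoning"] for r in results]
    return labels, reasonings


sample = [
    {"prompt": "...", "label": "4 nearly equivalent", "reasoning": "Minor shift."},
    {"prompt": "...", "label": "3 mostly equivalent", "reasoning": "One term is off."},
]
labels, reasonings = columns_from_results(sample, n_rows=2)
print(labels)
```

With the variables of this notebook, the call could read `columns_from_results(deu_results, len(result_df))`.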


In [None]:
# write result to excel
result_df_3.to_excel("/content/TQA_GPT_XSTSscores_sec.xlsx")

# References

Kocmi, T. & Federmann, C. (2023). Large Language Models Are State-of-the-Art Evaluators of Translation Quality. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pp. 193–203, Tampere, Finland. https://aclanthology.org/2023.eamt-1.19.pdf

Licht, D. et al. (2022). Consistent Human Evaluation of Machine Translation across Language Pairs. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas, Orlando, USA. https://aclanthology.org/2022.amta-research.24.pdf

NLLB Team (2022). No Language Left Behind: Scaling Human-Centered Machine Translation. https://arxiv.org/abs/2207.04672

Savenkov, K. (Jan 29, 2023). GPT-3 Translation Capabilities. In *Medium*. Retrieved from: https://medium.com/intento/gpt-3-translation-capabilities-8a9290731a45