# Evaluate Luxembourgish instruction tuning set

In this notebook, we evaluate the Luxembourgish instruction tuning set after translating it to English. Each row has already been scored by the Nvidia Nemotron-4 340B reward model. We want to filter out low-quality instruction-answer pairs based on correctness, helpfulness and coherence, which we consider the three most important attributes. Correct information is crucial when fine-tuning LLMs, because it prevents the model from generating misleading or factually inaccurate responses and so supports the quality and reliability of the fine-tuned model. Helpfulness matters because the main purpose of LLMs is to assist users: unhelpful answers, even if correct, do not contribute to the fine-tuning process, and filtering them out ensures that the model learns to generate useful and informative responses. Finally, coherence ensures that the model generates answers that are not only accurate and helpful but also well structured and easy to follow, leading to a better user experience.

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd

# read in the sampled translated csv file
df = pd.read_csv('/content/drive/My Drive/Master Thesis/sampled_instruction_tuning_set_reward.csv')
df

Unnamed: 0,instruction,response,text,model_response
0,What is the brightest star in Dorado?,"α Doradus, the brightest star in Dorado, is 20...",The following provides an instruction in the f...,"helpfulness:1.84375,correctness:1.6640625,cohe..."
1,"What happened to the Edict of Nantes, after He...","The Edict of Nantes was unfortunately not ""per...",The following provides an instruction in the f...,"helpfulness:2.75,correctness:2.890625,coherenc..."
2,When did the devolution war start and how did ...,The war of devolution began in the spring of 1...,The following provides an instruction in the f...,"helpfulness:2.578125,correctness:2.75,coherenc..."
3,When and where did Max Ophüls die?,"Max Ophüls died on March 26, 1957 in Hamburg.",The following provides an instruction in the f...,"helpfulness:3.171875,correctness:3.21875,coher..."
4,What happened to the Schumacher building after...,The police museum of the non-profit organizati...,The following provides an instruction in the f...,"helpfulness:2.890625,correctness:3.078125,cohe..."
...,...,...,...,...
195,Are there other opportunities to become politi...,"Yes, there are other opportunities to become p...",The following provides an instruction in the f...,"helpfulness:2.328125,correctness:2.53125,coher..."
196,When and where does the City Night Bus CN1 go?,The City Night Bus CN1 goes from the Upper Tow...,The following provides an instruction in the f...,"helpfulness:0.94140625,correctness:1.0390625,c..."
197,What was the main topic of the consultations o...,"The main topic was the refugee issue, especial...",The following provides an instruction in the f...,"helpfulness:2.671875,correctness:2.671875,cohe..."
198,What is the role of the hereditary grand ducal...,We really live in very complicated times and y...,The following provides an instruction in the f...,"helpfulness:1.171875,correctness:1.2734375,coh..."


# Convert scores to columns in the dataframe

In [3]:
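def extract_scores(s):
    """Parse a 'name:value,name:value,...' string into five float scores."""
    # Split into 'name:value' pairs, then into a {name: value} dict
    scores = dict(x.split(':') for x in s.split(','))
    return float(scores['helpfulness']), float(scores['correctness']), float(scores['coherence']), float(scores['complexity']), float(scores['verbosity'])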
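
As a quick sanity check of the parsing logic, the sketch below applies the same split-based approach to a single score string. The sample string uses the rounded values displayed for row 0 in this notebook (the raw string is truncated in the output above). The helper name `parse_score_string` and the explicit error for missing attributes are illustrative assumptions, not part of the original notebook.

```python
EXPECTED_KEYS = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


def parse_score_string(s):
    """Parse 'name:value,name:value,...' into a dict of floats (hypothetical helper)."""
    scores = {}
    for pair in s.split(","):
        name, value = pair.split(":")
        scores[name.strip()] = float(value)
    # Fail loudly if the reward model output lacks one of the expected attributes
    missing = [key for key in EXPECTED_KEYS if key not in scores]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    return scores


# Rounded values as displayed for row 0; the raw string may carry more digits
sample = (
    "helpfulness:1.84375,correctness:1.6640625,coherence:3.546875,"
    "complexity:1.007812,verbosity:0.730469"
)
parsed = parse_score_string(sample)
print(parsed["helpfulness"], parsed["coherence"])
```

Returning a dict rather than a tuple keeps each value tied to its attribute name, which may make a malformed row easier to spot.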

In [4]:
df[['helpfulness', 'correctness', 'coherence', 'complexity', 'verbosity']] = df['model_response'].apply(extract_scores).apply(pd.Series)
df.drop(columns=['model_response'], inplace=True)
df

Unnamed: 0,instruction,response,text,helpfulness,correctness,coherence,complexity,verbosity
0,What is the brightest star in Dorado?,"α Doradus, the brightest star in Dorado, is 20...",The following provides an instruction in the f...,1.843750,1.664062,3.546875,1.007812,0.730469
1,"What happened to the Edict of Nantes, after He...","The Edict of Nantes was unfortunately not ""per...",The following provides an instruction in the f...,2.750000,2.890625,3.765625,1.117188,0.476562
2,When did the devolution war start and how did ...,The war of devolution began in the spring of 1...,The following provides an instruction in the f...,2.578125,2.750000,3.687500,1.117188,0.753906
3,When and where did Max Ophüls die?,"Max Ophüls died on March 26, 1957 in Hamburg.",The following provides an instruction in the f...,3.171875,3.218750,3.984375,0.667969,0.484375
4,What happened to the Schumacher building after...,The police museum of the non-profit organizati...,The following provides an instruction in the f...,2.890625,3.078125,3.750000,1.140625,1.164062
...,...,...,...,...,...,...,...,...
195,Are there other opportunities to become politi...,"Yes, there are other opportunities to become p...",The following provides an instruction in the f...,2.328125,2.531250,3.578125,0.933594,0.535156
196,When and where does the City Night Bus CN1 go?,The City Night Bus CN1 goes from the Upper Tow...,The following provides an instruction in the f...,0.941406,1.039062,3.218750,0.511719,0.337891
197,What was the main topic of the consultations o...,"The main topic was the refugee issue, especial...",The following provides an instruction in the f...,2.671875,2.671875,3.812500,0.667969,0.613281
198,What is the role of the hereditary grand ducal...,We really live in very complicated times and y...,The following provides an instruction in the f...,1.171875,1.273438,3.281250,0.695312,0.816406


In [5]:
#df.to_csv('/content/drive/My Drive/Master Thesis/sampled_instruction_tuning_set_reward_scores.csv', index=False)

# Filter out rows based on helpfulness, correctness and coherence

According to Nvidia NIM (https://build.nvidia.com/nvidia/nemotron-4-340b-reward), the scores are interpreted as follows:

* < 0.5 = very low
* 0.5 - 1.5 = low
* 1.5 - 2.5 = moderate
* 2.5 - 3.5 = high
* \> 3.5 = very high

Hence, we drop all rows whose helpfulness or correctness score is 2.5 or lower, as we only want to keep highly scored samples. For coherence we apply a stricter threshold of 3.5.
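
The score bands listed above can be sketched as a small lookup. The list does not say which band an exact boundary value belongs to, so the handling here is an assumption: a value equal to a boundary falls into the lower band, which matches the strict `>` comparisons used for filtering in this notebook. The function name `score_band` is hypothetical.

```python
from bisect import bisect_left

# Band boundaries and labels taken from the list above
BOUNDARIES = [0.5, 1.5, 2.5, 3.5]
LABELS = ["very low", "low", "moderate", "high", "very high"]


def score_band(score):
    """Map a reward score to its band label.

    Assumption: a score exactly on a boundary is assigned to the lower band.
    """
    return LABELS[bisect_left(BOUNDARIES, score)]


for value in (0.4, 2.0, 2.75, 3.6):
    print(value, score_band(value))
```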

In [6]:
TRESHOLD_CORRECTNESS = 2.5
TRESHOLD_HELPFULNESS = 2.5
TRESHOLD_COHERENCE = 3.5

In [7]:
df_filtered = df[(df['correctness'] > TRESHOLD_CORRECTNESS) & (df['helpfulness'] > TRESHOLD_HELPFULNESS) & (df['coherence'] > TRESHOLD_COHERENCE)]
df_filtered

Unnamed: 0,instruction,response,text,helpfulness,correctness,coherence,complexity,verbosity
1,"What happened to the Edict of Nantes, after He...","The Edict of Nantes was unfortunately not ""per...",The following provides an instruction in the f...,2.750000,2.890625,3.765625,1.117188,0.476562
2,When did the devolution war start and how did ...,The war of devolution began in the spring of 1...,The following provides an instruction in the f...,2.578125,2.750000,3.687500,1.117188,0.753906
3,When and where did Max Ophüls die?,"Max Ophüls died on March 26, 1957 in Hamburg.",The following provides an instruction in the f...,3.171875,3.218750,3.984375,0.667969,0.484375
4,What happened to the Schumacher building after...,The police museum of the non-profit organizati...,The following provides an instruction in the f...,2.890625,3.078125,3.750000,1.140625,1.164062
5,What are the essential amino acids? List the 9...,"The essential amino acids are: histidine, isol...",The following provides an instruction in the f...,3.468750,3.562500,3.859375,1.078125,0.769531
...,...,...,...,...,...,...,...,...
188,Enter the top 3 of the Eurovision Song Contest...,The top 3 were: 1. Sweden (Loreen) with 372 po...,The following provides an instruction in the f...,3.421875,3.375000,3.968750,0.812500,1.000000
189,How many hectares of forest have burned in Por...,"Since the beginning of 2022, more than 30,000 ...",The following provides an instruction in the f...,2.562500,2.625000,3.843750,0.750000,0.574219
192,Give me the name of the first solo album of RO...,The first solo album of ROB THOMAS is SOMETHIN...,The following provides an instruction in the f...,3.718750,3.781250,3.875000,0.742188,0.871094
194,What is the opinion of the initiative Nee2015-...,The Nee2015-Wee2050 initiative is of the opini...,The following provides an instruction in the f...,2.921875,2.953125,3.703125,1.273438,0.941406
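
To see the filter at work on a handful of rows, the following sketch applies the same three strict thresholds to a toy frame. The four rows reuse the scores displayed above for rows 0, 1, 3 and 196. The helper `filter_by_thresholds` and the `thresholds` dict are illustrative assumptions rather than part of the original notebook.

```python
import pandas as pd

# Scores copied from the displayed output for rows 0, 1, 3 and 196
toy = pd.DataFrame(
    {
        "helpfulness": [1.843750, 2.750000, 3.171875, 0.941406],
        "correctness": [1.664062, 2.890625, 3.218750, 1.039062],
        "coherence": [3.546875, 3.765625, 3.984375, 3.218750],
    },
    index=[0, 1, 3, 196],
)

# Same values as the threshold constants defined in this notebook
thresholds = {"correctness": 2.5, "helpfulness": 2.5, "coherence": 3.5}


def filter_by_thresholds(frame, limits):
    """Keep rows whose score is strictly above the limit in every listed column."""
    mask = pd.Series(True, index=frame.index)
    for column, limit in limits.items():
        mask &= frame[column] > limit
    return frame[mask]


kept = filter_by_thresholds(toy, thresholds)
print(list(kept.index))
print(f"retained {len(kept)} of {len(toy)} rows")
```

Keeping the thresholds in one dict means adding or removing an attribute only changes the dict, not the filter expression.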


# Filter human evaluated data

Next, we apply the same kind of filter to our human-evaluated dataset, which was rated on helpfulness and coherence only. The scores again range from 0 to 4, but are restricted to integers for simplicity.

In [8]:
!pip install openpyxl==3.1.2

Collecting openpyxl==3.1.2
  Downloading openpyxl-3.1.2-py2.py3-none-any.whl.metadata (2.5 kB)
Downloading openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 250.0/250.0 kB 5.7 MB/s eta 0:00:00
Installing collected packages: openpyxl
  Attempting uninstall: openpyxl
    Found existing installation: openpyxl 3.1.5
    Uninstalling openpyxl-3.1.5:
      Successfully uninstalled openpyxl-3.1.5
Successfully installed openpyxl-3.1.2


In [9]:
df = pd.read_excel('/content/drive/My Drive/Master Thesis/human_evaluated_instruction_tuning_set_sampled.xlsx')
df

Unnamed: 0,instruction,response,helpfulness,coherence
0,What is the brightest star in Dorado?,"α Doradus, the brightest star in Dorado, is 20...",3,3
1,"What happened to the Edict of Nantes, after He...","The Edict of Nantes was unfortunately not ""per...",1,0
2,When did the devolution war start and how did ...,The war of devolution began in the spring of 1...,2,2
3,When and where did Max Ophüls die?,"Max Ophüls died on March 26, 1957 in Hamburg.",4,4
4,What happened to the Schumacher building after...,The police museum of the non-profit organizati...,3,4
...,...,...,...,...
195,Are there other opportunities to become politi...,"Yes, there are other opportunities to become p...",4,4
196,When and where does the City Night Bus CN1 go?,The City Night Bus CN1 goes from the Upper Tow...,4,4
197,What was the main topic of the consultations o...,"The main topic was the refugee issue, especial...",3,2
198,What is the role of the hereditary grand ducal...,We really live in very complicated times and y...,0,1


In [10]:
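# Human scores are integers, so the thresholds are shifted down by 0.5: helpfulness must be 3 or 4, coherence must be 4
df_filtered = df[(df['helpfulness'] > TRESHOLD_HELPFULNESS-0.5) & (df['coherence'] > TRESHOLD_COHERENCE-0.5)]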
df_filtered

Unnamed: 0,instruction,response,helpfulness,coherence
3,When and where did Max Ophüls die?,"Max Ophüls died on March 26, 1957 in Hamburg.",4,4
4,What happened to the Schumacher building after...,The police museum of the non-profit organizati...,3,4
6,Give me the names of the former municipalities...,"The former municipalities of Beho, Bovigny, Ch...",4,4
7,Give me the area of the Polvermillen and what ...,"The Polvermillen has an area of 24.82 ha, whic...",4,4
9,How many films does the Cinémathèque de la Vil...,The Cinémathèque de la Ville de Luxembourg pro...,4,4
...,...,...,...,...
192,Give me the name of the first solo album of RO...,The first solo album of ROB THOMAS is SOMETHIN...,4,4
194,What is the opinion of the initiative Nee2015-...,The Nee2015-Wee2050 initiative is of the opini...,4,4
195,Are there other opportunities to become politi...,"Yes, there are other opportunities to become p...",4,4
196,When and where does the City Night Bus CN1 go?,The City Night Bus CN1 goes from the Upper Tow...,4,4
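
Because the human scores are integers from 0 to 4, shifting the thresholds by 0.5 determines exactly which values pass the strict `>` comparison. The short sketch below enumerates them. It redefines the two constants with the values used earlier in this notebook so that it runs on its own.

```python
# Same values as the constants defined earlier in this notebook
TRESHOLD_HELPFULNESS = 2.5
TRESHOLD_COHERENCE = 3.5

possible_scores = range(5)  # integer scores 0..4

passing_helpfulness = [s for s in possible_scores if s > TRESHOLD_HELPFULNESS - 0.5]
passing_coherence = [s for s in possible_scores if s > TRESHOLD_COHERENCE - 0.5]

print(passing_helpfulness)  # [3, 4]
print(passing_coherence)    # [4]
```

In other words, the human-evaluation filter keeps only rows with a helpfulness of at least 3 and a coherence of exactly 4.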


In [11]:
df = pd.read_excel('/content/drive/My Drive/Master Thesis/sampled_instruction_tuning_set_human_eval_Lux.xlsx')
df

Unnamed: 0,human eval,instruction,response,text
0,"I=1; R=1 (versteh aber nicht, warum 1 mal ""de""...",Wat ass de hellste Stär am Dorado?,"Den α Doradus, den hellste Stär am Dorado, as...",The following provides an instruction in form ...
1,I=1; R=0,"Wat ass mam Edikt vun Nantes geschitt, nodeem ...","Den Edikt vun Nantes gouf leider net ""perpétue...",The following provides an instruction in form ...
2,"I=1 (aber es heißt nicht ""wann"", sondern ""wéin...",Wann huet de Devolutiounskrich ugefaangen a wé...,De Devolutiounskrich huet am Fréijouer 1667 ug...,The following provides an instruction in form ...
3,"I=1 (idem ""wéini"") R=1",Wann an wou ass de Max Ophüls gestuerwen?,De Max Ophüls ass de 26. Mäerz 1957 zu Hamburg...,The following provides an instruction in form ...
4,I=1; R=1,"Wat ass mam Gebai vum Schumacher geschitt, nod...",De Policemusée vun der Asbl 'Le musée de la po...,The following provides an instruction in form ...
...,...,...,...,...
195,I=1; R=1,Gëtt et och aner Méiglechkeeten fir politesch ...,"Jo, et gëtt och aner Méiglechkeeten fir polite...",The following provides an instruction in form ...
196,I=1; R=1,Wann a wou fiert de City Night Bus CN1?,De City Night Bus CN1 geet vun der Uewerstad i...,The following provides an instruction in form ...
197,I=1 (falscher Artikel); R=1 (falscher Artikel),Wat war den Haaptthema vun de Berodunge vum Co...,"Den Haaptthema war d'Flüchtlingsproblematik, b...",The following provides an instruction in form ...
198,I=1; R=1,Wat ass d'Rôle vun der ierfgroussherzoglecher ...,Mir liewen wierklech a ganz komplizéierten Zäi...,The following provides an instruction in form ...


We want to see how many examples the human evaluator deemed high quality, i.e. those marked with both I=1 and R=1.

In [13]:
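# Keep rows whose annotation contains both 'I=1' and 'R=1'; na=False treats missing annotations as non-matches
has_i = df['human eval'].str.contains('I=1', regex=False, na=False)
has_r = df['human eval'].str.contains('R=1', regex=False, na=False)
df_filtered = df[has_i & has_r]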
df_filtered

Unnamed: 0,human eval,instruction,response,text
0,"I=1; R=1 (versteh aber nicht, warum 1 mal ""de""...",Wat ass de hellste Stär am Dorado?,"Den α Doradus, den hellste Stär am Dorado, as...",The following provides an instruction in form ...
2,"I=1 (aber es heißt nicht ""wann"", sondern ""wéin...",Wann huet de Devolutiounskrich ugefaangen a wé...,De Devolutiounskrich huet am Fréijouer 1667 ug...,The following provides an instruction in form ...
3,"I=1 (idem ""wéini"") R=1",Wann an wou ass de Max Ophüls gestuerwen?,De Max Ophüls ass de 26. Mäerz 1957 zu Hamburg...,The following provides an instruction in form ...
4,I=1; R=1,"Wat ass mam Gebai vum Schumacher geschitt, nod...",De Policemusée vun der Asbl 'Le musée de la po...,The following provides an instruction in form ...
5,"I=1 (aber ""nimm op"" würde ich so nicht sagen);...",Wat sinn déi essenziell Aminosaieren? Nimm déi...,"Déi essenziell Aminosaieren sinn: Histidin, Is...",The following provides an instruction in form ...
...,...,...,...,...
195,I=1; R=1,Gëtt et och aner Méiglechkeeten fir politesch ...,"Jo, et gëtt och aner Méiglechkeeten fir polite...",The following provides an instruction in form ...
196,I=1; R=1,Wann a wou fiert de City Night Bus CN1?,De City Night Bus CN1 geet vun der Uewerstad i...,The following provides an instruction in form ...
197,I=1 (falscher Artikel); R=1 (falscher Artikel),Wat war den Haaptthema vun de Berodunge vum Co...,"Den Haaptthema war d'Flüchtlingsproblematik, b...",The following provides an instruction in form ...
198,I=1; R=1,Wat ass d'Rôle vun der ierfgroussherzoglecher ...,Mir liewen wierklech a ganz komplizéierten Zäi...,The following provides an instruction in form ...
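
The substring check above keeps any annotation that contains `I=1` and `R=1` somewhere in the text. A slightly stricter alternative, sketched below, extracts the two flags with a regular expression and compares them as integers, so free-text comments around the flags do not interfere. The example strings are annotations visible in the output above. The pattern and the helper name `parse_annotation` are assumptions for illustration, not the method used in this notebook.

```python
import re

# Matches 'I=<digits>' or 'R=<digits>' at a word boundary
FLAG_PATTERN = re.compile(r"\b([IR])=(\d+)")


def parse_annotation(text):
    """Return the flags found in an annotation, e.g. {'I': 1, 'R': 0} (hypothetical helper)."""
    return {name: int(value) for name, value in FLAG_PATTERN.findall(text)}


examples = [
    "I=1; R=1",
    "I=1; R=0",
    'I=1 (idem "wéini") R=1',
    "I=1 (falscher Artikel); R=1 (falscher Artikel)",
]

for text in examples:
    flags = parse_annotation(text)
    high_quality = flags.get("I") == 1 and flags.get("R") == 1
    print(flags, high_quality)
```

Comparing parsed integers also avoids accidental matches such as `I=10`, should the annotation scheme ever use larger values.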
