# BLEU Scores

We first evaluated the model with classic metrics:

- Exact Match (EM): scores 1 only when the model answer is identical to the reference. In our case the average was 0%.

- Word Overlap: only 28.76% on average, which suggests that the model often used different words than the references.

- BLEU: a popular metric in machine translation; the average score was 0.48%, which is very low.

- ROUGE-1 (F1): averaged only 6.55%, which points to limited unigram overlap.

These scores signalled that the model's answers might be correct in substance without using the same wording as the references.
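To see why these lexical metrics collapse on paraphrases, here is a minimal sketch in plain Python. The sentence pair is hypothetical (not taken from the evaluation set), and the `word_overlap` definition is an assumption, since the notebook does not state how that metric was computed.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def exact_match(candidate: str, reference: str) -> int:
    """Return 1 only if the two strings are identical after stripping."""
    return int(candidate.strip() == reference.strip())


def word_overlap(candidate: str, reference: str) -> float:
    """Assumed definition: share of distinct reference words found in the candidate."""
    cand, ref = set(tokenize(candidate)), set(tokenize(reference))
    return len(cand & ref) / len(ref) if ref else 0.0


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram F1: harmonic mean of clipped unigram precision and recall."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical paraphrase pair: same advice, almost no shared words.
candidate = "switch off lights when leaving a room"
reference = "turn the lamps off whenever you exit"

print(exact_match(candidate, reference))   # 0
print(word_overlap(candidate, reference))
print(rouge1_f1(candidate, reference))
```

Only the word "off" is shared, so every metric stays low even though both sentences give the same advice.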

BERTScore offers an alternative that compares meaning instead of looking purely at word overlap. It uses contextual embeddings from BERT (a pretrained language model from Google) to measure how semantically similar two sentences are.
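The core idea can be sketched as greedy matching of token embeddings by cosine similarity. The example below is a simplified illustration in plain Python. The 2-D vectors are hypothetical stand-ins for real BERT embeddings, and the optional idf weighting is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_f1(cand_vecs, ref_vecs):
    """Greedy-matching precision, recall and F1 over token embeddings."""
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-D "embeddings" (hypothetical values, for illustration only).
cand_vecs = [[1.0, 0.0], [0.0, 1.0]]
ref_vecs = [[1.0, 0.0], [1.0, 1.0]]

precision, recall, f1 = greedy_f1(cand_vecs, ref_vecs)
print(precision, recall, f1)
```

Because matching is done on vectors rather than strings, two different words with similar embeddings can still contribute a high similarity.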

### Advantages of BERTScore:

1. Captures the meaning behind different words.

2. Accounts for synonyms and paraphrases.

3. Suits open-ended questions and natural-language answers.


### What do we do in this part?
We compute the BERTScore F1 between the model answer and each of the 5 references separately. We then take the highest score per row and average those maxima to get a better overall picture of performance.

This lets us judge more fairly how well the model answers in substance, independent of exact wording.
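The aggregation described here (best reference per row, then the mean of those maxima) can be sketched in plain Python. The three rows below are the per-reference scores of the first three questions in the evaluation CSV; the helper names are illustrative and not part of the notebook.

```python
import math


def best_reference_score(scores):
    """Highest score among the references, ignoring missing (NaN) entries."""
    valid = [s for s in scores if not math.isnan(s)]
    return max(valid) if valid else math.nan


def mean_of_best(rows):
    """Average the per-row maxima, skipping rows without any valid score."""
    best = [best_reference_score(row) for row in rows]
    valid = [b for b in best if not math.isnan(b)]
    return sum(valid) / len(valid)


# bertscore_ref1..bertscore_ref5 for the first three questions.
rows = [
    [0.695411, 0.545027, 0.591295, 0.602468, 0.548973],
    [0.662858, 0.537642, 0.530465, 0.565504, 0.474317],
    [0.429814, 0.5177, 0.439663, 0.522448, 0.431097],
]

for row in rows:
    print(best_reference_score(row))
print(mean_of_best(rows) * 100)
```

Taking the maximum rewards an answer for matching any one of the acceptable references, which fits a setup where each reference covers a different aspect of the answer.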

In [2]:

import pandas as pd

# Load the evaluation results (one row per question, with the model answer and five references)
df = pd.read_csv(r"C:\Users\Nima\OneDrive - Reza Company\Desktop\Nima+\School\Datalab-V\tests\evaluation_results\evaluation_csv.csv")

In [3]:
df

Unnamed: 0,id,question,model_answer,reference_1,reference_2,reference_3,reference_4,reference_5,bleu_1,bleu_2,...,rouge1_f1_2,rouge1_f1_3,rouge1_f1_4,rouge1_f1_5,bertscore_f1,bertscore_ref1,bertscore_ref2,bertscore_ref3,bertscore_ref4,bertscore_ref5
0,1,What is the European Green Deal and what are i...,The European Green Deal is a comprehensive str...,The European Green Deal is the EU's master pla...,"It is a roadmap covering energy, transport, in...","Under the Deal, every EU policy must contribut...",A key aim is to slash emissions at least 55 % ...,The package also seeks to mobilise €1 trillion...,9.0,0.9,...,10.4,23.8,18.6,10.4,0.695411,0.695411,0.545027,0.591295,0.602468,0.548973
1,2,How does the 'Fit for 55' package contribute t...,The 'Fit for 55' package is a set of proposals...,‘Fit for 55’ is the EU’s 2021 legislative mega...,"It tightens the Emissions Trading System, exte...",The package raises the Renewable Energy Direct...,By upgrading everything from fuel taxation to ...,"Taken together, these new and updated laws loc...",4.0,0.3,...,13.2,8.3,11.4,6.6,0.662858,0.662858,0.537642,0.530465,0.565504,0.474317
2,3,What are the key principles of the EU's strate...,Key principles of the EU's strategy for clean ...,The strategy follows the ‘energy-efficiency-fi...,"It aims to massively scale renewables, targeti...",Grid modernisation and digitalisation are cent...,Affordability and security of supply remain co...,"Empowering citizens—through smart meters, pros...",0.3,0.5,...,11.4,2.4,9.2,3.5,0.522448,0.429814,0.5177,0.439663,0.522448,0.431097
3,4,How can I reduce my carbon footprint at home?,1. Use energy-efficient appliances and light b...,Upgrade your insulation and windows so your he...,Switch to a certified green electricity tariff...,Choose A+++-rated appliances and run them only...,Lower your thermostat by 1-2 °C and use progra...,"Cut waste: repair instead of replace, recycle ...",0.3,0.6,...,8.5,4.3,12.4,10.6,0.536837,0.501377,0.536837,0.418191,0.488502,0.523224
4,5,What are some simple ways to save energy in my...,Some simple ways to save energy in daily life ...,"Take shorter, cooler showers and install low-f...",Air-dry clothes and dishes instead of using el...,Use lids on pots and match burner sizes to coo...,Enable power-saving modes and unplug chargers ...,Let natural daylight in and switch to motion-s...,0.1,0.6,...,5.7,5.6,3.4,5.6,0.531289,0.454937,0.531289,0.467365,0.500149,0.440154
5,6,What is the role of renewable energy sources i...,Renewable energy sources play a crucial role i...,"Renewables displace fossil fuels, cutting CO₂ ...","Because wind and solar costs have fallen, they...","They diversify the energy mix, improving secur...",Deploying renewables creates local jobs in man...,Decentralised renewables empower communities t...,0.8,0.9,...,11.3,15.9,6.5,6.5,0.55298,0.53629,0.529222,0.552979,0.478439,0.47601
6,7,How does the European Green Deal aim to make t...,The European Green Deal aims to make transport...,"EU funds rail, metro and bus projects to make ...",It phases in stricter CO₂ limits so only zero-...,Alternative-fuels infrastructure regulation wi...,The ReFuelEU and FuelEU proposals oblige aviat...,Urban mobility plans encourage walking and cyc...,0.5,0.2,...,6.0,6.1,6.0,6.2,0.529699,0.528174,0.474826,0.497339,0.510225,0.529699
7,8,What is the 'Farm to Fork' strategy and how do...,The 'Farm to Fork' strategy is a European Unio...,Targets a 50 % reduction in chemical pesticide...,Aims to boost EU organic farming share to 25 %...,"Sets measures to improve animal welfare, inclu...",Encourages shorter supply chains and fairer re...,Seeks to halve per-capita food waste at retail...,0.7,0.5,...,7.2,9.1,5.5,10.8,0.529644,0.512044,0.485557,0.528032,0.517278,0.529644
8,9,How can I contribute to a circular economy?,You can contribute to a circular economy by:\r...,Choose products with eco-design or repairabili...,Lease or rent items such as tools or electroni...,Share or swap rarely used goods within neighbo...,Return used products to take-back schemes so m...,Compost organic waste to cycle nutrients back ...,0.2,0.2,...,2.0,0.0,4.0,3.0,0.415053,0.415053,0.392595,0.367306,0.398094,0.388449
9,12,What are some examples of sustainable living p...,Mention at least 3 examples and explain why th...,"Adopt a plant-forward diet, limiting red meat ...","Carry a reusable water bottle, coffee cup and ...","Buy second-hand goods, repair broken items and...",Collect rainwater for gardening and install lo...,Use renewable tariffs and switch off standby p...,0.2,0.2,...,7.3,3.0,1.5,5.9,0.488722,0.488722,0.451123,0.45387,0.441951,0.486575


In [None]:
import pandas as pd
from bert_score import score
from transformers import AutoTokenizer, AutoModelForMaskedLM
import numpy as np

# Load the model and tokenizer.
# Note: bert_score.score() loads its own copy of the model from `model_type`,
# so these two objects are not used by the scoring function below.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)


def compute_all_bertscores(row):
    """Add one BERTScore F1 column per reference (bertscore_ref1..bertscore_ref5) to the row."""
    model_answer = row["model_answer"]

    for i in range(1, 6):
        ref = row.get(f"reference_{i}", None)
        if pd.notna(ref):
            # score() returns (precision, recall, F1) tensors; index 2 is F1
            F1 = score(
                [model_answer],
                [ref],
                model_type=model_name,
                lang="en",
                verbose=False,
                rescale_with_baseline=False
            )[2]
            row[f"bertscore_ref{i}"] = F1[0].item()
        else:
            # Missing reference: leave the score empty
            row[f"bertscore_ref{i}"] = np.nan

    return row

# Apply to every row
df = df.apply(compute_all_bertscores, axis=1)
df


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).




In [4]:
# save as evaluation_csv.csv
df.to_csv(r"C:\Users\Nima\OneDrive - Reza Company\Desktop\Nima+\School\Datalab-V\tests\evaluation_results\evaluation_csv.csv", index=False)

df = pd.read_csv(r"C:\Users\Nima\OneDrive - Reza Company\Desktop\Nima+\School\Datalab-V\tests\evaluation_results\evaluation_csv.csv")
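The scoring cell above calls `score` once per answer-reference pair, and each call loads the model again. A possible batched variant is sketched below, assuming the `BERTScorer` class from the `bert_score` package; the helper names `build_pairs` and `score_pairs` are hypothetical and not part of the notebook.

```python
def build_pairs(rows, n_refs=5):
    """Flatten rows into parallel candidate/reference lists plus their (row, ref) positions."""
    cands, refs, positions = [], [], []
    for row_idx, row in enumerate(rows):
        for i in range(1, n_refs + 1):
            ref = row.get(f"reference_{i}")
            # Skip missing references (None, NaN, or empty strings).
            if isinstance(ref, str) and ref.strip():
                cands.append(row["model_answer"])
                refs.append(ref)
                positions.append((row_idx, i))
    return cands, refs, positions


def score_pairs(cands, refs, model_type="bert-base-uncased"):
    """Score all pairs in one pass; the model is loaded only once."""
    # Imported lazily so build_pairs can be used without bert_score installed.
    from bert_score import BERTScorer

    scorer = BERTScorer(model_type=model_type, lang="en", rescale_with_baseline=False)
    _, _, f1 = scorer.score(cands, refs)
    return f1.tolist()


# Hypothetical rows in the same shape as df.to_dict("records").
example_rows = [
    {"model_answer": "Use LED bulbs.", "reference_1": "Switch to LED lighting.", "reference_2": None},
    {"model_answer": "Take the train.", "reference_1": "Prefer rail travel.", "reference_2": "Avoid short flights."},
]
cands, refs, positions = build_pairs(example_rows, n_refs=2)
print(positions)
```

With the real data, `rows = df.to_dict("records")` would feed `build_pairs`, and each value returned by `score_pairs` could be written back to `bertscore_ref{i}` using the matching entry in `positions`.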

In [5]:
# Compute the highest BERTScore per row
df["bertscore_max"] = df[[f"bertscore_ref{i}" for i in range(1, 6)]].max(axis=1)

# Compute the mean of the per-row maxima, as a percentage
avg_bertscore_f1 = df["bertscore_max"].mean() * 100
print("Average BERTScore F1:", avg_bertscore_f1)


Average BERTScore F1: 53.05254623293877


In [7]:
# Load the existing summary_stats
summary = pd.read_csv(r"c:\Users\Nima\OneDrive - Reza Company\Desktop\Nima+\School\Datalab-V\tests\evaluation_results\summary_stats.csv")

# Append a new row
summary = pd.concat([summary, pd.DataFrame([{
    "metric": "avg_bertscore_f1",
    "value": round(avg_bertscore_f1, 4)
}])], ignore_index=True)


# Save (note: this relative path writes to the current working directory,
# not necessarily to the file that was loaded above)
summary.to_csv("summary_stats.csv", index=False)

# Check
print(summary)


             metric     value
0       total_pairs  100.0000
1   avg_exact_match    0.0000
2  avg_word_overlap   28.7600
3          avg_bleu    0.4800
4     avg_rouge1_f1    6.5500
5  avg_bertscore_f1   53.0525



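### Conclusion from the 53% BERTScore F1:

On average, the model reaches a BERTScore F1 of about 53% against the best-matching reference per question.

This does not mean that 53% of the reference's meaning is conveyed: BERTScore is built from cosine similarities between token embeddings, and because the scores were computed with `rescale_with_baseline=False`, they are not calibrated to a 0-100% scale. The value is best read relative to other systems scored with the same setup.

Compared with the near-zero BLEU and ROUGE-1 scores, it suggests that the model can give substantively relevant answers even when it departs from the original wording.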
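Since the scores in this notebook were computed with `rescale_with_baseline=False`, it can help to see what baseline rescaling would do to a raw value. The sketch below applies the linear rescaling used by the `bert_score` package, `(score - baseline) / (1 - baseline)`. The baseline value is hypothetical, chosen only for illustration, and is not the actual baseline for `bert-base-uncased`.

```python
def rescale(raw_score: float, baseline: float) -> float:
    """Linearly rescale a raw score so that the baseline maps to 0 and 1 stays 1."""
    return (raw_score - baseline) / (1.0 - baseline)


raw_f1 = 0.5305               # average of the per-row maxima reported above
hypothetical_baseline = 0.40  # assumption for illustration only

rescaled_f1 = rescale(raw_f1, hypothetical_baseline)
print(rescaled_f1)
```

Rescaling does not change the ranking of answers; it only spreads the scores over a wider range, which makes them easier to read.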