## Import the required libraries

## Note: sentence_transformers uses pre-trained models to generate embeddings

In [22]:
from bert_score import score
from sentence_transformers import SentenceTransformer
import numpy as np
import pandas as pd
import torch


## Code to open and collect the original text and the outputs of the pipelines

In [23]:
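def take_files(path):
    """Return the contents of the file at `path`, or None if it cannot be read."""
    try:
        with open(path, 'r', encoding='utf-8') as file:
            content = file.read()
        # Report success only after the read has actually completed.
        print("File content successfully loaded.")
        return content
    except FileNotFoundError:
        print(f"Error: The file '{path}' was not found. Please make sure the file exists and the path is correct.")
    except Exception as e:
        print(f"An unexpected error occurred while reading '{path}': {e}")
    return None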

In [24]:
paths = [
    "../text_files/text1.txt",
    "../text_files/first_pipeline_text1.txt",
    "../text_files/second_pipeline_text1.txt",
    "../text_files/third_pipeline_text1.txt",
]
texts = [take_files(path) for path in paths]
original_text, pipeline1, pipeline2, pipeline3 = texts



File content successfully loaded.
File content successfully loaded.
File content successfully loaded.
File content successfully loaded.


## Store the ground truths against which the comparison will be made


In [25]:
gemini="""Happy Dragon Boat Festival! I hope you're celebrating safely and enjoying a wonderful time.

Thank you for relaying our message to the doctor regarding his contract review. I actually received the approved message from the professor a couple of days ago. I'm very grateful for the professor's full support with our Springer proceedings publication."""

In [26]:
deepseek="""Today is our Dragon Boat Festival, a cherished celebration in Chinese culture. I wish you safety, joy, and prosperity during this time—may you enjoy the festivities as deeply as I hope for you.

Thank you for sharing the message regarding the doctor’s contract review. I’ve already received the approved notice from the professor a few days ago and sincerely appreciate their unwavering support for our Springer proceedings publication."""


In [27]:
chat_gpt="""Today is our Dragon Boat Festival, a special celebration in our Chinese culture. It is a time to honor and wish for safety and prosperity in our lives. I hope you also enjoy this festival, as I send you my warmest wishes.

Thank you for your message and for conveying our words to the doctor regarding his upcoming contract review. This is important for all of us.

I received this message to confirm the approval. Actually, the professor shared this with me a couple of days ago. I truly appreciate the professor’s full support for our Springer proceedings publication."""

In [28]:
pipelines = {
    "pipeline1": pipeline1,
    "pipeline2": pipeline2,
    "pipeline3": pipeline3,
}

llms = {
    "chatgpt":  chat_gpt,
    "gemini":   gemini,
    "deepseek": deepseek,
}

## bert_model is the model used to compute BERTScore; embed_model produces pre-trained sentence embeddings that are compared with cosine similarity

In [29]:
bert_model  = "bert-base-uncased"                  
embed_model = SentenceTransformer("all-MiniLM-L6-v2")  

## We build an embeddings dictionary, where the key is the name of each item and the value is its encoding as an embedding.

In [30]:
all_texts = { **pipelines, **llms }
embeddings = {
    name: embed_model.encode(text, convert_to_tensor=True)
    for name, text in all_texts.items()
}

## Score computation

In [31]:
rows = []
for l_name, l_text in llms.items():
    for p_name, p_text in pipelines.items():
        # BERTScore: pipeline output is the candidate, LLM text is the reference (both wrapped in lists)
        P, R, F1 = score([p_text], [l_text],
                         lang="en",
                         model_type=bert_model,
                         verbose=False,
                         device='cuda' if torch.cuda.is_available() else 'cpu')
        bP, bR, bF1 = float(P.mean()), float(R.mean()), float(F1.mean())

        # Embedding cosine similarity
        pe = embeddings[p_name]
        le = embeddings[l_name]
        cos = (pe @ le) / (pe.norm() * le.norm())

        rows.append({
            "Pipeline":       p_name,
            "LLM":            l_name,
            "BERTScore_P":    bP,
            "BERTScore_R":    bR,
            "BERTScore_F1":   bF1,
            "Embed_Cosine":   float(cos)
        })

df = pd.DataFrame(rows)
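The cosine term in the loop above, `(pe @ le) / (pe.norm() * le.norm())`, divides the dot product of two vectors by the product of their lengths. The following is a minimal sketch of the same formula on plain Python lists, so it runs without a model download; the function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical); real sentence embeddings have hundreds of dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # scaled copy
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Because the norms cancel any scaling, the score depends only on the direction of the two vectors, not on their magnitude.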


In [32]:
df

| | Pipeline | LLM | BERTScore_P | BERTScore_R | BERTScore_F1 | Embed_Cosine |
|---|---|---|---|---|---|---|
| 0 | pipeline1 | chatgpt | 0.776396 | 0.771578 | 0.77398 | 0.962389 |
| 1 | pipeline2 | chatgpt | 0.777927 | 0.775032 | 0.776476 | 0.954267 |
| 2 | pipeline3 | chatgpt | 0.578537 | 0.707876 | 0.636704 | 0.794937 |
| 3 | pipeline1 | gemini | 0.656933 | 0.767745 | 0.708029 | 0.878521 |
| 4 | pipeline2 | gemini | 0.665038 | 0.783617 | 0.719474 | 0.876309 |
| 5 | pipeline3 | gemini | 0.508684 | 0.716024 | 0.594803 | 0.718989 |
| 6 | pipeline1 | deepseek | 0.667285 | 0.700117 | 0.683307 | 0.936935 |
| 7 | pipeline2 | deepseek | 0.667135 | 0.701651 | 0.683958 | 0.935804 |
| 8 | pipeline3 | deepseek | 0.511285 | 0.639596 | 0.568288 | 0.806194 |
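One way to read the table is to average each pipeline's scores over the three LLM references. The sketch below does this in plain Python with the F1 and cosine values copied from the table above; the variable names (`results`, `summary`, `ranking`) are illustrative assumptions. In the notebook itself, `df.groupby("Pipeline")["BERTScore_F1"].mean()` should give the same F1 averages.

```python
from collections import defaultdict
from statistics import mean

# (Pipeline, LLM, BERTScore_F1, Embed_Cosine), copied from the results table.
results = [
    ("pipeline1", "chatgpt", 0.773980, 0.962389),
    ("pipeline2", "chatgpt", 0.776476, 0.954267),
    ("pipeline3", "chatgpt", 0.636704, 0.794937),
    ("pipeline1", "gemini", 0.708029, 0.878521),
    ("pipeline2", "gemini", 0.719474, 0.876309),
    ("pipeline3", "gemini", 0.594803, 0.718989),
    ("pipeline1", "deepseek", 0.683307, 0.936935),
    ("pipeline2", "deepseek", 0.683958, 0.935804),
    ("pipeline3", "deepseek", 0.568288, 0.806194),
]

# Group the two metrics by pipeline name.
f1_by_pipeline = defaultdict(list)
cos_by_pipeline = defaultdict(list)
for pipeline, _llm, f1, cos in results:
    f1_by_pipeline[pipeline].append(f1)
    cos_by_pipeline[pipeline].append(cos)

# Mean F1 and mean cosine per pipeline, averaged over the three references.
summary = {
    name: (mean(f1_by_pipeline[name]), mean(cos_by_pipeline[name]))
    for name in f1_by_pipeline
}

# Rank pipelines by mean F1, best first.
ranking = sorted(summary, key=lambda name: summary[name][0], reverse=True)
for name in ranking:
    mean_f1, mean_cos = summary[name]
    print(f"{name}: mean F1 = {mean_f1:.3f}, mean cosine = {mean_cos:.3f}")
```

With these values, pipeline2 and pipeline1 are nearly tied on mean F1 (about 0.727 and 0.722), while pipeline3 trails at about 0.600. A plain average weights all three references equally, which is itself an assumption.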


### Brief Explanation of the Metrics

- **BERTScore Precision (P)**  
  Each token produced by the pipeline is matched to its most similar token in the LLM output, and these similarities are averaged. It measures how "on target" the pipeline output is relative to the reference (ground truth).

- **BERTScore Recall (R)**  
  Each token of the LLM output is matched to its most similar token in the pipeline output, and these similarities are averaged. It shows how completely the pipeline covers the reference (ground truth).

- **BERTScore F1**  
  The harmonic mean of Precision and Recall. It balances quality (P) against coverage (R) in a single score that typically falls between 0 and 1.

- **Cosine Similarity (Embed_Cosine)**  
  Cosine similarity between the whole-text embeddings (vectors) of two texts. A value of 1 means identical direction; 0 means orthogonal (no relation).  
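The BERTScore definitions above can be made concrete with a small sketch of greedy matching: every token vector is paired with its most similar vector on the other side, and the similarities are averaged. This is a simplified illustration under stated assumptions, not the `bert_score` implementation: the function names and the 2-D toy vectors are hypothetical, and optional features of the library such as IDF weighting and baseline rescaling are left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def greedy_bertscore(cand_vecs, ref_vecs):
    """Greedy-matching precision, recall and F1 over token vectors."""
    # Precision: best reference match for each candidate token, averaged.
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    # Recall: best candidate match for each reference token, averaged.
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    # F1: harmonic mean of precision and recall (0 if both are 0).
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1

# Hypothetical 2-D "token embeddings"; real BERT vectors are much larger.
toy_candidate = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # pipeline tokens
toy_reference = [[1.0, 0.0]]                          # reference token

p, r, f1 = greedy_bertscore(toy_candidate, toy_reference)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

In this toy case the candidate contains one token with no counterpart in the reference, so precision drops to 2/3 while recall stays at 1, giving an F1 of 0.8. The same pattern (recall higher than precision) appears in the pipeline3 rows of the results table.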
