## Import the required libraries

## Note: sentence_transformers uses pre-trained models to produce embeddings

In [1]:
from bert_score import score
from sentence_transformers import SentenceTransformer
import numpy as np
import pandas as pd
import torch


## Code to open and collect the original text and the pipeline outputs

In [2]:
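def take_files(path):
    """Read a UTF-8 text file and return its contents, or None if it cannot be read."""
    try:
        with open(path, 'r', encoding='utf-8') as file:
            content = file.read()
        # Report success only after the read has actually completed
        print("File content successfully loaded.")
        return content
    except FileNotFoundError:
        print(f"Error: The file '{path}' was not found. Please make sure the file exists and the path is correct.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    return None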

In [3]:
paths = [
    "../text_files/text2.txt",
    "../text_files/first_pipeline_text2.txt",
    "../text_files/second_pipeline_text2.txt",
    "../text_files/third_pipeline_text2.txt",
]
texts = [take_files(path) for path in paths]
original_text, pipeline1, pipeline2, pipeline3 = texts



File content successfully loaded.
File content successfully loaded.
File content successfully loaded.
File content successfully loaded.


## Store the ground truths against which the comparison will be made


In [4]:
gemini = """During our last discussion, I mentioned the new submission we have been waiting for since last autumn, noting that the recent updates were confusing because they appeared to be missing the full feedback from either the reviewer or the editor. Despite some recent delays and a lack of communication, I firmly believe the team tried their absolute best and cooperated effectively, and we should all be grateful for their efforts which led to the paper's acceptance, finalized by the Springer link that arrived last week. Regarding the final steps, could you please remind me if the doctor still plans to edit the acknowledgments section before sending the paper again? I have not seen a final version of that part, though I apologize if I simply missed it. Let's ensure everyone is doing well, celebrate this successful outcome, and begin planning our future targets."""

In [5]:
deepseek = """During our final discussion, I informed him about the new paper submission—the one we had been anticipating since last autumn. However, the recent updates regarding this submission were confusing, as they did not appear to include the reviewers' full feedback, or possibly the editor's comments. Despite the recent delays and reduced communication from the team, I believe they genuinely put forth their best effort for the paper and our collaboration. Consequently, we should all be grateful for the paper's acceptance and the team's persistent work, especially now that the Springer link has finally become available last week. Also, please remind me if the doctor still plans to make edits to the acknowledgments section before he resubmits it, as I haven't seen the final version of that part yet—I apologize if I missed it. Overall, let's ensure everything is finalized, and then celebrate this outcome with strong coffee and discussions about future targets."""

In [6]:
chat_gpt="""During our final discussion, I updated him on the new submission we’ve been awaiting since last autumn. Although the most recent update was confusing—because it did not include the full reviewer or editor feedback—I believe our team, despite some delays and reduced communication lately, has been fully committed to the paper and to cooperating throughout the process. We should all be grateful for their hard work and for the acceptance, and I was pleased to see the Springer link finally go live last week. Please remind me whether the doctor still plans to edit the acknowledgments section before sending the final version again, as I have not yet seen it; I apologize if I missed it. Overall, let’s make sure everything is in order, celebrate this success with strong coffee, and set our sights on future goals."""

In [7]:
pipelines = {
    "pipeline1": pipeline1,
    "pipeline2": pipeline2,
    "pipeline3": pipeline3,
}

llms = {
    "chatgpt":  chat_gpt,
    "gemini":   gemini,
    "deepseek": deepseek,
}

## bert_model names the model used for BERTScore; embed_model produces pre-trained sentence embeddings that are compared with cosine similarity

In [8]:
bert_model  = "bert-base-uncased"                  
embed_model = SentenceTransformer("all-MiniLM-L6-v2")  

## Build a dictionary of embeddings, where the key is the name of each text and the value is its embedding.

In [9]:
all_texts = { **pipelines, **llms }
embeddings = {
    name: embed_model.encode(text, convert_to_tensor=True)
    for name, text in all_texts.items()
}

## Computing the scores

In [10]:
rows = []
for l_name, l_text in llms.items():
    for p_name, p_text in pipelines.items():
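        # BERTScore: candidate (pipeline output) first, reference (LLM text) second;
        # score() expects lists, so each single text is wrapped in a list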
        P, R, F1 = score([p_text], [l_text],
                         lang="en",
                         model_type=bert_model,
                         verbose=False,
                         device='cuda' if torch.cuda.is_available() else 'cpu')
        bP, bR, bF1 = float(P.mean()), float(R.mean()), float(F1.mean())

        # Embedding cosine similarity
        pe = embeddings[p_name]
        le = embeddings[l_name]
        cos = (pe @ le) / (pe.norm() * le.norm())

        rows.append({
            "Pipeline":       p_name,
            "LLM":            l_name,
            "BERTScore_P":    bP,
            "BERTScore_R":    bR,
            "BERTScore_F1":   bF1,
            "Embed_Cosine":   float(cos)
        })

df = pd.DataFrame(rows)



In [11]:
df

| | Pipeline | LLM | BERTScore_P | BERTScore_R | BERTScore_F1 | Embed_Cosine |
|---|---|---|---|---|---|---|
| 0 | pipeline1 | chatgpt | 0.761817 | 0.77337 | 0.76755 | 0.944707 |
| 1 | pipeline2 | chatgpt | 0.759986 | 0.772196 | 0.766042 | 0.94567 |
| 2 | pipeline3 | chatgpt | 0.594978 | 0.682979 | 0.635949 | 0.825477 |
| 3 | pipeline1 | gemini | 0.738055 | 0.749095 | 0.743534 | 0.914268 |
| 4 | pipeline2 | gemini | 0.737574 | 0.747127 | 0.74232 | 0.902502 |
| 5 | pipeline3 | gemini | 0.588306 | 0.662066 | 0.62301 | 0.8174 |
| 6 | pipeline1 | deepseek | 0.77719 | 0.765865 | 0.771486 | 0.937202 |
| 7 | pipeline2 | deepseek | 0.770333 | 0.760873 | 0.765574 | 0.938816 |
| 8 | pipeline3 | deepseek | 0.613579 | 0.693041 | 0.650894 | 0.829554 |
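
The nine rows above are easier to compare once they are averaged per pipeline. The following is a minimal sketch of that aggregation in plain Python, with the F1 and cosine values copied by hand from the table; the names `results`, `by_pipeline`, and `summary` are introduced here for illustration only and are not part of the notebook.

```python
from collections import defaultdict
from statistics import mean

# (pipeline, llm, BERTScore_F1, Embed_Cosine), copied from the results table
results = [
    ("pipeline1", "chatgpt", 0.76755, 0.944707),
    ("pipeline2", "chatgpt", 0.766042, 0.94567),
    ("pipeline3", "chatgpt", 0.635949, 0.825477),
    ("pipeline1", "gemini", 0.743534, 0.914268),
    ("pipeline2", "gemini", 0.74232, 0.902502),
    ("pipeline3", "gemini", 0.62301, 0.8174),
    ("pipeline1", "deepseek", 0.771486, 0.937202),
    ("pipeline2", "deepseek", 0.765574, 0.938816),
    ("pipeline3", "deepseek", 0.650894, 0.829554),
]

# Group the two metrics by pipeline name
by_pipeline = defaultdict(lambda: {"f1": [], "cos": []})
for pipeline, _llm, f1, cos in results:
    by_pipeline[pipeline]["f1"].append(f1)
    by_pipeline[pipeline]["cos"].append(cos)

# Average each metric over the three LLM references
summary = {
    name: {"mean_f1": mean(vals["f1"]), "mean_cos": mean(vals["cos"])}
    for name, vals in by_pipeline.items()
}

for name, stats in sorted(summary.items()):
    print(f"{name}: mean F1 = {stats['mean_f1']:.3f}, mean cosine = {stats['mean_cos']:.3f}")
```

With these values, pipeline3 ranks last on both metrics. Inside the notebook, where `df` is a pandas DataFrame, the same aggregation could presumably be written as `df.groupby("Pipeline")[["BERTScore_F1", "Embed_Cosine"]].mean()`.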


### Brief explanation of the metrics

- **BERTScore Precision (P)**  
  Each token produced by the pipeline is matched to its most similar token in the LLM output, and these best-match similarities are averaged. It measures how "targeted" the pipeline output is with respect to the reference (ground truth).

- **BERTScore Recall (R)**  
  Each token in the LLM output is matched to its most similar token in the pipeline output, and these best-match similarities are averaged. It shows how completely the pipeline output covers the reference (ground truth).

- **BERTScore F1**  
  The harmonic mean of Precision and Recall. It balances quality (P) against coverage (R) in a single score, which in practice falls between 0 and 1.

- **Cosine Similarity (Embed_Cosine)**  
  Cosine similarity between the whole-text embeddings (vectors) of two texts. A value of 1 means identical direction; 0 means orthogonal vectors (no relation).  
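
To make these definitions concrete, here is a toy sketch in plain Python. It assumes hand-picked 2-D vectors standing in for token embeddings (real BERTScore uses contextual embeddings from the chosen model and can optionally apply IDF weighting and baseline rescaling, none of which is modelled here). The function names `cosine`, `harmonic_mean`, and `greedy_bertscore` are illustrative and do not belong to the `bert_score` library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def harmonic_mean(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def greedy_bertscore(cand_vecs, ref_vecs):
    """Toy BERTScore-style P/R/F1 via greedy best-match cosine similarity."""
    # sim[i][j]: similarity of candidate token i to reference token j
    sim = [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: average best match for each candidate token
    precision = sum(max(row) for row in sim) / len(cand_vecs)
    # Recall: average best match for each reference token
    recall = sum(
        max(sim[i][j] for i in range(len(cand_vecs)))
        for j in range(len(ref_vecs))
    ) / len(ref_vecs)
    return precision, recall, harmonic_mean(precision, recall)

# Hypothetical "token embeddings", chosen by hand for illustration only
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

P, R, F1 = greedy_bertscore(candidate, reference)
print(f"P = {P:.3f}, R = {R:.3f}, F1 = {F1:.3f}")

# Same direction gives 1, orthogonal vectors give 0
print(cosine([1.0, 0.0], [2.0, 0.0]), cosine([1.0, 0.0], [0.0, 1.0]))
```

In this toy case every candidate token has an exact match in the reference, so precision is 1, while the reference token `[1.0, 1.0]` is only partially covered and pulls recall below 1. The harmonic mean can also be checked against the reported results: for pipeline1 versus chatgpt, `harmonic_mean(0.761817, 0.77337)` gives roughly 0.7675, in line with the reported BERTScore_F1.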
