## Homework: Evaluation and Monitoring

### Getting the data

In [4]:
import pandas as pd

url = "https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/04-monitoring/data/results-gpt4o-mini.csv"
df = pd.read_csv(url)


In [6]:
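# Keep the first 300 records; .copy() avoids pandas' SettingWithCopyWarning
# when new columns are added to this slice later
df = df.iloc[:300].copy()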

In [7]:
df

|     | answer_llm | answer_orig | document | question | course |
|-----|------------|-------------|----------|----------|--------|
| 0 | You can sign up for the course by visiting the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Where can I sign up for the course? | machine-learning-zoomcamp |
| 1 | You can sign up using the link provided in the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Can you provide a link to sign up? | machine-learning-zoomcamp |
| 2 | Yes, there is an FAQ for the Machine Learning ... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Is there an FAQ for this Machine Learning course? | machine-learning-zoomcamp |
| 3 | The context does not provide any specific info... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Does this course have a GitHub repository for ... | machine-learning-zoomcamp |
| 4 | To structure your questions and answers for th... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | How can I structure my questions and answers f... | machine-learning-zoomcamp |
| ... | ... | ... | ... | ... | ... |
| 295 | An alternative way to load the data using the ... | Above users showed how to load the dataset dir... | 8d209d6d | What is an alternative way to load the data us... | machine-learning-zoomcamp |
| 296 | You can directly download the dataset from Git... | Above users showed how to load the dataset dir... | 8d209d6d | How can I directly download the dataset from G... | machine-learning-zoomcamp |
| 297 | You can fetch data for homework using the \`req... | Above users showed how to load the dataset dir... | 8d209d6d | Could you share a method to fetch data for hom... | machine-learning-zoomcamp |
| 298 | If the status code is 200 when downloading dat... | Above users showed how to load the dataset dir... | 8d209d6d | What should I do if the status code is 200 whe... | machine-learning-zoomcamp |


### Q1. Getting the embeddings model

In [10]:
from sentence_transformers import SentenceTransformer

# Note: this model does not return normalized embeddings,
# so dot products between them are not cosine similarities
model_name = "multi-qa-mpnet-base-dot-v1"
embedding_model = SentenceTransformer(model_name)

In [15]:
answer_llm = df.iloc[0].answer_llm
print(f"Answer to Q1: {embedding_model.encode(answer_llm)[0]}")

Answer to Q1: -0.42244675755500793


### Q2. Computing the dot product

In [25]:
evaluations = []

In [35]:
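from tqdm.auto import tqdm

# Dot product between the LLM answer and the original answer.
# The column is named "cosine", but because the embeddings are not normalized
# this is a raw dot product, not a cosine similarity (see Q3).
for idx, row in tqdm(df.iterrows()):
    embeddings_answer_llm = embedding_model.encode(row["answer_llm"])
    embeddings_answer_orig = embedding_model.encode(row["answer_orig"])
    dot_product = embeddings_answer_llm.dot(embeddings_answer_orig)

    df.at[idx, "cosine"] = dot_product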


In [37]:
df["cosine"].describe()

count    300.000000
mean      27.495996
std        6.384744
min        4.547925
25%       24.307844
50%       28.336864
75%       31.674311
max       39.476021
Name: cosine, dtype: float64

In [38]:
print("Answer to Q2 is '31.674311'")

Answer to Q2 is '31.674311'


In [36]:
df

|     | answer_llm | answer_orig | document | question | course | cosine |
|-----|------------|-------------|----------|----------|--------|--------|
| 0 | You can sign up for the course by visiting the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Where can I sign up for the course? | machine-learning-zoomcamp | 17.515997 |
| 1 | You can sign up using the link provided in the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Can you provide a link to sign up? | machine-learning-zoomcamp | 13.418410 |
| 2 | Yes, there is an FAQ for the Machine Learning ... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Is there an FAQ for this Machine Learning course? | machine-learning-zoomcamp | 25.313251 |
| 3 | The context does not provide any specific info... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Does this course have a GitHub repository for ... | machine-learning-zoomcamp | 12.147420 |
| 4 | To structure your questions and answers for th... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | How can I structure my questions and answers f... | machine-learning-zoomcamp | 18.747726 |
| ... | ... | ... | ... | ... | ... | ... |
| 295 | An alternative way to load the data using the ... | Above users showed how to load the dataset dir... | 8d209d6d | What is an alternative way to load the data us... | machine-learning-zoomcamp | 34.001770 |
| 296 | You can directly download the dataset from Git... | Above users showed how to load the dataset dir... | 8d209d6d | How can I directly download the dataset from G... | machine-learning-zoomcamp | 33.690865 |
| 297 | You can fetch data for homework using the \`req... | Above users showed how to load the dataset dir... | 8d209d6d | Could you share a method to fetch data for hom... | machine-learning-zoomcamp | 34.491531 |
| 298 | If the status code is 200 when downloading dat... | Above users showed how to load the dataset dir... | 8d209d6d | What should I do if the status code is 200 whe... | machine-learning-zoomcamp | 27.538353 |


### Q3. Computing the cosine
The Q2 values lie far outside the [-1, 1] range of cosine similarity. This is because the vectors produced by this model are not normalized, so their dot product is not yet a cosine.

So we need to normalize them.

To do it, we:
- Compute the norm of a vector
- Divide each element by this norm

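So, for a vector `v`, the normalized vector is `v / ||v||`, and the cosine similarity of two vectors is the dot product of their normalized versions.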
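A minimal pure-Python sketch of the normalization step, using made-up toy vectors `a` and `b` rather than real embeddings (the helper names `normalize`, `dot` and `cosine` are illustrative, not part of the homework). It shows that rescaling a vector changes the raw dot product but leaves the cosine unchanged.

```python
import math


def normalize(vector):
    """Return v / ||v|| for a list of floats."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    """Cosine similarity: the dot product of the normalized vectors."""
    return dot(normalize(u), normalize(v))


# Toy vectors (hypothetical, not real embeddings)
a = [3.0, 4.0]
b = [4.0, 3.0]
a_scaled = [10 * x for x in a]  # same direction, 10x longer

print(dot(a, b), dot(a_scaled, b))        # raw dot product grows with magnitude
print(cosine(a, b), cosine(a_scaled, b))  # cosine is unaffected by the scaling
```

This is why the Q2 values depend on how long the embedding vectors happen to be, while the normalized values stay within [-1, 1].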

In [46]:
import numpy as np

def get_normalized_value(vector):
    # Scale the vector to unit length: v / ||v||
    norm = np.sqrt((vector * vector).sum())
    vector_norm = vector / norm

    return vector_norm

Let's test it first:

In [49]:
embeddings_answer_llm = embedding_model.encode(df.iloc[0]["answer_llm"])
embeddings_answer_orig = embedding_model.encode(df.iloc[0]["answer_orig"])

normalized_answer_llm = get_normalized_value(embeddings_answer_llm)
normalized_answer_orig = get_normalized_value(embeddings_answer_orig)

cosine_norm = normalized_answer_llm.dot(normalized_answer_orig)

In [50]:
cosine_norm

0.5067543

In [51]:
for idx, row in tqdm(df.iterrows()):
    embeddings_answer_llm = embedding_model.encode(row["answer_llm"])
    embeddings_answer_orig = embedding_model.encode(row["answer_orig"])

    # Normalize both embeddings so their dot product is the cosine similarity
    normalized_answer_llm = get_normalized_value(embeddings_answer_llm)
    normalized_answer_orig = get_normalized_value(embeddings_answer_orig)

    cosine_norm = normalized_answer_llm.dot(normalized_answer_orig)
    df.at[idx, "cosine_norm"] = cosine_norm

300it [01:05,  4.59it/s]


In [52]:
df["cosine_norm"].describe()

count    300.000000
mean       0.728392
std        0.157755
min        0.125357
25%        0.651273
50%        0.763761
75%        0.836235
max        0.958796
Name: cosine_norm, dtype: float64

In [53]:
print("Answer to Q3 is '0.836235'")

Answer to Q3 is '0.836235'


### Q4. Rouge
Now we will explore an alternative metric: the ROUGE score.

This is a set of metrics that compares two answers based on the overlap of n-grams, word sequences, and word pairs.

Because it measures word overlap rather than embedding similarity, it gives a more lexical view of text similarity that complements cosine similarity.
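To make the n-gram overlap idea concrete, here is a simplified ROUGE-N sketch in pure Python. It assumes lowercase whitespace tokenization and counts shared n-grams with clipped multiplicity; the `rouge` package used below may tokenize and count differently, so its numbers will not necessarily match this sketch. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(hypothesis, reference, n=1):
    """Simplified ROUGE-N: recall, precision and F1 of n-gram overlap."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((hyp & ref).values())  # shared n-grams, counts clipped by min
    r = overlap / max(sum(ref.values()), 1)  # share of reference n-grams recovered
    p = overlap / max(sum(hyp.values()), 1)  # share of hypothesis n-grams that match
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"r": r, "p": p, "f": f}


uni = rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)
bi = rouge_n("the cat sat on the mat", "the cat is on the mat", n=2)
print(uni)
print(bi)
```

Bigrams are stricter than unigrams: changing one word breaks two bigrams, so ROUGE-2 is typically lower than ROUGE-1 for the same pair, as in the output further down.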

In [None]:
!pip install rouge

In [63]:
row_to_test_rouge = df.iloc[10]
row_to_test_rouge

answer_llm     Yes, all sessions are recorded, so if you miss...
answer_orig    Everything is recorded, so you won’t miss anyt...
document                                                5170565b
question                    Are sessions recorded if I miss one?
course                                 machine-learning-zoomcamp
cosine                                                 32.344711
cosine_norm                                             0.777956
Name: 10, dtype: object

In [58]:
from rouge import Rouge
rouge_scorer = Rouge()

scores = rouge_scorer.get_scores(row_to_test_rouge["answer_llm"], row_to_test_rouge["answer_orig"])

In [59]:
scores
# r (recall), p (precision), f (f1-score)
# ROUGE-L F1-score = 2 * (precision * recall) / (precision + recall) = 0.3939...

[{'rouge-1': {'r': 0.45454545454545453,
   'p': 0.45454545454545453,
   'f': 0.45454544954545456},
  'rouge-2': {'r': 0.21621621621621623,
   'p': 0.21621621621621623,
   'f': 0.21621621121621637},
  'rouge-l': {'r': 0.3939393939393939,
   'p': 0.3939393939393939,
   'f': 0.393939388939394}}]
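The F1 values above sit about 5e-9 below the matching precision/recall values instead of being exactly equal to them. That is consistent with the `rouge` package adding a small epsilon to the denominator of the F1 formula; the epsilon of 1e-8 below is an assumption inferred from these numbers, not taken from the package documentation.

```python
def f1(precision, recall, eps=1e-8):
    """F1 score with a small epsilon in the denominator (eps value is an assumption)."""
    return 2 * (precision * recall) / (precision + recall + eps)


p = r = 0.3939393939393939  # ROUGE-L precision and recall from the output above
print(f1(p, r))             # close to the reported ROUGE-L F1
print(f1(p, r, eps=0.0))    # textbook F1, mathematically equal to p when p == r
```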

In [60]:
# Swapping hypothesis and reference: here ROUGE-1 and ROUGE-2 stay the same, but ROUGE-L changes
rouge_scorer.get_scores(row_to_test_rouge["answer_orig"], row_to_test_rouge["answer_llm"])

[{'rouge-1': {'r': 0.45454545454545453,
   'p': 0.45454545454545453,
   'f': 0.45454544954545456},
  'rouge-2': {'r': 0.21621621621621623,
   'p': 0.21621621621621623,
   'f': 0.21621621121621637},
  'rouge-l': {'r': 0.42424242424242425,
   'p': 0.42424242424242425,
   'f': 0.42424241924242434}}]

In [62]:
print(f"Answer to Q4 is {scores[0]['rouge-1']['f']}")

Answer to Q4 is 0.45454544954545456


### Q5. Average rouge score
Let's compute the average of the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores for the same record as in Q4.
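A small helper that performs the same averaging for any single `get_scores` result, assuming the list-of-dicts shape shown in the Q4 output. The helper name `average_f1` is hypothetical, and `scores_q4` simply copies the F1 values printed in Q4.

```python
def average_f1(score_dict, metrics=("rouge-1", "rouge-2", "rouge-l")):
    """Mean F1 across the given ROUGE metrics for one hypothesis/reference pair."""
    return sum(score_dict[m]["f"] for m in metrics) / len(metrics)


# F1 values copied from the Q4 output (recall and precision omitted)
scores_q4 = [{
    "rouge-1": {"f": 0.45454544954545456},
    "rouge-2": {"f": 0.21621621121621637},
    "rouge-l": {"f": 0.393939388939394},
}]

print(average_f1(scores_q4[0]))
```

Passing the metric names as a parameter makes it easy to average only a subset, for example `average_f1(scores_q4[0], ("rouge-1", "rouge-2"))`.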

In [66]:
average = (scores[0]['rouge-1']['f'] + scores[0]['rouge-2']['f'] + scores[0]['rouge-l']['f']) / 3

print(f"Answer to Q5 is {average}")

Answer to Q5 is 0.35490034990035496


### Q6. Average rouge score for all the data points
Now let's compute the ROUGE-L F1 score for all the records and average it.
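One way to structure the aggregation is to collect each record's score in a list and take the mean at the end, so there is no running total to initialize or forget to update. The `per_record_scores` values below are placeholders, not results from this dataset, and `average_score` is a hypothetical helper.

```python
from statistics import mean


def average_score(scores):
    """Average a list of per-record scores; returns 0.0 for an empty list."""
    return mean(scores) if scores else 0.0


# Placeholder per-record ROUGE-L F1 values (hypothetical)
per_record_scores = [0.25, 0.5, 0.75]

# Equivalent running-sum version: the total must be updated on every iteration
total = 0.0
for s in per_record_scores:
    total += s

print(average_score(per_record_scores), total / len(per_record_scores))
```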

In [72]:
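# Note: hypothesis and reference are swapped relative to Q4
# (answer_orig is passed first), which can change the ROUGE-L score
for idx, row in tqdm(df.iterrows()):
    scores = rouge_scorer.get_scores(row["answer_orig"], row["answer_llm"])
    rouge_l_f1 = scores[0]["rouge-l"]["f"]
    df.at[idx, "rouge_l_f1"] = rouge_l_f1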

300it [00:01, 185.11it/s]


In [73]:
df

|     | answer_llm | answer_orig | document | question | course | cosine | cosine_norm | rouge_l_f1 |
|-----|------------|-------------|----------|----------|--------|--------|-------------|------------|
| 0 | You can sign up for the course by visiting the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Where can I sign up for the course? | machine-learning-zoomcamp | 17.515997 | 0.506754 | 0.095238 |
| 1 | You can sign up using the link provided in the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Can you provide a link to sign up? | machine-learning-zoomcamp | 13.418410 | 0.388549 | 0.093750 |
| 2 | Yes, there is an FAQ for the Machine Learning ... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Is there an FAQ for this Machine Learning course? | machine-learning-zoomcamp | 25.313251 | 0.718599 | 0.363636 |
| 3 | The context does not provide any specific info... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Does this course have a GitHub repository for ... | machine-learning-zoomcamp | 12.147420 | 0.337266 | 0.135135 |
| 4 | To structure your questions and answers for th... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | How can I structure my questions and answers f... | machine-learning-zoomcamp | 18.747726 | 0.521792 | 0.120219 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 295 | An alternative way to load the data using the ... | Above users showed how to load the dataset dir... | 8d209d6d | What is an alternative way to load the data us... | machine-learning-zoomcamp | 34.001770 | 0.914175 | 0.618182 |
| 296 | You can directly download the dataset from Git... | Above users showed how to load the dataset dir... | 8d209d6d | How can I directly download the dataset from G... | machine-learning-zoomcamp | 33.690865 | 0.902190 | 0.573770 |
| 297 | You can fetch data for homework using the \`req... | Above users showed how to load the dataset dir... | 8d209d6d | Could you share a method to fetch data for hom... | machine-learning-zoomcamp | 34.491531 | 0.904734 | 0.637168 |
| 298 | If the status code is 200 when downloading dat... | Above users showed how to load the dataset dir... | 8d209d6d | What should I do if the status code is 200 whe... | machine-learning-zoomcamp | 27.538353 | 0.726781 | 0.304762 |


In [78]:
number_of_rows = len(df)

# Running total of the per-record ROUGE-L F1 scores
sum_of_rouge_l_f1 = 0

for idx, row in tqdm(df.iterrows()):
    scores = rouge_scorer.get_scores(row["answer_orig"], row["answer_llm"])
    rouge_l_f1 = scores[0]["rouge-l"]["f"]
    df.at[idx, "rouge_l_f1"] = rouge_l_f1
    sum_of_rouge_l_f1 += rouge_l_f1  # accumulate; without this the average is always 0

average = sum_of_rouge_l_f1 / number_of_rows

300it [00:00, 10260.46it/s]


In [79]:
print(f"Answer to Q6: {average}")

Answer to Q6: 0.3556599846297521
