<center><H2>Homework 4</H2></center>
<center><H4>Date: Aug 10, 2024</H4></center>

## Getting the Data

In [78]:
import pandas as pd
from sentence_transformers import SentenceTransformer
import numpy as np
from rouge import Rouge
import pprint
import statistics

In [2]:
github_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/04-monitoring/data/results-gpt4o-mini.csv'
url = f'{github_url}?raw=1'
df = pd.read_csv(url)

In [3]:
df = df.iloc[:300]
df.shape

(300, 5)

In [4]:
df.head()

| | answer_llm | answer_orig | document | question | course |
|---|---|---|---|---|---|
| 0 | You can sign up for the course by visiting the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Where can I sign up for the course? | machine-learning-zoomcamp |
| 1 | You can sign up using the link provided in the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Can you provide a link to sign up? | machine-learning-zoomcamp |
| 2 | Yes, there is an FAQ for the Machine Learning ... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Is there an FAQ for this Machine Learning course? | machine-learning-zoomcamp |
| 3 | The context does not provide any specific info... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Does this course have a GitHub repository for ... | machine-learning-zoomcamp |
| 4 | To structure your questions and answers for th... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | How can I structure my questions and answers f... | machine-learning-zoomcamp |


## Q1. Getting the embeddings model

In [5]:
from sentence_transformers import SentenceTransformer

model_name = 'multi-qa-mpnet-base-dot-v1'
model = SentenceTransformer(model_name)

In [6]:
!pip list | grep transformer

sentence-transformers     3.0.1
transformers              4.42.4


### Embedding for the first LLM answer 

In [7]:
answer_llm = df.iloc[0].answer_llm

In [8]:
answer_llm_embedding = model.encode(answer_llm)

In [10]:
answer_llm_embedding[0]

-0.42244682

#### Example dot product of two embeddings
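
As a minimal illustration of the operation used in the following cells, the sketch below computes the dot product of two small hand-written vectors. The values are made up for illustration and are not real embeddings. It shows that `np.dot` equals the sum of element-wise products, and that the result grows with vector magnitude, which is why unnormalized scores are not confined to the range [-1, 1].

```python
import numpy as np

# Toy vectors standing in for embeddings (illustrative values only).
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# The dot product is the sum of element-wise products.
dot_uv = np.dot(u, v)
manual = float((u * v).sum())   # 1*4 + 2*5 + 3*6 = 32

# Scaling one vector scales the dot product by the same factor.
dot_scaled = np.dot(2 * u, v)   # 64

print(dot_uv, manual, dot_scaled)
```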

#### Iterative approach

In [27]:
def iterative_approach(df, model):
    evaluations = []
    for index, row in df.iterrows():
        # Encode one answer pair at a time
        embedding_llm = model.encode(row['answer_llm'])
        embedding_orig = model.encode(row['answer_orig'])
        score = np.dot(embedding_llm, embedding_orig)
        evaluations.append(score)
        
    # Use the scores collected in the loop above
    percentile_75 = np.percentile(evaluations, 75) 
    print(f"75th percentile of the scores: {round(percentile_75,2)}")
    
    return percentile_75

# Measure runtime for the iterative approach
%timeit iterative_approach(df, model)

75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
10 s ± 275 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Vectorized approach

In [21]:
def vectorized_approach(df, model):
    embeddings_llm = model.encode(df['answer_llm'].tolist(), convert_to_tensor=True)
    embeddings_orig = model.encode(df['answer_orig'].tolist(), convert_to_tensor=True)
    dot_products = (embeddings_llm * embeddings_orig).sum(dim=1).cpu().numpy()

    percentile_75 = np.percentile(dot_products, 75) 
    print(f"75th percentile of the scores: {round(percentile_75,2)}")
    
    return percentile_75

# Measure runtime for the vectorized approach
%timeit vectorized_approach(df, model)

75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
75th percentile of the scores: 31.67
6.52 s ± 18.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
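
The row-wise expression `(embeddings_llm * embeddings_orig).sum(dim=1)` used above can be checked against a plain loop without loading a model. The sketch below does this with NumPy on random arrays that stand in for the two embedding batches; the shapes, the seed, and the variable names are assumptions chosen for illustration. In the timings above, most of the runtime is likely spent inside `model.encode`, so this only illustrates the arithmetic, not the measured speedup.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for two batches of embeddings: 5 rows, 8 dimensions each.
a = rng.normal(size=(5, 8))
b = rng.normal(size=(5, 8))

# Loop version: one dot product per row pair.
loop_scores = np.array([np.dot(x, y) for x, y in zip(a, b)])

# Batched version: multiply element-wise, then sum across each row.
batch_scores = (a * b).sum(axis=1)

# Equivalent formulation with einsum.
einsum_scores = np.einsum("ij,ij->i", a, b)

print(np.allclose(loop_scores, batch_scores))
```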


## Q3. Computing the cosine

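#### Vector norm calculation

The norm of a vector `v` is the square root of the sum of its squared components: `norm = np.sqrt((v * v).sum())`.

Dividing `v` by its norm gives a unit-length vector: `v_norm = v / norm`. The dot product of two normalized vectors is their cosine similarity.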

In [29]:
v = np.array([1,2,3])
norm = np.sqrt((v * v).sum())
norm

3.7416573867739413

In [34]:
v_norm = v / norm
v_norm

array([0.26726124, 0.53452248, 0.80178373])

In [35]:
def get_norm_vector(v):
    norm = np.sqrt((v * v).sum())
    v_norm = v / norm
    return v_norm

get_norm_vector(v)

array([0.26726124, 0.53452248, 0.80178373])
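
An equivalent way to write this helper is with `np.linalg.norm`. The sketch below is one possible variant, not a replacement for `get_norm_vector`; the function name `normalize` and the `eps` guard against zero-length vectors are assumptions added for illustration.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Return v scaled to unit length; eps (an assumed guard) avoids division by zero."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)          # same value as np.sqrt((v * v).sum())
    return v / max(norm, eps)

unit = normalize([1, 2, 3])
print(unit, np.linalg.norm(unit))
```

With the guard, an all-zero input is returned unchanged instead of producing `nan` values.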

In [49]:
def iterative_approach(df, model):
    evaluations = []
    for index, row in df.iterrows():
        embedding_llm = get_norm_vector(model.encode(row['answer_llm']))
        embedding_orig = get_norm_vector(model.encode(row['answer_orig']))
        score = np.dot(embedding_llm, embedding_orig)
        evaluations.append(score)
        
    percentile_75 = np.percentile(evaluations, 75) 
    print(f"75th percentile of the scores: {round(percentile_75,3)}")
    
    return round(percentile_75,3)

# Compute the 75th percentile of the cosine similarities
iterative_approach(df, model)

75th percentile of the scores: 0.836


0.836
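
The same normalize-then-dot computation can also be done for all rows at once. The sketch below is a hedged NumPy version on made-up 2-D vectors; `batch_cosine` is a hypothetical helper, and it assumes no row is all zeros.

```python
import numpy as np

def batch_cosine(a, b):
    """Row-wise cosine similarity between two (n, d) arrays (no zero rows assumed)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalize every row to unit length, keeping the column axis for broadcasting.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a_unit * b_unit).sum(axis=1)

# Made-up rows: same direction, orthogonal, opposite, same direction.
a = [[1, 0], [1, 0], [1, 0], [2, 2]]
b = [[3, 0], [0, 5], [-2, 0], [1, 1]]
cosines = batch_cosine(a, b)
p75 = np.percentile(cosines, 75)
print(cosines, p75)
```

Unlike the raw dot product, these values stay within [-1, 1] regardless of vector length.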

## Q4. Rouge

In [50]:
!pip install rouge

Collecting rouge
  Downloading rouge-1.0.1-py3-none-any.whl.metadata (4.1 kB)
Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Installing collected packages: rouge
Successfully installed rouge-1.0.1


In [65]:
rouge_scorer = Rouge()

df_i10 = df.iloc[10]

scores = rouge_scorer.get_scores(df_i10['answer_llm'], df_i10['answer_orig'])[0]
pprint.pprint(scores)

{'rouge-1': {'f': 0.45454544954545456,
             'p': 0.45454545454545453,
             'r': 0.45454545454545453},
 'rouge-2': {'f': 0.21621621121621637,
             'p': 0.21621621621621623,
             'r': 0.21621621621621623},
 'rouge-l': {'f': 0.393939388939394,
             'p': 0.3939393939393939,
             'r': 0.3939393939393939}}


In [68]:
round(scores['rouge-1']['f'],3)

0.455
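
To make the `p`, `r`, and `f` fields above more concrete, here is a simplified sketch of how an n-gram overlap score of this kind can be computed. It is not the `rouge` package's implementation: the whitespace tokenization, lowercasing, and clipped counting are assumptions, so its numbers will not necessarily match the library's output. The function name `rouge_n_sketch` and the example sentences are hypothetical.

```python
from collections import Counter

def rouge_n_sketch(hypothesis, reference, n=1):
    """Simplified ROUGE-N: whitespace tokens, clipped n-gram overlap counts."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp, ref = ngrams(hypothesis), ngrams(reference)
    overlap = sum((hyp & ref).values())       # shared n-grams, clipped per n-gram
    p = overlap / max(sum(hyp.values()), 1)   # precision: share of hypothesis n-grams matched
    r = overlap / max(sum(ref.values()), 1)   # recall: share of reference n-grams matched
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}

example = rouge_n_sketch("the cat sat on the mat", "the cat lay on the mat")
example_bigram = rouge_n_sketch("the cat sat on the mat", "the cat lay on the mat", n=2)
print(example, example_bigram)
```

Here five of the six unigrams match, so precision, recall, and F-score all equal 5/6; for bigrams, three of five match.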

## Q5. Average rouge score

In [75]:
f_scores = [scores[metric]['f'] for metric in ('rouge-1', 'rouge-2', 'rouge-l')]

# Average the unrounded F-scores, then round once for display
avg_f = statistics.mean(f_scores)
print(f'Average F-score across rouge-1, rouge-2 and rouge-l: {round(avg_f, 3)}')

Average F-score across rouge-1, rouge-2 and rouge-l: 0.355


## Q6. Average rouge score for all the data points

In [93]:
def avg_rouge_score(df_param):
    rouge_2_f_scores = []
    for index, row in df_param.iterrows():
        # get_scores returns a one-element list for a single hypothesis/reference pair
        scores = rouge_scorer.get_scores(row['answer_llm'], row['answer_orig'])[0]

        rouge_2_f = scores['rouge-2']['f']
        rouge_2_f_scores.append(rouge_2_f)
        
    average = statistics.mean(rouge_2_f_scores)
    print(f"Average F-score in rouge-2 for the dataset: {round(average,3)}")
    
    return round(average,3)

# Measure the average rouge-2 F-score over the dataset
avg_rouge_score(df) 

Average F-score in rouge-2 for the dataset: 0.207


0.207
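
The same averaging extends to every metric at once if the per-row score dictionaries are collected first. The sketch below assumes a list of dicts shaped like the scorer output shown earlier; `mean_f_scores` is a hypothetical helper and the input values are made up, not taken from the dataset.

```python
from statistics import mean

def mean_f_scores(score_dicts):
    """Average the 'f' value of each metric over a list of per-row score dicts."""
    metrics = score_dicts[0].keys()
    return {m: mean(row[m]["f"] for row in score_dicts) for m in metrics}

# Made-up per-row results (illustrative values only).
rows = [
    {"rouge-1": {"f": 0.5}, "rouge-2": {"f": 0.25}, "rouge-l": {"f": 0.5}},
    {"rouge-1": {"f": 0.25}, "rouge-2": {"f": 0.75}, "rouge-l": {"f": 0.0}},
]
averages = mean_f_scores(rows)
print(averages)
```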