# Homework Module 3 LLM Zoomcamp

## Required Libraries

In [1]:
%pip install -U minsearch qdrant_client fastembed

Note: you may need to restart the kernel to use updated packages.


## Evaluation data
For this homework, we will use the same dataset we generated in the videos.

Let's load it:

```py
import requests
import pandas as pd

url_prefix = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/03-evaluation/'
docs_url = url_prefix + 'search_evaluation/documents-with-ids.json'
documents = requests.get(docs_url).json()

ground_truth_url = url_prefix + 'search_evaluation/ground-truth-data.csv'
df_ground_truth = pd.read_csv(ground_truth_url)
ground_truth = df_ground_truth.to_dict(orient='records')
```

Here, `documents` contains the documents from the FAQ database with unique IDs, and `ground_truth` contains generated question-answer pairs.

Also, we will need the code for evaluating retrieval:

```py
from tqdm.auto import tqdm

def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)

def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank] == True:
                total_score = total_score + 1 / (rank + 1)

    return total_score / len(relevance_total)

def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }
```
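To see what these two metrics return, here is a small worked example on a hand-made relevance list. The toy data is an assumption for illustration and is not taken from the dataset; the two functions are restated compactly so the snippet runs on its own.

```python
# Toy relevance lists: one inner list per query, one boolean per returned rank
toy_relevance = [
    [False, True, False],   # expected document at rank 2 -> reciprocal rank 1/2
    [False, False, False],  # expected document not retrieved -> contributes 0
    [True, False, False],   # expected document at rank 1 -> reciprocal rank 1
]

def hit_rate(relevance_total):
    # Fraction of queries whose expected document appears anywhere in the results
    return sum(True in line for line in relevance_total) / len(relevance_total)

def mrr(relevance_total):
    # Mean reciprocal rank: 1 / rank of the expected document, averaged over queries
    total_score = 0.0
    for line in relevance_total:
        for rank, is_relevant in enumerate(line):
            if is_relevant:
                total_score += 1 / (rank + 1)
    return total_score / len(relevance_total)

toy_hit_rate = hit_rate(toy_relevance)  # 2 of the 3 queries have a hit
toy_mrr = mrr(toy_relevance)            # (1/2 + 0 + 1) / 3
print(toy_hit_rate, toy_mrr)
```

Because every document ID is unique, each inner list contains at most one `True`, so summing over all ranks gives the same result as stopping at the first hit.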

In [2]:
import requests
import pandas as pd

url_prefix = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/03-evaluation/'
docs_url = url_prefix + 'search_evaluation/documents-with-ids.json'
documents = requests.get(docs_url).json()

ground_truth_url = url_prefix + 'search_evaluation/ground-truth-data.csv'
df_ground_truth = pd.read_csv(ground_truth_url)
ground_truth = df_ground_truth.to_dict(orient='records')

In [3]:
from tqdm.auto import tqdm

def hit_rate(relevance_total):
    cnt = 0

    for line in relevance_total:
        if True in line:
            cnt = cnt + 1

    return cnt / len(relevance_total)

def mrr(relevance_total):
    total_score = 0.0

    for line in relevance_total:
        for rank in range(len(line)):
            if line[rank] == True:
                total_score = total_score + 1 / (rank + 1)

    return total_score / len(relevance_total)

def evaluate(ground_truth, search_function):
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [d['id'] == doc_id for d in results]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

## Q1
Now let's evaluate our usual minsearch approach, indexing documents with:
```
text_fields=["question", "section", "text"],
keyword_fields=["course", "id"]
```
but tweak the parameters for search. Let's use the following boosting params:
```
boost = {'question': 1.5, 'section': 0.1}
```
What's the hit rate for this approach?
- 0.64
- 0.74
- **0.84**
- 0.94
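As a rough illustration of what boosting does, the sketch below assumes that each text field is scored separately with TF-IDF cosine similarity and that the per-field scores are summed after being multiplied by their boost. It is a simplified stand-in written with scikit-learn, not minsearch's actual code; `toy_docs` and `boosted_scores` are hypothetical names.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy documents, for illustration only
toy_docs = [
    {"question": "How do I install Docker?", "text": "Use the official installer."},
    {"question": "Course start date", "text": "Install Docker before the course starts."},
]

def boosted_scores(query, docs, fields, boost):
    # Assumption: final score = sum over fields of boost * TF-IDF cosine similarity
    scores = np.zeros(len(docs))
    for field in fields:
        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform([doc[field] for doc in docs])
        query_vec = vectorizer.transform([query])
        field_scores = cosine_similarity(query_vec, matrix).flatten()
        scores += boost.get(field, 1.0) * field_scores
    return scores

fields = ["question", "text"]
question_only = boosted_scores("install docker", toy_docs, fields,
                               {"question": 1.0, "text": 0.0})
text_only = boosted_scores("install docker", toy_docs, fields,
                           {"question": 0.0, "text": 1.0})
print(question_only.argmax(), text_only.argmax())
```

With all the weight on `question`, the first toy document ranks highest; with all the weight on `text`, the second one does. Intermediate boosts such as `{'question': 1.5, 'section': 0.1}` shift the balance between fields in the same way.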

In [4]:
df_ground_truth

| | question | course | document |
|---|---|---|---|
| 0 | When does the course begin? | data-engineering-zoomcamp | c02e79ef |
| 1 | How can I get the course schedule? | data-engineering-zoomcamp | c02e79ef |
| 2 | What is the link for course registration? | data-engineering-zoomcamp | c02e79ef |
| 3 | How can I receive course announcements? | data-engineering-zoomcamp | c02e79ef |
| 4 | Where do I join the Slack channel? | data-engineering-zoomcamp | c02e79ef |
| ... | ... | ... | ... |
| 4622 | How should I destroy infrastructure created us... | mlops-zoomcamp | 886d1617 |
| 4623 | What is the first step to destroy AWS infrastr... | mlops-zoomcamp | 886d1617 |
| 4624 | Can I destroy infrastructure created with GitH... | mlops-zoomcamp | 886d1617 |
| 4625 | What command initializes Terraform with specif... | mlops-zoomcamp | 886d1617 |


In [5]:
import minsearch

index = minsearch.Index(
    text_fields=["question", "section", "text"],
    keyword_fields=["course", "id"]
)

index.fit(documents)

<minsearch.minsearch.Index at 0x72b082fa5d90>

In [6]:
def minsearch_search(query, course):
    boost = {'question': 1.5, 'section': 0.1}

    results = index.search(
        query=query,
        filter_dict={'course': course},
        boost_dict=boost,
        num_results=5
    )

    return results

In [7]:
relevance_total = []

for q in tqdm(ground_truth):
    doc_id = q['document']
    results = minsearch_search(query=q['question'], course=q['course'])
    relevance = [d['id'] == doc_id for d in results]
    relevance_total.append(relevance)


In [8]:
hit_rate(relevance_total)

0.848714069591528

## Embeddings

The latest version of minsearch also supports vector search. We will use it:
```
from minsearch import VectorSearch
```
We will also use TF-IDF and Singular Value Decomposition to create embeddings from texts. You can refer to our "Create Your Own Search Engine" workshop if you want to know more about it.
```
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
```
Let's create embeddings for the "question" field:
```
texts = []

for doc in documents:
    t = doc['question']
    texts.append(t)

pipeline = make_pipeline(
    TfidfVectorizer(min_df=3),
    TruncatedSVD(n_components=128, random_state=1)
)
X = pipeline.fit_transform(texts)
```
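To make the shape of these embeddings concrete, here is a minimal sketch on a tiny corpus. The texts are hypothetical, and `min_df` and `n_components` are lowered only because the toy corpus is too small for the homework values. It also shows that a query has to go through the already fitted pipeline with `transform`, so that it lands in the same vector space as the documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; parameters are reduced to fit its size
toy_texts = [
    "How do I install Docker?",
    "How do I install Python?",
    "When does the course start?",
    "Where do I submit the homework?",
]

toy_pipeline = make_pipeline(
    TfidfVectorizer(min_df=1),
    TruncatedSVD(n_components=2, random_state=1),
)

# Fit on the documents: one dense row per document
doc_vectors = toy_pipeline.fit_transform(toy_texts)

# Reuse the fitted pipeline for queries (transform only, no refitting)
query_vectors = toy_pipeline.transform(["Can I install Docker on Windows?"])

print(doc_vectors.shape, query_vectors.shape)  # (4, 2) (1, 2)
```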

In [9]:
from minsearch import VectorSearch

In [10]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

In [11]:
texts = []

for doc in documents:
    t = doc['question']
    texts.append(t)

pipeline = make_pipeline(
    TfidfVectorizer(min_df=3),
    TruncatedSVD(n_components=128, random_state=1)
)
X = pipeline.fit_transform(texts)

## Q2
Now let's index these embeddings with minsearch:
```
vindex = VectorSearch(keyword_fields={'course'})
vindex.fit(X, documents)
```
Evaluate this search method. What's the MRR for it?
- 0.25
- **0.35**
- 0.45
- 0.55

In [12]:
vindex = VectorSearch(keyword_fields={'course'})
vindex.fit(X, documents)

<minsearch.vector.VectorSearch at 0x72b081970b00>

In [13]:
# Create the query vector from the ground truth
query_texts = []

for doc in ground_truth:
    t = doc['question']
    query_texts.append(t)

questions = pipeline.transform(query_texts)

In [14]:
# Run the evaluation and return the evaluation metrics
relevance_total = []

for i, q in enumerate(tqdm(ground_truth)):
    doc_id = q['document']
    results = vindex.search(query_vector=questions[i],
                            filter_dict={'course': q['course']},
                            num_results=5
                           )
    relevance = [d['id'] == doc_id for d in results]
    relevance_total.append(relevance)

eval_q2 = {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total)
}
eval_q2    


{'hit_rate': 0.48173762697212014, 'mrr': 0.3572833369353793}
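Conceptually, a filtered vector search like the one evaluated above can be pictured as follows. This is a NumPy sketch under the assumption that candidates are first filtered by the keyword field and then ranked by cosine similarity; it is not minsearch's implementation, and `filtered_vector_search` and the toy data are hypothetical.

```python
import numpy as np

def filtered_vector_search(query_vector, vectors, payloads, course, num_results=5):
    # Keep only the rows whose payload matches the requested course
    candidate_idx = [i for i, p in enumerate(payloads) if p["course"] == course]
    if not candidate_idx:
        return []
    candidates = vectors[candidate_idx]
    # Cosine similarity between the query and every candidate row
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(query_vector)
    scores = candidates @ query_vector / np.where(norms == 0, 1.0, norms)
    # Highest-scoring rows first, truncated to num_results
    top = np.argsort(-scores)[:num_results]
    return [payloads[candidate_idx[i]] for i in top]

# Hypothetical 2-dimensional embeddings and payloads
toy_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [1.0, 0.0]])
toy_payloads = [
    {"id": "a", "course": "c1"},
    {"id": "b", "course": "c1"},
    {"id": "c", "course": "c1"},
    {"id": "d", "course": "c2"},
]

hits = filtered_vector_search(np.array([1.0, 0.0]), toy_vectors, toy_payloads,
                              course="c1", num_results=2)
hit_ids = [h["id"] for h in hits]
print(hit_ids)  # ['a', 'c']
```

Document `d` matches the query vector exactly but is excluded because it belongs to a different course, which is the effect of `filter_dict` in the evaluation loop.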

## Q3

In Q2 we only used the question. We can use both the question and the answer:
```
texts = []

for doc in documents:
    t = doc['question'] + ' ' + doc['text']
    texts.append(t)
```
Using the same pipeline (`min_df=3` for TF-IDF vectorizer and `n_components=128` for SVD), evaluate the performance of this approach.

What's the hit rate?

- 0.62
- 0.72
- **0.82**
- 0.92

In [15]:
texts = []

for doc in documents:
    t = doc['question'] + ' ' + doc['text']
    texts.append(t)

pipeline3 = make_pipeline(
    TfidfVectorizer(min_df=3),
    TruncatedSVD(n_components=128, random_state=1)
)
X3 = pipeline3.fit_transform(texts)

In [16]:
vindex3 = VectorSearch(keyword_fields={'course'})
vindex3.fit(X3, documents)

<minsearch.vector.VectorSearch at 0x72b081953fb0>

In [17]:
# Create the query vector from the ground truth
query_texts = []

for doc in ground_truth:
    t = doc['question']
    query_texts.append(t)

questions3 = pipeline3.transform(query_texts)

In [18]:
# Run the evaluation and return the evaluation metrics
relevance_total3 = []

for i, q in enumerate(tqdm(ground_truth)):
    doc_id = q['document']
    results = vindex3.search(query_vector=questions3[i],
                            filter_dict={'course': q['course']},
                            num_results=5
                           )
    relevance = [d['id'] == doc_id for d in results]
    relevance_total3.append(relevance)

eval_q3 = {
        'hit_rate': hit_rate(relevance_total3),
        'mrr': mrr(relevance_total3)
}
eval_q3


{'hit_rate': 0.8210503566025502, 'mrr': 0.6717347453353508}

## Q4

Now let's evaluate the following settings in Qdrant:

- `text = doc['question'] + ' ' + doc['text']`
- `model_handle = "jinaai/jina-embeddings-v2-small-en"`
- `limit = 5`

What's the MRR?

- 0.65
- 0.75
- **0.85**
- 0.95

In [19]:
from qdrant_client import QdrantClient, models
from fastembed import TextEmbedding
import numpy as np

In [20]:
# Initialize the client
client = QdrantClient("http://localhost:6333")  # connect to the local Qdrant instance

In [21]:
# Settings for the collection and the embedding model
query = 'I just discovered the course. Can I join now?'
model_handle = "jinaai/jina-embeddings-v2-small-en"
collection_name = "HW3"
EMBEDDING_DIMENSIONALITY = 512

In [22]:
# Delete the collection
try:
    client.delete_collection(collection_name=collection_name)
    print(f"Collection '{collection_name}' deleted successfully.")
except Exception as e:
    print(f"Error deleting collection '{collection_name}': {e}")

Collection 'HW3' deleted successfully.


In [23]:
# Create the collection with specified vector parameters
# Run only ONCE

client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=EMBEDDING_DIMENSIONALITY,  # Dimensionality of the vectors
        distance=models.Distance.COSINE  # Distance metric for similarity search
    )
)

True

In [24]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp',
 'id': 'c02e79ef'}

### Upsert the documents to Qdrant client

In [25]:
points = []

for i, doc in enumerate(documents):
    point = models.PointStruct(
        id = i,
        vector = models.Document(text=doc['question'] + ' ' + doc['text'],
                                 model=model_handle),
        payload = {
            "doc_id": doc['id']
        }
    )
    points.append(point)

In [26]:
client.upsert(
    collection_name=collection_name,
    points=points
)

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

In [49]:
def qdrant_search(query, limit=5):
    results = client.query_points(
        collection_name=collection_name,
        # Embed the query text locally with the model in model_handle
        query=models.Document(
            text=query,
            model=model_handle
        ),
        limit=limit,        # number of closest matches to return
        with_payload=True   # include the payload (doc_id) in the results
    )

    return results

In [50]:
[d for d in qdrant_search(query)]

[('points',
  [ScoredPoint(id=449, version=0, score=0.8620738, payload={'doc_id': 'ee58a693'}, vector=None, shard_key=None, order_value=None),
   ScoredPoint(id=2, version=0, score=0.8514544, payload={'doc_id': '7842b56a'}, vector=None, shard_key=None, order_value=None),
   ScoredPoint(id=7, version=0, score=0.8436594, payload={'doc_id': 'a482086d'}, vector=None, shard_key=None, order_value=None),
   ScoredPoint(id=0, version=0, score=0.84082866, payload={'doc_id': 'c02e79ef'}, vector=None, shard_key=None, order_value=None),
   ScoredPoint(id=452, version=0, score=0.83894074, payload={'doc_id': '0a278fb2'}, vector=None, shard_key=None, order_value=None)])]

In [51]:
[p.payload['doc_id'] for p in qdrant_search(query).points]

['ee58a693', '7842b56a', 'a482086d', 'c02e79ef', '0a278fb2']

In [47]:
def evaluate(ground_truth, search_function):
    # Qdrant variant: the response keeps its hits in `.points`,
    # and the document ID is stored in each point's payload
    relevance_total = []

    for q in tqdm(ground_truth):
        doc_id = q['document']
        results = search_function(q)
        relevance = [p.payload['doc_id'] == doc_id for p in results.points]
        relevance_total.append(relevance)

    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

In [53]:
evaluate(ground_truth, lambda q: qdrant_search(q['question']))


{'hit_rate': 0.9118219148476334, 'mrr': 0.8247172393919758}

## Q5

In the second part of the module, we looked at evaluating the entire RAG approach. In particular, we looked at comparing the answer generated by our system with the actual answer from the FAQ.

One of the ways of doing it is using the cosine similarity. Let's see how to calculate it.

Cosine similarity is the dot product of two normalized vectors. Geometrically, it's the cosine of the angle between the vectors. Look up "cosine similarity geometry" if you want to learn more about it.

For us, it means that we need two things:

- First, we normalize each of the vectors
- Then, we compute the dot product

So, we get this:
```
def cosine(u, v):
    u = normalize(u)
    v = normalize(v)
    return u.dot(v)
```
For normalization, we first compute the vector norm (its length), and then divide the vector by it:
```
def normalize(u):
    norm = np.sqrt(u.dot(u))
    return u / norm
```
(where `np` is `import numpy as np`)

Or we can simplify it:
```
def cosine(u, v):
    u_norm = np.sqrt(u.dot(u))
    v_norm = np.sqrt(v.dot(v))
    return u.dot(v) / (u_norm * v_norm)
```
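As a quick sanity check, the two formulations can be compared on small vectors whose angles are known. The vectors below are made up for illustration, and `cosine_two_step` is just a local name for the normalize-then-dot version.

```python
import numpy as np

def normalize(u):
    norm = np.sqrt(u.dot(u))
    return u / norm

def cosine_two_step(u, v):
    # Normalize both vectors first, then take the dot product
    return normalize(u).dot(normalize(v))

def cosine(u, v):
    # Simplified form: dot product divided by the product of the norms
    u_norm = np.sqrt(u.dot(u))
    v_norm = np.sqrt(v.dot(v))
    return u.dot(v) / (u_norm * v_norm)

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])  # 45 degrees away from u

print(cosine(u, v))                     # cos(45 degrees), about 0.7071
print(cosine(u, 3 * u))                 # same direction -> 1.0
print(cosine(u, np.array([0.0, 2.0])))  # orthogonal -> 0.0
```

Both versions give the same value; the length of the vectors does not matter, only their direction.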
Now let's use this function to compute the A->Q->A cosine similarity.

We will use the results from our gpt-4o-mini evaluations:
```
results_url = url_prefix + 'rag_evaluation/data/results-gpt4o-mini.csv'
df_results = pd.read_csv(results_url)
```
When creating embeddings, we will use a simple approach, the same one we used in the Embeddings section:
```
pipeline = make_pipeline(
    TfidfVectorizer(min_df=3),
    TruncatedSVD(n_components=128, random_state=1)
)
```
Let's fit the vectorizer on all the text data we have:
```
pipeline.fit(df_results.answer_llm + ' ' + df_results.answer_orig + ' ' + df_results.question)
```
Now use the `transform` method of the pipeline to create the embeddings and calculate the cosine similarity between each pair.

What's the average cosine?

- 0.64
- 0.74
- **0.84**
- 0.94

This is how you do it:

- For each answer pair, compute
    - v_llm for the answer from the LLM
    - v_orig for the original answer
    - then compute the cosine between them
- At the end, take the average
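The steps above can also be done for all pairs at once. The following is a minimal NumPy sketch, assuming the two embedding matrices have one row per answer and no all-zero rows (a zero row would lead to a division by zero); `rowwise_cosine` and the small arrays are hypothetical stand-ins for the real embeddings.

```python
import numpy as np

def rowwise_cosine(A, B):
    # Cosine similarity between A[i] and B[i] for every row i
    dots = np.sum(A * B, axis=1)
    norms = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return dots / norms

# Hypothetical stand-ins for the LLM and original answer embeddings
V_llm = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
V_orig = np.array([[2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])

cosines = rowwise_cosine(V_llm, V_orig)  # one value per answer pair
average_cosine = cosines.mean()
print(cosines, average_cosine)
```

This gives the same numbers as calling `cosine(u, v)` in a loop over the rows, just without the Python-level iteration.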

In [54]:
results_url = url_prefix + 'rag_evaluation/data/results-gpt4o-mini.csv'
df_results = pd.read_csv(results_url)

In [59]:
df_results

| | answer_llm | answer_orig | document | question | course |
|---|---|---|---|---|---|
| 0 | You can sign up for the course by visiting the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Where can I sign up for the course? | machine-learning-zoomcamp |
| 1 | You can sign up using the link provided in the... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Can you provide a link to sign up? | machine-learning-zoomcamp |
| 2 | Yes, there is an FAQ for the Machine Learning ... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Is there an FAQ for this Machine Learning course? | machine-learning-zoomcamp |
| 3 | The context does not provide any specific info... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | Does this course have a GitHub repository for ... | machine-learning-zoomcamp |
| 4 | To structure your questions and answers for th... | Machine Learning Zoomcamp FAQ\nThe purpose of ... | 0227b872 | How can I structure my questions and answers f... | machine-learning-zoomcamp |
| ... | ... | ... | ... | ... | ... |
| 1825 | Some suggested titles for listing the Machine ... | I’ve seen LinkedIn users list DataTalksClub as... | c6a22665 | What are some suggested titles for listing the... | machine-learning-zoomcamp |
| 1826 | It is best advised that you do not list the Ma... | I’ve seen LinkedIn users list DataTalksClub as... | c6a22665 | Should I list the Machine Learning Zoomcamp ex... | machine-learning-zoomcamp |
| 1827 | You can incorporate your Machine Learning Zoom... | I’ve seen LinkedIn users list DataTalksClub as... | c6a22665 | In which LinkedIn sections can I incorporate m... | machine-learning-zoomcamp |
| 1828 | The advice on including a project link in a CV... | I’ve seen LinkedIn users list DataTalksClub as... | c6a22665 | Who gave advice on including a project link in... | machine-learning-zoomcamp |


In [57]:
pipeline = make_pipeline(
    TfidfVectorizer(min_df=3),
    TruncatedSVD(n_components=128, random_state=1)
)
X = pipeline.fit_transform(df_results.answer_llm + ' ' + df_results.answer_orig + ' ' + df_results.question)

In [65]:
X_llm = pipeline.transform(df_results.answer_llm)
X_llm

array([[ 0.15549859,  0.11219644, -0.12744873, ...,  0.02800749,
        -0.00089034,  0.01145811],
       [ 0.14894279,  0.17679214, -0.16144508, ...,  0.02846803,
        -0.00530882, -0.02534412],
       [ 0.2624874 ,  0.14431318, -0.1935808 , ...,  0.03976003,
        -0.00854878, -0.02342486],
       ...,
       [ 0.1333508 ,  0.08705736, -0.09715167, ...,  0.03925748,
        -0.01131956, -0.01923416],
       [ 0.10213099,  0.01932364, -0.020402  , ...,  0.03760978,
        -0.00202926,  0.01866861],
       [ 0.07329588, -0.00118814, -0.00599569, ...,  0.03757046,
        -0.01849048,  0.03805752]], shape=(1830, 128))

In [69]:
X_orig = pipeline.transform(df_results.answer_orig)
X_orig

array([[ 0.22746773,  0.12079642, -0.17785901, ...,  0.08439231,
        -0.03839994, -0.05823001],
       [ 0.22746773,  0.12079642, -0.17785901, ...,  0.08439231,
        -0.03839994, -0.05823001],
       [ 0.22746773,  0.12079642, -0.17785901, ...,  0.08439231,
        -0.03839994, -0.05823001],
       ...,
       [ 0.18375337,  0.05955752, -0.09660538, ...,  0.05882618,
        -0.01491182, -0.00757821],
       [ 0.18375337,  0.05955752, -0.09660538, ...,  0.05882618,
        -0.01491182, -0.00757821],
       [ 0.18375337,  0.05955752, -0.09660538, ...,  0.05882618,
        -0.01491182, -0.00757821]], shape=(1830, 128))

In [70]:
def cosine(u, v):
    u_norm = np.sqrt(u.dot(u))
    v_norm = np.sqrt(v.dot(v))
    return u.dot(v) / (u_norm * v_norm)

In [79]:
all_cosine = [cosine(u,v) for u,v in zip(X_llm, X_orig)]
all_cosine[:5]

[np.float64(0.4635262016002998),
 np.float64(0.7815651064829406),
 np.float64(0.8891577173455293),
 np.float64(0.6149615816691357),
 np.float64(0.6240861551352463)]

In [80]:
np.mean(all_cosine)

np.float64(0.8415841233490402)

## Q6

An alternative way to measure how similar two texts are is ROUGE.

This is a set of metrics that compares two answers based on the overlap of n-grams, word sequences, and word pairs.

It can give a more nuanced view of text similarity than just cosine similarity alone.

We don't need to implement it ourselves; there's a Python package for it:
```
pip install rouge
```
(The latest version at the moment of writing is 1.0.1)

Let's compute the ROUGE score between the answers at index 10 of our dataframe (`doc_id=5170565b`):
```
from rouge import Rouge
rouge_scorer = Rouge()

r = df_results.iloc[10]
scores = rouge_scorer.get_scores(r.answer_llm, r.answer_orig)[0]
scores
```
There are three scores: `rouge-1`, `rouge-2` and `rouge-l`, with precision, recall and F1 score for each:

- `rouge-1` - the overlap of unigrams
- `rouge-2` - the overlap of bigrams
- `rouge-l` - the longest common subsequence

For the document at index 10, the ROUGE-1 F1 score is 0.45.

Let's compute it for the pairs in the entire dataframe. What's the average Rouge-1 F1?

- 0.25
- **0.35**
- 0.45
- 0.55
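To make the ROUGE-1 definition above concrete, here is a deliberately simplified sketch that counts the overlap of unique lowercase, whitespace-separated tokens. It is an assumption-laden illustration: the `rouge` package's tokenization and counting may differ, so its scores will not necessarily match this function, and `rouge1_sketch` is a hypothetical name.

```python
def rouge1_sketch(hypothesis, reference):
    # Simplified ROUGE-1: overlap of unique lowercase whitespace-separated tokens
    hyp_tokens = set(hypothesis.lower().split())
    ref_tokens = set(reference.lower().split())
    overlap = len(hyp_tokens & ref_tokens)
    if overlap == 0:
        return {"r": 0.0, "p": 0.0, "f": 0.0}
    recall = overlap / len(ref_tokens)     # share of reference tokens recovered
    precision = overlap / len(hyp_tokens)  # share of hypothesis tokens that match
    f1 = 2 * precision * recall / (precision + recall)
    return {"r": recall, "p": precision, "f": f1}

# 4 of the 5 unique tokens on each side overlap
toy_scores = rouge1_sketch("the cat sat on the mat", "the cat lay on the mat")
print(toy_scores)
```

Here precision and recall are both 4/5, so the F1 score is also about 0.8. Unlike cosine similarity on embeddings, this measure only rewards exact token matches.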

In [81]:
%pip install rouge

Collecting rouge
  Downloading rouge-1.0.1-py3-none-any.whl.metadata (4.1 kB)
Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Installing collected packages: rouge
Successfully installed rouge-1.0.1
Note: you may need to restart the kernel to use updated packages.


In [88]:
from rouge import Rouge
rouge_scorer = Rouge()

In [89]:
r = df_results.iloc[10]
scores = rouge_scorer.get_scores(r.answer_llm, r.answer_orig)[0]
scores

{'rouge-1': {'r': 0.45454545454545453,
  'p': 0.45454545454545453,
  'f': 0.45454544954545456},
 'rouge-2': {'r': 0.21621621621621623,
  'p': 0.21621621621621623,
  'f': 0.21621621121621637},
 'rouge-l': {'r': 0.3939393939393939,
  'p': 0.3939393939393939,
  'f': 0.393939388939394}}


In [99]:
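# Compute ROUGE for every (LLM answer, original answer) pair,
# then keep only the ROUGE-1 F1 score of each pair
all_scores = rouge_scorer.get_scores(df_results.answer_llm, df_results.answer_orig)
fscore_rouge1 = pd.DataFrame.from_dict(all_scores)['rouge-1'].apply(lambda x: x['f'])
fscore_rouge1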

0       0.095238
1       0.125000
2       0.415584
3       0.216216
4       0.142076
          ...   
1825    0.336134
1826    0.453782
1827    0.442748
1828    0.191489
1829    0.147368
Name: rouge-1, Length: 1830, dtype: float64

In [100]:
fscore_rouge1.describe()

count    1830.000000
mean        0.351695
std         0.158905
min         0.000000
25%         0.238887
50%         0.356300
75%         0.460133
max         0.950000
Name: rouge-1, dtype: float64