# Evaluating RAG

> _Note_: This dataset has no question/answer pairs to compare. Instead, the RAG pipeline generates a `Category` from a service description, so a description in the ground truth dataset should yield the same `Category` as its counterpart in the original dataset. Because the answers are short category labels, the cosine similarity between the generated and the original label should be 1 whenever the LLM answers correctly, and should fall below 1 only when the LLM generates a different category.

In [41]:
import pandas as pd
from app import rag_vector_ollama
import json
from src.constants import *
from sentence_transformers import SentenceTransformer

In [4]:
ground_truth_data = pd.read_csv('./data/cloud_service_provider_ground_truth_dataset.csv')
original_data = pd.read_csv('./data/Cloud_Provider_Services_Hashed_random.csv')

In [5]:
original_data.shape, ground_truth_data.shape

((25, 5), (125, 5))

In [11]:
original_data_dict = original_data.to_dict(orient='records')
original_doc_index = {doc['id']: doc for doc in original_data_dict}
original_doc_index['dbfef36b']

{'cloud_provider': 'azure',
 'category': 'Analytics',
 'service': 'Application Insights',
 'description': 'Offers application performance management and monitoring.',
 'id': 'dbfef36b'}

### Taking a sample of 5 records for evaluation

In [9]:
ground_truth_data = ground_truth_data.sample(5)
ground_truth_data.head(10)

Unnamed: 0,description,category,service,cloud_provider,id
76,Data transport solutions at petabyte scale off...,Storage,AWS Snowball,aws,7dcc7a26
97,Cloud integration services provided.,Service,paas_svc_plan_cloud_integration,ibmcloud,d659ca4a
46,Scalable storage for objects.,Storage,object_storage,ibmcloud,991a00e1
107,Service offering virtual firewall with threat ...,Security,VM-Series Virtual NextGen Firewall w/ Threat P...,aws,d31d2df1
103,Workspaces for operational insights provided.,Analytics,microsoft.operationalinsights/workspaces,azure,d7498c6e


## 1. LLM as a Judge (with gemma2:2b)
 - We generate a response for each description in `ground_truth_data` and compare the `category` returned by the LLM against the category recorded in the original dataset.

In [17]:
# Replace occurrences of "```json" and "```" with an empty string
def remove_backticks(text):
    return text.replace("```json", "").replace("```", "").strip()
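Stripping the fence markers covers the case where the model wraps its JSON in a code block. Small models sometimes also add prose around the JSON, which would make `json.loads` fail. The following is a minimal sketch of a more tolerant parser, assuming the response contains at least one JSON object; `extract_json` is a hypothetical helper and is not part of `app.py`.

```python
import json


def extract_json(text):
    """Return the first JSON object found in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


fence = "`" * 3  # built programmatically to keep this example readable
fenced = fence + 'json\n{"category": "Storage", "service": "AWS Snowball"}\n' + fence
chatty = 'Sure! {"category": "Security", "service": "firewall"} Hope that helps.'

print(extract_json(fenced)["category"])  # Storage
print(extract_json(chatty)["category"])  # Security
print(extract_json("no json here"))      # None
```

Returning `None` instead of raising lets the evaluation loop record a failed parse as an incorrect answer rather than stopping the whole run.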

In [36]:
answers = []
for i, rec in ground_truth_data.iterrows():
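    # Skip records already evaluated (answers holds dicts, so compare on 'id')
    if any(a['id'] == rec['id'] for a in answers):
        continue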
    answer_llm = json.loads(remove_backticks(rag_vector_ollama(rec['description'])))['category']
    answer_orig = original_doc_index[rec['id']]['category']
    answers.append({
        'answer_llm': answer_llm,
        'answer_orig': answer_orig,
        'is_correct': answer_llm == answer_orig,
        'id': rec['id']
    })
    

<src.lance_db.lanceDB object at 0x324a45f70>
Getting context from the table
table name: vector_db
first 5 data:                                          description  \
0  An interactive query service that makes it eas...   
1  A managed service in the AWS Cloud that makes ...   
2  A web service that makes it easy to process la...   
3  A managed service that makes it easy to deploy...   
4  A cloud big data platform for processing vast ...   

                                  vector_description cloud_provider  \
0  [-0.036750447, -0.038994916, -0.09674644, 0.01...            aws   
1  [-0.032546062, 0.014457229, -0.067944124, 0.03...            aws   
2  [-0.10146084, 0.0067654448, -0.056411035, 0.00...            aws   
3  [0.03613138, -0.0007996917, -0.017581925, 0.06...            aws   
4  [0.032586943, 0.045619603, 0.0398248, -0.04592...            aws   

                        service   category        id  
0                 Amazon Athena  Analytics  95825a76  
1            A



**************************************

prompt: You're a cloud asset category finder. Answer the DESCRIPTION based on the CONTEXT.
Use only the facts from the CONTEXT when answering the DESCRIPTION. Generate a short answer in JSON format with "category" and "service" as fields.
Your response should **only** include the JSON object itself. Do not include any code blocks, backticks, language tags, or additional formatting like "```json".

DESCRIPTION: Data transport solutions at petabyte scale offered.

CONTEXT: 
answer: Storage
service: AWS Snowball
description: Offers petabyte-scale data transport solutions.
cloud_provider: aws

answer: Analytics
service: Data Lake Store
description: Offers scalable data storage.
cloud_provider: azure

answer: Storage
service: AWS Snowmobile
description: Enables exabyte-scale data transfer to AWS.
cloud_provider: aws

answer: Analytics
service: BigQuery
description: Data warehouse for large-scale data analytics.
cloud_provider: gcp

answer: Service
ser



**************************************

prompt: You're a cloud asset category finder. Answer the DESCRIPTION based on the CONTEXT.
Use only the facts from the CONTEXT when answering the DESCRIPTION. Generate a short answer in JSON format with "category" and "service" as fields.
Your response should **only** include the JSON object itself. Do not include any code blocks, backticks, language tags, or additional formatting like "```json".

DESCRIPTION: Cloud integration services provided.

CONTEXT: 
answer: Service
service: paas_svc_plan_cloud_integration
description: Cloud integration services
cloud_provider: ibmcloud

answer: Analytics
service: Azure Data Factory v2
description: Provides cloud-based data integration service.
cloud_provider: azure

answer: Compute
service: Cloud Services
description: Provides cloud-based services.
cloud_provider: azure

answer: Service
service: App Connect
description: Application integration services
cloud_provider: ibmcloud

answer: Storage
service: Wi



**************************************

prompt: You're a cloud asset category finder. Answer the DESCRIPTION based on the CONTEXT.
Use only the facts from the CONTEXT when answering the DESCRIPTION. Generate a short answer in JSON format with "category" and "service" as fields.
Your response should **only** include the JSON object itself. Do not include any code blocks, backticks, language tags, or additional formatting like "```json".

DESCRIPTION: Scalable storage for objects.

CONTEXT: 
answer: Storage
service: object_storage
description: Scalable object storage
cloud_provider: ibmcloud

answer: Storage
service: Amazon Simple Storage Service
description: Provides scalable object storage.
cloud_provider: aws

answer: Service
service: paas_svc_plan_feat_cloud_object_storage
description: Cloud Object Storage service for scalable storage.
cloud_provider: ibmcloud

answer: Service
service: paas_svc_plan_object_storage
description: Object Storage is a scalable storage service for storing 



**************************************

prompt: You're a cloud asset category finder. Answer the DESCRIPTION based on the CONTEXT.
Use only the facts from the CONTEXT when answering the DESCRIPTION. Generate a short answer in JSON format with "category" and "service" as fields.
Your response should **only** include the JSON object itself. Do not include any code blocks, backticks, language tags, or additional formatting like "```json".

DESCRIPTION: Service offering virtual firewall with threat prevention.

CONTEXT: 
answer: Security
service: VM-Series Virtual NextGen Firewall w/ Threat Prevention - Bundle1 AWS
description: Virtual firewall with threat prevention
cloud_provider: aws

answer: Network
service: firewall
description: Firewall service
cloud_provider: ibmcloud

answer: Security
service: VM-Series Next-Generation Firewall Bundle 1
description: Virtual firewall with security features
cloud_provider: aws

answer: Network
service: GCP Firewall
description: Service for managing f



**************************************

prompt: You're a cloud asset category finder. Answer the DESCRIPTION based on the CONTEXT.
Use only the facts from the CONTEXT when answering the DESCRIPTION. Generate a short answer in JSON format with "category" and "service" as fields.
Your response should **only** include the JSON object itself. Do not include any code blocks, backticks, language tags, or additional formatting like "```json".

DESCRIPTION: Workspaces for operational insights provided.

CONTEXT: 
answer: Analytics
service: microsoft.operationalinsights/workspaces
description: Offers operational insights workspaces.
cloud_provider: azure

answer: Service
service: paas_svc_plan_presence_insights
description: Presence Insights offers tools for analyzing and optimizing physical space usage.
cloud_provider: ibmcloud

answer: Analytics
service: microsoft.insights/components
description: Provides insights components.
cloud_provider: azure

answer: Service
service: paas_svc_plan_insig

In [37]:
answers

[{'answer_llm': 'Storage',
  'answer_orig': 'Storage',
  'is_correct': True,
  'id': '7dcc7a26'},
 {'answer_llm': 'Service',
  'answer_orig': 'Service',
  'is_correct': True,
  'id': 'd659ca4a'},
 {'answer_llm': 'storage',
  'answer_orig': 'Storage',
  'is_correct': False,
  'id': '991a00e1'},
 {'answer_llm': 'Security',
  'answer_orig': 'Security',
  'is_correct': True,
  'id': 'd31d2df1'},
 {'answer_llm': 'Analytics',
  'answer_orig': 'Analytics',
  'is_correct': True,
  'id': 'd7498c6e'}]
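The third record is marked incorrect only because the model returned `storage` in lower case. A minimal sketch of how the accuracy changes when labels are normalized before comparison is shown below. The `normalize` and `accuracy` helpers are hypothetical, and the sample list simply copies the five answer pairs from the output above.

```python
def normalize(label):
    """Lower-case and trim a category label before comparing."""
    return str(label).strip().lower()


def accuracy(records, normalized=False):
    """Fraction of records whose LLM answer equals the original category."""
    key = normalize if normalized else (lambda label: label)
    hits = sum(key(r["answer_llm"]) == key(r["answer_orig"]) for r in records)
    return hits / len(records)


# The five (answer_llm, answer_orig) pairs from the run above
sample = [
    {"answer_llm": "Storage", "answer_orig": "Storage"},
    {"answer_llm": "Service", "answer_orig": "Service"},
    {"answer_llm": "storage", "answer_orig": "Storage"},
    {"answer_llm": "Security", "answer_orig": "Security"},
    {"answer_llm": "Analytics", "answer_orig": "Analytics"},
]

print(accuracy(sample))                   # 0.8
print(accuracy(sample, normalized=True))  # 1.0
```

Whether a case mismatch should count as an error depends on how the category is consumed downstream, so reporting both numbers is one reasonable option.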

### Dump the results into `data/rag_evaluation_cosine_gemma2b.csv`

In [63]:
df = pd.DataFrame(answers)
df.to_csv('./data/rag_evaluation_cosine_gemma2b.csv')

## Compute Cosine Similarity

In [43]:
model_name = EMBEDDINGS_DICT.get('model_name')
device = EMBEDDINGS_DICT.get('device')
model = SentenceTransformer(model_name, device=device)



In [44]:
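def compute_cosine_similarity(rec):
    # normalize_embeddings=True returns unit vectors, so the dot product equals the cosine similarity
    answer_llm = model.encode(rec['answer_llm'], normalize_embeddings=True)
    answer_orig = model.encode(rec['answer_orig'], normalize_embeddings=True)
    return answer_llm.dot(answer_orig)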
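A dot product equals the cosine similarity only when both vectors have unit length. For vectors of any other length, the dot product must be divided by the product of their norms. The sketch below illustrates the difference in plain Python; the vectors are hypothetical toy values, not real embeddings.

```python
import math


def dot(u, v):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical toy vectors: same direction, different lengths
u = [3.0, 4.0]
v = [6.0, 8.0]

print(dot(u, v))                # 50.0
print(cosine_similarity(u, v))  # 1.0
```

The raw dot product grows with vector length, while the cosine depends only on direction, which is the property this evaluation relies on.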

In [48]:
from tqdm.auto import tqdm
# Read back the gemma2:2b results written in the previous step
df = pd.read_csv('./data/rag_evaluation_cosine_gemma2b.csv')
data = df.to_dict(orient='records')
similarity = []
for rec in tqdm(data):
    sim = compute_cosine_similarity(rec)
    similarity.append(sim)

100%|██████████| 5/5 [00:00<00:00, 47.14it/s]


In [49]:
df['cosine'] = similarity

In [50]:
df.head()

Unnamed: 0.1,Unnamed: 0,answer_llm,answer_orig,is_correct,id,cosine
0,0,Storage,Storage,True,7dcc7a26,1.0
1,1,Service,Service,True,d659ca4a,1.0
2,2,storage,Storage,False,991a00e1,1.0
3,3,Security,Security,True,d31d2df1,1.0
4,4,Analytics,Analytics,True,d7498c6e,1.0


In [51]:
df.to_csv('./data/rag_evaluation_cosine_with_dot.csv')

In [52]:
df['cosine'].describe()

count    5.000000e+00
mean     9.999999e-01
std      1.604904e-07
min      9.999998e-01
25%      9.999998e-01
50%      1.000000e+00
75%      1.000000e+00
max      1.000000e+00
Name: cosine, dtype: float64
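The minimum of `9.999998e-01` rather than exactly `1` is consistent with floating-point rounding in the embedding vectors, assuming the model returns single-precision values. A minimal sketch of a tolerance-based check follows; `is_same_embedding` and its default tolerance are hypothetical choices, not part of the notebook.

```python
import math


def is_same_embedding(cosine, tol=1e-6):
    """Treat a cosine within `tol` of 1.0 as an exact match."""
    return math.isclose(cosine, 1.0, abs_tol=tol)


print(is_same_embedding(0.9999998))  # True
print(is_same_embedding(0.93))       # False
```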

## 2. LLM as a Judge (with qwen2:0.5b)

> Note: Qwen2 is a series of large language models from Alibaba Group. To run `qwen2:0.5b`, follow the steps below.

1. Pull the `qwen2:0.5b` model before running this section.
    - Exec into the Ollama Docker container by running `docker exec -it <ollama container id> bash`
    - Run `ollama pull qwen2:0.5b`. More about [qwen2](https://ollama.com/library/qwen2:0.5b)
2. In `app.py`, change the model used by `ollama_llm()` to `qwen2:0.5b`.

In [62]:
answers_qwen2 = []
for i, rec in ground_truth_data.iterrows():
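    # Skip records already evaluated (answers_qwen2 holds dicts, so compare on 'id')
    if any(a['id'] == rec['id'] for a in answers_qwen2):
        continue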
    answer_llm = json.loads(remove_backticks(rag_vector_ollama(rec['description'])))['category']
    answer_orig = original_doc_index[rec['id']]['category']
    answers_qwen2.append({
        'answer_llm': answer_llm,
        'answer_orig': answer_orig,
        'is_correct': answer_llm == answer_orig,
        'id': rec['id']
    })

In [64]:
df = pd.DataFrame(answers_qwen2)
df.to_csv('./data/rag_evaluation_cosine_qwen2.csv')

In [65]:
from tqdm.auto import tqdm
df = pd.read_csv('./data/rag_evaluation_cosine_qwen2.csv')
data = df.to_dict(orient='records')
similarity = []
for rec in tqdm(data):
    sim = compute_cosine_similarity(rec)
    similarity.append(sim)

100%|██████████| 5/5 [00:00<00:00, 37.96it/s]


In [66]:
df['cosine'] = similarity

In [67]:
df.head()

Unnamed: 0.1,Unnamed: 0,answer_llm,answer_orig,is_correct,id,cosine
0,0,Storage,Storage,True,7dcc7a26,1.0
1,1,Service,Service,True,d659ca4a,1.0
2,2,Storage,Storage,True,991a00e1,1.0
3,3,Security,Security,True,d31d2df1,1.0
4,4,Analytics,Analytics,True,d7498c6e,1.0
