# Retrieval Augmented Generation Systems, LangChain & ChromaDB

This notebook walks through building a question/answer system that retrieves information to formulate responses, effectively grounding the LLM with specific information. A pre-trained LLM, or likely even a fine-tuned LLM will not be sufficient (in and of itself) when you want a system that understands specific, possibly private data or information that was not in its training dataset.


Select the kernel `langchain_components_kernel` in the top right before going forward in the notebook.

### Setup

In [41]:
# !cd ~/asl-ml-immersion && make langchain_components_kernel 

In [None]:
import pandas as pd
import scipy
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.document_loaders import WikipediaLoader
from langchain_google_vertexai import VertexAIEmbeddings, VertexAI
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Image,
    Part,
)
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool

### Build a simple retrieval augmented generation system

In [308]:
def split_csv_to_train_test(csv_file_path, train_df_name, test_df_name, split_ratio=0.92):
    full_df = pd.read_csv(csv_file_path)    
    full_df.loc[:, 'text'] = full_df['text'].astype(str)
    full_df.loc[:, 'in_response_to_tweet_text'] = full_df['in_response_to_tweet_text'].astype(str)
    # Calculate the number of rows for the training set based on the split ratio
    train_size = int(split_ratio * len(full_df))
    train_df = full_df.iloc[:train_size].copy()
    test_df = full_df.iloc[train_size:].copy()
    return train_df, test_df

train_df, test_df = split_csv_to_train_test("twcs/by_account2/ChipotleTweets_merged.csv", "train_df", "test_df")

In [309]:
# Display the first few rows of the training set
train_df.info()

Unnamed: 0,tweet_id,author_id,inbound,created_at,in_response_to_tweet_id,in_response_to_tweet_text,text,response_tweet_id
0,64,ChipotleTweets,False,2017-10-31 22:14:28+00:00,66,I don't fit in my Veggie Burrito costume #Hall...,I still think you look great! -Becky,65.0
1,68,ChipotleTweets,False,2017-10-31 22:14:00+00:00,69,messed up today and didn’t give me my $3 burri...,I'm so sorry about that. Please tell us more s...,
2,70,ChipotleTweets,False,2017-10-31 22:13:29+00:00,71,hey wanna come to Mammoth. I'll at least eat ...,Hopefully we'll get there at some point! -Becky,
3,73,ChipotleTweets,False,2017-10-31 22:12:39+00:00,74,I had excellent service tonight too! Plenty of...,Guac on! I'm happy it was such a great experie...,
4,75,ChipotleTweets,False,2017-10-31 20:37:31+00:00,76,When you're the only one in costume #boorito,It's because you're smart. -Tara,74.0


In [310]:
# Display the first few rows of the test set
test_df.info()

Unnamed: 0,tweet_id,author_id,inbound,created_at,in_response_to_tweet_id,in_response_to_tweet_text,text,response_tweet_id
17111,2803785,ChipotleTweets,False,2017-11-22 01:30:37+00:00,2803787,I ordered through the app. First time their f...,That's a bummer. Were you able to let a manage...,2803786.0
17112,2803788,ChipotleTweets,False,2017-11-22 02:12:00+00:00,2803786,Normally the place I go is outstanding. I’m ju...,I hope so too. -Tay,
17113,2803789,ChipotleTweets,False,2017-11-22 01:30:01+00:00,2803791,I haven't had in months and I can say I reall...,That's no goo. Which location did you go to? -Tay,2803790.0
17114,2803792,ChipotleTweets,False,2017-11-22 02:05:52+00:00,2803790,"239 S. Kings Dr. In Charlotte, NC",Bummer. Please be sure to let a manager know o...,
17115,2803793,ChipotleTweets,False,2017-11-22 01:26:00+00:00,2803794,I just experience the world record for fastest...,Sounds like a great day to me. -Tay,


In [311]:
# types of data
training_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 7 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   tweet_id                 50000 non-null  int64  
 1   author_id                50000 non-null  object 
 2   inbound                  50000 non-null  bool   
 3   created_at               50000 non-null  object 
 4   text                     50000 non-null  object 
 5   response_tweet_id        33442 non-null  object 
 6   in_response_to_tweet_id  37084 non-null  float64
dtypes: bool(1), float64(1), int64(1), object(4)
memory usage: 2.3+ MB


# At the core of most retrieval generation systems is a vector database. A vector database stores embedded representations of information.  [Vertex AI text-embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings).

In [555]:
# Split documents into chunks 
class Document:
    def __init__(self, content, metadata=None):
        self.page_content = content
        self.metadata = metadata if metadata is not None else {}
        
def process_dataframe_in_batches(df, batch_size):
    """
    Process the DataFrame in batches to avoid memory issues and split the text into chunks.
    
    Parameters:
    df (pd.DataFrame): The input DataFrame with a column named 'text'.
    batch_size (int): The size of each batch to process.
    
    Returns:
    list: A list of text chunks.
    """
    chunks = []

    # Define the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=100,
        chunk_overlap=50,
        length_function=len,
    )

    # Process the DataFrame in batches
    for start in range(0, len(df), batch_size):
        end = start + batch_size
        # Extract the batch of text and combine into a single string
        batch_text = " ".join(df['in_response_to_tweet_text'][start:end].tolist())
        # Wrap the combined text in a Document object
        doc = Document(batch_text)
        # Split the combined text into chunks
        batch_chunks = text_splitter.split_documents([doc])
        # Extend the chunks list with the new chunks
        chunks.extend(batch_chunks)
    
    return chunks

# Assuming `training_df` is a DataFrame with a column named 'text'
batch_size = 1000
chunks = process_dataframe_in_batches(train_df, batch_size)

# Convert the first two chunks to a DataFrame
df_chunks = pd.DataFrame([chunk.page_content for chunk in chunks[0:2]], columns=['Chunk'])


In [556]:
# Display the DataFrame
df_chunks.head(11)

Unnamed: 0,Chunk
0,I don't fit in my Veggie Burrito costume #Hall...
1,https://t.co/7tJDVpzLWn messed up today and di...


In [557]:
print(f"Number of documents: {len(training_df)}")
print(f"Number of chunks: {len(chunks)}")

Number of documents: 50000
Number of chunks: 27766


#### Embed Document Chunks 
Now we need to embed the document chunks and store them in a vectorstore. For this, we can use any text embedding model, however we need to be sure to use the same text embedding model when we embed our queries/questions at prediction time. To make things simple we will use the PaLM API for Embeddings. The langchain library provides a nice wrapper class around the PaLM Embeddings API, VertexAIEmbeddings().

Since Vertex AI Vector Search takes awhile (~45 minutes) to create an index, we will use Chroma instead to keep things simple. Of course, in a real-world use case with a large private knowledge-base, you may not be able to fit everything in memory. Langchain has a nice wrapper class for Chroma which allows us to pass in a list of documents, and an embedding class to create the vector store.

In [558]:
VertexAIEmbeddings.update_forward_refs()
embeddings = VertexAIEmbeddings("text-embedding-004")
# set persist directory so the vector store is saved to disk
db = Chroma.from_documents(chunks, embeddings, persist_directory="./vectorstore")

Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issue detected.This can occur due to parallelization.
Gapic client context issu

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

#### Putting it all together 

Now that everything is in place, we can tie it all together with a langchain chain. A langchain chain simply orchestrates the multiple steps required to use an LLM for a specific use case. In this case the process we will chain together first embeds the query/question, then performs a nearest neighbors lookup to find the relevant chunks, then uses the relevant chunks to formulate a response with an LLM. We will use the Chroma database as our vector store and PaLM as our LLM. Langchain provides a wrapper around PaLM, `VertexAI()`.

For this simple Q/A use case we can use langchain's `RetrievalQA` to link together the process.

In [559]:
# vector store
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 20} # number of nearest neighbors to retrieve
)


retriever_gemini = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 50} # number of nearest neighbors to retrieve
)

Now that everything is tied together we can send queries and get answers!

In [823]:
#################### prompt #########################################################

def get_prompt(query: str):
    print(query)
    prompt = f"""
    Using only the provided context, answer the question.
    
    You are a helpful customer service representative online on twitter for a fastfood chain Chipotle which serves mexican fastfood in the Tex-Mex style. You are tasked with providing appropriate response to the user based on type of outreach from user. Here is probable list of categories for the customer tweet and guidelines on how you should respond:
        1.question - answer only based on provided context .If you need more info, ask for it.
        2.complaint - apologize and explain the situation
        3.appreciation - Give a warm thank you!
        4.joke or wisecracks - Keep it fun and light-hearted, and thank the customer
       
        Ensure that your response fits within Twitter's character limit (120 characters).
        If a tweet falls into multiple categories, use a combination of the respective guidelines.
    
    Question: {query}
    
    If you cannot answer the question using only the provided context, respond that you do not have the context needed to answer the question.
    """
    return prompt


######################Chain####################################

palm = VertexAI(model_name="text-bison@001", temperature=0.1, top_p=0.95,top_k=5,  max_output_tokens=1024)

retriver_context_palm_qa = RetrievalQA.from_chain_type(
    llm=palm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

# gemini API
gemini_model = VertexAI(mode_name="gemini-1.0-pro")

retriver_context_gemini_llm = RetrievalQA.from_chain_type(
    llm=gemini_model,
    chain_type="stuff",
    retriever=retriever_gemini,
    return_source_documents=True,
)


######################Invoke gemini & palm model ####################################
# query gemini model 
def ask_gemini(question: str):
    prompt = get_prompt(question)
    
    retriever_response = retriver_context_gemini_llm({"query": prompt})
    
    final_response = f"{retriever_response['result']}"
    # print(retriever_response)
    
    return final_response

# query palm model 
def ask_palm_question(question: str):    
    prompt = get_prompt(question)
    # print("len of prompt" , len(prompt))
    
    response = retriver_context_palm_qa({"query": prompt})
    # print(response['result'])
    final_palm_response = response['result']
    
    return final_palm_response


######################Invoke gemini & palm response ####################################
# gemini reponse 
def get_gemini_response(response_tweet_id):
        query = test_df.loc[test_df['tweet_id'] == response_tweet_id, 'in_response_to_tweet_text'].values[0]
        response = ask_gemini(query)
        print(response)
        return response 

# plam response
def get_palm_response(response_tweet_id):
    try:
        query = test_df.loc[test_df['tweet_id'] == response_tweet_id, 'in_response_to_tweet_text'].values[0]
        # print(f"Tweet ID : {response_tweet_id}")
        response = ask_palm_question(query)
        print(response)
    except Exception as error:
        print(f"Error occurred: {error}")
        print(f"Query that caused the error: {query if 'query' in locals() else 'N/A'}")
        return response 
       

In [564]:

test_df.loc[test_df['tweet_id'] == 2814766.0, 'in_response_to_tweet_text'].values[0]

'Where’s the Chorizo'

In [565]:

test_df['in_response_to_tweet_text'].describe()

lengths = test_df['in_response_to_tweet_text'].astype(str).apply(len)
# print((lengths.mean()))

# test_df['in_response_to_tweet_text'].astype(str).apply(lambda x: x.str.len().mean())

# average_lengths = test_df['in_response_to_tweet_text'].apply(lambda x: x.str.len().mean())

66.32340425531915


In [826]:
print(test_df.loc[test_df['tweet_id'] == 2814766.0, 'in_response_to_tweet_text'].values[0])
print(get_palm_response(2814766.0))
print(get_gemini_response(2814766.0))

why do we pay extra for Guacamole ???
why do we pay extra for Guacamole ???
Guacamole is a premium ingredient and we want to make sure that we are able to provide the freshest and highest quality guacamole to our customers.
None
why do we pay extra for Guacamole ???
 I'm sorry, but the provided context does not contain the answer to why Chipotle charges extra for guacamole.
 I'm sorry, but the provided context does not contain the answer to why Chipotle charges extra for guacamole.


In [593]:
# # Apply the function to populate the human_response_tweet_text column
test_df['palm_response'] = test_df['tweet_id'].apply(get_palm_response)

In [596]:
# # Apply the function to populate the human_response_tweet_text column
test_df['gemini_llm_response'] = test_df['tweet_id'].apply(get_gemini_response)

In [597]:
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1880 entries, 17111 to 391
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   tweet_id                   1488 non-null   float64
 1   author_id                  1488 non-null   object 
 2   inbound                    1488 non-null   object 
 3   created_at                 1488 non-null   object 
 4   in_response_to_tweet_id    1488 non-null   float64
 5   in_response_to_tweet_text  1488 non-null   object 
 6   text                       1488 non-null   object 
 7   response_tweet_id          433 non-null    object 
 8   palm_response              1488 non-null   object 
 9   gemini_llm_response        1488 non-null   object 
 10  palm_score                 314 non-null    float64
 11  palm_justifcation          314 non-null    object 
 12  gemini_1_score             392 non-null    float64
 13  gemini_justifcation        392 non-null    float64

In [598]:
test_df.to_csv("llm_responses.csv",index=False)

In [754]:
# create prompt template 

from langchain import PromptTemplate

template_temp = """
You are a customer service Manager on twitter for a fastfood chain Chipotle which serves mexican fastfood in the Tex-Mex style  

    1. Understandability: response is  easy to understand, any technical terms or jargon are explained adequately.Ignore
    2. Actionability: offer potential solutions, next steps for the customer,  assurances that the issue will be addressed or gather more information
    

GIVEN ANSWER: {llm_response}
TRUE ANSWER: {human_response}

Example 1: 
  Text: 129 Kings street, MountainView CA, 
  Answer is : The provided context does not mention anything about 129 Kings street, MountainView CA, , so I cannot answer this question.
  score is : 2
  
Example 2: 
  Text: The rice was probably undercooked"
  Answer is :  I'm sorry to hear that your rice was crunchy. Unfortunately, I can't find any information in the provided text about what can make rice less watery and have a richer flavor.
  score is : 1
  

SCORE: Assign a single score of 0, 1, or 2 to the given answer.  Evaluate which response is more effective in terms of engaging with the user and addressing their concerns comprehensively.

JUSTIFICATION: (Provide reasoning for your score and explain why the GIVEN answer is scored the way you did, referencing the criteria and examples)

"""

template = """
   you are grading the similarity between two customer service agents. 
   The responses can be worded differently, but evaluate if they are similar to each other. 
   Check if they address the customer's issue.
   The tone, approach and level of problem-solving can vary between the response.
    
    
    Customer issue : {query}
    Customer service agents are below: 
    Customer Service agent 1: {human_response}
    Customer Service agent 2: {llm_response}
    
    
    If customer service agent 2's response is not a valid response to the customer issue, the score is 0.
    If customer service agent 2's response, varies from customer service agent 1's response but varies in tone, approach, wording and level of problem solving, the score is 1.
    
    SCORE: (Provide a score of 0 or 1 only)
    
   JUSTIFICATION: (Provide reasoning for your score)
"""

prompt = PromptTemplate(
    input_variables=["query","llm_response", "human_response"],
    template=template,
)

def create_formatted_prompt(query,llm_response, human_response):
    # Format the prompt with the given responses
    formatted_prompt = prompt.format(query= query,
        llm_response=llm_response,
        human_response=human_response
    )
    return formatted_prompt

In [820]:
import re

def extract_score_and_justification(text_response):
    """
    Extracts the score and justification from the text response.

    Parameters:
    - text_response (str): The response text from the model which includes the score and justification.

    Returns:
    - tuple: (score (int), justification (str))
    """
    # Regex to find the score in the format "SCORE: X" where X is a number
    print(text_response)
    score_match = re.search(r'**SCORE:**\s*(\d+)', text_response)
    # Extract the score if found
    score = int(score_match.group(1)) if score_match else 0

    # Extract the justification text following the "JUSTIFICATION:" label
    justification_match = re.search(r'**JUSTIFICATION**:\s(.*)', text_response, re.DOTALL)
    # Extract the justification if found
    justification = justification_match.group(1).strip() if justification_match else 0

    return score, justification


# invoke gemini for judgement
def evaluate_response(tweet_id,query,llm_response, human_response, model_name="gemini-1.5-pro",
    temperature=0.1,max_output_tokens=256, top_p=0.8,
                      top_k=5,):
    """
    Evaluates a given response against a human-provided response using a generative model.

    Parameters:
    - llm_response (str): The response generated by the language model.
    - human_response (str): The reference response provided by a human expert.
    - model_name (str): The name of the generative model to use.
    - temperature (float): Sampling temperature for response generation.
    - max_output_tokens (int): Maximum number of tokens in the generated response.
    - top_p (float): Nucleus sampling probability.
    - top_k (int): Top-k sampling value.

    Returns:
    - tuple: (score (int), justification (str))
    """
        
    # format prompt 
    prompt_final = create_formatted_prompt(query, llm_response, human_response)
    # intialize model 
    # print(prompt_final)
    print(query,llm_response, human_response)
    print("Tweet ID:",tweet_id)
    model = GenerativeModel(model_name)
    # generative config
    generation_config = GenerationConfig(
        temperature=temperature,
        max_output_tokens=1024,
        top_p=0.5,
        top_k=5
    )
    responses = model.generate_content(
        prompt_final,
        generation_config=generation_config
    )
    # print(responses.candidates[0].content.parts[0].text)
    print(responses)
    score, justification = extract_score_and_justification(responses.candidates[0].content.parts[0].text)

    return score, justification

In [190]:
# responses = evaluate_response( test_tweet_data_df['llm_response'].loc[10] , test_tweet_data_df['human_response_tweet_text'].loc[10])
# # print(responses.candidates[0].content.parts[0].text)
# # print(responses)

In [833]:
# Apply the function to each row in the DataFrame  palm llm
import time 
# Number of rows to apply the function to
num_rows = 8

# Apply the function to the first few rows
for i in range(1,2):
    row = test_df.iloc[i]
    # print(row)
    palm_score, palm_justification = evaluate_response(tweet_id=row['tweet_id'], query=row['in_response_to_tweet_text'], llm_response= row['palm_response'], human_response=row['text'])
    # print(f"Row: {i} , Score: {palm_score} , Justificaton:  {palm_justification}")
    print(f"Row: {i} , Score: {palm_score} ")
    # test_df.at[i, 'palm_score'] = palm_score
    # test_df.at[i, 'palm_justifcation'] = palm_justification
    test_df.iloc[i, test_df.columns.get_loc('palm_score')] = palm_score
    test_df.iloc[i, test_df.columns.get_loc('palm_justifcation')] = palm_justification



Normally the place I go is outstanding. I’m just hoping this is a fluke. I'm sorry to hear that you had a bad experience. We're always looking for ways to improve our service, so please let us know what happened so we can make it right.
 I hope so too. -Tay
Tweet ID: 2803788.0
candidates {
  content {
    role: "model"
    parts {
      text: "**SCORE:** 1\n\n**JUSTIFICATION:** Customer service agent 2\'s response is a valid response to the customer\'s issue. It acknowledges the customer\'s negative experience and shows a willingness to address the situation. \n\nWhile both agents respond to the customer, they do so with different tones, approaches, wording, and levels of problem-solving. \n\n* **Agent 1** offers a simple, empathetic response but doesn\'t probe for more information or offer solutions. \n* **Agent 2** takes a more proactive approach, expressing apology, encouraging feedback, and promising action to make things right. \n \nTherefore, the score is 1. \n"
    }
  }
  finis

error: nothing to repeat at position 0

In [719]:
# test_tweet_data_df.drop('palm_justification', axis=1, inplace=True)
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1880 entries, 17111 to 391
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   tweet_id                   1488 non-null   float64
 1   author_id                  1488 non-null   object 
 2   inbound                    1488 non-null   object 
 3   created_at                 1488 non-null   object 
 4   in_response_to_tweet_id    1488 non-null   float64
 5   in_response_to_tweet_text  1488 non-null   object 
 6   text                       1488 non-null   object 
 7   response_tweet_id          433 non-null    object 
 8   palm_response              1488 non-null   object 
 9   gemini_llm_response        1488 non-null   object 
 10  palm_score                 343 non-null    float64
 11  palm_justifcation          343 non-null    object 
 12  gemini_1_score             392 non-null    float64
 13  gemini_justifcation        392 non-null    float64

In [771]:
# Number of rows to apply the function to

# Apply the function to the first few rows
for i in range(100,200):
    row = test_df.iloc[i]
    gemini_1_score, gemini_justifcation = evaluate_response(tweet_id=row['tweet_id'], query=row['in_response_to_tweet_text'], llm_response= row['gemini_llm_response'], human_response=row['text'])
    print(f"Row: {i} , Score: {gemini_1_score}")
    test_df.iloc[i, test_df.columns.get_loc('gemini_1_score')] = gemini_1_score
    test_df.iloc[i, test_df.columns.get_loc('gemini_justifcation')] = gemini_justifcation
    # test_df.at[i, 'gemini_1_score'] = gemini_1_score
    # test_df.at[i, 'gemini_justifcation'] = gemini_justifcation
     

In [743]:
test_df.head()

Unnamed: 0,tweet_id,author_id,inbound,created_at,in_response_to_tweet_id,in_response_to_tweet_text,text,response_tweet_id,palm_response,gemini_llm_response,palm_score,palm_justifcation,gemini_1_score,gemini_justifcation
17111,2803785.0,ChipotleTweets,False,2017-11-22 01:30:37+00:00,2803787.0,I ordered through the app. First time their f...,That's a bummer. Were you able to let a manage...,2803786.0,I'm sorry to hear that your food was cold. We ...,I'm sorry to hear that your food was cold. Un...,,,0.0,0.0
17112,2803788.0,ChipotleTweets,False,2017-11-22 02:12:00+00:00,2803786.0,Normally the place I go is outstanding. I’m ju...,I hope so too. -Tay,,I'm sorry to hear that you had a bad experienc...,"I'm sorry, but the provided text does not men...",,,0.0,0.0
17113,2803789.0,ChipotleTweets,False,2017-11-22 01:30:01+00:00,2803791.0,I haven't had in months and I can say I reall...,That's no goo. Which location did you go to? -Tay,2803790.0,The food was not fresh and the meat was not co...,The user did not state what restaurant they w...,,,,
17114,2803792.0,ChipotleTweets,False,2017-11-22 02:05:52+00:00,2803790.0,"239 S. Kings Dr. In Charlotte, NC",Bummer. Please be sure to let a manager know o...,,I don't know.\n,The provided context does not mention anythin...,,,,
17115,2803793.0,ChipotleTweets,False,2017-11-22 01:26:00+00:00,2803794.0,I just experience the world record for fastest...,Sounds like a great day to me. -Tay,,"The answer is ""I just experience the world rec...","I'm sorry, but the provided text does not con...",,,,


In [798]:
evaluvation_df = test_df[['gemini_1_score', 'palm_score']].head(100)

In [816]:
evaluvation_df.head(100)
average_values = evaluvation_df.mean()
total_rows = len(evaluvation_df)
average_values = (evaluvation_df.sum() / (total_rows)) * 100
print(average_values)

accuracy_df = pd.DataFrame({
    'Model Name': average_values.index,
    'Accuracy (%)': average_values.values
})

accuracy_df.to_csv("model_accuracy.csv",index=False)

gemini_1_score    10.0
palm_score         3.0
dtype: float64
