# Agent Evaluation
In this notebook we evaluate our Agent using using Giskard

[Colab Notebook](https://colab.research.google.com/github/giskard-ai/giskard/blob/main/docs/getting_started/quickstart/quickstart_llm.ipynb)

## Installs and imports

In [1]:
# %pip install "giskard[llm]" --upgrade
# %pip install langchain langchain-openai langchain-community pypdf faiss-cpu openai tiktoken

In [2]:
# load env vars
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

if os.environ.get('OPENAI_API_KEY', ''):
    print('OPENAI_API_KEY is set')

OPENAI_API_KEY is set


## Get our langchain model
Let's create an instance of our langchain model that will be used in the project

In [3]:
# create an agent
from ai_librarian import AiLibrarian
avatar = AiLibrarian()
avatar.create_worker_agent()

# test the agent
avatar.generate_model_answer('Tell me about The Three Musketeers')


  from tqdm.autonotebook import tqdm, trange
  memory = ConversationBufferWindowMemory(




[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_database_for_book_information` with `{'query_text': 'The Three Musketeers'}`


[0m

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

[36;1m[1;3m{'documents': [["The Three Musketeers (A Stepping Stone Book(TM)) - In seventeenth-century France, young D'Artagnan befriends three musketeers, Athos, Porthos, and Aramis, and joins them in trying to outwit the enemies of the king and queen. Simultaneous. -  Juvenile Fiction , Action & Adventure , General", 'Classic Starts®: The Three Musketeers (Classic Starts® Series) - All for one and one for all! That’s the rallying cry of the Musketeers—guards of the French King—and the call to adventure for young readers enjoying their first taste of Dumas’ classic swashbuckler. Aramis, Athos, Porthos, and the not-quite-yet Musketeer D’Artagnan use their wits and their swords to battle an evil Cardinal, the traitorous Milady, and other enemies of the French court. -  Juvenile Fiction , Classics', "The Three Musketeers - Records the heroic deeds of Athos, Porthos, Aramis, and d'Artagnan as defenders of Louis XIV -  Fiction , Classics"]], 'metadatas': [[{'Authors': 'By Dumas, Alexandre

'"The Three Musketeers," written by Alexandre Dumas, is a captivating tale set in seventeenth-century France. It follows young d\'Artagnan as he finds camaraderie with the three formidable musketeers – Athos, Porthos, and Aramis. Together, they embark on thrilling adventures, battling treachery and conducting themselves with the spirited motto, "All for one and one for all!" This narrative not only highlights themes of friendship and loyalty but also immerses you in the political intrigue of the French court.'

## Create Evaluator

In [10]:
from langchain_openai import ChatOpenAI
from langchain.docstore.document import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

### Answer Score
Let's rate the answer of our model compared to the user query


In [None]:
def answer_score(query: str, context: str) -> bool:
    
    prompt_text = """
    You are an evaluator for a conversational AI system. The user asked:
    "{query}"
    
    The AI responded with:
    "{context}"
    
    Please grade on a scale from 0 to 5 if the response is 
    - Relevant to the question and answers the question
    - A sensible, non-recursive, and complete answer
    - Grammatically correct
    - Engaging and informative
    - Free of factual errors
    - Free of offensive content

    Answer with a single integer from 0 to 5, 0 being the worst and 5 being the best score.
    """

    prompt = ChatPromptTemplate.from_messages(
        [("system", prompt_text),]
    )

    llm = ChatOpenAI(
        temperature=0,
        model_name='gpt-4o-mini',
        api_key=os.getenv("OPENAI_API_KEY")
    )

    chain = create_stuff_documents_chain(llm, prompt)
    document = Document(page_content=context)
    return chain.invoke({"context": [document], "query": query})

query = 'Tell me about The Three Musketeers'
context = avatar.generate_model_answer(query)
print('ANSWER SCORE:', answer_score(query, context))

context = 'The three musketeers is a cartoon with dogs for children'
print('ANSWER SCORE:', answer_score(query, context))



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_database_for_book_information` with `{'query_text': 'The Three Musketeers'}`


[0m

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

[36;1m[1;3m{'documents': [["The Three Musketeers (A Stepping Stone Book(TM)) - In seventeenth-century France, young D'Artagnan befriends three musketeers, Athos, Porthos, and Aramis, and joins them in trying to outwit the enemies of the king and queen. Simultaneous. -  Juvenile Fiction , Action & Adventure , General", 'Classic Starts®: The Three Musketeers (Classic Starts® Series) - All for one and one for all! That’s the rallying cry of the Musketeers—guards of the French King—and the call to adventure for young readers enjoying their first taste of Dumas’ classic swashbuckler. Aramis, Athos, Porthos, and the not-quite-yet Musketeer D’Artagnan use their wits and their swords to battle an evil Cardinal, the traitorous Milady, and other enemies of the French court. -  Juvenile Fiction , Classics', "The Three Musketeers - Records the heroic deeds of Athos, Porthos, Aramis, and d'Artagnan as defenders of Louis XIV -  Fiction , Classics"]], 'metadatas': [[{'Authors': 'By Dumas, Alexandre

### Sanity Check
A sanity check is a simple check to ensure that the agent’s response is not nonsensical, empty, or wildly irrelevant. For instance, a response should not be blank, too short, or completely unrelated.

In [16]:
def sanity_check(query: str, context: str) -> bool:
    
    prompt_text = """
    You are an evaluator for a conversational AI system. The user asked:
    "{query}"
    
    The AI responded with:
    "{context}"
    
    Please determine if the response is:
    - Relevant to the question
    - A sensible, non-recursive, and complete answer

    Answer with only "True" if it passes the sanity check and "False" if it does not.
    """

    prompt = ChatPromptTemplate.from_messages(
        [("system", prompt_text),]
    )

    llm = ChatOpenAI(
        temperature=0,
        model_name='gpt-4o-mini',
        api_key=os.getenv("OPENAI_API_KEY")
    )

    chain = create_stuff_documents_chain(llm, prompt)
    document = Document(page_content=context)
    return chain.invoke({"context": [document], "query": query})

query = 'Tell me about The Three Musketeers'
context = avatar.generate_model_answer(query)
print('SANITY CHECK:', sanity_check(query, context))

context = 'The three musketeers is a cartoon with dogs for children'
print('SANITY CHECK:', sanity_check(query, context))



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mIn "The Three Musketeers," written by Alexandre Dumas, we follow the spirited journey of d’Artagnan, a young man eager to become a musketeer in the royal guards. Rallying alongside the iconic trio—Athos, Porthos, and Aramis—They are soon entangled in intrigues, facing the cunning Milady and navigating the complex political landscape of 17th-century France. Rich in themes of loyalty, bravery, and friendship, the story celebrates the unbreakable bonds between comrades, illuminating both exhilarating adventures and deep-seated motives of honor in their quest for justice. There's something timeless about the story's capacity to evoke excitement and camaraderie that continues to resonate with readers today.[0m

[1m> Finished chain.[0m
SANITY CHECK: True
SANITY CHECK: False


### Response Time
Let's create a method to calculate the response time

In [None]:
import time

def generate_model_answer_timed(query: str) -> tuple[str, float]:
    start = time.time()
    answer = avatar.generate_model_answer(query)
    end = time.time()
    time_diff = end - start
    return answer, time_diff

query = 'Tell me about the Three Musketeers'
answer, gen_time = generate_model_answer_timed(query)
print(answer)
print(f"Response time: {gen_time} seconds")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_database_for_book_information` with `{'query_text': 'The Three Musketeers'}`


[0m

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

[36;1m[1;3m{'documents': [["The Three Musketeers (A Stepping Stone Book(TM)) - In seventeenth-century France, young D'Artagnan befriends three musketeers, Athos, Porthos, and Aramis, and joins them in trying to outwit the enemies of the king and queen. Simultaneous. -  Juvenile Fiction , Action & Adventure , General", 'Classic Starts®: The Three Musketeers (Classic Starts® Series) - All for one and one for all! That’s the rallying cry of the Musketeers—guards of the French King—and the call to adventure for young readers enjoying their first taste of Dumas’ classic swashbuckler. Aramis, Athos, Porthos, and the not-quite-yet Musketeer D’Artagnan use their wits and their swords to battle an evil Cardinal, the traitorous Milady, and other enemies of the French court. -  Juvenile Fiction , Classics', "The Three Musketeers - Records the heroic deeds of Athos, Porthos, Aramis, and d'Artagnan as defenders of Louis XIV -  Fiction , Classics"]], 'metadatas': [[{'Authors': 'By Dumas, Alexandre

### Evaluate Readability
The Flesch Reading Ease score is a popular readability test used to evaluate how easy a text is to read. It’s particularly useful in ensuring your chatbot's responses are clear and accessible to your target audience.

Score Interpretation
The Flesch Reading Ease Score ranges from 0 to 100, with higher scores indicating easier readability. Here’s a breakdown:

Score Range	Description	Readability Level  
90 - 100	Very Easy	5th grade  
80 - 89	Easy	6th grade  
70 - 79	Fairly Easy	7th grade  
60 - 69	Standard	8th-9th grade  
50 - 59	Fairly Difficult	High school  
30 - 49	Difficult	College  
0 - 29	Very Confusing	College graduate and above  


In [None]:
from textstat import flesch_reading_ease

def readability_score(response: str) -> float:
    # Higher scores indicate easier readability
    return flesch_reading_ease(response)

query = 'Tell me about the Three Musketeers'
answer = avatar.generate_model_answer(query)
score = readability_score(answer)
print(f"Readability score: {score}")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m"The Three Musketeers," a novel by Alexandre Dumas, transports readers to a bustling France during the 17th century and revolves around the determined young d’Artagnan, who dreams of becoming a musketeer. His journey intertwines with the clever and loyal trio—Athos, Porthos, and Aramis—who embrace a philosophy encapsulated in the phrase "All for one, and one for all!" The story explores themes of friendship, honor, and intrigue, showcasing d'Artagnan's missions as he navigates plots against King Louis XIII, faced with characters like Cardinal Richelieu and the cunning Milady de Winter. This dynamic and vivid adventure continues to resonate with those who crave tales of camaraderie and heroism.[0m

[1m> Finished chain.[0m
Readability score: 44.58


### Response length
Let's see how long an answer is in terms of characters, words and tokens

In [None]:
import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")  # for gpt4o models

def answer_length(text: str) -> int:
    # initialise the tokenizer

    characters = len(text)
    words = len(text.split())
    tokens = len(encoding.encode(text))

    return characters, words, tokens

query = 'Tell me about the Three Musketeers'
answer = avatar.generate_model_answer(query)
length = answer_length(answer)
print(f"Character count: {length[0]}")
print(f"Word count: {length[1]}")
print(f"Token count: {length[2]}")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_database_for_book_information` with `{'query_text': 'The Three Musketeers'}`


[0m

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

[36;1m[1;3m{'documents': [["The Three Musketeers (A Stepping Stone Book(TM)) - In seventeenth-century France, young D'Artagnan befriends three musketeers, Athos, Porthos, and Aramis, and joins them in trying to outwit the enemies of the king and queen. Simultaneous. -  Juvenile Fiction , Action & Adventure , General", 'Classic Starts®: The Three Musketeers (Classic Starts® Series) - All for one and one for all! That’s the rallying cry of the Musketeers—guards of the French King—and the call to adventure for young readers enjoying their first taste of Dumas’ classic swashbuckler. Aramis, Athos, Porthos, and the not-quite-yet Musketeer D’Artagnan use their wits and their swords to battle an evil Cardinal, the traitorous Milady, and other enemies of the French court. -  Juvenile Fiction , Classics', "The Three Musketeers - Records the heroic deeds of Athos, Porthos, Aramis, and d'Artagnan as defenders of Louis XIV -  Fiction , Classics"]], 'metadatas': [[{'Authors': 'By Dumas, Alexandre

### Lexical Overlap

In [51]:
from sentence_transformers import SentenceTransformer, util
from nltk.tokenize import word_tokenize
import numpy as np
from nltk.corpus import stopwords

# Initialize the model
model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_overlap(query: str, answer: str, stopwords: set) -> float:
    # Tokenize and filter stop words
    query_words = set([word for word in word_tokenize(query.lower()) if word not in stopwords])
    answer_words = set([word for word in word_tokenize(answer.lower()) if word not in stopwords])
    
    # Compute embeddings
    query_embedding = model.encode(' '.join(query_words))
    answer_embedding = model.encode(' '.join(answer_words))
    
    # Calculate cosine similarity
    similarity_score = util.cos_sim(query_embedding, answer_embedding).item()
    
    return similarity_score

# Example usage
stopwords_en = stopwords.words('english')
query = 'Tell me about the Three Musketeers'
answer = avatar.generate_model_answer(query)
semantic_score = semantic_overlap(query, answer, stopwords_en)
print(f"Semantic overlap score: {semantic_score}")




[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m"The Three Musketeers," written by Alexandre Dumas, tells the story of the young and passionate d'Artagnan who sets off to join the King’s Musketeers in France during the 17th century. He quickly befriends the titular three musketeers—Athos, Porthos, and Aramis. Together, they embark on a series of thrilling adventures, battling political intrigue and upholding their values of loyalty, honor, and friendship. The novel beautifully illustrates the tension between personal ambition and loyalty to friendship, encapsulated in its famous motto: "All for one and one for all!"[0m

[1m> Finished chain.[0m


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Semantic overlap score: 0.5107448101043701


### Entity Agreement
NOT WORKING

In [74]:
import nltk
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')
from nltk import ne_chunk, pos_tag, word_tokenize

def extract_entities(text):
    # Extract named entities from text
    chunks = ne_chunk(pos_tag(word_tokenize(text)))
    print(chunks.leaves())
    entities = [" ".join([token for token, _ in chunk.leaves()])
            for chunk in chunks if hasattr(chunk, 'label')]
    print(f"Extracted entities from '{text}': {entities}")  # Debug output
    return entities

def entity_agreement(query, answer):
    # Extract entities in answer and query
    query_entities = set(extract_entities(query.lower()))
    answer_entities = set(extract_entities(answer.lower()))

    print(f"Entities in query: {query_entities}")  # Debug output
    print(f"Entities in answer: {answer_entities}")  # Debug output

    # Calculate proportion of entities in answer that are also in query
    if not answer_entities:
        return 0
    return len(answer_entities.intersection(query_entities)) / len(answer_entities)

# Example Usage
query = 'Tell me about the Three Musketeers'
answer = 'The Three Musketeers is a novel by Alexandre Dumas.'
# answer = avatar.generate_model_answer(query)
agreement_score = entity_agreement(query, answer)
print(f"Entity Agreement Score: {agreement_score}")


[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     C:\Users\Nath\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\Nath\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!


[('tell', 'VB'), ('me', 'PRP'), ('about', 'IN'), ('the', 'DT'), ('three', 'CD'), ('musketeers', 'NNS')]
Extracted entities from 'tell me about the three musketeers': []
[('the', 'DT'), ('three', 'CD'), ('musketeers', 'NNS'), ('is', 'VBZ'), ('a', 'DT'), ('novel', 'JJ'), ('by', 'IN'), ('alexandre', 'JJ'), ('dumas', 'NN'), ('.', '.')]
Extracted entities from 'the three musketeers is a novel by alexandre dumas.': []
Entities in query: set()
Entities in answer: set()
Entity Agreement Score: 0
