# Evaluation Part

Evaluate LLM responses where there isn't a single "right answer."

## Setup
#### Load the API key and relevant Python libraries.
The cell below imports the libraries and sets the OpenAI API key.

In [1]:
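import os
import openai

# Read the key from the OPENAI_API_KEY environment variable instead of hard-coding it.
openai.api_key = os.getenv("OPENAI_API_KEY")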

In [2]:
def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0, max_tokens=500):
    # Uses the legacy interface of the openai package (versions below 1.0).
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    # Return only the text of the first choice.
    return response.choices[0].message["content"]
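The helper above calls `openai.ChatCompletion`, which was removed in version 1.0 of the `openai` package. If you have a 1.0+ release installed, a minimal equivalent might look like the sketch below. The name `get_completion_from_messages_v1` and the optional `client` parameter (added so a stub client can be passed in for testing) are assumptions, not part of the course code.

```python
def get_completion_from_messages_v1(messages, model="gpt-3.5-turbo",
                                    temperature=0, max_tokens=500, client=None):
    """Same inputs and output as get_completion_from_messages, for openai>=1.0 (assumed)."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        from openai import OpenAI
        # OpenAI() reads OPENAI_API_KEY from the environment by default.
        client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    # In openai>=1.0 the message is an object, so use .content instead of ["content"].
    return response.choices[0].message.content
```

Keeping the same signature means the evaluation functions only need to change which helper they call.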

### Run through the end-to-end system to answer the user query

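The cells below set up the data that the chain of prompts from the earlier videos works with: the customer's messages, the company context the assistant relies on, and the assistant's answers.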

In [3]:
customer_msg = {
    "few_shot_user_1": "Hi",
    "few_shot_user_2": "Course detail please",
    "few_shot_user_3": "No",
    "few_shot_user_4": "Facebook",
    "few_shot_user_5": "Ramisha",
    "few_shot_user_6": "Yes,I watched webinar",
    "few_shot_user_7": "Ramisha",
    "few_shot_user_8": "No",
    "few_shot_user_9": "Yes",
    "few_shot_user_10": "Do you provide placement assessment?",
    "few_shot_user_11": "What topics will you cover in the course?",
    "few_shot_user_12": "What is the duration to complete the course?",
    "few_shot_user_13": "Is this course recorded or conducted as live classes?",
    "few_shot_user_14": "How can I clarify my doubts?",
    "few_shot_user_15": "Do you offer flexible payment options?",
    "few_shot_user_16": "Will you issue a certificate after completing the course?",
    "few_shot_user_17": "Can we access the course from outside India?",
}

In [4]:
context = """Welcome to Hope AI, an E-Learning platform committed to providing world-class Artificial Intelligence and Data \
          Science courses. Our goal is to empower individuals passionate about advancing their knowledge in these fields.\
          Our courses offer comprehensive knowledge and practical skills in AI and Data Science, supported by mentorship from industry experts.\
          Founded in 2019, Hope AI started as a startup under the MSME scheme. Since then, we've been dedicated to offering quality education \
          and training in AI and Data Science. Proudly, we successfully converted to a private limited company in December 2022, demonstrating our \
          commitment to providing high-quality education long-term.

          Our mission is to empower individuals with the knowledge and skills to succeed in the AI and Data Science industry.\
          Through expert mentorship and practical training, we aim to help students achieve their goals and excel in their careers.\
          We believe everyone can become an AI expert, and we're here to help them realize that potential.\
          If anyone asks about the course fee, ask them to watch the webinar, which covers the fees.

        Company Details:
          - Company Name: Hope Artificial Intelligence Private Limited
          - Branding Name: Hope AI
          - Platform: E-learning for Artificial Intelligence and Data Science in Tamil
          - Learning Format: Self-Paced
          - Target Audience: IT, Non-IT, Career Restart, Freshers
          - Customized Career Paths: Based on work experience and academic history
          - Doubt Clarification: 24/7 support through WhatsApp (Chat, Voice, Call, Screen Sharing)
          - Additional Assistance: Resume preparation, LinkedIn Growth

        Tutor Information:
          - Tutor: Ramisha Rani K
          - Experience: 7+ years in AI and Data Science
          - Expertise: Python, Machine Learning, NLP, Time Series Analysis, Deep Learning, ChatGPT API
          - Project Handling: Business Requirements, AI Integration, Data Collection, Preprocessing, Model Building, Evaluation, Deployment

        Reasons for Online Classes:
          - Accessibility: Learn from anywhere, anytime
          - Flexibility: Fits busy schedules
          - Personalized Mentorship: Tailored guidance for each student
          - Convenience: Learn from home, no travel required

        Syllabus Overview:
          - Python Basics
          - Libraries in Python: Numpy, Pandas, Matplotlib, Sklearn, Seaborn
          - Data Science: Loading Dataset, Missing Data, Categorical Data, Feature Scaling
          - Univariate Analysis in Data Science
          - Bivariate Analysis in Data Science
          - Machine Learning: Regression, Classification, Clustering
          - Advanced ML: Feature Selection, Dimensionality Reduction
          - Artificial Neural Network, Convolutional Neural Network, RNN, LSTMs
          - Deep Learning & Computer Vision
          - Time Series Analysis & Forecasting
          - Natural Language Processing: Tokenization, Stemming, Sentiment Analysis, Chatbot Creation
          - Web Development with Django and AI Model Integration
          - Capstone Portfolio Projects

        Placement Support:
          - Intensive placement assistance
          - Course Duration: 2 hours per day for 6 months
          - Extended Support: 3 years
          - Lifetime Course Access

        Capstone Project:
          - Real-time scenario-based projects based on individual interests

        Detailed Syllabus:

        Python Basics:
        - Control Statements, Functions, Lists, Tuples
        - Libraries: Numpy, Pandas, Matplotlib, Sklearn, Seaborn
        - Data Science Fundamentals: Loading Datasets, Data Cleaning, Feature Scaling

        Data Science & Analysis:
        - Univariate Analysis: Central Tendency, Variance, PDF, CDF, Box Plot, etc.
        - Bivariate Analysis: Covariance, Correlation, Hypothesis Testing

        Machine Learning:
        - Regression: Simple & Multiple Linear Regression, Decision Trees, Random Forest
        - Classification: Logistic Regression, SVM, Naive Bayes, KNN
        - Clustering: K-Means, Hierarchical Clustering

        Advanced ML Techniques:
        - Feature Selection, Dimensionality Reduction: PCA, LDA

        Deep Learning:
        - Neural Networks: ANN, CNN, RNN, LSTMs
        - Computer Vision: Object Detection, Image Generation, Face Recognition
        
        Time Series Analysis:
        - Introduction, Visualization, Stationarity, Forecasting Models, ARIMA
        
        Natural Language Processing:
        - Text Processing: Tokenization, Stemming, Lemmatization
        - Sentiment Analysis, Named Entity Recognition, Chatbot Creation
        
        Web Development & AI Integration:
        - Introduction to Django, AI Model Integration
        
        Capstone Portfolio Projects:
        - Classification, Regression, Clustering, Deep Learning, Time Series Analysis, NLP
        
        Placement Support: We provide intensive placement support.
        Course Duration: 2 hours per day for 6 months
        Extended Support: 3 years
        Lifetime course access
        """

In [5]:
assistant_answer = {
    "few_shot_assistant_1": "How can I help you?",
    "few_shot_assistant_2": "Did you attend our free webinar?",
    "few_shot_assistant_3": "How did you Know about us? Facebook,Instagram,Youtube",
    "few_shot_assistant_4": "ok,I would like to know your name.",
    "few_shot_assistant_5": "ok ,Ramisha, I request you to attend our free webinar. You will know about the syllabus, fee details, we provide support for 3 years, life time access and all other information about the course",
    "few_shot_assistant_6": "ok,I would like to know your name.",
    "few_shot_assistant_7": "Okay, Ramisha Rani, during the webinar, I mentioned that you should book a 1:1 call to discuss more about the course. If you encounter any issues while booking the slot, please let me know.",
    "few_shot_assistant_8": "Then, Please book a call",
    "few_shot_assistant_9": "Please share the issue you are facing",
    "few_shot_assistant_10": "Yes,we are providing",
    "few_shot_assistant_11": "Machine learning , Deep learning , Natural language processing , Time series analysis , and Generative AI",
    "few_shot_assistant_12": "Around 6 months. When you put 2 hours per day, you will complete the course",
    "few_shot_assistant_13": "This courses recorded",
    "few_shot_assistant_14": "We have a support team to clear doubts instantly",
    "few_shot_assistant_15": "Yes, we provide various payment plans",
    "few_shot_assistant_16": "Yes ,we provide course completion certificate",
    "few_shot_assistant_17": "Yes, our courses are accessible from outside India",
}


### Evaluate the LLM's answer to the user with a rubric, based on the company context

In [6]:
cust_prod_info = {
    'customer_msg': customer_msg,
    'context': context
}
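The evaluation cells in this notebook interpolate `customer_msg` and `assistant_answer` straight into the prompt, so the grader sees Python dictionary syntax and has to match `few_shot_user_<i>` to `few_shot_assistant_<i>` on its own. A hypothetical helper such as `build_transcript` could pair the turns explicitly as a numbered transcript instead. It assumes both dictionaries use the `few_shot_user_<i>` / `few_shot_assistant_<i>` keys shown above, numbered from 1 with no gaps.

```python
def build_transcript(customer_msg, assistant_answer):
    """Pair the i-th customer message with the i-th assistant answer (hypothetical helper)."""
    n = len(customer_msg)
    if n != len(assistant_answer):
        raise ValueError(f"{n} customer messages but {len(assistant_answer)} answers")
    lines = []
    for i in range(1, n + 1):
        # Keys follow this notebook's naming: few_shot_user_<i> / few_shot_assistant_<i>.
        user = customer_msg[f"few_shot_user_{i}"]
        reply = assistant_answer[f"few_shot_assistant_{i}"]
        lines.append(f"{i}. User: {user}\n   Assistant: {reply}")
    return "\n".join(lines)
```

For example, `build_transcript(customer_msg, assistant_answer)` returns a 17-turn transcript string that could be passed as the `[Submission]` in place of the raw dictionary.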

In [7]:
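def eval_with_rubric(test_set, assistant_answer):
    """Grade the assistant's answers against the company context with a Y/N rubric."""
    # The dicts are interpolated into the prompt as-is, so the grader sees their Python repr.
    cust_msg = test_set['customer_msg']
    context = test_set['context']
    completion = assistant_answer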
    
    system_message = f"""\
    You are an assistant that evaluates how well the customer service agent \
    answers a user question by looking at the context that the customer service \
    agent is using to generate its response. 
    """

    user_message = f"""\
    You are evaluating a submitted answer to a question based on the context \
    that the agent uses to answer the question.
    Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Context]: {context}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

    Compare the factual content of the submitted answer with the context. \
    Ignore any differences in style, grammar, or punctuation.
    Answer the following questions:
    - Is the Assistant response based only on the context provided? (Y or N)
    - Does the answer include information that is not provided in the context? (Y or N)
    - Is there any disagreement between the response and the context? (Y or N)
    - Count how many questions the user asked. (output a number)
    - For each question that the user asked, is there a corresponding answer to it?
      Question 1: (Y or N)
      Question 2: (Y or N)
      ...
      Question N: (Y or N)
    - Of the number of questions asked, how many of these questions were addressed by the answer? (output a number)
    """

    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': user_message}
    ]

    response = get_completion_from_messages(messages)
    return response

In [8]:
evaluation_output = eval_with_rubric(cust_prod_info, assistant_answer)
print(evaluation_output)

- Is the Assistant response based only on the context provided? (Y or N)  
    N

- Does the answer include information that is not provided in the context? (Y or N)  
    Y

- Is there any disagreement between the response and the context? (Y or N)  
    Y

- Count how many questions the user asked. (output a number)  
    17

- For each question that the user asked, is there a corresponding answer to it?  
    Question 1: N  
    Question 2: N  
    Question 3: N  
    Question 4: N  
    Question 5: Y  
    Question 6: Y  
    Question 7: Y  
    Question 8: Y  
    Question 9: Y  
    Question 10: Y  
    Question 11: Y  
    Question 12: Y  
    Question 13: Y  
    Question 14: Y  
    Question 15: Y  
    Question 16: Y  
    Question 17: Y  

- Of the number of questions asked, how many of these questions were addressed by the answer? (output a number)  
    11
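The rubric reply is free text, so inconsistencies are easy to miss by eye: in the run above, 13 questions are marked Y while the final count reports 11. A minimal sketch that pulls the per-question verdicts and the reported count out of the reply so they can be cross-checked. It assumes the grader keeps the `Question k: Y` layout requested in the prompt and puts each number on a line of its own, which the model is not guaranteed to do; `parse_rubric` is a hypothetical name.

```python
import re


def parse_rubric(text):
    """Extract per-question Y/N verdicts and the final 'addressed' count from a rubric reply."""
    verdicts = {int(k): v for k, v in re.findall(r"Question (\d+):\s*([YN])\b", text)}
    # Assumption: the last line holding only an integer is the model's 'addressed' count.
    numbers = re.findall(r"^\s*(\d+)\s*$", text, flags=re.MULTILINE)
    reported = int(numbers[-1]) if numbers else None
    return {
        "verdicts": verdicts,
        "answered_yes": sum(v == "Y" for v in verdicts.values()),
        "reported_addressed": reported,
    }
```

Comparing `answered_yes` with `reported_addressed` flags replies where the grader contradicts itself.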


### Evaluate the LLM's answer to the user based on an "ideal" / "expert" (human generated) answer.

In [9]:
test_set_ideal = {
    'customer_msg': """\
1.Hi
2.Course detail please
3.No
4.Facebook
5.Ramisha
6.Yes,I watched webinar
7.Ramisha 
8.No 
9.Yes
10.Do you provide placement assessment?
11.What topics will you cover in the course?
12.What is the duration to complete the course?
13.Is this course recorded or conducted as live classes?
14.How can I clarify my doubts?
15.Do you offer flexible payment options?
16.Will you issue a certificate after completing the course?
17.Can we access the course from outside India? """,
    
    'ideal_answer':"""\
1.How can I help you?
2.Did you attend our free webinar?
3.How did you Know about us?Facebook,Instagram,Youtube
4.ok,I would like to know your name.
5.ok ,Ramisha, I request you to attend our free webinar. You will know about the syllabus, fee details, we provide support for 3 years, life time access and all other information about the course
6.ok,I would like to know your name.
7.Okay, Ramisha Rani, during the webinar, I mentioned that you should book a 1:1 call to discuss more about the course. If you encounter any issues while booking the slot, please let me know.
8.Then, Please book a call
9.Please share the issue you are facing
10.Yes,we are providing
11.Machine learning , Deep learning , Natural language processing , Time series analysis , and Generative AI
12.Around 6 months. When you put 2 hours per day, you will complete the course
13.This course is recorded
14.We have a support team to clear doubts instantly
15.Yes, we provide various payment plans
16.Yes, we provide course completion certificate
17.Yes, our courses are accessible from outside India """
}
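The expert answers are matched to the questions purely by position in two numbered lists, so a merged or missing line shifts every later pair and the grader ends up comparing unrelated answers. A hypothetical check that compares the item numbers on both sides before calling the grader; it assumes each item starts a new line with `<number>.` as in `test_set_ideal`.

```python
import re


def numbered_items(text):
    """Split a '1.foo\n2.bar' style block into {number: item} (hypothetical helper)."""
    items = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\.\s*(.*\S)", line)
        if match:
            items[int(match.group(1))] = match.group(2)
    return items


def check_alignment(test_set):
    """Return (question numbers without an ideal answer, ideal numbers without a question)."""
    questions = numbered_items(test_set["customer_msg"])
    ideals = numbered_items(test_set["ideal_answer"])
    return sorted(questions.keys() - ideals.keys()), sorted(ideals.keys() - questions.keys())
```

`check_alignment(test_set_ideal)` should return two empty lists before the set is graded.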

### Check if the LLM's response agrees with or disagrees with the expert answer

This evaluation prompt is adapted from the [OpenAI evals](https://github.com/openai/evals/blob/main/evals/registry/modelgraded/fact.yaml) project.

[BLEU score](https://en.wikipedia.org/wiki/BLEU) is another way to evaluate how similar two pieces of text are: it scores the n-gram overlap between a candidate text and a reference text.
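To make the BLEU idea concrete, here is a minimal from-scratch sketch of unsmoothed sentence-level BLEU with a single reference, uniform weights over 1- to 4-grams, lowercased whitespace tokens, and the standard brevity penalty. The name `simple_sentence_bleu` is hypothetical; it is an illustration, not a replacement for a tested library.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU: one reference, uniform weights, whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        total = sum(cand_counts.values())
        # Clipped matches: each candidate n-gram counts at most as often as in the reference.
        overlap = sum((cand_counts & ngram_counts(ref, n)).values())
        if total == 0 or overlap == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(overlap / total) / max_n
    # Brevity penalty: candidates not longer than the reference are scaled by exp(1 - r/c).
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision_sum)
```

Short chat replies often share no 4-grams with the reference, so unsmoothed BLEU collapses to 0 for them. NLTK's `nltk.translate.bleu_score` (which offers smoothing functions) or sacreBLEU are better choices in practice.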

In [10]:
def eval_vs_ideal(test_set, assistant_answer):

    cust_msg = test_set['customer_msg']
    ideal = test_set['ideal_answer']
    completion = assistant_answer
    
    system_message = """\
    You are an assistant that evaluates how well the customer service agent \
    answers a user question by comparing the response to the ideal (expert) response.
    Output a single letter and nothing else. 
    """

    user_message = f"""\
    You are comparing a submitted answer to an expert answer on a given question. Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Expert]: {ideal}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

    Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
    The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
    (A) The submitted answer is a subset of the expert answer and is fully consistent with it.
    (B) The submitted answer is a superset of the expert answer and is fully consistent with it.
    (C) The submitted answer contains all the same details as the expert answer.
    (D) There is a disagreement between the submitted answer and the expert answer.
    (E) The answers differ, but these differences don't matter from the perspective of factuality.
    (F) There is a slight difference between the submitted answer and the expert answer but it is acceptable
    choice_strings: ABCDEF
    """

    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': user_message}
    ]

    response = get_completion_from_messages(messages)
    return response
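The system prompt asks for a single letter, but nothing enforces it, and the letter still needs interpreting. A minimal sketch that validates the reply against the six options listed in the prompt and maps it to pass/fail. Treating only (D), an outright disagreement, as a failure is an assumption for illustration, and `parse_choice`, `is_pass`, and `PASSING` are hypothetical names.

```python
VALID_CHOICES = "ABCDEF"

# Assumption: every option except (D), a disagreement with the expert answer, counts as a pass.
PASSING = set("ABCEF")


def parse_choice(response):
    """Return the single option letter from the grader reply, or raise ValueError."""
    letter = response.strip().strip("()").upper()
    if len(letter) != 1 or letter not in VALID_CHOICES:
        raise ValueError(f"Unexpected grader output: {response!r}")
    return letter


def is_pass(response):
    """True if the grader's letter is in the PASSING set."""
    return parse_choice(response) in PASSING
```

Raising on anything other than a bare letter makes a chatty grader reply visible instead of silently miscounting it.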

In [11]:
assistant_answer

In [12]:
eval_vs_ideal(test_set_ideal, assistant_answer)

'D'
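Grading all 17 turns in one call returns a single letter, here `'D'`, which says the submission disagrees with the expert answer somewhere but not in which turn. A sketch of grading turn by turn instead. `eval_per_turn` and the `grade` callable are hypothetical, and with a real grader each turn costs one API request.

```python
def eval_per_turn(questions, ideals, answers, grade):
    """Grade each turn separately; grade(question, ideal, answer) returns an option letter.

    Example wiring (assumes eval_vs_ideal from above and one string per turn):
    grade = lambda q, ideal, ans: eval_vs_ideal({"customer_msg": q, "ideal_answer": ideal}, ans)
    """
    if not (len(questions) == len(ideals) == len(answers)):
        raise ValueError("questions, ideals and answers must have the same length")
    results = []
    for turn, (question, ideal, answer) in enumerate(zip(questions, ideals, answers), start=1):
        # Strip whitespace so results compare cleanly against single letters.
        results.append((turn, grade(question, ideal, answer).strip()))
    return results
```

The questions and answers can come from `list(customer_msg.values())` and `list(assistant_answer.values())`; the ideal answers would first need to be split into one string per turn.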