<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Gen AI Experiments](https://img.shields.io/badge/Gen%20AI%20Experiments-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://github.com/buildfastwithai/gen-ai-experiments)
[![Gen AI Experiments GitHub](https://img.shields.io/github/stars/buildfastwithai/gen-ai-experiments?style=for-the-badge&logo=github&color=gold)](http://github.com/buildfastwithai/gen-ai-experiments)


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1XXSxmmbzlYHjIVZImjadJZ20JW268v6E?usp=sharing)





# DeepEval



DeepEval is an open-source framework for evaluating LLM outputs across multiple dimensions, including factual consistency, faithfulness, and relevance.

## Setting Up DeepEval

### Installations

In [None]:
!pip install deepeval openai

### Imports

In [60]:
import os

from google.colab import userdata
from openai import OpenAI

from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric, GEval, HallucinationMetric
from deepeval.test_case import LLMTestCase, LLMTestCaseParams


### Set Up the OpenAI Client

In [38]:
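# Read the key from Colab's secrets store
api_key = userdata.get('OPENAI_API_KEY')
# DeepEval's judge model reads the key from the environment
os.environ['OPENAI_API_KEY'] = api_key
client = OpenAI(api_key=api_key)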

### Simple Chat Setup  
Model: `gpt-4o`

In [39]:
prompt = "Explain Stock Market"
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=100
)


### LLM-Generated Response

In [40]:
generated_response = response.choices[0].message.content

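## Provide the Expected Answer

This reference answer is stored in `actual_answer` and passed to the test case as `expected_output`.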

In [41]:
actual_answer ="""The stock market is a marketplace where investors trade shares—units of ownership in publicly listed companies.
Prices move based on supply and demand, influenced by company performance, economic indicators, and market sentiment.
It enables companies to raise capital for growth while offering investors opportunities for profit, though it carries inherent risks."""

## Define Test Case

In [42]:
test_case = LLMTestCase(
    input=prompt,
    actual_output=generated_response,
    expected_output=actual_answer
)

## Answer Relevancy

* This metric measures how relevant the `actual_output` of your LLM application is to the provided `input`.
* It does not use `expected_output`: it checks whether the response addresses the question, not whether it matches a reference answer.

In [None]:
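# Use gpt-4o as the judge model; the test case passes if the score is at least 0.5
relevancy_metric = AnswerRelevancyMetric(threshold=0.5, model="gpt-4o")
# Pass the configured metric rather than a fresh default instance
results = evaluate([test_case], metrics=[relevancy_metric])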
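
As a rough illustration of what the metric reports, DeepEval's documentation describes the answer relevancy score as the share of statements in `actual_output` that are relevant to `input`. The sketch below reproduces only that arithmetic in plain Python. The `StatementVerdict` class, the `answer_relevancy_score` helper, and the hand-labelled verdicts are hypothetical stand-ins for what the LLM judge would produce; they are not part of DeepEval's API.

```python
from dataclasses import dataclass


@dataclass
class StatementVerdict:
    """One statement extracted from the model output (hypothetical helper)."""
    statement: str
    relevant: bool  # in a real run, an LLM judge decides this


def answer_relevancy_score(verdicts: list[StatementVerdict]) -> float:
    """Return the fraction of statements judged relevant to the input."""
    if not verdicts:
        # Assumption of this sketch: no statements means nothing relevant.
        return 0.0
    relevant_count = sum(1 for v in verdicts if v.relevant)
    return relevant_count / len(verdicts)


# Hand-labelled verdicts for an answer to "Explain Stock Market" (made up).
sample_verdicts = [
    StatementVerdict("The stock market is where shares are traded.", True),
    StatementVerdict("Prices move with supply and demand.", True),
    StatementVerdict("Companies list shares to raise capital.", True),
    StatementVerdict("My favourite colour is blue.", False),
]

relevancy_score = answer_relevancy_score(sample_verdicts)
relevancy_threshold = 0.5
relevancy_passed = relevancy_score >= relevancy_threshold  # higher is better

print(f"score={relevancy_score:.2f} passed={relevancy_passed}")
```

Three of the four statements are relevant, so this sample scores 0.75 and clears the 0.5 threshold used in the cell above.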

## GEval Metric
GEval uses an LLM judge to score responses against custom criteria such as accuracy, completeness, and coherence.

In [51]:
criteria = """Coherence (1-5) - the collective quality of all sentences. We align this dimension with
the DUC quality question of structure and coherence whereby the summary should be
well-structured and well-organized. The summary should not just be a heap of related information, but should build from sentence to sentence to a coherent body of information about a topic."""

coherence_metric = GEval(
    name="Coherence",
    criteria=criteria,
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
)

### Test Case with an LLM-Generated Response

In [58]:
test_case_1 = LLMTestCase(
    input="Explain stock market",
    actual_output="The stock market is basically a big game where people swap company pieces for fun.",
    expected_output=actual_answer
)


### Run the GEval Metric

In [None]:
coherence_metric.measure(test_case_1)
print(coherence_metric.score, coherence_metric.reason)

In the original run the score was 0.2: GEval flagged the inaccurate analogy. Scores come from an LLM judge, so they can vary between runs.
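
To give a feel for where a fractional score like this can come from, the G-Eval approach weights each candidate rating by the probability the judge model assigns to it and then sums the results. The sketch below shows that weighted sum in plain Python. The `weighted_geval_score` helper, the probabilities, and the 0-10 rating scale divided by 10 are all assumptions made for illustration; they are not taken from the run above or from DeepEval's internals.

```python
def weighted_geval_score(rating_probs: dict[int, float], max_rating: int = 10) -> float:
    """Probability-weighted rating, normalised to the 0-1 range.

    rating_probs maps each candidate rating to the probability the judge
    assigned to it (hypothetical values in this sketch).
    """
    total_prob = sum(rating_probs.values())
    if total_prob <= 0:
        raise ValueError("rating probabilities must sum to a positive value")
    # Expected rating under the (renormalised) probability distribution.
    expected_rating = sum(r * p for r, p in rating_probs.items()) / total_prob
    return expected_rating / max_rating


# Made-up distribution: the judge mostly favours a rating of 2 out of 10.
sample_probs = {1: 0.2, 2: 0.6, 3: 0.2}
geval_score = weighted_geval_score(sample_probs)

print(f"normalised score: {geval_score:.2f}")
```

Weighting by probability yields finer-grained scores than taking the single most likely rating, which is why results are not limited to whole steps on the rubric.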

## Hallucination: Detecting False Information

In [68]:
context = [
    "Google was founded in September 1998",
    "Google has become an integral part of our daily lives, serving as the go-to source for information, historical research, entertainment, and communication",
]

In [69]:
# AI Model’s Response (Potential Hallucination)
actual_output = "Google was founded in august 1998 and initially focused on robotics"

In [None]:
# Define the test case
test_case = LLMTestCase(
    input="When was Google founded and what was its focus?",
    actual_output=actual_output,
    context=context
)

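# Define the hallucination metric. Lower scores are better here, so the
# threshold is a maximum: the test case passes only if the score is <= 0.3
metric = HallucinationMetric(threshold=0.3)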

# Evaluate the response
evaluate(test_cases=[test_case], metrics=[metric])
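
For intuition about the pass/fail outcome, DeepEval's documentation describes the hallucination score as the number of contexts the output contradicts divided by the total number of contexts. The sketch below mirrors that calculation in plain Python. The `hallucination_score` helper and the hand-assigned verdicts are hypothetical; in a real run the judge model decides which contexts are contradicted, and it may not agree with these labels.

```python
def hallucination_score(contradicted: list[bool]) -> float:
    """Return the fraction of contexts that the output contradicts."""
    if not contradicted:
        raise ValueError("at least one context is required")
    return sum(contradicted) / len(contradicted)


# Hand-assigned verdicts for the two contexts above (assumed, not judged):
# the founding month conflicts with the first context; the second context
# is treated as not contradicted.
context_verdicts = [True, False]

halluc_score = hallucination_score(context_verdicts)
halluc_threshold = 0.3
halluc_passed = halluc_score <= halluc_threshold  # lower is better

print(f"score={halluc_score:.2f} passed={halluc_passed}")
```

Under these assumed verdicts the score is 0.5, which exceeds the 0.3 maximum, so the test case would fail.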