# Text Generation Example

## Introduction

In this notebook, we'll walk through a detailed example of how you can use Valor to evaluate LLMs.

For a conceptual introduction to Valor, [check out our project overview](https://striveworks.github.io/valor/). For a higher-level example notebook, [check out our "Getting Started" notebook](https://github.com/Striveworks/valor/blob/main/examples/getting_started.ipynb).

In [1]:
import json
import torch
from transformers import pipeline
from valor_lite.text_generation import Evaluator, QueryResponse, Context, MetricType
from dotenv import load_dotenv

load_dotenv()

True

## Set up an LLM using Hugging Face

In [2]:
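class LlamaWrapper:
    """Minimal chat client around a Hugging Face text-generation pipeline."""

    def __init__(
        self,
        model_name: str = "meta-llama/Llama-3.2-1B-Instruct",
    ) -> None:
        self.model_name = model_name
        # device_map="auto" lets the pipeline place the model on available devices (GPU if present).
        self.pipe = pipeline(
            "text-generation",
            model=model_name,
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

    def __call__(
        self,
        messages: list[dict[str, str]],
    ) -> str:
        # With chat-style input, "generated_text" holds the whole conversation;
        # the last message is the model's reply.
        output = self.pipe(messages, max_new_tokens=256)
        return output[0]["generated_text"][-1]["content"]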
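`LlamaWrapper` reduces the pipeline to a simple contract: a list of chat messages (`{"role": ..., "content": ...}` dicts) goes in, and the reply text comes out. When iterating on evaluation code without a GPU, a stand-in with the same call signature can be handy. The sketch below is hypothetical (`EchoClient` is not part of Valor or Transformers) and simply echoes the most recent user message.

```python
class EchoClient:
    """Hypothetical stand-in for LlamaWrapper: same call signature, no model download."""

    def __init__(self, prefix: str = "Echo: ") -> None:
        self.prefix = prefix

    def __call__(self, messages: list[dict[str, str]]) -> str:
        # Walk the conversation backwards and answer the latest user turn.
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"),
            "",
        )
        return self.prefix + last_user


stub = EchoClient()
print(stub([{"role": "user", "content": "Who are you?"}]))  # Echo: Who are you?
```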

In [3]:
client = LlamaWrapper()

In [4]:
client([{"role": "user", "content": "Who are you?"}])

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


'I\'m an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."'

## Now, let's evaluate a query!

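First, let's choose the model that will act as the judge for the LLM-based metrics. `Evaluator.openai()` uses OpenAI (gpt-3.5-turbo in the outputs below); the commented-out lines show alternatives that use Mistral or the local Llama wrapper defined above.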

In [5]:
evaluator = Evaluator.openai()
# evaluator = Evaluator.mistral()
# evaluator = Evaluator(client=LlamaWrapper())

In [6]:
query = QueryResponse(
    query="Did John Adams get along with Alexander Hamilton?",
    response="Based on the provided context, John Adams and Alexander Hamilton did not get along. John Adams, during his presidency, had grown independent of his cabinet, often making decisions despite opposition from it. Hamilton, who was accustomed to being regularly consulted by Washington, sent Adams a detailed letter with policy suggestions after his inauguration, which Adams dismissively ignored.\n",
    context=Context(
        groundtruth=[
            "John Adams and Alexander Hamilton did not get along. John Adams had grown independent of his cabinet, often making decisions despite opposition from it.\n",
        ],
        prediction=[
            """Although aware of Hamilton\'s influence, Adams was convinced that their retention ensured a smoother succession. Adams maintained the economic programs of Hamilton, who regularly consulted with key cabinet members, especially the powerful Treasury Secretary, Oliver Wolcott Jr. Adams was in other respects quite independent of his cabinet, often making decisions despite opposition from it. Hamilton had grown accustomed to being regularly consulted by Washington. Shortly after Adams was inaugurated, Hamilton sent him a detailed letter with policy suggestions. Adams dismissively ignored it.\n\nFailed peace commission and XYZ affair\nHistorian Joseph Ellis writes that "[t]he Adams presidency was destined to be dominated by a single question of American policy to an extent seldom if ever encountered by any succeeding occupant of the office." That question was whether to make war with France or find peace. Britain and France were at war as a result of the French Revolution. Hamilton and the Federalists strongly favored the British monarchy against what they denounced as the political radicalism and anti-religious frenzy of the French Revolution. Jefferson and the Republicans, with their firm opposition to monarchy, strongly supported the French overthrowing their king. The French had supported Jefferson for president in 1796 and became belligerent at his loss.""",
            """Led by Revolutionary War veteran John Fries, rural German-speaking farmers protested what they saw as a threat to their liberties. They intimidated tax collectors, who often found themselves unable to go about their business. The disturbance was quickly ended with Hamilton leading the army to restore peace.Fries and two other leaders were arrested, found guilty of treason, and sentenced to hang. They appealed to Adams requesting a pardon. The cabinet unanimously advised Adams to refuse, but he instead granted the pardon, arguing the men had instigated a mere riot as opposed to a rebellion. In his pamphlet attacking Adams before the election, Hamilton wrote that \"it was impossible to commit a greater error.\"\n\nFederalist divisions and peace\nOn May 5, 1800, Adams's frustrations with the Hamilton wing of the party exploded during a meeting with McHenry, a Hamilton loyalist who was universally regarded, even by Hamilton, as an inept Secretary of War. Adams accused him of subservience to Hamilton and declared that he would rather serve as Jefferson's vice president or minister at The Hague than be beholden to Hamilton for the presidency. McHenry offered to resign at once, and Adams accepted. On May 10, he asked Pickering to resign.""",
            """Indeed, Adams did not consider himself a strong member of the Federalist Party. He had remarked that Hamilton\'s economic program, centered around banks, would "swindle" the poor and unleash the "gangrene of avarice." Desiring "a more pliant president than Adams," Hamilton maneuvered to tip the election to Pinckney. He coerced South Carolina Federalist electors, pledged to vote for "favorite son" Pinckney, to scatter their second votes among candidates other than Adams. Hamilton\'s scheme was undone when several New England state electors heard of it and agreed not to vote for Pinckney. Adams wrote shortly after the election that Hamilton was a "proud Spirited, conceited, aspiring Mortal always pretending to Morality, with as debauched Morals as old Franklin who is more his Model than any one I know." Throughout his life, Adams made highly critical statements about Hamilton. He made derogatory references to his womanizing, real or alleged, and slurred him as the "Creole bastard.""",
            """The pair\'s exchange was respectful; Adams promised to do all that he could to restore friendship and cordiality "between People who, tho Seperated [sic] by an Ocean and under different Governments have the Same Language, a Similar Religion and kindred Blood," and the King agreed to "receive with Pleasure, the Assurances of the friendly Dispositions of the United States." The King added that although "he had been the last to consent" to American independence, he had always done what he thought was right. He startled Adams by commenting that "There is an Opinion, among Some People, that you are not the most attached of all Your Countrymen, to the manners of France." Adams replied, "That Opinion sir, is not mistaken... I have no Attachments but to my own Country." King George responded, "An honest Man will never have any other."\nAdams was joined by Abigail in London. Suffering the hostility of the King\'s courtiers, they escaped when they could by seeking out Richard Price, minister of Newington Green Unitarian Church and instigator of the debate over the Revolution within Britain.""",
        ],
    )
)

In [7]:
metric = evaluator.compute_answer_correctness(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "AnswerCorrectness",
    "value": 0.6666666666666666,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}
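Answer correctness compares the statements in the response against the ground truth. One common formulation, assumed here rather than taken from Valor's source, labels each statement a true positive (supported by the ground truth), a false positive (only in the response), or a false negative (only in the ground truth), and reports an F1-style score. The counts below are illustrative inputs, not the judge's actual verdicts.

```python
def answer_correctness_f1(tp: int, fp: int, fn: int) -> float:
    """F1-style score from statement-level counts (assumed formulation).

    tp: response statements supported by the ground truth
    fp: response statements not supported by the ground truth
    fn: ground-truth statements missing from the response
    """
    denominator = tp + 0.5 * (fp + fn)
    # With no statements at all there is nothing to compare; return 0.0 by convention.
    return tp / denominator if denominator > 0 else 0.0


# Illustrative counts: 2 matched, 1 extra, 1 missing.
print(answer_correctness_f1(tp=2, fp=1, fn=1))  # 0.6666666666666666
```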


In [8]:
metric = evaluator.compute_answer_relevance(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "AnswerRelevance",
    "value": 0.16666666666666666,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}
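LLM-judged metrics such as answer relevance typically ask the judge for one verdict per statement and report the fraction that pass. The sketch below assumes a hypothetical JSON reply of the form `{"verdicts": [{"verdict": "yes"}, ...]}`; Valor's actual prompts and response schema may differ.

```python
import json


def fraction_of_yes(judge_reply: str) -> float:
    """Return the share of 'yes' verdicts in a (hypothetical) JSON judge reply."""
    verdicts = json.loads(judge_reply)["verdicts"]
    if not verdicts:
        raise ValueError("judge returned no verdicts")
    # Normalise case and whitespace so "Yes " and "yes" count the same.
    yes = sum(1 for v in verdicts if v["verdict"].strip().lower() == "yes")
    return yes / len(verdicts)


reply = '{"verdicts": [{"verdict": "yes"}, {"verdict": "no"}, {"verdict": "no"}]}'
print(fraction_of_yes(reply))  # 0.3333333333333333
```

Raising on an empty verdict list, instead of returning a number, keeps a malformed judge reply from silently turning into a score.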


In [9]:
metric = evaluator.compute_bias(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "Bias",
    "value": 0.0,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}
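Bias (and, analogously, toxicity) is usually computed over the *opinions* the judge extracts from the response rather than over every statement. A purely factual answer may contain no opinions at all, so the denominator can be zero. The sketch below assumes the score is flagged opinions divided by total opinions and, as a further assumption, defines it as 0.0 when there are no opinions; Valor's own convention may differ.

```python
def opinion_score(flags: list[bool]) -> float:
    """Fraction of extracted opinions flagged by the judge (e.g. as biased).

    flags: one boolean per opinion; True means the judge flagged it.
    Assumption: a response with no opinions scores 0.0.
    """
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


print(opinion_score([]))                    # 0.0
print(opinion_score([False, False, True]))  # 0.3333333333333333
```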


In [10]:
metric = evaluator.compute_sentence_bleu(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "BLEU",
    "value": 0.3502270395690205,
    "parameters": {
        "weights": [
            0.25,
            0.25,
            0.25,
            0.25
        ]
    }
}
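Unlike the judge-based metrics, BLEU is computed directly from the text, and the `weights` parameter above gives equal weight to 1- through 4-gram precision. The following is a from-scratch sketch of unsmoothed sentence-level BLEU (clipped n-gram precision, weighted geometric mean, brevity penalty) with whitespace tokenization. Valor's implementation may tokenize or smooth differently, so this sketch is not expected to reproduce the score above.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(
    reference: str,
    hypothesis: str,
    weights: tuple[float, ...] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        hyp_counts = ngrams(hyp, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum((hyp_counts & ngrams(ref, n)).values())
        if total == 0 or overlap == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_sum += w * math.log(overlap / total)
    # Brevity penalty: penalise hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```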


In [11]:
metric = evaluator.compute_context_precision(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "ContextPrecision",
    "value": 0.8333333333333333,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}
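Context precision rewards retrieval that ranks relevant contexts first. A widely used formulation, assumed here, takes the judge's relevant/irrelevant verdict for each retrieved context in rank order, computes precision@k at every relevant position, and averages those values. The verdict lists below are purely illustrative.

```python
def context_precision(verdicts: list[bool]) -> float:
    """Weighted cumulative precision over ranked context verdicts.

    verdicts[k] is True if the context at rank k + 1 was judged relevant.
    Returns 0.0 when no context is relevant.
    """
    relevant_so_far = 0
    total = 0.0
    for k, is_relevant in enumerate(verdicts, start=1):
        if is_relevant:
            relevant_so_far += 1
            total += relevant_so_far / k  # precision@k, counted only at relevant ranks
    return total / relevant_so_far if relevant_so_far else 0.0


print(context_precision([True, False, True, False]))  # about 0.833, i.e. (1/1 + 2/3) / 2
print(context_precision([False, True, False, True]))  # 0.5
```

The two calls contain the same number of relevant contexts; only their ranking differs, which is exactly what this metric is meant to capture.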


In [12]:
metric = evaluator.compute_context_recall(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "ContextRecall",
    "value": 0.6666666666666666,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}


In [13]:
metric = evaluator.compute_faithfulness(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "Faithfulness",
    "value": 0.8333333333333334,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}


In [14]:
metric = evaluator.compute_hallucination(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "Hallucination",
    "value": 0.5,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}


In [15]:
metrics = evaluator.compute_rouge(query)
for m in metrics:
    print(json.dumps(m.to_dict(), indent=4))

{
    "type": "ROUGE",
    "value": 0.5925925925925926,
    "parameters": {
        "rouge_type": "rouge1",
        "use_stemmer": false
    }
}
{
    "type": "ROUGE",
    "value": 0.5569620253164557,
    "parameters": {
        "rouge_type": "rouge2",
        "use_stemmer": false
    }
}
{
    "type": "ROUGE",
    "value": 0.5925925925925926,
    "parameters": {
        "rouge_type": "rougeL",
        "use_stemmer": false
    }
}
{
    "type": "ROUGE",
    "value": 0.5925925925925926,
    "parameters": {
        "rouge_type": "rougeLsum",
        "use_stemmer": false
    }
}
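ROUGE is also computed directly from the text. The `rouge_type` values above are ROUGE-1 and ROUGE-2 (unigram and bigram overlap) plus ROUGE-L and ROUGE-Lsum, which are based on the longest common subsequence (LCS). The sketch below computes ROUGE-N and ROUGE-L F-measures with a simple lowercase, alphanumeric tokenizer and no stemming (matching `use_stemmer: false`). It is a simplified illustration, not a reimplementation of any particular ROUGE library, so its numbers may differ from Valor's.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/digits (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def f_measure(overlap: int, pred_len: int, ref_len: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / pred_len, overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference: str, prediction: str, n: int) -> float:
    """ROUGE-N F-measure from clipped n-gram overlap."""
    def grams(toks: list[str]) -> Counter:
        return Counter(tuple(toks[i : i + n]) for i in range(len(toks) - n + 1))

    ref, pred = grams(tokenize(reference)), grams(tokenize(prediction))
    overlap = sum((ref & pred).values())
    return f_measure(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Dynamic-programming length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, prediction: str) -> float:
    """ROUGE-L F-measure based on the LCS of the token sequences."""
    ref, pred = tokenize(reference), tokenize(prediction)
    return f_measure(lcs_length(ref, pred), len(pred), len(ref))


ref = "John Adams and Alexander Hamilton did not get along."
pred = "Adams and Hamilton did not get along."
print(rouge_n(ref, pred, 1), rouge_n(ref, pred, 2), rouge_l(ref, pred))
```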


In [16]:
metric = evaluator.compute_summary_coherence(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "SummaryCoherence",
    "value": 4,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}
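Summary coherence is reported as an integer rather than a fraction, which suggests the judge is asked for a score on a fixed scale (a 1 to 5 scale is assumed here). A judge's free-text reply then has to be parsed and validated. The parser below is a hypothetical sketch, not Valor's code.

```python
import re


def parse_score(reply: str, low: int = 1, high: int = 5) -> int:
    """Extract the first integer in a judge reply and check it lies in [low, high].

    Raises ValueError so the caller can decide whether to re-query the judge.
    """
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no integer found in reply: {reply!r}")
    score = int(match.group())
    if not low <= score <= high:
        raise ValueError(f"score {score} outside [{low}, {high}]")
    return score


print(parse_score("Coherence: 4"))  # 4
```

Raising an error instead of guessing leaves the decision to retry or fail with the caller.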


In [17]:
metric = evaluator.compute_toxicity(query)
print(json.dumps(metric.to_dict(), indent=4))

{
    "type": "Toxicity",
    "value": 0.3333333333333333,
    "parameters": {
        "evaluator": "gpt-3.5-turbo",
        "retries": 0
    }
}
