# Multihop agent in Langchain with Argilla

In this recipe we will use the [frames-benchmark dataset](https://huggingface.co/datasets/google/frames-benchmark) to evaluate a multi-hop agent in Langchain with Argilla. The agent answers questions that take several retrieval and reasoning steps to resolve. We'll review the responses in the Argilla UI.

In [22]:
%pip install -qqq wikipedia \
    langchain-community \
    langchain_openai langchain-huggingface \
    datasets argilla

Note: you may need to restart the kernel to use updated packages.


## Langchain tool for Wikipedia

In [1]:
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

In [2]:
wikipedia.run("Python_(programming_language)")

'Page: Python (programming language)\nSummary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.\nPython is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.\nGuido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.\nPython consistently ranks as one of the most popular programming languages, and has gained widespread use in the machine learning community.\n\n\n\n

# Langchain prompt for ReAct agent


A ReAct agent follows a prompting strategy that interleaves reasoning with tool use. At each step the model writes a thought, chooses an action (a tool call), reads the resulting observation, and adjusts its next step based on the outcome. The name combines "Reasoning" and "Acting": rather than following a fixed plan, the agent revises its approach as tool results come in.

ReAct agents were introduced in the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629) and are implemented in [Langchain](https://python.langchain.com/v0.1/docs/modules/agents/agent_types/react/).
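To make the loop concrete, here is a minimal, library-free sketch of the Thought / Action / Observation cycle. It is not Langchain's implementation: `scripted_llm`, `lookup`, `react_loop`, and the parsing regex are hypothetical stand-ins, and the "model" is canned so the control flow can run offline.

```python
import re

# Hypothetical tool: an in-memory lookup standing in for the Wikipedia tool.
FACTS = {"France": "The capital of France is Paris."}


def lookup(query: str) -> str:
    return FACTS.get(query, "No result found.")


TOOLS = {"lookup": lookup}


def scripted_llm(transcript: str) -> str:
    """Canned stand-in for a real model: acts once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup\nAction Input: France"
    return "Thought: I now know the final answer.\nFinal Answer: Paris"


ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")


def react_loop(question: str, llm=scripted_llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        # Stop as soon as the model commits to an answer.
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            # Feed the parsing failure back so the model can retry.
            transcript += "Observation: could not parse an action.\n"
            continue
        tool_name, tool_input = match.group(1), match.group(2).strip()
        tool = TOOLS.get(tool_name)
        observation = tool(tool_input) if tool else f"Unknown tool: {tool_name}"
        transcript += f"Observation: {observation}\n"
    return "No answer within the step limit."


answer = react_loop("What is the capital of France?")
```

The growing `transcript` plays the role of the agent scratchpad: every observation is appended so the next model call can see what the previous action returned.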

We can also pull a ready-made ReAct prompt from the Langchain hub and use it in our code. The prompt below is a general-purpose ReAct template, which we will use for the frames-benchmark questions.

In [3]:
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI

tools = [wikipedia]

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/react")



# Create a Langchain agent using Llama 3.1 and Hugging Face

Here we will connect to a Hugging Face inference endpoint from Langchain so that we can use the Llama 3.1 model to generate responses to the frames-benchmark questions. The `langchain_huggingface` package provides the endpoint and chat model wrappers.

In [4]:
# Choose the LLM to use
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

llm = ChatHuggingFace(
    llm=HuggingFaceEndpoint(
        repo_id="meta-llama/Llama-3.1-8B-Instruct",
        task="text-generation",
        max_new_tokens=512,  # a ReAct step needs room for Thought, Action, and Action Input; 20 tokens would truncate it
        do_sample=False,
        repetition_penalty=1.03,
    ),
    verbose=True,
)

# Create an agent executor by passing in the agent and tools
agent_executor = AgentExecutor(
    agent=create_react_agent(llm, tools, prompt),
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
)



## Load a dataset of multi-hop queries and Wikipedia articles

We will load the frames-benchmark dataset, which pairs each question with the Wikipedia articles needed to answer it.

FRAMES (Factuality, Retrieval, And reasoning MEasurement Set) is a benchmark dataset consisting of 824 challenging multi-hop questions that require retrieving and reasoning about information from 2-15 Wikipedia articles. The dataset covers diverse topics and includes labeled reasoning types, gold answers, and relevant articles. FRAMES evaluates the performance of Retrieval-Augmented Generation (RAG) systems on factuality, retrieval accuracy, and reasoning, providing a comprehensive framework for testing and improving language model capabilities.

In [64]:
from datasets import load_dataset

ds = load_dataset("google/frames-benchmark")
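Later cells combine each record's `Prompt` and `wiki_links` columns into the agent input. The sketch below shows one way to do that defensively. It assumes `wiki_links` may arrive either as a list or as the string form of a list; `parse_links`, `build_agent_input`, and the sample record are hypothetical and not part of the dataset or of Langchain.

```python
import ast


def parse_links(raw) -> list[str]:
    """Return wiki links as a list, whether stored as a list or as its string form."""
    if isinstance(raw, (list, tuple)):
        return [str(link) for link in raw]
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal: treat a non-empty value as a single link.
        return [raw] if raw else []
    if isinstance(parsed, (list, tuple)):
        return [str(link) for link in parsed]
    return [str(parsed)]


def build_agent_input(sample: dict) -> str:
    """Join the question and its links into one readable prompt string."""
    links = parse_links(sample.get("wiki_links", []))
    lines = [sample["Prompt"].strip()]
    if links:
        lines.append("Relevant Wikipedia articles:")
        lines.extend(f"- {link}" for link in links)
    return "\n".join(lines)


# Made-up record that mimics the column names used in this recipe.
example = {
    "Prompt": "Which language did Guido van Rossum create?",
    "wiki_links": "['https://en.wikipedia.org/wiki/Guido_van_Rossum']",
}
agent_input = build_agent_input(example)
```

Putting each link on its own line keeps the question and the links visually separate, which plain string concatenation does not.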

# Review agent responses in Argilla

In [None]:
import argilla as rg

client = rg.Argilla()

dataset = rg.Dataset(
    name="google-frames-benchmark",
    settings=rg.Settings(
        fields=[
            rg.TextField(name="input"),
            rg.TextField(name="answer"),
        ],
        questions=[rg.TextQuestion(name="output")],
    ),
)

dataset.create()
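The recipe later logs the dictionary returned by `agent_executor.invoke`, which carries `input` and `output` keys, together with the gold `answer`. It relies on those keys matching the field and question names in the settings above. As a rough sketch, a small helper can check that alignment before logging; `to_record` and the sample response are hypothetical and do not call Argilla.

```python
FIELD_NAMES = ("input", "answer")  # rg.TextField names from the settings above
QUESTION_NAMES = ("output",)  # rg.TextQuestion name from the settings above


def to_record(response: dict, gold_answer: str) -> dict:
    """Keep only the keys the dataset settings know about, coerced to strings."""
    record = {**response, "answer": gold_answer}
    expected = FIELD_NAMES + QUESTION_NAMES
    missing = [key for key in expected if key not in record]
    if missing:
        raise KeyError(f"missing keys for Argilla record: {missing}")
    return {key: str(record[key]) for key in expected}


# Made-up agent response with an extra key that the dataset does not define.
fake_response = {
    "input": "Who wrote Hamlet?",
    "output": "William Shakespeare",
    "intermediate_steps": [],
}
record = to_record(fake_response, gold_answer="William Shakespeare")
```

Dropping unknown keys up front makes a mismatch between the agent output and the dataset settings fail early, before any record is sent.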

## Create a series of Few-Shot Learning prompts

We first run the agent on ten questions and log its predictions to Argilla. Reviewing those predictions in the UI and submitting corrected responses gives us examples that can be used within a Few-Shot Learning prompt, so the agent can draw on the feedback to improve its responses.

In [34]:


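for sample in ds["test"].select(range(10)):
    # Combine the question with its Wikipedia links. A separate name avoids
    # overwriting the ReAct `prompt` pulled from the hub.
    question = sample["Prompt"] + "\n" + str(sample["wiki_links"])
    response = agent_executor.invoke(input={"input": question})
    # Attach the gold answer so it shows up next to the prediction in Argilla.
    response["answer"] = sample["Answer"]
    dataset.records.log([response])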



> Entering new AgentExecutor chain...
To determine the name of your future wife, we need to:

1. Identify the first name of the 15th First Lady of the United States' mother.
2. Identify the maiden name of the mother of the second assassinated president.

Let's proceed step-by-step.

### Step 1: Identify the First Name of the 15th First Lady's Mother

The 15th President of the United States was James Buchanan. His niece, Harriet Lane, served as the First Lady during his presidency because Buchanan was a bachelor.

Thought: I need to find out the first name of Harriet Lane's mother.
Action: wikipedia
Action Input: Harriet Lane

### Step 2: Identify the Maiden Name of the Second Assassinated President's Mother

The second U.S. president to be assassinated was James A. Garfield.

Thought: I need to find out the maiden name of James A. Garfield's mother.
Action: wikipedia
Action Input: James A. Garfield

### Observation Results

#### Harriet Lane
No go

Sending records...: 100%|██████████| 1/1 [00:00<00:00,  1.78batch/s]


## Collect Few-Shot Examples from Argilla

Here we collect the responses submitted in Argilla and turn them into a Few-Shot Learning prompt. Each example pairs an agent input with the reviewed output, showing the agent what a good response looks like.

In [36]:
fewshot_examples = ""

for record in dataset.records(with_responses=True):
    for response in record.responses:
        input_example = record.fields["input"]
        response_example = response.value
        fewshot_examples += f"input: {input_example}\noutput: {response_example}\n\n"

fewshot_examples
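The same formatting can be written as a pure function over (input, output) pairs, which makes it easy to cap how many examples end up in the prompt. This is a sketch only: `format_fewshot` and `max_examples` are hypothetical names, and the pairs are made up rather than pulled from Argilla.

```python
def format_fewshot(pairs, max_examples: int = 3) -> str:
    """Render (input, output) pairs in the same layout as the cell above."""
    blocks = []
    for input_example, output_example in pairs:
        if len(blocks) >= max_examples:
            break
        if not output_example:  # skip records without a submitted response
            continue
        blocks.append(f"input: {input_example}\noutput: {output_example}\n\n")
    return "".join(blocks)


# Made-up pairs; in the recipe they would come from the Argilla records.
pairs = [
    ("What is 2 + 2?", "4"),
    ("Unanswered question", None),
    ("What is the capital of France?", "Paris"),
]
fewshot_text = format_fewshot(pairs, max_examples=2)
```

Capping the number of examples keeps the prompt length predictable as more records are reviewed.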

# Review final agent responses in Argilla

We can now review the final responses from the agent in Argilla to see if the few shot examples have improved the agent's responses.

In [None]:
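for sample in ds["test"].select(range(10, 20)):
    # Build the input from the question and its Wikipedia links, then append
    # the reviewed few-shot examples collected from Argilla.
    question = sample["Prompt"] + "\n" + str(sample["wiki_links"])
    question += "\n\n" + fewshot_examples
    response = agent_executor.invoke(input={"input": question})
    response["answer"] = sample["Answer"]
    dataset.records.log([response])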