# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In the following notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!
  

- 🤝 Breakout Room #2:
  1. Creating an Evaluation Dataset
  2. Adding Evaluators
  3. Evaluating

# 🤝 Breakout Room #1

## LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we have cycles rather than plain loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies

We'll first install all our required libraries.

In [14]:
!pip install -qU langchain langchain_openai langgraph arxiv duckduckgo-search


## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [15]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [16]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE1 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [17]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tool_belt = [
    DuckDuckGoSearchRun(),
    ArxivQueryRun()
]

In [18]:
## ^ the functions and tools above are local functions

### Actioning with Tools

Now that we've created our tool belt - we need to create a process that will let us leverage them when we need them.

We'll use the built-in [`ToolExecutor`](https://github.com/langchain-ai/langgraph/blob/main/langgraph/prebuilt/tool_executor.py) to do so.

In [19]:
from langgraph.prebuilt import ToolExecutor

tool_executor = ToolExecutor(tool_belt)

### Model

Now we can set-up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use with LangGraph. LangGraph supports all models - though you might not find success with smaller models - as such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [20]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

In [21]:
#model = ChatOpenAI(model = 'gpt-4-0125-preview', temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [22]:
from langchain_core.utils.function_calling import convert_to_openai_function

functions = [convert_to_openai_function(t) for t in tool_belt]
model = model.bind_functions(functions)

In [23]:
#^^ we used to have to write a lot more specific code to define the functions etc. but now this is very low code
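
To make the binding step more concrete, here is a rough sketch of the kind of JSON-schema description the model receives for each tool in the OpenAI function calling format. The `describe_tool` helper and the two toy functions are hypothetical stand-ins written for illustration; this is not how `convert_to_openai_function` is implemented, and the real conversion derives a richer schema from each tool's declared arguments.

```python
import inspect
import json


def describe_tool(func):
    """Build an OpenAI-style function description from a plain Python function.

    Hypothetical helper: every parameter is treated as a required string.
    """
    params = inspect.signature(func).parameters
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in params},
            "required": list(params),
        },
    }


def web_search(query):
    """Search the web for current events and general knowledge."""


def paper_search(query):
    """Search arXiv for scientific papers."""


# One description per tool; this list is what gets sent alongside the chat messages.
schemas = [describe_tool(f) for f in (web_search, paper_search)]
print(json.dumps(schemas[0], indent=2))
```

The name and description are essentially all the model has to go on when it picks a tool, which is why clear tool descriptions matter.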

#### ❓ Question #1:

How does the model determine which tool to use?

In [24]:
# Answer:
"""

We give the model a set of tools and it decides whether to answer directly or to call one of them. A few things are important here. First, ChatOpenAI is LangChain's wrapper around OpenAI's
chat models, and the underlying GPT models have been fine-tuned for function calling. That training helps the model reason about which of the tools (functions) is most appropriate given the user's query.

convert_to_openai_function also matters: it turns each tool's name, description, and argument schema into the JSON format the OpenAI API expects. The model reads those names and descriptions
when choosing a tool, so the conversion both informs the choice and makes the resulting call parseable. The model never executes anything itself; our own code runs the tool it names.

In short, tool names and descriptions are important: the model (GPT in this case) is made aware of what tools it has and then 'reasons' to choose the best tool for the job.

"""

## Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a `Sequence` of `BaseMessage` objects that is appended to whenever a node returns new messages.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AIMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AIMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!

In [25]:
from typing import TypedDict, Annotated, Sequence
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[Sequence[BaseMessage], operator.add]
  
  ## operator.add is the reducer for this key: the list a node returns under "messages" is appended to the existing list rather than replacing it
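
As a rough illustration of what the `operator.add` annotation implies, the sketch below merges node updates into a state by looking up the reducer attached to each key. `SketchState` and `apply_update` are hypothetical names, plain strings stand in for messages, and this only mimics the behaviour described above; it is not LangGraph's actual merging code.

```python
import operator
from typing import Annotated, Sequence, TypedDict, get_type_hints


class SketchState(TypedDict):
    # Same shape as AgentState, with plain strings standing in for messages.
    messages: Annotated[Sequence[str], operator.add]


def apply_update(state, update):
    """Merge a node's partial update into the state using each key's reducer."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducer = hints[key].__metadata__[0]  # operator.add for "messages"
        merged[key] = reducer(state[key], value)
    return merged


state = {"messages": ["human: What is RAG?"]}
state = apply_update(state, {"messages": ["ai: (function_call: search)"]})
state = apply_update(state, {"messages": ["function: search results"]})
print(state["messages"])
```

Each node only returns the new messages, and the history accumulates because the reducer concatenates lists.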

## It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [26]:
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

def call_tool(state):
  last_message = state["messages"][-1]

  action = ToolInvocation(
      tool=last_message.additional_kwargs["function_call"]["name"],
      tool_input=json.loads(
          last_message.additional_kwargs["function_call"]["arguments"]
      )
  )

  response = tool_executor.invoke(action)

  function_message = FunctionMessage(content=str(response), name=action.tool)

  return {"messages" : [function_message]}

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `call_tool` is a node which will call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [27]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

workflow.add_node("agent", call_model)
workflow.add_node("action", call_tool)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [28]:
workflow.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks if there is a "function_call" kwarg present.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that the dictionary passed in as the third parameter (the mapping) should be created with the possible outputs of our conditional function in mind. In this case `should_continue` outputs either `"continue"` or `"end"`, which are mapped to the action node and to END respectively.

In [29]:
def should_continue(state):
  last_message = state["messages"][-1]

  if "function_call" not in last_message.additional_kwargs:
    return "end"

  return "continue"

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)
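
The relationship between the conditional function and the mapping can be sketched in plain Python. In this illustration the messages are ordinary dictionaries, `END` is a placeholder string rather than the object imported from `langgraph.graph`, and `next_node` is a hypothetical helper that plays the role of the conditional edge.

```python
END = "__end__"  # placeholder sentinel for this sketch, not imported from langgraph

# Labels returned by the conditional function -> destination nodes.
mapping = {"continue": "action", "end": END}


def should_continue(state):
    """Return a label, not a node name; the mapping turns it into a destination."""
    last_message = state["messages"][-1]
    if "function_call" not in last_message["additional_kwargs"]:
        return "end"
    return "continue"


def next_node(state):
    """Hypothetical stand-in for the conditional edge: label lookup in the mapping."""
    return mapping[should_continue(state)]


wants_tool = {
    "messages": [
        {
            "content": "",
            "additional_kwargs": {
                "function_call": {"name": "duckduckgo_search", "arguments": "{}"}
            },
        }
    ]
}
final_answer = {"messages": [{"content": "RAG is ...", "additional_kwargs": {}}]}

print(next_node(wants_tool))    # action
print(next_node(final_answer))  # __end__
```

If the function ever returned a label that is missing from the mapping, the lookup would fail, which is why the mapping has to cover every possible output.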

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [30]:
workflow.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [31]:
app = workflow.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

In [13]:
#Answer:

"""

Nothing in the graph we built limits the number of cycles: the loop only ends when the model stops returning a function call. LangGraph does provide a safety net, though. Each run has a
recursion_limit (25 steps by default) and raises an error once it is exceeded, so the loop cannot literally run forever. We can tighten that limit per call through the config, e.g.
app.invoke(inputs, config={"recursion_limit": 10}).

If we want the graph to stop gracefully instead of raising an error, we could add our own max-cycles mechanism to the conditional function used here:

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)

Inside should_continue we would count the tool calls already recorded in the state and return "end" once the budget is spent (pseudocode):

    cycle_count = number of messages in state["messages"] that contain a function_call
    if cycle_count >= MAX_CYCLES:  # Replace MAX_CYCLES with however many cycles we wish to max out at
        return "end"

    return "continue"

"""
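
One hedged way to cap the loop from inside the graph is to count how many tool requests the agent has already made and route to the end once a budget is spent. The sketch below uses plain dictionaries in place of LangChain messages, `MAX_CYCLES` and `tool_request` are hypothetical names, and the "agent" is a stub that never stops asking for tools. LangGraph's built-in `recursion_limit` setting, passed through the `config` argument of `invoke`, is a simpler safety net when raising an error is acceptable.

```python
MAX_CYCLES = 3  # hypothetical budget of tool calls per run


def should_continue(state):
    messages = state["messages"]
    last_message = messages[-1]
    if "function_call" not in last_message["additional_kwargs"]:
        return "end"
    # Every tool request so far left a message carrying a function_call in the state.
    cycles_so_far = sum(
        1 for m in messages if "function_call" in m["additional_kwargs"]
    )
    if cycles_so_far > MAX_CYCLES:
        return "end"
    return "continue"


def tool_request():
    """Stub agent output that always asks for another tool call."""
    return {"content": "", "additional_kwargs": {"function_call": {"name": "search"}}}


state = {"messages": [{"content": "question", "additional_kwargs": {}}]}
decisions = []
for _ in range(10):
    state["messages"].append(tool_request())
    decision = should_continue(state)
    decisions.append(decision)
    if decision == "end":
        break
    # Stand-in for the action node appending a tool result.
    state["messages"].append({"content": "tool result", "additional_kwargs": {}})

print(decisions)
```

Ending this way leaves an unanswered tool request as the last message, so a fuller version would probably route to a dedicated node that writes a fallback answer instead of going straight to the end.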

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [32]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="What is RAG in the context of Large Language Models? When did it break onto the scene?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is RAG in the context of Large Language Models? When did it break onto the scene?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"RAG in the context of Large Language Models"}', 'name': 'duckduckgo_search'}}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 171, 'total_tokens': 196}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'function_call', 'logprobs': None}),
  FunctionMessage(content='RAG stands for R etrieval- A ugmented G eneration. RAG enables large language models (LLM) to access and utilize up-to-date information. Hence, it improves the quality of relevance of the response from LLM. Below is a simple diagram of how RAG is implemented. Key Takeaways. RAG is a relatively new artificial intelligence technique that can improve the quality of generative AI by allowing large language model (LLMs) to tap additional data resources wit

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "function_call" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool (DuckDuckGo search), added the tool's result to the state object as a `FunctionMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "function_call" `additional_kwarg` and passed the state object to END where we see it output in the cell above!
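
The six steps above can be replayed with a small, dependency-free simulation. Everything here is a stand-in chosen for illustration: `fake_model` and `fake_search` replace the OpenAI model and the search tool, messages are plain dictionaries, and the `while` loop is a hypothetical substitute for the compiled graph's runtime.

```python
import json

END = "__end__"  # placeholder sentinel for this sketch


def fake_model(messages):
    """Stub agent: ask for one search, then answer once a tool result exists."""
    if not any(m["role"] == "function" for m in messages):
        call = {"name": "search", "arguments": json.dumps({"query": "RAG"})}
        return {"role": "ai", "content": "", "additional_kwargs": {"function_call": call}}
    return {
        "role": "ai",
        "content": "RAG is retrieval-augmented generation.",
        "additional_kwargs": {},
    }


def fake_search(query):
    return f"stub results for {query!r}"


def agent(state):
    return {"messages": [fake_model(state["messages"])]}


def action(state):
    call = state["messages"][-1]["additional_kwargs"]["function_call"]
    result = fake_search(**json.loads(call["arguments"]))
    return {"messages": [{"role": "function", "content": result, "additional_kwargs": {}}]}


def should_continue(state):
    last = state["messages"][-1]
    return "action" if "function_call" in last["additional_kwargs"] else END


nodes = {"agent": agent, "action": action}
state = {"messages": [{"role": "human", "content": "What is RAG?", "additional_kwargs": {}}]}
current, trace = "agent", []
while current != END:
    trace.append(current)
    update = nodes[current](state)
    # Append the node's new messages to the running history.
    state = {"messages": state["messages"] + update["messages"]}
    # The agent is followed by the conditional edge; the action always returns to the agent.
    current = should_continue(state) if current == "agent" else "agent"

print(trace)
print([m["role"] for m in state["messages"]])
```

The trace visits the agent, then the action node, then the agent again, and the final state holds one message per step in the walkthrough.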

Now let's look at an example that uses multiple tool calls - all with the same flow!

In [33]:
inputs = {"messages" : [HumanMessage(content="What is QLoRA in Machine Learning? Are their any papers that could help me understand? Once you have that information, can you look up the bio of the first author on the QLoRA paper?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is QLoRA in Machine Learning? Are their any papers that could help me understand? Once you have that information, can you look up the bio of the first author on the QLoRA paper?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"QLoRA in Machine Learning"}', 'name': 'duckduckgo_search'}}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 193, 'total_tokens': 215}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'function_call', 'logprobs': None}),
  FunctionMessage(content="Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Balancing this tradeoff is a 

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

In [34]:
#Answer:
"""
First, the state object was populated with our human message asking what QLoRA is, whether any papers could help, and for the bio of the paper's first author.
Next, the state object was passed into the entry agent node, which added an AIMessage to the state object and passed it along the conditional edge.
Next, the conditional edge saw that 'function_call' was present in the last message, so it sent the state object on to the action node (as opposed to END).
Next, the action node executed the query "QLoRA in Machine Learning" with DuckDuckGo search.
The action node then added the DuckDuckGo response to the state object and passed this BACK to the agent.

Next, the agent created a second AIMessage and added it to the state object, which was passed along the conditional edge.
The conditional edge again saw that 'function_call' was present and passed the state object on to the action node.
The action node queried Arxiv for "QLoRA in machine learning"; Arxiv was the tool for this action.
The action node added the relevant Arxiv context to the state object and passed this back to the agent.
The agent then added a third AIMessage to the state object and passed it to the conditional edge.
The conditional edge again saw that 'function_call' was present and passed the state object back to the action node.
The action node looked up Tim Dettmers' bio using DuckDuckGo search as the tool.
The action node added the relevant Tim Dettmers context to the state object and passed this back to the agent.
The agent added its final response about Tim Dettmers to the state object and passed it back along the conditional edge.
The conditional edge received the state object and this time could not find 'function_call' in the last message, so the graph ended and returned a response!
"""

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [39]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | app | parse_output
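
The `|` composition works even though `convert_inputs` and `parse_output` are plain functions, because the compiled graph in the middle is a `Runnable` that defines the pipe operators and wraps its neighbours. A toy version of that mechanism, assuming nothing about LangChain's internals beyond the use of `__or__` and `__ror__`, might look like the following (`Step` and `toy_app` are hypothetical names):

```python
class Step:
    """Toy stand-in for a Runnable: wraps a callable and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    @staticmethod
    def _coerce(other):
        return other if isinstance(other, Step) else Step(other)

    def __or__(self, other):  # Step | something
        nxt = Step._coerce(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):  # plain_function | Step
        prev = Step._coerce(other)
        return Step(lambda value: self.invoke(prev.invoke(value)))


def convert_inputs(input_object):
    return {"messages": [input_object["question"]]}


def parse_output(state):
    return state["messages"][-1]


# Stand-in for the compiled graph: appends a canned answer to the messages.
toy_app = Step(lambda state: {"messages": state["messages"] + ["canned answer"]})

chain = convert_inputs | toy_app | parse_output
print(chain.invoke({"question": "What is RAG?"}))
```

Python falls back to the right-hand operand's `__ror__` when the left-hand function has no `__or__`, which is what lets a bare function start the chain.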

In [40]:
agent_chain.invoke({"question" : "What is RAG?"})

"RAG stands for Retrieval-augmented generation (RAG). It is an AI framework that improves the quality of language model-generated responses by grounding the model on external sources of knowledge to supplement the model's internal representation of information. RAG ensures that the model has access to current and reliable facts, enhancing the generative AI models with facts from external sources."

# 🤝 Breakout Room #2

## Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [51]:
questions = [
    "What is machine learning?",
    "Who founded the company apple?",
    "What is a GPU?",
    "What is python language?"

]

answers = [
    {"must_mention" : ["predictive", "model"]},
    {"must_mention" : ["Steve", "Jobs"]},
    {"must_mention" : ["graphical", "processing", "compute"]},
    {"must_mention" : ["programming"]},

]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [52]:
from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about data science and tech."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)
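
Because the inputs and outputs are passed as two parallel lists, each question is tied to its answer only by position. A small hypothetical helper, `pair_examples`, can make that pairing explicit and fail fast if the lists drift apart; it is a convenience sketch and not part of the LangSmith client.

```python
questions = [
    "What is machine learning?",
    "Who founded the company apple?",
    "What is a GPU?",
    "What is python language?",
]

answers = [
    {"must_mention": ["predictive", "model"]},
    {"must_mention": ["Steve", "Jobs"]},
    {"must_mention": ["graphical", "processing", "compute"]},
    {"must_mention": ["programming"]},
]


def pair_examples(questions, answers):
    """Pair question i with answer i, refusing lists of different lengths."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; they must align by index"
        )
    return [
        {"inputs": {"question": q}, "outputs": a}
        for q, a in zip(questions, answers)
    ]


pairs = pair_examples(questions, answers)
for pair in pairs:
    print(pair["inputs"]["question"], "->", pair["outputs"]["must_mention"])
```

Printing the pairs before uploading is a cheap way to catch a reordered or missing entry, since a silent misalignment would make every downstream score meaningless.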

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

In [53]:
#Answer


"""

The answers are tied to the questions purely by position: the first entry in `answers` belongs to the first entry in `questions`, and so on, so the two lists must stay the same length and in the same order.

Our LLM will create a response to each question and we are using a binary method of evaluating whether or not that answer is correct.

Basically, if every exact string from our 'must_mention' list is present in the response, then it is deemed 'correct'. Otherwise it is deemed 'incorrect'.

This does not seem like a great evaluation methodology. At a basic level it might be decent but it would be subject to lots of problems. For instance, if I write very relaxed rules, then it will mark many things
as 'correct' which may not be. If I asked 'What is RAG?' and my must-mention list only had 'model', that isn't a very high bar. On the flip side, if you write really stringent rules and not all of the words are present,
the answer may be deemed 'incorrect' when that is not really true.

There could also be issues with exact string matches vs. words that are highly similar but not exact.

Finally, the check is sensitive to capitalization, abbreviations, etc. Again, it doesn't seem like a great eval method.

"""

## Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.

In [54]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)
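
The check above is case-sensitive and all-or-nothing. As a rough illustration of a more forgiving variant, the sketch below lowercases both sides and returns the fraction of required phrases found. `must_mention_score` is a hypothetical function kept free of LangSmith imports; wiring it into an evaluator would presumably mean returning its value as the `score` of an `EvaluationResult`.

```python
def must_mention_score(prediction, required):
    """Return the fraction of required phrases found, ignoring case."""
    if not required:
        return 1.0  # nothing was required, so nothing is missing
    text = prediction.lower()
    hits = sum(1 for phrase in required if phrase.lower() in text)
    return hits / len(required)


prediction = "QLoRA introduces the nf4 data type."
print(must_mention_score(prediction, ["NF4", "NormalFloat"]))  # 0.5
print(must_mention_score(prediction, ["NF4"]))                 # 1.0
```

Substring matching still misses near-synonyms such as "predictions" versus "predictive", so this only addresses the casing and partial-credit gaps.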

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

In [55]:
#Answer

#exact match is an issue, e.g. nf4 vs. NF4

"""

1) One obvious improvement would be to apply .lower() to both the output and the must-mention phrases.
2) Another would be to give partial credit: if you set 5 must-mention words and 4 are mentioned but the 5th isn't, that should count as 0.8 and not 0.
3) Another would be to give credit for word similarity. In my example above, 'What is machine learning?', I had 'predictive' and 'model' in my must-mention list. 'model' was mentioned
but 'predictive' was not. The word 'predictions', however, was mentioned.

"""

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it, and a few others:

- `"criteria"` includes the default criteria which, in this case, means "helpfulness"
- `"cot_qa"` grades whether the answer is correct by using a Chain of Thought prompt together with the provided context.

In [56]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
    evaluators=[
        "criteria",
        "cot_qa",
    ],
)

## Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [57]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - 98066a11' at:
https://smith.langchain.com/o/74600b6d-60cd-531f-91ed-9798b6cba9aa/datasets/544c1c0e-e996-4bc7-9852-d8eed8e71e90/compare?selectedSessions=ab17a0f9-2ef0-401b-9026-fabd97381e7d

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - 92b35fc8 at:
https://smith.langchain.com/o/74600b6d-60cd-531f-91ed-9798b6cba9aa/datasets/544c1c0e-e996-4bc7-9852-d8eed8e71e90
[------------------------------------------------->] 4/4

| | feedback.helpfulness | feedback.COT Contextual Accuracy | feedback.must_mention | error | execution_time | run_id |
|---|---|---|---|---|---|---|
| count | 4.0 | 4.0 | 4 | 0.0 | 4.0 | 4 |
| unique | | | 2 | 0.0 | | 4 |
| top | | | False | | | 2d2e2c3d-6319-4175-865b-0af8b11d20c5 |
| freq | | | 2 | | | 1 |
| mean | 1.0 | 1.0 | | | 1.986474 | |
| std | 0.0 | 0.0 | | | 0.718821 | |
| min | 1.0 | 1.0 | | | 1.204407 | |
| 25% | 1.0 | 1.0 | | | 1.608403 | |
| 50% | 1.0 | 1.0 | | | 1.91095 | |
| 75% | 1.0 | 1.0 | | | 2.289021 | |


{'project_name': 'RAG Pipeline - Evaluation - 98066a11',
 'results': {'87651235-e53a-4597-821c-6cab6041675a': {'input': {'question': 'What is machine learning?'},
   'feedback': [EvaluationResult(key='helpfulness', score=1, value='Y', comment='The criterion for this task is "helpfulness". \n\nThe submission provides a clear and concise definition of machine learning, explaining that it is a subset of artificial intelligence that focuses on the development of algorithms and models. It also explains how machine learning works, stating that computers are trained to recognize patterns in data and make intelligent decisions without being explicitly programmed to do so. \n\nThe submission also provides examples of where machine learning can be applied, such as image and speech recognition, natural language processing, healthcare, and finance. This information is helpful for someone who wants to understand what machine learning is and where it can be used.\n\nTherefore, the submission is help