[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb)

In [9]:
!pip install  pinecone-client==2.2.4 
# \
    # langchain==0.0.292 \
    # openai==0.27.4 \
    # datasets==2.10.1 \
    # pinecone-client==2.2.4 \
    # tiktoken==0.5.1


In [1]:
# !pip uninstall openai==0.28.0 --   chromadb, faiss, pinecone, weaviate
from pprint import pprint
from dotenv import dotenv_values
import openai

env_vars = dotenv_values('.env')
openai.api_key = env_vars.get('OPENAI_API_KEY')

### Building a Chatbot (no RAG)

In [1]:
import os
from langchain.chat_models import ChatOpenAI

# read the key from the environment; never commit a real key to the notebook
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or "YOUR_API_KEY"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)


Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```
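
As a rough illustration of how the two layouts relate, the sketch below renders a list of role/content dictionaries into the plain-text form shown earlier. The helper name `render_transcript` and the `ROLE_LABELS` mapping are hypothetical, introduced only for this example; they are not part of the OpenAI or LangChain APIs.

```python
# Hypothetical helper for illustration; not part of any library.
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_transcript(messages):
    """Render role/content dicts as plain text, ending with an open assistant turn."""
    turns = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    # the trailing "Assistant:" is the cue for the model to continue
    turns.append("Assistant:")
    return "\n\n".join(turns)


messages_as_dicts = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]

print(render_transcript(messages_as_dicts))
```

The structured list carries the same information as the transcript; only the trailing open turn is implied rather than written out.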

In LangChain there is a slightly different format. We use three _message_ objects like so:

In [None]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

The format is very similar; we've just swapped the role of `"user"` for `HumanMessage`, and the role of `"assistant"` for `AIMessage`.

We generate the next response from the AI by passing these messages to the `ChatOpenAI` object.

In [None]:
res = chat(messages)
res

AIMessage(content='String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and forces in the universe. It suggests that the most basic building blocks of the universe are not point-like particles, as traditionally thought, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles and forces.\n\nString theory also proposes that there are additional spatial dimensions beyond the familiar three dimensions of space and one dimension of time. These extra dimensions are compactified, meaning they are curled up and not directly observable at everyday scales.\n\nString theory has not yet been experimentally confirmed, and it remains a highly speculative and complex area of research in theoretical physics. Scientists are still working to develop the theory and understand its implications for the nature of reality.', additional_kwargs={}, example=False)

In response we get another AI message object. We can print it more clearly like so:

In [None]:
print(res.content)

String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and forces in the universe. It suggests that the most basic building blocks of the universe are not point-like particles, as traditionally thought, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles and forces.

String theory also proposes that there are additional spatial dimensions beyond the familiar three dimensions of space and one dimension of time. These extra dimensions are compactified, meaning they are curled up and not directly observable at everyday scales.

String theory has not yet been experimentally confirmed, and it remains a highly speculative and complex area of research in theoretical physics. Scientists are still working to develop the theory and understand its implications for the nature of reality.


Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe that string theory has the potential to produce a unified theory of physics because it is a framework that aims to combine all known fundamental forces and particles into a single, coherent description. Currently, there are four fundamental forces in the universe: gravity, electromagnetism, the weak nuclear force, and the strong nuclear force. These forces are described by different theories in physics, such as general relativity for gravity and the Standard Model for the other forces.

One of the main motivations behind string theory is to unify these forces into a single, consistent framework. By positing that the fundamental building blocks of the universe are tiny strings vibrating at different frequencies, string theory offers a way to mathematically describe both quantum mechanics and general relativity within the same framework.

Additionally, string theory has the potential to explain some of the unresolved questions in physics, such as the nature of black ho

We have our chatbot, but as mentioned — the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, like about the new (and very popular) Llama 2 LLM.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

I'm not sure what you are referring to with "Llama 2." Could you please provide more context or clarify your question so I can better assist you?


Our chatbot can no longer help us because it doesn't contain the information we need to answer the question. It is very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond as if it _does_ know the answer, and this can be very hard to detect.

OpenAI have since adjusted the behavior for this particular example as we can see below:

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

I apologize for the confusion earlier. As of my last update, I do not have information about a specific concept or technology called "LLMChain" within LangChain. It is possible that it may be a new development or a specialized feature that I am not familiar with. I recommend checking the latest resources or official documentation related to LangChain to get more information about LLMChain. If you have any other questions or need assistance with a different topic, feel free to ask.


There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

In [None]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [None]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as we were before.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

The LLMChain in LangChain is a common type of chain within the LangChain framework for developing applications powered by language models. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. The LLMChain takes multiple input variables, formats them into a prompt using the PromptTemplate, passes that prompt to the model, and then uses the OutputParser (if provided) to parse the output of the language model into a final format.

In the broader context of LangChain, which is designed to create applications that are data-aware and agentic, the LLMChain plays a key role in processing inputs, interacting with language models, and shaping the output for specific use cases. By leveraging the modular components within the LLMChain, developers can create powerful and differentiated applications that integrate language models with other sources of data and enable language models to interact with their environment.


The quality of this answer is phenomenal. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get this information in the first place?

We learned in the previous chapters about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset.

### Importing the Data

In this task, we import our data using the Hugging Face Datasets library. Specifically, we use the `"jamescalam/llama-2-arxiv-papers-chunked"` dataset, a collection of pre-chunked ArXiv papers that will serve as the external knowledge base for our chatbot.

In [2]:
import PyPDF2
from collections import namedtuple

file_path = '../data/RaptorContract.pdf'

# Load the PDF
with open(file_path, 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    text = ''
    for page_num in range(len(pdf_reader.pages)):
        page = pdf_reader.pages[page_num]
        text += page.extract_text()

# Split the text into general information and main content
general_info = []
main_content = []
in_main_content = False
for line in text.split('\n'):
    if line.startswith('ARTICLE I'):
        in_main_content = True
    if in_main_content:
        main_content.append(line)
    else:
        general_info.append(line)

# Split the main content into sections
sections = []
current_section = []
for line in main_content:
    if line.startswith('Section'):
        if current_section:
            sections.append('\n'.join(current_section))
        current_section = [line]
    else:
        current_section.append(line)
if current_section:
    sections.append('\n'.join(current_section))

# Define the Page namedtuple
Page = namedtuple("Page", ["id", "page_content", "metadata"])

# Create Page objects for each section
pages = []
for section_num, section in enumerate(sections):
    section_lines = section.split('\n')
    section_metadata = {
        'section_num': section_num,
        # Add any other section-level metadata here
    }
    pages.append(Page(id=section_num, page_content=section, metadata=section_metadata))

In [None]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [None]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

### Task 4: Building the Knowledge Base

In [5]:
import pinecone
import os

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or 'YOUR_API_KEY',
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'gcp-starter'
)

Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.

In [8]:
import time

index_name = 'legal-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'text/plain; charset=utf-8', 'x-pinecone-api-version': '2024-04', 'access-control-allow-origin': '*', 'vary': 'origin,access-control-request-method,access-control-request-headers', 'access-control-expose-headers': '*', 'X-Cloud-Trace-Context': '4bb079162df9be3aab459c4b577ab0fd', 'Date': 'Thu, 04 Jul 2024 07:36:00 GMT', 'Server': 'Google Frontend', 'Content-Length': '86', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: Request failed. Your free plan supports 0 starter indexes. Use a different index type.


Then we connect to the index:

In [None]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.04838,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

Our index is now ready but it's empty. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which we can access via LangChain like so:

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

# reuse the key set in the environment earlier rather than hard-coding it
embed_model = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=os.environ["OPENAI_API_KEY"]
)

Using this model we can create embeddings like so:

In [None]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

# embed both chunks; returns one vector per input text
res = embed_model.embed_documents(texts)
len(res), len(res[0])

From this we get two (aligning to our two chunks of text) 1536-dimensional embeddings.
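
To make the `metric='cosine'` setting more concrete, here is a minimal pure-Python sketch of cosine similarity and a top-k ranking over stored vectors. It uses toy 3-dimensional vectors in place of real 1536-dimensional embeddings, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical. It only illustrates the comparison; it is not how Pinecone implements search internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [vec_id for vec_id, _ in ranked[:k]]


# toy vectors standing in for real embeddings (assumption for illustration)
toy_index = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```

A brute-force scan like this is fine for a handful of vectors; a vector database exists precisely so the same kind of query stays fast over many more.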

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding the text, and inserting everything in batches.

In [None]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))

  0%|          | 0/49 [00:00<?, ?it/s]

We can check that the vector index has been populated using `describe_index_stats` like before:

In [None]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.04838,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}
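
The batching arithmetic used in the upsert loop can be checked in isolation. The sketch below uses a hypothetical `batch_ranges` helper that mirrors the `range(0, len(data), batch_size)` and `min(...)` logic, applied to the 4838 rows reported for the dataset.

```python
def batch_ranges(n_rows, batch_size):
    """Yield (start, end) index pairs covering n_rows in steps of batch_size."""
    for start in range(0, n_rows, batch_size):
        yield start, min(n_rows, start + batch_size)


ranges = list(batch_ranges(4838, 100))
print(len(ranges))   # number of upsert calls: 49
print(ranges[-1])    # the final, shorter batch: (4800, 4838)
```

The `min` call is what keeps the last slice from running past the end of the data, so the final batch holds 38 rows instead of 100.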

## Retrieval Augmented Generation

We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [None]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)



Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Llama 2.

In [None]:
query = "What is so special about Llama 2?"

source_documents = vectorstore.similarity_search(query, k=3)
source_documents

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to connect the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

In [None]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.
    Contexts:
    {source_knowledge}
    Query: {query}"""
    return augmented_prompt
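
Because the triple-quoted f-string is indented inside the function body, the four leading spaces on the `Contexts:` and `Query:` lines become part of the prompt. One possible way to avoid that is to assemble the prompt from a list of lines. The sketch below uses a hypothetical `build_prompt` helper that takes already-retrieved texts instead of calling `vectorstore`, and the context strings are placeholders.

```python
def build_prompt(contexts, query):
    """Assemble an augmented prompt with no stray indentation."""
    source_knowledge = "\n".join(contexts)
    lines = [
        "Using the contexts below, answer the query.",
        "",
        "Contexts:",
        source_knowledge,
        "",
        f"Query: {query}",
    ]
    return "\n".join(lines)


# placeholder context strings, for illustration only
example = build_prompt(
    ["first retrieved chunk", "second retrieved chunk"],
    "What is so special about Llama 2?",
)
print(example)
```

Keeping retrieval separate from prompt assembly also makes the formatting easy to test without a live index.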

Using this we produce an augmented prompt:

In [None]:
print(augment_prompt(query))

Using the contexts below, answer the query.
    Contexts:
    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety
asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwithh

There is still a lot of text here, so let's pass it on to our chat model to see how it performs.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed by the researchers. These LLMs range in scale from 7 billion to 70 billion parameters and are optimized for dialogue use cases. The Llama 2 models, such as L/l.sc/a.sc/m.sc/a.sc/two.taboldstyle and L/l.sc/a.sc/m.sc/a.sc/two.taboldstyle-C/h.sc/a.sc/t.sc, outperform existing open-source chat models on various benchmarks tested. Additionally, based on humane evaluations for helpfulness and safety, the Llama 2 models may serve as suitable substitutes for closed-source models.

The researchers provide a detailed description of their approach to fine-tuning and safety in Llama 2, highlighting the performance and usability enhancements achieved through this process. The Llama 2 models are designed to align with human preferences, which enhances their usability and safety. This fine-tuning step can be resource-intensive in terms of both compute and human annotation, but it contributes to the effectivene

We can continue with more Llama 2 questions. Let's try _without_ RAG first:

In [None]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, a family of pretrained and fine-tuned large language models (LLMs), safety measures were implemented to ensure the models were optimized for dialogue use cases while also addressing safety concerns. Some of the safety measures used in the development of Llama 2 include:

1. Fine-tuning approach: The fine-tuned LLMs in Llama 2 were optimized for dialogue use cases, and the fine-tuning process likely included considerations for both model performance and safety. Fine-tuning involves adjusting the model's parameters to improve its performance on specific tasks while potentially addressing safety concerns.

2. Human evaluations: The safety of the Llama 2 models was assessed through human evaluations for helpfulness and safety. These evaluations involved human annotators assessing the model's responses for appropriateness, accuracy, and potential harmful content. This step helps ensure that the models produce safe and useful outputs.

3. Comparison with closed

The chatbot is able to respond about Llama 2 thanks to its conversational history stored in `messages`. However, it doesn't know anything about the safety measures themselves as we have not provided it with that information via the RAG pipeline. Let's try again but with RAG.

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

Safety measures used in the development of Llama 2 include:

1. Safety-specific data annotation and tuning: Safety-specific data annotation involves labeling data to identify and mitigate potential safety risks in the language models. Tuning refers to adjusting model parameters to improve safety features and reduce harmful outputs.

2. Red-teaming: Red-teaming involves employing a dedicated team to actively identify and address potential vulnerabilities and risks in the language models. This proactive approach helps enhance the safety and reliability of the models.

3. Iterative evaluations: Iterative evaluations involve continuously assessing the performance and safety of the language models throughout the development process. This iterative approach allows for ongoing improvements to be made to ensure the models meet safety standards.

By implementing these safety measures, the developers of Llama 2 aimed to increase the safety of the models and address potential risks associated wit

# RAGAS EVALUATION

In [None]:
#!pip install ragas==0.0.11
!pip show ragas

Name: ragas
Version: 0.0.11
Summary: 
Home-page: 
Author: 
Author-email: 
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: datasets, langchain, numpy, openai, protobuf, pydantic, sentence-transformers, transformers
Required-by: 


In [None]:
from ragas.langchain.evalchain import RagasEvaluatorChain
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_relevancy,
    context_recall,
)

In [None]:
ground_truths = 'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed by the researchers.'
responses = {
    'query': query,
    'result': res.content,
    'source_documents': source_documents,
    'ground_truths': ground_truths
}

In [None]:
# create evaluation metrics
eval_chains = {
    m.name: RagasEvaluatorChain(metric=m)
    for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
}



In [None]:
# for response in responses:
#   for name, eval_chain in eval_chains.items():
#     score_name = f"{name}_score"
#     print(f"{score_name}: {eval_chain(response)[score_name]}")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"{score_name}: {eval_chain(responses)[score_name]}")

faithfulness_score: 0.0
answer_relevancy_score: 0.8755608591424066
context_ relevancy_score: 0.0647924264272054
context_recall_score: 1.0


In [None]:
import plotly.graph_objects as go

# evaluate each metric once and collect the scores
scores = {}
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    result = eval_chain(responses)
    scores[name] = result[score_name]
    print(f"{score_name}: {scores[name]}")

# plot all metrics together on a single radar chart
fig = go.Figure()
fig.add_trace(go.Scatterpolar(
    r=list(scores.values()),
    theta=list(scores.keys()),
    fill='toself',
    name='RAG pipeline'
))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 1]
        )),
    showlegend=True,
    title='RAGAS evaluation',
    width=800,
)

fig.show()

faithfulness_score: 0.0


answer_relevancy_score: 0.8755748981436625


context_ relevancy_score: 0.006854867935180664


context_recall_score: 1.0
