# Basic RAG with Model-graded Eval

In this example we'll build a simple RAG application on Volume 7 of the *History of the United States of America* and evaluate it on four criteria:
* **relevance** -- does the answer make sense in the context of the original question?
* **faithfulness** -- is the final answer faithful to the data that we fed into the LLM?
* **coherence** -- is the answer consistent and easy to understand?
* **succinctness** -- does the answer contain unnecessary information?

We'll use AIConfig to manage and iterate on all our prompts, for both the generation step of the RAG pipeline and its evaluation.

## Install dependencies

Create a `.env` file containing the following line:
`OPENAI_API_KEY=<your key here>`
> You can get your key from https://platform.openai.com/api-keys


In [1]:
%pip install python-aiconfig==1.1.20
%pip install chromadb

import dotenv
dotenv.load_dotenv()


True
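`load_dotenv()` returning `True` only tells us that the `.env` file was read, not that the key we need is present. A minimal sketch of an explicit check follows; `require_env` is a hypothetical helper name, not part of `python-dotenv` or AIConfig.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file and re-run the cell."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `dotenv.load_dotenv()` would surface a missing key before any prompt is run.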

In [2]:
import argparse
import asyncio
import os
import sys
from aiconfig import AIConfigRuntime
import chromadb
from glob import glob




## Download the raw data
Fetch Volume 7 of the *History of the United States of America* from Project Gutenberg; this is our raw, unstructured dataset.

In [3]:
!mkdir -p data/books/
!curl -o data/books/pg72846.txt https://www.gutenberg.org/cache/epub/72846/pg72846.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  636k  100  636k    0     0  2725k      0 --:--:-- --:--:-- --:--:-- 2742k


In [4]:
!head data/books/pg72846.txt

The Project Gutenberg eBook of History of the United States of America, Volume 7 (of 9)
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.



In [5]:
collection_name="us_history_volume_7"
chromadb_path="chroma_2.db"

## RAG Data Ingestion & Indexing
Chunk the data and ingest it into a Chroma DB collection.

> We use a naive text-splitting strategy with fixed-size chunks. In a production system, the chunking strategy is one of the most important steps to optimize.

**Note:** You can also run this as a CLI script using the command:
```
python rag.py ingest data/books/ --chroma-collection-name us_history_volume_7
```

In [6]:
def chunk_markdown(text, chunk_size=1000):
    """Yield consecutive fixed-size character chunks of `text`."""
    for i in range(0, len(text), chunk_size):
        yield text[i : i + chunk_size]
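Fixed-size chunks can cut a sentence, or even a word, in half at the chunk boundary. One common mitigation is to let neighbouring chunks overlap so that text near a boundary appears whole in at least one of them. The sketch below is an illustration, not what this notebook uses; `chunk_with_overlap` and its `overlap` parameter are hypothetical names.

```python
def chunk_with_overlap(text, chunk_size=1000, overlap=200):
    """Yield fixed-size chunks where each chunk repeats the last `overlap`
    characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    for start in range(0, len(text), step):
        yield text[start : start + chunk_size]
        # Stop once a chunk has reached the end of the text, so we do not
        # emit a trailing chunk that only contains already-seen characters.
        if start + chunk_size >= len(text):
            break


print(list(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2)))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap increases the number of stored chunks, so it trades index size for a better chance that a relevant passage is retrieved intact.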

In [7]:
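async def run_ingest(directory, collection_name):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    # create_collection raises if a collection with this name already exists.
    collection = chroma_client.create_collection(name=collection_name)

    for i, filename in enumerate(glob(f"{directory}/**/*", recursive=True)):
        # The recursive glob also matches directories; only files can be read.
        if not os.path.isfile(filename):
            continue

        print("Ingesting:", i, filename)
        documents = []
        metadatas = []
        ids = []

        with open(filename, "r", encoding="utf-8") as f:
            data = f.read()

        # One record per chunk, with an id derived from the file and chunk index.
        for j, chunk in enumerate(chunk_markdown(data)):
            documents.append(chunk)
            metadatas.append({"source": filename})
            ids.append(f"doc_{i}_chunk{j}")

        # Skip empty files so we never call add() with empty lists.
        if documents:
            collection.add(documents=documents, metadatas=metadatas, ids=ids)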

In [8]:
try:
    await run_ingest(directory="data/books", collection_name=collection_name)
except Exception as e:
    print(f"Ingest failed: {e}.\nIf the collection exists already, this is fine.")

Ingest failed: Collection us_history_volume_7 already exists.
If the collection exists already, this is fine.


## RAG Query & Response Generation
Query the index for context given a user-supplied question, and use that context to generate a response.

**Note:** You can also run this as a CLI script. Example command:
```
python rag.py query "In July, flour sold at Boston for _?" -k=10 --chroma-collection-name us_history_volume_7
```

In [9]:
def retrieve_data(collection, query, k):
    print("Querying for:", query)
    context = collection.query(query_texts=[query], n_results=k)
    return context


def serialize_retrieved_data(data):
    # print("Serializing data:", type(data), data)
    return "\n".join(f"{k}={v}" for k, v in data.items())


async def generate(query, context, query_index):
    config = AIConfigRuntime.load("rag.aiconfig.yaml")
    # params = {
    #     f"query{query_index}": query, 
    #     f"context{query_index}": context
    # }
    params = {
        "query": query, 
        "context": context
    }
    print("Running generate with params:", params)
    # prompt = f"generate_{query_index}"
    prompt = "generate"
    return await config.run_and_get_output_text(
        prompt, params=params
    )

async def run_query(query, collection_name, k, query_index):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    collection = chroma_client.get_collection(name=collection_name)
    data = retrieve_data(collection, query, k)
    print("Retrieved data:\n", "\n".join(data["documents"][0]))
    context = serialize_retrieved_data(data)
    result = await generate(query, context, query_index)
    print("\n\nResponse:\n", result)

    return (query, context, result)
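`serialize_retrieved_data` above joins every key of the query result, so the prompt also receives ids, distances, and other bookkeeping fields. A minimal sketch of an alternative that passes only the passages and their sources is shown below. It assumes the result is a dict whose `"documents"` and `"metadatas"` values hold one list per query, which matches the `data["documents"][0]` access used above; `format_context` is a hypothetical helper name.

```python
def format_context(data, query_position=0):
    """Build a prompt context from the passages returned for one query."""
    documents = data["documents"][query_position]
    metadatas = data.get("metadatas") or []
    # Fall back to empty metadata when the result carries none.
    sources = metadatas[query_position] if metadatas else [{}] * len(documents)

    parts = []
    for rank, (doc, meta) in enumerate(zip(documents, sources), start=1):
        source = (meta or {}).get("source", "unknown")
        parts.append(f"[{rank}] source={source}\n{doc.strip()}")
    return "\n\n".join(parts)


# Hand-written sample in the assumed result shape (not real query output).
sample = {
    "ids": [["doc_0_chunk0", "doc_0_chunk1"]],
    "documents": [["first passage ", "second passage"]],
    "metadatas": [[{"source": "a.txt"}, {"source": "b.txt"}]],
    "distances": [[0.1, 0.2]],
}
print(format_context(sample))
```

Numbering the passages also makes it easier for a faithfulness evaluation to point at the passage that supports or contradicts a claim.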

In [10]:
queries = [
    "What was the price of flour sold in Boston?",
    "When and why did the british Blockade happen?" 
]

query_index = 1

In [11]:
query, context, result = await run_query(
    queries[query_index], collection_name, k=10, 
    query_index=query_index
)

Querying for: When and why did the british Blockade happen?


Retrieved data:
 antic during the
winter months.

With it went the tale of Napoleon’s immense disaster. October 23 he
began his retreat; November 23 he succeeded in crossing the Beresina
and escaping capture; December 5 he abandoned what was still left of
his army; and December 19, after travelling secretly and without rest
across Europe, he appeared suddenly in Paris, still powerful, but in
danger. Nothing could be better calculated to support the Russian
mediation in the President’s mind. The possibility of remaining without
a friend in the world while carrying on a war without hope of success,
gave to the Czar’s friendship a value altogether new.

Other news crossed the ocean at the same time, but encouraged no hope
that England would give way. First in importance, and not to be trifled
with, was the British official announcement, dated December 26, 1812,
of the blockade of the Chesapeake and Delaware. Americans held that
this blockade was illegal,[18]--a blockade of a coast, not of





Response:
 The British blockade happened in December 1812. It was announced by the British government on December 26, 1812. The blockade of the Chesapeake and Delaware was considered illegal by Americans. The blockade was one of the grievances against which the war was waged. It caused a significant impact on the American economy, as it ceased the export of American produce from the Chesapeake and Delaware. The blockade became more stringent as time went on, and it was enforced with energy by the British navy. The British blockade and other measures taken by the British government, such as issuing licenses for importation of necessary supplies, caused significant disruption and annoyance to the American people.


## Evaluate the response
Run evals on the response across four criteria:
* **relevance** -- does the answer make sense in the context of the original question?
* **faithfulness** -- is the final answer faithful to the data that we fed into the LLM?
* **coherence** -- is the answer consistent and easy to understand?
* **succinctness** -- does the answer contain unnecessary information?

In [12]:
async def run_evals(query, context, answer, query_index):
    config = AIConfigRuntime.load("rag.aiconfig.yaml")
    def _get_prompt(criterion):
        # return f"evaluate_{criterion}_{query_index}"
        return f"evaluate_{criterion}"
    return [
        await config.run_and_get_output_text(
            _get_prompt(criterion),
            # params={
            #     f"query_{query_index}": query,
            #     f"context_{query_index}": context,
            #     f"generate_{query_index}": {
            #         "output": answer,
            #     }
            # },
            params={
                "query": query,
                "context": context,
                "generate": {
                    "output": answer
                }
            }
        )
        for criterion in [
            "relevance", "faithfulness", "coherence", "succinctness"
        ]
    ]
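`run_evals` returns a bare list, so any caller has to repeat the criteria in the same order to know which score is which. A minimal sketch of a variant that returns a dict keyed by criterion follows. `run_named_evals`, `CRITERIA`, and the `run_prompt` parameter are hypothetical names; `run_prompt` stands for any async callable that takes a prompt name and returns the output text.

```python
CRITERIA = ["relevance", "faithfulness", "coherence", "succinctness"]


async def run_named_evals(run_prompt, criteria=CRITERIA):
    """Run one evaluation prompt per criterion and key the results by name.

    The prompts are awaited one after another, like the list comprehension
    in run_evals.
    """
    results = {}
    for criterion in criteria:
        results[criterion] = await run_prompt(f"evaluate_{criterion}")
    return results
```

In this notebook, `run_prompt` could be something like `lambda name: config.run_and_get_output_text(name, params=params)`, with `config` and `params` built as in `run_evals`.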


In [13]:
print(f"Evaluating...Query: {query} \n Answer: {result}")
evals = await run_evals(query, context, result, query_index)
print("Evaluations:")
for criterion, score in zip(
    ["relevance", "faithfulness", "coherence", "succinctness"], 
    evals
):
    print(f"\n\n{criterion}: {score}")


Evaluating...Query: When and why did the british Blockade happen? 
 Answer: The British blockade happened in December 1812. It was announced by the British government on December 26, 1812. The blockade of the Chesapeake and Delaware was considered illegal by Americans. The blockade was one of the grievances against which the war was waged. It caused a significant impact on the American economy, as it ceased the export of American produce from the Chesapeake and Delaware. The blockade became more stringent as time went on, and it was enforced with energy by the British navy. The British blockade and other measures taken by the British government, such as issuing licenses for importation of necessary supplies, caused significant disruption and annoyance to the American people.
Evaluations:


relevance: Yes, the answer satisfactorily answers the question. It mentions when the British Blockade happened (December 1812) and explains why it happened, along with its effects on the American eco
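The relevance output above opens with a yes/no verdict followed by free-text reasoning. If the other evaluation prompts answer in the same style (an assumption; only the relevance output is shown here), the verdict can be pulled out for aggregation across many queries. `parse_verdict` is a hypothetical helper, sketched under that assumption.

```python
def parse_verdict(eval_text):
    """Return True for a leading 'yes', False for a leading 'no', else None."""
    words = eval_text.strip().lower().split()
    if not words:
        return None
    # Drop punctuation attached to the first word, e.g. "Yes," or "No."
    first = words[0].strip(".,:;!\"'")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


print(parse_verdict("Yes, the answer satisfactorily answers the question."))
# → True
```

Returning `None` for anything that does not start with a clear verdict keeps ambiguous outputs visible instead of silently counting them as a pass or a fail.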

In [14]:
!python3 rag.py info






Starting info
Available Chroma Collections: [Collection(name=us_history_volume_7)]
