# Evaluating RAG quality with MLflow
This notebook demonstrates how to use MLflow to evaluate the quality of a Retrieval-Augmented Generation (RAG) system. We will:
- Split, vectorize, and index a text with ChromaDB
- Configure an MLflow model that queries the vector DB based on a user prompt and summarizes the results
- Compare the output to an expected output with `mlflow.evaluate`

## Setting up the vector database

In [1]:
# set up chromadb collection
import chromadb
chroma_client = chromadb.Client()
docs = chroma_client.create_collection("retrieval_docs")

For simplicity, we'll restrict our attention to a single document: the [MLflow Concepts](https://mlflow.org/docs/latest/concepts.html) page. Let's extract its text and split it into sentences.

In [40]:
# Extract text from https://mlflow.org/docs/latest/concepts.html
import requests
from bs4 import BeautifulSoup

def extract_text(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')

    # remove script and style elements
    for script in soup(["script", "style"]):
        script.decompose()

    # find the "Concepts" header and collect all text after it
    # note: find_all(True) also yields nested tags, so text that sits inside
    # several levels of markup is collected more than once
    text = ''
    start_collecting = False
    for tag in soup.find_all(True):
        if tag.name == 'h1' and tag.text.strip().lower() == 'concepts':
            start_collecting = True
        if start_collecting:
            text += ' ' + tag.get_text()

    # split into sentences
    text = text.replace('\n', ' ')
    sentences = text.split('.')
    # remove leading and trailing whitespaces
    sentences = [sentence.strip() for sentence in sentences if sentence]

    return sentences

url = 'https://mlflow.org/docs/latest/concepts.html'
concepts = extract_text(url)

# remove footer/navigation components
concepts = concepts[:-4]


Now we can add our texts to the ChromaDB vector database. In a production setting, it would be worth spending more time on document formatting, e.g., grouping (or omitting) code blocks and removing strings that carry no meaningful information.
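
As a rough illustration of that kind of extra formatting, the sketch below splits only on sentence-ending punctuation that is followed by whitespace (so dotted names such as `mlflow.org` stay intact) and drops very short fragments. The helper name `split_sentences` and the `min_chars` threshold are assumptions made for illustration; they are not part of this notebook's pipeline.

```python
import re

def split_sentences(text, min_chars=20):
    """Split text into sentences and drop short, low-information fragments.

    Hypothetical helper: the regex and the min_chars cutoff are illustrative choices.
    """
    # collapse newlines and repeated spaces into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # split after ., ! or ? only when followed by whitespace,
    # so "mlflow.org" or "v2.0" are not broken apart
    parts = re.split(r"(?<=[.!?])\s+", text)
    # keep only fragments long enough to carry meaning
    return [p.strip() for p in parts if len(p.strip()) >= min_chars]

sample = ("MLflow is organized into components.  See mlflow.org for details.\n"
          "OK. Each component can be used on its own.")
print(split_sentences(sample))
```

A length cutoff is a blunt filter: it removes stray fragments such as "OK." but would also drop short, meaningful sentences, so the threshold would need tuning against the actual documents.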

[By default](https://docs.trychroma.com/embeddings#default-all-minilm-l6-v2), ChromaDB uses the `all-MiniLM-L6-v2` model to generate embeddings from the texts; a different embedding function can be supplied through the `embedding_function` argument of `create_collection`.

In [43]:
docs.add(documents=concepts,
         ids=[f'id_{i}' for i in range(len(concepts))],)



Now we can `peek()` at the first few entries.

In [47]:
docs.peek()

{'ids': ['id_0', 'id_1', 'id_2', 'id_3', 'id_4', 'id_5', 'id_6', 'id_7', 'id_8', 'id_9'],
 'embeddings': [[0.006552808452397585,
   -0.1455732136964798,
   -0.03630339354276657,
   -0.04149722307920456,
   0.0789048969745636,
   ...

Now we can run a sample query against this database.

In [49]:
results = docs.query(query_texts=["How can an individual data scientist use MLFlow?"])
results["documents"][0]


['Example Use Cases  There are multiple ways you can use MLflow, whether you are a data scientist working alone or part of a large organization: Individual Data Scientists can use MLflow Tracking to track experiments locally on their machine, organize code in projects for future reuse, and output models that production engineers can then deploy using MLflow’s deployment tools',
 'Example Use Cases    There are multiple ways you can use MLflow, whether you are a data scientist working alone or part of a large organization: Individual Data Scientists can use MLflow Tracking to track experiments locally on their machine, organize code in projects for future reuse, and output models that production engineers can then deploy using MLflow’s deployment tools',
 'At the same time, MLflow aims to take any codebase written in its format and make it reproducible and reusable by multiple data scientists',
 'Data Science Teams Large Organizations can share projects, models, and results using MLflow
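
Conceptually, the query embeds the prompt and returns the stored texts whose embeddings lie closest to it. The toy sketch below ranks a few hand-written 3-dimensional vectors by squared Euclidean distance, one common choice of metric. The vectors, the labels, and the helper `rank_by_distance` are made up for illustration; the real collection holds 384-dimensional `all-MiniLM-L6-v2` embeddings and searches them with an approximate nearest-neighbor index rather than this brute-force sort.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank_by_distance(query_vec, doc_vecs, top_n=2):
    """Return the keys of the top_n vectors closest to query_vec."""
    ranked = sorted(doc_vecs, key=lambda k: squared_l2(query_vec, doc_vecs[k]))
    return ranked[:top_n]

# made-up embeddings standing in for three indexed sentences
toy_embeddings = {
    "tracking": [0.9, 0.1, 0.0],
    "projects": [0.1, 0.9, 0.0],
    "models": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # made-up embedding of the user prompt
print(rank_by_distance(query, toy_embeddings))
```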

### Writing a retrieval function

We want to build a system that takes a user prompt, finds the most relevant texts in the vector database, passes those texts to a language model, and returns a summary. We *don't* want to return the vector embeddings or IDs, and we want to concatenate the top matches into a single string that we can pass to the language model. Here, we write a short utility function that returns the query results in that format.

In [None]:
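def gen_context(db, prompt, top_n=3):
    # retrieve the top_n most similar texts for the prompt
    results = db.query(query_texts=prompt, n_results=top_n)
    # results["documents"] holds one list of matches per query text
    texts = results["documents"][0]
    # concatenate the matches into a single newline-separated string
    return "\n".join(texts)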
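
The sample query earlier returned the same passage twice, differing only in whitespace. One possible refinement, sketched below, normalizes whitespace and drops repeats before joining. `gen_context_dedup` and the `FakeCollection` stand-in are hypothetical names, used here so the example runs without a vector database; with the real collection, `docs` would be passed in place of `FakeCollection()`.

```python
def gen_context_dedup(db, prompt, top_n=3):
    """Query db and join the unique matching texts into one string."""
    results = db.query(query_texts=[prompt], n_results=top_n)
    # normalize whitespace so near-identical passages compare equal
    texts = [" ".join(t.split()) for t in results["documents"][0]]
    # dict.fromkeys drops duplicates while preserving ranking order
    unique_texts = list(dict.fromkeys(texts))
    return "\n".join(unique_texts)

class FakeCollection:
    """Stand-in that mimics the shape of a ChromaDB query result."""
    def query(self, query_texts, n_results):
        matches = ["Example Use Cases  There are multiple ways",
                   "Example Use Cases    There are multiple ways",
                   "At the same time, MLflow aims"]
        return {"documents": [matches[:n_results]]}

print(gen_context_dedup(FakeCollection(), "How can I use MLflow?"))
```

Because duplicates are removed after retrieval, the returned context can contain fewer than `top_n` passages; requesting a few extra results and trimming afterwards would be one way to compensate.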