# RAG

RAG (retrieval-augmented generation) consists of three steps:

1. RETRIEVAL - Find relevant documents using search
2. AUGMENTATION - Include the documents in the prompt
3. GENERATION - LLM generates an answer using the retrieved context
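
As a rough, self-contained sketch of how the three steps fit together (the tiny corpus, the word-overlap scoring, and the `fake_llm` stand-in are all assumptions made for illustration, not the pipeline built later in this notebook):

```python
# Toy corpus: hypothetical documents used only for this illustration.
corpus = [
    {"title": "Alerts", "content": "How to set up alerts and notification channels."},
    {"title": "Reports", "content": "How to generate and export reports."},
]

def retrieve(query, docs, num_results=1):
    """RETRIEVAL: rank documents by how many query words they share."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc["content"].lower().split()))

    return sorted(docs, key=overlap, reverse=True)[:num_results]

def augment(query, retrieved):
    """AUGMENTATION: put the retrieved text into the prompt."""
    context = "\n".join(doc["content"] for doc in retrieved)
    return f"QUESTION: {query}\nCONTEXT: {context}"

def fake_llm(prompt):
    """GENERATION: stand-in for a real LLM call; it only reports the prompt size."""
    return f"(answer based on a {len(prompt)}-character prompt)"

query = "how to set up alerts"
retrieved = retrieve(query, corpus)
prompt = augment(query, retrieved)
answer = fake_llm(prompt)
print(retrieved[0]["title"])
```

In a real system the retrieval step would use a proper search index and the generation step would call an actual model; only the overall shape carries over.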

In [3]:
import requests

In [51]:
repo_owner = 'evidentlyai'
repo_name = 'docs'
branch_name = 'main'

zip_url = f'https://github.com/{repo_owner}/{repo_name}/archive/refs/heads/{branch_name}.zip'
zip_response = requests.get(zip_url)

In [52]:
len(zip_response.content)

17545668

In [53]:
import io
import zipfile

zip_archive = zipfile.ZipFile(io.BytesIO(zip_response.content))

In [54]:
filenames = zip_archive.namelist()
filenames[20:30]

['docs-main/docs/library/report.mdx',
 'docs-main/docs/library/synthetic_data_api.mdx',
 'docs-main/docs/library/tags_metadata.mdx',
 'docs-main/docs/library/tests.mdx',
 'docs-main/docs/platform/',
 'docs-main/docs/platform/alerts.mdx',
 'docs-main/docs/platform/dashboard_add_panels.mdx',
 'docs-main/docs/platform/dashboard_add_panels_ui.mdx',
 'docs-main/docs/platform/dashboard_overview.mdx',
 'docs-main/docs/platform/dashboard_panel_types.mdx']

In [55]:
filename = 'docs-main/docs/platform/alerts.mdx'

mdx_file = zip_archive.open(filename)

In [56]:
mdx_content = mdx_file.read().decode('utf8')
print(mdx_content)

---
title: 'Alerts'
description: 'How to set up alerts.'
---

<Check>
  Built-in alerting is a Pro feature available in the **Evidently Cloud** and **Evidently Enterprise**.
</Check>

![](/images/alerts.png)

To enable alerts, open the Project and navigate to the "Alerts" in the left menu. You must set:

* A notification channel.

* An alert condition.

## Notification channels

You can choose between the following options:

* **Email**. Add email addresses to send alerts to.

* **Slack**. Add a Slack webhook.

* **Discord**. Add a Discord webhook.

## Alert conditions

### Failed tests

If you use Tests (conditional checks) in your Project, you can tie alerting to the failed Tests in a Test Suite. Toggle this option on the Alerts page. Evidently will set an alert to the defined channel if any of the Tests fail.

<Tip>
</Tip>

### Custom conditions

You can also set alerts on individual Metric values. For example, you can generate Alerts when the share of drifting features is above a c

In [57]:
import frontmatter

post = frontmatter.loads(mdx_content)
print(post.content[:100])

<Check>
  Built-in alerting is a Pro feature available in the **Evidently Cloud** and **Evidently En


In [58]:
post.metadata

{'title': 'Alerts', 'description': 'How to set up alerts.'}
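
The `frontmatter` package separates the header between the `---` lines from the body. A simplified, stdlib-only sketch of that idea follows; the `split_front_matter` helper is hypothetical and only handles flat `key: value` pairs, whereas the real library parses full YAML:

```python
def split_front_matter(text):
    """Split a '---'-delimited header from the body (flat key: value pairs only)."""
    if not text.startswith("---"):
        return {}, text
    _, header, body = text.split("---", 2)
    metadata = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        # drop surrounding quotes, if any
        metadata[key.strip()] = value.strip().strip("'\"")
    return metadata, body.strip()

sample = (
    "---\n"
    "title: 'Alerts'\n"
    "description: 'How to set up alerts.'\n"
    "---\n"
    "\n"
    "<Check>\n  Built-in alerting\n</Check>"
)
metadata, body = split_front_matter(sample)
print(metadata)
```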

In [59]:
_, filename_corrected = filename.split('/', maxsplit=1)
print(filename_corrected)

docs/platform/alerts.mdx


In [60]:
filenames

['docs-main/',
 'docs-main/api-reference/',
 'docs-main/api-reference/endpoint/',
 'docs-main/api-reference/endpoint/create.mdx',
 'docs-main/api-reference/endpoint/delete.mdx',
 'docs-main/api-reference/endpoint/get.mdx',
 'docs-main/api-reference/introduction.mdx',
 'docs-main/api-reference/openapi.json',
 'docs-main/changelog/',
 'docs-main/changelog/changelog.mdx',
 'docs-main/docs/',
 'docs-main/docs/library/',
 'docs-main/docs/library/data_definition.mdx',
 'docs-main/docs/library/descriptors.mdx',
 'docs-main/docs/library/evaluations_overview.mdx',
 'docs-main/docs/library/leftover_content.mdx',
 'docs-main/docs/library/metric_generator.mdx',
 'docs-main/docs/library/output_formats.mdx',
 'docs-main/docs/library/overview.mdx',
 'docs-main/docs/library/prompt_optimization.mdx',
 'docs-main/docs/library/report.mdx',
 'docs-main/docs/library/synthetic_data_api.mdx',
 'docs-main/docs/library/tags_metadata.mdx',
 'docs-main/docs/library/tests.mdx',
 'docs-main/docs/platform/',
 'docs

In [61]:
doc = {
    'content': post.content,
    'title': post.metadata.get('title'),
    'description': post.metadata.get('description'),
    'filename': filename_corrected
}

In [62]:
doc

 'title': 'Alerts',
 'description': 'How to set up alerts.',
 'filename': 'docs/platform/alerts.mdx'}

In [63]:
def read_github_repository(repo_owner, repo_name, branch="main"):
    url = f"https://github.com/{repo_owner}/{repo_name}/archive/refs/heads/{branch}.zip"
    response = requests.get(url)
    response.raise_for_status()

    documents = []
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        for file_path in zip_ref.namelist():
            if not file_path.endswith(('.md', '.mdx')):
                continue
            with zip_ref.open(file_path) as file:
                content = file.read().decode('utf-8')
                post = frontmatter.loads(content)
                doc = {
                    'content': post.content,
                    'title': post.metadata.get('title'),
                    'description': post.metadata.get('description'),
                    'filename': file_path.split('/', 1)[-1]
                }
                documents.append(doc)

    return documents
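
To check the extension filter and the path handling without a network call, here is a small sketch that builds a zip archive in memory and applies the same two steps. The archive contents are made up for the example.

```python
import io
import zipfile

# Build a tiny in-memory archive that mimics GitHub's "<repo>-<branch>/" layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs-main/docs/a.mdx", "---\ntitle: A\n---\nBody A")
    archive.writestr("docs-main/README.md", "Readme body")
    archive.writestr("docs-main/images/logo.png", "not really an image")

kept = []
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    for file_path in archive.namelist():
        # Same filter as in the function: keep only Markdown and MDX files.
        if not file_path.endswith((".md", ".mdx")):
            continue
        # Drop the leading "<repo>-<branch>/" folder.
        kept.append(file_path.split("/", 1)[-1])

print(kept)
```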

In [64]:
repo_owner = 'evidentlyai'
repo_name = 'docs'
branch_name = 'main'

documents = read_github_repository(repo_owner=repo_owner, repo_name=repo_name, branch=branch_name)
len(documents)

95

In [65]:
from gitsource import GithubRepositoryDataReader

reader = GithubRepositoryDataReader(
    repo_owner='evidentlyai',
    repo_name = 'docs',
    allowed_extensions={'md', 'mdx'},
)

files = reader.read()

print(f'Loaded {len(files)} documents.')

Loaded 95 documents.


In [66]:
md_file = files[10]

In [67]:
md_file.parse()

{'title': 'Output formats',
 'description': 'How to export the evaluation results.',
 'content': 'You can view or export Reports in multiple formats.\n\n**Pre-requisites**:\n\n* You know how to [generate Reports](/docs/library/report).\n\n## Log to Workspace\n\nYou can save the computed Report in Evidently Cloud or your local workspace.\n\n```python\nws.add_run(project.id, my_eval, include_data=False)\n```\n\n<Info>\n  **Uploading evals**. Check Quickstart examples [for ML](/quickstart_ml) or [for LLM](/quickstart_llm) for a full workflow.\n</Info>\n\n## View in Jupyter notebook\n\nYou can directly render the visual summary of evaluation results in interactive Python environments like Jupyter notebook or Colab.\n\nAfter running the Report, simply call the resulting Python object:\n\n```python\nmy_report\n```\n\nThis will render the HTML object directly in the notebook cell.\n\n## HTML\n\nYou can also save this interactive visual Report as an HTML file to open in a browser:\n\n```python

In [68]:
documents = [f.parse() for f in files]

In [69]:
len(documents)

95

# Indexing data

1. search <-- Elasticsearch or minsearch
2. prompt
3. llm
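
For intuition about the search step, here is a deliberately simple keyword search in plain Python. It is not how minsearch or Elasticsearch score documents (both use more elaborate relevance scoring); the `toy_search` function and the sample records are assumptions for illustration.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def toy_search(query, docs, text_fields, num_results=5):
    """Score each document by how often the query words occur in its text fields."""
    query_words = tokenize(query)
    scored = []
    for doc in docs:
        counts = Counter()
        for field in text_fields:
            counts.update(tokenize(doc.get(field) or ""))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scored.append((score, doc))
    # highest score first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]

sample_docs = [
    {"title": "LLM as a judge", "content": "Create and evaluate an LLM judge."},
    {"title": "Alerts", "content": "Set up alerts for failed tests."},
]
hits = toy_search("llm judge", sample_docs, text_fields=["title", "content"])
print([doc["title"] for doc in hits])
```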

In [70]:
from minsearch import Index

In [71]:
query = 'LLM as a Judge'

In [72]:
index = Index(
    text_fields=['title', 'description', 'content'],
    keyword_fields = ['filename']
)
index.fit(documents)

<minsearch.minsearch.Index at 0x783df61b5e20>

In [73]:
results = index.search(query, num_results=5)

In [74]:
len(results)

5

In [75]:
# Best match
results[0]

{'title': 'LLM as a judge',
 'description': 'How to create and evaluate an LLM judge.',
 'content': 'import CloudSignup from \'/snippets/cloud_signup.mdx\';\nimport CreateProject from \'/snippets/create_project.mdx\';\n\nIn this tutorial, we\'ll show how to evaluate text for custom criteria using LLM as the judge, and evaluate the LLM judge itself.\n\n<Info>\n  **This is a local example.** You will run and explore results using the open-source Python library. At the end, we’ll optionally show how to upload results to the Evidently Platform for easy exploration.\n</Info>\n\nWe\'ll explore two ways to use an LLM as a judge:\n\n- **Reference-based**. Compare new responses against a reference. This is useful for regression testing or whenever you have a "ground truth" (approved responses) to compare against.\n- **Open-ended**. Evaluate responses based on custom criteria, which helps evaluate new outputs when there\'s no reference available.\n\nWe will focus on demonstrating **how to create

# Chunking

In [33]:
doc_sizes = [(doc.filename, len(doc.content)) for doc in files]
doc_sizes.sort(key = lambda x: x[1], reverse=True)

for filename, size in doc_sizes[:5]:
    print(f'{filename}: {size} characters')

metrics/all_metrics.mdx: 55085 characters
metrics/all_descriptors.mdx: 31976 characters
docs/platform/dashboard_panel_types.mdx: 31647 characters
docs/library/leftover_content.mdx: 28742 characters
metrics/customize_llm_judge.mdx: 26847 characters


1. search <-- returns 5 docs
2. prompt <-- 5 × 20k = 100k characters (LLM APIs charge per token, so this gets expensive)
3. llm
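
The arithmetic behind that note can be sketched as follows. The 4-characters-per-token ratio is only a common rule of thumb for English text, and the price is a placeholder, not a real quote; check your provider's tokenizer and pricing.

```python
CHARS_PER_TOKEN = 4              # rough rule of thumb, an assumption
PRICE_PER_MILLION_TOKENS = 1.0   # placeholder price in dollars, not a real quote

def estimate_prompt_cost(num_docs, chars_per_doc):
    """Return (total characters, estimated tokens, estimated cost) for one prompt."""
    total_chars = num_docs * chars_per_doc
    tokens = total_chars // CHARS_PER_TOKEN
    cost = tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
    return total_chars, tokens, cost

total_chars, tokens, cost = estimate_prompt_cost(num_docs=5, chars_per_doc=20_000)
print(total_chars, tokens, cost)
```

The cost applies to every query, which is the motivation for sending smaller chunks instead of whole documents.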

In [43]:
items = list(range(0, 100))

In [44]:
window_size = 10
start = 0
step = 5

chunks = []

while start < len(items):
    end = start + window_size
    chunk = items[start:end]
    # stop once the remaining tail is shorter than a full window
    if len(chunk) < window_size:
        break
    chunks.append(chunk)
    print(chunk)
    start = start + step

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[35, 36, 37, 38, 39, 40, 41, 42, 43, 44]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[55, 56, 57, 58, 59, 60, 61, 62, 63, 64]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[65, 66, 67, 68, 69, 70, 71, 72, 73, 74]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[85, 86, 87, 88, 89, 90, 91, 92, 93, 94]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
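
The number of full windows printed above follows from the sizes involved: with n items, window size w, and step s, there are (n - w) // s + 1 full windows when n >= w. A quick check, using a hypothetical helper `count_full_windows`:

```python
def count_full_windows(n, window_size, step):
    """Number of complete windows of `window_size` taken every `step` items."""
    if n < window_size:
        return 0
    return (n - window_size) // step + 1

# Cross-check the formula against a direct enumeration of window starts.
n, window_size, step = 100, 10, 5
starts = [s for s in range(0, n, step) if s + window_size <= n]
print(count_full_windows(n, window_size, step), len(starts))
```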


In [45]:
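def sliding_window(text, size=1000, step=500):
    """Split text into chunks of `size` characters, starting every `step` characters."""
    chunks = []
    start = 0
    text_length = len(text)

    while start < text_length:
        end = start + size
        chunk = text[start:end]
        chunks.append({'start': start, 'content': chunk})

        # the last chunk already reaches the end of the text
        if end >= text_length:
            break

        # advance by `step`; consecutive chunks overlap by `size - step`
        start = start + step

    return chunks

In [46]:
len(sliding_window(results[0]['content'], size=3000, step=2500))

9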
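
Why let the windows overlap at all? A small illustration (the `window_chunks` helper is hypothetical and treats `step` as the distance between chunk starts): text that straddles a chunk boundary is split when chunks do not overlap, but survives intact in some chunk when they do.

```python
def window_chunks(text, size, step):
    """Chunks of `size` characters whose starts are `step` characters apart."""
    return [text[start:start + size] for start in range(0, len(text), step)]

text = "0123456789" * 3          # 30 characters
phrase = text[8:14]              # "890123", straddles the boundary at position 10

no_overlap = window_chunks(text, size=10, step=10)
with_overlap = window_chunks(text, size=10, step=5)

print(any(phrase in chunk for chunk in no_overlap))
print(any(phrase in chunk for chunk in with_overlap))
```

The price of overlap is more chunks to index and some duplicated text across neighbouring chunks.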

In [76]:
document_chunks = []

for doc in documents:
    # skip documents with empty or missing content
    if not doc.get('content'):
        continue
    doc_metadata = doc.copy()
    content = doc_metadata.pop('content')

    chunks = sliding_window(content, size=3000, step=1500)

    for i, chunk in enumerate(chunks):
        # attach title, description, and filename to every chunk
        chunk.update(doc_metadata)
        chunk['chunk_id'] = i
        document_chunks.append(chunk)

In [78]:
document_chunks[:5]

[{'start': 0,
  'content': '<Note>\n  If you\'re not looking to build API reference documentation, you can delete\n  this section by removing the api-reference folder.\n</Note>\n\n## Welcome\n\nThere are two ways to build API documentation: [OpenAPI](https://mintlify.com/docs/api-playground/openapi/setup) and [MDX components](https://mintlify.com/docs/api-playground/mdx/configuration). For the starter kit, we are using the following OpenAPI specification.\n\n<Card\n  title="Plant Store Endpoints"\n  icon="leaf"\n  href="https://github.com/mintlify/starter/blob/main/api-reference/openapi.json"\n>\n  View the OpenAPI specification file\n</Card>\n\n## Authentication\n\nAll API endpoints are authenticated using Bearer tokens and picked up from the specification file.\n\n```json\n"security": [\n  {\n    "bearerAuth": []\n  }\n]\n```',
  'title': 'Introduction',
  'description': 'Example section for showcasing API endpoints',
  'filename': 'api-reference/introduction.mdx',
  'chunk_id': 0},


In [79]:
chunk_index = Index(
    text_fields=['title', 'description', 'content'],
    keyword_fields = ['filename']
)
chunk_index.fit(document_chunks)

<minsearch.minsearch.Index at 0x783df61dd1f0>

In [80]:
chunk_index.search(query)

[{'start': 0,
  'content': 'import CloudSignup from \'/snippets/cloud_signup.mdx\';\nimport CreateProject from \'/snippets/create_project.mdx\';\n\nIn this tutorial, we\'ll show how to evaluate text for custom criteria using LLM as the judge, and evaluate the LLM judge itself.\n\n<Info>\n  **This is a local example.** You will run and explore results using the open-source Python library. At the end, we’ll optionally show how to upload results to the Evidently Platform for easy exploration.\n</Info>\n\nWe\'ll explore two ways to use an LLM as a judge:\n\n- **Reference-based**. Compare new responses against a reference. This is useful for regression testing or whenever you have a "ground truth" (approved responses) to compare against.\n- **Open-ended**. Evaluate responses based on custom criteria, which helps evaluate new outputs when there\'s no reference available.\n\nWe will focus on demonstrating **how to create and tune the LLM evaluator**, which you can then apply in different cont

In [81]:
from gitsource import chunk_documents

In [85]:
document_chunks = chunk_documents(documents, size=3000, step=1500)

In [86]:
document_chunks[10]

{'start': 9000,
 'content': 'cation=[BinaryClassification(\n        target="target",\n        prediction_labels="prediction")],\n    categorical_columns=["target", "prediction"])\n```\n\nAvailable options and defaults:\n\n```python\n    target: str = "target"\n    prediction_labels: Optional[str] = None\n    prediction_probas: Optional[str] = "prediction" #if probabilistic classification\n    pos_label: Label = 1 #name of the positive label\n    labels: Optional[Dict[Label, str]] = None\n```\n\n### Ranking\n\n#### RecSys\n\nTo evaluate recommender systems performance, you must map the columns with:\n\n- Prediction: this could be predicted score or rank.\n- Target: relevance labels (e.g., this could be an interaction result like user click or upvote, or a true relevance label)\n\nThe **target** column can contain either:\n\n- a binary label (where `1` is a positive outcome)\n- any scores (positive values, where a higher value corresponds to a better match or a more valuable user action)

# RAG implemented

In [88]:
from openai import OpenAI

openai_client = OpenAI()

In [104]:
query = 'how do I implement llm as a judge?'

In [105]:
search_results = chunk_index.search(query, num_results=5)

In [106]:
import json

In [107]:
search_result_json = json.dumps(search_results, indent=2)

In [108]:
instructions = """
You're a course assistant, your task is to answer the QUESTION from the
course students using the provided CONTEXT
"""

user_prompt = f"""
<QUESTION>
{query}
</QUESTION>

<CONTEXT>
{search_result_json}
</CONTEXT>
""".strip()

In [109]:
def llm(user_prompt, instructions=None, model='gpt-4o-mini'):

    messages = []

    if instructions is not None:
        messages.append({'role': 'system', 'content': instructions})

    messages.append({'role': 'user', 'content': user_prompt})

    response = openai_client.responses.create(
        model=model,
        input=messages
    )

    return response.output_text

In [110]:
answer = llm(user_prompt, instructions)

In [111]:
print(answer)

To implement an LLM as a judge, follow these steps:

1. **Install Required Libraries**:
   You need to install the Evidently library. Use the command:
   ```python
   pip install evidently
   ```

2. **Set Up Your Environment**:
   Import necessary modules and set your OpenAI API key:
   ```python
   import os
   os.environ["OPENAI_API_KEY"] = "YOUR_KEY"
   ```

3. **Create a Dataset**:
   Generate a toy Q&A dataset that includes:
   - **Questions**: Inputs sent to the LLM.
   - **Target Responses**: Approved accurate responses.
   - **New Responses**: The generated responses to be evaluated.
   - **Manual Labels**: Labels indicating if responses are correct or not.

   Here’s an example of how to create this dataset:
   ```python
   data = [
       ["question_1", "target_response_1", "new_response_1", "manual_label_1"],
       # Add more entries as needed
   ]
   ```

4. **Create an LLM Evaluator**: 
   Define a prompt designed for evaluating responses. You can utilize pre-defined tem

In [112]:
def search(query):
    return chunk_index.search(query, num_results=5)

In [113]:
instructions = """
You're a course assistant, your task is to answer the QUESTION from the
course students using the provided CONTEXT
"""

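# defined at module level so the function's indentation does not leak into the prompt
prompt_template = """
<QUESTION>
{query}
</QUESTION>

<CONTEXT>
{context}
</CONTEXT>
""".strip()

def build_prompt(query, search_results):
    # serialize the retrieved chunks and fill in the template
    context = json.dumps(search_results, indent=2)
    return prompt_template.format(query=query, context=context)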

In [114]:
def rag(query):

    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt, instructions)

    return answer

In [115]:
rag('how do i implement llm as a judge?')

'To implement an LLM as a judge, follow these steps based on the provided context:\n\n1. **Prerequisites**:\n   - Ensure you have basic Python knowledge.\n   - Obtain an OpenAI API key to use for the LLM evaluator.\n\n2. **Set Up Your Environment**:\n   - Install the Evidently library:\n     ```python\n     pip install evidently\n     ```\n   - Import the required Python modules:\n     ```python\n     import pandas as pd\n     import numpy as np\n     from evidently import Dataset, DataDefinition, Report\n     from evidently.metrics import *\n     from evidently.llm.templates import BinaryClassificationPromptTemplate\n     ```\n\n3. **Configure Your OpenAI API Key**:\n   - Set your OpenAI API key as an environment variable:\n     ```python\n     import os\n     os.environ["OPENAI_API_KEY"] = "YOUR_KEY"\n     ```\n\n4. **Create an Evaluation Dataset**:\n   - Construct a toy Q&A dataset that includes:\n     - Questions (inputs sent to the LLM).\n     - Target responses (approved answers)