# 🎓 Notebook 12: The Data Flywheel (Level 10 - The Legend)

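Welcome to the **Endgame**. You've built the RAG pipeline, the Agent, and the Visualization. The remaining question is how to scale the **improvement** of the system.

In production, manual evaluation is the bottleneck: people cannot write and grade test questions as fast as documents and prompts change. The **Data Flywheel** pattern addresses this by using AI to test AI: one LLM generates the test cases, and an automated evaluator scores the RAG system against them.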

### Learning Objectives:
1. Understand the "Teacher-Student" pattern of evaluation.
2. Generate synthetic `(Question, Context, Ground Truth)` triples.
3. Close the loop by running RAGAS on the synthetic dataset.

In [1]:
import sys
import os

root = os.getcwd()
rag_root = None

while True:
    candidate = os.path.join(root, 'sprints', 'rag_engine', 'rag-engine-mini')
    if os.path.isdir(os.path.join(candidate, 'src')):
        rag_root = candidate
        break
    if os.path.basename(root) == 'rag-engine-mini' and os.path.isdir(os.path.join(root, 'src')):
        rag_root = root
        break
    parent = os.path.dirname(root)
    if parent == root:
        break
    root = parent

if rag_root:
    sys.path.insert(0, rag_root)


In [2]:
import json
import pandas as pd
from src.core.bootstrap import get_container

container = get_container()
llm = container["llm"]

## 1. The Synthetic Generator

Instead of waiting for users to ask questions, we use the LLM to "mine" questions from our documents.

In [3]:
import os

sample_chunk = """
The James Webb Space Telescope (JWST) is a space telescope designed primarily to conduct infrared astronomy. 
As the largest optical telescope in space, its greatly improved infrared resolution and sensitivity allow 
it to view objects too old, distant, or faint for the Hubble Space Telescope.
"""

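# Offline fallback, returned when no LLM is configured or its reply cannot be parsed.
FALLBACK_QA = {"question": "What is JWST designed to do?", "ground_truth": "It conducts infrared astronomy."}

def generate_qa_pair(context):
    """Ask the LLM (the "teacher") for one question/answer pair grounded in `context`."""
    # The format example uses double quotes: json.loads rejects single-quoted JSON.
    prompt = f"""
    Role: Teacher
    Task: Generate 1 question and 1 accurate answer based on this context.
    Context: {context}
    Output format: JSON {{ "question": "...", "ground_truth": "..." }}
    """
    if llm is None or not os.getenv("OPENAI_API_KEY"):
        return dict(FALLBACK_QA)
    response = llm.generate(prompt)
    try:
        # Keep only the outermost {...} span in case the model wraps the JSON in prose.
        qa = json.loads(response[response.find('{'):response.rfind('}') + 1])
        # Fail over to the fallback if either expected key is missing.
        return {"question": qa["question"], "ground_truth": qa["ground_truth"]}
    except (ValueError, KeyError, TypeError, AttributeError):
        return dict(FALLBACK_QA)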

qa = generate_qa_pair(sample_chunk)
print(f"Q: {qa['question']}")
print(f"A: {qa['ground_truth']}")


Q: What is JWST designed to do?
A: It conducts infrared astronomy.
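
To go from a single pair to a dataset, the same generator can be mapped over every chunk, keeping the source chunk next to each pair so that every record is a full `(question, context, ground_truth)` triple. The cell below is a minimal sketch under stated assumptions: `build_synthetic_dataset` and `stub_generator` are hypothetical names introduced here, and the stub stands in for `generate_qa_pair` so the cell runs without an LLM.

```python
import json

def build_synthetic_dataset(chunks, generate):
    """Map a QA generator over chunks and return de-duplicated triples.

    `generate` is any callable that takes a context string and returns a dict
    with "question" and "ground_truth" keys (for example, generate_qa_pair).
    """
    dataset, seen = [], set()
    for chunk in chunks:
        context = chunk.strip()
        if not context:
            continue  # skip empty or whitespace-only chunks
        qa = generate(context)
        key = qa["question"].strip().lower()
        if key in seen:
            continue  # drop questions that were already generated
        seen.add(key)
        dataset.append({
            "question": qa["question"],
            "context": context,
            "ground_truth": qa["ground_truth"],
        })
    return dataset

def stub_generator(context):
    # Hypothetical stand-in for the LLM: treats the first sentence as the answer.
    first_sentence = context.split(".")[0].strip() + "."
    return {
        "question": f"What does the passage starting '{context[:20]}' state?",
        "ground_truth": first_sentence,
    }

chunks = [
    "JWST is a space telescope designed primarily to conduct infrared astronomy.",
    "   ",
    "JWST can view objects too old, distant, or faint for the Hubble Space Telescope.",
    "JWST is a space telescope designed primarily to conduct infrared astronomy.",
]

synthetic_dataset = build_synthetic_dataset(chunks, stub_generator)
for row in synthetic_dataset:
    print(json.dumps(row))  # one JSON record per line
```

The blank chunk and the repeated chunk are both filtered out, so only two triples are printed. Swapping `stub_generator` for `generate_qa_pair` should be enough to produce LLM-written questions, assuming an API key is configured.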


## 2. The Flywheel Workflow

Once you have a generator, the workflow looks like this:
1. **Input**: New raw documents.
2. **Generator**: Creates 100 test cases.
3. **Pipeline**: Your RAG engine processes the 100 questions.
4. **Evaluator**: RAGAS compares your RAG's answers to the Ground Truth.
5. **Decision**: If scores are low, tweak chunk size or prompts and repeat.
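
Steps 3 to 5 of the workflow above can be sketched as a single function. This is an illustration, not the project's actual pipeline: `run_flywheel_round`, `toy_rag`, and the `threshold` parameter are hypothetical, and `token_f1` is a toy token-overlap score used here only as a stand-in for a real RAGAS metric so that the cell runs offline.

```python
from statistics import mean

def token_f1(prediction, reference):
    """Toy token-overlap F1. A stand-in for a real RAGAS metric."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def run_flywheel_round(dataset, answer_fn, score_fn=token_f1, threshold=0.7):
    """Answer every question, score each answer, and flag the low scorers."""
    if not dataset:
        raise ValueError("dataset must contain at least one test case")
    scores = [score_fn(answer_fn(row["question"]), row["ground_truth"]) for row in dataset]
    failures = [row["question"] for row, s in zip(dataset, scores) if s < threshold]
    avg = mean(scores)
    return {"mean_score": avg, "failures": failures, "passed": avg >= threshold}

eval_set = [
    {"question": "What is JWST designed to do?",
     "ground_truth": "It conducts infrared astronomy."},
    {"question": "Which telescope is JWST compared with?",
     "ground_truth": "The Hubble Space Telescope."},
]

def toy_rag(question):
    # Hypothetical pipeline that returns the same answer for every question.
    return "It conducts infrared astronomy."

report = run_flywheel_round(eval_set, toy_rag)
print(report["mean_score"], report["passed"])
print(report["failures"])
```

The `failures` list is what feeds the Decision step: it names the questions to inspect before changing chunk size or prompts and running another round.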

## 3. Why is this Level 10?

Because it changes RAG from a "heuristic guess" to an **Empirical Science**. You no longer hope it works; you measure how well it works, using thousands of data points generated while you sleep.
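
The benefit of a large synthetic test set is statistical: the more scored test cases there are, the narrower the uncertainty around the average score. The sketch below shows this with a normal-approximation 95% confidence interval. `summarize_scores` is a hypothetical helper, and the score lists are made-up illustrative values, not results from this notebook.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_scores(scores, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    m = mean(scores)
    half_width = z * stdev(scores) / sqrt(n)
    return m, m - half_width, m + half_width

# Illustrative per-question scores with the same pass rate at two dataset sizes.
small_run = [1.0, 0.0, 1.0, 1.0] * 5      # 20 test cases
large_run = [1.0, 0.0, 1.0, 1.0] * 500    # 2000 test cases

for name, scores in [("small", small_run), ("large", large_run)]:
    m, low, high = summarize_scores(scores)
    print(f"{name}: mean={m:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Both runs have the same mean, but the interval for the larger run is much narrower, which is what makes a small difference between two chunk sizes or prompts detectable.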

--- 
**You have reached the end of the AI-Mastery-2026 RAG journey.** 

You are now equipped with the tools to build, visualize, scale, and evaluate production-grade AI systems at the highest level. 🎓💎