<div align="center">
<a href="https://rapidfire.ai/"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/RapidFire - Blue bug -white text.svg" width="115"></a>
<a href="https://discord.gg/6vSTtncKNN"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/discord-button.svg" width="145"></a>
<a href="https://oss-docs.rapidfire.ai/"><img src="https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/documentation-button.svg" width="125"></a>
<br/>
Join Discord if you need help + ‚≠ê <i>Star us on <a href="https://github.com/RapidFireAI/rapidfireai">GitHub</a></i> ‚≠ê
<br/>
üëâ <b>Note:</b> This Colab notebook illustrates simplified usage of <code>rapidfireai</code>. For the full RapidFire AI experience with advanced experiment manager, UI, and production features, see our <a href=\"https://oss-docs.rapidfire.ai/en/latest/walkthrough.html\">Install and Get Started</a> guide.
<br/>
üé¨ Watch our <a href=\"https://youtu.be/vVXorey0ANk\">intro video</a> to get started!
</div>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RapidFireAI/rapidfireai/blob/main/tutorial_notebooks/rag-contexteng/rf-colab-rag-fiqa-tutorial.ipynb)

‚ö†Ô∏è **IMPORTANT:** Do not let the Colab notebook tab stay idle for more than 5min; Colab will disconnect otherwise. Interact with the cells to avoid disconnection.

# Optimizing RAG Pipelines with RapidFire AI

Retrieval-Augmented Generation (RAG) is a practical way to make an AI assistant **answer using your documents**:

- **Retrieve**: find the most relevant passages for a question.
- **Generate**: give those passages to a language model so it can answer *grounded in evidence*.

In this beginner-friendly Colab, we‚Äôll build and evaluate a RAG pipeline for a **financial opinion Q&A** assistant using the [FiQA dataset](https://huggingface.co/datasets/explodinggradients/fiqa).

Examples of the kind of questions we‚Äôre targeting:

- ‚ÄúShould I invest in index funds or individual stocks?‚Äù
- ‚ÄúWhat‚Äôs a good way to save for retirement in my 30s?‚Äù
- ‚ÄúIs it worth refinancing my mortgage right now?‚Äù

## What We‚Äôre Building

A concrete RAG pipeline that looks like this:

1. **Load a financial corpus** (documents + posts).
2. **Split documents into chunks** (so we can search smaller, more relevant pieces).
3. **Embed the chunks** (turn text into vectors) and store them in a vector index (FAISS).
4. **Retrieve top‚ÄëK chunks** for each question using similarity search.
5. *(Optional)* **Rerank** the retrieved chunks with a stronger model to keep only the best evidence.
6. **Build a prompt** that includes the question + retrieved context.
7. **Generate an answer** with a small vLLM model.
8. **Evaluate retrieval quality** (Precision, Recall, NDCG@5, MRR) so we can tell which settings find better evidence.

## Our Approach

RAG has a lot of ‚Äúknobs‚Äù, and it‚Äôs easy to lose track of what helped. In this notebook we‚Äôll focus on **retrieval quality** by keeping the generator (the vLLM model) fixed and only varying retrieval settings.

We‚Äôll use [RapidFireAI](https://github.com/RapidFireAI/rapidfireai) to:

- **Define a small retrieval grid**: 2 chunking strategies √ó 2 reranking `top_n` values = **4 retrieval configs**.
- **Run all configs the same way** on the same dataset.
- **Compare retrieval metrics side-by-side** as they update (Precision/Recall/NDCG/MRR) to pick the best evidence-finding setup.

## Install RapidFire AI Package and Setup
### Option 1: Install Locally (or on a VM)
For the full RapidFire AI experience‚Äîadvanced experiment management, UI, and production features‚Äîwe recommend installing the package on a machine you control (for example, a VM or your local machine) rather than Google Colab. See our [Install and Get Started](https://oss-docs.rapidfire.ai/en/latest/walkthrough.html) guide.

### Option 2: Install in Google Colab
For simplicity, you can run this notebook on Google Colab. This notebook is configured to run end-to-end on Colab with no local installation required.

In [1]:
try:
    import rapidfireai
    print("‚úÖ rapidfireai already installed")
except ImportError:
    %pip install git+https://github.com/RapidFireAI/rapidfireai.git@bugfix/DBCommitForMetricLogger  # Takes 1 min
    !rapidfireai init --evals # Takes 1 min

Collecting git+https://github.com/RapidFireAI/rapidfireai.git@bugfix/DBCommitForMetricLogger
  Cloning https://github.com/RapidFireAI/rapidfireai.git (to revision bugfix/DBCommitForMetricLogger) to /tmp/pip-req-build-pq_ba87d
  Running command git clone --filter=blob:none --quiet https://github.com/RapidFireAI/rapidfireai.git /tmp/pip-req-build-pq_ba87d
  Running command git checkout -b bugfix/DBCommitForMetricLogger --track origin/bugfix/DBCommitForMetricLogger
  Switched to a new branch 'bugfix/DBCommitForMetricLogger'
  Branch 'bugfix/DBCommitForMetricLogger' set up to track remote branch 'bugfix/DBCommitForMetricLogger' from 'origin'.
  Resolved https://github.com/RapidFireAI/rapidfireai.git to commit 929e31aec205b07c6bacbc59bff0bba191fca3e4
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting flask-cors (from rapidfireai==0.12.8)
  Downloading flask_cors

### Import RapidFire Components

Import RapidFire‚Äôs core classes for defining the RAG pipeline and running a small retrieval grid search (plus a Colab-friendly protobuf setting).

In [5]:
import os
os.environ['RF_TRACKIO_ENABLED'] = 'true'
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'

from rapidfireai import Experiment
from rapidfireai.automl import List, RFLangChainRagSpec, RFvLLMModelConfig, RFPromptManager, RFGridSearch
import re, json
from typing import List as listtype, Dict, Any

# If you get "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'" from Colab, just rerun this cell

### Loading the Data

We load the FiQA **queries** and **relevance labels (qrels)**, then downsample to keep this Colab run fast.
Next we filter the corpus to only documents relevant to the sampled queries and write a smaller `corpus_sampled.jsonl`.
Finally, we update `qrels` to match the sampled subset so evaluation stays consistent.

In [6]:
from datasets import load_dataset
import pandas as pd
import json, random
from pathlib import Path

# Dataset directory
dataset_dir = Path("/content/tutorial_notebooks/rag-contexteng/datasets")

# Load full dataset
fiqa_dataset = load_dataset("json", data_files=str(dataset_dir / "fiqa" / "queries.jsonl"), split="train")
fiqa_dataset = fiqa_dataset.rename_columns({"text": "query", "_id": "query_id"})
qrels = pd.read_csv(str(dataset_dir / "fiqa" / "qrels.tsv"), sep="\t")
qrels = qrels.rename(
    columns={"query-id": "query_id", "corpus-id": "corpus_id", "score": "relevance"}
)

# Downsample queries and corpus JOINTLY
sample_fraction = 0.001  # low sample_fraction makes demo faster but degrades metrics; set to 1.0 for full evals if running on a local machine.
rseed = 1
random.seed(rseed)

# Step 1: Sample queries
sample_size = int(len(fiqa_dataset) * sample_fraction)
fiqa_dataset = fiqa_dataset.shuffle(seed=rseed).select(range(sample_size))

# Convert query_ids to integers for matching
query_ids = set([int(qid) for qid in fiqa_dataset["query_id"]])

# Step 2: Get all corpus docs relevant to sampled queries
qrels_filtered = qrels[qrels["query_id"].isin(query_ids)]
relevant_corpus_ids = set(qrels_filtered["corpus_id"].tolist())

print(f"Using {len(fiqa_dataset)} queries")
print(f"Found {len(relevant_corpus_ids)} relevant documents for these queries")

# Step 3: Load corpus and filter to relevant docs
input_file = dataset_dir / "fiqa" / "corpus.jsonl"
output_file = dataset_dir / "fiqa" / "corpus_sampled.jsonl"

with open(input_file, 'r') as f:
    all_corpus = [json.loads(line) for line in f]

# Keep only relevant documents (convert _id to int for matching)
sampled_corpus = [doc for doc in all_corpus if int(doc["_id"]) in relevant_corpus_ids]

# Write sampled corpus
with open(output_file, 'w') as f:
    for doc in sampled_corpus:
        f.write(json.dumps(doc) + '\n')

print(f"Sampled {len(sampled_corpus)} documents from {len(all_corpus)} total")
print(f"Saved to: {output_file}")
print(f"Filtered qrels to {len(qrels_filtered)} relevance judgments")

# Update qrels to match
qrels = qrels_filtered

Generating train split: 0 examples [00:00, ? examples/s]

Using 6 queries
Found 16 relevant documents for these queries
Sampled 16 documents from 57638 total
Saved to: /content/tutorial_notebooks/rag-contexteng/datasets/fiqa/corpus_sampled.jsonl
Filtered qrels to 16 relevance judgments


### Defining the RAG Search Space
This is where RapidFireAI shines. Instead of hardcoding a single RAG configuration, we define a search space using RFLangChainRagSpec.

We will test:

* **2 Chunking Strategies**: Different chunk sizes (256 vs 128).
* **2 Reranking Strategies**: Different `top_n` values (2 vs 5).

This gives us 4 combinations to evaluate for the retrieval part.

In [7]:
from langchain_community.document_loaders import DirectoryLoader, JSONLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_classic.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Per-Actor batch size for hardware efficiency
batch_size = 50

# 2 chunk sizes x 2 reranking top-n = 4 combinations in total
rag_gpu = RFLangChainRagSpec(
    document_loader=DirectoryLoader(
        path=str(dataset_dir / "fiqa"),
        glob="corpus_sampled.jsonl",
        loader_cls=JSONLoader,
        loader_kwargs={
            "jq_schema": ".",
            "content_key": "text",
            "metadata_func": lambda record, metadata: {
                "corpus_id": int(record.get("_id"))
            },  # store the document id
            "json_lines": True,
            "text_content": False,
        },
        sample_seed=42,
    ),
    # 2 chunking strategies with different chunk sizes
    text_splitter=List([
            RecursiveCharacterTextSplitter.from_tiktoken_encoder(
                encoding_name="gpt2", chunk_size=256, chunk_overlap=32
            ),
            RecursiveCharacterTextSplitter.from_tiktoken_encoder(
                encoding_name="gpt2", chunk_size=128, chunk_overlap=32
            ),
        ],
    ),
    embedding_cls=HuggingFaceEmbeddings,
    embedding_kwargs={
        "model_name": "sentence-transformers/all-MiniLM-L6-v2",
        "model_kwargs": {"device": "cuda:0"},
        "encode_kwargs": {"normalize_embeddings": True, "batch_size": batch_size},
    },
    vector_store=None,  # uses FAISS by default
    search_type="similarity",
    search_kwargs={"k": 8},
    # 2 reranking strategies with different top-n values
    reranker_cls=CrossEncoderReranker,
    reranker_kwargs={
        "model_name": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "model_kwargs": {"device": "cpu"},
        "top_n": List([2, 5]),
    },
    enable_gpu_search=True,
)

### Define Data Processing and Postprocessing Functions

We retrieve context for each question and turn it into LLM-ready prompts.
Then we attach the ‚Äúground truth‚Äù relevant documents from FiQA (`qrels`) so we can score retrieval quality later.

In [8]:
def sample_preprocess_fn(
    batch: Dict[str, listtype], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager
) -> Dict[str, listtype]:
    """Function to prepare the final inputs given to the generator model"""

    INSTRUCTIONS = "Utilize your financial knowledge, give your answer or opinion to the input question or subject matter."

    # Perform batched retrieval over all queries; returns a list of lists of k documents per query
    all_context = rag.get_context(batch_queries=batch["query"], serialize=False)

    # Extract the retrieved document ids from the context
    retrieved_documents = [
        [doc.metadata["corpus_id"] for doc in docs] for docs in all_context
    ]

    # Serialize the retrieved documents into a single string per query using the default template
    serialized_context = rag.serialize_documents(all_context)
    batch["query_id"] = [int(query_id) for query_id in batch["query_id"]]

    # Each batch to contain conversational prompt, retrieved documents, and original 'query_id', 'query', 'metadata'
    return {
        "prompts": [
            [
                {"role": "system", "content": INSTRUCTIONS},
                {
                    "role": "user",
                    "content": f"Here is some relevant context:\n{context}. \nNow answer the following question using the context provided earlier:\n{question}",
                },
            ]
            for question, context in zip(batch["query"], serialized_context)
        ],
        "retrieved_documents": retrieved_documents,
        **batch,
    }


def sample_postprocess_fn(batch: Dict[str, listtype]) -> Dict[str, listtype]:
    """Function to postprocess outputs produced by generator model"""
    # Get ground truth documents for each query; can be done in preprocess_fn too but done here for clarity
    batch["ground_truth_documents"] = [
        qrels[qrels["query_id"] == query_id]["corpus_id"].tolist()
        for query_id in batch["query_id"]
    ]
    return batch

### Define Custom Eval Metrics Functions

The following helper methods compute standard retrieval metrics (Precision, Recall, F1, NDCG@5, MRR) from the retrieved vs. ground-truth document IDs.
We compute metrics per batch and then combine them across batches so each config gets one consistent score.

In [9]:
import math


def compute_ndcg_at_k(retrieved_docs: set, expected_docs: set, k=5):
    """Utility function to compute NDCG@k"""
    relevance = [1 if doc in expected_docs else 0 for doc in list(retrieved_docs)[:k]]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance))

    # IDCG: perfect ranking limited by min(k, len(expected_docs))
    ideal_length = min(k, len(expected_docs))
    ideal_relevance = [3] * ideal_length + [0] * (k - ideal_length)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal_relevance))

    return dcg / idcg if idcg > 0 else 0.0


def compute_rr(retrieved_docs: set, expected_docs: set):
    """Utility function to compute Reciprocal Rank (RR) for a single query"""
    rr = 0
    for i, retrieved_doc in enumerate(retrieved_docs):
        if retrieved_doc in expected_docs:
            rr = 1 / (i + 1)
            break
    return rr


def sample_compute_metrics_fn(batch: Dict[str, listtype]) -> Dict[str, Dict[str, Any]]:
    """Function to compute all eval metrics based on retrievals and/or generations"""

    true_positives, precisions, recalls, f1_scores, ndcgs, rrs = 0, [], [], [], [], []
    total_queries = len(batch["query"])

    for pred, gt in zip(batch["retrieved_documents"], batch["ground_truth_documents"]):
        expected_set = set(gt)
        retrieved_set = set(pred)

        true_positives = len(expected_set.intersection(retrieved_set))
        precision = true_positives / len(retrieved_set) if len(retrieved_set) > 0 else 0
        recall = true_positives / len(expected_set) if len(expected_set) > 0 else 0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall) > 0
            else 0
        )

        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
        ndcgs.append(compute_ndcg_at_k(retrieved_set, expected_set, k=5))
        rrs.append(compute_rr(retrieved_set, expected_set))

    return {
        "Total": {"value": total_queries},
        "Precision": {"value": sum(precisions) / total_queries},
        "Recall": {"value": sum(recalls) / total_queries},
        "F1 Score": {"value": sum(f1_scores) / total_queries},
        "NDCG@5": {"value": sum(ndcgs) / total_queries},
        "MRR": {"value": sum(rrs) / total_queries},
    }


def sample_accumulate_metrics_fn(
    aggregated_metrics: Dict[str, listtype],
) -> Dict[str, Dict[str, Any]]:
    """Function to accumulate eval metrics across all batches"""

    num_queries_per_batch = [m["value"] for m in aggregated_metrics["Total"]]
    total_queries = sum(num_queries_per_batch)
    algebraic_metrics = ["Precision", "Recall", "F1 Score", "NDCG@5", "MRR"]

    return {
        "Total": {"value": total_queries},
        **{
            metric: {
                "value": sum(
                    m["value"] * queries
                    for m, queries in zip(
                        aggregated_metrics[metric], num_queries_per_batch
                    )
                )
                / total_queries,
                "is_algebraic": True,
                "value_range": (0, 1),
            }
            for metric in algebraic_metrics
        },
    }

### Define Partial Multi-Config Knobs for vLLM Generator part of RAG Pipeline using RapidFire AI Wrapper APIs

We pick a lightweight vLLM model and sampling settings that fit in Colab GPU memory.
Then we bundle the generator + our preprocessing/metrics functions into `config_set`, which RapidFire will run across the 4 retrieval configs.

In [10]:
vllm_config1 = RFvLLMModelConfig(
    model_config={
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "dtype": "half",
        "gpu_memory_utilization": 0.25,
        "tensor_parallel_size": 1,
        "distributed_executor_backend": "mp",
        "enable_chunked_prefill": False,
        "enable_prefix_caching": False,
        "max_model_len": 3000,
        "disable_log_stats": True,  # Disable vLLM progress logging
        "enforce_eager": True,
        "disable_custom_all_reduce": True,
    },
    sampling_params={
        "temperature": 0.8,
        "top_p": 0.95,
        "max_tokens": 128,
    },
    rag=rag_gpu,
    prompt_manager=None,
)

batch_size = 3 # Smaller batch size for generation
config_set = {
    "vllm_config": vllm_config1,  # Only 1 generator, but it represents 4 full configs
    "batch_size": batch_size,
    "preprocess_fn": sample_preprocess_fn,
    "postprocess_fn": sample_postprocess_fn,
    "compute_metrics_fn": sample_compute_metrics_fn,
    "accumulate_metrics_fn": sample_accumulate_metrics_fn,
    "online_strategy_kwargs": {
        "strategy_name": "normal",
        "confidence_level": 0.95,
        "use_fpc": True,
    },
}

### Create Config Group

We create an `RFGridSearch` over `config_set`, producing **4 retrieval configs** (2 chunkers √ó 2 rerankers) to run and compare.



In [11]:
# Simple grid search across all config combinations: 4 total (2 chunkers √ó 2 rerankers)
config_group = RFGridSearch(config_set)

### Create Experiment

An `Experiment` is RapidFire‚Äôs top-level container for this notebook run: it groups configs/runs, saves artifacts, and tracks metrics under a unique name.
We set `mode="evals"` because we‚Äôre running evaluation (not training). See the docs: https://oss-docs.rapidfire.ai/en/latest/experiment.html#api-experiment


In [12]:
experiment = Experiment(experiment_name="exp1-fiqa-rag-colab", mode="evals")

Created directory for database at /content/rapidfireai/db
Experiment exp1-fiqa-rag-colab created with Experiment ID: 1 at /content/rapidfireai/rapidfire_experiments/exp1-fiqa-rag-colab
Created directory: /content/rapidfireai/logs/exp1-fiqa-rag-colab
üåê Google Colab detected. Ray dashboard URL: https://8855-gpu-t4-s-2nqnb69r177ws-b.us-west1-0.prod.colab.dev
üåê Google Colab detected. Dispatcher URL: https://8851-gpu-t4-s-2nqnb69r177ws-b.us-west1-0.prod.colab.dev


### Display Ray Dashboard (Optional)

Ray is the system RapidFire uses under the hood to run work in parallel; this cell simply embeds Ray‚Äôs dashboard below so we can monitor what‚Äôs running.

In [13]:
# Display the Ray dashboard in the Colab notebook
from google.colab import output
output.serve_kernel_port_as_iframe(8855)

<IPython.core.display.Javascript object>

### Run Multi-Config Evals + Launch Interactive Run Controller

Now we get to the main function for running multi-config evals. Two tables will appear below the run_evals cell:
- The first table will appear immediately. It lists all preprocessing/RAG sources.
- After a short while, the second table will appear. It lists all individual runs with their knobs and metrics that are updated in real-time via online aggregation showing both estimates and confidence intervals.

RapidFire AI also provides an Interactive Controller panel UI for Colab that lets you manage executing runs dynamically in real-time from the notebook:

- ‚èπÔ∏è **Stop**: Gracefully stop a running config
- ‚ñ∂Ô∏è **Resume**: Resume a stopped run
- üóëÔ∏è **Delete**: Remove a run from this experiment
- üìã **Clone**: Create a new run by editing the config dictionary of a parent run to try new knob values; optional warm start of parameters
- üîÑ **Refresh**: Update run status and metrics

In [14]:
# Launch evals of all RAG configs in the config_group with swap granularity of 4 chunks
# NB: If your machine has more than 1 GPU, set num_actors to that number
results = experiment.run_evals(
    config_group=config_group,
    dataset=fiqa_dataset,
    num_actors=1,
    num_shards=4,
    seed=42,
)

=== Preprocessing RAG Sources ===


RAG Source ID,Status,Duration,Details
1,Complete,45.4s,"FAISS, GPU"
2,Complete,45.5s,"FAISS, GPU"


* Running on public URL: https://46c1c92496e259fd4b.gradio.live
* Trackio project initialized: exp1-fiqa-rag-colab
* Trackio metrics logged to: /root/.cache/huggingface/trackio
* View dashboard by running in your terminal:
[1m[38;5;208mtrackio show --project "exp1-fiqa-rag-colab"[0m
* or by running in Python: trackio.show(project="exp1-fiqa-rag-colab")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


* GPU detected, enabling automatic GPU metrics logging
* Created new run: dainty-sunset-0
* Run finished. Uploading logs to Trackio (please wait...)
* GPU detected, enabling automatic GPU metrics logging
* Created new run: Pipeline 1_1
* Run finished. Uploading logs to Trackio (please wait...)
* GPU detected, enabling automatic GPU metrics logging
* Created new run: Pipeline 2_2
* Run finished. Uploading logs to Trackio (please wait...)
* GPU detected, enabling automatic GPU metrics logging
* Created new run: Pipeline 3_3
* Run finished. Uploading logs to Trackio (please wait...)
* GPU detected, enabling automatic GPU metrics logging
* Created new run: Pipeline 4_4

=== Multi-Config Experiment Progress ===


Run ID,Model,Status,Progress,Conf. Interval,search_type,rag_k,top_n,chunk_size,chunk_overlap,sampling_params,model_config,Precision,Recall,F1 Score,NDCG@5,MRR,Throughput,Total,Samples Processed,Processing Time,Samples Per Second,model_name,run_id
1,Qwen/Qwen2.5-0.5B-Instruct,COMPLETED,4/4,0.0,similarity,8.0,2.0,256.0,32.0,"{'temperature': 0.8, 'top_p': 0.95, 'max_tokens': 128}","{'dtype': 'half', 'gpu_memory_utilization': 0.25, 'tensor_parallel_size': 1, 'distributed_executor_backend': 'mp', 'enable_chunked_prefill': False, 'enable_prefix_caching': False, 'max_model_len': 3000, 'disable_log_stats': True, 'enforce_eager': True, 'disable_custom_all_reduce': True}","43.95% [43.95%, 43.95%]","88.33% [88.33%, 88.33%]","53.26% [53.26%, 53.26%]","20.07% [20.07%, 20.07%]","68.06% [68.06%, 68.06%]",0.0/s,6,6,604.04 seconds,0.01,Qwen/Qwen2.5-0.5B-Instruct,1.0
2,Qwen/Qwen2.5-0.5B-Instruct,COMPLETED,4/4,0.0,similarity,8.0,5.0,256.0,32.0,"{'temperature': 0.8, 'top_p': 0.95, 'max_tokens': 128}","{'dtype': 'half', 'gpu_memory_utilization': 0.25, 'tensor_parallel_size': 1, 'distributed_executor_backend': 'mp', 'enable_chunked_prefill': False, 'enable_prefix_caching': False, 'max_model_len': 3000, 'disable_log_stats': True, 'enforce_eager': True, 'disable_custom_all_reduce': True}","43.95% [43.95%, 43.95%]","88.33% [88.33%, 88.33%]","53.26% [53.26%, 53.26%]","20.07% [20.07%, 20.07%]","68.06% [68.06%, 68.06%]",0.1/s,6,6,127.98 seconds,0.05,Qwen/Qwen2.5-0.5B-Instruct,2.0
3,Qwen/Qwen2.5-0.5B-Instruct,COMPLETED,4/4,0.0,similarity,8.0,2.0,128.0,32.0,"{'temperature': 0.8, 'top_p': 0.95, 'max_tokens': 128}","{'dtype': 'half', 'gpu_memory_utilization': 0.25, 'tensor_parallel_size': 1, 'distributed_executor_backend': 'mp', 'enable_chunked_prefill': False, 'enable_prefix_caching': False, 'max_model_len': 3000, 'disable_log_stats': True, 'enforce_eager': True, 'disable_custom_all_reduce': True}","45.83% [45.83%, 45.83%]","80.00% [80.00%, 80.00%]","53.31% [53.31%, 53.31%]","20.06% [20.06%, 20.06%]","61.11% [61.11%, 61.11%]",0.1/s,6,6,120.68 seconds,0.05,Qwen/Qwen2.5-0.5B-Instruct,3.0
4,Qwen/Qwen2.5-0.5B-Instruct,COMPLETED,4/4,0.0,similarity,8.0,5.0,128.0,32.0,"{'temperature': 0.8, 'top_p': 0.95, 'max_tokens': 128}","{'dtype': 'half', 'gpu_memory_utilization': 0.25, 'tensor_parallel_size': 1, 'distributed_executor_backend': 'mp', 'enable_chunked_prefill': False, 'enable_prefix_caching': False, 'max_model_len': 3000, 'disable_log_stats': True, 'enforce_eager': True, 'disable_custom_all_reduce': True}","45.83% [45.83%, 45.83%]","80.00% [80.00%, 80.00%]","53.31% [53.31%, 53.31%]","20.06% [20.06%, 20.06%]","61.11% [61.11%, 61.11%]",0.1/s,6,6,101.20 seconds,0.06,Qwen/Qwen2.5-0.5B-Instruct,4.0




* Run finished. Uploading logs to Trackio (please wait...)




* Run finished. Uploading logs to Trackio (please wait...)
* Run finished. Uploading logs to Trackio (please wait...)
* Run finished. Uploading logs to Trackio (please wait...)


### Display Trackio Dashboard

In [15]:
output.serve_kernel_port_as_iframe(7860)

<IPython.core.display.Javascript object>

### View Results

In [16]:
# Convert results dict to DataFrame
results_df = pd.DataFrame([
    {k: v['value'] if isinstance(v, dict) and 'value' in v else v for k, v in {**metrics_dict, 'run_id': run_id}.items()}
    for run_id, (_, metrics_dict) in results.items()
])

results_df

Unnamed: 0,run_id,model_name,search_type,rag_k,top_n,chunk_size,chunk_overlap,sampling_params,model_config,Samples Processed,Processing Time,Samples Per Second,Total,Precision,Recall,F1 Score,NDCG@5,MRR
0,1,Qwen/Qwen2.5-0.5B-Instruct,similarity,8,2,256,32,"{'temperature': 0.8, 'top_p': 0.95, 'max_token...","{'dtype': 'half', 'gpu_memory_utilization': 0....",6,604.04 seconds,0.01,6,0.439484,0.883333,0.532576,0.20065,0.680556
1,2,Qwen/Qwen2.5-0.5B-Instruct,similarity,8,5,256,32,"{'temperature': 0.8, 'top_p': 0.95, 'max_token...","{'dtype': 'half', 'gpu_memory_utilization': 0....",6,127.98 seconds,0.05,6,0.439484,0.883333,0.532576,0.20065,0.680556
2,3,Qwen/Qwen2.5-0.5B-Instruct,similarity,8,2,128,32,"{'temperature': 0.8, 'top_p': 0.95, 'max_token...","{'dtype': 'half', 'gpu_memory_utilization': 0....",6,120.68 seconds,0.05,6,0.458333,0.8,0.533069,0.200601,0.611111
3,4,Qwen/Qwen2.5-0.5B-Instruct,similarity,8,5,128,32,"{'temperature': 0.8, 'top_p': 0.95, 'max_token...","{'dtype': 'half', 'gpu_memory_utilization': 0....",6,101.20 seconds,0.06,6,0.458333,0.8,0.533069,0.200601,0.611111


### End Experiment

In [17]:
from google.colab import output
from IPython.display import display, HTML

display(HTML('''
<button id="continue-btn" style="padding: 10px 20px; font-size: 16px;">Click to End Experiment</button>
'''))

# eval_js blocks until the Promise resolves
output.eval_js('''
new Promise((resolve) => {
    document.getElementById("continue-btn").onclick = () => {
        document.getElementById("continue-btn").disabled = true;
        document.getElementById("continue-btn").innerText = "Continuing...";
        resolve("clicked");
    };
})
''')

# Actually end the experiment after the button is clicked
experiment.end()
print("Done!")

Experiment exp1-fiqa-rag-colab ended
Done!


### View RapidFire AI Log Files

In [18]:
# Get the experiment-specific log file
log_file = experiment.get_log_file_path()

print(f"üìÑ Log File: {log_file}")
print()

if log_file.exists():
    print("=" * 80)
    print(f"Last 30 lines of {log_file.name}:")
    print("=" * 80)
    with open(log_file, 'r', encoding='utf-8') as f:
        lines = f.readlines()
        for line in lines[-30:]:
            print(line.rstrip())
else:
    print(f"‚ùå Log file not found: {log_file}")

üìÑ Log File: /content/rapidfireai/logs/exp1-fiqa-rag-colab/rapidfire.log

Last 30 lines of rapidfire.log:
2026-01-13 20:20:52 | Controller | INFO | controller.py:1251 | [exp1-fiqa-rag-colab:Controller] Scheduling pipeline 4 (Pipeline 4) on actor 0 for shard 3 (1 batches)
2026-01-13 20:20:52 | QueryProcessingActor-0 | INFO | query_actor.py:143 | [exp1-fiqa-rag-colab:QueryProcessingActor-0] Reusing existing inference engine (config hash: f895be3c)
2026-01-13 20:20:52 | QueryProcessingActor-0 | INFO | query_actor.py:162 | [exp1-fiqa-rag-colab:QueryProcessingActor-0] Using CPU-based FAISS for retrieval (avoids GPU memory conflicts)
2026-01-13 20:20:52 | QueryProcessingActor-0 | INFO | query_actor.py:169 | [exp1-fiqa-rag-colab:QueryProcessingActor-0] Deserializing FAISS index for this actor...
2026-01-13 20:20:52 | sentence_transformers.SentenceTransformer | INFO | SentenceTransformer.py:227 | Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
2026-01-13 20:20:54 

### Conclusion

We built a simple Financial Q&A RAG pipeline and compared **4 retrieval configurations** (chunking √ó reranking) using standard retrieval metrics.

Optional ideas to explore later:
- Increase `sample_fraction` (or run locally) for more reliable results.
- Try additional retrieval knobs (e.g., embedding model, `k`, chunk overlap) and re-run the same evaluation loop.
