# A/B Testing Functionality Demo

This notebook demonstrates the A/B testing functionality of the RAG Engine Mini.

## Learning Objectives

By the end of this notebook, you will understand:
1. How the A/B testing functionality works in the RAG Engine
2. How to set up and run A/B tests
3. How to interpret A/B test results
4. The architecture of the A/B testing service
5. How A/B testing fits into RAG optimization

In [None]:
import sys
import os
from pathlib import Path
import asyncio
import json
from datetime import datetime

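# Add the project root to the path (resolved to an absolute path so that
# imports keep working if the working directory changes later)
project_root = Path("../").resolve()
sys.path.insert(0, str(project_root))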

print(f"Project root: {project_root}")
print("Environment set up successfully")

## Understanding the A/B Testing Architecture

The A/B testing functionality follows the same architectural patterns as the rest of the RAG Engine (items 1-3) and adds two experiment-specific capabilities (items 4-5):

1. **Port/Adapter Pattern**: The `ABTestingServicePort` defines the interface
2. **Dependency Injection**: Services are injected through the container
3. **Separation of Concerns**: A/B logic is separate from API logic
4. **Statistical Analysis**: Built-in statistical significance calculations
5. **Experiment Lifecycle**: Full management from creation to analysis
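
As a rough illustration of the port/adapter and dependency-injection ideas above, the sketch below defines a hypothetical port as a `typing.Protocol` and injects an in-memory adapter into a caller. The names `ABTestingPort`, `InMemoryABTestingAdapter`, `ExperimentRegistrar`, and the simplified `create_experiment` signature are assumptions made for this example; the project's actual `ABTestingServicePort` interface may look different.

```python
import asyncio
from typing import Protocol


class ABTestingPort(Protocol):
    """Hypothetical port: the interface that callers depend on."""

    async def create_experiment(self, experiment_id: str, name: str) -> dict: ...


class InMemoryABTestingAdapter:
    """Hypothetical adapter: one concrete implementation of the port."""

    def __init__(self) -> None:
        self._experiments = {}

    async def create_experiment(self, experiment_id: str, name: str) -> dict:
        record = {"experiment_id": experiment_id, "name": name, "status": "draft"}
        self._experiments[experiment_id] = record
        return record


class ExperimentRegistrar:
    """Caller that receives its dependency through the constructor."""

    def __init__(self, ab_testing: ABTestingPort) -> None:
        self._ab_testing = ab_testing

    async def register(self, experiment_id: str, name: str) -> str:
        record = await self._ab_testing.create_experiment(experiment_id, name)
        return record["status"]


# The adapter is chosen at wiring time; the caller only knows the port.
registrar = ExperimentRegistrar(InMemoryABTestingAdapter())
status = asyncio.run(registrar.register("demo-001", "Demo experiment"))
print(status)
```

Because `ExperimentRegistrar` depends only on the port, a database-backed adapter could replace the in-memory one without touching the caller.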

In [None]:
# Let's look at the A/B testing service definition
from src.application.services.ab_testing_service import (
    ABTestingService,
    ABExperiment,
    ExperimentVariant,
    ExperimentStatus,
    VariantType,
)

print("A/B Testing Service Components:")
print(f"- A/B Testing Service: {ABTestingService.__name__}")
print(f"- A/B Experiment: {ABExperiment.__name__}")
print(f"- Experiment Variant: {ExperimentVariant.__name__}")

print("\nExperiment statuses available:")
for status in ExperimentStatus:
    print(f"- {status.value}")

print("\nVariant types available:")
for variant_type in VariantType:
    print(f"- {variant_type.value}")

# Collect the public callables exposed by the service class
public_methods = [
    name for name in dir(ABTestingService)
    if not name.startswith("_") and callable(getattr(ABTestingService, name, None))
]
print(f"\nA/B testing service methods: {public_methods}")

## Using the A/B Testing Service

Let's see how to use the A/B testing service to create and manage experiments:

In [None]:
# Import required classes (repeated here so this cell can run on its own)
from src.application.services.ab_testing_service import (
    ABTestingService,
    ABExperiment,
    ExperimentVariant,
    ExperimentStatus,
    VariantType,
)

# Create the A/B testing service
ab_service = ABTestingService()

print("A/B testing service initialized successfully")

## Creating an A/B Test Experiment

Let's create an experiment that compares two LLMs, with GPT-3.5 as the control and GPT-4 as the treatment:

In [None]:
# Define our experiment comparing two LLM models
experiment = ABExperiment(
    experiment_id="llm-comparison-001",
    name="GPT-3.5 vs GPT-4 Response Quality",
    description="Comparing response quality between GPT-3.5 and GPT-4 for RAG queries",
    status=ExperimentStatus.DRAFT,
    variants=[
        ExperimentVariant(
            name="gpt-3.5-control",
            description="Using GPT-3.5 as the base model",
            traffic_split=0.5,
            config={"model": "gpt-3.5-turbo", "temperature": 0.7},
            variant_type=VariantType.CONTROL
        ),
        ExperimentVariant(
            name="gpt-4-treatment",
            description="Using GPT-4 as the improved model",
            traffic_split=0.5,
            config={"model": "gpt-4", "temperature": 0.7},
            variant_type=VariantType.TREATMENT
        )
    ],
    metrics=["response_time", "user_satisfaction_score", "answer_relevance"],
    created_by="rag-engine-admin",
    hypothesis="GPT-4 will produce higher quality answers with similar response times",
    owner="ai-team"
)

# Create the experiment
try:
    created_experiment = asyncio.run(ab_service.create_experiment(experiment))
    print("✅ A/B test experiment created successfully")
    print(f"   Experiment ID: {created_experiment.experiment_id}")
    print(f"   Name: {created_experiment.name}")
    print(f"   Status: {created_experiment.status}")
    print(f"   Variants: {len(created_experiment.variants)}")
    print(f"   Metrics: {created_experiment.metrics}")
except Exception as e:
    print(f"❌ Failed to create experiment: {e}")
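
Before submitting an experiment, it can help to check that the variants' traffic splits sum to 1.0 and that exactly one control is defined. The sketch below shows one way to do that, using a hypothetical `VariantSpec` dataclass in place of the real `ExperimentVariant`. Whether `ABTestingService` performs similar validation internally is not shown in this notebook, so `validate_variants` and its rules are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSpec:
    """Hypothetical stand-in for ExperimentVariant, for illustration only."""
    name: str
    traffic_split: float
    is_control: bool = False


def validate_variants(variants: list[VariantSpec]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if len(variants) < 2:
        problems.append("an experiment needs at least two variants")
    if any(not 0.0 < v.traffic_split <= 1.0 for v in variants):
        problems.append("each traffic_split must be in (0, 1]")
    total = sum(v.traffic_split for v in variants)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"traffic splits sum to {total:.3f}, expected 1.0")
    if sum(v.is_control for v in variants) != 1:
        problems.append("exactly one variant should be the control")
    names = [v.name for v in variants]
    if len(set(names)) != len(names):
        problems.append("variant names must be unique")
    return problems


good = [VariantSpec("gpt-3.5-control", 0.5, is_control=True),
        VariantSpec("gpt-4-treatment", 0.5)]
bad = [VariantSpec("a", 0.7, is_control=True), VariantSpec("b", 0.5)]

print(validate_variants(good))  # no problems reported
print(validate_variants(bad))   # splits add up to more than 1.0
```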

## Activating the Experiment

Now let's activate the experiment to start assigning users to variants:

In [None]:
try:
    activated_experiment = asyncio.run(
        ab_service.update_experiment_status(
            experiment_id=created_experiment.experiment_id,
            status=ExperimentStatus.ACTIVE
        )
    )
    print("✅ A/B test experiment activated successfully")
    print(f"   Experiment ID: {activated_experiment.experiment_id}")
    print(f"   New Status: {activated_experiment.status}")
    print(f"   Updated at: {activated_experiment.updated_at}")
except Exception as e:
    print(f"❌ Failed to activate experiment: {e}")

## Assigning Users to Variants

Let's simulate assigning users to different variants in the experiment:

In [None]:
# Simulate assigning several users to variants
users = [f"user-{i}" for i in range(10)]
assignments = []

print("Assigning users to experiment variants:\n")

# enumerate() supplies the index `i` used below to alternate user types
for i, user_id in enumerate(users):
    try:
        assignment = asyncio.run(
            ab_service.assign_variant(
                experiment_id=created_experiment.experiment_id,
                user_id=user_id,
                context={"user_type": "premium" if i % 2 == 0 else "standard"}
            )
        )
        assignments.append(assignment)
        print(f"✅ User {user_id} assigned to variant: {assignment.variant_name}")
    except Exception as e:
        print(f"❌ Failed to assign user {user_id}: {e}")

print(f"\nTotal assignments made: {len(assignments)}")

# Count assignments by variant
variant_counts = {}
for assignment in assignments:
    variant = assignment.variant_name
    variant_counts[variant] = variant_counts.get(variant, 0) + 1

print("\nDistribution by variant:")
for variant, count in variant_counts.items():
    print(f"  {variant}: {count} users")
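
One common way to implement variant assignment is deterministic hashing: hashing the experiment ID together with the user ID gives each user a stable position in [0, 1), which is then mapped onto the cumulative traffic splits. The sketch below illustrates that technique with a hypothetical `pick_variant` helper. It is an assumption about how such a service could work, not a description of how `assign_variant` is implemented.

```python
import hashlib


def pick_variant(experiment_id: str, user_id: str, splits: dict[str, float]) -> str:
    """Map a user to a variant deterministically (hypothetical helper).

    `splits` maps variant name -> traffic share; shares should sum to 1.0.
    """
    if not splits:
        raise ValueError("at least one variant is required")
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, share in splits.items():
        cumulative += share
        if position < cumulative:
            return name
    # Guard against floating-point round-off in the cumulative sum.
    return list(splits)[-1]


splits = {"gpt-3.5-control": 0.5, "gpt-4-treatment": 0.5}
first = pick_variant("llm-comparison-001", "user-0", splits)
second = pick_variant("llm-comparison-001", "user-0", splits)
print(first == second)  # the same user always lands in the same variant

counts = {name: 0 for name in splits}
for i in range(10_000):
    counts[pick_variant("llm-comparison-001", f"user-{i}", splits)] += 1
print(counts)
```

Including the experiment ID in the hash input keeps assignments independent across experiments, so a user who lands in the control group of one test is not systematically placed in the control group of the next.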

## Tracking Events for Analysis

Now let's simulate tracking events during the experiment:

In [None]:
# Simulate tracking events for each assignment
print("Tracking events for analysis:\n")

for i, assignment in enumerate(assignments):
    # Generate simulated metrics based on the variant
    if "gpt-4" in assignment.variant_name:
        # GPT-4 variant performs slightly better
        satisfaction_score = 4.2 + (i % 3) * 0.1  # Slightly higher scores
        response_time = 1.2 + (i % 5) * 0.1
        relevance = 0.85 + (i % 4) * 0.02
    else:
        # GPT-3.5 variant baseline performance
        satisfaction_score = 3.8 + (i % 3) * 0.1
        response_time = 1.1 + (i % 5) * 0.1
        relevance = 0.78 + (i % 4) * 0.02

    # Track different types of events
    events_to_track = [
        {"type": "user_satisfaction_score", "value": satisfaction_score},
        {"type": "response_time", "value": response_time},
        {"type": "answer_relevance", "value": relevance}
    ]
    
    for event in events_to_track:
        try:
            asyncio.run(
                ab_service.track_event(
                    experiment_id=created_experiment.experiment_id,
                    user_id=assignment.user_id,
                    variant_name=assignment.variant_name,
                    event_type=event["type"],
                    value=event["value"],
                    metadata={"session_id": f"session-{i}", "timestamp": assignment.assigned_at.isoformat()}
                )
            )
        
        except Exception as e:
            print(f"❌ Failed to track event for user {assignment.user_id}: {e}")

print(f"✅ Tracked events for {len(assignments)} users")
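
Analysis ultimately depends on grouping tracked events by variant and metric. As a rough sketch of that aggregation, assuming each event is a plain `(variant_name, event_type, value)` tuple (the service's internal storage format is not shown in this notebook, and the values below are made up for illustration):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event records: (variant_name, event_type, value)
events = [
    ("gpt-3.5-control", "response_time", 1.1),
    ("gpt-3.5-control", "response_time", 1.3),
    ("gpt-4-treatment", "response_time", 1.2),
    ("gpt-4-treatment", "response_time", 1.4),
    ("gpt-3.5-control", "answer_relevance", 0.78),
    ("gpt-4-treatment", "answer_relevance", 0.85),
]

# Group raw values by (variant, metric).
grouped = defaultdict(list)
for variant_name, event_type, value in events:
    grouped[(variant_name, event_type)].append(value)

# Reduce each group to a count and a mean.
summary = {
    key: {"count": len(values), "mean": mean(values)}
    for key, values in grouped.items()
}

for (variant_name, event_type), stats in sorted(summary.items()):
    print(f"{variant_name:16} {event_type:17} n={stats['count']} mean={stats['mean']:.3f}")
```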

## Analyzing Experiment Results

Now let's get the results of our A/B test:

In [None]:
try:
    results = asyncio.run(
        ab_service.get_experiment_results(created_experiment.experiment_id)
    )
    
    print("A/B Test Results:")
    print(f"- Experiment ID: {results.experiment_id}")
    print(f"- Winner: {results.winner or 'No clear winner'}")
    print(f"- Statistically Significant: {results.is_significant}")
    print(f"- Conclusion: {results.conclusion}")
    
    print(f"\nVariant Results:")
    for variant_name, metrics in results.variant_results.items():
        print(f"  \033[1m{variant_name}\033[0m:")
        print(f"    - Total Events: {metrics.get('total_events', 0)}")
        print(f"    - Conversion Rate: {metrics.get('conversion_rate', 0):.3f}")
        print(f"    - Average Value: {metrics.get('average_value', 0):.3f}")
        
    print(f"\nStatistical Significance:")
    for metric, comparisons in results.statistical_significance.items():
        print(f"  \033[1m{metric}\033[0m:")
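        for variant, stats in comparisons.items():
            p_value = stats.get('p_value')
            effect_size = stats.get('effect_size')
            # Format numbers only when present: applying ':.3f' to an 'N/A' string raises ValueError
            p_text = f"{p_value:.3f}" if p_value is not None else "N/A"
            effect_text = f"{effect_size:.3f}" if effect_size is not None else "N/A"
            print(f"    - vs {variant}: p-value = {p_text}, "
                  f"significant = {stats.get('significant', 'N/A')}, "
                  f"effect_size = {effect_text}")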
            
except Exception as e:
    print(f"❌ Failed to get experiment results: {e}")
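
To make the p-value and effect-size fields more concrete, here is a minimal sketch of a two-sample comparison using only the standard library. It uses a z-test on the difference of means (a large-sample approximation) and Cohen's d as the effect size. Which test `ABTestingService` actually runs is not shown in this notebook, so `compare_means` and its choice of test are assumptions, and the scores are made-up sample data. With samples as small as the ten simulated users above, a t-test would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_means(control: list[float], treatment: list[float],
                  alpha: float = 0.05) -> dict:
    """Two-sided z-test on the difference of means plus Cohen's d (hypothetical helper)."""
    n_c, n_t = len(control), len(treatment)
    mean_c, mean_t = mean(control), mean(treatment)
    var_c, var_t = variance(control), variance(treatment)  # sample variances

    # Standard error of the difference between the two sample means.
    std_error = sqrt(var_c / n_c + var_t / n_t)
    z = (mean_t - mean_c) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Cohen's d: mean difference in units of the pooled standard deviation.
    pooled_sd = sqrt(((n_c - 1) * var_c + (n_t - 1) * var_t) / (n_c + n_t - 2))
    effect_size = (mean_t - mean_c) / pooled_sd

    return {"p_value": p_value, "significant": p_value < alpha,
            "effect_size": effect_size}


control_scores = [3.8, 3.9, 4.0, 3.8, 3.9, 4.0, 3.8, 3.9]
treatment_scores = [4.2, 4.3, 4.4, 4.2, 4.3, 4.4, 4.2, 4.3]
result = compare_means(control_scores, treatment_scores)
print(f"p-value = {result['p_value']:.4f}, significant = {result['significant']}, "
      f"effect_size = {result['effect_size']:.2f}")
```

A small p-value says the difference is unlikely to be noise; the effect size says whether the difference is large enough to matter. Both are worth reading before declaring a winner.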

## Calculating Sample Size

Let's see how to calculate the required sample size for an experiment:

In [None]:
try:
    # Calculate the sample size needed for an experiment.
    # Baseline conversion rate is 10%; we want to detect an absolute lift of
    # 3 percentage points (10% -> 13%).
    sample_size = asyncio.run(
        ab_service.calculate_sample_size(
            baseline_conversion_rate=0.10,  # 10% baseline
            minimum_detectable_effect=0.03,  # 3 percentage points, absolute
            significance_level=0.05,  # alpha = 0.05, i.e. 95% confidence level
            power=0.8  # 80% chance of detecting a true effect of this size
        )
    )
    
    print("Required Sample Size Calculation:")
    print("- Baseline conversion rate: 10%")
    print("- Minimum detectable effect: 3 percentage points (absolute)")
    print(f"- Confidence level: 95%")
    print(f"- Statistical power: 80%")
    print(f"- Required sample size: {sample_size:,} total participants")
    print(f"- Recommended: {sample_size//2:,} per variant")
    
except Exception as e:
    print(f"❌ Failed to calculate sample size: {e}")
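
For reference, a standard textbook approximation for comparing two proportions gives the per-variant sample size as the squared sum of two normal quantiles, multiplied by the summed Bernoulli variances and divided by the squared difference in rates. The sketch below implements that formula with the standard library. The helper `sample_size_per_variant` is hypothetical: `ab_service.calculate_sample_size` may use a different formula or report a total rather than a per-variant figure, so the numbers are not guaranteed to match.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, minimum_detectable_effect: float,
                            significance_level: float = 0.05, power: float = 0.8) -> int:
    """Per-variant sample size for a two-sided, two-proportion test (textbook approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect  # absolute lift
    z_alpha = NormalDist().inv_cdf(1 - significance_level / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    # n = (z_alpha + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)


n = sample_size_per_variant(0.10, 0.03)
print(f"{n:,} per variant, {2 * n:,} in total")
```

The required sample grows with the inverse square of the detectable effect, so halving the effect you want to detect roughly quadruples the number of participants needed.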

## API Endpoints

The A/B testing functionality is also available through API endpoints. Let's examine the routes:

In [None]:
# Import the API router to see available endpoints
from src.api.v1.routes_ab_testing import router

print("A/B Testing API routes:")
for route in router.routes:
    if hasattr(route, 'methods') and hasattr(route, 'path'):
        print(f"- {list(route.methods)}: {route.path}")

print(f"\nTotal A/B testing API routes: {len([r for r in router.routes if hasattr(r, 'methods')])}")

## How A/B Testing Benefits RAG Systems

A/B testing is especially valuable in RAG systems because many components can be varied independently and measured against the same metrics:

1. **Model Comparison**: Compare different LLMs for response quality
2. **Prompt Engineering**: Test different prompt strategies
3. **Retrieval Methods**: Compare vector vs keyword search effectiveness
4. **Chunking Strategies**: Evaluate different document segmentation approaches
5. **Reranking Algorithms**: Test different re-ranking methods
6. **System Parameters**: Optimize temperature, top_p, and other settings

## Summary

In this notebook, we explored the A/B testing functionality of the RAG Engine Mini:

1. **Architecture**: The A/B testing service follows the same architectural patterns as the rest of the system
2. **Experiment Management**: Full lifecycle from creation to analysis
3. **Statistical Analysis**: Built-in significance testing and result interpretation
4. **API Access**: Multiple endpoints for programmatic access
5. **RAG Optimization**: Specific value for improving RAG system components

A/B testing lets teams make data-driven decisions about which models, prompts, and algorithms perform best in production. The RAG Engine's implementation covers the steps shown in this notebook: experiment creation and activation, variant assignment, event tracking, significance analysis, and sample size calculation.