# 🧠 RAG Second Brain v16 - Full Experiment

**Paper:** Co-occurrence, Sequence and Knowledge Graph with Ontology as a Second Brain for AI-LLM

This notebook runs the complete experiment with:
- Dense retrieval (E5-base-v2)
- PPMI co-occurrence retrieval (REAL implementation)
- KG + OWL reasoning retrieval (REAL implementation)
- RRF and Learned Gating fusion
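
For readers unfamiliar with the RRF (Reciprocal Rank Fusion) step listed above, here is a minimal sketch of the general technique. It is not taken from the repository: the function name `reciprocal_rank_fusion`, the constant `k=60` (a value commonly used for RRF), and the toy document IDs are all assumptions for illustration, and the project's own fusion code may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one ranking.

    Hypothetical helper for illustration; k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each retriever contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy rankings standing in for the dense, PPMI and KG retrievers.
dense_hits = ["d1", "d2", "d3"]
ppmi_hits = ["d2", "d4", "d1"]
kg_hits = ["d2", "d1", "d5"]

fused = reciprocal_rank_fusion([dense_hits, ppmi_hits, kg_hits])
print(fused)
```

Because RRF uses only rank positions, it needs no score normalization across retrievers, which is one reason it is a common baseline before a learned fusion method.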

**Runtime:** ~15-30 minutes on a T4 GPU for the 500-sample run (the full dataset takes longer, see Section 3)
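
The PPMI co-occurrence retriever named in the list above relies on positive pointwise mutual information. The sketch below shows how PPMI can be computed in general, assuming document-level co-occurrence (two terms co-occur if they appear in the same document). The helper `ppmi_scores`, the windowing choice, and the toy corpus are illustrative assumptions; the repository may tokenize, window, or smooth differently.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_scores(documents):
    """Return {(term_a, term_b): PPMI} from document-level co-occurrence.

    Hypothetical helper: PMI = log2(P(a, b) / (P(a) * P(b))), clipped at 0.
    """
    n_docs = len(documents)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms
    for doc in documents:
        terms = sorted(set(doc))  # sorted so each pair has one canonical order
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    scores = {}
    for (a, b), n_ab in pair_df.items():
        pmi = math.log2((n_ab * n_docs) / (term_df[a] * term_df[b]))
        scores[(a, b)] = max(pmi, 0.0)  # keep only positive associations
    return scores


# Toy corpus of already-tokenized documents (made-up data).
docs = [
    ["graph", "ontology", "owl"],
    ["graph", "retrieval"],
    ["ontology", "owl"],
    ["retrieval", "dense"],
]
scores = ppmi_scores(docs)
print(scores[("ontology", "owl")])
```

Pairs that co-occur more often than chance predicts get a positive score, while pairs at or below chance are clipped to zero.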

## 1. Setup Environment

In [None]:
# Clone the repository
!git clone https://github.com/Anirach/rag-second-brain.git
%cd rag-second-brain

In [None]:
# Install dependencies
!pip install -q torch transformers sentence-transformers faiss-gpu \
    owlready2 networkx scipy datasets tqdm spacy
!python -m spacy download en_core_web_sm -q

In [None]:
# Verify GPU
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 2. Run Full Experiment

In [None]:
# Run experiment with 500 samples (for paper results, use full dataset)
!PYTHONPATH=. python scripts/run_experiment.py --n_samples 500

## 3. Full Dataset Experiment (for paper)

Run this for final paper results (~2-3 hours on T4):

In [None]:
# Uncomment to run full experiment
# !PYTHONPATH=. python scripts/run_experiment.py --n_samples 7405

## 4. View Results

In [None]:
import json
from pathlib import Path

results_path = Path("experiments/results/experiment_results.json")
if results_path.exists():
    with open(results_path) as f:
        results = json.load(f)
    
    print("=" * 70)
    print(f"{'Method':<12} | {'R@5':>8} | {'R@10':>8} | {'R@20':>8} | {'Both@10':>8}")
    print("-" * 70)
    
    for method in ["dense", "ppmi", "kg", "rrf"]:
        m = results["metrics"][method]
        r5 = m.get("recall@5", 0)
        r10 = m.get("recall@10", 0)
        r20 = m.get("recall@20", 0)
        b10 = m.get("both@10", 0)
        print(f"{method.upper():<12} | {r5:>8.3f} | {r10:>8.3f} | {r20:>8.3f} | {b10:>8.3f}")
    
    print("=" * 70)
    
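    # Relative change of RRF over Dense at R@10 (guard against a zero baseline)
    dense_r10 = results["metrics"]["dense"].get("recall@10", 0)
    rrf_r10 = results["metrics"]["rrf"].get("recall@10", 0)
    if dense_r10 > 0:
        improvement = ((rrf_r10 - dense_r10) / dense_r10) * 100
        print(f"\n📊 RRF vs. Dense at R@10: {improvement:+.1f}% relative")
    else:
        print("\nDense R@10 is 0; relative improvement is undefined.")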
else:
    print("Results not found. Run the experiment first.")
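
To make the columns in the table above easier to interpret, here is a small sketch of how recall@k and a "both@k" metric could be computed for a single query. This is an assumption about the metric definitions, not the repository's code: it takes Both@10 to mean that every gold supporting passage appears in the top 10, and the helper names and passage IDs are hypothetical.

```python
def recall_at_k(retrieved, gold, k):
    """Fraction of gold passages found among the top-k retrieved passages."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(gold & set(retrieved[:k])) / len(gold)


def both_at_k(retrieved, gold, k):
    """1.0 if ALL gold passages are in the top-k, else 0.0 (assumed definition)."""
    gold = set(gold)
    return 1.0 if gold and gold <= set(retrieved[:k]) else 0.0


# Toy query with two gold supporting passages (made-up IDs).
retrieved = ["p3", "p7", "p1", "p9", "p4"]
gold = ["p1", "p4"]

r_at_3 = recall_at_k(retrieved, gold, k=3)
b_at_3 = both_at_k(retrieved, gold, k=3)
b_at_5 = both_at_k(retrieved, gold, k=5)
print(r_at_3, b_at_3, b_at_5)
```

Under these definitions the reported numbers would be averages of the per-query values over all evaluated samples.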

## 5. Download Results

In [None]:
from google.colab import files
files.download('experiments/results/experiment_results.json')