# Training Notebook

This notebook demonstrates the fine-tuning pipeline for the RAG system.

## Purpose
The notebook covers three stages of training the underlying models:

1.  **Dataset Preparation**: Loading and formatting data for training.
2.  **Fine-Tuning**: Training the LLM and/or embedding models on Cirq-specific datasets to improve domain understanding.
3.  **Model Saving**: Persisting the fine-tuned models for use in the RAG system.
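
As a rough illustration of the dataset preparation step, the sketch below writes training examples to a JSONL file, one JSON object per line. The `text` field mirrors the placeholder record that the configuration cell writes. The schema `FineTuner` actually expects is not shown in this notebook, so treat both the field name and the hypothetical `write_jsonl` helper as assumptions.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of dicts to `path`, one JSON object per line.

    Hypothetical helper for illustration; not part of the project code.
    Returns the number of records written.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A call such as `write_jsonl([{"text": "..."}], DATASET_PATH)` would then produce a file in the same shape as the placeholder dataset.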

## Usage
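Run this notebook to fine-tune the models when new training data becomes available, or when they need to perform better on specific tasks. If `training_data.jsonl` is missing, the configuration cell writes a one-line placeholder dataset, so supply real data before relying on the results.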


In [None]:
import sys
from pathlib import Path

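# Add the project root to sys.path so the `src` package can be imported.
# Path("..") is resolved against the current working directory, so this
# assumes the notebook runs from a directory one level below the project root.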
project_root = Path("..").resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from src.cirq_rag_code_assistant.config import get_config, setup_logging
from src.rag.fine_tuning import FineTuner

# Setup logging
setup_logging()

### Configure Training
Set the base model, output directory, and dataset path, then create the `FineTuner`.

In [None]:
# Configuration
BASE_MODEL = "qwen2.5-coder:14b"
OUTPUT_DIR = project_root / "outputs" / "models" / "fine_tuned"
DATASET_PATH = project_root / "data" / "datasets" / "training_data.jsonl"

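# Ensure the dataset file exists. If it is missing, write a one-line placeholder
# so the pipeline can run end to end; replace it with real data before training.
if not DATASET_PATH.exists():
    DATASET_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(DATASET_PATH, "w", encoding="utf-8") as f:
        f.write('{"text": "dummy data"}\n')
    print(f"No dataset found; wrote a placeholder to {DATASET_PATH}")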

# Initialize FineTuner
trainer = FineTuner(
    base_model_path=BASE_MODEL,
    output_dir=str(OUTPUT_DIR),
    dataset_path=str(DATASET_PATH)
)

print("FineTuner initialized.")
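
Because the cell above silently falls back to a placeholder file, it can help to inspect the dataset before training. The following is a minimal sketch of such a check, assuming each line should be a JSON object with a string `text` field, as in the placeholder record. `validate_jsonl` is a hypothetical helper and not part of `FineTuner`.

```python
import json
from pathlib import Path


def validate_jsonl(path, required_key="text"):
    """Return (valid_record_count, problems) for a JSONL file.

    Hypothetical helper for illustration. A record counts as valid when the
    line parses as a JSON object with a string value under `required_key`.
    """
    problems = []
    count = 0
    with open(Path(path), encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict) or not isinstance(record.get(required_key), str):
                problems.append(f"line {line_no}: missing string field '{required_key}'")
                continue
            count += 1
    return count, problems
```

Calling `validate_jsonl(DATASET_PATH)` before `trainer.train(...)` would show how many usable records exist; a count of 1 most likely means the placeholder file is still in use.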

### Run Training
Start training with a small configuration (one epoch, batch size 2, learning rate 1e-4) and report the outcome.

In [None]:
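try:
    # Small settings for a quick run; raise the epoch count for real training.
    result = trainer.train(
        epochs=1,
        batch_size=2,
        learning_rate=1e-4
    )

    # The result is read as a dict; 'success' flags whether training finished.
    if result['success']:
        print("Training Completed Successfully!")
        print(f"Final Loss: {result['final_loss']}")
        print(f"Model saved to: {result['model_path']}")
    else:
        print("Training Failed.")

except Exception as e:
    # Report the error without stopping the rest of the notebook.
    print(f"Error during training: {e}")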
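
To make a run easier to trace later, the outcome can be written next to the saved model. The sketch below is one possible approach, assuming the result dict carries the `success`, `final_loss`, and `model_path` keys read in the cell above. The `save_training_summary` helper and the `training_summary.json` file name are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_training_summary(result, hyperparams, output_dir):
    """Write a small JSON summary of a training run and return its path.

    Hypothetical helper for illustration; the result keys are assumed from
    how this notebook reads the value returned by trainer.train().
    """
    model_path = result.get("model_path")
    summary = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "success": bool(result.get("success", False)),
        "final_loss": result.get("final_loss"),
        "model_path": str(model_path) if model_path else None,
        "hyperparams": dict(hyperparams),
    }
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    summary_path = output_dir / "training_summary.json"
    with open(summary_path, "w", encoding="utf-8") as f:
        # default=str keeps the dump from failing on non-JSON types.
        json.dump(summary, f, indent=2, default=str)
    return summary_path
```

For example, `save_training_summary(result, {"epochs": 1, "batch_size": 2, "learning_rate": 1e-4}, OUTPUT_DIR)` would record the settings used in the cell above.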