# FakeLenseV2 - Training Guide

This notebook demonstrates how to train a FakeLenseV2 model from scratch.

## Topics Covered
1. Data preparation
2. Configuration setup
3. Training the model
4. Monitoring training progress
5. Evaluating results

## 1. Setup

In [None]:
import sys
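# Prepend the project root so the local `code` package is searched
# before the standard-library `code` module
sys.path.insert(0, '../..')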

import json
import torch
from code.train import Trainer
from code.agents.fake_news_agent import FakeNewsAgent
from code.utils.feature_extraction import FeatureExtractor
from code.utils.config import get_default_config
from code.models.vectorizer import BaseVectorizer

print(f"Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")

## 2. Prepare Training Data

Training data is a JSON array of objects, each with `text`, `source_reliability`, `social_reactions`, and `label` fields:

In [None]:
# Example training data structure
example_data = [
    {
        "text": "The government announces new policy.",
        "source_reliability": "Reuters",
        "social_reactions": 5000,
        "label": 2  # 0=Fake, 1=Suspicious, 2=Real
    },
    {
        "text": "Aliens discovered in local park!",
        "source_reliability": "Unknown Blog",
        "social_reactions": 100000,
        "label": 0
    }
]

# Save as JSON (optional)
# with open('../../data/my_train_data.json', 'w') as f:
#     json.dump(example_data, f, indent=2)

print("Data format example:")
print(json.dumps(example_data[0], indent=2))
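Before saving or training on a file, it can help to check each record against the structure shown above. The following is a minimal sketch and not part of FakeLenseV2: `validate_record`, `REQUIRED_FIELDS`, and `VALID_LABELS` are hypothetical names, and the field types are inferred from the two example records.

```python
# Assumed schema, inferred from the example records (not taken from the library)
REQUIRED_FIELDS = {
    "text": str,
    "source_reliability": str,
    "social_reactions": int,
    "label": int,
}
VALID_LABELS = (0, 1, 2)  # 0=Fake, 1=Suspicious, 2=Real


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    if "label" in record and record["label"] not in VALID_LABELS:
        problems.append(f"label must be one of {VALID_LABELS}")
    return problems


sample = {
    "text": "The government announces new policy.",
    "source_reliability": "Reuters",
    "social_reactions": 5000,
    "label": 2,
}
print(validate_record(sample))
```

Running this over every record and printing only the non-empty results gives a quick list of rows to fix before training.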

## 3. Load Training Data

In [None]:
# Load your training data
with open("../../data/train_data.json", "r", encoding="utf-8") as f:
    train_data = json.load(f)

print(f"Loaded {len(train_data)} training samples")
print("\nFirst sample:")
print(json.dumps(train_data[0], indent=2))
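Since class balance matters for training, it is worth counting labels right after loading. This is a small sketch using only the standard library; `label_distribution` is a hypothetical helper, and the demo list stands in for `train_data`.

```python
from collections import Counter


def label_distribution(samples):
    """Map each label to (count, fraction of all samples)."""
    counts = Counter(sample["label"] for sample in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (count, count / total) for label, count in sorted(counts.items())}


# Stand-in data; in the notebook you would pass train_data instead
demo_samples = [{"label": 0}, {"label": 0}, {"label": 1}, {"label": 2}]
names = {0: "Fake", 1: "Suspicious", 2: "Real"}
for label, (count, fraction) in label_distribution(demo_samples).items():
    print(f"{names.get(label, label):12s} {count:6d}  {fraction:.1%}")
```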

## 4. Configure Training Parameters

In [None]:
# Get default configuration
config = get_default_config()

# Customize for quick training (for demonstration)
config.update({
    "num_episodes": 10,       # Number of training episodes
    "batch_size": 32,          # Batch size for replay
    "learning_rate": 0.001,    # Learning rate
    "patience": 5,             # Early stopping patience
    "use_residual": True,      # Use Residual DQN
})

print("Training configuration:")
for key, value in config.items():
    if key in ["num_episodes", "batch_size", "learning_rate", "patience", "use_residual"]:
        print(f"  {key:20s}: {value}")
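The `patience` setting controls early stopping. How `Trainer` implements it is not shown in this notebook; a common pattern, sketched here under that assumption, is to stop once the tracked metric has failed to improve for `patience` consecutive episodes. `episodes_until_stop` and the reward values are hypothetical.

```python
def episodes_until_stop(rewards, patience):
    """Return the 1-based episode at which patience-based early stopping triggers.

    Stops after `patience` consecutive episodes without a new best reward.
    If that never happens, returns the total number of episodes.
    """
    best = float("-inf")
    stale = 0  # consecutive episodes without improvement
    for episode, reward in enumerate(rewards, start=1):
        if reward > best:
            best = reward
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return episode
    return len(rewards)


# Illustrative per-episode rewards: improvement stalls after episode 3
rewards = [1.0, 2.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(episodes_until_stop(rewards, patience=3))
```

With a small `num_episodes` such as 10, a `patience` of 5 leaves little room between the first plateau and the end of training, so the two values are best tuned together.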

## 5. Initialize Components

In [None]:
# Initialize vectorizer and feature extractor
vectorizer = BaseVectorizer(model_name="bert-base-uncased")
feature_extractor = FeatureExtractor(vectorizer=vectorizer)

# Initialize agent
state_size = 770  # 768 (BERT) + 2 (metadata)
action_size = 3   # Fake, Suspicious, Real

agent = FakeNewsAgent(state_size, action_size, config)

print("✅ Components initialized")
print(f"   State size: {state_size}")
print(f"   Action size: {action_size}")
print(f"   Model: {agent.model.__class__.__name__}")
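The state size of 770 comes from concatenating a 768-dimensional BERT embedding with two metadata values. The sketch below shows that arithmetic and a length check in plain Python. It is an illustration only: `build_state` is a hypothetical helper, and how `FeatureExtractor` actually encodes the source and social-reaction features is not shown in this notebook.

```python
EMBEDDING_DIM = 768  # hidden size of bert-base-uncased
NUM_METADATA = 2     # assumed: one source feature and one social-reaction feature
STATE_SIZE = EMBEDDING_DIM + NUM_METADATA


def build_state(embedding, source_score, reaction_score):
    """Append two metadata values to a text embedding to form one state vector."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} embedding values, got {len(embedding)}")
    return list(embedding) + [float(source_score), float(reaction_score)]


# Placeholder embedding and metadata values, for shape checking only
state = build_state([0.0] * EMBEDDING_DIM, source_score=0.9, reaction_score=0.5)
print(len(state) == STATE_SIZE)
```

Deriving the size from named constants rather than hardcoding 770 makes it easier to keep `state_size` correct if the vectorizer model changes.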

## 6. Create Trainer and Start Training

In [None]:
# Create trainer
trainer = Trainer(agent, feature_extractor, config)

# Start training
print("\n" + "="*60)
print("STARTING TRAINING")
print("="*60)

trainer.train(
    train_data,
    num_episodes=config["num_episodes"],
    patience=config["patience"]
)

print("\n✅ Training completed!")

## 7. Visualize Training Progress

The training curve is automatically saved to `models/training_curve.png`.

In [None]:
from IPython.display import Image, display

# Display the training curve if the trainer has written it
try:
    display(Image(filename='../../models/training_curve.png'))
except OSError:
    print("Training curve not found")

## 8. Test the Trained Model

In [None]:
from code.inference import InferenceEngine

# Load the trained model
engine = InferenceEngine("../../models/best_model.pth", config)

# Test prediction
test_text = "Scientists make breakthrough discovery in renewable energy."
prediction = engine.predict(
    text=test_text,
    source="Reuters",
    social_reactions=5000
)

labels = {0: "Fake News", 1: "Suspicious News", 2: "Real News"}
print("\nTest Prediction:")
print(f"Text: {test_text}")
print(f"Result: {labels[prediction]}")

## 9. Save Configuration

Save your training configuration for reproducibility.

In [None]:
# Save configuration
with open('../../models/training_config.json', 'w', encoding='utf-8') as f:
    json.dump(config, f, indent=2)

print("✅ Configuration saved to models/training_config.json")
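To confirm that a saved configuration can be restored unchanged, you can reload it and compare it with the original. This is a self-contained sketch that writes to a temporary directory; `save_config` and `load_config` are hypothetical helpers, and it assumes the configuration holds only JSON-serializable values (numbers, strings, booleans, lists, dicts).

```python
import json
import tempfile
from pathlib import Path


def save_config(config, path):
    """Write a config dict as UTF-8 JSON, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)


def load_config(path):
    """Read a config dict back from a JSON file."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


# Values copied from the configuration cell above
demo_config = {
    "num_episodes": 10,
    "batch_size": 32,
    "learning_rate": 0.001,
    "patience": 5,
    "use_residual": True,
}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "training_config.json"
    save_config(demo_config, target)
    restored = load_config(target)

print(restored == demo_config)
```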

## 10. Next Steps

- **Evaluate**: Use the evaluation notebook to assess performance
- **Fine-tune**: Adjust hyperparameters and retrain
- **Deploy**: See the API deployment guide

## Tips for Better Training

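1. **More Data**: Larger and more varied datasets generally improve generalization
2. **Balanced Dataset**: Aim for roughly equal representation of the three classes (Fake, Suspicious, Real)
3. **Hyperparameter Tuning**: Experiment with `learning_rate` and `batch_size`
4. **Longer Training**: The 10 episodes used above are only for demonstration; increase `num_episodes` for better convergence
5. **GPU**: Train on a CUDA-capable GPU when available; the setup cell prints which device was detected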
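For the hyperparameter tuning tip, one simple approach is a grid search: build one configuration per combination of candidate values and train with each in turn. The sketch below only generates the configurations. `config_grid` is a hypothetical helper and the candidate values are illustrative, not recommendations.

```python
from itertools import product


def config_grid(base_config, search_space):
    """Yield a copy of base_config for every combination in search_space."""
    keys = list(search_space)
    for values in product(*(search_space[key] for key in keys)):
        candidate = dict(base_config)
        candidate.update(zip(keys, values))
        yield candidate


base = {"num_episodes": 10, "patience": 5}
space = {
    "learning_rate": [0.001, 0.0001],
    "batch_size": [16, 32],
}

candidates = list(config_grid(base, space))
for candidate in candidates:
    print(candidate)
```

Each candidate can then be passed to a fresh `FakeNewsAgent` and `Trainer`, as in sections 5 and 6, with the results compared on held-out data.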