# 03 - Agent Training (Interactive)

**Goal:** Train our DQN agent and watch it learn in real time!

---

## What Happens During Training

1. **Agent observes** market state (21 technical indicators)
2. **Agent chooses action** (Long, Short, or Hold)
3. **Environment responds** with new state and reward
4. **Agent learns** by updating its Q-value estimates
5. **Repeat** for 100,000 timesteps
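
The five steps above can be sketched as a plain Python loop. This is only an illustration under loud assumptions: `ToyMarketEnv` is a hypothetical stand-in for `TimingEnv` that emits random observations and rewards, and the learner is a linear Q-function with a one-step TD update rather than the neural network and replay buffer that DQN uses. The timestep count is also cut down so the sketch runs instantly.

```python
import numpy as np

N_FEATURES = 21  # the 21 technical indicators mentioned above
N_ACTIONS = 3    # Long, Short, Hold


class ToyMarketEnv:
    """Hypothetical stand-in for TimingEnv: random observations and rewards."""

    def __init__(self, episode_length=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.normal(size=N_FEATURES)

    def step(self, action):
        self.t += 1
        obs = self.rng.normal(size=N_FEATURES)
        reward = float(self.rng.normal())  # placeholder reward, ignores the action
        done = self.t >= self.episode_length
        return obs, reward, done


def run_training(total_timesteps=200, epsilon=0.1, gamma=0.99, lr=0.001, seed=0):
    env = ToyMarketEnv(seed=seed)
    rng = np.random.default_rng(seed + 1)
    weights = np.zeros((N_ACTIONS, N_FEATURES))  # linear Q(s, a) = weights[a] @ s
    episode_rewards, episode_reward = [], 0.0

    obs = env.reset()
    for _ in range(total_timesteps):
        # Steps 1-2: observe the state and choose an action (epsilon-greedy)
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(weights @ obs))

        # Step 3: the environment responds with a new state and a reward
        next_obs, reward, done = env.step(action)

        # Step 4: learn by moving Q(s, a) toward the one-step TD target
        target = reward if done else reward + gamma * np.max(weights @ next_obs)
        td_error = target - weights[action] @ obs
        weights[action] += lr * td_error * obs

        # Step 5: repeat, starting a new episode whenever the current one ends
        episode_reward += reward
        obs = next_obs
        if done:
            episode_rewards.append(episode_reward)
            episode_reward = 0.0
            obs = env.reset()

    return weights, episode_rewards


weights, episode_rewards = run_training()
print(f"Finished {len(episode_rewards)} episodes")
```

Because the toy rewards are pure noise, nothing meaningful is learned here; the point is only the shape of the observe, act, respond, update cycle.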

## What We'll Visualize

- Episode rewards over time (is it learning?)
- Exploration vs exploitation (epsilon decay)
- Q-value evolution (agent's confidence)
- Action distribution (what does it prefer?)
- Portfolio value progression
- Training loss (neural network optimization)
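
To make the epsilon-decay plot easier to read, here is a minimal sketch of a linear exploration schedule. The function name `linear_epsilon` is hypothetical, and the default values (start at 1.0, end at 0.05, decay over the first 10% of training) mirror the Stable-Baselines3 DQN defaults for `exploration_initial_eps`, `exploration_final_eps`, and `exploration_fraction`; the configuration used in this notebook may differ.

```python
def linear_epsilon(step, total_timesteps, initial_eps=1.0, final_eps=0.05, fraction=0.1):
    """Linearly anneal epsilon over the first `fraction` of training, then hold it."""
    decay_steps = fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)  # clamp once the decay phase is over
    return initial_eps + progress * (final_eps - initial_eps)


# Epsilon at a few points of a 100,000-timestep run
for step in (0, 5_000, 10_000, 100_000):
    print(step, round(linear_epsilon(step, 100_000), 3))
```

With these assumed values the agent acts almost entirely at random at the start and settles at 5% random actions after the first 10,000 timesteps.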

---

In [None]:
# Imports
import sys
sys.path.append("..")

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# RL imports
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import BaseCallback, EvalCallback
from stable_baselines3.common.monitor import Monitor
from src.environments.timing_env import TimingEnv
from src.utils.config import ConfigLoader

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("✓ Imports successful")

# Check device (torch.xpu is missing from older PyTorch builds, so guard the lookup)
if hasattr(torch, 'xpu') and torch.xpu.is_available():
    device = 'xpu'
    print(f"✓ Using Intel XPU: {torch.xpu.get_device_name(0)}")
elif torch.cuda.is_available():
    device = 'cuda'
    print(f"✓ Using CUDA GPU: {torch.cuda.get_device_name(0)}")
else:
    device = 'cpu'
    print("Using CPU (training will be slower)")