# Training a DQN Agent for Atari Breakout

This notebook walks through running the training script for our Deep Q-Network (DQN) agent.

All the core logic (the DQN model, replay buffer, and training loop) is neatly organized in the `src/` directory. The `train.py` script is the main entry point that brings all these modules together.

For a complete theoretical breakdown of *how* this works (Q-Learning, Bellman Equation, Experience Replay, etc.), please see the `README.md` file.

## 1. Setup & Dependencies

First, we need to install the necessary libraries. The `requirements.txt` file handles most of them, but the `gymnasium` Atari package requires a special installation command to accept the game ROM licenses.

In [None]:
# Install dependencies from requirements.txt
!pip install -r requirements.txt

# Install gymnasium with Atari support AND accept the ROM license
!pip install -q "gymnasium[atari,accept-rom-license]"
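
Before starting a long run, it can help to confirm that the packages are actually importable in the notebook's kernel. The cell below is a minimal sketch of such a check. The helper name `check_packages` is our own and not part of the project. It assumes the Atari extra installs the `ale_py` module, which is what the `gymnasium[atari]` extra pulls in.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# `check_packages` is a hypothetical helper for this notebook only.
# `ale_py` is the Arcade Learning Environment module used by gymnasium's Atari games.
for name, found in check_packages(["gymnasium", "ale_py"]).items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

If either line reports `MISSING`, re-run the install cell above and restart the kernel before continuing.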

## 2. Running the Training Script

We can now execute the `train.py` script directly from this notebook. 

We will run a **shorter training session** than the default to demonstrate the process. A full run that reaches high performance can take millions of steps, which may mean days of training.

Here's a breakdown of the custom arguments we'll use:
* `--num_episodes 1500`: Run for 1,500 episodes (a short demo).
* `--replay_memory_init_size 10000`: Populate the buffer with 10k random steps before training (reduced from 50k for a quick start).
* `--epsilon_decay_steps 100000`: Decay epsilon from 1.0 to 0.1 over 100k steps (reduced from 500k).
* `--update_target_estimator_every 5000`: Sync the target network with the online network every 5,000 steps.
* `--max_steps_per_episode 5000`: Cap each episode at 5,000 steps. **Note:** The original notebook set this to 1,000, which was too low and likely cut episodes off prematurely.
* `--experiment_dir ./experiments/notebook_run`: Save logs and videos to a specific directory.
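
To make the `--epsilon_decay_steps` setting concrete, here is a minimal sketch of an epsilon schedule using the values above. It assumes the decay is linear and that epsilon is held at its final value afterwards. The actual schedule lives in `train.py` and may differ, and the function name `linear_epsilon` is purely illustrative.

```python
def linear_epsilon(step, start=1.0, end=0.1, decay_steps=100_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps`, then hold at `end`.

    Assumption: a linear schedule; this is a sketch, not the code in train.py.
    """
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return start + fraction * (end - start)


# Print epsilon at a few points in training.
for step in (0, 50_000, 100_000, 250_000):
    print(step, round(linear_epsilon(step), 3))
```

Under this assumption the agent still takes a random action a little over half the time at step 50,000, and settles at 10% random actions from step 100,000 onward.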

Training will log to the console and to `logs/training.log`.

In [None]:
!python train.py \
    --num_episodes 1500 \
    --replay_memory_init_size 10000 \
    --epsilon_decay_steps 100000 \
    --update_target_estimator_every 5000 \
    --max_steps_per_episode 5000 \
    --experiment_dir ./experiments/notebook_run

## 3. Monitoring with TensorBoard

The script saves all training metrics (reward, loss, episode length, etc.) to the `experiments/` directory. You can use TensorBoard to visualize this data in real time.

You can run the following command in a **separate terminal** from the `tf-atari-dqn-breakout` directory:

In [None]:
# You must run this from a separate terminal in the project's root directory
# tensorboard --logdir=./experiments

After running the command, open your browser to `http://localhost:6006` to see the training graphs.

You will see the metrics from our `notebook_run` and any other runs you've started.
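
If no runs show up in TensorBoard, a quick way to check what has been written is to look for event files on disk. The sketch below assumes metrics are stored in standard TensorBoard event files (named `events.out.tfevents.*`) somewhere under `./experiments`. The helper `find_runs` is hypothetical and not part of the project.

```python
from pathlib import Path


def find_runs(logdir="./experiments"):
    """Return sorted directories (relative to logdir) that contain TensorBoard event files."""
    root = Path(logdir)
    if not root.is_dir():
        return []
    # Search recursively, since the exact subdirectory layout is an assumption.
    runs = {
        p.parent.relative_to(root)
        for p in root.rglob("events.out.tfevents.*")
        if p.is_file()
    }
    return sorted(str(r) for r in runs)


print(find_runs())
```

An empty list means either training has not written any summaries yet or the notebook is running from a different working directory than the project root.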