Course project demonstrating three distinct AI learning paradigms on the CartPole-v1 environment.
A cart sits on a frictionless track. A pole is hinged to the top of the cart. On every timestep you push the cart left or right. The goal is to keep the pole upright for as long as possible. An episode ends when:
- The pole angle exceeds ±12° from vertical, or
- The cart moves more than 2.4 units from centre, or
- 500 timesteps are reached (perfect score = 500)
The 4-dimensional state vector fed to every model:
| Index | Feature | Range |
|---|---|---|
| 0 | Cart position | ±2.4 |
| 1 | Cart velocity | unbounded |
| 2 | Pole angle (rad) | ±0.21 |
| 3 | Pole angular velocity | unbounded |
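A quick way to inspect this state vector yourself, as a minimal sketch using the standard Gymnasium API:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)
# obs is a 4-element float array:
# [cart position, cart velocity, pole angle (rad), pole angular velocity]
print(obs)

# one step with a random action: 0 = push left, 1 = push right
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```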
Pure reinforcement learning. No human input. The agent uses trial-and-error and a neural network Q-function to learn which action maximises future reward.
Key DQN concepts implemented:
- Experience replay buffer — stores past (s, a, r, s′, done) tuples and samples random mini-batches to break temporal correlations
- Target network — a frozen copy of the Q-network updated every N steps; stabilises training
- Epsilon-greedy exploration — starts nearly random (ε ≈ 1.0) and decays to ε = 0.05 so the agent exploits what it has learned
- Custom Q-network architecture — configurable MLP hidden layers via --net-arch
- EvalCallback — evaluates the deterministic policy on a separate env every 5 000 steps, completely isolated from exploration noise
- TensorBoard logging — logs loss, Q-values, episode stats to ./tb_logs/
- Global seeds — random, numpy, torch, and SB3 all seeded for reproducible runs
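As a consolidated view, here is a minimal sketch of how these pieces fit together in Stable-Baselines3. The real cartpole_dqn.py exposes them through its CLI instead, so treat the values and wiring below as illustrative rather than the script's exact internals:

```python
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.utils import set_random_seed

set_random_seed(42)  # seeds Python's random, NumPy and torch in one call

env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

model = DQN(
    "MlpPolicy",
    env,
    learning_rate=1e-3,
    buffer_size=50_000,          # experience replay buffer capacity
    batch_size=32,
    gamma=0.99,
    exploration_fraction=0.1,    # fraction of training spent decaying epsilon
    exploration_final_eps=0.05,  # epsilon floor after decay
    target_update_interval=500,  # refresh the frozen target network every N steps
    policy_kwargs=dict(net_arch=[256, 256]),
    tensorboard_log="./tb_logs/",
    seed=42,
)

# deterministic evaluation on a separate env every 5 000 steps,
# isolated from exploration noise
eval_cb = EvalCallback(eval_env, best_model_save_path="./best_model/",
                       eval_freq=5_000, deterministic=True)

model.learn(total_timesteps=100_000, callback=eval_cb)
model.save("dqn_cartpole")
```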
Result achieved: 500/500 reward (the maximum possible) after ~95 000 steps. Training time: ~2.5 minutes on CPU.
An interactive Pygame game where a human plays CartPole with keyboard controls. Every (state, action) pair from every step is recorded and written to human_data.pkl and human_data.csv.
What the game does:
- Opens two windows: a stats panel (600×500 Pygame) and the CartPole physics window (Gymnasium)
- Stats panel shows: high score, rolling average score, episode counter, total steps recorded
- Live pole danger bar — a colour gauge (green → red) showing the current pole angle as a % of the ±12° failure limit, so you can see how critical the situation is in real time (see the sketch after this list)
- 3-2-1 countdown before each episode (2.9 s total — fast enough to not be annoying)
- Episode result screen (AMAZING / EXCELLENT / GOOD / NOT BAD / FELL) — waits for SPACE before continuing
- Data is saved after every episode — quitting early with Q loses nothing
- After all episodes, offers to immediately train an imitation model on your data and watch it play
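The danger bar boils down to a simple ratio of the current angle to the failure limit. A hypothetical sketch (the game's actual drawing code will differ):

```python
import math

ANGLE_LIMIT_RAD = math.radians(12)  # the ±12° pole-angle failure limit

def danger_fraction(pole_angle_rad):
    """How close the pole is to falling, as a value in [0, 1]."""
    return min(abs(pole_angle_rad) / ANGLE_LIMIT_RAD, 1.0)

def danger_colour(frac):
    """Linear green-to-red RGB gradient for the Pygame gauge."""
    return (int(255 * frac), int(255 * (1 - frac)), 0)
```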
Controls:
| Key | Action |
|---|---|
| A / LEFT | Push cart left |
| D / RIGHT | Push cart right |
| SPACE | Start next episode |
| Q / ESC | Quit |
Behavioural cloning — pure supervised learning on the recorded human gameplay. No reward signal. No environment interaction during training.
How it works:
human gameplay → (state, action) pairs → train classifier → policy: state → action
At inference, the CartPole env gives 4 numbers → the classifier predicts left or right → that action is taken. The model copies your decision-making, so its quality scales directly with how much data you provide and how well you play.
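A minimal sketch of that inference loop, assuming the fitted classifier (or scaler + classifier pipeline) is available as `policy` with a standard sklearn `.predict`:

```python
import gymnasium as gym
import numpy as np

def run_episode(policy, render=True):
    """Roll out one CartPole episode under a fitted sklearn classifier.

    policy.predict is assumed to map a 4-feature state to action
    0 (push left) or 1 (push right).
    """
    env = gym.make("CartPole-v1", render_mode="human" if render else None)
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = int(policy.predict(np.asarray(obs).reshape(1, -1))[0])
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    env.close()
    return total_reward
```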
Three classifiers:
| Model | Best for |
|---|---|
| neural_network | ≥300 samples; architecture auto-scales to data size |
| logistic_regression | Small datasets (<200 samples); more stable generalisation |
| random_forest | Good all-rounder; built-in feature importance |
Training pipeline:
- Loads human_data.pkl, extracts states + actions
- Data sufficiency check — warns if <200 samples, refuses training if <50; prints the majority-class baseline (always predict most common action) so you know the trivial benchmark
- StandardScaler — normalises all 4 features to zero mean/unit variance; fitted only on the training split to prevent data leakage into the test set
- Auto-scaled MLP architecture — prevents overfitting: (32,16) for <300 samples, (64,32) for <1 000, (128,64) for larger sets
- early_stopping=True — MLP training stops when validation loss plateaus rather than hammering a fixed iteration limit
- 5-fold cross-validation inside a sklearn Pipeline — each fold scales independently; gives reliable std accuracy estimates
- Prints baseline vs model accuracy with a pass/fail indicator
- Analysis plot: feature importance (or weights), human action distribution, MLP training loss curve
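A condensed sketch of the pipeline above, assuming the recorded data is already loaded as a `states` array (N×4 floats) and an integer `actions` array (N,); names are illustrative, not the script's exact internals:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_mlp(n_samples):
    # auto-scaled architecture: smaller nets for smaller datasets
    if n_samples < 300:
        hidden = (32, 16)
    elif n_samples < 1_000:
        hidden = (64, 32)
    else:
        hidden = (128, 64)
    return MLPClassifier(hidden_layer_sizes=hidden, early_stopping=True,
                         random_state=42)

def train_and_report(states, actions):
    # majority-class baseline: accuracy of always predicting the common action
    baseline = np.bincount(actions).max() / len(actions)
    # the scaler lives inside the pipeline, so each CV fold fits its own
    # scaler on its own training split (no leakage into the held-out fold)
    pipe = make_pipeline(StandardScaler(), build_mlp(len(states)))
    scores = cross_val_score(pipe, states, actions, cv=5)
    print(f"baseline {baseline:.2%} | model {scores.mean():.2%} ± {scores.std():.2%}")
    return pipe.fit(states, actions)
```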
```bash
pip install -r requirements.txt
```
Step 1: Train the DQN agent (~2.5 min)
```bash
python cartpole_dqn.py --train --timesteps 100000
```
Step 2: Play the game and collect data (~5–10 min)
```bash
python cartpole_human.py --episodes 20
```
Play at least 20 episodes for the imitation model to be meaningful.
Step 3: Watch the trained DQN play
```bash
python cartpole_dqn.py --test
```
Step 4: Train imitation models and compare all three
```bash
python cartpole_imitation.py --model compare --evaluate
```
```bash
python cartpole_dqn.py --train \
    --timesteps 100000 \
    --lr 1e-3 \
    --buffer-size 50000 \
    --batch-size 32 \
    --gamma 0.99 \
    --exploration-fraction 0.1 \
    --final-eps 0.05 \
    --target-update 500 \
    --net-arch 256 256 \
    --seed 42

python cartpole_dqn.py --test --no-video

tensorboard --logdir tb_logs
```
```bash
python cartpole_human.py --episodes 20         # play 20 episodes
python cartpole_human.py --auto-demo           # skip prompt, auto-train AI
python cartpole_human.py --no-demo             # skip AI demo entirely
python cartpole_human.py --output my_data.pkl  # custom save path
```
```bash
python cartpole_imitation.py --model neural_network --evaluate
python cartpole_imitation.py --model logistic_regression --evaluate
python cartpole_imitation.py --model compare   # train all 3, pick best
python cartpole_imitation.py --model neural_network --save my_model.pkl
python cartpole_imitation.py --load my_model.pkl --evaluate --no-render
```
The plot shows:
- Faded line — raw per-episode reward (noisy due to epsilon exploration)
- Solid line — 50-episode rolling mean
- Shaded band — ±1 standard deviation
- Lower panel — epsilon decay from ~1.0 → 0.05
Deterministic policy (EvalCallback) reached 500/500 by ~95 000 steps.
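For reference, a curve like this can be produced from logged per-episode rewards roughly as follows; this is an assumed reconstruction, not the script's actual plotting code:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_rewards(rewards, window=50):
    rewards = np.asarray(rewards, dtype=float)
    # 50-episode rolling mean and matching per-window standard deviation
    rolling = np.convolve(rewards, np.ones(window) / window, mode="valid")
    x = np.arange(window - 1, len(rewards))
    std = np.array([rewards[i - window + 1:i + 1].std() for i in x])

    plt.plot(rewards, alpha=0.3, label="per-episode reward")  # faded raw line
    plt.plot(x, rolling, label=f"{window}-episode rolling mean")
    plt.fill_between(x, rolling - std, rolling + std, alpha=0.2, label="±1 std")
    plt.xlabel("episode")
    plt.ylabel("reward")
    plt.legend()
    plt.savefig("reward_curve.png")
```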
| File / Folder | Created by | Description |
|---|---|---|
| dqn_cartpole.zip | cartpole_dqn.py --train | Trained DQN model weights |
| best_model/ | cartpole_dqn.py --train | Best checkpoint (EvalCallback) |
| reward_curve.png | cartpole_dqn.py --train | Reward + epsilon decay plot |
| tb_logs/ | cartpole_dqn.py --train | TensorBoard event files |
| eval_logs/ | cartpole_dqn.py --train | Per-checkpoint evaluation results |
| human_data.pkl | cartpole_human.py | Recorded gameplay (binary) |
| human_data.csv | cartpole_human.py | Recorded gameplay (readable CSV) |
| imitation_analysis_*.png | cartpole_imitation.py | Feature importance + loss curve |
| videos/ | cartpole_dqn.py --test | MP4 recordings of test episodes |
```
NNRL Project/
    cartpole_dqn.py         # DQN agent: train, test, full hyperparameter CLI
    cartpole_human.py       # Interactive game: human play + data collection
    cartpole_imitation.py   # Behavioural cloning from recorded human data
    convert_to_gif.py       # Convert videos/ MP4s to GIF for sharing
    test_human_control.py   # Standalone test for Pygame/gym setup
    requirements.txt        # All Python dependencies with minimum versions
    README.md               # This file
    dqn_cartpole.zip        # Pre-trained DQN model (500/500 reward)
    reward_curve.png        # Training reward + epsilon decay plot
    imitation_analysis_neural_network.png
    human_data.pkl          # Example recorded gameplay (binary)
    human_data.csv          # Same data as readable CSV
    videos/                 # Recorded test episode MP4s
    best_model/             # Best DQN checkpoint (auto-saved during training)
    tb_logs/                # TensorBoard training logs
    .venv/                  # Python virtual environment
```
| | Human Play | Imitation Learning | DQN |
|---|---|---|---|
| Data source | Your keyboard | Your recordings | Environment reward signal |
| Training time | Instant (you play) | Seconds–minutes | ~2.5 min on CPU |
| Reward achieved | ~10–50 steps (beginner) | Mirrors your skill | 500/500 (optimal) |
| Learns from mistakes | You do, manually | No | Yes |
| Needs human demos | Yes (you are the demo) | Yes | No |
| Key ML concept | Manual policy | Supervised learning / classification | Off-policy RL + neural Q-function |
Python 3.8+ — install everything with:
```bash
pip install -r requirements.txt
```
Core packages: `gymnasium[classic-control]`, `stable-baselines3`, `torch`, `scikit-learn`, `pygame`, `matplotlib`, `tensorboard`, `rich`
