# Tetris RL script runner

Run the baseline policies, train and evaluate the vanilla policy gradient CNN, and train and evaluate the DQN after-state agent.

Note: GUI render modes will open a window. If you are running headless, skip the render commands.


In [1]:
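from pathlib import Path
import os

# The commands below use paths relative to the project root,
# i.e. the directory that contains tetris_code/.
root = Path.cwd()
if root.name == 'tetris_code' and not (root / 'tetris_code').exists():
    # The notebook was started from inside tetris_code/, so move up one level.
    root = root.parent
    os.chdir(root)

print('Working directory:', Path.cwd())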


Working directory: /Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project


## Baseline policies (render)

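Each script opens a window to visualize one episode. The greedy run below was stopped with Ctrl-C (the `^C` at the start of its output), which is why it ends in a truncated traceback.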


In [4]:
!python3 tetris_code/view_episode_policy_random.py


Game Over!
Final Score: 11


In [5]:
!python3 tetris_code/view_episode_policy_greedy.py


^Core: 21
Traceback (most recent call last):
  File "/Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project/tetris_code/view_episode_policy_greedy.py", line 21, in <module>
    action = policies.policy_greedy(env) 
  File "/Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project/tetris_code/policies.py", line 96, in policy_greedy
    observation, reward, terminated, truncated, info = new_env.step(action)
  File "/Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project/.venv/lib/python3.13/site-packages/gymnasium/wrappers/common.py", line 393, in step
    return super().step(action)
  File "/Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project/.venv/lib/python3.13/site-packages/gymnasium/core.py", line 327, in step
    return self.env.step(action)
  File "/Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project/.venv/lib/python3.13/site-packages/gymnasium/wrappers/common.py", line 285, in step
    return self.env.step(action)
  File "/Users/colinminini/Desktop/SCOC_ICE/RL_Tetris_Project/.venv/lib

In [4]:
!python3 tetris_code/view_episode_policy_down.py


Game Over!
Final Score: 10


## Train the vanilla policy gradient CNN

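Adjust `--episodes` and `--lr` as needed. The log below reports totals out of 500 and is cut off, so it comes from a run with a different `--episodes` value than the command shown.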
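For context on `--normalize-returns`: in a vanilla policy gradient (REINFORCE) update, each action's log-probability is weighted by the discounted return that followed it, and normalizing those returns to zero mean and unit variance is a common variance-reduction step. The sketch below is an assumption about what the flag does, not a copy of `train_pg_cnn.py`; `discounted_returns`, `normalize`, `pg_loss`, and the `gamma` value are hypothetical names chosen for illustration.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(var) + eps) for v in values]

def pg_loss(log_probs, weights):
    """REINFORCE surrogate loss: -sum(log pi(a_t | s_t) * weight_t)."""
    return -sum(lp * w for lp, w in zip(log_probs, weights))

# Toy episode, not real Tetris data.
rewards = [1.0, 0.0, 0.0, 2.0]
log_probs = [math.log(0.5), math.log(0.25), math.log(0.25), math.log(0.1)]

returns = discounted_returns(rewards, gamma=0.5)
weights = normalize(returns)
loss = pg_loss(log_probs, weights)
print(returns, weights, loss)
```

After centering, some weights are negative, so a surrogate loss of this form can take either sign from one update to the next. Its value alone is therefore a weak progress signal; `avg_reward` is the more meaningful column to watch.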


In [None]:
!python3 tetris_code/train_pg_cnn.py --episodes 100 --normalize-returns --save-path tetris_code/checkpoints/pg_cnn.pt


Episode 10/500 avg_reward=11.50 loss=-8.1618
Episode 20/500 avg_reward=12.00 loss=-3.4895
Episode 30/500 avg_reward=9.80 loss=12.9351
Episode 40/500 avg_reward=9.40 loss=-2.1371
Episode 50/500 avg_reward=9.40 loss=-2.6470
Episode 60/500 avg_reward=9.40 loss=9.7559
Episode 70/500 avg_reward=9.70 loss=-3.5654
Episode 80/500 avg_reward=9.70 loss=-5.8657
Episode 90/500 avg_reward=9.80 loss=-5.4945
Episode 100/500 avg_reward=9.80 loss=-3.3098
Episode 110/500 avg_reward=9.70 loss=-3.6400
Episode 120/500 avg_reward=9.90 loss=1.7315
Episode 130/500 avg_reward=9.70 loss=3.3003
Episode 140/500 avg_reward=9.60 loss=-1.6163
Episode 150/500 avg_reward=10.20 loss=-0.3797
Episode 160/500 avg_reward=10.00 loss=-0.1061
Episode 170/500 avg_reward=9.80 loss=-1.2423
Episode 180/500 avg_reward=9.90 loss=-2.4446
Episode 190/500 avg_reward=10.00 loss=1.9912
Episode 200/500 avg_reward=10.10 loss=4.5046
Episode 210/500 avg_reward=9.90 loss=3.6949
Episode 220/500 avg_reward=9.70 loss=-1.9723
Episode 230/500 avg

## Evaluate the trained policy

Use `--render` for a visual run, or `--stochastic` to sample actions.
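The difference between the default run and `--stochastic` can be pictured as follows. This is a minimal sketch that assumes the default picks the highest-probability action and `--stochastic` samples from the policy's distribution; `select_action` is a hypothetical helper, not a function taken from `evaluate_pg_cnn.py`.

```python
import random

def select_action(probs, stochastic=False, rng=random):
    """Pick an action index from a list of action probabilities."""
    if stochastic:
        # Sample in proportion to the policy's probabilities.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Deterministic: take the most likely action.
    return max(range(len(probs)), key=lambda i: probs[i])

probs = [0.1, 0.6, 0.3]                  # toy policy output, not real data
greedy_action = select_action(probs)     # index of the largest probability

rng = random.Random(0)                   # seeded so the run is repeatable
sampled = [select_action(probs, stochastic=True, rng=rng) for _ in range(1000)]
print(greedy_action, [sampled.count(a) for a in range(len(probs))])
```

Deterministic evaluation gives repeatable scores for a fixed seed, while sampling shows how the policy behaves under the same randomness it saw during training.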


In [6]:
!python3 tetris_code/evaluate_pg_cnn.py --episodes 10 --model-path tetris_code/checkpoints/pg_cnn.pt


Episode 1 reward=10.00
Episode 2 reward=9.00
Episode 3 reward=10.00
Episode 4 reward=9.00
Episode 5 reward=9.00
Episode 6 reward=9.00
Episode 7 reward=10.00
Episode 8 reward=9.00
Episode 9 reward=10.00
Episode 10 reward=11.00
Average reward=9.60 +/- 0.66 (n=10)
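
The summary line is consistent with the mean and population standard deviation of the ten per-episode rewards. The snippet below recomputes it from the values printed above; reading the `+/-` term as a population standard deviation is an inference from the numbers, not something confirmed from `evaluate_pg_cnn.py`.

```python
import statistics

# Per-episode rewards copied from the evaluation log above.
rewards = [10, 9, 10, 9, 9, 9, 10, 9, 10, 11]

mean = statistics.fmean(rewards)
std = statistics.pstdev(rewards)  # population std: divides by n, not n - 1

summary = f"Average reward={mean:.2f} +/- {std:.2f} (n={len(rewards)})"
print(summary)
```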


## Train DQN with after-states

The agent chooses among macro-actions (a rotation and horizontal move followed by a hard drop) and assigns a value to the board that results from each one, the after-state.
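
As a rough illustration of the after-state idea: enumerate every macro-action for the current piece, simulate it to obtain the resulting board, score each resulting board with a value function, and play the macro-action whose after-state scores highest. The sketch below uses a toy column-height board, a solid rectangular piece, and a hand-written value function in place of the learned network. It ignores line clears and holes, and `simulate_drop`, `value`, and `best_macro_action` are hypothetical names that do not mirror `dqn_afterstate.py`.

```python
def simulate_drop(heights, piece_width, piece_height, col):
    """Hard-drop a solid rectangular piece; return the new column heights."""
    span = range(col, col + piece_width)
    landing = max(heights[c] for c in span)  # piece rests on the tallest column
    new_heights = list(heights)
    for c in span:
        new_heights[c] = landing + piece_height
    return new_heights

def value(heights):
    """Toy stand-in for a learned after-state value: prefer low, flat boards."""
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return -max(heights) - 0.5 * bumpiness

def best_macro_action(heights, piece):
    """Enumerate (rotation, column) macro-actions and pick the best after-state."""
    w, h = piece
    candidates = []
    for rot, (pw, ph) in enumerate([(w, h), (h, w)]):  # two orientations
        for col in range(len(heights) - pw + 1):
            after = simulate_drop(heights, pw, ph, col)
            candidates.append((value(after), rot, col, after))
    return max(candidates, key=lambda c: c[0])

heights = [2, 2, 0, 0, 2]  # toy board: a two-column gap
score, rot, col, after = best_macro_action(heights, piece=(2, 1))
print(rot, col, after, score)
```

Here the flat orientation dropped into the gap wins because it leaves the lowest and flattest board. A trained agent would replace `value` with its network's prediction for each candidate after-state.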


In [9]:
!python3 tetris_code/train_dqn_afterstate.py --episodes 50 --save-path tetris_code/checkpoints/dqn_afterstate.pt


Episode 10/50 avg_reward=-134.84 epsilon=0.993
Episode 20/50 avg_reward=-117.62 epsilon=0.988
Episode 30/50 avg_reward=-129.24 epsilon=0.981
Episode 40/50 avg_reward=-134.78 epsilon=0.974
Episode 50/50 avg_reward=-142.54 epsilon=0.968
Saved checkpoint to tetris_code/checkpoints/dqn_afterstate.pt


## Evaluate the DQN after-state agent

Use `--render` for a visual run.


In [16]:
!python3 tetris_code/evaluate_dqn_afterstate.py --episodes 10 --model-path tetris_code/checkpoints/dqn_afterstate.pt


^Cisode number 1 Score: 1802.0
Traceback (most recent call last):
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/evaluate_dqn_afterstate.py", line 168, in <module>
    main()
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/evaluate_dqn_afterstate.py", line 153, in main
    candidates = enumerate_after_states(env, sequences)
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/dqn_afterstate.py", line 173, in enumerate_after_states
    obs, _, done, info = simulate_sequence(env, seq)
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/dqn_afterstate.py", line 139, in simulate_sequence
    obs, reward, terminated, truncated, info = new_env.step(action)
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/.venv/lib/python3.13/site-packages/gymnasium/wrappers/common.py", line 393, in step
    return super().step(action)
  File "/Users/colinminini/Desktop/SCOC_IC

## Render a DQN episode

This will open a window to visualize the trained agent.


In [17]:
!python3 tetris_code/evaluate_dqn_afterstate.py --episodes 1 --model-path tetris_code/checkpoints/dqn_afterstate.pt --render


^Cisode number 1 Score: 2069.0
Traceback (most recent call last):
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/evaluate_dqn_afterstate.py", line 168, in <module>
    main()
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/evaluate_dqn_afterstate.py", line 130, in main
    obs, reward_sum, done, info = run_sequence(
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/dqn_afterstate.py", line 157, in run_sequence
    render_fn(env)
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/tetris_code/evaluate_dqn_afterstate.py", line 77, in render_step
    env.render()
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/.venv/lib/python3.13/site-packages/gymnasium/wrappers/common.py", line 409, in render
    return super().render()
  File "/Users/colinminini/Desktop/SCOC_ICE/Reinforcement Learning/.venv/lib/python3.13/site-packages/gymnasium/core.py", line 337, in render
   

## Render a single evaluation episode

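This will open a window to visualize the trained policy. The output below lists 10 episodes, so it comes from a run with a different `--episodes` value than the command shown.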


In [None]:
!python3 tetris_code/evaluate_pg_cnn.py --episodes 1 --model-path tetris_code/checkpoints/pg_cnn.pt --render


Episode 1 reward=11.00
Episode 2 reward=10.00
Episode 3 reward=10.00
Episode 4 reward=10.00
Episode 5 reward=10.00
Episode 6 reward=10.00
Episode 7 reward=10.00
Episode 8 reward=10.00
Episode 9 reward=11.00
Episode 10 reward=10.00
Average reward=10.20 +/- 0.40 (n=10)
