This is the self-contained demo comparing PPO and PPO+RND (Random Network Distillation) on the real Super Mario Bros game.
- Demo HTML: real_mario_demo.html
- Training scripts: scripts/
- Completed plan: real_mario_execution_plan_completed.md
- Requirements: requirements_mario_demo.txt
- Rollout GIFs/videos: mario_results/rollouts/ and mario_results/experiments/
- Charts: mario_results/figures/
- Metrics: mario_results/run_summary.json
- CSV metrics: mario_results/csv/
- Experiment A 1M run: mario_results/experiments/experiment_a_ppo_1m/
- PPT files: PPT/
Large model checkpoints, runtime logs, PID files, cache files, QA screenshots, and archived exploratory runs are local artifacts and are excluded by .gitignore.
python scripts/mario_sb3_rnd_demo.py --total-timesteps 50000 --random-steps 1200 --seed 0 --rnd-beta 0.05 --rnd-lr 0.00001

python scripts/experiment_a_ppo_1m.py --total-timesteps 1000000 --n-envs 4
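The `--rnd-beta` and `--rnd-lr` flags point to the usual Random Network Distillation setup: a fixed, randomly initialised target network, a predictor trained to imitate it, and an intrinsic "novelty" reward equal to the predictor's error. The sketch below shows that idea in pure Python. It is an illustration under stated assumptions, not the script's code: it assumes `--rnd-beta` weights the intrinsic bonus added to the extrinsic reward and `--rnd-lr` is the predictor's learning rate. `TinyRND` and `combined_reward` are hypothetical names, and the linear "networks" stand in for the convolutional networks a Stable-Baselines3-based script (judging by the `sb3` in its filename) would more likely use.

```python
import random


class TinyRND:
    """Hypothetical, minimal Random Network Distillation (RND) module.

    Both "networks" are single linear maps over a flat observation vector;
    a real Mario agent would more likely use CNNs over stacked frames.
    """

    def __init__(self, obs_dim, feat_dim=8, lr=1e-5, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network: never trained.
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(obs_dim)]
        # Predictor network: trained to imitate the target's outputs.
        self.pred = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(obs_dim)]
        self.feat_dim = feat_dim
        self.lr = lr  # assumed to play the role of --rnd-lr

    def _errors(self, obs):
        # Per-feature difference between predictor and target outputs.
        return [
            sum(x * (self.pred[i][j] - self.target[i][j]) for i, x in enumerate(obs))
            for j in range(self.feat_dim)
        ]

    def intrinsic_reward(self, obs):
        # Mean squared prediction error: stays large for rarely seen observations.
        err = self._errors(obs)
        return sum(e * e for e in err) / self.feat_dim

    def update(self, obs):
        # One gradient-descent step on the predictor's mean squared error.
        err = self._errors(obs)
        scale = 2.0 * self.lr / self.feat_dim
        for i, x in enumerate(obs):
            for j, e in enumerate(err):
                self.pred[i][j] -= scale * x * e


def combined_reward(r_ext, r_int, beta=0.05):
    # beta is assumed to play the role of --rnd-beta: the weight of the novelty bonus.
    return r_ext + beta * r_int


# Usage: the bonus for a repeatedly visited observation shrinks as the predictor learns it.
rnd = TinyRND(obs_dim=4, lr=0.05)
obs = [0.1, 0.5, 0.2, 0.9]
first = rnd.intrinsic_reward(obs)
for _ in range(200):
    rnd.update(obs)
later = rnd.intrinsic_reward(obs)
print(f"intrinsic reward: {first:.4f} -> {later:.4f}")
print(combined_reward(1.0, later))
```

The usage example uses a much larger learning rate than the `0.00001` in the command above so the decay is visible within a few hundred steps. The original RND formulation also normalises the intrinsic reward and trains separate value heads for intrinsic and extrinsic returns; this sketch leaves both out.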