# Darwin Plays Super Mario Bros
### *Training A Reward-Based Machine Learning Model to Play Classic Video Games*

**Author:** Xander Carroll    
**Course:** Physics 5680, Autumn 2025  
**Date:** October 30, 2025

**Project Repository:** [Link](https://github.com/Xander-Carroll/PHYSICS5680-Neural-Network-Final)
&nbsp;

<small>*I used GPT-5 mini to help craft several paragraphs of text throughout this notebook.*</small>

---

## Abstract


Classic video games offer a controlled and well-understood environment for experimenting with machine learning, providing clear objectives and measurable rewards without the complexity of modern games. This project will explore training a reward-based neural network to play Super Mario Bros on the Nintendo Entertainment System (NES). The game state will be extracted using an NES emulator and encoded into a representation suitable for a neural network. The network will then be trained to maximize in-game rewards such as level progress and the level timer. It is anticipated that this approach will demonstrate how simple, reward-driven models can learn effective strategies in constrained environments.

---

## 1. Introduction

### Problem Description
Developing artificial agents that can learn to play video games has long been a benchmark problem in machine learning and artificial intelligence. While modern games offer highly complex environments, classic games like Super Mario Bros for the Nintendo Entertainment System (NES) provide simpler, well-defined environments where objectives, rewards, and state representations are clear. The problem we aim to solve is training a reward-based neural network to play Super Mario Bros, using in-game feedback to guide learning. This requires designing a system that can interpret the game state, map it into a format suitable for a neural network, and optimize agent behavior to choose the controller inputs that will maximize cumulative rewards.

### Motivation
This project investigates how reward-based neural networks can learn behavior in simple, well-defined environments, a core question in reinforcement learning research. Classic games like Super Mario Bros provide an ideal platform for such experiments: the game is easy to simulate and has clear objectives, yet still requires sequential decision-making and adaptation. By evolving a neural network to play the game using only in-game rewards as feedback, this project explores how complex behavior can emerge from simple fitness functions.

### Background
Our primary environment is the NES version of [Super Mario Bros](https://en.wikipedia.org/wiki/Super_Mario_Bros).$`^1`$ This is one of the most popular games of its era, has been reverse engineered, and is very well documented by the community. Super Mario Bros is a side-scrolling platformer: the player's goal is to make forward progress and eventually reach the end of the level while avoiding hazards. The [BizHawk](https://tasvideos.org/Bizhawk) emulator will be used to provide a programmatic interface to read the game's memory and feed controller inputs to the game in real time.$`^2`$ This will allow the neural network to "see" and "interact with" the game. The problem is framed as a reward-based learning task, where the agent receives feedback proportional to in-game progress, creating a natural fitness function for optimization.
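
To make "feedback proportional to in-game progress" concrete, the sketch below computes a per-frame reward from Mario's horizontal position and the level timer. It is a minimal illustration under assumptions: the function name `step_reward`, the death penalty of 15, and the idea that the Lua script reports Mario's x-position, the timer, and a death flag every frame are all hypothetical choices, not part of the current implementation.

```python
def step_reward(prev_x, x, prev_timer, timer, died, death_penalty=15.0):
    """Hypothetical per-frame reward: reward progress, penalize time and death.

    prev_x, x: Mario's horizontal position on the previous and current frame.
    prev_timer, timer: the in-game level timer, which counts down.
    died: True if Mario lost a life on this frame.
    """
    progress = x - prev_x            # positive when Mario moves right
    time_used = prev_timer - timer   # >= 0 because the timer counts down
    reward = float(progress - time_used)
    if died:
        reward -= death_penalty      # assumed penalty value
    return reward


# Moving 3 units right while the timer is unchanged earns +3.
print(step_reward(100, 103, 400, 400, died=False))  # 3.0
```

Summed over an episode, this reward equals the net distance travelled minus the elapsed timer ticks and any death penalties, so an agent that stands still or dies early scores poorly.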

### Inputs and Outputs
Input(s): The algorithm will receive the current game state from the BizHawk emulator. This will include a map of the level, with terrain layout, enemy locations, and the player's current progress. This information will be encoded into a representation suitable for neural network processing.

Output(s): The network will produce a set of controller inputs (e.g., hold left, press button A) to be executed in the game environment. Through training, we expect these controller outputs to maximize the fitness function (the player's forward progress in the level).
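
One common way to turn network outputs into controller inputs is to give the network one output per allowed button combination and press the combination with the highest score. The sketch below assumes a hypothetical seven-action set; the names `ACTIONS`, `action_to_buttons`, and `choose_action` are illustrative, and the button names would need to be mapped to whatever names BizHawk's joypad API expects on the Lua side.

```python
import numpy as np

# All eight NES controller buttons; every action presses a subset of them.
NES_BUTTONS = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")

# Hypothetical reduced action set: each entry is one discrete network output.
ACTIONS = (
    (),                      # do nothing
    ("Right",),              # walk right
    ("Right", "A"),          # jump right
    ("Right", "B"),          # run right
    ("Right", "A", "B"),     # running jump right
    ("A",),                  # jump in place
    ("Left",),               # walk left
)


def action_to_buttons(action_index):
    """Expand an action index into a {button: pressed} dictionary."""
    pressed = set(ACTIONS[action_index])
    return {button: button in pressed for button in NES_BUTTONS}


def choose_action(network_outputs):
    """Pick the action whose output score is highest."""
    scores = np.asarray(network_outputs)
    if scores.shape != (len(ACTIONS),):
        raise ValueError("expected one score per action")
    return int(np.argmax(scores))


buttons = action_to_buttons(choose_action([0.1, 0.3, 0.9, 0.2, 0.0, 0.1, -0.5]))
```

A small discrete action set like this is what a Q-learning agent needs, since it scores each action separately. An alternative design is one output per button with a threshold, which allows any combination but makes the action space much larger.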

### Project Goals

1. **Primary Goal:**
Train a fitness-based neural network to autonomously play Super Mario Bros and achieve measurable progress through one or more levels. The primary metric is maximizing cumulative in-game reward, including level completion.

1. **Stretch Goal:**
Extend the model to generalize across multiple levels or similar games. This will include testing whether a network trained on one level can adapt to unseen layouts.

---

## 2. Related Work

**Source 1:**

*Stanley, Kenneth O., and Risto Miikkulainen. "Evolving neural networks through augmenting topologies." Evolutionary computation 10.2 (2002): 99-127.*

This is one of the early papers describing NEAT (NeuroEvolution of Augmenting Topologies). The NEAT algorithm adjusts not only the weights of connections in neural networks, but also their topologies (the nodes and connections in the network). Networks start with no hidden nodes and gain them through mutation over successive generations.
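
The sketch below illustrates NEAT's add-node mutation, the structural mutation that grows hidden nodes from a minimal starting network. As described by Stanley and Miikkulainen, an existing connection is disabled and replaced by two new ones: the connection into the new node gets weight 1 and the connection out of it inherits the old weight, which minimizes the initial effect of the mutation. The class and function names are hypothetical, and speciation, crossover, and the add-connection mutation are omitted.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Connection:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int  # historical marker NEAT uses to align genes during crossover


@dataclass
class Genome:
    num_nodes: int  # nodes are numbered 0..num_nodes-1
    connections: list[Connection] = field(default_factory=list)


# Simplification: one global counter. Full NEAT gives identical structural
# mutations within the same generation the same innovation number.
_next_innovation = itertools.count()


def add_connection(genome, in_node, out_node, weight):
    conn = Connection(in_node, out_node, weight, True, next(_next_innovation))
    genome.connections.append(conn)
    return conn


def mutate_add_node(genome, conn):
    """NEAT add-node mutation: split the connection A->B into A->new->B."""
    conn.enabled = False  # the old connection is disabled, not deleted
    new_node = genome.num_nodes
    genome.num_nodes += 1
    add_connection(genome, conn.in_node, new_node, 1.0)           # incoming weight is 1
    add_connection(genome, new_node, conn.out_node, conn.weight)  # outgoing keeps old weight
    return new_node


# A minimal starting genome: one input (node 0) wired straight to one output (node 1).
genome = Genome(num_nodes=2)
first = add_connection(genome, 0, 1, 0.7)
hidden = mutate_add_node(genome, first)
```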

**Source 2:**

*SethBling. “MarI/O - Machine Learning for Video Games.” YouTube, YouTube, www.youtube.com/watch?v=qv6UVOQ0F44. Accessed 13 Nov. 2025.*

This project directly inspired my own work. It applies NEAT to train an artificial neural network to play Super Mario World for the Super Nintendo Entertainment System. The system evaluates agents based on a fitness function that rewards forward progress and then breeds the best performers to produce increasingly complex behavior over successive generations.
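
The evaluate, select, and breed cycle described above can be sketched for the simpler case of a fixed network topology, where each agent is just a flat vector of weights. This is a plain genetic algorithm, not full NEAT (which also mutates topology and protects new structures through speciation); the function `next_generation` and its parameter values are hypothetical choices for illustration.

```python
import numpy as np


def next_generation(population, fitnesses, rng, num_parents=4,
                    mutation_rate=0.1, mutation_scale=0.2):
    """Breed a new population of flat weight vectors from the fittest agents.

    population: array of shape (pop_size, num_weights).
    fitnesses: one score per agent, e.g. how far right Mario travelled.
    """
    population = np.asarray(population, dtype=np.float64)
    order = np.argsort(fitnesses)[::-1]           # indices sorted best-first
    parents = population[order[:num_parents]]

    children = [parents[0].copy()]                # elitism: keep the best agent unchanged
    while len(children) < len(population):
        i, j = rng.choice(num_parents, size=2, replace=False)
        mask = rng.random(parents.shape[1]) < 0.5  # uniform crossover
        child = np.where(mask, parents[i], parents[j])
        mutate = rng.random(child.shape) < mutation_rate
        child = child + mutate * rng.normal(0.0, mutation_scale, size=child.shape)
        children.append(child)
    return np.stack(children)


rng = np.random.default_rng(0)
population = rng.normal(size=(10, 5))  # 10 agents, 5 weights each
fitnesses = rng.random(10)             # stand-in for real fitness scores
new_population = next_generation(population, fitnesses, rng)
```

Keeping the best agent unchanged (elitism) guarantees the top fitness never decreases between generations, while crossover and small Gaussian mutations explore nearby weight settings.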

**Source 3:**

*Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).*

This paper presents the Deep Q‑Network (DQN) approach, where a convolutional neural network is trained with Q‑learning from raw pixel input to play multiple Atari 2600 games. 
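
At the core of DQN is a regression target built from the Bellman equation: for a transition $`(s, a, r, s')`$, the estimate $`Q(s, a)`$ is pushed toward $`y = r + \gamma \max_{a'} Q(s', a')`$, or simply $`y = r`$ if the episode ended. The NumPy sketch below computes these targets for a batch, the squared-error loss on the actions actually taken, and the ε-greedy rule used for exploration. In a real agent the Q-values would come from a neural network and the loss would be minimized by gradient descent; the function names and the discount value are assumptions.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'), with y = r for terminal transitions."""
    rewards = np.asarray(rewards, dtype=np.float64)
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float64), axis=1)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * not_done * max_next_q


def td_loss(q_values, actions, targets):
    """Mean squared error between Q(s, a) for the actions taken and the targets."""
    q_values = np.asarray(q_values, dtype=np.float64)
    chosen = q_values[np.arange(len(actions)), actions]
    return float(np.mean((np.asarray(targets) - chosen) ** 2))


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


targets = dqn_targets(rewards=[1.0, 0.0],
                      next_q_values=[[0.5, 2.0], [3.0, 1.0]],
                      dones=[False, True], gamma=0.5)
print(targets)  # [2. 0.]
```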

**Source 4:**

*Chrispresso. “AI Learns to Play Super Mario Bros Using a Genetic Algorithm and Neural Network.” Chrispresso, 14 Mar. 2020, chrispresso.github.io/AI_Learns_To_Play_SMB_Using_GA_And_NN.*

This project uses a genetic algorithm to evolve a neural network capable of playing "Super Mario Bros" on the NES. Instead of training through gradient descent, the model’s weights are optimized using a fitness score that rewards progress and survival.

<br/>

**Strengths and Weaknesses:**

I now have three sources using neuroevolution (the NEAT paper, the NEAT-based MarI/O project, and the genetic-algorithm-based Chrispresso project) and one source using Q-learning (the DQN paper). For neuroevolution, I have both a paper detailing the approach and projects applying it to Mario games; for Q-learning, I have the paper that introduced the approach.

The neuroevolution approaches are simple and can be implemented from scratch, while the Q-learning approaches are more practical to implement with a deep learning library like TensorFlow. However, deep learning libraries tend to have abundant documentation for use in Python projects.

<br/>

**The Two Approaches and State of the Art:**

Modern state-of-the-art video game playing agents tend to use deep reinforcement learning methods building on DQN, which learn directly from raw pixels and rely on techniques such as experience replay to stabilize training. Neuroevolution remains popular for small environments and hobbyist projects, especially when reward shaping or topology discovery is important. The related Mario projects are not state of the art, but they show neuroevolution applied in practice.

---

## 3. Ethical Considerations

Because this project uses a self-contained video game environment, the ethical concerns are relatively minimal compared to many real-world machine learning applications. The data used to train the model comes entirely from the "Super Mario Bros" game itself. This means there is no personal, demographic, or sensitive data involved, and thus no risk of violating data privacy or introducing human-related bias. The training process is purely synthetic, relying on game-generated feedback rather than external datasets.

Potential misuse is also limited, as the model is designed for research and educational purposes. However, as with many AI techniques, the underlying algorithms (Deep Q-Learning and NEAT) could theoretically be adapted for automation or control systems in other domains, where ethical considerations such as fairness, accountability, or safety would become more significant.

---

## 4. Project Setup and Imports

### 4.1. Key Libraries

Below is a brief description of the primary Python packages that will be used in this project.

  * **pandas:** Used for data manipulation and analysis. It provides powerful data structures, like the DataFrame, for handling and exploring structured data.
  * **numpy:** The fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
  * **matplotlib:** Used for data visualization. Matplotlib is a comprehensive library for creating static visualizations such as line plots, histograms, and heatmaps.
  * **TensorFlow:** An end-to-end open-source platform for machine learning, specializing in deep learning. In this project it is intended for building and training the neural network used by the Q-learning agent.


```python
# Import all necessary standard libraries here
import sys

# Import all necessary external libraries here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Configure plots for readability
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 14
```

### 4.2. Version Information


```python
# Print version information
print(f"Python version: {sys.version}")
print(f"pandas version: {pd.__version__}")
print(f"numpy version: {np.__version__}")
print(f"TensorFlow version: {tf.__version__}")
```

```text
Python version: 3.13.7 (tags/v3.13.7:bcee1c3, Aug 14 2025, 14:15:11) [MSC v.1944 64 bit (AMD64)]
pandas version: 2.3.2
numpy version: 2.3.2
TensorFlow version: 2.20.0
```


---

## 5. The Dataset

**Description:**

The dataset for this project is not a static collection, but a stream of real-time input vectors extracted from the NES game "Super Mario Bros." as it runs inside the BizHawk emulator. Each sample corresponds to a single game frame and consists of a simple encoding for the state of the level.

Because the game runs at 60 frames per second, the agent will generate tens or hundreds of thousands of frames through its own gameplay attempts, and these frames serve as the training data. Since the agent continuously produces new states through exploration, the traditional split into training, validation, and test sets does not directly apply. Instead, the "training" set consists of the frames the agent generates while learning the game, and the "testing" set consists of the final agent's play on various levels.

**Data Specifics:**

The data is very simple. Each frame, the tiles within N blocks of the player are encoded as a state vector. The vector encodes "air" as zeros, "tiles" as ones, and "enemies" as negative ones. Later, additional information might be included in this vector, such as Mario's position, the level timer, or the score.
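
The encoding can be sketched in Python, assuming the tile map and enemy positions have already been converted to grid coordinates and that the window is square and centred on Mario. Under that assumption, $`N`$ tiles of vision in every direction gives a $`(2N+1) \times (2N+1)`$ window, so $`N = 6`$ yields a 169-element vector. The function name `encode_view` and its inputs are hypothetical; in this project the equivalent work happens in the Lua script.

```python
import numpy as np

AIR, TILE, ENEMY = 0, 1, -1


def encode_view(solid, enemies, player_row, player_col, n=6):
    """Return the (2n+1) x (2n+1) window centred on the player, flattened.

    solid: 2D bool array of the level's tile map, indexed [row, col].
    enemies: iterable of (row, col) tile coordinates of enemies.
    Cells outside the map are treated as air.
    """
    size = 2 * n + 1
    view = np.full((size, size), AIR, dtype=np.int8)
    rows, cols = solid.shape
    for dr in range(-n, n + 1):
        for dc in range(-n, n + 1):
            r, c = player_row + dr, player_col + dc
            if 0 <= r < rows and 0 <= c < cols and solid[r, c]:
                view[dr + n, dc + n] = TILE
    # Enemies are written last, so they take priority over tiles.
    for r, c in enemies:
        if abs(r - player_row) <= n and abs(c - player_col) <= n:
            view[r - player_row + n, c - player_col + n] = ENEMY
    return view.ravel()


solid = np.zeros((20, 20), dtype=bool)
solid[10, 11] = True  # one block just right of the player
vector = encode_view(solid, enemies=[(10, 9)], player_row=10, player_col=10)
```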

**Preprocessing:**

All preprocessing occurs inside a Lua script running in BizHawk. The script reads specific memory addresses each frame (e.g., Mario’s position, tile map contents, and enemy locations) using the emulator’s API. These values are packaged into a compact vector representation of the current world state. Examples of this encoding with N=6 (6 tiles of "vision" in any direction from the player) are shown below. In the second example, a red box marks the region within N=6 tiles of the player.

The Lua script then transmits each state vector to a Python backend via TCP, where it will be fed directly into the Q-learning model.
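
On the Python side, receiving the stream could look like the sketch below. It assumes a message format the document does not specify: each state vector is sent as one line of comma-separated integers terminated by a newline. The port number and the function names are hypothetical. Because TCP is a byte stream, a single `recv` call may return several messages or only part of one, so the code buffers bytes until it sees a complete line.

```python
import socket

import numpy as np


def parse_state(line):
    """Convert one comma-separated message into a float32 state vector."""
    return np.array([int(value) for value in line.split(",")], dtype=np.float32)


def read_states(conn):
    """Yield one state vector per newline-terminated message on a connected socket."""
    buffer = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # the emulator closed the connection
            return
        buffer += chunk
        while b"\n" in buffer:  # a chunk may hold several messages
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield parse_state(line.decode("ascii").strip())


def serve(handle_state, host="127.0.0.1", port=5000):
    """Wait for the Lua script to connect, then pass each state to handle_state."""
    with socket.create_server((host, port)) as server:
        conn, _address = server.accept()
        with conn:
            for state in read_states(conn):
                handle_state(state)
```

For testing without BizHawk running, `socket.socketpair()` can stand in for the emulator: write messages into one end and iterate `read_states` over the other.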

**Source:**

The dataset originates entirely from gameplay in Super Mario Bros (Nintendo, 1985) executed within the BizHawk open-source emulator. Data is collected from RAM while the game is being played.




Level 1 - Still Frame             |  Level 1 - Vector Encoding
:-------------------------:|:-------------------------:
![](img/level-1.png)  |  ![](img/matrix-1.png)


Level 2 - Still Frame             |  Level 2 - Vector Encoding
:-------------------------:|:-------------------------:
![](img/level-2.png)  |  ![](img/matrix-2.png)



---

## 6. Methods
(This section should be about a few paragraphs, describing the methods you are using.)

Describe the machine learning algorithm(s) you are using to address your problem. Focus on providing a clear, conceptual understanding of how each method works.

- **Algorithm Description:** Explain the core idea behind each algorithm you used. What is its main goal and how does it approach the problem? For example, for a decision tree, you would explain how it splits data based on features to make decisions.

- **Visual Aids:** Feel free to include figures or diagrams that help explain your algorithm. These can be powerful tools to illustrate complex ideas. NOTE: Any figures not made by you MUST have a clear reference to the original source.

   Example:

<img src="figs/text_embeddings.png" 
        alt="Picture" 
        width="600" 
        style="display: block; margin: 0 auto" />
<div style="text-align: right;">
<small><cite>Figure from: https://www.searchenginejournal.com/wp-content/uploads/2019/12/use-5de7e39f7ccbd.png</cite></small>
</div>



- **Mathematical Detail (Optional):** You do not need to include detailed mathematical equations unless they are critical to understanding your specific implementation or a unique aspect of the algorithm you are highlighting.

  Example:
  
  **The Cauchy-Schwarz Inequality**
$$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$

- **Clarity:** Provide a concise description (approximately one paragraph) of how each algorithm works. Assume the reader is a fellow student in your major who may not be familiar with machine learning algorithms, so emphasize clarity and comprehension.

- **Subsections:** If you use multiple methods, break them out into subsections for clarity.

### 6.1. Method 1: (e.g., Logistic Regression)

*(Some explanatory text and figures if necessary.)*

### 6.2. Method 2: (e.g., Support Vector Machine)

*(Some explanatory text and figures if necessary.)*

```python
# Define your model architectures, loss functions, and any helper functions here.

# Example: Define a simple model using scikit-learn
# from sklearn.linear_model import LogisticRegression
#
# model = LogisticRegression(random_state=42)
```

---

## 7. Results

*(This section should focus on the objective presentation of your findings. Present the data, metrics, and outputs from your experiments without interpretation.)*

* **Experimental Setup:** Briefly describe how you conducted your experiments.
    * **Hyperparameters:** Specify the final (hyper)parameters chosen for your models (e.g., learning rate, number of trees, regularization strength) and briefly mention the process used to select them (e.g., grid search, manual tuning).
    * **Cross-Validation:** Describe your use of cross-validation, including the number of folds if applicable.
* **Evaluation Metrics:** Clearly identify and explain the primary metrics you are using to evaluate your models (e.g., accuracy, precision, AUC, mean squared error). Provide the equations for any metrics that are not common.
* **Quantitative Results:** Present your main findings using tables and plots. This is the core of your results section.
    * For classification tasks, this should include final performance metrics and visualizations like confusion matrices or ROC/AUPRC curves. 
    * For regression tasks, this should include metrics like Mean Absolute Error (MAE) or R-squared values.
* **Qualitative Results:** Show specific examples of your model's outputs.
    * Display a few examples where the model performed correctly.
    * It is also crucial to show a few representative examples of where your algorithm failed.

**NOTE:** All figures need legends, axis labels, and readable font sizes. All tables should be clearly labeled.



### 7.1. Train and Evaluate Models

```python
# Train your model(s) on the training data
# model.fit(X_train, y_train)

# Evaluate on the validation set to tune hyperparameters
# val_predictions = model.predict(X_val)

# Final evaluation on the test set
# test_predictions = model.predict(X_test)
```

### 7.2. Visualize Results

```python
# Generate plots and tables to present your findings

# Example: Plot a confusion matrix
# cm = confusion_matrix(y_test, test_predictions)
# sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
# plt.xlabel('Predicted')
# plt.ylabel('Actual')
# plt.title('Confusion Matrix')
# plt.show()

# Example: Create a results table
# results = {
#     'Model': ['Logistic Regression', 'SVM'],
#     'Accuracy': [0.85, 0.92],
#     'Precision': [0.83, 0.91]
# }
# results_df = pd.DataFrame(results)
# display(results_df)
```

---

## 8. Discussion

*(In this section, you will interpret the results you presented above. Explain what your findings mean, analyze why the models behaved as they did, and discuss the implications of your work.)*

* **Interpretation of Results:** What do your results from the previous section actually mean? For instance, does a 95% accuracy on your test set mean the problem is solved? Why or why not?
* **Error Analysis:** Look at the qualitative results where your algorithm failed. Is there a pattern? Does the model consistently struggle with a specific class or type of data? Propose hypotheses for why these failures might be occurring.
* **Comparison of Methods:** If you used multiple algorithms, which performed best? Discuss potential reasons for the difference in performance. Did one model's assumptions better fit the data?
* **Overfitting:** Discuss any evidence of overfitting you observed (e.g., a large gap between training and validation performance). What steps did you take to mitigate it, and how effective were they?
* **Limitations:** What are the main limitations of your work? This could be related to the size of your dataset, the features you used, or the assumptions made by your models.

**NOTE:** Add sections as needed.

## 9. Conclusions & Future Work

* **Summary:** Summarize your report and reiterate the key findings.
* **Performance:** Which algorithms were the highest-performing? Why do you think some algorithms worked better than others?
* **Future Work:** If you had more time, more team members, or more computational resources, what would you explore next? (e.g., try different model architectures, collect more data, explore different features).

## References

1) https://en.wikipedia.org/wiki/Super_Mario_Bros
2) https://tasvideos.org/Bizhawk
3) https://github.com/Xander-Carroll/PHYSICS5680-Neural-Network-Final
