
# Darwin Plays Super Mario Bros
### *Training a Reward-Based Machine Learning Model to Play Classic Video Games*


**Author:** Xander Carroll    
**Course:** Physics 5680, Autumn 2025  
**Date:** October 30, 2025

**Project Repository:** [Link](https://github.com/Xander-Carroll/PHYSICS5680-Neural-Network-Final)
&nbsp;

<small>*I used GPT-5 mini to help craft several paragraphs of text throughout this notebook.*</small>

---

## Abstract

Classic video games offer a controlled and well-understood environment for experimenting with machine learning, providing clear objectives and measurable rewards without the complexity of modern games. This project will explore training a reward-based neural network to play Super Mario Bros on the Nintendo Entertainment System (NES). The game state will be extracted using an NES emulator and encoded into a representation suitable for a neural network. The network will then be trained to maximize in-game rewards such as level progress and the level timer. It is anticipated that this approach will demonstrate how simple, reward-driven models can learn effective strategies in constrained environments.

---

## 1. Introduction

### Problem Description
Developing artificial agents that can learn to play video games has long been a benchmark problem in machine learning and artificial intelligence. While modern games offer highly complex environments, classic games like Super Mario Bros for the Nintendo Entertainment System (NES) provide simpler, well-defined environments where objectives, rewards, and state representations are clear. The problem we aim to solve is training a reward-based neural network to play Super Mario Bros, using in-game feedback to guide learning. This requires designing a system that can interpret the game state, map it into a format suitable for a neural network, and optimize agent behavior to choose the controller inputs that will maximize cumulative rewards.

### Motivation
This project contributes to research in reinforcement learning by testing how reward-based neural networks can learn behavior in simple, well-defined environments. Classic games like Super Mario Bros provide an ideal platform for such experiments: they are easy to simulate, have clear objectives, and still require sequential decision-making and adaptation. By evolving a neural network to play the game using only in-game rewards as feedback, this project explores how complex behavior can emerge from simple fitness functions.

### Background
Our primary environment is the NES version of [Super Mario Bros](https://en.wikipedia.org/wiki/Super_Mario_Bros).$`^2`$ This game is one of the most popular of its era, has been reverse engineered, and is well documented by the community. Super Mario Bros is a side-scrolling platformer in which the player's goal is to make forward progress, eventually reaching the end of the level while avoiding hazards. The [BizHawk](https://tasvideos.org/Bizhawk) NES emulator will be used to provide a programmatic interface for reading the game's memory and feeding controller inputs to the game in real time.$`^3`$ This will allow the neural network to "see" and "interact" with the game. The problem is framed as a reward-based learning task, where the agent receives feedback proportional to in-game progress, creating a natural fitness function for optimization.
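To make the idea of a progress-based fitness function concrete, here is a minimal sketch. It assumes the emulator side has already produced a list of the player's horizontal positions (one per frame) and the value of the level timer at the end of the attempt. The function name, the `time_weight` parameter, and its default value are illustrative assumptions, not the project's final reward design.

```python
def fitness(x_positions, timer_remaining, time_weight=0.1):
    """Hypothetical fitness: furthest forward progress plus a small timer bonus.

    x_positions: per-frame horizontal positions of the player (any consistent unit).
    timer_remaining: value of the level timer when the attempt ended.
    time_weight: assumed weighting of the timer bonus (illustrative only).
    """
    if not x_positions:
        return 0.0
    # Reward the furthest point reached, measured from the starting position,
    # so that walking backwards late in the attempt does not erase progress.
    progress = max(x_positions) - x_positions[0]
    return float(progress) + time_weight * float(timer_remaining)


# Example attempt: the player starts at x=40, peaks at x=120, then retreats.
score = fitness([40, 60, 120, 100], timer_remaining=300)
```

Using the maximum position rather than the final one is one possible design choice; it keeps the score from dropping when the agent backtracks before dying.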

### Inputs and Outputs
Input(s): The algorithm will receive the current game state from the BizHawk emulator. This will include a map of the level, with terrain layout, enemy locations, and the player's current progress. This information will be encoded into a representation suitable for neural network processing.
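One possible encoding is sketched below, assuming the terrain near the player has already been extracted as a small grid of tiles and the enemies as grid coordinates. The tile codes (`EMPTY`, `SOLID`, `ENEMY`), the window size, and the function name are hypothetical; the real representation will depend on how the game's memory is read.

```python
# Hypothetical tile codes used as network inputs (assumed, not from the project).
EMPTY, SOLID, ENEMY = 0.0, 1.0, -1.0


def encode_state(tiles, enemies):
    """Flatten a tile window around the player into a 1-D list of floats.

    tiles: 2-D list of truthy (solid) / falsy (empty) values.
    enemies: iterable of (row, col) positions inside the same window.
    """
    grid = [[SOLID if tile else EMPTY for tile in row] for row in tiles]
    for r, c in enemies:
        # Ignore enemies that fall outside the visible window.
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
            grid[r][c] = ENEMY
    # Row-major flattening gives one input value per tile.
    return [value for row in grid for value in row]


# A 3x3 window: flat ground on the bottom row, one enemy in the middle row.
window = [[0, 0, 0],
          [0, 0, 0],
          [1, 1, 1]]
inputs = encode_state(window, enemies=[(1, 2)])
```

The player's progress could be appended to this vector as an extra input, if it turns out to be useful to the network.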

Output(s): The network will produce a set of controller inputs (e.g., hold left, press button A) to be executed in the game environment. Eventually, we expect these controller outputs to maximize the fitness function (the player's forward progress in the level).
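As a sketch of how network outputs might become controller inputs, the example below assumes the network emits one activation per NES button and that a button counts as pressed when its activation exceeds a threshold. The function name, the threshold value, and the rule for resolving simultaneous Left and Right presses are assumptions for illustration.

```python
# The eight buttons on a standard NES controller.
BUTTONS = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


def decode_outputs(activations, threshold=0.5):
    """Map one activation per button to a dict of pressed/released states."""
    if len(activations) != len(BUTTONS):
        raise ValueError(
            f"expected {len(BUTTONS)} activations, got {len(activations)}"
        )
    level = dict(zip(BUTTONS, activations))
    pressed = {name: value > threshold for name, value in level.items()}
    # Left and Right cannot both be held on a physical D-pad, so keep
    # only the direction with the stronger activation (assumed rule).
    if pressed["Left"] and pressed["Right"]:
        pressed["Left"] = level["Left"] > level["Right"]
        pressed["Right"] = not pressed["Left"]
    return pressed


# Example: strong A and Right, weaker Left.
buttons = decode_outputs([0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.6, 0.8])
```

The resulting dictionary would then be translated into whatever input format the emulator interface expects.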

### Project Goals

1. **Primary Goal:**
Train a fitness-based neural network to autonomously play Super Mario Bros and achieve measurable progress through one or more levels. The primary metric is maximizing cumulative in-game reward, including level completion.

1. **Stretch Goal:**
Extend the model to generalize across multiple levels or similar games. This will include testing whether a network trained on one level can adapt to unseen layouts.
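Fitness-based training can be illustrated with a very simple evolutionary loop. This is a minimal sketch of truncation selection with Gaussian mutation, chosen for brevity, and is not necessarily the algorithm the project will use. It assumes the network's weights are a flat list and that `evaluate` is a function returning a fitness score for those weights (in the real project, this would involve playing the level). All names and default values here are hypothetical.

```python
import random


def evolve(evaluate, n_weights, pop_size=20, n_elite=5,
           generations=30, sigma=0.1, seed=0):
    """Evolve a flat weight vector that maximizes evaluate(weights)."""
    rng = random.Random(seed)
    # Start from a random population of candidate weight vectors.
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by fitness and keep the best unchanged (elitism).
        ranked = sorted(population, key=evaluate, reverse=True)
        elite = ranked[:n_elite]
        # Refill the population with mutated copies of random elite members.
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(elite)]
            for _ in range(pop_size - n_elite)
        ]
        population = elite + children
    best = max(population, key=evaluate)
    return best, evaluate(best)


# Toy stand-in for "play the level": fitness peaks when every weight is 1.0.
def toy_fitness(weights):
    return -sum((w - 1.0) ** 2 for w in weights)


best_weights, best_score = evolve(toy_fitness, n_weights=3)
```

Because the elite are carried over unchanged, the best score in the population never decreases from one generation to the next, which makes progress easy to track.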

---

## 2. Related Work

*(This section should briefly review key studies or projects relevant to your topic.)*

Make sure you address the following in this section:

* **Methodological Review:** Find and group existing papers or projects based on their methodological approaches.
* **Strengths and Weaknesses:** Discuss the pros and cons of these approaches and how they compare to your work.
* **State-of-the-Art:** Highlight clever or effective methods you found and describe the current state-of-the-art in the field.
* **References:** Include at least 3-5 relevant references, covering previous attempts and methods applied to similar problems. *Note: Google Scholar is great for sourcing papers and managing citations: https://scholar.google.com/.*

---

## 3. Ethical Considerations

*(This section should be 1-2 paragraphs.)*

Machine learning models can have a significant societal impact. Briefly discuss the ethical implications of your project.

* **Data Privacy & Bias:** Where did your data come from? Does it contain sensitive information? Could the dataset or the way it was collected introduce bias (e.g., gender, racial, or geographical bias)?
* **Potential Misuse:** Could the algorithm or its results be used for malicious or harmful purposes?
* **Impact:** Who could be positively or negatively affected by the application of your model in the real world?


---

## 4. Project Setup and Imports

*(This initial section is for setting up your project's environment. It includes importing the necessary Python libraries and documenting their versions to ensure your work is reproducible.)*

### 4.1. Key Libraries

Below is a brief description of the primary Python packages that will be used in this project.

  * **pandas:** Used for data manipulation and analysis. It provides powerful data structures, like the DataFrame, for handling and exploring structured data.
  * **numpy:** The fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
  * **matplotlib & seaborn:** These are libraries for data visualization. Matplotlib is a comprehensive library for creating static visualizations, while Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.
  * **Plotly:** A graphing library used to create interactive, publication-quality visualizations.  This is especially useful for creating dynamic plots that allow for zooming, panning, and hovering to inspect data points.
  * **scikit-learn:** A primary machine learning library that provides simple and efficient tools for data mining and data analysis, including a wide range of classification, regression, and clustering algorithms.
  * **TensorFlow:** An end-to-end open-source platform for machine learning, specializing in deep learning. It is used for building and training neural networks for tasks like image classification, natural language processing, and more.



In [None]:
# JUST AN EXAMPLE!!! Yours will probably be different!!

# Import all necessary libraries here
# import pandas as pd
# import numpy as np
# import matplotlib.pyplot as plt
# import seaborn as sns
# import plotly
# import plotly.express as px
# import sklearn
# import tensorflow as tf
# import sys

# from sklearn.model_selection import train_test_split
# from sklearn.metrics import confusion_matrix, accuracy_score


# # Configure plots for readability
# plt.rcParams['figure.figsize'] = (10, 6)
# plt.rcParams['font.size'] = 14

# print("Libraries imported successfully.")


### 4.2. Version Information

*(To ensure that the results in this notebook can be reproduced, it is important to record the versions of the key libraries used. The code below will print the versions of Python and the packages listed above.)*



In [None]:
# JUST AN EXAMPLE!!! Yours will probably be different!!
# Print version information
# print(f"Python version: {sys.version}")
# print(f"pandas version: {pd.__version__}")
# print(f"numpy version: {np.__version__}")
# print(f"seaborn version: {sns.__version__}")
# print(f"plotly version: {plotly.__version__}")
# print(f"scikit-learn version: {sklearn.__version__}")
# print(f"TensorFlow version: {tf.__version__}")


---

## 5. The Dataset

*This should be quite extensive.*

Describe the dataset(s) you are using for your project.

* **Description:** What is your dataset? How many training, validation, and test examples are there?
* **Preprocessing:** Discuss any preprocessing techniques you applied, such as normalization, handling missing values, or data augmentation.
* **Data Specifics:** Provide details on data specifics, e.g., image resolution, time-series discretization, or text encoding.
* **Source:** Cite the source from where you obtained your dataset.
* **Visualization:** You **must** show examples of your data.
    * Display examples from your dataset, especially examples from different classes.
    * Highlight and display important features that describe your dataset and/or are needed for classification/regression/clustering.


**NOTE:** All figures need legends, axis labels, and readable font sizes.

### 5.1. Load and Preprocess Data

In [None]:
# Load your dataset here (e.g., from a CSV file)
# df = pd.read_csv('path/to/your/data.csv')

# Perform any necessary preprocessing steps
# - Handle missing values
# - Normalize numerical features
# - Encode categorical variables

# Split the data into training, validation, and test sets
# train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
# train_df, val_df = train_test_split(train_df, test_size=0.25, random_state=42) # 0.25 * 0.8 = 0.2

# print(f"Training set size: {len(train_df)}")
# print(f"Validation set size: {len(val_df)}")
# print(f"Test set size: {len(test_df)}")

### 5.2. Exploratory Data Analysis (EDA)

In [None]:
# Create visualizations to understand your data
# - Histograms of feature distributions
# - Correlation heatmaps
# - Example data points (e.g., show a few images or text samples)

# Example: Plot a histogram of a feature named 'age'
# sns.histplot(data=df, x='age', kde=True)
# plt.title('Distribution of Age')
# plt.show()

---

## 6. Methods
(This section should be about a few paragraphs, describing the methods you are using.)

Describe the machine learning algorithm(s) you are using to address your problem. Focus on providing a clear, conceptual understanding of how each method works.

- **Algorithm Description:** Explain the core idea behind each algorithm you used. What is its main goal and how does it approach the problem? For example, for a decision tree, you would explain how it splits data based on features to make decisions.

- **Visual Aids:** Feel free to include figures or diagrams that help explain your algorithm. These can be powerful tools to illustrate complex ideas. NOTE: Any figures not made by you MUST have a clear reference to the original source.

   Example:

<img src="figs/text_embeddings.png" 
        alt="Picture" 
        width="600" 
        style="display: block; margin: 0 auto" />
<div style="text-align: right;">
<small><cite>Figure from: https://www.searchenginejournal.com/wp-content/uploads/2019/12/use-5de7e39f7ccbd.png</cite></small>
</div>



- **Mathematical Detail (Optional):** You do not need to include detailed mathematical equations unless they are critical to understanding your specific implementation or a unique aspect of the algorithm you are highlighting.

  Example:
  
  **The Cauchy-Schwarz Inequality**
$$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$$

- **Clarity:** Provide a concise description (approximately one paragraph) of how each algorithm works. Assume the reader is a fellow student in your major who may not be familiar with machine learning algorithms, so emphasize clarity and comprehension.

- **Subsections:** If you use multiple methods, break them out into subsections for clarity.

### 6.1. Method 1: (e.g., Logistic Regression)

*(Some explanatory text and figures if necessary.)*

### 6.2. Method 2: (e.g., Support Vector Machine)

*(Some explanatory text and figures if necessary.)*

In [None]:
# Define your model architectures, loss functions, and any helper functions here.

# Example: Define a simple model using scikit-learn
# from sklearn.linear_model import LogisticRegression
#
# model = LogisticRegression(random_state=42)

---

## 7. Results

*(This section should focus on the objective presentation of your findings. Present the data, metrics, and outputs from your experiments without interpretation.)*

* **Experimental Setup:** Briefly describe how you conducted your experiments.
    * **Hyperparameters:** Specify the final (hyper)parameters chosen for your models (e.g., learning rate, number of trees, regularization strength) and briefly mention the process used to select them (e.g., grid search, manual tuning).
    * **Cross-Validation:** Describe your use of cross-validation, including the number of folds if applicable.
* **Evaluation Metrics:** Clearly identify and explain the primary metrics you are using to evaluate your models (e.g., accuracy, precision, AUC, mean squared error). Provide the equations for any metrics that are not common.
* **Quantitative Results:** Present your main findings using tables and plots. This is the core of your results section.
    * For classification tasks, this should include final performance metrics and visualizations like confusion matrices or ROC/AUPRC curves. 
    * For regression tasks, this should include metrics like Mean Absolute Error (MAE) or R-squared values.
* **Qualitative Results:** Show specific examples of your model's outputs.
    * Display a few examples where the model performed correctly.
    * It is also crucial to show a few representative examples of where your algorithm failed.

**NOTE:** All figures need legends, axis labels, and readable font sizes. All tables should be clearly labeled.



### 7.1. Train and Evaluate Models

In [None]:
# Train your model(s) on the training data
# model.fit(X_train, y_train)

# Evaluate on the validation set to tune hyperparameters
# val_predictions = model.predict(X_val)

# Final evaluation on the test set
# test_predictions = model.predict(X_test)

### 7.2. Visualize Results

In [None]:
# Generate plots and tables to present your findings

# Example: Plot a confusion matrix
# cm = confusion_matrix(y_test, test_predictions)
# sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
# plt.xlabel('Predicted')
# plt.ylabel('Actual')
# plt.title('Confusion Matrix')
# plt.show()

# Example: Create a results table
# results = {
#     'Model': ['Logistic Regression', 'SVM'],
#     'Accuracy': [0.85, 0.92],
#     'Precision': [0.83, 0.91]
# }
# results_df = pd.DataFrame(results)
# display(results_df)

---

## 8. Discussion

*(In this section, you will interpret the results you presented above. Explain what your findings mean, analyze why the models behaved as they did, and discuss the implications of your work.)*

* **Interpretation of Results:** What do your results from the previous section actually mean? For instance, does a 95% accuracy on your test set mean the problem is solved? Why or why not?
* **Error Analysis:** Look at the qualitative results where your algorithm failed. Is there a pattern? Does the model consistently struggle with a specific class or type of data? Propose hypotheses for why these failures might be occurring.
* **Comparison of Methods:** If you used multiple algorithms, which performed best? Discuss potential reasons for the difference in performance. Did one model's assumptions better fit the data?
* **Overfitting:** Discuss any evidence of overfitting you observed (e.g., a large gap between training and validation performance). What steps did you take to mitigate it, and how effective were they?
* **Limitations:** What are the main limitations of your work? This could be related to the size of your dataset, the features you used, or the assumptions made by your models.

**NOTE:** Add sections as needed.

## 9. Conclusions & Future Work

* **Summary:** Summarize your report and reiterate the key findings.
* **Performance:** Which algorithms were the highest-performing? Why do you think some algorithms worked better than others?
* **Future Work:** If you had more time, more team members, or more computational resources, what would you explore next? (e.g., try different model architectures, collect more data, explore different features).

## References

1) https://github.com/Xander-Carroll/PHYSICS5680-Neural-Network-Final
2) https://en.wikipedia.org/wiki/Super_Mario_Bros
3) https://tasvideos.org/Bizhawk