# Deep Q-Learning for Space Invaders

This report documents the implementation, modifications, and results of training a Deep Q-Learning (DQL) model to play the Space Invaders game. The task involved adapting an example from the lecture material, running the model, and optimizing the implementation for time and resource efficiency. The trained model and the source code were submitted as per the task requirements.



## **Task Overview**

The original implementation of DQL required training for 50 million frames, a process demanding significant computational resources. Given the constraints of this project, the focus was on experimenting with the training process. This document outlines the changes made to the model, the challenges faced, and the results obtained.


## **Modifications and Optimizations**

### **Initial Changes**

 **Activation Function:**
   - The activation function was changed from ReLU to Leaky ReLU.
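
To illustrate the difference between the two activations, here is a minimal pure-Python sketch. The `negative_slope` value of 0.01 is an assumption for illustration only; the slope actually used in training is not stated in this report.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative inputs are clamped to zero."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: negative inputs are scaled by a small slope instead of zeroed.

    negative_slope=0.01 is a hypothetical default, not the project's setting.
    """
    return x if x > 0 else negative_slope * x


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.0):
        print(value, relu(value), leaky_relu(value))
```

Because negative inputs keep a small non-zero gradient, Leaky ReLU is often tried as a way to avoid units that stop updating, although whether it helps in this setup was not measured separately.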

 **Environment Recording:**
   - Added `env = RecordEpisodeStatistics(env)` to monitor and log episode statistics.

 **Running Reward Threshold:**
   - The initial running reward threshold was set to **9000** as a goal for performance evaluation.
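
The following sketch shows one way such a threshold can be checked. It assumes the running reward is the mean episode reward over the last 100 episodes, which matches the "last 100" wording used in the results; the class name `RunningReward` and its methods are hypothetical and not taken from the project code.

```python
from collections import deque


class RunningReward:
    """Tracks the mean reward over a sliding window of recent episodes."""

    def __init__(self, window: int = 100, threshold: float = 9000.0):
        # deque(maxlen=...) silently drops the oldest reward once the window is full
        self.rewards = deque(maxlen=window)
        self.threshold = threshold

    def update(self, episode_reward: float) -> float:
        """Record one finished episode and return the current running reward."""
        self.rewards.append(episode_reward)
        return sum(self.rewards) / len(self.rewards)

    def solved(self) -> bool:
        """True once the running reward reaches the threshold."""
        if not self.rewards:
            return False
        return sum(self.rewards) / len(self.rewards) >= self.threshold
```

In a training loop, `update` would be called once per finished episode, and training would stop as soon as `solved()` returns `True`.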



### **Training Session 1 (Local Machine, Visual Studio)**

- **Result:**
  - The script ran for approximately 40 hours.
  - Achieved a best score of 685.0, with a running reward of 198.80 at episode 615 and frame count 340,000.
- **Challenge:**
  - Visual Studio crashed, interrupting the training process.



### **Training Session 2 (Kaggle Notebooks Attempt, Then Local Machine)**

- Attempted to continue training using Kaggle Notebooks.
- **Issue:**
  - Encountered a `ValueError: bad marshal data (unknown type code)` when attempting to download and use the previously saved model.
- **Resolution:**
  - Switched back to training on the local machine.

### **Subsequent Changes**

 **Running Reward Adjustment:**
   - The running reward threshold was lowered to **1000** because 9000 proved too optimistic.

 **Exploration Rate Adjustments:**
   - **Epsilon_max:** Reduced to **0.7**.
   - **Epsilon_min:** Reduced to **0.05**.
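
As a sketch of how these two values interact, the function below anneals epsilon linearly from the maximum to the minimum and then holds it there. The linear schedule and the `decay_frames` value of 1,000,000 are assumptions for illustration; the report does not state the decay length used.

```python
def linear_epsilon(
    frame: int,
    eps_max: float = 0.7,
    eps_min: float = 0.05,
    decay_frames: int = 1_000_000,  # hypothetical decay length
) -> float:
    """Return the exploration rate for a given frame count.

    Epsilon falls linearly from eps_max to eps_min over decay_frames
    and stays at eps_min afterwards.
    """
    fraction = min(max(frame, 0) / decay_frames, 1.0)
    return eps_max - fraction * (eps_max - eps_min)


if __name__ == "__main__":
    for frame in (0, 250_000, 500_000, 1_000_000, 2_000_000):
        print(frame, round(linear_epsilon(frame), 4))
```

A lower `eps_max` means the agent relies on its learned Q-values sooner, which can make sense when training continues from an existing model rather than starting from scratch.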

 **Memory Management:**
   - **Max_memory_length:** Reduced to **10,000** to alleviate memory issues on the local machine.
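
A bounded replay memory can be sketched as follows. This is a minimal illustration using `collections.deque`, not the project's actual buffer; the class name `ReplayBuffer` and its methods are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay memory that discards the oldest transitions."""

    def __init__(self, max_length: int = 10_000):
        # Once max_length is reached, appending evicts the oldest entry
        self.buffer = deque(maxlen=max_length)

    def add(self, state, action, reward, next_state, done) -> None:
        """Store one transition."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> list:
        """Draw a uniform random batch of stored transitions without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

Memory use grows roughly in proportion to `max_length`, so a smaller buffer lowers RAM usage at the cost of less diverse training batches.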

 **Result:**
  - Training reached a frame count of **4,930,000**.
  - However, Visual Studio crashed again due to memory limitations.



### **Training Session 3 (Local Machine)**

- Resumed training on the local machine after the out-of-memory crash.


### **Final Adjustments**

 **Exploration Rate:**
   - Restored final exploration rate (ε) to **0.1**, as recommended in the original DeepMind paper.

 **Environment Vectorization:**
   - Attempted to vectorize the environment to speed up training.
   - **Issue:**
     - This produced implausible reward and best-score statistics. Training also did not get faster (still about 10,000 frames in the first 10 minutes), so it was reverted to a single environment.

 **Starting Epsilon:**
   - Set the starting epsilon (ε) to **0.5**, because training was continuing from an already partially trained model rather than starting from scratch.

### **Final Results**

- Visual Studio crashed again after reaching a total frame count of **5,990,000**.
- This was taken as the end point of training.
- The training process was distributed over three sessions, each lasting approximately 3 days.




## **Challenges and Resolutions**

### **Computational Limitations**
- Limited computational power made it impossible to reach even the 10 million frames suggested by the school task.

### **Software Crashes**
- Visual Studio crashes caused by memory issues were mitigated by reducing `max_memory_length`.



## **Conclusion**

Despite computational and technical challenges, the model reached a best score of 825.0 over the last 100 episodes and a running reward of 257.40 at episode 9680.

The project highlights the importance of adapting machine learning models to computational resources and iteratively improving implementation to overcome challenges. Further optimization and access to higher computational resources would allow for extended training and improved performance.


## **Repository Documentation**

The folder "code" contains the code for the 3 training sessions.

The folder "process" contains some theoretical research as well as the log notes made during the training process.

The folder "viz" contains the last two video outputs, as well as the statistics and their visualizations for the last two training sessions.

