# COGS 118B - Final Project

# SpaceInvaders

## Group members

- Brian Liu
- Valeria Avila
- Juan Villalobos

# Abstract 

Using OpenAI's Space Invaders gaming environment, we train an AI agent to destroy as many invaders as possible by firing a laser cannon before the invaders reach Earth or hit the agent. The agent earns points each time it destroys an invader, so we measure its success by the number of points it scores. The agent is trained with the deep Q-learning algorithm over the course of 2000 episodes.

# Background
Space Invaders is a classic "shoot 'em up" arcade game developed in Japan in the 1970s. It is considered one of the most influential video games, as it was the first fixed shooter. The player controls a laser cannon and must defeat a wave of aliens before they reach Earth or hit the player. The more aliens the player destroys, the more points they earn.

Space Invaders has frequently been a subject of interest in AI and game theory because its rules are simple while playing it well is complex. Heuristic search algorithms such as A-star [1] and genetic algorithms have been used to evaluate each possible move. Reinforcement learning techniques such as Q-learning, which treat the game as a Markov decision process, have been used to train AI agents to play Space Invaders at a high level of proficiency.

Space Invaders is significant for AI because it provides a fun and interesting platform for developing and testing AI algorithms.
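
To make the Q-learning idea mentioned above concrete, here is a minimal sketch of a single tabular Q-learning update. It is an illustration only, not our training code: the function name `q_learning_update`, the state labels `"s0"` and `"s1"`, the three-action list, and the values of `alpha` and `gamma` are all assumptions chosen for the example. Deep Q-learning replaces the lookup table with a neural network but keeps the same update target.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Best value reachable from the next state (zero if the episode ended).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen (state, action) pairs start at 0.
q_table = defaultdict(float)
actions = ["LEFT", "RIGHT", "FIRE"]  # illustrative subset of moves
new_value = q_learning_update(q_table, "s0", "FIRE", 10.0, "s1", actions)
```

With an empty table, the update moves the estimate for firing in `"s0"` one tenth of the way toward the reward of 10.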

# Problem Statement

# Data

Since the agent is trained with deep Q-learning, we do not need a dataset. Instead, the agent learns in an environment:
- https://www.gymlibrary.dev/environments/atari/space_invaders/
- This environment has an action space of 18 possible actions when the full Atari action set is enabled (`full_action_space=True`); by default it exposes 6 discrete actions.
- Each observation is a 210 x 160 color (RGB) image.
- Rewards/points are earned by destroying invaders. Invaders destroyed are worth 5, 10, 15, 20, 25, 30 points in the first through sixth rows respectively.
- There is no upper limit on the number of points that can be earned before the agent loses the battle.
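
As a worked example of the reward scheme listed above, the following sketch adds up the points for a set of destroyed invaders. The helper name `wave_score` and the constant `INVADERS_PER_ROW` are hypothetical, and the figure of 6 invaders per row is an assumption about the wave layout rather than something stated above.

```python
# Points per destroyed invader, indexed by row (1 = first row, 6 = sixth row).
ROW_POINTS = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25, 6: 30}


def wave_score(destroyed_per_row):
    """Return total points for a dict mapping row -> invaders destroyed."""
    return sum(ROW_POINTS[row] * count
               for row, count in destroyed_per_row.items())


# Hypothetical example: two invaders from row 1 and one from row 6.
example = wave_score({1: 2, 6: 1})  # 2*5 + 1*30 = 40

# Assumption: a full wave has 6 invaders in each of the 6 rows.
INVADERS_PER_ROW = 6
full_wave = wave_score({row: INVADERS_PER_ROW for row in ROW_POINTS})  # 6*105 = 630
```

Because a new wave appears after each one is cleared, the total keeps growing for as long as the agent survives.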

# Proposed Solution

If it is appropriate to the problem statement, describe a benchmark model<a name="sota"></a>[<sup>[3]</sup>](#sotanote) against which your solution will be compared. 

# Evaluation Metrics

Two metrics make sense for evaluating a Space Invaders agent: score and time played. Our primary metric is the average score the agent obtains over a series of games, which indicates its ability to accumulate points. A secondary metric is the average duration of a game over the same series, which assesses how long the agent can last before losing.
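
The two metrics above can be computed from per-game records, as in the following minimal sketch. The function name `summarize_episodes`, the choice to measure duration in environment steps, and the numbers in the usage example are all assumptions for illustration; they are not results from our agent.

```python
from statistics import mean


def summarize_episodes(episodes):
    """Summarize a list of (total_score, num_steps) tuples, one per game."""
    scores = [score for score, _ in episodes]
    durations = [steps for _, steps in episodes]
    return {
        "avg_score": mean(scores),        # average points per game
        "avg_duration": mean(durations),  # average steps survived per game
        "num_games": len(episodes),
    }


# Made-up records for three games, purely to show the call.
results = summarize_episodes([(120, 600), (210, 850), (90, 500)])
```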

# Results


# Discussion

### Interpreting the result

### Limitations
Two limitations of our project concern the time used to train our neural network and the neural network itself.

We trained the agent for 2000 episodes; training it for longer might have produced better results. With more training the agent would have had more experience to learn from, which could have let it play Space Invaders with more precision and accuracy.

The second limitation is the depth of our network: a deeper neural network might have learned a better policy.

Lastly, we believe that improving both of these aspects would have improved our overall results. Training a deeper neural network for a longer time is something we would test in the future.

### Ethics & Privacy
While training the agent, there could be issues concerning fairness. The person training the agent may have biases when deciding how to train it. For example, there could be bias in determining which actions deserve a reward and which do not, which could lead to an unfair algorithm. Another ethical concern arises when Space Invaders is played online. Our project does not require any outside data, but our agent could be used against other players. We will therefore not use this agent against others and will only train it within a closed environment. We will remain objective by strictly following the rules that have governed Space Invaders for years.

### Conclusion

# Footnotes
<a name="lorenznote"></a>1.[^](#lorenz): Lorenz, T. (9 Dec 2021) Birds Aren’t Real, or Are They? Inside a Gen Z Conspiracy Theory. *The New York Times*. https://www.nytimes.com/2021/12/09/technology/birds-arent-real-gen-z-misinformation.html<br> 
<a name="admonishnote"></a>2.[^](#admonish): Also refs should be important to the background, not some randomly chosen vaguely related stuff. Include a web link if possible in refs as above.<br>
<a name="sotanote"></a>3.[^](#sota): Perhaps the current state of the art solution such as you see on [Papers with code](https://paperswithcode.com/sota). Or maybe not SOTA, but rather a standard textbook/Kaggle solution to this kind of problem
