## 0. Install Dependencies

Note: If the installation fails, try removing the single quotes from `gym[atari]`.


In [None]:
!pip install tensorflow==2.3.1 gym keras-rl2 'gym[atari]'

## 1. Test Random Environment with OpenAI Gym

In [2]:
import gym
import random

### `import gym`

Imports the OpenAI gym library.

### `import random`

Imports the random library.

In [3]:
env = gym.make('SpaceInvaders-v0')
height, width, channels = env.observation_space.shape
actions = env.action_space.n

### `env = gym.make('SpaceInvaders-v0')`

Creates the Space Invaders gym environment.
This is the frame-based version (a RAM-based version is also available).


### `env.observation_space.shape`

Reads the shape of the environment's observations.
In the v0 environment each observation is a single game frame (an image), so the shape unpacks into `height`, `width` and `channels`.
This is needed later on to define the input of the neural network.

`height, width` -> height and width of the image.

`channels` -> color channels of the image.

### `actions = env.action_space.n`

This provides the number of available actions. 




In [4]:
env.unwrapped.get_action_meanings()

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

### `env.unwrapped.get_action_meanings()`

Gets the names of the available actions.
These are the actions our agent can actually take inside the environment.

In [10]:
episodes:int = 5 # playing 5 episodes of Space invaders
for episode in range(episodes):
    state = env.reset()
    done: bool = False
    score: int = 0

    while not done:
        env.render()
        action = random.choice([n for n in range(6)])
        n_state, reward, done, info = env.step(action)
        # print('reward: {} -- done: {} -- info: {}'.format(reward, done, info))
        score += reward
    print('Episode: {} -- Score: {}'.format(episode, score))
env.close()

Episode: 0 -- Score: 230.0
Episode: 1 -- Score: 275.0
Episode: 2 -- Score: 50.0
Episode: 3 -- Score: 60.0
Episode: 4 -- Score: 135.0


### `state = env.reset()`

Resets the environment and returns the initial state.

### `done: bool = False`

If this variable is set to `True`, the game has finished and we can stop.
Initially this variable is set to `False`.

### `score: int = 0`

Sets the game score to 0.
This will be used as a measurement of how well the random actions perform.

### `action = random.choice([n for n in range(6)])`

Takes a random action in the range of 0 to 5.

The available actions are:

`['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']`.

The agent therefore simply takes random actions in the environment. It has no strategy for playing the game and will not learn one with this approach.

This allows us to see how a random set of choices performs inside Space Invaders.
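The random choice can be sketched without the environment. This is a minimal illustration, not the notebook's code: `pick_random_action` is a hypothetical helper, and it takes the number of actions as a parameter (the value stored in `actions` earlier) instead of hard-coding 6.

```python
import random

# Action names as printed by env.unwrapped.get_action_meanings() above.
ACTION_MEANINGS = ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']


def pick_random_action(n_actions, rng=random):
    """Hypothetical helper: return a uniformly random action index in [0, n_actions)."""
    return rng.randrange(n_actions)


rng = random.Random(0)  # seeded only so the sketch is repeatable
picks = [pick_random_action(len(ACTION_MEANINGS), rng) for _ in range(10)]
names = [ACTION_MEANINGS[i] for i in picks]
print(names)
```

Every index maps to exactly one action name, so the agent is equally likely to fire, move, or do nothing at each step.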


### `n_state, reward, done, info = env.step(action)`

`env.step(action)` applies the (randomly) chosen action to the environment.
As a result, the state of the environment is updated, and the function returns the outcome of that update.
The return value is unpacked into the next state, the reward for taking the action, a flag indicating whether the game is done, and some additional info about the game.
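To make the four return values concrete, here is a small sketch of the same loop against a toy stand-in. `ToyEnv` and `run_episode` are hypothetical and not part of gym; the reward rule is made up, and only the shape of the `reset`/`step` interface mirrors the code above.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a gym environment (not part of gym)."""

    def __init__(self, n_actions=6, max_steps=20):
        self.n_actions = n_actions
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # initial state: here simply the step counter

    def step(self, action):
        self.steps += 1
        reward = 5.0 if action == 1 else 0.0  # made-up rule: only action 1 scores
        done = self.steps >= self.max_steps   # episode ends after max_steps
        info = {'steps': self.steps}
        return self.steps, reward, done, info


def run_episode(env, rng):
    """Play one episode with random actions and return (score, last info)."""
    state = env.reset()
    done = False
    score = 0.0
    while not done:
        action = rng.randrange(env.n_actions)
        state, reward, done, info = env.step(action)
        score += reward
    return score, info


score, info = run_episode(ToyEnv(), random.Random(0))
print('Score: {} -- steps: {}'.format(score, info['steps']))
```

The loop ends as soon as `step` reports `done` as `True`, which is the same stopping condition the Space Invaders loop relies on.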


### `score += reward`

Each step adds the reward returned for the current action to our score.

### `print('Episode: {} -- Score: {}'.format(episode, score))`

Once the current episode is over, its score is printed.

### `env.close()`

Closes the environment.



## 2. Create a Deep Learning Model with Keras

In [12]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Convolution2D
from tensorflow.keras.optimizers import Adam

### `from tensorflow.keras.models import Sequential`

Imports the `Sequential` API.

This allows us to build sequential deep learning models.


### `from tensorflow.keras.layers import Dense, Flatten, Convolution2D`

Imports different types of layers needed for the deep learning model.


#### `Convolution2D`
Because the only input we receive from the Space Invaders environment is an image, `Convolution2D` layers are the natural fit for this type of input.

A convolutional layer slides small filters across the image to detect visual features, which the later layers use to decide on an action.

### `from tensorflow.keras.optimizers import Adam`

The `Adam` algorithm is used as an optimizer once the deep learning model is compiled.


In [None]:
def build_model(height, width, channels, actions):
    model: Sequential = Sequential()
    model.add(Convolution2D(32, (8, 8), strides=(4, 4), activation='relu', input_shape=(3, height, width, channels)))
    model.add(Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(Flatten())

### `def build_model(height, width, channels, actions)`

The parameters of this function define what the deep learning model will look like.

Because the model is image based, the model will start with convolution layers. 

#### `model.add(Convolution2D(32, (8, 8), strides=(4, 4), activation='relu', input_shape=(3, height, width, channels)))`

The `add` function is used to start stacking layers inside the deep neural network.


Now a short explanation of the parameters of `Convolution2D`:
- `32` -> the number of filters. The layer has 32 two-dimensional convolutional filters, which are trained to detect different features in the environment's images. Ideally, some filters would learn to detect the enemies, the mothership, the protective bases, and so on.
- `(8, 8)` -> the size of each filter, here 8 by 8 pixels.
- `strides=(4, 4)` -> how far the filter window moves between positions: 4 pixels per step horizontally and 4 pixels per step vertically. The window sweeps across a row and then moves down; it does not move diagonally.
- `activation='relu'` -> the activation function, here `relu` (rectified linear unit).
- `input_shape=(3, height, width, channels)` -> the shape of one input. The leading `3` means the reinforcement learning agent passes a stack of 3 frames to the deep learning model at a time.
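The effect of the filter size and the stride can be sketched by listing where the window is placed along each axis. This is an illustration only: `window_starts` is a hypothetical helper, the 210 x 160 frame size is an assumption (it is not stated in this notebook), and the sketch assumes no padding, which is the Keras default (`padding='valid'`).

```python
def window_starts(size, kernel, stride):
    """Hypothetical helper: offsets along one axis where the window fits (no padding)."""
    return list(range(0, size - kernel + 1, stride))


# Assumed frame size for illustration: 210 x 160 pixels.
height, width = 210, 160

row_starts = window_starts(height, kernel=8, stride=4)
col_starts = window_starts(width, kernel=8, stride=4)

print(col_starts[:4])                    # [0, 4, 8, 12]
print(len(row_starts), len(col_starts))  # 51 39
```

Under these assumptions each of the 32 filters produces a 51 x 39 feature map, one value per window position.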

*After this, multiple Convolution2D layers will be added to the model*.


#### `model.add(Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'))`

Uses more filters but the filters are going to be smaller than in the first layer.

The stride distance is also smaller than the one of the first layer and the activation function is yet again the `relu` activation function.


#### `model.add(Convolution2D(64, (3, 3), activation='relu'))`

Uses 3 by 3 filters and since the stride size is not defined for this Convolution2D layer, the stride distance defaults to 1 by 1.
So it is going pixel by pixel.


#### `model.add(Flatten())`

This step takes the output of the last convolutional layer and flattens it into a single one-dimensional vector.

This is done so the output can be passed to a dense layer.
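A rough way to estimate how long the flattened vector is: follow the shape through the three convolutional layers and multiply the final dimensions. The sketch assumes a 210 x 160 frame, no padding, and that each of the 3 stacked frames is convolved independently so the frame dimension is carried through unchanged. `conv_out` is a hypothetical helper, and none of these numbers come from the notebook's output.

```python
from math import prod


def conv_out(size, kernel, stride=1):
    """Output length along one axis for an unpadded convolution."""
    return (size - kernel) // stride + 1


# (filters, kernel size, stride) of the three Convolution2D layers above
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
frames, height, width = 3, 210, 160  # 210 x 160 is an assumed frame size

for filters, kernel, stride in layers:
    height = conv_out(height, kernel, stride)
    width = conv_out(width, kernel, stride)
    channels = filters
    print((frames, height, width, channels))

flat = prod((frames, height, width, channels))
print(flat)  # 67584
```

Under these assumptions the shapes are `(3, 51, 39, 32)`, `(3, 24, 18, 64)` and `(3, 22, 16, 64)`, so `Flatten` would hand 67584 values to the first dense layer.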


### Dense layers

Dense layers are also known as **fully connected layers**.
Every unit in a dense layer is connected to every unit of the next layer.
As seen above, the output of the convolutional layers has to be flattened into a single vector before it can be fed into a dense layer.
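What "fully connected" means can be shown with a tiny plain-Python sketch. It is not how Keras implements `Dense`; the `dense` and `relu` functions and the weights are made up for illustration.

```python
def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: every output is a weighted sum over every input."""
    outputs = []
    for w_row, b in zip(weights, biases):
        total = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(total) if activation else total)
    return outputs


def relu(v):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, v)


x = [1.0, 2.0, 3.0]        # flattened input vector
W = [[1.0, 0.0, -1.0],     # one row of weights per output unit
     [0.5, 0.5, 0.5]]
b = [0.0, 1.0]             # one bias per output unit

print(dense(x, W, b, relu))  # [0.0, 4.0]
```

Each output unit has one weight per input, which is why the flattened vector's length determines how many weights the first dense layer needs.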


https://www.youtube.com/watch?v=hCeJeq8U0lo

https://www.youtube.com/watch?v=cO5g5qLrLSo



## 3. Build Agent with Keras-RL


## 4. Reloading Agent from Memory