In [88]:
import numpy
import matplotlib.pyplot as plt
import gymnasium
from gymnasium.wrappers import RecordVideo, FrameStack
from gymnasium.experimental.wrappers import GrayscaleObservationV0, FrameStackObservationV0
from gymnasium.utils.play import play

# Example of Basic Project Setup

In [89]:
# Print NumPy arrays with a precision of 3 decimal places.
numpy.set_printoptions(precision=3)

In [90]:
env = gymnasium.make("ALE/SpaceInvaders-v5", render_mode='rgb_array')

# Getting the Observation Space in the Game

In [91]:
env.observation_space.shape

(210, 160, 3)

The first dimension, 210, is the height of the observation image in pixels.

The second dimension, 160, is the width of the observation image in pixels.

The third dimension, 3, is the number of color channels. The observation is an RGB image, so it has three channels: red, green, and blue.
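
The shape described above can be explored without the emulator. The following is a minimal sketch that uses an all-black stand-in array (`frame` is a made-up placeholder, not a real game frame) to show how the three dimensions are indexed.

```python
import numpy

# Stand-in for one observation: an all-black image with the same shape and dtype.
frame = numpy.zeros((210, 160, 3), dtype=numpy.uint8)

# Unpack the shape in the order (height, width, channels).
height, width, channels = frame.shape
print(height, width, channels)

# Indexing follows the same order: [row, column, channel].
top_left_pixel = frame[0, 0]   # one pixel: an array of 3 channel values
red_channel = frame[:, :, 0]   # one channel: a 2-D array of shape (height, width)
```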

# Dimensionality Reductions

## Gray Scale

In [92]:
grayscale_env = GrayscaleObservationV0(env)
grayscale_env.observation_space.shape

(210, 160)

The observation space is now a 2-dimensional array, because the grayscale wrapper removes the color-channel dimension.

The first dimension, 210, is the height of the grayscale observation image in pixels.

The second dimension, 160, is the width of the grayscale observation image in pixels.
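
To make the drop from three dimensions to two concrete, here is a rough sketch of a grayscale conversion in plain NumPy. The luminance weights are an assumption (the common ITU-R BT.601 values); the wrapper may compute grayscale differently, and `to_grayscale` is a hypothetical helper, not part of Gymnasium.

```python
import numpy

def to_grayscale(rgb_frame):
    """Collapse an (H, W, 3) uint8 RGB frame into an (H, W) uint8 frame."""
    # Assumed luminance weights for red, green, and blue.
    weights = numpy.array([0.299, 0.587, 0.114])
    gray = rgb_frame.astype(numpy.float32) @ weights
    return gray.astype(numpy.uint8)

# Stand-in RGB frame: black, with an arbitrary colored band at the bottom.
rgb_frame = numpy.zeros((210, 160, 3), dtype=numpy.uint8)
rgb_frame[-15:, :] = (80, 89, 22)

gray_frame = to_grayscale(rgb_frame)
print(rgb_frame.shape, "->", gray_frame.shape)
```

Each pixel now carries one intensity value instead of three, so the array is a third of its original size.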



## FrameSkip (Frameskip 4 is default)

Frame skipping is commonly used in reinforcement learning with Atari games to speed up training and reduce computational requirements. With a skip value of n, each action the agent chooses is repeated for n consecutive frames, and the agent only receives the observation at the end of those n frames.

There are a few reasons why frame skipping is used:

1. Reduced computation: Atari games typically run at 60 frames per second, and processing every frame is computationally expensive. Skipping frames means the agent processes fewer frames per second of game time, which speeds up training.
2. Temporal abstraction: Each decision covers several frames rather than one, so a single action has a visible effect on the game. This can help the agent relate its actions to the movement of objects.
3. More change between observations: Consecutive frames are nearly identical, while observations that are n frames apart differ more. The trade-off is that the agent cannot react to events that happen within the skipped frames.
4. Exploration: Because each action is held for several frames, a randomly chosen action moves the game state further than a single-frame action would. This can help the agent reach a wider range of game states and avoid getting stuck in repetitive behavior.
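
The idea behind frame skipping can be sketched with a toy environment. This is an illustration under stated assumptions, not the emulator's actual implementation: `CountingEnv` and `skip_step` are hypothetical names, and the toy environment simply pays a reward of 1.0 per raw frame and ends after 10 frames.

```python
class CountingEnv:
    """Toy stand-in for an emulator: every raw frame yields a reward of 1.0."""

    def __init__(self):
        self.frame_number = 0

    def step(self, action):
        self.frame_number += 1
        observation = self.frame_number      # the "image" is just the frame index
        terminated = self.frame_number >= 10  # toy episode ends after 10 frames
        return observation, 1.0, terminated, False, {}


def skip_step(env, action, skip=4):
    """Repeat `action` for `skip` raw frames and return only the last observation."""
    total_reward = 0.0
    for _ in range(skip):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward               # rewards of skipped frames are summed
        if terminated or truncated:
            break                            # stop early if the episode ended
    return obs, total_reward, terminated, truncated, info


toy_env = CountingEnv()
obs, reward, terminated, truncated, info = skip_step(toy_env, action=1)
print(obs, reward, terminated)
```

The agent makes one decision but the toy environment advances four frames, which mirrors the jump of 4 in `frame_number` per step when `frame_skip` is 4.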

In [93]:
env = gymnasium.make("ALE/SpaceInvaders-v5", render_mode='rgb_array')
print(env.metadata)
#env.metadata['frame_skip'] = 4

{'render_modes': ['human', 'rgb_array'], 'obs_types': {'ram', 'rgb', 'grayscale'}, 'frame_skip': 4}


# Optimizing Data for Analysis

## FrameStack (a form of logging the images)

In [94]:
frameStack = FrameStackObservationV0(env, 4)

In [95]:
frameStack.observation_space.shape

(4, 210, 160, 3)

The observation space is a 4-dimensional array with shape (4, 210, 160, 3).

Each dimension means the following:

1. The first dimension, $4$, represents the number of frames stacked together. It indicates that the observation consists of a stack of 4 consecutive frames.
2. The second dimension, $210$, represents the height of each frame in the stack. It indicates that each frame has a height of 210 pixels.
3. The third dimension, $160$, represents the width of each frame in the stack. It indicates that each frame has a width of 160 pixels.
4. The fourth dimension, $3$, represents the number of color channels in each frame. It indicates that each frame has three color channels (red, green, and blue).
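
The stacking behavior described above can be approximated with a fixed-length buffer. This is a minimal sketch, not the wrapper's implementation: `FrameBuffer` is a hypothetical class, and filling the buffer by repeating the first frame on reset is an assumption, since the notebook does not show how the wrapper pads the first observations.

```python
from collections import deque

import numpy


class FrameBuffer:
    """Keep the most recent `stack_size` frames and expose them as one array."""

    def __init__(self, stack_size=4):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=stack_size)

    def reset(self, first_frame):
        # Assumption: fill the whole buffer with copies of the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return numpy.stack(list(self.frames))

    def push(self, frame):
        # Add the newest frame; the oldest one falls out of the buffer.
        self.frames.append(frame)
        return numpy.stack(list(self.frames))


buffer = FrameBuffer(stack_size=4)
black = numpy.zeros((210, 160, 3), dtype=numpy.uint8)
stacked = buffer.reset(black)
print(stacked.shape)

# Push a frame filled with the value 7; it becomes the last entry in the stack.
stacked = buffer.push(numpy.full((210, 160, 3), 7, dtype=numpy.uint8))
```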

# Frame Stacking vs Frame Skipping

Frame stacking and frame skipping are two different techniques used in reinforcement learning to process sequential observations from an environment.

1. Frame Stacking:
   - Frame stacking builds the agent's input observation from a stack of consecutive frames. Instead of a single frame at each time step, the agent receives several frames stacked together.
   - Stacking frames captures temporal information and gives the agent a sense of motion. By seeing several frames in sequence, the agent can observe changes over time and make more informed decisions.
   - The stacked frames are typically used as the input to the agent's neural network. The depth of the stack determines how many previous frames the agent can use for decision making.
   - Frame stacking is useful where dynamics and motion matter, such as in video games or tasks with fast-paced movement.
2. Frame Skipping:
   - Frame skipping omits a fixed number of frames between consecutive observations given to the agent. Instead of every frame, the agent sees only a subset of frames.
   - Skipping frames reduces the computational load and accelerates training, because the agent processes fewer observations and makes fewer decisions per second of game time.
   - The agent neither observes the skipped frames nor selects new actions during them. It acts again only on the next frame it receives.
   - Frame skipping is useful where the environment does not change enough between frames to justify a decision at every frame, and the relevant information is spread over longer time intervals.

Frame stacking and frame skipping can be used together. Frame stacking captures temporal information within each observation, while frame skipping reduces computational overhead by processing fewer frames. Combined, they give the agent a more efficient and informative representation of the environment.
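
A small piece of arithmetic shows how the two techniques interact. The sketch below assumes a frame skip of 4, a stack of 4, the 60 frames per second mentioned earlier, and that each observation is the last frame of its skip window; `emulator_frames_seen` is a hypothetical helper, and it ignores the first few steps of an episode, where the stack is not yet full.

```python
def emulator_frames_seen(step_index, frame_skip=4, stack_size=4):
    """Emulator frame numbers held in the stack after agent step `step_index`."""
    newest = step_index * frame_skip
    # Walk back through the stack, oldest frame first.
    return [newest - k * frame_skip for k in reversed(range(stack_size))]


frames = emulator_frames_seen(step_index=4)
print(frames)

# Time covered by one stacked observation, assuming 60 frames per second.
span_seconds = (frames[-1] - frames[0]) / 60
print(span_seconds)
```

Under these assumptions one stacked observation spans 12 emulator frames, so the agent sees a longer stretch of game time than it would with stacking alone.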

# SpaceInvaders has the action space Discrete(6)

| Value | Meaning    |
|-------|------------|
| 0     | NOOP       |
| 1     | FIRE       |
| 2     | RIGHT      |
| 3     | LEFT       |
| 4     | RIGHTFIRE  |
| 5     | LEFTFIRE   |
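
For readability, the integer codes in the table can be given names. The enum below is a convenience sketch: `SpaceInvadersAction` is a hypothetical name that is not provided by Gymnasium, and its values are copied from the table above.

```python
from enum import IntEnum


class SpaceInvadersAction(IntEnum):
    """Named aliases for the six action codes listed in the table."""
    NOOP = 0
    FIRE = 1
    RIGHT = 2
    LEFT = 3
    RIGHTFIRE = 4
    LEFTFIRE = 5


action = SpaceInvadersAction.LEFTFIRE
print(int(action), action.name)

# All actions that include firing.
firing_actions = [a for a in SpaceInvadersAction if "FIRE" in a.name]
```

Calling `int(action)` gives the plain integer from the table, which is the form the action space uses.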


# ENV reset variables

In [96]:
obs, info = env.reset()

In [97]:
print(obs)

[[[ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  ...
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 [[ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  ...
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 [[ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  ...
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 ...

 [[80 89 22]
  [80 89 22]
  [80 89 22]
  ...
  [80 89 22]
  [80 89 22]
  [80 89 22]]

 [[80 89 22]
  [80 89 22]
  [80 89 22]
  ...
  [80 89 22]
  [80 89 22]
  [80 89 22]]

 [[80 89 22]
  [80 89 22]
  [80 89 22]
  ...
  [80 89 22]
  [80 89 22]
  [80 89 22]]]


The printed `obs` is the initial observation returned by `env.reset()`, before any action has been taken. Here is how to interpret its shape and values:
- The shape of `obs` is `(height, width, channels)`, where:
  - `height` is the number of rows in the observation image.
  - `width` is the number of columns in the observation image.
  - `channels` is the number of color channels in the observation image (RGB images have 3 channels).
- Each innermost triple, such as `[80 89 22]`, holds the red, green, and blue values of one pixel. The observation is therefore an RGB image, not a grayscale one.
- Each channel value ranges from 0 to 255 and gives the intensity of that color. Most of the printed pixels are `[0 0 0]`, which is black.
- The array is a 3-dimensional NumPy array whose dimensions are rows, columns, and channels.
- The printed representation is abbreviated, with `...` marking omitted rows and pixels. Each bracketed block is one horizontal row of pixels, and each line within a block is one pixel.
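
Because the printout is abbreviated, summary statistics are often more informative than the raw array. The following sketch runs on a stand-in array (`obs_like` is made up, and the position of the colored band is arbitrary), but the same calls should work on an observation with this shape and dtype.

```python
import numpy

# Stand-in observation: black, with a band of the color seen in the printout.
obs_like = numpy.zeros((210, 160, 3), dtype=numpy.uint8)
obs_like[195:, :] = (80, 89, 22)

# Shape, dtype, and the range of channel values.
print(obs_like.shape, obs_like.dtype)
print(obs_like.min(), obs_like.max())

# Distinct RGB colors that appear in the image, one row per color.
colors = numpy.unique(obs_like.reshape(-1, 3), axis=0)
print(colors)

# Fraction of pixels that are pure black.
black_fraction = numpy.mean(numpy.all(obs_like == 0, axis=-1))
```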

In [98]:
print(info)

{'lives': 3, 'episode_frame_number': 0, 'frame_number': 0}


# ENV step variables

In [99]:
action = env.action_space.sample() # Just a random agent
obs, reward, terminated, truncated, info = env.step(action)

In [100]:
print(obs)

[[[ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  ...
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 [[ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  ...
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 [[ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  ...
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 ...

 [[80 89 22]
  [80 89 22]
  [80 89 22]
  ...
  [80 89 22]
  [80 89 22]
  [80 89 22]]

 [[80 89 22]
  [80 89 22]
  [80 89 22]
  ...
  [80 89 22]
  [80 89 22]
  [80 89 22]]

 [[80 89 22]
  [80 89 22]
  [80 89 22]
  ...
  [80 89 22]
  [80 89 22]
  [80 89 22]]]


The agent receives a reward every time it shoots an alien, and the rewards accumulate into the total score.

In [101]:
print(reward)

0.0


In [102]:
print(terminated)

False


In [103]:
print(truncated)

False


In [104]:
print(info)

{'lives': 3, 'episode_frame_number': 4, 'frame_number': 4}
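
The five values returned by `step` are usually consumed in a loop that runs until the episode ends. Here is a minimal sketch of such a loop. To keep it runnable without the emulator it uses `StubEnv`, a hypothetical stand-in with a made-up reward rule; `run_episode` is also a hypothetical helper, written against the same `reset()` and `step()` return values shown above.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the reset()/step() return values."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {"frame_number": 0}

    def step(self, action):
        self.t += 1
        reward = 5.0 if action == 1 else 0.0   # made-up rule: only action 1 scores
        terminated = self.t >= self.episode_length
        return self.t, reward, terminated, False, {"frame_number": 4 * self.t}


def run_episode(env, policy):
    """Run one episode and return the accumulated reward and the last info dict."""
    obs, info = env.reset()
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward                 # the score is the sum of rewards
    return total_reward, info


score, last_info = run_episode(StubEnv(), policy=lambda obs: 1)
print(score, last_info)
```

The loop stops on either `terminated` or `truncated`, since both signal that the episode is over.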


# Difficulty

In [105]:

# To change the game's difficulty we must declare it explicitly.
# 0 is the default difficulty; 1 is the other setting.
difficulty = 1

env = gymnasium.make("ALE/SpaceInvaders-v5", render_mode='rgb_array', difficulty = difficulty)