### 1. Installing and importing all necessary dependencies ###

In [1]:
%pip install gym[atari]
%pip install ale_py
%pip install autorom[accept-rom-license]

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.



In [3]:
import gym

### 2. Creating our environment ###
We create our environment with the function gym.make(), passing the following parameters:
| Parameter | Value | Explanation |
| :--- | :----: | :--- |
| id | "ALE/Freeway-v5" | The ID of our game. We use the newest version, v5. ALE (Arcade Learning Environment) is a framework for developing AI agents for Atari games. |
| difficulty | 1 | The game provides two difficulties. With difficulty = 1, the chicken is moved back to the start after each collision. With difficulty = 0, it is only pushed back a short distance. |
| mode | 3 | The game provides eight modes. As the mode value increases, so do the number and the speed of the cars. |
| obs_type | "rgb" | Determines what observations the environment returns. With "rgb", each observation is an RGB image. |
| frameskip | 1 | Controls frame skipping; a value of 1 disables it (for more information, read the paragraph Stochasticity). |
| repeat_action_probability | not set (v5 default: 0.25) | Sets the probability of sticky actions (for more information, read the paragraph Stochasticity). |
| render_mode | "human" | Displays the game while it is running. This parameter is deactivated while the agent is being trained. |
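To make the table concrete, the following sketch collects these arguments in a dictionary and checks them against the ranges the table describes before they would be passed to gym.make(). The helper make_freeway_kwargs is hypothetical (not part of Gym or ALE), and numbering the eight modes 0 to 7 is an assumption.

```python
# Hypothetical helper (not part of Gym or ALE): builds the keyword arguments
# from the parameter table and checks them against the described ranges.
VALID_DIFFICULTIES = (0, 1)   # the two difficulties described in the table
VALID_MODES = range(8)        # assumption: the eight modes are numbered 0 to 7


def make_freeway_kwargs(difficulty=1, mode=3, frameskip=1,
                        repeat_action_probability=None, render_mode=None):
    """Return a dict of keyword arguments for gym.make("ALE/Freeway-v5", ...)."""
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(f"difficulty must be 0 or 1, got {difficulty}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be between 0 and 7, got {mode}")
    if frameskip < 1:
        raise ValueError(f"frameskip must be at least 1, got {frameskip}")

    kwargs = {"difficulty": difficulty, "mode": mode,
              "obs_type": "rgb", "frameskip": frameskip}
    # Optional settings are only added when given, so the environment's
    # own defaults apply otherwise.
    if repeat_action_probability is not None:
        if not 0.0 <= repeat_action_probability <= 1.0:
            raise ValueError("repeat_action_probability must lie in [0, 1]")
        kwargs["repeat_action_probability"] = repeat_action_probability
    if render_mode is not None:
        kwargs["render_mode"] = render_mode
    return kwargs


kwargs = make_freeway_kwargs(render_mode="human")
print(kwargs)
# With gym[atari] and the ROMs installed, this would create the same
# environment as the gym.make() call in the next cell:
# env = gym.make("ALE/Freeway-v5", **kwargs)
```

Leaving repeat_action_probability out of the dictionary unless it is given keeps the behaviour identical to not passing the argument at all.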

In [4]:
env = gym.make("ALE/Freeway-v5", difficulty=1, mode=3, obs_type="rgb", frameskip=1, render_mode="human")

#### Observation space ####

In [3]:
observation_space = env.observation_space
print(f"Our environment returns the following observation space: {observation_space}")
print(f"Since {observation_space.shape} is the shape of our observation space, we receive a 210 x 160 px RGB image.")
print("The image is stored as a three-dimensional array with the dimensions 210, 160 and 3.")
print("If you want to see an example image, try printing observation_space.sample().")
#print(observation_space.sample())
print("Since the data type is uint8 (unsigned 8-bit integer), the lowest possible entry is 0 and the highest is 255.")

Our environment returns the following observation space: Box(0, 255, (210, 160, 3), uint8)
Since (210, 160, 3) is the shape of our observation space, we receive a 210 x 160 px RGB image.
The image is stored as a three-dimensional array with the dimensions 210, 160 and 3.
If you want to see an example image, try printing observation_space.sample().
Since the data type is uint8 (unsigned 8-bit integer), the lowest possible entry is 0 and the highest is 255.
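The environment hands each frame back as a NumPy array. To illustrate the uint8 range and the memory layout of one 210 x 160 x 3 frame without NumPy, here is a small sketch using the standard-library array module; the helper pixel_index is hypothetical and assumes row-major (height, width, channel) order.

```python
from array import array

# Shape of one Freeway observation as reported by env.observation_space
HEIGHT, WIDTH, CHANNELS = 210, 160, 3

# An all-black frame stored as unsigned 8-bit integers (typecode "B")
frame = array("B", bytes(HEIGHT * WIDTH * CHANNELS))


def pixel_index(row, col, channel):
    """Flat index of (row, col, channel) in a row-major H x W x C buffer."""
    return (row * WIDTH + col) * CHANNELS + channel


frame[pixel_index(0, 0, 0)] = 255  # the highest value a uint8 can hold
try:
    frame[pixel_index(0, 0, 1)] = 256  # one past the uint8 range
except OverflowError:
    print("256 does not fit into an unsigned 8-bit integer")

print(len(frame), "values,", frame.itemsize * len(frame), "bytes per frame")
```

Because every entry occupies exactly one byte, a single frame takes 210 * 160 * 3 = 100,800 bytes.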


#### Action space ####

In [4]:
action_space = env.action_space
print(f"Action space: {action_space}, i.e. {action_space.n} possible actions")
print("Possible actions with corresponding values:")
for value, meaning in enumerate(env.unwrapped.get_action_meanings()):
    print(f"{value}: {meaning}")

Action space: Discrete(3), i.e. 3 possible actions
Possible actions with corresponding values:
0: NOOP
1: UP
2: DOWN


#### Rewards ####
Without any modifications to the environment, there is only one reward: whenever the chicken crosses the road, it receives a reward of 1. There are no other rewards; in particular, neither colliding with a car nor moving backwards is penalized.

#### Stochasticity ####
Since Atari games are deterministic, an agent could simply memorize an optimal sequence of actions instead of making use of the observations and rewards from the environment. To prevent this, ALE uses so-called sticky actions: with a small probability, the previous action is repeated, in which case the action chosen by the agent is not executed. By specifying the repeat_action_probability parameter when creating the environment, we can set the probability of these sticky actions.
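The sticky-action mechanism can be sketched as follows. This is a simplified model with a hypothetical helper sticky_action, not ALE's actual implementation; 0.25 is used as the probability because it is the v5 default.

```python
import random


def sticky_action(chosen_action, previous_action, repeat_action_probability, rng):
    """Simplified model of sticky actions: with probability
    repeat_action_probability the previous action is executed again and the
    action chosen by the agent is ignored."""
    if rng.random() < repeat_action_probability:
        return previous_action
    return chosen_action


rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 10_000
# The agent chooses UP (1) while the previous action was NOOP (0);
# count how often NOOP is executed instead.
ignored = sum(sticky_action(1, 0, 0.25, rng) == 0 for _ in range(trials))
print(f"Agent's choice ignored in {ignored / trials:.1%} of the steps")
```

With a probability of 0 the agent's choice is always executed, and with a probability of 1 it is always replaced by the previous action.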
Additionally, the environment implements frame skipping, which means that in each step the chosen action is repeated for several consecutive frames. By specifying the frameskip parameter when creating the environment, we can set this number of frames: an integer gives a fixed number of repetitions per step (v5 uses 4 by default), while a tuple (min, max) makes the number random. By setting this value to 1, we prevent frame skipping.
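A simplified sketch of frame skipping follows. ToyEmulator is a hypothetical stand-in for the Atari emulator, and treating the upper bound of a (min, max) tuple as exclusive is a choice made for this sketch, not a statement about ALE.

```python
import random


class ToyEmulator:
    """Hypothetical stand-in for the Atari emulator: it only counts frames
    and pays a reward of 1 on every 50th frame."""

    def __init__(self):
        self.frames = 0

    def step_frame(self, action):
        self.frames += 1
        return 1 if self.frames % 50 == 0 else 0


def step_with_frameskip(step_frame, action, frameskip, rng):
    """Repeat `action` for several frames and sum the rewards.

    frameskip is either an int (fixed number of frames) or a (min, max)
    tuple, in which case the number is drawn at random; in this sketch
    max is exclusive."""
    if isinstance(frameskip, tuple):
        n_frames = rng.randrange(*frameskip)
    else:
        n_frames = frameskip
    total_reward = sum(step_frame(action) for _ in range(n_frames))
    return total_reward, n_frames


rng = random.Random(0)
emulator = ToyEmulator()
for frameskip in (1, 4, (2, 5)):
    _, n_frames = step_with_frameskip(emulator.step_frame, 1, frameskip, rng)
    print(f"frameskip={frameskip}: action repeated for {n_frames} frame(s)")
```

Summing the rewards over the skipped frames means no reward is lost, even though the agent only observes every n-th frame.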

### 3. First Version: Random choices ###
In this first version, every action is chosen at random.

In [6]:
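import random

observation, info = env.reset()
terminated = False
truncated = False

# Play one episode until the game ends (terminated) or is cut off (truncated)
while not (terminated or truncated):
    # Choose a random action: 0 = NOOP, 1 = UP, 2 = DOWN
    random_action = random.choice([0, 1, 2])

    # Apply the action and receive the new observation, reward and status flags
    observation, reward, terminated, truncated, info = env.step(random_action)

    # Render the environment
    env.render()
env.close()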

TypeError: close() got an unexpected keyword argument 'close'
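Because the loop above depends on the full Atari setup, the following sketch replays the same control flow against StubFreeway, a hypothetical and heavily simplified stand-in (no cars, no collisions, one lane per step) that follows the same reset()/step() API. It also sums the rewards; since Freeway only rewards crossings, this episode return equals the number of crossings.

```python
import random


class StubFreeway:
    """Hypothetical, heavily simplified stand-in for ALE/Freeway-v5 with the
    same reset()/step() API: no cars, the chicken moves one lane per step and
    the episode ends after a fixed number of steps."""

    LANES = 10  # assumption for this toy model

    def __init__(self, max_steps=200):
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        # Same action values as the real environment: 0 = NOOP, 1 = UP, 2 = DOWN
        self.position = max(0, self.position + {0: 0, 1: 1, 2: -1}[action])
        self.steps += 1
        reward = 0
        if self.position == self.LANES:  # reached the other side
            reward = 1
            self.position = 0  # start the next crossing from the bottom
        terminated = self.steps >= self.max_steps
        truncated = False
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Play one episode and return the sum of all rewards."""
    observation, info = env.reset()
    terminated = truncated = False
    total_reward = 0
    while not (terminated or truncated):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
    return total_reward


rng = random.Random(0)
random_return = run_episode(StubFreeway(), lambda obs: rng.choice([0, 1, 2]))
always_up_return = run_episode(StubFreeway(), lambda obs: 1)
print("random policy:", random_return, "crossings")
print("always UP:", always_up_return, "crossings")
```

Passing the policy in as a function keeps the episode loop unchanged when the random choice is later replaced by a learned agent.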


# Open issue: the kernel dies after every execution #