# Atari-based Gym Environment

In this notebook we will try to build an RL agent that learns to play an Atari game, specifically Space Invaders.

In [1]:
import gym
import random

## Creating a random (non-RL) agent that just tests the environment

In [2]:
env = gym.make("SpaceInvaders-v0")
height, width, channels =  env.observation_space.shape
actions = env.action_space.n


A.L.E: Arcade Learning Environment (version +978d2ce)
[Powered by Stella]


In [3]:
## Available actions and their meanings
env.unwrapped.get_action_meanings()

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

In [4]:
episodes = 5
for episode in range(episodes):
    state = env.reset()
    done = False
    score = 0
    infoArr = []
    
    while not done:
        env.render()
        action = random.choice(range(actions))  # sample from all available actions, including LEFTFIRE
        n_state,reward,done,info = env.step(action)
        score += reward
        infoArr.append(info)
    print("Episode:{} Score:{}".format(episode,score))
env.close()



Episode:0 Score:60.0
Episode:1 Score:15.0
Episode:2 Score:175.0
Episode:3 Score:135.0
Episode:4 Score:120.0


We can observe that the agent can interact with the environment, but its random actions reflect no learned decision-making.

## Creating a deep learning model for the agent

In [5]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Flatten,Convolution2D,MaxPool2D,Dropout
from tensorflow.keras.optimizers import Adam

In [6]:
def build_model(height, width, channels, actions):
    """
    Build a CNN with three Conv2D layers, each followed by dropout
    (the MaxPool2D layers are currently commented out).
    The output is then flattened and passed through three Dense layers.

    The output layer has one unit per action.

    Note: `actions` is the number of possible actions the agent can take.
    """
    model = Sequential()
    # The leading 3 is the number of stacked frames; it must match window_length in SequentialMemory.
    model.add(Convolution2D(32,(8,8),strides=(4,4),activation="relu",input_shape = (3, height, width, channels)))
#     model.add(MaxPool2D())
    model.add(Dropout(0.25))
    model.add(Convolution2D(64,(4,4),strides=(2,2),activation="relu"))
#     model.add(MaxPool2D())
    model.add(Dropout(0.25))
    model.add(Convolution2D(64,(2,2),activation="relu"))
#     model.add(MaxPool2D())
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(256,activation="relu"))
    model.add(Dense(64,activation="relu"))
    model.add(Dense(actions,activation="linear"))
    return model




        
    

In [7]:
model = build_model(height,width,channels,actions)


In [8]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 3, 51, 39, 32)     6176      
_________________________________________________________________
dropout (Dropout)            (None, 3, 51, 39, 32)     0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 24, 18, 64)     32832     
_________________________________________________________________
dropout_1 (Dropout)          (None, 3, 24, 18, 64)     0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 23, 17, 64)     16448     
_________________________________________________________________
dropout_2 (Dropout)          (None, 3, 23, 17, 64)     0         
_________________________________________________________________
flatten (Flatten)            (None, 75072)             0
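
The output shapes in the summary above can be checked by hand. The following is a minimal sketch, assuming the standard 210 x 160 Space Invaders frame and unpadded ("valid") convolutions, which is the Keras default; the helper name `conv_output_size` is ours and not part of any library.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

# Assumed frame size (height x width); consistent with the 51 x 39 first-layer output.
height, width = 210, 160
layers = [(8, 4), (4, 2), (2, 1)]  # (kernel, stride) of the three Conv2D layers

shapes = []
for kernel, stride in layers:
    height = conv_output_size(height, kernel, stride)
    width = conv_output_size(width, kernel, stride)
    shapes.append((height, width))

window, filters = 3, 64  # stacked frames and filters in the last conv layer
flattened = window * height * width * filters
print(shapes, flattened)
```

The flattened size feeds straight into the first Dense layer, so most of the model's parameters sit in that one layer.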

## Building an RL agent using Keras-RL2

In [9]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy , EpsGreedyQPolicy

In [10]:
def build_agent(model,actions):
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(),attr="eps",value_max=1,value_min=0.1,value_test=0.2,nb_steps=10000)
    memory = SequentialMemory(limit=1000, window_length = 3)
    dqn = DQNAgent(model = model,memory = memory,policy=policy,enable_dueling_network=True,dueling_type='avg',nb_actions=actions,nb_steps_warmup=10000)
    return dqn
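
To make the exploration schedule concrete, here is a rough sketch of how epsilon decays under linear annealing with the values passed to `LinearAnnealedPolicy` above. The function `linear_annealed_eps` is a hypothetical stand-in written for illustration, not the library's own code; it assumes epsilon falls in a straight line from `value_max` to `value_min` over `nb_steps` and then stays at `value_min`.

```python
def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Epsilon after `step` training steps, assuming a straight-line decay."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)

# Epsilon at a few points in training
for step in (0, 2500, 5000, 10000, 20000):
    print(step, round(linear_annealed_eps(step), 3))
```

During testing the policy should use the fixed `value_test=0.2` instead of the annealed value, so the agent still takes a random action on roughly one step in five.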

In [19]:


model = build_model(height,width,channels,actions)
dqn = build_agent(model,actions)



In [None]:
notCompileState = True
while notCompileState:
    try:
        dqn = build_agent(model,actions)
        notCompileState = False
    except Exception as e:
        print("Encountered Exception {}".format(e))
        del model
        model = build_model(height,width,channels,actions)
        notCompileState = True
        

In [20]:
dqn = build_agent(model,actions)
dqn.compile(Adam(learning_rate=1e-4))


2022-01-11 23:50:48.971325: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


In [23]:
dqn.fit(env,nb_steps=10000,visualize=False,verbose=1)

Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 208.547 seconds


<keras.callbacks.History at 0x2c58dc5b0>
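
One thing worth checking is how `nb_steps_warmup=10000` interacts with `nb_steps=10000`. The sketch below assumes the agent only runs a gradient update once its step counter exceeds `nb_steps_warmup`; that is our reading of Keras-RL2 and is not confirmed anywhere in this notebook. The helper `training_update_steps` is hypothetical.

```python
def training_update_steps(nb_steps, nb_steps_warmup, train_interval=1):
    """Count steps with a gradient update, ASSUMING updates run only when
    step > nb_steps_warmup and step % train_interval == 0."""
    return sum(
        1
        for step in range(nb_steps)
        if step > nb_steps_warmup and step % train_interval == 0
    )

# Settings used in this notebook
print(training_update_steps(nb_steps=10000, nb_steps_warmup=10000))
# A longer run with the same warmup, for comparison
print(training_update_steps(nb_steps=50000, nb_steps_warmup=10000))
```

If that assumption holds, the run above spends all of its steps in warmup, so a shorter warmup or a longer `nb_steps` may be needed before the network actually learns.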

In [24]:
dqn.test(env,nb_episodes=10,visualize=False)

Testing for 10 episodes ...
Episode 1: reward: 15.000, steps: 687
Episode 2: reward: 105.000, steps: 930
Episode 3: reward: 105.000, steps: 967
Episode 4: reward: 15.000, steps: 678
Episode 5: reward: 125.000, steps: 906
Episode 6: reward: 135.000, steps: 934


KeyboardInterrupt: 
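
As a quick sanity check, we can compare the scores printed earlier for the random agent with the test rewards above. This is only a sketch over the numbers reported in this notebook: five random episodes and the six test episodes that finished before the interrupt. That is far too few to draw firm conclusions.

```python
from statistics import mean

random_scores = [60.0, 15.0, 175.0, 135.0, 120.0]      # random agent, 5 episodes
dqn_scores = [15.0, 105.0, 105.0, 15.0, 125.0, 135.0]  # DQN agent, 6 completed test episodes

random_mean = mean(random_scores)
dqn_mean = mean(dqn_scores)
print(f"random: {random_mean:.1f}  dqn: {dqn_mean:.1f}")
```

On these few episodes the trained agent does not clearly beat the random baseline, which suggests it needs considerably more training.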

## We will now save the agent's weights to disk

In [25]:
dqn.save_weights("SpaceInvaders.hf5")

In [None]:
del env
