# Atari-Based Gym Environment

In this notebook we build a reinforcement learning (RL) agent that learns to play an Atari game, specifically Space Invaders.

In [1]:
import gym
import random

## Creating a Random (Non-RL) Agent to Test the Environment

In [2]:
env = gym.make("SpaceInvaders-v0")
height, width, channels = env.observation_space.shape
actions = env.action_space.n


A.L.E: Arcade Learning Environment (version +978d2ce)
[Powered by Stella]


In [3]:
# Available actions and their meanings
env.unwrapped.get_action_meanings()

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

In [4]:
# # Random-agent sanity check; uncomment to run.
# episodes = 5
# for episode in range(episodes):
#     state = env.reset()
#     done = False
#     score = 0
#     infoArr = []
#
#     while not done:
#         env.render()
#         # randrange(actions) samples uniformly from all 6 actions (0..5)
#         action = random.randrange(actions)
#         n_state, reward, done, info = env.step(action)
#         score += reward
#         infoArr.append(info)
#     print("Episode:{} Score:{}".format(episode, score))
# env.close()

The agent can interact with the environment, but its actions are chosen at random, so it makes no informed decisions.
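
To see the interaction loop on its own, here is a minimal sketch that plays random episodes against a hypothetical `StubEnv` (not part of gym) that mimics the pre-0.26 gym API used above: `reset()` returns an observation and `step(action)` returns `(obs, reward, done, info)`. The rewards are made up; only the loop structure mirrors the commented-out cell.

```python
import random


class StubEnv:
    """Hypothetical stand-in for a gym env with the pre-0.26 API:
    reset() -> obs and step(action) -> (obs, reward, done, info)."""

    def __init__(self, n_actions=6, episode_len=20):
        self.n_actions = n_actions
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # dummy observation

    def step(self, action):
        if not 0 <= action < self.n_actions:
            raise ValueError("invalid action {}".format(action))
        self.t += 1
        reward = 5.0 if action == 1 else 0.0  # made-up reward: only action 1 ("FIRE") scores
        done = self.t >= self.episode_len
        return self.t, reward, done, {}


def run_random_episodes(env, n_actions, episodes, seed=None):
    """Play `episodes` episodes with uniformly random actions; return each episode's score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(episodes):
        env.reset()
        done, score = False, 0.0
        while not done:
            action = rng.randrange(n_actions)  # uniform over 0..n_actions-1
            _, reward, done, _ = env.step(action)
            score += reward
        scores.append(score)
    return scores


if __name__ == "__main__":
    for episode, score in enumerate(run_random_episodes(StubEnv(), n_actions=6, episodes=5, seed=0)):
        print("Episode:{} Score:{}".format(episode, score))
```

Because the policy ignores the observation entirely, scores vary only with luck, which is exactly the baseline a learning agent should beat.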

## Creating a Deep Learning Model for the Agent

In [5]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Flatten,Convolution2D,MaxPool2D,Dropout
from tensorflow.keras.optimizers import Adam

In [6]:
print(f"eagerly? {tf.executing_eagerly()}")
print(tf.config.list_logical_devices())

eagerly? True
[LogicalDevice(name='/device:CPU:0', device_type='CPU'), LogicalDevice(name='/device:GPU:0', device_type='GPU')]
Metal device set to: Apple M1



2022-01-12 00:42:43.880411: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-01-12 00:42:43.880504: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


In [7]:
def build_model(height, width, channels, actions):
    """
    Create a CNN that maps a stack of frames to one Q-value per action.

    It has 3 Conv2D layers, each followed by Dropout (the MaxPool2D layers
    are commented out), then a Flatten layer and 3 Dense layers.

    The output layer has one unit per action, where `actions` is the number
    of possible actions the agent can take.
    """
    model = Sequential()
    # The leading 3 in input_shape is the number of stacked frames per state;
    # it must match SequentialMemory(window_length=3) in build_agent.
    model.add(Convolution2D(32, (8, 8), strides=(4, 4), activation="relu",
                            input_shape=(3, height, width, channels)))
#     model.add(MaxPool2D())
    model.add(Dropout(0.25))
    model.add(Convolution2D(64, (4, 4), strides=(2, 2), activation="relu"))
#     model.add(MaxPool2D())
    model.add(Dropout(0.25))
    model.add(Convolution2D(64, (2, 2), activation="relu"))
#     model.add(MaxPool2D())
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(256, activation="relu"))
    model.add(Dense(64, activation="relu"))
    model.add(Dense(actions, activation="linear"))  # linear output: raw Q-values
    return model




        
    

In [8]:
model = build_model(height,width,channels,actions)
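
The leading `3` in `input_shape` is the number of stacked frames per state, which has to agree with `window_length=3` in `SequentialMemory` further down. The sketch below uses plain NumPy and a hypothetical `stack_frames` helper (zero-padding at the start of an episode is one possible choice, assumed here) to show how three 210 x 160 x 3 frames become one `(3, 210, 160, 3)` state, and how adding a batch axis gives the `(None, 3, 210, 160, 3)` shape the model expects.

```python
from collections import deque

import numpy as np

HEIGHT, WIDTH, CHANNELS = 210, 160, 3  # Atari RGB frame size
WINDOW_LENGTH = 3  # must equal SequentialMemory(window_length=3)


def stack_frames(history, window_length=WINDOW_LENGTH):
    """Return the last `window_length` frames as one (window, H, W, C) array.

    Hypothetical helper: if fewer frames exist (start of an episode),
    the front is padded with all-zero frames.
    """
    frames = list(history)[-window_length:]
    padding = [np.zeros_like(frames[0])] * (window_length - len(frames))
    return np.stack(padding + frames, axis=0)


if __name__ == "__main__":
    history = deque(maxlen=WINDOW_LENGTH)
    for t in range(1, 6):
        # Fake frames whose pixels all equal the time step t
        history.append(np.full((HEIGHT, WIDTH, CHANNELS), t, dtype=np.uint8))
    state = stack_frames(history)
    batch = state[np.newaxis]  # add the batch axis Keras expects
    print(state.shape, batch.shape)  # (3, 210, 160, 3) (1, 3, 210, 160, 3)
    print(state[:, 0, 0, 0])  # [3 4 5]: the three most recent frames
```

Stacking frames lets the network see motion (for example, which way the invaders and bullets are moving), which a single still frame cannot convey.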


In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 3, 51, 39, 32)     6176      
                                                                 
 dropout (Dropout)           (None, 3, 51, 39, 32)     0         
                                                                 
 conv2d_1 (Conv2D)           (None, 3, 24, 18, 64)     32832     
                                                                 
 dropout_1 (Dropout)         (None, 3, 24, 18, 64)     0         
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 23, 17, 64)     16448     
                                                                 
 dropout_2 (Dropout)         (None, 3, 23, 17, 64)     0         
                                                                 
 flatten (Flatten)           (None, 75072)             0
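
The printed summary stops at the `Flatten` layer. As a cross-check, the sketch below recomputes each layer's output size and parameter count by hand, assuming "valid" padding (the Keras default), `window_length=3`, and 6 actions (from `get_action_meanings()` above). The convolution rows reproduce the printed summary; the Dense rows are computed from the layer definitions, not copied from Keras output. The summary shows that the leading window axis of 3 passes through each convolution unchanged, with the kernel shared across the stacked frames.

```python
def conv2d_valid(in_hw, in_ch, filters, kernel, stride):
    """Output (height, width) and parameter count of a Conv2D with 'valid' padding and bias."""
    (h, w), (kh, kw), (sh, sw) = in_hw, kernel, stride
    out_hw = ((h - kh) // sh + 1, (w - kw) // sw + 1)
    params = kh * kw * in_ch * filters + filters  # kernel weights + one bias per filter
    return out_hw, params


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


WINDOW_LENGTH, N_ACTIONS = 3, 6
hw, ch = (210, 160), 3  # input frame height/width and RGB channels
conv_rows = []
for filters, kernel, stride in [(32, (8, 8), (4, 4)), (64, (4, 4), (2, 2)), (64, (2, 2), (1, 1))]:
    hw, params = conv2d_valid(hw, ch, filters, kernel, stride)
    ch = filters
    conv_rows.append((WINDOW_LENGTH, *hw, ch, params))

flat = WINDOW_LENGTH * hw[0] * hw[1] * ch  # size after Flatten
dense_rows = [dense_params(flat, 256), dense_params(256, 64), dense_params(64, N_ACTIONS)]
total = sum(row[-1] for row in conv_rows) + sum(dense_rows)

if __name__ == "__main__":
    # prints (3, 51, 39, 32, 6176), (3, 24, 18, 64, 32832), (3, 23, 17, 64, 16448)
    for row in conv_rows:
        print(row)
    print(flat, dense_rows, total)  # 75072 [19218688, 16448, 390] 19290982
```

Almost all of the roughly 19.3 million parameters sit in the first Dense layer (75,072 x 256 weights).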

## Building an RL Agent using Keras RL2

In [10]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

In [11]:
def build_agent(model, actions):
    # Epsilon-greedy exploration, annealed linearly from 1.0 to 0.1 over 10,000 steps
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr="eps", value_max=1, value_min=0.1, value_test=0.2, nb_steps=10000)
    # Replay buffer of the last 1,000 transitions; window_length=3 stacks 3 frames per state
    memory = SequentialMemory(limit=1000, window_length=3)
    dqn = DQNAgent(model=model, memory=memory, policy=policy, enable_dueling_network=True,
                   dueling_type='avg', nb_actions=actions, nb_steps_warmup=1000)
    return dqn
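
`LinearAnnealedPolicy` lowers epsilon in a straight line from `value_max` to `value_min` over `nb_steps` training steps and then holds it at `value_min`. Below is a minimal re-implementation of that schedule, `eps = max(value_min, value_max - (value_max - value_min) * step / nb_steps)`, with the values from `build_agent`; it is our own sketch of the formula, not keras-rl2's code.

```python
def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Epsilon after `step` training steps: a line from value_max down to value_min, then flat."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, value_max + slope * step)


def mean_eps(first_step, last_step, **kwargs):
    """Average epsilon over the inclusive step range [first_step, last_step]."""
    steps = range(first_step, last_step + 1)
    return sum(linear_annealed_eps(s, **kwargs) for s in steps) / len(steps)


if __name__ == "__main__":
    for step in (0, 1000, 5000, 10000, 20000):
        print(step, round(linear_annealed_eps(step), 3))
    # Steps 1001..9999: the part of the first 10,000-step interval after the warmup
    print(round(mean_eps(1001, 9999), 3))
```

Averaging the schedule over steps 1001 to 9999 gives 0.505. That is consistent with the `mean_eps: 0.505` reported for the first training interval below, assuming keras-rl2 only records policy metrics once the 1,000-step warmup (`nb_steps_warmup=1000`) is over.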

In [12]:


model = build_model(height,width,channels,actions)
dqn = build_agent(model,actions)
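
`build_agent` enables `enable_dueling_network=True` with `dueling_type='avg'`. A dueling head estimates a state value V(s) and per-action advantages A(s, a), and the `'avg'` variant combines them as Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')). The NumPy sketch below applies that rule to made-up numbers for the 6 Space Invaders actions; it illustrates the formula only and is not keras-rl2's implementation.

```python
import numpy as np


def dueling_avg(value, advantages):
    """Combine V(s) of shape (batch, 1) and A(s, a) of shape (batch, n_actions) into Q(s, a)."""
    return value + (advantages - advantages.mean(axis=1, keepdims=True))


if __name__ == "__main__":
    V = np.array([[2.0]])                            # made-up state value
    A = np.array([[1.0, 3.0, -1.0, 0.0, 2.0, 1.0]])  # made-up advantages, one per action
    Q = dueling_avg(V, A)
    print(Q)                 # [[2. 4. 0. 1. 3. 2.]]
    print(Q.mean(axis=1))    # [2.]: the Q-values average back to V(s)
    print(Q.argmax(axis=1))  # [1]: the greedy action ("FIRE") has the largest advantage
```

Subtracting the mean advantage keeps V and A identifiable without changing which action is greedy.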



In [13]:
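# If build_agent raises, rebuild the model from scratch and try again
# (bounded, so a persistent error is reported instead of looping forever).
max_attempts = 5
for _ in range(max_attempts):
    try:
        dqn = build_agent(model, actions)
        break
    except Exception as e:
        print("Encountered Exception {}".format(e))
        del model
        model = build_model(height, width, channels, actions)
else:
    raise RuntimeError("Could not build the agent after {} attempts".format(max_attempts))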
        

In [14]:
dqn = build_agent(model, actions)
# `lr` is deprecated in tf.keras optimizers; `learning_rate` is the current name
dqn.compile(Adam(learning_rate=1e-4))


2022-01-12 00:42:44.280473: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-01-12 00:42:44.280813: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.

In [15]:
# Resume from weights saved in an earlier session (this file must already exist)
dqn.load_weights("SpaceInvaders.hf5")



In [16]:
dqn.fit(env,nb_steps=100000,visualize=False,verbose=1)



Training for 100000 steps ...
Interval 1 (0 steps performed)
  998/10000 [=>............................] - ETA: 1:50 - reward: 0.1303



 1002/10000 [==>...........................] - ETA: 2:18 - reward: 0.1297



12 episodes - episode_reward: 183.750 [50.000, 540.000] - loss: 1.136 - mean_q: 6.099 - mean_eps: 0.505 - lives: 2.157

Interval 2 (10000 steps performed)
   95/10000 [..............................] - ETA: 2:05:29 - reward: 0.1579
done, took 6884.179 seconds


<keras.callbacks.History at 0x296a6e7c0>
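
During `dqn.fit` above, Adam moves the network's Q(s, a) toward a bootstrapped target for each sampled transition. Below is a minimal NumPy sketch of the textbook one-step DQN target, y = r + gamma * max over a' of Q_target(s', a'), with the bootstrap term dropped when the transition ended the episode. The discount `gamma=0.99` is an assumption (`build_agent` does not set one), and keras-rl2's internal update may differ in details.

```python
import numpy as np


def td_targets(rewards, terminals, next_q_target, gamma=0.99):
    """One-step DQN targets for a batch of transitions.

    rewards, terminals: shape (batch,); terminals is 1.0 where the episode ended.
    next_q_target: shape (batch, n_actions), target-network Q-values for s'.
    """
    max_next_q = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - terminals) * max_next_q


if __name__ == "__main__":
    rewards = np.array([1.0, 0.0])
    terminals = np.array([0.0, 1.0])
    next_q = np.array([[0.5, 2.0], [3.0, 1.0]])
    print(td_targets(rewards, terminals, next_q))
```

In the example, the first transition gets 1 + 0.99 * 2 = 2.98, while the second is terminal, so its target is just its reward of 0.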

In [17]:
dqn.test(env,nb_episodes=5,visualize=False)

Testing for 5 episodes ...
Episode 1: reward: 75.000, steps: 512
Episode 2: reward: 210.000, steps: 779
Episode 3: reward: 475.000, steps: 936


KeyboardInterrupt: 

In [None]:
dqn.test(env,nb_episodes=5,visualize=True)

## Saving the Agent's Weights to Disk

In [18]:
dqn.save_weights("SpaceInvaders.hf5",overwrite=True)

[TIP] Next time specify overwrite=True!




In [21]:
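import os

# Train in 10 chunks of 10,000 steps, checkpointing after each chunk. Each fit()
# call restarts at step 0, so the epsilon schedule and warmup restart too.
os.makedirs("wholeNight", exist_ok=True)  # create the checkpoint folder if missing
for i in range(10):
    dqn.fit(env, nb_steps=10000, visualize=False, verbose=1)
    dqn.save_weights("wholeNight/SpaceInvaders{}.hf5".format(i), overwrite=True)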

Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6764.052 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6722.737 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6722.600 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6752.189 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6797.163 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6874.115 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6733.521 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6734.781 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6765.521 seconds
Training for 10000 steps ...
Interval 1 (0 steps performed)
done, took 6744.511 seconds
