<a href="https://colab.research.google.com/github/AI-Lai/ExperimentingWithAtari/blob/main/ATARI4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 0. Install Dependencies

In [1]:
%pip install tensorflow
!pip install pyglet
print("done")
!pip install gym keras-rl2 gym[atari]
print("done")
%pip install -U "gym>=0.21.0"
print("done")
%pip install -U gym[atari,accept-rom-license]
print("DONE")
!pip install pyvirtualdisplay
!pip install reverb

done
Collecting keras-rl2
  Downloading keras_rl2-1.0.5-py3-none-any.whl (52 kB)
Installing collected packages: keras-rl2
Successfully installed keras-rl2-1.0.5
done
done
Collecting autorom[accept-rom-license]~=0.4.2
  Downloading AutoROM-0.4.2-py3-none-any.whl (16 kB)
Collecting ale-py~=0.7.1
  Downloading ale_py-0.7.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Collecting AutoROM.accept-rom-license
  Downloading AutoROM.accept-rom-license-0.4.2.tar.gz (9.8 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: AutoROM.accept-rom-license
  Building wheel for AutoROM.accept-rom-license (PEP 517) ... done
  Created wheel for AutoROM.accept-rom-license: filename=AutoROM.accept_rom_license

# 1. Test Environment in OpenAI Gym

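Import gym and random: gym provides the game environment, and random lets us pick actions, so that we can see what happens inside the game.

Ideally, the lines

`!pip install gym keras-rl2 gym[atari]`

`%pip install -U "gym>=0.21.0"`

`%pip install -U gym[atari,accept-rom-license]`

allow us to run any Atari gym environment without preinstalling anything. The version specifier is quoted so that the shell does not interpret `>` as an output redirection.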


In [None]:
import gym 
import random
env = gym.make('Pong-v0')
height, width, channels = env.observation_space.shape
actions = env.action_space.n
env.unwrapped.get_action_meanings()



We can run a few episodes to see what happens in the game.
For the moment, we just take random actions.

To visualize the game, you can call **env.render()**, possibly in mode "human". At the end of each episode, print the score.

Note: render does not work on Colab; try it in a local notebook.

In [None]:
episodes = 3
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0

    while not done:
        #env.render()
        # Pick a random action among the available ones.
        action = random.randrange(actions)
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))
env.close()

Episode:1 Score:50.0
Episode:2 Score:105.0
Episode:3 Score:55.0


# 2. Create a Deep Learning Model with Keras

Import the required libraries: numpy, the Keras Sequential model, the layers we need, and the Adam optimizer.

In [None]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Convolution2D
from tensorflow.keras.optimizers import Adam

Build a CNN. The structure can be adapted, but a good starting point is the architecture used in https://www.nature.com/articles/nature14236/ and https://arxiv.org/abs/1511.06581

Since sporadic errors can occur, wrap the model construction in a function build_model so that the model can be rebuilt if necessary.

In [None]:
def build_model(height, width, channels, actions):
    """Build the convolutional Q-network: one linear output per action."""
    model = Sequential()
    # The leading 3 is the number of stacked frames; it must match
    # window_length in the agent's memory.
    model.add(Convolution2D(32, (8,8), strides=(4,4), activation='relu', input_shape=(3,height, width, channels)))
    model.add(Convolution2D(64, (4,4), strides=(2,2), activation='relu'))
    model.add(Convolution2D(64, (3,3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(256, activation='relu'))
    # Linear activation: the outputs are unbounded Q-values.
    model.add(Dense(actions, activation='linear'))
    return model
# del model
# in case of a sporadic error, delete the model and rebuild it
model = build_model(height, width, channels, actions)

As usual, before we start changing anything, we can take a look at our network.

If it seems too large for your machine, you can prune it, just for the sake of seeing it work.

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 3, 51, 39, 32)     6176      
                                                                 
 conv2d_1 (Conv2D)           (None, 3, 24, 18, 64)     32832     
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 22, 16, 64)     36928     
                                                                 
 flatten (Flatten)           (None, 67584)             0         
                                                                 
 dense (Dense)               (None, 512)               34603520  
                                                                 
 dense_1 (Dense)             (None, 256)               131328    
                                                                 
 dense_2 (Dense)             (None, 6)                 1
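
The shapes and parameter counts in the summary can be checked by hand. The following is a minimal sketch of that arithmetic, assuming a 210 x 160 x 3 frame (the size implied by the output shapes above) and unpadded convolutions; the helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

height, width, channels = 210, 160, 3  # assumed Atari frame shape
window = 3                             # stacked frames, as in input_shape

# (filters, kernel size, stride) for the three convolutional layers
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

h, w, in_ch = height, width, channels
for filters, kernel, stride in layers:
    # weights: kernel * kernel * input channels per filter, plus one bias each
    params = kernel * kernel * in_ch * filters + filters
    h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    print((window, h, w, filters), params)
    in_ch = filters

# Flatten merges the window, spatial and channel axes into one vector.
flat = window * h * w * in_ch
dense_params = flat * 512 + 512
print(flat, dense_params)
```

Almost all of the parameters sit in the first dense layer, so reducing its width (or the size of the flattened input) is the most effective way to shrink the network.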

# 3. Build Agent with Keras-RL

OK, now the interesting part.

Using classical tabular Q-Learning/SARSA is not feasible here. Can you tell why?

Instead, we will use the CNN built in the previous section to "suggest" the best action to the agent at each step.
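
To get a feel for the scale involved, here is a rough back-of-the-envelope sketch. It assumes a raw 210 x 160 RGB frame with 256 levels per channel (an illustrative assumption, consistent with the shapes in the model summary) and counts how many distinct frames a table would, in principle, have to index.

```python
import math

# Assumed raw Atari frame: 210 x 160 pixels, 3 colour channels, 256 levels each.
height, width, channels, levels = 210, 160, 3, 256

pixels = height * width * channels
# The number of distinct frames is levels ** pixels. We work with its base-10
# logarithm because the number itself is far too large to print usefully.
log10_states = pixels * math.log10(levels)

print(f"values per frame: {pixels}")
print(f"distinct frames: about 10^{int(log10_states)}")
```

Most of these frames never occur in play, but even the reachable fraction is far too large to store one table row per state, which is why a network is used to approximate the Q-values.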


Here we import from keras-rl the classes needed to build and train such an agent.

In [None]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy


We can now build our agent.
We specify a policy and a memory. We can also enable a dueling network, as explained in https://arxiv.org/abs/1511.06581
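
As an illustration of what the dueling architecture computes, here is a minimal sketch of the "average" aggregation described in the paper linked above: the network estimates a state value V(s) and one advantage A(s, a) per action, and combines them as Q(s, a) = V(s) + A(s, a) - mean(A). The function name dueling_avg and the numbers are hypothetical; this is not keras-rl's own code.

```python
def dueling_avg(value, advantages):
    """Combine V(s) and the advantages A(s, a) into Q-values ('avg' variant)."""
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean makes the split between V and A identifiable.
    return [value + adv - mean_adv for adv in advantages]

# Hypothetical outputs of the two streams for a state with three actions.
q_values = dueling_avg(2.0, [1.0, 0.0, -1.0])
print(q_values)  # [3.0, 2.0, 1.0]
```

The ranking of the actions is decided by the advantages alone; the value stream shifts all Q-values of a state together.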

In [None]:
def build_agent(model, actions):
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000)
    memory = SequentialMemory(limit=1000, window_length=3)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                  enable_dueling_network=True, dueling_type='avg', 
                   nb_actions=actions, nb_steps_warmup=1000
                  )
    return dqn
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-4))
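
To make the exploration schedule concrete, here is a minimal sketch of a linear epsilon decay with the same parameters as the policy above. It illustrates the idea and is not keras-rl's own code: the helper name linear_annealed_eps is hypothetical, and it only models training (at test time the policy is expected to use value_test instead).

```python
def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Epsilon after `step` training steps under a linear decay schedule."""
    slope = -(value_max - value_min) / nb_steps
    # Decay linearly from value_max, but never go below value_min.
    return max(value_min, slope * step + value_max)

for step in (0, 2500, 5000, 10000, 20000):
    print(step, round(linear_annealed_eps(step), 3))
```

With these settings the agent starts fully random and reaches the floor of 0.1 after 10,000 steps, after which one action in ten is still chosen at random.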

We can now fit our dqn on our environment.
We need to specify the number of steps: for this kind of problem, between 1M and 10M steps are needed to reach near-human (or even super-human) level.

Nevertheless, it is a good exercise to try it out with 10K steps. If everything works, it will take roughly half an hour: enough time to have a coffee and enjoy the papers proposed in the lab session.

Finally, it is fun to set the parameter visualize to True in order to watch what the agent is actually learning. Note that this makes the fit function much slower (unbearably so).

In [None]:
dqn.fit(env, nb_steps=10000, visualize=False, verbose=2)

Ideally, we can now test the performance of our agent.
Again, we can render the AI playing on screen.

In [None]:
scores = dqn.test(env, nb_episodes=10, visualize=True)
print(np.mean(scores.history['episode_reward']))

# 4. Reloading Agent from Memory

If you got here, you have built a very bad Atari player. Good job.

The next objective can be to download weights computed by other people on the web.


In [None]:
dqn.save_weights('SavedWeights/10k-Fast/dqn_weights.h5f')

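In [None]:
del model, dqn
# Rebuild the model and the agent before loading weights.
model = build_model(height, width, channels, actions)
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-4))

In [None]:
dqn.load_weights('SavedWeights/1m/dqn_weights.h5f')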