# 0. Install Dependencies

**Environment and Libs Setup**  

* OS libs:  
sudo apt install swig cmake libopenmpi-dev zlib1g-dev   
  
  
* Conda environment:  
conda create -n tf23_rl python=3.8 ipykernel jupyter pip   
conda activate tf23_rl  
  
  
* Python Libraries:  
pip install tensorflow==2.3 keras keras-rl2 box2d box2d-kengz atari-py  
python -m atari_py.import_roms \<path_to_ROMs\>  


* Registering the environment as a Jupyter kernel:  
ipython kernel install --user --name=TF2.3_RL_Gym  

[atari-py reference: importing ROMs](https://github.com/openai/atari-py#roms)
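Before opening the notebook it can help to confirm that the active environment actually contains the packages installed above. The helper below is a hypothetical sketch (`check_versions` is not part of any library) that uses only the standard library's `importlib.metadata` (Python 3.8+). The package list and the `"2.3"` prefix come from the install commands; `None` means any version is accepted. Distribution-name lookup can differ slightly between Python versions, so treat a "not installed" result as a prompt to double-check with `pip list`.

```python
from importlib.metadata import version, PackageNotFoundError

# Packages from the install step; the value is a required version prefix or None.
EXPECTED = {
    "tensorflow": "2.3",
    "keras-rl2": None,
    "gym": None,
    "atari-py": None,
}

def check_versions(expected):
    """Return {package: (installed_version_or_None, ok_flag)} (hypothetical helper)."""
    report = {}
    for pkg, prefix in expected.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = (None, False)
            continue
        # "2.3" accepts "2.3" or "2.3.x", but not "2.30"
        ok = prefix is None or installed == prefix or installed.startswith(prefix + ".")
        report[pkg] = (installed, ok)
    return report

for pkg, (ver, ok) in check_versions(EXPECTED).items():
    print(f"{pkg:12s} {ver or 'not installed':15s} {'OK' if ok else 'CHECK'}")
```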

In [1]:
#!pip install tensorflow==2.3.1 gym keras-rl2 gym[atari]

# 1. Test Random Environment with OpenAI Gym

In [2]:
import gym 
import random

In [3]:
#from gym import envs
#print(envs.registry.all())

In [4]:
import atari_py
atari_py.list_games()

['space_invaders', 'tetris']

In [5]:
env = gym.make('SpaceInvaders-v0')

  import ale_py.roms as roms
A.L.E: Arcade Learning Environment (version +a54a328)
[Powered by Stella]


SpaceInvaders-v0

Maximize your score in the Atari 2600 game SpaceInvaders.   
In this environment, the observation is an RGB image of the screen, which is an array of shape (210, 160, 3).  
Each action is repeatedly performed for a duration of $k$ frames, where $k$ is uniformly sampled from $\{2, 3, 4\}$.
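The frame-skipping rule above can be sketched in a few lines. This is a minimal illustration, not gym's code: `ToyEnv` and `step_with_skip` are hypothetical names, each raw frame of the toy environment yields a reward of 1, and the rewards of the repeated frames are summed into a single step reward (which, to my understanding, is how gym's Atari environment reports them).

```python
import random

class ToyEnv:
    """Hypothetical stand-in for an Atari emulator: every raw frame gives reward 1."""
    def __init__(self):
        self.frames = 0

    def step_one_frame(self, action):
        self.frames += 1
        return self.frames, 1.0, False  # (observation, reward, done)

def step_with_skip(env, action, rng, choices=(2, 3, 4)):
    """Repeat `action` for k frames, k drawn uniformly from `choices`; sum the rewards."""
    k = rng.choice(choices)
    total_reward, obs, done = 0.0, None, False
    for _ in range(k):
        obs, reward, done = env.step_one_frame(action)
        total_reward += reward
        if done:  # stop early if the episode ends mid-repeat
            break
    return obs, total_reward, done, k

rng = random.Random(0)
env = ToyEnv()
ks = []
for _ in range(5):
    obs, r, done, k = step_with_skip(env, action=1, rng=rng)
    ks.append(k)
print("frames per agent step:", ks, "total frames:", env.frames)
```

One consequence: the agent sees only one observation per $k$ frames, so a 779-step test episode corresponds to roughly two to four times as many emulator frames.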


In [6]:
height, width, channels = env.observation_space.shape
actions = env.action_space.n
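The model built in section 2 is not fed a single `(height, width, channels)` frame: `SequentialMemory(window_length=3)` in `build_agent` hands it the 3 most recent observations, which is why `input_shape` starts with 3. The sketch below shows that windowing on tiny 2x2 "frames". It is a simplified assumption of keras-rl2's behaviour (most recent frames, zero-padded at the front at the start of an episode) and ignores episode boundaries; `recent_state` and `zeros_like` are hypothetical helpers.

```python
from collections import deque

def zeros_like(obs):
    """Zero observation with the same nested-list shape as `obs`."""
    if isinstance(obs, list):
        return [zeros_like(x) for x in obs]
    return 0

def recent_state(history, window_length):
    """Last `window_length` observations, oldest first, zero-padded at the front.

    Assumes `history` holds at least one observation.
    """
    state = list(history)[-window_length:]
    while len(state) < window_length:
        state.insert(0, zeros_like(state[0]))
    return state

# Tiny 2x2 "frames" stand in for the real (210, 160, 3) screens.
history = deque(maxlen=3)
history.append([[1, 1], [1, 1]])
first = recent_state(history, 3)
print(first)  # [[[0, 0], [0, 0]], [[0, 0], [0, 0]], [[1, 1], [1, 1]]]

for v in (2, 3, 4):
    history.append([[v, v], [v, v]])
later = recent_state(history, 3)
print(later)  # [[[2, 2], [2, 2]], [[3, 3], [3, 3]], [[4, 4], [4, 4]]]
```

With real frames, one state therefore has shape `(3, height, width, channels)`, and a batch of states adds a leading batch dimension.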

In [7]:
env.unwrapped.get_action_meanings()

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
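A random-agent baseline only needs the classic gym loop: `reset()`, then `step(action)` returning `(observation, reward, done, info)` until `done`. The sketch below runs that loop against a hypothetical `ToyEnv` (FIRE-type actions score 5 points, episodes last 20 steps) so it executes without the Atari ROMs; with the real environment you would pass `env` and `actions` from the cells above instead. `run_random_episodes` is an illustrative name, not part of gym.

```python
import random

ACTION_MEANINGS = ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

class ToyEnv:
    """Hypothetical stand-in exposing the classic gym reset()/step() API."""
    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = 5.0 if 'FIRE' in ACTION_MEANINGS[action] else 0.0
        done = self.t >= self.max_steps
        return self.t, reward, done, {}

def run_random_episodes(env, n_actions, episodes, seed=0):
    """Play `episodes` episodes choosing actions uniformly at random."""
    rng = random.Random(seed)
    scores = []
    for _ in range(episodes):
        env.reset()
        done, score = False, 0.0
        while not done:
            action = rng.randrange(n_actions)  # uniform random policy
            _, reward, done, _ = env.step(action)
            score += reward
        scores.append(score)
    return scores

scores = run_random_episodes(ToyEnv(), len(ACTION_MEANINGS), episodes=5)
for i, s in enumerate(scores, 1):
    print(f"Episode {i}: score {s}")
```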

# 2. Create a Deep Learning Model with Keras

In [8]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Convolution2D
from tensorflow.keras.optimizers import Adam

2022-04-07 09:49:58.807400: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-04-07 09:49:58.807444: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


In [9]:
def build_model(height, width, channels, actions):
    model = Sequential()
    model.add(Convolution2D(32, (8,8), strides=(4,4), activation='relu', input_shape=(3,height, width, channels)))
    model.add(Convolution2D(64, (4,4), strides=(2,2), activation='relu'))
    model.add(Convolution2D(64, (3,3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

# model.add(
#   Convolution2D - convolutional layer that scans the images for local features
#   32 - number of filters; each filter learns to detect a different feature
#   (8,8) - size of the kernel, i.e. the filter matrix applied to the image
#   strides=(4,4) - step, in pixels, by which the filter moves across the image
#   activation='relu',
#   input_shape=(3,height, width, channels)))  - 3 stacked frames (window_length=3 in build_agent), each height x width x channels
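Kernel size and stride determine each layer's output size. With Keras' default `'valid'` padding (no padding), each spatial axis shrinks to $\lfloor (n - k)/s \rfloor + 1$ for input length $n$, kernel size $k$ and stride $s$. The short sketch below applies that formula to the three `Convolution2D` layers of `build_model`; `conv_out` and `conv_stack_shape` are illustrative helper names.

```python
def conv_out(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1

# (filters, kernel, stride) of the three Convolution2D layers in build_model
LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

def conv_stack_shape(height, width, layers):
    """Per-layer (height, width, filters) for one frame of the stack."""
    shapes = []
    for filters, k, s in layers:
        height, width = conv_out(height, k, s), conv_out(width, k, s)
        shapes.append((height, width, filters))
    return shapes

shapes = conv_stack_shape(210, 160, LAYERS)
print(shapes)  # [(51, 39, 32), (24, 18, 64), (22, 16, 64)]
h, w, c = shapes[-1]
flat = 3 * h * w * c  # 3 stacked frames reach Flatten
print(flat)  # 67584
```

These values match the output shapes reported by `model.summary()` in the next cells.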

In [10]:
model = build_model(height, width, channels, actions)

2022-04-07 09:50:02.515622: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-04-07 09:50:02.515657: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2022-04-07 09:50:02.515686: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (AdaByron): /proc/driver/nvidia/version does not exist
2022-04-07 09:50:02.536531: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 1696170000 Hz
2022-04-07 09:50:02.537192: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55dab1464b40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-04-07 09:50:02.537280: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version


In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 3, 51, 39, 32)     6176      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 24, 18, 64)     32832     
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 22, 16, 64)     36928     
_________________________________________________________________
flatten (Flatten)            (None, 67584)             0         
_________________________________________________________________
dense (Dense)                (None, 512)               34603520  
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_2 (Dense)              (None, 6)                 1542      
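The `Param #` column follows two simple rules: a convolution has $k \cdot k \cdot c_{in} \cdot c_{out}$ weights plus $c_{out}$ biases, and a dense layer has $n_{in} \cdot n_{out}$ weights plus $n_{out}$ biases. The sketch below recomputes every row of the summary for the plain Sequential model (the dueling option set later in `build_agent` may change the output head); the helper names are illustrative.

```python
def conv_params(kernel, c_in, c_out):
    """kernel*kernel*c_in weights per filter, plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out

params = {
    "conv2d": conv_params(8, 3, 32),
    "conv2d_1": conv_params(4, 32, 64),
    "conv2d_2": conv_params(3, 64, 64),
    "dense": dense_params(67584, 512),
    "dense_1": dense_params(512, 256),
    "dense_2": dense_params(256, 6),
}
for name, n in params.items():
    print(f"{name:10s} {n:>12,d}")
total = sum(params.values())
print("total     ", f"{total:>12,d}")  # 34,812,326
```

The first `Dense(512)` layer alone holds 34,603,520 of the 34,812,326 parameters (over 99%), because it connects all 67,584 flattened values to 512 units.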

# 3. Build Agent with Keras-RL

In [12]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

In [13]:
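def build_agent(model, actions):
    # Epsilon-greedy exploration: eps decays linearly from 1.0 to 0.1 over
    # 10,000 steps; dqn.test() uses a fixed eps of 0.2.
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000)
    # Replay memory: keeps the last 1,000 steps and stacks 3 consecutive
    # frames per state (the leading 3 in the model's input_shape).
    memory = SequentialMemory(limit=1000, window_length=3)
    # Dueling DQN with 'avg' aggregation; no training updates are made
    # during the first 1,000 warm-up steps.
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                   enable_dueling_network=True, dueling_type='avg',
                   nb_actions=actions, nb_steps_warmup=1000)
    return dqn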
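The exploration settings in `build_agent` can be made concrete. The sketch below assumes the schedule used by keras-rl2's `LinearAnnealedPolicy` is $\varepsilon(t) = \max(\varepsilon_{min},\ \varepsilon_{max} - (\varepsilon_{max} - \varepsilon_{min})\, t / N)$ during training and a fixed `value_test` otherwise, and that `EpsGreedyQPolicy` picks a uniformly random action with probability $\varepsilon$ and the arg-max Q-value otherwise. `linear_annealed_eps` and `eps_greedy` are hypothetical helper names.

```python
import random

def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000,
                        training=True, value_test=0.2):
    """Epsilon under a linear annealing schedule (sketch of the assumed rule)."""
    if not training:
        return value_test
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, value_max + slope * step)

def eps_greedy(q_values, eps, rng):
    """With probability eps pick a random action, otherwise the arg-max of Q."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
for step in (0, 5000, 10000, 20000):
    print(step, round(linear_annealed_eps(step), 3))  # 1.0, 0.55, 0.1, 0.1
print("test eps:", linear_annealed_eps(0, training=False))  # 0.2
print("greedy pick:", eps_greedy([0.1, 0.7, 0.2], eps=0.0, rng=rng))  # 1
```

Because `value_test=.2`, roughly one action in five is random during `dqn.test`, which may account for part of the spread between test episodes.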

In [14]:
# The model is deleted (and rebuilt in the next cell) as a workaround for a training problem.
del model

In [15]:
model = build_model(height, width, channels, actions)

In [16]:
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3))
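With `enable_dueling_network=True` and `dueling_type='avg'`, the network's output is split into a state value $V(s)$ and per-action advantages $A(s, a)$, combined as $Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a')$. My reading of keras-rl2's source is that `DQNAgent` builds this head itself, adding a dense layer with `actions + 1` units on top of the model's second-to-last layer (the `Dense(256)`); treat that detail as an assumption to check against your installed version. The aggregation alone is sketched below with made-up numbers; `dueling_avg` is an illustrative name.

```python
def dueling_avg(head_outputs):
    """Combine a dueling head into Q-values using 'avg' aggregation.

    head_outputs[0] is the state value V(s); head_outputs[1:] are the
    advantages A(s, a). Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    value, advantages = head_outputs[0], head_outputs[1:]
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# 1 value unit + 6 advantage units -> 6 Q-values, one per Space Invaders action
q = dueling_avg([10.0, 1.0, 2.0, 3.0, 0.0, -1.0, 1.0])
print(q)  # [10.0, 11.0, 12.0, 9.0, 8.0, 10.0]
```

Subtracting the mean advantage forces the Q-values to average exactly to $V(s)$, so the value and advantage streams cannot drift by an arbitrary shared constant.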

In [17]:
#dqn.load_weights('models/SpaceInvaders_10000.h5f')
dqn.load_weights('SavedWeights/20k-Fast/dqn_weights.h5f')
#dqn.load_weights('models/dqn_weights.h5f')

In [18]:
scores = dqn.test(env, nb_episodes=10, visualize=True)

Testing for 10 episodes ...
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.




Episode 1: reward: 350.000, steps: 779
Episode 2: reward: 180.000, steps: 861
Episode 3: reward: 155.000, steps: 785
Episode 4: reward: 155.000, steps: 837
Episode 5: reward: 110.000, steps: 797
Episode 6: reward: 150.000, steps: 776
Episode 7: reward: 260.000, steps: 1034
Episode 8: reward: 110.000, steps: 799
Episode 9: reward: 110.000, steps: 700
Episode 10: reward: 120.000, steps: 787


In [19]:
print("Mean score: %.1f +/- %.1f" % (np.mean(scores.history['episode_reward']),
                                     np.std(scores.history['episode_reward'])))

Mean score: 170.0 +/- 74.0
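The summary line can be reproduced from the ten episode rewards printed by `dqn.test`. `np.std` defaults to the population standard deviation (`ddof=0`), which is what the standard library's `statistics.pstdev` computes, so this stdlib-only check should match the NumPy output.

```python
from statistics import mean, pstdev, stdev

# Episode rewards printed by dqn.test above
rewards = [350, 180, 155, 155, 110, 150, 260, 110, 110, 120]

print("Mean score: %.1f +/- %.1f" % (mean(rewards), pstdev(rewards)))
# Mean score: 170.0 +/- 74.0

# Sample standard deviation (ddof=1), for comparison
print("sample std: %.1f" % stdev(rewards))  # 78.0
```

With only ten episodes and a test-time epsilon of 0.2, the population and sample estimates differ by about 4 points, so the ±74 should be read as a rough spread rather than a precise figure.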


In [20]:
env.close()