# Installation

Follow these steps to install gym and atari-py on Windows with Anaconda. (Installation on Linux is straightforward.)

Step 1: Create a new environment in Anaconda and activate it:
```
conda create -n <env_name> python=3.9
conda activate <env_name>
```

Step 2: Install the Visual Studio C++ build tools (skip this step if they are already installed).
   - Download the VS Build Tools [here](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16)
   - Run the VS Build Tools setup, select "C++ build tools", and install it.

Step 3: Install the packages in the new environment using pip:
```
pip install tensorflow
pip install cmake
pip install atari-py
pip install gym
pip install gym[atari]
pip install keras-rl2
```

Step 4: With the latest atari-py versions, only the `Tetris` game is available. To get all the games:
   - Download the ROMs from [this link](http://www.atarimania.com/rom_collection_archive_atari_2600_roms.html)
   - Extract the RAR archive to any location
   - Run the following command in the conda prompt
```
python -m atari_py.import_roms <path to the folder with the extracted ROMs>
```

Step 5: Enjoy Coding !!!

### Imports and listing the available games

In [1]:
import gym
import random
import atari_py
print(atari_py.list_games())

['adventure', 'air_raid', 'alien', 'amidar', 'assault', 'asterix', 'asteroids', 'atlantis', 'bank_heist', 'battle_zone', 'beam_rider', 'berzerk', 'bowling', 'boxing', 'breakout', 'carnival', 'centipede', 'chopper_command', 'crazy_climber', 'defender', 'demon_attack', 'donkey_kong', 'double_dunk', 'elevator_action', 'enduro', 'fishing_derby', 'freeway', 'frogger', 'frostbite', 'galaxian', 'gopher', 'gravitar', 'hero', 'ice_hockey', 'jamesbond', 'journey_escape', 'kaboom', 'kangaroo', 'keystone_kapers', 'king_kong', 'koolaid', 'krull', 'kung_fu_master', 'laser_gates', 'lost_luggage', 'montezuma_revenge', 'mr_do', 'ms_pacman', 'name_this_game', 'pacman', 'phoenix', 'pitfall', 'pong', 'pooyan', 'private_eye', 'qbert', 'riverraid', 'road_runner', 'robotank', 'seaquest', 'sir_lancelot', 'skiing', 'solaris', 'space_invaders', 'star_gunner', 'surround', 'tennis', 'tetris', 'time_pilot', 'trondead', 'tutankham', 'up_n_down', 'venture', 'video_pinball', 'wizard_of_wor', 'yars_revenge', 'zaxxon']


### Loading the game and exploring the actions and observation space

In [2]:
env = gym.make('SpaceInvaders-v0')
height, width, channels = env.observation_space.shape
actions = env.action_space.n

print(f'Game Frame Dimensions: [{height},{width},{channels}]')
print(f'Number of Actions in Game: {actions}')
print(f'Details of available actions in Game: {env.unwrapped.get_action_meanings()}')


Game Frame Dimensions: [210,160,3]
Number of Actions in Game: 6
Details of available actions in Game: ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']



### Taking random actions and observing the state and score

In [3]:
episodes = 5

for episode in range(episodes):
    score = 0
    done = False
    state = env.reset()
    while not done:
        # Optional: grab the current frame as an RGB array (e.g. to use it as model input)
        frame = env.render(mode='rgb_array')
        # Take a random action (valid actions are 0 .. actions - 1)
        action_to_take = random.randrange(actions)
        new_state, reward, done, info = env.step(action_to_take)
        score += reward
    print(f'Episode {episode} -> Score {score}')
    
env.reset()
env.close()
        


Episode 0 -> Score 80.0
Episode 1 -> Score 155.0
Episode 2 -> Score 100.0
Episode 3 -> Score 110.0
Episode 4 -> Score 75.0
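
The random agent gives a useful baseline for judging the trained agent later. A minimal sketch, using the five episode scores printed above (the variable names are illustrative):

```python
from statistics import mean

# Scores copied from the random-action run above
random_scores = [80.0, 155.0, 100.0, 110.0, 75.0]

# Average score of the random policy, to compare against the trained agent
baseline = mean(random_scores)
print(f'Random-policy baseline: {baseline}')  # Random-policy baseline: 104.0
```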


# Deep Reinforcement Learning
With deep reinforcement learning, we use deep neural networks to estimate the Q-values that the policy uses to choose an action. In many practical decision-making problems, the states $s$ of the MDP are high-dimensional (e.g. images from a camera or the raw sensor stream from a robot) and cannot be solved by traditional RL algorithms. Deep reinforcement learning algorithms incorporate deep learning to solve such MDPs, often representing the policy $\pi(a \mid s)$ or other learned functions as a neural network, and developing specialized algorithms that perform well in this setting.
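
As a rough illustration of how estimated Q-values drive a policy, the sketch below picks an action epsilon-greedily from a list of Q-values and computes the one-step Q-learning target. The function names (`epsilon_greedy`, `td_target`), the example Q-values, and the discount `gamma=0.99` are assumptions made for illustration; they are not part of gym or keras-rl.

```python
import random

def epsilon_greedy(q_values, eps, rng=random):
    """With probability eps explore (random action), otherwise exploit (argmax Q)."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Hypothetical Q-values for the six SpaceInvaders actions
q = [0.1, 0.7, 0.2, 0.0, 0.4, 0.3]
greedy_action = epsilon_greedy(q, eps=0.0)  # eps=0 always exploits
target = td_target(reward=5.0, next_q_values=q, done=False)
print(greedy_action, target)
```

In a deep Q-network, the list `q` would be the network's output for the current state, and the network is trained to move its prediction for the chosen action toward `target`.
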
## Building a conv network for estimating Q values (Deep RL)

In [16]:
# Run this cell if q_net.compile() crashed your system, or if you get AttributeError: 'Functional' object has no attribute '_compile_time_distribution_strategy'. It deletes the existing model so that it can be rebuilt in the next cell.
del q_net

In [17]:
from keras.optimizers import Adam
from keras.initializers import RandomNormal
from keras.models import Model
from keras.layers import (Input, Conv2D, LeakyReLU, BatchNormalization,
                          Flatten, Dense)

def QNet(height, width, channels, actions):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # input: a stack of 3 frames (must match window_length in SequentialMemory)
    in_src_image = Input(shape=(3, height, width, channels))
    # C8
    d = Conv2D(8, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(in_src_image)
    d = LeakyReLU(alpha=0.2)(d)
    # C16
    d = Conv2D(16, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C32
    d = Conv2D(32, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C64
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # C64
    d = Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init)(d)
    d = BatchNormalization()(d)
    d = LeakyReLU(alpha=0.2)(d)
    # fully connected head
    d = Flatten()(d)
    d = Dense(512, activation='relu')(d)
    d = Dense(128, activation='relu')(d)
    # Q-values are unbounded return estimates, so the output layer is linear (not softmax)
    out = Dense(actions, activation='linear')(d)
    
    # define model
    model = Model(in_src_image, out)
    # compile model (DQNAgent.compile() compiles the network again before training)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='mse', optimizer=opt)
    return model

q_net = QNet(height,width,channels, actions)
q_net.summary()


Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3, 210, 160, 3)]  0         
                                                                 
 conv2d_5 (Conv2D)           (None, 3, 105, 80, 8)     392       
                                                                 
 leaky_re_lu_5 (LeakyReLU)   (None, 3, 105, 80, 8)     0         
                                                                 
 conv2d_6 (Conv2D)           (None, 3, 53, 40, 16)     2064      
                                                                 
 batch_normalization_4 (Batc  (None, 3, 53, 40, 16)    64        
 hNormalization)                                                 
                                                                 
 leaky_re_lu_6 (LeakyReLU)   (None, 3, 53, 40, 16)     0         
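
The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the layer settings used in `QNet` (4x4 kernels, stride 2, `padding='same'`); the helper names `same_conv_out` and `conv_params` are made up for this example.

```python
import math

def same_conv_out(size, stride=2):
    """Spatial output size of a conv layer with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights (kernel_h * kernel_w * in_channels * filters) plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

h, w = 210, 160  # SpaceInvaders frame size
shapes = []
for _ in range(5):  # five stride-2 conv layers in QNet
    h, w = same_conv_out(h), same_conv_out(w)
    shapes.append((h, w))

print(shapes)                    # [(105, 80), (53, 40), (27, 20), (14, 10), (7, 5)]
print(conv_params(4, 4, 3, 8))   # 392
print(conv_params(4, 4, 8, 16))  # 2064
```

The first two shapes and both parameter counts match the rows shown in the summary. Each `BatchNormalization` layer stores four values per channel (scale, offset, moving mean, moving variance), which gives 4 * 16 = 64 for the first one.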
                                                           

## Building Agent for Double Deep Q Network Reinforcement Learning 

In [18]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

In [19]:
def build_agent(model, actions):
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000)
    memory = SequentialMemory(limit=1000, window_length=3)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                  enable_dueling_network=False, dueling_type='avg', 
                   nb_actions=actions, nb_steps_warmup=1000
                  )
    return dqn
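
To make the exploration schedule in `build_agent` concrete, here is a minimal sketch of linear annealing with the same settings (`value_max=1.`, `value_min=.1`, `nb_steps=10000`). The function `annealed_eps` is a hypothetical stand-in written for illustration, not keras-rl's own code.

```python
def annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Linearly decay eps from value_max to value_min over nb_steps, then hold at value_min."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)

# eps at the start, halfway, at the end of annealing, and beyond it
for step in (0, 5000, 10000, 20000):
    print(step, round(annealed_eps(step), 3))
```

With these settings the agent starts fully random and ends training choosing a random action about 10% of the time. The `value_test=.2` argument sets the epsilon used when the agent is run in test mode.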

In [22]:
dqn = build_agent(q_net, actions)
dqn.get_config()
dqn.compile(Adam(learning_rate=1e-4))


In [None]:
dqn.fit(env, nb_steps=10000, visualize=False, verbose=2)

Training for 10000 steps ...



  467/10000: episode: 1, duration: 8.824s, episode steps: 467, steps per second:  53, episode reward: 60.000, mean reward:  0.128 [ 0.000, 15.000], mean action: 2.368 [0.000, 5.000],  loss: --, mean_q: --, mean_eps: --
  880/10000: episode: 2, duration: 6.193s, episode steps: 413, steps per second:  67, episode reward: 55.000, mean reward:  0.133 [ 0.000, 20.000], mean action: 2.516 [0.000, 5.000],  loss: --, mean_q: --, mean_eps: --



In [None]:
dqn.save_weights('D:\\Projects\\DeepRL\\dqn_spaceinvadersv0-weights.h5f')

In [None]:
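# Rebuild the model and agent, then load the saved weights
del dqn, q_net
q_net = QNet(height, width, channels, actions)
dqn = build_agent(q_net, actions)
# Compile before loading: load_weights also refreshes the target model created in compile()
dqn.compile(Adam(learning_rate=1e-4))
dqn.load_weights('SavedWeights/1m/dqn_weights.h5f')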

In [None]:
# To visualize the results, create the env again with render_mode='human'
import numpy as np

env = gym.make('SpaceInvaders-v4', render_mode='human')
scores = dqn.test(env, nb_episodes=2, visualize=True)
print(np.mean(scores.history['episode_reward']))