# Reinforcement learning project: Q-learning: Space Invaders

- YATTE Cyrille

- Welcome to the world of Space Invaders, a classic video game developed and published by Taito in 1978.
It was one of the first arcade games to achieve major success and become popular with the general public.
It was also one of the first video games to use 2D graphics and monophonic sound.

- In this fun and exciting science-fiction game, you play a space fighter who must defend Earth against an army of alien invaders.
The invaders advance toward you in rows, trying to hit you with their laser beams, and your mission is to destroy them all before they reach the ground.
You control your spaceship with the arrow keys on your keyboard or with your gamepad, and press the space bar to fire at the invaders.

## STEP 0: PACKAGE INSTALLATION

In [3]:
!pip install tensorflow==2.12.0 gym keras-rl2 gym[atari]



## STEP 1: CREATING THE ENVIRONMENT WITH OpenAI Gym

In [4]:
import gym 
import random

In [5]:
env = gym.make('SpaceInvaders-v0')
height, width, channels = env.observation_space.shape
actions = env.action_space.n

In [6]:
env.unwrapped.get_action_meanings()

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

In [7]:
episodes = 5
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0 
    
    # Play one episode with uniformly random actions (baseline, no learning).
    while not done:
        env.render()
        action = random.choice([0,1,2,3,4,5])
        n_state, reward, done, info = env.step(action)
        score+=reward
    print('Episode:{} Score:{}'.format(episode, score))
env.close()


Episode:1 Score:155.0
Episode:2 Score:80.0
Episode:3 Score:415.0
Episode:4 Score:85.0
Episode:5 Score:65.0


## STEP 3: CREATING AN AGENT: Deep Q-Learning

#### Step 3-1: Building the deep learning model

In [8]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Convolution2D
from tensorflow.keras.optimizers.legacy import Adam

In [9]:
def build_model(height, width, channels, actions):
    # The leading 3 in input_shape is the number of stacked frames;
    # it has to match window_length in SequentialMemory.
    model = Sequential()
    model.add(Convolution2D(32, (8,8), strides=(4,4), activation='relu', input_shape=(3,height, width, channels)))
    model.add(Convolution2D(64, (4,4), strides=(2,2), activation='relu'))
    model.add(Convolution2D(64, (3,3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(256, activation='relu'))
    # One linear output per action: the estimated Q-value of that action.
    model.add(Dense(actions, activation='linear'))
    return model

In [10]:
model = build_model(height, width, channels, actions)

In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 3, 51, 39, 32)     6176      
                                                                 
 conv2d_1 (Conv2D)           (None, 3, 24, 18, 64)     32832     
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 22, 16, 64)     36928     
                                                                 
 flatten (Flatten)           (None, 67584)             0         
                                                                 
 dense (Dense)               (None, 512)               34603520  
                                                                 
 dense_1 (Dense)             (None, 256)               131328    
                                                                 
 dense_2 (Dense)             (None, 6)                 1
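
The output shapes and parameter counts in the summary can be checked by hand. The sketch below is only an illustration: it assumes 3 stacked frames of 210 x 160 RGB pixels (the frame size is inferred from the summary, not printed in the notebook) and 'valid' padding. The helper names `conv_out`, `conv_params` and `dense_params` are made up for this example. The final `Dense` layer is left out because its count is cut off in the output above.

```python
# Assumed input: 3 stacked frames of 210 x 160 RGB pixels (window, height, width, channels).
WINDOW, HEIGHT, WIDTH, CHANNELS = 3, 210, 160, 3

def conv_out(size, kernel, stride=1):
    """Output length along one axis for a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

# (kernel, stride, filters) for the three convolutional layers of build_model.
conv_layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

h, w, c = HEIGHT, WIDTH, CHANNELS
shapes, params = [], []
for kernel, stride, filters in conv_layers:
    params.append(conv_params(kernel, c, filters))
    h, w, c = conv_out(h, kernel, stride), conv_out(w, kernel, stride), filters
    shapes.append((WINDOW, h, w, c))

# Flatten merges the frame axis with the spatial and channel axes.
flat = WINDOW * h * w * c
params.append(dense_params(flat, 512))
params.append(dense_params(512, 256))

print(shapes)  # [(3, 51, 39, 32), (3, 24, 18, 64), (3, 22, 16, 64)]
print(flat)    # 67584
print(params)  # [6176, 32832, 36928, 34603520, 131328]
```

Almost all of the parameters sit in the first `Dense` layer, because the flattened tensor still carries all three frames at a fairly large spatial resolution.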

In [12]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

#### Step 3-2: Building the agent with Q-learning

In [13]:
def build_agent(model, actions):
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000)
    memory = SequentialMemory(limit=1000, window_length=3)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                  enable_dueling_network=True, dueling_type='avg', 
                   nb_actions=actions, nb_steps_warmup=1000
                  )
    return dqn
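
To see what the exploration policy in `build_agent` does over time, here is a minimal sketch of a linear epsilon schedule using the same `value_max`, `value_min` and `nb_steps`. The function `linear_eps` is a hypothetical stand-in written for this note, not the keras-rl implementation, and it assumes epsilon decreases linearly with the step count and is then clamped at `value_min`.

```python
def linear_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Epsilon after `step` training steps under a linear decay (illustrative)."""
    slope = -(value_max - value_min) / nb_steps
    # Decay linearly from value_max, never going below value_min.
    return max(value_min, slope * step + value_max)

for step in (0, 1360, 5000, 10000, 20000):
    print(step, round(linear_eps(step), 4))
```

Under this assumption epsilon is about 0.88 around step 1360, which looks consistent with the `mean_eps` of 0.877645 logged for episode 2 of the training run, where learning starts after the 1000 warm-up steps.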

In [14]:
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=0.0004))
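
The agent is built with `enable_dueling_network=True` and `dueling_type='avg'`. As far as I understand this option, the network's head is split into a state value and per-action advantages, which are recombined as Q = V + A - mean(A). The NumPy sketch below only illustrates that aggregation on made-up numbers; `dueling_avg` is a hypothetical helper, not a keras-rl function.

```python
import numpy as np

def dueling_avg(value, advantages):
    """Combine a state value and per-action advantages into Q-values ('avg' variant)."""
    advantages = np.asarray(advantages, dtype=float)
    # Subtracting the mean advantage makes the Q-values average out to `value`.
    return value + advantages - advantages.mean()

q_values = dueling_avg(10.0, [1.0, 2.0, 3.0])
print(q_values)                  # [ 9. 10. 11.]
print(int(np.argmax(q_values)))  # 2, the greedy action
```

Centering the advantages does not change which action is greedy; it only removes the ambiguity between the value and advantage streams.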

In [15]:
dqn.fit(env, nb_steps=10000, visualize=False, verbose=2)

Training for 10000 steps ...



  629/10000: episode: 1, duration: 27.540s, episode steps: 629, steps per second:  23, episode reward: 180.000, mean reward:  0.286 [ 0.000, 30.000], mean action: 2.564 [0.000, 5.000],  loss: --, mean_q: --, mean_eps: --



 1719/10000: episode: 2, duration: 1789.902s, episode steps: 1090, steps per second:   1, episode reward: 435.000, mean reward:  0.399 [ 0.000, 200.000], mean action: 2.422 [0.000, 5.000],  loss: 78.770140, mean_q: 13.385782, mean_eps: 0.877645
 2845/10000: episode: 3, duration: 2577.889s, episode steps: 1126, steps per second:   0, episode reward: 190.000, mean reward:  0.169 [ 0.000, 30.000], mean action: 2.401 [0.000, 5.000],  loss: 0.944055, mean_q: 12.790775, mean_eps: 0.794665
 3794/10000: episode: 4, duration: 2044.362s, episode steps: 949, steps per second:   0, episode reward: 260.000, mean reward:  0.274 [ 0.000, 30.000], mean action: 2.579 [0.000, 5.000],  loss: 0.588588, mean_q: 11.989770, mean_eps: 0.701290
 4287/10000: episode: 5, duration: 1522.924s, episode steps: 493, steps per second:   0, episode reward: 165.000, mean reward:  0.335 [ 0.000, 30.000], mean action: 2.347 [0.000, 5.000],  loss: 0.461232, mean_q: 11.312949, mean_eps: 0.636400
 5104/10000: episode: 6, dur

<keras.callbacks.History at 0x2b351d02490>

#### Step 3-3: Testing the agent on Space Invaders

In [16]:
scores = dqn.test(env, nb_episodes=10, visualize=True)
print(np.mean(scores.history['episode_reward']))

Testing for 10 episodes ...



Episode 1: reward: 320.000, steps: 702
Episode 2: reward: 460.000, steps: 978
Episode 3: reward: 110.000, steps: 615
Episode 4: reward: 105.000, steps: 673
Episode 5: reward: 120.000, steps: 647
Episode 6: reward: 450.000, steps: 1115
Episode 7: reward: 55.000, steps: 700
Episode 8: reward: 215.000, steps: 844
Episode 9: reward: 95.000, steps: 667
Episode 10: reward: 105.000, steps: 599
203.5
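
To put the test score in context, the sketch below compares it with the five random-play episodes from Step 1. The scores are copied from the outputs above; `summarize` is a hypothetical helper that reports the mean and the sample standard deviation.

```python
import statistics

# Scores copied from the notebook outputs above.
random_scores = [155.0, 80.0, 415.0, 85.0, 65.0]
dqn_scores = [320.0, 460.0, 110.0, 105.0, 120.0,
              450.0, 55.0, 215.0, 95.0, 105.0]

def summarize(scores):
    """Return (mean, sample standard deviation) of a list of episode scores."""
    return statistics.mean(scores), statistics.stdev(scores)

random_mean, random_std = summarize(random_scores)
dqn_mean, dqn_std = summarize(dqn_scores)

print('random: mean={:.1f} std={:.1f}'.format(random_mean, random_std))
print('dqn:    mean={:.1f} std={:.1f}'.format(dqn_mean, dqn_std))
```

The trained agent averages 203.5 against 160.0 for random play, but with so few episodes the spread of each sample is larger than the gap between the means, so this run alone does not show a clear improvement.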


In [17]:
dqn.save_weights('C:/Users/dell/Desktop/Msc_ENSAI/dqn_weights.h5f')

In [18]:
dqn.load_weights('C:/Users/dell/Desktop/Msc_ENSAI/dqn_weights.h5f')