# Step-by-Step Guide to Implementing DQN for Pong


In [22]:
! python -m venv rl-env


Actual environment location may have moved due to redirects, links or junctions.
  Requested location: "Z:\Desktops\BW\Anna.Androvitsanea\Desktop\misc\RL\rl-env\Scripts\python.exe"
  Actual location:    "\\fileserver05.intern.rossmann.de\Desktops_BW$\Anna.Androvitsanea\Desktop\misc\RL\rl-env\Scripts\python.exe"


In [23]:
! rl-env\Scripts\activate  # For Windows


In [None]:
! pip install tensorflow==2.3.0 keras-rl2  # keras-rl2 provides the rl package imported in Step 2

# Using tensorflow

## Step 1: Set Up the Environment
First, ensure you have all necessary libraries installed. If not, you can install them using pip:

```bash
pip install gym[atari] gym[accept-rom-license] keras-rl2 tensorflow opencv-python
```

In [12]:
! pip install gym[atari] gym[accept-rom-license] keras-rl2 tensorflow opencv-python




## Step 2: Import Libraries and Initialize Environment
In this step, we import the necessary libraries and initialize the Pong environment using OpenAI Gym. We also retrieve the number of possible actions in the Pong game.



In [17]:
import gym
import numpy as np
import matplotlib.pyplot as plt
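# keras-rl2 is built on tensorflow.keras, so import the Keras classes from there
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, Permute
from tensorflow.keras.optimizers import Adam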
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory

# Set the Atari environment for Pong.
env = gym.make("PongNoFrameskip-v4")

# Getting the number of actions in Pong (the number of possible moves)
nb_actions = env.action_space.n
print(f"Number of actions: {nb_actions}")

ImportError: cannot import name 'model_from_config' from 'tensorflow.keras.models' (C:\Users\Anna.Androvitsanea\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\_tf_keras\keras\models\__init__.py)

## Step 3: Build the Neural Network (CNN) Model
In this step, we will define the neural network model using Keras. The model will consist of convolutional layers followed by dense layers to predict Q-values for each action.



In [14]:
# 4 stacked grayscale frames of 84x84 pixels, ordered (frames, height, width).
# This is the layout produced by SequentialMemory(window_length=4) in Step 4.
input_shape = (4, 84, 84)

# Define the model
model = Sequential()
# Move the frame axis last: (frames, height, width) -> (height, width, frames),
# the channels-last layout that Conv2D expects by default
model.add(Permute((2, 3, 1), input_shape=input_shape))
model.add(Conv2D(32, (8, 8), strides=(4, 4), activation='relu'))
model.add(Conv2D(64, (4, 4), strides=(2, 2), activation='relu'))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
# One linear output per action: the predicted Q-value of that action
model.add(Dense(nb_actions, activation='linear'))

# summary() prints the table itself; wrapping it in print() would also print "None"
model.summary()
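
The spatial size after each unpadded convolution follows `floor((size - kernel) / stride) + 1`. The sketch below applies that formula to the three `Conv2D` layers of this step. It is a plain-Python check, not part of Keras, and the helper names `conv_output_size` and `trace_sizes` are hypothetical. It also shows why the axis order matters: if the 4-frame axis ends up in a spatial position, the first 8x8 kernel leaves an axis of length zero.

```python
def conv_output_size(size: int, kernel: int, stride: int) -> int:
    """Length of one spatial axis after a 'valid' (unpadded) convolution."""
    return max((size - kernel) // stride + 1, 0)


# (kernel, stride) pairs of the three Conv2D layers defined in this step
LAYERS = [(8, 4), (4, 2), (3, 1)]


def trace_sizes(size: int) -> list:
    """Return the axis length after each layer, stopping once it reaches zero."""
    sizes = []
    for kernel, stride in LAYERS:
        size = conv_output_size(size, kernel, stride)
        sizes.append(size)
        if size == 0:
            break
    return sizes


print(trace_sizes(84))  # 84-pixel axis -> [20, 9, 7]
print(trace_sizes(4))   # 4-long frame axis used as a spatial axis -> [0]

# Flatten() input: 7 x 7 positions times the 64 filters of the last layer
flat_features = trace_sizes(84)[-1] ** 2 * 64
print(flat_features)    # 3136
```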

## Step 4: Set Up the DQN Agent
We will configure the DQN agent with the neural network model, memory for experience replay, and the epsilon-greedy policy for exploration.


In [15]:
# Configuring the agent
memory = SequentialMemory(limit=1000000, window_length=4)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=0.1, value_test=.05, nb_steps=1000000)

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=50000, target_model_update=10000, policy=policy, gamma=.99)
dqn.compile(Adam(learning_rate=0.00025), metrics=['mae'])  # `lr` is a deprecated alias of `learning_rate`

NameError: name 'SequentialMemory' is not defined
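
To make the exploration schedule concrete, the sketch below reproduces the linear annealing configured above (`value_max=1.`, `value_min=0.1`, `nb_steps=1000000`) in plain Python. It assumes a straight-line decay that is clamped at the minimum; the function name `linear_annealed_epsilon` is hypothetical and is not part of keras-rl.

```python
def linear_annealed_epsilon(step, value_max=1.0, value_min=0.1, nb_steps=1_000_000):
    """Epsilon after `step` training steps: linear decay, clamped at value_min."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


# Epsilon is the probability of taking a random action instead of the greedy one
for step in (0, 250_000, 500_000, 1_000_000, 2_000_000):
    print(f"step {step:>9,}: epsilon = {linear_annealed_epsilon(step):.3f}")
```

With these settings the agent acts almost entirely at random at first and still explores with probability 0.1 after one million steps.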

## Step 5: Preprocess the Input
To play Pong, we need to preprocess the input frames by converting them to grayscale and resizing them to 84x84. Stacking of consecutive frames is already handled by `SequentialMemory(window_length=4)` from Step 4, so the wrapper returns one frame at a time; stacking in both places would produce observations of the wrong shape.

In [16]:
from PIL import Image

def preprocess_frame(frame):
    # Convert an RGB frame to grayscale and resize it to 84x84
    img = Image.fromarray(frame)
    img = img.convert('L').resize((84, 84))
    return np.array(img, dtype=np.uint8)

# Wrap the environment so that every observation is preprocessed
class PongPreprocessingWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        # A single grayscale frame; the agent's memory builds the 4-frame window
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(84, 84), dtype=np.uint8)

    def observation(self, frame):
        return preprocess_frame(frame)

# Apply the preprocessing wrapper
env = PongPreprocessingWrapper(env)

NameError: name 'env' is not defined
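
Stacking consecutive frames gives the network temporal context, because a single frame does not show which way the ball is moving. The sketch below illustrates a rolling 4-frame window in plain Python. It is independent of Gym and keras-rl, `FrameStack` is a hypothetical helper, and the strings stand in for 84x84 image arrays. Repeating the first frame at reset is one common choice and is assumed here.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `size` frames as a rolling window."""

    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # Start every episode from a clean window filled with the first frame,
        # so frames from the previous episode cannot leak into the new one
        self.frames.clear()
        self.frames.extend([first_frame] * self.frames.maxlen)
        return list(self.frames)

    def push(self, frame):
        # deque(maxlen=...) drops the oldest frame automatically
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(size=4)
print(stack.reset("f0"))  # ['f0', 'f0', 'f0', 'f0']
print(stack.push("f1"))   # ['f0', 'f0', 'f0', 'f1']
for name in ("f2", "f3", "f4"):
    window = stack.push(name)
print(window)             # ['f1', 'f2', 'f3', 'f4']
```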

## Step 6: Training the DQN Agent
In this step, we will train the DQN agent on the Pong environment. If pre-trained weights are available, we can load them; otherwise, we'll start training.

In [None]:
# Start training the DQN agent
start_training = False  # Change to True to start training
if start_training:
    dqn.fit(env, nb_steps=1000000, log_interval=10000)
    # Save the final weights
    dqn.save_weights('dqn_pong_weights.h5f', overwrite=True)
else:
    # Load pre-trained weights if available
    dqn.load_weights('dqn_pong_weights.h5f')

## Step 7: Testing the Trained Agent
After training, we can test the agent's performance by running it on a few episodes and visualizing the results.

In [None]:
# Testing the agent's performance after training
dqn.test(env, nb_episodes=5, visualize=True)

## Results and Analysis

After training, the DQN agent should play Pong competently, which can be observed during the test episodes. A Pong game ends when one side reaches 21 points, so the episode reward (agent points minus opponent points) lies between -21 and +21. An agent that plays well should win consistently, that is, finish most episodes with a positive reward.

Here are some important metrics to observe:
- **Total rewards**: The reward the agent accumulates over an episode helps evaluate performance.
- **Win rate**: The number of episodes the agent wins divided by the total number of episodes.
- **Policy improvement**: Observe how the agent's strategy evolves over time.
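
The first two metrics can be computed from a list of per-episode rewards, as in the sketch below. The function name `summarize_episodes` and the reward values are hypothetical and purely illustrative; they are not results from this notebook. A win is assumed to be any episode with a positive total reward.

```python
def summarize_episodes(episode_rewards):
    """Return the mean episode reward and the fraction of episodes won."""
    if not episode_rewards:
        raise ValueError("need at least one episode")
    n = len(episode_rewards)
    wins = sum(1 for reward in episode_rewards if reward > 0)
    return {"mean_reward": sum(episode_rewards) / n, "win_rate": wins / n}


# Hypothetical episode rewards, for illustration only
rewards = [-21, -15, 3, 12, 20]
summary = summarize_episodes(rewards)
print(summary)  # {'mean_reward': -0.2, 'win_rate': 0.6}
```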

If the training performance is not satisfactory, consider the following improvements:

- **Extend the number of training steps**: Training for a larger number of steps can help the agent to explore more and learn better.
- **Fine-tune hyperparameters**: Adjust learning rate, memory size, exploration policy parameters, etc., for improved results.
- **Improve preprocessing of frames**: Ensuring that the preprocessing of frames is optimal can lead to better state representations.
- **Use advanced algorithms**: Consider using Double DQN, Dueling DQN, or Prioritized Experience Replay for potentially better performance.
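
To illustrate the first of these alternatives, the sketch below contrasts the standard DQN target with the Double DQN target for a single transition. It is a plain-Python illustration with hypothetical function names and made-up Q-values, not code from keras-rl. Standard DQN takes the maximum of the target network's Q-values, whereas Double DQN selects the action with the online network and evaluates it with the target network, which tends to reduce overestimation.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN: bootstrap from the target network's best Q-value."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: pick the action with the online net, score it with the target net."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Made-up Q-values for the next state (three actions), for illustration only
next_q_online = [0.2, 0.9, 0.4]
next_q_target = [0.5, 0.3, 1.1]

print(dqn_target(0.0, next_q_target, done=False))
print(double_dqn_target(0.0, next_q_online, next_q_target, done=False))
```

In this example the standard target bootstraps from 1.1, while the Double DQN target uses 0.3, the target network's value for the action the online network prefers.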

## Conclusion

In this notebook, we trained a reinforcement learning agent using the Deep Q-Learning (DQN) algorithm to play Pong. The process involved:

1. **Setting up the environment**: Utilizing OpenAI Gym to create the Pong environment.
2. **Designing a CNN-based neural network**: Building a neural network suitable for processing frames from the game and predicting Q-values.
3. **Configuring the DQN agent**: Setting up the DQN agent with the neural network, policy, and memory.
4. **Preprocessing frames**: Converting frames to grayscale, resizing, and stacking consecutive frames to provide temporal context.
5. **Training the agent**: Training the agent over multiple episodes and updating its policy using the experience gained.
6. **Testing performance**: Evaluating the agent's performance after training to assess how well it learned to play the game.

This approach demonstrates how neural networks can be integrated with reinforcement learning techniques to create agents capable of learning and mastering complex tasks such as playing Pong.

# Using `stable-baselines3`

`stable-baselines3` is an alternative library that provides similar functionality and may cause fewer compatibility problems.

## Setup
## 1: Download the ROMs
1. **Visit the Atari ROMs repository**:
   - The repository can be found [here](https://github.com/openai/atari-py#roms).
  

2. **Download the `ROMS.zip` file**:
   - Follow the instructions provided in the repository to obtain the `ROMS.zip` file, or download the collection from the alternative archive [here](https://www.atarimania.com/rom_collection_archive_atari_2600_roms.html).

3. **Extract the `ROMS.zip` to a directory of your choice**:
   - Use your preferred file extraction tool to unzip the `ROMS.zip` file to a directory on your computer.

## 2: Install the ROMs using `ale-import-roms`

1. **Use the `ale-import-roms` utility**:
   - This utility will load the ROMs into the Arcade Learning Environment (ALE).

2. **Run the following command in your terminal**:
   ```bash
   ale-import-roms <path-to-rom-directory>
   ```

   Replace `<path-to-rom-directory>` with the actual path to the directory where you extracted the ROMs. For example:

   ```bash
   ale-import-roms /home/user/Downloads/ROMS
   ```

## Step 1: Set Up the Environment
First, ensure you have all necessary libraries installed. If not, you can install them using pip:

In [18]:
! pip install gym[atari] stable-baselines3 opencv-python



## Step 2: Import Libraries and Initialize Environment


In [19]:
import gym
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines3 import DQN
from stable_baselines3.common.atari_wrappers import AtariWrapper
from stable_baselines3.common.callbacks import CheckpointCallback

# Set up the Pong environment with Atari wrappers
env = gym.make("PongNoFrameskip-v4")
env = AtariWrapper(env)

# Verify action space
nb_actions = env.action_space.n
print(f"Number of actions: {nb_actions}")

Error: We're Unable to find the game "Pong". Note: Gym no longer distributes ROMs. If you own a license to use the necessary ROMs for research purposes you can download them via `pip install gym[accept-rom-license]`. Otherwise, you should try importing "Pong" via the command `ale-import-roms`. If you believe this is a mistake perhaps your copy of "Pong" is unsupported. To check if this is the case try providing the environment variable `PYTHONWARNINGS=default::ImportWarning:ale_py.roms`. For more information see: https://github.com/mgbellemare/Arcade-Learning-Environment#rom-management

## Step 3: Build and Configure the DQN Agent
`stable-baselines3` lets us build and configure a DQN agent with a single constructor call.

In [20]:
# Define model parameters
policy_kwargs = dict(
    net_arch=[512],
)

# Create the DQN agent
model = DQN(
    "CnnPolicy",
    env,
    learning_rate=0.0001,
    buffer_size=10000,
    learning_starts=1000,
    batch_size=32,
    tau=1.0,
    gamma=0.99,
    train_freq=4,
    target_update_interval=1000,
    exploration_fraction=0.1,
    exploration_final_eps=0.02,
    policy_kwargs=policy_kwargs,
    verbose=1
)

NameError: name 'env' is not defined
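
The `tau` and `target_update_interval` settings control how the target network follows the online network. The sketch below shows the underlying update rule, `target = tau * online + (1 - tau) * target`, on plain lists of numbers. The helper `polyak_update` here is a hypothetical stand-in written for illustration, not the library's own function.

```python
def polyak_update(online, target, tau):
    """Blend online weights into target weights: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for w, t in zip(online, target)]


online_weights = [1.0, 2.0]
target_weights = [0.0, 0.0]

# tau=1.0 (as configured above): the target becomes an exact copy of the online weights
hard = polyak_update(online_weights, target_weights, tau=1.0)
print(hard)  # [1.0, 2.0]

# A smaller tau moves the target only part of the way towards the online weights
soft = polyak_update(online_weights, target_weights, tau=0.1)
print(soft)
```

With `tau=1.0`, each update is therefore a full copy, performed every `target_update_interval` steps.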

## Step 4: Training the DQN Agent
We'll set up training with a callback to save model checkpoints periodically.

In [21]:
# Define checkpoint callback
checkpoint_callback = CheckpointCallback(save_freq=10000, save_path='./logs/', name_prefix='dqn_pong_model')

# Train the model
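start_training = True  # Set to False to load a previously saved model instead
if start_training:
    model.learn(total_timesteps=100000, callback=checkpoint_callback)
    model.save("dqn_pong_model")
else:
    # DQN.load is a class method that returns a new model, so the result must be assigned
    model = DQN.load("dqn_pong_model", env=env)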

AttributeError: 'Sequential' object has no attribute 'learn'

## Step 5: Testing the Trained Agent
After training, we will test the agent's performance.

In [None]:
# Test the agent's performance after training
env = gym.make("PongNoFrameskip-v4")
env = AtariWrapper(env)
obs = env.reset()
for _ in range(5000):
    # deterministic=True disables random exploration during evaluation
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        # Start a new episode once the current game is over
        obs = env.reset()
env.close()

## Conclusion

In this part of the notebook, we repeated the Pong experiment with the DQN implementation from `stable-baselines3`. The process involved:

1. **Installing the ROMs**: Importing the Atari ROMs with `ale-import-roms` so that Gym can create the Pong environment.
2. **Setting up the environment**: Creating `PongNoFrameskip-v4` and wrapping it in `AtariWrapper`, which handles frame preprocessing such as grayscale conversion and resizing.
3. **Configuring the DQN agent**: Creating a `DQN` model with the built-in `CnnPolicy` and the chosen hyperparameters.
4. **Training the agent**: Calling `model.learn` with a `CheckpointCallback` that saves the model periodically.
5. **Testing performance**: Running the trained model in the environment to assess how well it learned to play the game.

Compared with the first approach, the library supplies the CNN and the preprocessing, so far less code is needed to combine neural networks with reinforcement learning for a task such as Pong.