<a href="https://colab.research.google.com/github/LoQiseaking69/Algo_Note_Books/blob/main/DEMO/DQN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Deep Q-Network (DQN) Demonstration

## Introduction

### What is Reinforcement Learning?
Reinforcement Learning (RL) is a type of machine learning in which an agent learns to make decisions by taking actions in an environment so as to maximize its cumulative reward. After each action, the agent receives feedback in the form of a reward (or penalty) and observes the new state of the environment.

### What are Deep Q-Networks (DQN)?
Deep Q-Networks (DQN) are an advancement in RL that combine traditional Q-Learning with deep neural networks. DQNs can handle high-dimensional input spaces, making them suitable for problems like video game playing and robotic control.
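
To make the "traditional Q-Learning" part concrete, here is a minimal sketch of a single tabular Q-learning update. The table size, the learning rate `alpha`, and the sample transition are illustrative assumptions, not values used later in this notebook.

```python
import numpy as np

# Hypothetical toy problem: 3 states, 2 actions, all Q-values start at zero.
q_table = np.zeros((3, 2))
alpha = 0.5   # learning rate (assumed for illustration)
gamma = 0.95  # discount factor

def q_update(q_table, state, action, reward, next_state, done):
    """Apply one Q-learning update in place and return the TD target."""
    # Target: immediate reward plus the discounted best next value,
    # with no bootstrap term once the episode has ended.
    target = reward + gamma * np.max(q_table[next_state]) * (1 - done)
    q_table[state, action] += alpha * (target - q_table[state, action])
    return target

# One made-up transition: in state 0, action 1 yields reward 1.0 and leads to state 2.
target = q_update(q_table, state=0, action=1, reward=1.0, next_state=2, done=False)
print(q_table[0, 1])
```

A DQN keeps the same target but replaces the table with a neural network that predicts the row of Q-values for a given state, which is what makes large or continuous state spaces tractable.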

In this notebook, we will:
1. Explore the basic concepts of DQN.
2. Implement a DQN model.
3. Train and evaluate the model on a sample environment.

Let's start by installing the dependencies and importing the libraries we need.


In [None]:
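%%capture
# gym is pinned to 0.25.2 because the `new_step_api` argument used below was removed in gym 0.26
!pip install gym==0.25.2 tensorflow numpy matplotlib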

In [None]:
import gym
import random
import collections
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

### Setting Up the Environment
We will use CartPole-v1, a standard OpenAI Gym environment in which the agent balances a pole on a moving cart. Let's initialize it.

In [None]:
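# new_step_api=True (available in gym 0.25.x) makes env.step() return
# (observation, reward, terminated, truncated, info)
env = gym.make('CartPole-v1', new_step_api=True)
state_size = env.observation_space.shape[0]  # 4 observation values
action_size = env.action_space.n             # 2 discrete actions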

With the environment set up, we can now proceed to discuss the components of the DQN algorithm.

## DQN Components

In this section, we will explore the key components of a Deep Q-Network.

### Neural Network Architecture
The neural network in a DQN approximates the Q-function: it takes a state as input and outputs one estimated Q-value per action. For our CartPole example, we'll use a simple network with fully connected layers.

In [None]:
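def build_model(state_size, action_size):
    model = Sequential()
    # States are stored with shape (1, state_size); Flatten turns each into a flat vector
    model.add(Flatten(input_shape=(1, state_size)))
    # Two hidden layers of 24 ReLU units each
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    # Linear output layer: one Q-value per action
    model.add(Dense(action_size, activation='linear'))
    return model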
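
To illustrate what "function approximator" means in terms of shapes, the following sketch runs the same 4 → 24 → 24 → 2 layout as a plain NumPy forward pass. It is not the Keras model: the weights are random stand-ins, and the names `state_dim`, `hidden`, `n_actions`, and `forward` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, hidden, n_actions = 4, 24, 2  # CartPole: 4 observation values, 2 actions

# Randomly initialised weights stand in for trained parameters (illustration only).
w1, b1 = rng.normal(size=(state_dim, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
w3, b3 = rng.normal(size=(hidden, n_actions)), np.zeros(n_actions)

def forward(states):
    """Map a batch of states, shape (batch, 4), to Q-values, shape (batch, 2)."""
    h = np.maximum(0, states @ w1 + b1)  # first hidden layer with ReLU
    h = np.maximum(0, h @ w2 + b2)       # second hidden layer with ReLU
    return h @ w3 + b3                   # linear output: one Q-value per action

q = forward(np.zeros((1, state_dim)))
n_params = sum(a.size for a in (w1, b1, w2, b2, w3, b3))
print(q.shape, n_params)
```

The parameter count shows how small this network is, which is one reason it trains quickly on CartPole.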

### Experience Replay
Experience replay stores past transitions in a replay buffer and trains the DQN on random minibatches drawn from it. Sampling at random breaks the correlation between consecutive transitions, which helps stabilize learning.

In [None]:
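class ReplayBuffer:
    def __init__(self, buffer_size):
        # A bounded deque drops the oldest transition once it is full
        self.buffer = collections.deque(maxlen=buffer_size)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw transitions uniformly at random, without replacement
        minibatch = random.sample(self.buffer, batch_size)
        # Transpose into (states, actions, rewards, next_states, dones) arrays
        return tuple(map(np.array, zip(*minibatch)))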
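
As a rough illustration of what the buffer holds and what sampling returns, the sketch below fills a small deque with made-up transitions and draws a minibatch in the same way as `sample`. The capacity of 5 and the transition values are placeholders chosen only to make the eviction and the array shapes easy to see.

```python
import collections
import random

import numpy as np

demo_buffer = collections.deque(maxlen=5)  # small capacity to show eviction

# Made-up transitions: (state, action, reward, next_state, done)
for t in range(8):
    s = np.full((1, 4), t, dtype=float)
    s_next = np.full((1, 4), t + 1, dtype=float)
    demo_buffer.append((s, t % 2, 1.0, s_next, False))

# Only the 5 most recent transitions remain; the 3 oldest were dropped.
print(len(demo_buffer))

# Transpose a random minibatch of transitions into per-field arrays.
batch = random.sample(demo_buffer, 3)
b_states, b_actions, b_rewards, b_next_states, b_dones = map(np.array, zip(*batch))
print(b_states.shape, b_actions.shape)
```

Because each state is stored with shape `(1, 4)`, a minibatch of states has shape `(batch, 1, 4)`, which matches the `input_shape=(1, state_size)` used by the network.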


### Exploration vs Exploitation
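A key challenge in RL is the trade-off between exploration (trying new actions) and exploitation (choosing the action currently believed to be best). This is often managed with an ε-greedy strategy: with probability ε the agent picks a random action, and otherwise it picks the action with the highest predicted Q-value.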


In [None]:
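def choose_action(state, epsilon):
    # Explore: with probability epsilon, pick a random action
    if np.random.rand() <= epsilon:
        return random.randrange(action_size)
    # Exploit: otherwise pick the action with the highest predicted Q-value.
    # The model expects input of shape (batch, 1, state_size), so add a batch axis.
    q_values = model.predict(state[np.newaxis], verbose=0)
    return np.argmax(q_values[0])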

With these components in place, we are ready to build and train our DQN model.


## Implementing DQN

Now that we have discussed the components of DQN, let's implement it.

### Building the DQN Model
Using the function `build_model` we defined earlier, we can create our DQN model.

In [None]:
model = build_model(state_size, action_size)
model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))

### Defining the Replay Buffer
We instantiate our ReplayBuffer class for storing and sampling experiences.

In [None]:
replay_buffer = ReplayBuffer(buffer_size=100000)


### Setting the Hyperparameters
The environment was already initialized under "Setting Up the Environment". We now define the remaining hyperparameters for our DQN agent.


In [None]:
epsilon = 1.0          # Exploration rate
epsilon_min = 0.01     # Minimum exploration rate
epsilon_decay = 0.995  # Decay rate for exploration
batch_size = 64        # Batch size for training
gamma = 0.95           # Discount factor for future rewards
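
To get a feel for the schedule these values imply, the following sketch counts how many per-episode decay steps it takes for the exploration rate to fall to its minimum. It uses separate names (`eps`, `eps_min`, `eps_decay`) so that running it does not alter the agent's actual `epsilon`.

```python
# Same values as the hyperparameters above, under different names
eps, eps_min, eps_decay = 1.0, 0.01, 0.995

decay_steps = 0
while eps > eps_min:
    eps *= eps_decay   # one multiplicative decay per episode
    decay_steps += 1

print(decay_steps)
```

With these settings the exploration rate only reaches its floor after a little over 900 episodes, so the agent keeps exploring for most of a 1000-episode run. A smaller `epsilon_decay` would shift it toward exploitation sooner.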


With the model, replay buffer, and environment set up, we are ready to train our DQN agent.




## Training the DQN Agent

Training a DQN agent involves interacting with the environment and using the gathered experiences to improve our policy.
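
Before looking at the full loop, it may help to see the target computation in isolation. The sketch below applies it to a hand-made batch of two transitions; the helper name `dqn_targets` and all numbers are assumptions made for illustration.

```python
import numpy as np

def dqn_targets(q_values, next_q_values, actions, rewards, dones, discount=0.95):
    """Return a copy of q_values with the taken actions' entries replaced by TD targets."""
    # r + discount * max_a' Q(s', a'), with no bootstrap term for terminal transitions
    td_targets = rewards + discount * np.max(next_q_values, axis=1) * (1 - dones)
    targets = q_values.copy()
    targets[np.arange(len(actions)), actions] = td_targets
    return targets

# Toy batch of two transitions (made-up numbers).
toy_q = np.array([[1.0, 2.0], [0.5, 0.1]])
toy_next_q = np.array([[3.0, 4.0], [9.0, 9.0]])
toy_targets = dqn_targets(
    toy_q,
    toy_next_q,
    actions=np.array([0, 1]),
    rewards=np.array([1.0, 1.0]),
    dones=np.array([False, True]),
)
print(toy_targets)
```

Only the entry for the action actually taken changes in each row. The other entries keep their predicted values, so they contribute zero error when the network is fitted with an MSE loss.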

### Training Loop
Below is the main loop for training our DQN agent.

In [None]:
num_episodes = 1000  # Total number of episodes for training
total_rewards = []   # Score of every episode, used for the plots below

for e in range(num_episodes):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    total_reward = 0

    for time in range(500):
        action = choose_action(state, epsilon)
        # new_step_api=True makes step() return five values
        next_state, reward, terminated, truncated, _ = env.step(action)
        next_state = np.reshape(next_state, [1, state_size])

        # Store `terminated` so that a time-limit truncation still bootstraps
        replay_buffer.add(state, action, reward, next_state, terminated)

        state = next_state
        total_reward += reward

        if len(replay_buffer.buffer) > batch_size:
            states, actions, rewards, next_states, dones = replay_buffer.sample(batch_size)

            q_values = model.predict(states, verbose=0)
            next_q_values = model.predict(next_states, verbose=0)

            # TD target: r + gamma * max_a' Q(s', a'), with no bootstrap term at terminal states
            target_q_values = rewards + gamma * np.max(next_q_values, axis=1) * (1 - dones)

            # Only the Q-value of the action taken is moved toward its target
            targets_full = q_values
            indices = np.arange(batch_size)
            targets_full[indices, actions] = target_q_values

            model.fit(states, targets_full, epochs=1, verbose=0)

        if terminated or truncated:
            break

    total_rewards.append(total_reward)
    print(f"Episode: {e}/{num_episodes}, Score: {total_reward}")

    if epsilon > epsilon_min:
        epsilon *= epsilon_decay

Streaming output truncated to the last 5000 lines.
Episode: 412/1000, Score: 146.0
Episode: 413/1000, Score: 117.0
Episode: 414/1000, Score: 80.0
Episode: 415/1000, Score: 115.0
Episode: 416/1000, Score: 150.0
Episode: 417/1000, Score: 290.0
Episode: 418/1000, Score: 341.0
Episode: 419/1000, Score: 172.0
Episode: 420/1000, Score: 19.0
Episode: 421/1000, Score: 204.0


### Tracking Learning Progress
We can track the agent's learning progress by plotting the total rewards obtained in each episode.

In [None]:
plt.plot(total_rewards)
plt.ylabel('Total Rewards')
plt.xlabel('Episodes')
plt.show()


After training, we can evaluate the performance of our DQN agent.



### Saving the Model

In [None]:
# Save the trained model
model.save('deepplate.h5')
print("Model saved successfully.")


## Evaluation and Visualization

Once the DQN agent is trained, it's important to evaluate its performance.

### Evaluating the Trained Model
We can test the trained model by running it on the environment without exploration.

In [None]:
for e in range(10):  # Test for 10 episodes
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    total_reward = 0

    for time in range(500):
        # Always act greedily: no exploration during evaluation
        action = np.argmax(model.predict(state[np.newaxis], verbose=0)[0])
        # new_step_api=True makes step() return five values
        next_state, reward, terminated, truncated, _ = env.step(action)
        next_state = np.reshape(next_state, [1, state_size])
        state = next_state
        total_reward += reward
        if terminated or truncated:
            break

    print(f"Test Episode: {e}, Score: {total_reward}")



### Visualizing Performance Metrics
Plotting the rewards or other metrics over time can give insights into the learning process and performance.


In [None]:
plt.plot(total_rewards)
plt.xlabel('Episodes')
plt.ylabel('Reward')
plt.title('Rewards per Episode')
plt.show()
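
Per-episode rewards in DQN training tend to be noisy, so a smoothed curve is often easier to read. The helper below is one possible sketch using a simple moving average; the name `moving_average` and the default window of 50 episodes are assumptions, not part of the original notebook.

```python
import numpy as np

def moving_average(values, window=50):
    """Average each run of `window` consecutive values (no padding at the edges)."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return values  # too short to smooth
    return np.convolve(values, np.ones(window) / window, mode="valid")

demo_scores = [10, 20, 30, 40, 50]  # made-up scores
print(moving_average(demo_scores, window=2))
```

After training, `plt.plot(moving_average(total_rewards))` would draw the smoothed curve, which is `window - 1` points shorter than the raw one.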