Deep Q-Learning (Reinforcement Learning)

Deep Q-Learning combines Q-learning with a deep neural network that estimates how valuable each action is in a given state, and uses those estimates for decision-making in reinforcement learning. It is commonly used in game AI, robotics, and autonomous systems.

What is Reinforcement Learning?

In Reinforcement Learning (RL), an agent interacts with an environment and learns by receiving rewards or penalties based on its actions.

The goal is to maximize cumulative rewards over time.
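Cumulative reward is usually measured as a discounted return: a reward received k steps in the future is weighted by gamma**k, and the agent later in this notebook uses gamma = 0.95. A minimal Python sketch (the function name discounted_return is our own, chosen for illustration):

```python
def discounted_return(rewards, gamma=0.95):
    """Return G = r_0 + gamma*r_1 + gamma**2*r_2 + ... for one episode."""
    g = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# CartPole gives +1 for every step the pole stays up
print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.95 + 0.95**2, approximately 2.8525
```

A discount below 1 keeps the return finite for long episodes and makes the agent prefer rewards that arrive sooner.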

Key Components of RL

Agent: The decision-making entity.

Environment: The world the agent interacts with.

State (S): A representation of the environment at a given time.

Action (A): A move the agent can take; the set of all possible moves is the action space.

Reward (R): A numerical feedback signal the environment returns after each action.

Example: training an agent to play games such as Flappy Bird or classic Atari titles.
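The components above fit together in a simple loop: the agent observes a state, picks an action, and the environment answers with the next state and a reward. Below is a minimal sketch of that loop; LineWorld is a hypothetical toy environment invented for this illustration (not part of any library), with positions 0 to 4 and a reward of +1 for reaching position 4.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along positions 0..length-1;
    reaching the last position gives reward +1 and ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial state

    def step(self, action):
        # action 0 = move left, action 1 = move right (clipped to the line)
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe the state, act, collect the reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # Ignores the state entirely: pure exploration
    return random.randrange(2)


print(run_episode(LineWorld(), random_policy))
```

The learning methods that follow differ only in how `policy` is built: Q-learning chooses the action with the highest estimated Q-value for the current state.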

Deep Q-Learning (DQN)

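Classic Q-learning stores a Q-value (the expected discounted return of taking an action in a state) in a table with one entry per (state, action) pair. That table becomes impractical when the state space is large or continuous, so Deep Q-Learning instead uses a Deep Neural Network (DNN) to approximate the Q-values.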
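For comparison, the tabular update that Deep Q-Learning approximates is Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). A minimal NumPy sketch; the table size and the learning rate alpha are illustrative assumptions, not values from this notebook:

```python
import numpy as np

n_states, n_actions = 5, 2           # illustrative sizes
Q = np.zeros((n_states, n_actions))  # the Q-table: one entry per (state, action)
alpha, gamma = 0.1, 0.95             # learning rate, discount factor


def q_learning_update(Q, state, action, reward, next_state, done):
    # Bellman target: no future value after a terminal transition
    target = reward if done else reward + gamma * np.max(Q[next_state])
    # Move the stored estimate a fraction alpha toward the target
    Q[state, action] += alpha * (target - Q[state, action])


q_learning_update(Q, state=3, action=1, reward=1.0, next_state=4, done=True)
print(Q[3, 1])  # 0.1
```

CartPole's state is four continuous numbers, so there is no finite list of rows to index; the neural network replaces the table lookup `Q[state]` with a function call that generalizes across similar states.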

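Steps in Deep Q-Learning:

Use a Neural Network to predict a Q-value for every action in the current state.

Store past experiences (state, action, reward, next state, done) in a Replay Buffer and train on random minibatches sampled from it. This breaks the correlation between consecutive steps and lets each experience be reused many times.

Train the network using Mean Squared Error (MSE) loss between its predicted Q-values and the targets reward + gamma * max Q(next_state, a'), where gamma is the discount factor.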
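The replay buffer and the target computation can be sketched in NumPy as follows. This is an illustrative sketch: the names ReplayBuffer and bellman_targets are our own, and the stored transitions are dummy values used only to show the shapes.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=2000):
        self.memory = deque(maxlen=capacity)  # oldest experiences are dropped first

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones.astype(np.float32)


def bellman_targets(rewards, next_q_values, dones, gamma=0.95):
    # r + gamma * max_a' Q(s', a'), with no bootstrapping from terminal states
    return rewards + gamma * next_q_values.max(axis=1) * (1.0 - dones)


buffer = ReplayBuffer(capacity=3)
for t in range(5):  # only the last 3 transitions are kept
    state = np.full(4, t, dtype=np.float32)
    next_state = np.full(4, t + 1, dtype=np.float32)
    buffer.add(state, t % 2, 1.0, next_state, t == 4)

states, actions, rewards, next_states, dones = buffer.sample(2)
print(states.shape)  # (2, 4)
```

In a full agent, `next_q_values` would be the network's predictions for `next_states`, and the resulting targets feed the MSE loss.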


Implementing Deep Q-Learning in Python

We will train an agent to balance the pole in the CartPole environment (two actions: push the cart left or push it right) using Deep Q-Learning.

Step 1: Install Dependencies

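%pip install "gymnasium[classic-control]" tensorflow numpy matplotlib


Step 2: Import Libraries

import gymnasium as gym  # Maintained successor to gym; the alias keeps the rest of the code unchanged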
import numpy as np
import random
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from collections import deque
import matplotlib.pyplot as plt


Step 3: Create the Deep Q-Network (DQN)

def build_model(state_size, action_size):
    model = Sequential([
        tf.keras.Input(shape=(state_size,)),  # State vector input (replaces the older input_dim argument)
        Dense(24, activation="relu"),
        Dense(24, activation="relu"),
        Dense(action_size, activation="linear")  # Output Q-values
    ])
    model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))
    return model


The model takes a state as input and outputs one Q-value per action in a single forward pass; the greedy action is the argmax of that output.
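To make the shapes concrete, here is the same 24-24-action_size architecture written as a plain NumPy forward pass. This is an illustrative sketch assuming CartPole's 4-dimensional state and 2 actions; the random weights stand in for trained ones, so the Q-values themselves are meaningless.

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, action_size = 4, 2  # CartPole: 4 observations, 2 actions

# Random weights standing in for a trained network
W1, b1 = rng.normal(size=(state_size, 24)), np.zeros(24)
W2, b2 = rng.normal(size=(24, 24)), np.zeros(24)
W3, b3 = rng.normal(size=(24, action_size)), np.zeros(action_size)


def relu(x):
    return np.maximum(x, 0.0)


def q_values(state):
    """Forward pass: (batch, state_size) -> (batch, action_size) Q-values."""
    h1 = relu(state @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3  # linear output: one unbounded Q-value per action


state = np.array([[0.01, -0.02, 0.03, 0.04]])
q = q_values(state)
print(q.shape)               # (1, 2)
print(int(np.argmax(q[0])))  # index of the greedy action
```

The output layer is linear because Q-values are unbounded returns, not probabilities.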

Loss Function: Mean Squared Error (MSE) between predicted Q-values and target Q-values.
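Because the training target is a copy of the prediction with only the taken action's entry replaced, the other actions contribute zero error. A small NumPy check with made-up numbers (not training results) illustrates this, assuming Keras' "mse" averages over the action axis and then over the batch:

```python
import numpy as np

# Predicted Q-values for a batch of 2 states and 2 actions (illustrative numbers)
predicted = np.array([[1.0, 2.0],
                      [0.5, 0.0]])
actions = np.array([1, 0])      # action taken in each state
targets = np.array([3.0, 0.5])  # Bellman targets r + gamma * max Q(s', a')

# Copy the predictions and overwrite only the taken action's entry
target_q = predicted.copy()
target_q[np.arange(len(actions)), actions] = targets

# Mean over actions, then over the batch
mse = np.mean((predicted - target_q) ** 2)
print(mse)  # 0.25
```

Row 0 has errors (0, 1) and row 1 has (0, 0), so the mean is 1 / 4. In effect the loss is the squared temporal-difference error of the chosen action, scaled by 1 / action_size.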

Step 4: Create the Deep Q-Learning Agent

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)  # Experience Replay Buffer
        self.gamma = 0.95  # Discount factor
        self.epsilon = 1.0  # Exploration rate: probability of taking a random action
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.model = build_model(state_size, action_size)
    
    # Store experience
    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))
    
    # Choose action (Exploration vs. Exploitation)
    def act(self, state):
        if np.random.rand() <= self.epsilon:  # Random action (exploration)
            return random.randrange(self.action_size)
        q_values = self.model.predict(state, verbose=0)  # Predict Q-values (verbose=0 hides the progress bar)
        return np.argmax(q_values[0])  # Choose best action (exploitation)
    
    # Train the model using experiences from memory
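    def replay(self, batch_size=32):
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)
        
        # Stack the sampled transitions so the network runs once per batch, not once per sample
        states = np.vstack([t[0] for t in minibatch])
        actions = np.array([t[1] for t in minibatch])
        rewards = np.array([t[2] for t in minibatch], dtype=np.float32)
        next_states = np.vstack([t[3] for t in minibatch])
        dones = np.array([t[4] for t in minibatch], dtype=np.float32)
        
        # Bellman Equation: r + gamma * max_a' Q(s', a'), with no future value after a terminal state
        next_q = self.model.predict(next_states, verbose=0)
        targets = rewards + self.gamma * np.max(next_q, axis=1) * (1.0 - dones)
        
        # Only the Q-value of the action actually taken is moved toward its target
        target_q_values = self.model.predict(states, verbose=0)
        target_q_values[np.arange(batch_size), actions] = targets
        
        self.model.fit(states, target_q_values, epochs=1, verbose=0)
        
        if self.epsilon > self.epsilon_min:  # Decay epsilon
            self.epsilon *= self.epsilon_decay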
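The exploration logic in act() and the decay rule in replay() can be examined on their own. A minimal sketch that mirrors them without TensorFlow; epsilon_greedy and decays_until_floor are hypothetical helper names introduced here:

```python
import random


def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the argmax."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decays_until_floor(epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
    """Count how many training calls it takes for the decay rule to stop."""
    calls = 0
    while epsilon > epsilon_min:  # same condition as in DQNAgent.replay
        epsilon *= epsilon_decay
        calls += 1
    return calls


print(epsilon_greedy([0.2, 0.7], epsilon=0.0))  # 1 (always greedy when epsilon is 0)
print(decays_until_floor())                     # 919
```

Because epsilon decays once per successful replay() call, and the training loop below calls replay() once per episode, epsilon stays above 0.1 for roughly the first 460 training episodes and reaches its 0.01 floor only after 919.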


Step 5: Train the Deep Q-Network

env = gym.make("CartPole-v1")  # Create environment
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size)

episodes = 1000
batch_size = 32

for episode in range(episodes):
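    state, _ = env.reset()  # Gymnasium API: reset() returns (observation, info)
    state = np.reshape(state, [1, state_size])
    total_reward = 0
    
    for t in range(500):  # Limit steps per episode (CartPole-v1 is also truncated at 500 steps)
        action = agent.act(state)
        # Gymnasium API: step() returns (observation, reward, terminated, truncated, info)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated  # Episode over: pole fell or time limit reached
        next_state = np.reshape(next_state, [1, state_size])
        
        # Store `terminated`, not `done`: hitting the time limit is not a true terminal state
        agent.remember(state, action, reward, next_state, terminated)  # Store experience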
        state = next_state
        total_reward += reward
        
        if done:
            print(f"Episode {episode}/{episodes} - Score: {total_reward} - Epsilon: {agent.epsilon:.2f}")
            break
            
    agent.replay(batch_size)  # Train on experiences
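matplotlib is imported in Step 2 but never used. One way to put it to work, assuming you append each episode's `total_reward` to a list (called `scores` here, a name we introduce) inside the loop above, is to plot the raw and smoothed learning curve. The helpers below are hypothetical, not part of the original notebook:

```python
import numpy as np


def moving_average(scores, window=50):
    """Mean of each run of `window` consecutive episode scores."""
    scores = np.asarray(scores, dtype=float)
    if len(scores) < window:
        return np.array([])
    return np.convolve(scores, np.ones(window) / window, mode="valid")


def plot_scores(scores, window=50):
    import matplotlib.pyplot as plt  # imported here so moving_average works without matplotlib

    plt.plot(scores, alpha=0.4, label="score per episode")
    smoothed = moving_average(scores, window)
    # The first full window ends at episode index window - 1
    plt.plot(np.arange(window - 1, len(scores)), smoothed, label=f"{window}-episode mean")
    plt.xlabel("Episode")
    plt.ylabel("Total reward")
    plt.legend()
    plt.show()
```

Single-episode scores in CartPole are noisy, so the moving average is the more useful signal for judging whether training is improving.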


Step 6: Evaluate the Trained Agent

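# render_mode must be chosen when the environment is created; "human" opens a window
eval_env = gym.make("CartPole-v1", render_mode="human")
state, _ = eval_env.reset()
state = np.reshape(state, [1, state_size])

for _ in range(500):
    # With render_mode="human", each step() call draws a frame, so no explicit render() is needed
    action = np.argmax(agent.model.predict(state, verbose=0)[0])  # Use best action (no exploration)
    next_state, _, terminated, truncated, _ = eval_env.step(action)
    state = np.reshape(next_state, [1, state_size])
    if terminated or truncated:
        break

eval_env.close()
env.close()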
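A single rendered episode is a noisy measure of the learned policy. A sketch that averages the greedy score over several episodes; evaluate_greedy is a hypothetical helper, and it assumes the Gymnasium-style API (reset() returning (observation, info) and step() returning five values):

```python
import numpy as np


def evaluate_greedy(predict_q, env, n_episodes=10, max_steps=500):
    """Average total reward of the greedy policy over several episodes.

    predict_q(state) must return Q-values of shape (1, n_actions) for a
    state of shape (1, state_size)."""
    scores = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        state = np.reshape(state, [1, -1])
        total = 0.0
        for _ in range(max_steps):
            action = int(np.argmax(predict_q(state)[0]))  # no exploration
            state, reward, terminated, truncated, _ = env.step(action)
            state = np.reshape(state, [1, -1])
            total += reward
            if terminated or truncated:
                break
        scores.append(total)
    return float(np.mean(scores))
```

For the agent above, a call could look like `evaluate_greedy(lambda s: agent.model.predict(s, verbose=0), gym.make("CartPole-v1"))`, using an environment created without rendering so evaluation runs quickly.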
