# **Introduction**

This notebook implements a Deep Q-Network (DQN) agent for the `CartPole` environment provided by Gymnasium. Gymnasium is an open-source Python library that offers a standardized API for developing and comparing reinforcement learning algorithms.
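
As a rough orientation for what a DQN learns, the network's estimate of Q(s,a) is regressed towards a one-step target built from the reward and the best next-state value reported by a separate target network. The snippet below is a minimal sketch of that target in plain Python, not the update code used later in this notebook. The function name `q_targets`, the example numbers, and the choice to drop the bootstrap term on terminal transitions are assumptions made for illustration.

```python
def q_targets(rewards, next_q_values, dones, gamma):
    """One-step Q-targets: r + gamma * max_a' Q_target(s', a').

    The bootstrap term is dropped for terminal transitions (assumed convention).
    """
    targets = []
    for r, next_q, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(next_q)
        targets.append(r + bootstrap)
    return targets


# hypothetical mini-batch of two transitions, two actions each:
rewards = [1.0, 1.0]
next_q_values = [[0.5, 2.0], [3.0, 1.0]]  # target-network outputs for s'
dones = [False, True]

targets = q_targets(rewards, next_q_values, dones, gamma=0.5)
print(targets)  # [2.0, 1.0]
```

The first transition bootstraps from the larger next-state value (1.0 + 0.5 * 2.0), while the second is terminal and keeps only its reward.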

# **Import Packages**

This section imports the necessary packages:

In [32]:
# import these packages:
import gymnasium as gym
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam

from collections import deque

# **Environment Setup**

This section sets up the environment and defines the relevant functions needed for this implementation.

##### Function for Making Keras Models:

In [33]:
# function for making a keras model:
def make_model(layers, neurons, rate, norm, drop, input_shape, output_shape, loss_function):
    # instantiate model and declare the input shape once, before any hidden layers:
    model = keras.Sequential()
    model.add(Input(shape = (input_shape, )))

    # add hidden layers:
    for i in range(layers):
        model.add(Dense(neurons, activation = 'relu', name = f'hidden_layer_{i+1}'))

        # optional batch normalization after each hidden layer:
        if norm:
            model.add(BatchNormalization(name = f'batch_norm_layer_{i+1}'))

        # optional dropout (fixed rate of 0.2) after each hidden layer:
        if drop:
            model.add(Dropout(0.2, name = f'dropout_layer_{i+1}'))

    # add output layer (linear activation, one unit per output):
    model.add(Dense(output_shape, activation = 'linear', name = 'output_layer'))

    # compile the model:
    model.compile(optimizer = Adam(learning_rate = rate),
                  loss = loss_function)

    return model

##### DQN Class:
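
Besides its two networks, the agent's constructor in the next cell sets up a bounded experience buffer, a mini-batch size, and an exploration schedule. The snippet below is a small, self-contained sketch of how such pieces are commonly used. The transition tuple layout, the helper name `decay_epsilon`, the tiny buffer size, and the multiplicative decay rule with a floor are assumptions for illustration and may differ from the methods defined later in the class.

```python
import random
from collections import deque

# bounded replay buffer: once maxlen is reached, the oldest entry is dropped
experience = deque(maxlen=5)  # the notebook uses maxlen = 100000
for t in range(8):
    # hypothetical transition tuple: (state, action, reward, next_state, done)
    experience.append((t, 0, 1.0, t + 1, False))

oldest_state = experience[0][0]  # transitions for states 0, 1, 2 were evicted

# uniform mini-batch sampling without replacement
random.seed(0)
mini_batch = random.sample(list(experience), 3)


def decay_epsilon(epsilon, epsilon_decay=0.995, epsilon_final=0.01):
    """Multiplicative decay that never drops below epsilon_final (assumed schedule)."""
    return max(epsilon_final, epsilon * epsilon_decay)


epsilon = 1.0
for _ in range(1000):
    epsilon = decay_epsilon(epsilon)
```

With a decay rate of 0.995, ε falls below 0.01 after roughly 920 applications, so the floor is what the agent ends up using for the rest of training under this assumed schedule.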

In [34]:

# DQN agent class:
class DQN_Agent:
    ####################### INITIALIZATION #######################
        # constructor:
        def __init__(self, 
            env: gym.Env, 
            gamma: float, 
            alpha: float,
            epsilon: float):
            """ 
            this is the constructor for the agent. this agent uses a DQN to learn an optimal policy: an approximator neural network estimates
            the action-value function Q(s,a), and a separate target network generates the Q-targets used to update it. keeping the target network
            separate means that updates to the approximator's weights do not immediately shift the target, so we aren't bootstrapping towards
            a moving target. this helps to stabilize the learning.

            env:        a gymnasium environment
            gamma:      a float value indicating the discount factor
            alpha:      a float value indicating the learning rate
            epsilon:    a float value indicating the initial exploration probability ε

            nS:         an int representing the number of state variables in each observation, each of which is continuous
            nA:         an int representing the number of discrete actions that can be taken

            approximator_network:       a Keras sequential neural network representing the actual function approximator
            target_network:             a Keras sequential neural network responsible for generating Q-targets

            experience:         an empty deque used to hold the experience history of the agent, capped at 100,000 entries
            mini_batch:         an int representing the size of the mini-batch to be sampled from the experience

            epsilon_final:      a float representing the desired final ε value
            epsilon_decay:      a float representing the desired ε decay rate

            """
            # object parameters:
            self.env = env
            self.gamma = gamma
            self.alpha = alpha
            self.epsilon = epsilon

            # get the environment dimensions:
            self.nS = self.env.observation_space.shape[0]
            self.nA = self.env.action_space.n

            # initialize networks:
            self.approximator_network = make_model(layers = 2, neurons = 64, rate = self.alpha,
                                                   norm = False, drop = False,
                                                   input_shape = self.nS, output_shape = self.nA,
                                                   loss_function = 'mse')
            self.target_network = keras.models.clone_model(self.approximator_network)
            self.target_network.set_weights(self.approximator_network.get_weights())

            # experience history and mini-batch size:
            self.experience = deque(maxlen = 100000)
            self.mini_batch = 64

            # exploration schedule:
            self.epsilon_final = 0.01
            self.epsilon_decay = 0.995
