<font size = 7>DELE ST1504 CA2 Part B: Reinforcement Learning </font>
<hr>
<font size = 4>
Name: Lee Hong Yi & Yadanar Aung<br>
Admin No: 2223010 & <><br>
Class: DAAA/FT/2B/07<br>
</font>
<hr>

**Objective:**  
Develop a model using a modified Deep Q-Network (DQN) architecture to balance a pendulum. The model should apply suitable torque to maintain the pendulum in an upright position. The primary focus is on demonstrating the effectiveness of the DQN in this context, with the possibility of exploring other reinforcement learning architectures after the successful implementation of DQN.

**Background:**  
Deep Q-Networks are a class of deep reinforcement learning algorithms that combine Q-Learning with deep neural networks. This project aims to apply DQN to the classic control problem of pendulum balancing, a benchmark challenge in the reinforcement learning field. The goal is to train a model that can learn the optimal strategy to keep the pendulum balanced by applying the correct amount of torque.
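
For reference, the standard DQN update can be written as below. This is the textbook formulation with a target network, not necessarily the modified variant used in this project. The symbols $\theta$ (online network weights), $\theta^{-}$ (target network weights), and $\gamma$ (discount factor) are notation assumed here for illustration.

$$
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad
\mathcal{L}(\theta) = \mathbb{E}\left[\big(y - Q(s, a; \theta)\big)^{2}\right]
$$

Here $(s, a, r, s')$ is a transition sampled from the replay buffer, $y$ is the bootstrapped target, and the network is trained to reduce the squared gap between $y$ and its current estimate $Q(s, a; \theta)$.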

**Key Features:** <br>
Implement a modified version of the DQN algorithm to specifically address the dynamics of pendulum balancing, using the Pendulum environment from OpenAI Gym, which provides a standardized platform for testing the model's performance.
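
DQN chooses among a finite set of actions, whereas Pendulum-v1 expects a continuous torque in the range [-2, 2]. One common modification, sketched here as an assumption rather than this project's confirmed design, is to discretize the torque range into evenly spaced bins and let the network output one Q-value per bin. The names `N_ACTIONS`, `TORQUES`, `index_to_action`, and `select_action` are hypothetical, and the bin count of 9 is an arbitrary, tunable choice.

```python
import numpy as np

MAX_TORQUE = 2.0   # Pendulum-v1 torque limit
N_ACTIONS = 9      # assumed number of discrete torque bins (tunable)

# Evenly spaced torque values covering [-2.0, 2.0]
TORQUES = np.linspace(-MAX_TORQUE, MAX_TORQUE, N_ACTIONS)


def index_to_action(action_index):
    """Map a discrete action index to the 1-element torque array env.step expects."""
    return np.array([TORQUES[action_index]], dtype=np.float32)


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy selection over the discrete torque bins."""
    if rng.random() < epsilon:
        # Explore: pick a random bin
        return int(rng.integers(len(q_values)))
    # Exploit: pick the bin with the highest estimated Q-value
    return int(np.argmax(q_values))
```

In a training loop, the chosen index would be converted with `index_to_action` before being passed to `env.step`. A finer grid gives smoother control but enlarges the output layer and slows exploration.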

**Output Specification:**  
At each timestep the model outputs a control action, a torque value, that is applied to keep the pendulum upright. These actions are reported alongside performance metrics that track learning progress, such as episode duration, balance efficiency, and torque magnitude. The project also produces visualizations of the pendulum's state and behavior over time, together with evaluation metrics such as average reward per episode and training loss over time. The final output is the learned policy, represented either by the model weights or by a graphical depiction, which shows how effectively the model has learned to balance the pendulum.
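
As a rough sketch of how some of these metrics could be computed from logged data, the helpers below summarize a single episode and smooth the per-episode returns. The function names `summarise_episode` and `moving_average`, and the choice of mean absolute torque as the torque-magnitude measure, are assumptions made for illustration rather than the project's actual definitions.

```python
import numpy as np


def summarise_episode(rewards, torques):
    """Return the total reward and mean absolute torque for one episode."""
    rewards = np.asarray(rewards, dtype=float)
    torques = np.asarray(torques, dtype=float)
    return {
        "episode_return": float(rewards.sum()),
        "mean_abs_torque": float(np.abs(torques).mean()),
    }


def moving_average(values, window):
    """Smooth a sequence of per-episode values with a sliding mean."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where the window fully overlaps the data
    return np.convolve(values, kernel, mode="valid")
```

Plotting `moving_average(episode_returns, window)` against the episode index gives a smoothed learning curve, which is usually easier to read than raw returns.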

<hr>
<font size = 5>Performing initial set-up</font>
<hr>

In [51]:
import gym
import time
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from IPython import display as ipythondisplay

In [52]:
from warnings import simplefilter
simplefilter(action='ignore', category=UserWarning)     
simplefilter(action='ignore', category=FutureWarning) 

In [53]:
# Fix random seed for reproducibility
seed = 1
np.random.seed(seed)
tf.random.set_seed(seed)
tf.keras.utils.set_random_seed(seed)  # Also seeds Python's random module and NumPy; reuse the same seed

In [54]:
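# List the available GPUs and enable memory growth so TensorFlow
# does not reserve all GPU memory up front
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    try:
        print(tf.config.experimental.get_device_details(gpu))
        tf.config.experimental.set_memory_growth(gpu, True)
    except (RuntimeError, ValueError) as err:
        # Memory growth must be set before the GPU has been initialized
        print(f"Could not configure {gpu.name}: {err}")
print(f"There are {len(gpus)} GPU(s) present.")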

{'device_name': 'NVIDIA GeForce RTX 3060', 'compute_capability': (8, 6)}
There are 1 GPU(s) present.


In [61]:
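def create_animation(frames):
    """Build a matplotlib animation from a list of RGB frames (image arrays)."""
    fig = plt.figure(figsize=(5, 5))
    plt.axis('off')

    # One artist list per frame, as ArtistAnimation expects
    ims = [[plt.imshow(frame, animated=True)] for frame in frames]

    ani = animation.ArtistAnimation(fig, ims, interval=50, blit=True, repeat_delay=1000)
    plt.close(fig)  # Prevent the static figure from being shown alongside the animation
    return ani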

In [64]:
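# Initialize the environment with frame rendering enabled
env = gym.make('Pendulum-v1', render_mode='rgb_array')
env.reset()
frames = []
for _ in range(50):
    # Apply a random torque to preview the environment
    action = env.action_space.sample()
    _, _, terminated, truncated, _ = env.step(action)
    frames.append(env.render())
    # Start a new episode if the current one has ended
    if terminated or truncated:
        env.reset()

# Close the environment
env.close()

ani = create_animation(frames)
# Render the animation inline as an interactive HTML player
ipythondisplay.display(ipythondisplay.HTML(ani.to_jshtml()))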

<hr>
<font size = 5>Exploratory Data Analysis (EDA)</font>
<hr>

<hr>
<font size = 5>Feature Engineering</font>
<hr>

<hr>
<font size = 5>Initial Modelling</font>
<hr>

<hr>
<font size = 5>Model Improvement</font>
<hr>

<hr>
<font size = 5>Conclusion</font>
<hr>