# Reinforcement Learning on Basic Inverted Pendulum Balancing in 2-D

***

## Setting up Environment

Importing OpenAI Gym along with NumPy and Python's built-in random module

In [1]:
import gym
import numpy as np
import random

In [None]:
env = gym.make('CartPole-v1')   # add render_mode = 'human' for viewing the test run
states = env.observation_space.shape[0]
actions = env.action_space.n

In [3]:
actions

2
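For reference, a CartPole observation has four components (which is what `states` holds) and the action space has two discrete actions. The sketch below simply labels them; the field names and the sample values are illustrative assumptions, not Gym identifiers or real environment output.

```python
from collections import namedtuple

# Field names are descriptive labels chosen for this sketch (not Gym identifiers).
CartPoleState = namedtuple(
    "CartPoleState",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# The two discrete actions available in CartPole.
ACTION_NAMES = {0: "push cart left", 1: "push cart right"}

# Hypothetical observation, for illustration only.
sample = CartPoleState(0.02, -0.01, 0.03, 0.04)

print(len(sample))        # number of state variables
print(len(ACTION_NAMES))  # number of actions
print(sample.pole_angle)  # access a component by name
```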

To test the environment with random actions, uncomment the code below

In [4]:
# episodes = 10
# for episode in range(0, episodes) :
#     state = env.reset()
#     done = False
#     score = 0

#     while not done :
#         action = random.choice([0, 1])
#         data = env.step(action)
#         n_state, reward, done, info = data[:4]
#         score+=reward
#         env.render()
#     print(f'Episode: {episode+1} Score: {score}')
# env.close()
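The `data[:4]` slice above tolerates a change in how many values `env.step` returns. Older Gym releases return four values `(observation, reward, done, info)`, while Gym 0.26+ and Gymnasium return five, splitting `done` into `terminated` and `truncated`. With the five-value form, the slice would treat `terminated` as `done` and miss the time-limit signal. A minimal sketch of a helper that normalises both shapes follows; `unpack_step` is a hypothetical name introduced here, not a Gym function.

```python
def unpack_step(data):
    """Normalise a step result to (observation, reward, done, info)."""
    if len(data) == 5:
        # Newer API: the episode is over if it terminated OR hit the time limit.
        obs, reward, terminated, truncated, info = data
        return obs, reward, bool(terminated or truncated), info
    # Older API: a single done flag.
    obs, reward, done, info = data
    return obs, reward, bool(done), info


# Made-up step results, for illustration only.
old_style = ([0.0, 0.0, 0.0, 0.0], 1.0, False, {})
new_style = ([0.0, 0.0, 0.0, 0.0], 1.0, False, True, {})

print(unpack_step(old_style)[2])  # done flag from the 4-value form
print(unpack_step(new_style)[2])  # done flag from the 5-value form
```

Inside the loop, this would replace the slice with `n_state, reward, done, info = unpack_step(env.step(action))`.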

***

## Applying Reinforcement Learning to the Environment

Importing the Keras classes (via TensorFlow) used to build and train the neural network

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers.legacy import Adam

Importing the DQN agent, exploration policy, and replay memory from keras-rl for applying RL to the model

In [None]:
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

Helper function to build the neural network model

In [7]:
def dnn_model(states, actions) :
    model = Sequential()
    model.add(Flatten(input_shape = (1, states)))
    model.add(Dense(24, activation = 'relu'))
    model.add(Dense(24, activation = 'relu'))
    model.add(Dense(actions, activation = 'linear'))
    return model

Helper function to build the agent

In [8]:
def buildAgent(model, actions) :
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit = 50000, window_length = 1)
    agent = DQNAgent(model = model, policy = policy, memory = memory,
                     nb_actions = actions, nb_steps_warmup = 10,
                     target_model_update = 1e-2)
    return agent
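`BoltzmannQPolicy` chooses actions at random, with probabilities that grow exponentially with each action's Q-value, so higher-valued actions are favoured while the others are still explored. The following is a minimal standard-library sketch of that idea, not keras-rl's implementation; the function names and the temperature parameter `tau` are assumptions made for illustration.

```python
import math
import random


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over Q-values with temperature tau."""
    # Subtracting the max keeps exp() numerically stable
    # and leaves the resulting probabilities unchanged.
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for the two CartPole actions.
probs = boltzmann_probs([1.0, 2.0])
action = boltzmann_action([1.0, 2.0])
print(probs)
print(action)
```

A larger `tau` flattens the distribution toward uniform exploration, while a smaller one makes the choice closer to greedy.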

Initializing the model and printing its summary

In [None]:
model = dnn_model(states, actions)

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 4)                 0         
                                                                 
 dense (Dense)               (None, 24)                120       
                                                                 
 dense_1 (Dense)             (None, 24)                600       
                                                                 
 dense_2 (Dense)             (None, 2)                 50        
                                                                 
Total params: 770 (3.01 KB)
Trainable params: 770 (3.01 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
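The parameter counts in the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A short sketch, with `dense_params` as a helper name introduced here:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# 4 state inputs -> 24 -> 24 -> 2 action outputs, as in dnn_model.
layer_sizes = [4, 24, 24, 2]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [120, 600, 50]
print(sum(counts))  # 770
```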


Training the model by compiling the RL agent and fitting it to the environment

In [None]:
agent = buildAgent(model, actions)
agent.compile(Adam(learning_rate = 1e-3), metrics = ['mae'])
agent.fit(env, nb_steps = 10000, visualize = False, verbose = 1)
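To the best of my understanding, keras-rl treats a `target_model_update` value below 1 (here `1e-2`) as a soft update: after each training step the target network's weights move a small fraction of the way toward the online network's weights. Below is a minimal sketch of that rule on plain Python lists, assuming this interpolation form; `soft_update` is a hypothetical helper, not a keras-rl function.

```python
def soft_update(target_weights, online_weights, tau=1e-2):
    """Move each target weight a fraction tau toward the online weight."""
    return [
        tau * w + (1.0 - tau) * t
        for t, w in zip(target_weights, online_weights)
    ]


# Made-up weights, for illustration only.
target = [0.0, 1.0]
online = [1.0, 1.0]

for _ in range(100):
    target = soft_update(target, online)

# After 100 updates the first weight has covered 1 - 0.99**100 of the gap,
# i.e. roughly 63% of the way to the online value.
print(target)
```

The slow-moving target is what keeps the Q-learning targets from shifting too quickly during training.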

***

## Analysing Results after applying Reinforcement Learning

Running 10 test episodes and taking the mean episode reward

The trained weights are also saved so they can be reused later

In [12]:
results = agent.test(env, nb_episodes = 10, visualize = True)
print(np.mean(results.history['episode_reward']))
agent.save_weights('saved_rlmodel_weights.h5f', overwrite = True)

Testing for 10 episodes ...


Episode 1: reward: 196.000, steps: 196
Episode 2: reward: 215.000, steps: 215
Episode 3: reward: 218.000, steps: 218
Episode 4: reward: 179.000, steps: 179
Episode 5: reward: 199.000, steps: 199
Episode 6: reward: 186.000, steps: 186
Episode 7: reward: 205.000, steps: 205
Episode 8: reward: 191.000, steps: 191
Episode 9: reward: 192.000, steps: 192
Episode 10: reward: 223.000, steps: 223
200.4


Deleting the in-memory model, agent, and environment objects (the saved weights file stays on disk)

In [13]:
del model
del agent
del env

***

## Loading a Saved Model on Jupyter Notebook

Setting up a new environment and model to load the saved weights into

In [None]:
env = gym.make('CartPole-v1')
states = env.observation_space.shape[0]
actions = env.action_space.n
model = dnn_model(states, actions)
agent = buildAgent(model, actions)
agent.compile(Adam(learning_rate = 1e-3), metrics = ['mae'])

Loading the saved weights into the agent

In [15]:
agent.load_weights('saved_rlmodel_weights.h5f')

Testing the saved model

In [16]:
results = agent.test(env, nb_episodes = 10, visualize = True)
print(np.mean(results.history['episode_reward']))

Testing for 10 episodes ...
Episode 1: reward: 179.000, steps: 179
Episode 2: reward: 203.000, steps: 203
Episode 3: reward: 228.000, steps: 228
Episode 4: reward: 278.000, steps: 278
Episode 5: reward: 221.000, steps: 221
Episode 6: reward: 199.000, steps: 199
Episode 7: reward: 183.000, steps: 183
Episode 8: reward: 210.000, steps: 210
Episode 9: reward: 248.000, steps: 248
Episode 10: reward: 210.000, steps: 210
215.9
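To put the two test runs side by side, the episode rewards printed above can be summarised with the standard library. The lists below are copied from the outputs in this notebook; the `summarise` helper is introduced here for illustration.

```python
import statistics

# Episode rewards copied from the two test outputs above.
run_after_training = [196, 215, 218, 179, 199, 186, 205, 191, 192, 223]
run_after_reload = [179, 203, 228, 278, 221, 199, 183, 210, 248, 210]


def summarise(rewards):
    """Basic descriptive statistics for a list of episode rewards."""
    return {
        "mean": sum(rewards) / len(rewards),
        "stdev": statistics.stdev(rewards),
        "min": min(rewards),
        "max": max(rewards),
    }


for name, rewards in [("after training", run_after_training),
                      ("after reload", run_after_reload)]:
    print(name, summarise(rewards))
```

The means match the printed values (200.4 and 215.9). Since CartPole starts each episode from a slightly randomised state, some run-to-run variation is to be expected even with identical weights.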


Don't forget to close the environment at the end :)

In [17]:
env.close()