**Code Explanation**

The environment used in this assignment is **CartPole-v0**. In this environment, a pole is attached to a cart that moves along a frictionless track. The goal is to keep the pole balanced upright on the cart for as long as possible; an episode ends when the pole tilts too far, the cart moves too far from the centre of the track, or a step limit is reached.
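
As a rough sketch of when an episode ends, assuming the thresholds documented for Gym's classic CartPole (a pole angle beyond about 12 degrees, a cart position beyond 2.4 units, and a 200-step cap for the v0 variant), the check might look like the following. The function name `episode_over` and the constant names are illustrative only and are not part of Gym's API.

```python
import math

# Assumed thresholds, based on Gym's documented CartPole settings.
ANGLE_LIMIT_RAD = 12 * math.pi / 180  # pole may tilt about 12 degrees either way
POSITION_LIMIT = 2.4                  # cart must stay within +/- 2.4 units of centre
MAX_STEPS_V0 = 200                    # CartPole-v0 caps an episode at 200 steps


def episode_over(cart_position, pole_angle, step_count):
    """Return True if this (hypothetical) episode should end."""
    fell = abs(pole_angle) > ANGLE_LIMIT_RAD
    left_track = abs(cart_position) > POSITION_LIMIT
    timed_out = step_count >= MAX_STEPS_V0
    return fell or left_track or timed_out
```

In the real environment this logic lives inside `env.step`, which reports the end of an episode through its `done` flag, so the notebook never needs to call anything like this directly.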

The agent used in this assignment is DQNAgent, which interacts with the CartPole environment according to the assigned policy and learns from the rewards that its actions produce.
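
To illustrate how a reward feeds back into learning, here is a minimal sketch of the one-step Q-learning target that DQN-style agents typically regress towards. The function name `td_target` and the discount factor `gamma=0.99` are assumptions made for this example; the library's internal implementation is not shown in this notebook and may differ in detail.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    next_q_values holds the estimated Q-values of every action in the
    next state. When the episode has ended there is no next state, so
    the target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

For example, with a reward of 1, next-state Q-values of `[0.5, 2.0]`, and a non-terminal step, the target is 1 + 0.99 * 2.0 = 2.98.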

The SequentialMemory is a replay buffer that holds up to 50,000 steps. As the agent moves the cart left or right according to the BoltzmannQPolicy, each observation, the action taken, and the resulting reward are stored, whether or not the step earned a reward.
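
The idea behind a Boltzmann policy can be sketched in a few lines: actions are sampled with probability proportional to the exponential of their Q-values, so better-looking actions are chosen more often while the others are still explored. This is a simplified illustration using only the standard library; the helper names and the temperature parameter `tau` are assumptions for this example, and the library's own `BoltzmannQPolicy` may differ in details such as how it guards against overflow.

```python
import math
import random


def boltzmann_probabilities(q_values, tau=1.0):
    """Softmax over Q-values with temperature tau."""
    # Subtracting the maximum before exponentiating avoids overflow
    # and does not change the resulting probabilities.
    top = max(q_values)
    weights = [math.exp((q - top) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index in proportion to its Boltzmann probability."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A higher `tau` flattens the distribution towards uniform random choice, while a lower `tau` makes the policy behave more greedily.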

We train the agent so that it learns for itself which action to take, i.e., whether to push the cart left or right. In general, longer training tends to improve performance, although the improvement is not guaranteed to be steady.

After training, we test the agent for some episodes to see how well it performs.

In [None]:
# First we install the required modules

!pip install keras-rl2
!pip install gym

In [None]:
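# Then import the necessary tools

import numpy as np  # needed for np.random.seed below
import gym

# Import Keras through tensorflow.keras throughout, so the model and
# optimizer come from the same Keras implementation.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.optimizers import Adam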

from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory


In [None]:
# Get the environment and extract the number of actions.

ENV_NAME = 'CartPole-v0'
env = gym.make(ENV_NAME)
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

In [None]:

# Next, we build a very simple model.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
model.summary()  # summary() prints the table itself; wrapping it in print() also prints "None"

# Now, we configure and compile our agent.

# limit is the maximum number of steps the memory can hold;
# window_length is the number of consecutive observations that make up one state.
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=30,
               target_model_update=1e-2, policy=policy) 
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])  # `lr` is a deprecated alias of `learning_rate`

# Train our agent. Try larger values of nb_steps to see how the
# agent's performance changes.

dqn.fit(env, nb_steps=2000, visualize=False, verbose=2)

In [None]:
# Finally, test the agent

dqn.test(env, nb_episodes=5, visualize=False)