# Reinforcement Learning Fundamentals Assignment

Here you will use tabular Q-learning to develop a policy for the [Cart Pole](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment in Gymnasium. In the Cart Pole problem, the aim is to keep a vertical pole balanced on top of a cart. You can apply forces to move the cart right or left, which affects the position and velocity of the cart, as well as the angle and angular velocity of the pole.

To complete the assignment, please do the following:
1. Read the [Cart Pole documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/) to get an understanding of the states, actions, and rewards associated with this environment.
2. Propose a method for discretizing the state space. This problem has a continuous state space, unlike the Blackjack environment that we worked with previously. This presents a challenge, since tabular Q-learning requires the problem to have a finite state space. We will address this by mapping each of the problem's continuous states to a state in an approximate finite state space. When selecting your finite state space, you will need to strike a balance: the approximation must be granular enough to be accurate, but not so granular that the resulting state space is too large for tabular Q-learning to be practical.
3. Implement tabular Q-learning to compute a policy for the Cart Pole problem. You may reuse code from the Blackjack notebook that we used in class.
4. Simulate your policy in the Cart Pole environment. Can you keep the pole balanced for the full 500 time steps, the point at which `CartPole-v1` truncates an episode?
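
As a starting point for steps 2 and 3, here is a minimal sketch of one possible discretization together with a single tabular Q-learning update. It is not a reference solution. The names `BOUNDS`, `N_BINS`, `EDGES`, `discretize`, and `q_update` are hypothetical, and the bin counts, velocity clipping ranges, learning rate, and discount factor are illustrative assumptions that you should expect to tune. The position and angle ranges roughly match the termination thresholds given in the Cart Pole documentation.

```python
import numpy as np

# Assumed clipping ranges for each observation component (illustrative only).
BOUNDS = np.array([
    [-2.4, 2.4],    # cart position
    [-3.0, 3.0],    # cart velocity (unbounded in the environment, clipped here)
    [-0.21, 0.21],  # pole angle in radians
    [-3.0, 3.0],    # pole angular velocity (unbounded, clipped here)
])
N_BINS = (6, 6, 12, 12)  # assumed number of bins per component

# Keep only the interior edges so np.digitize returns indices 0..n-1;
# values outside the range fall into the first or last bin.
EDGES = [np.linspace(lo, hi, n + 1)[1:-1] for (lo, hi), n in zip(BOUNDS, N_BINS)]


def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, EDGES))


def q_update(q_table, state, action, reward, next_state, terminated,
             alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place (alpha and gamma are assumed values)."""
    # Do not bootstrap from the next state once the episode has terminated.
    target = reward
    if not terminated:
        target += gamma * np.max(q_table[next_state])
    idx = state + (action,)
    q_table[idx] += alpha * (target - q_table[idx])


# One Q-value per (discrete state, action) pair; Cart Pole has two actions.
q_table = np.zeros(N_BINS + (2,))
```

With these assumed bin counts the table holds 6 × 6 × 12 × 12 × 2 = 10,368 entries. In a training loop you would call `discretize` on each observation returned by `env.reset()` and `env.step()`, then pass the resulting tuples to `q_update`.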

---

Before you get started, we will set up and demonstrate the Cart Pole environment.

In [None]:
!pip install gymnasium
!pip install gymnasium[classic-control]

Collecting gymnasium
  Downloading gymnasium-0.29.1-py3-none-any.whl (953 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium)
  Downloading Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Installing collected packages: farama-notifications, gymnasium
Successfully installed farama-notifications-0.0.4 gymnasium-0.29.1


In [None]:
import gymnasium as gym
import numpy as np
import random

from IPython import display
import matplotlib.pyplot as plt

Here we simulate the environment with the following simple heuristic policy:
- If the pole is leaning to the left, push the cart to the left
- If the pole is leaning to the right, push the cart to the right

As you can see, this policy is actually unstable. The pole will quickly fall and the cart will run off the screen.

**NOTE:** The animations of the environment can look glitchy if you run them directly in Colab. Run the notebook locally to see smoother animations.

In [None]:
env = gym.make("CartPole-v1", render_mode="rgb_array")

obs = env.reset()[0]
plt.imshow(env.render())
plt.show()

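for i in range(250):

    # Heuristic: push the cart toward the side the pole is leaning.
    # obs[2] is the pole angle (negative means leaning left).
    if obs[2] < 0:
        action = 0  # push left
    else:
        action = 1  # push right

    # Gymnasium's step() returns (obs, reward, terminated, truncated, info).
    obs, reward, terminated, truncated, info = env.step(action)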

    display.clear_output(wait=True)
    plt.imshow(env.render())
    plt.show()

env.close()


---

Describe the discretization that you used here.

In [None]:
# Implement your Q-learning code here.

In [None]:
# Implement the simulation of your policy here.