### Getting Started with Cart Pole in OpenAI Gym

#### October 25, 2022 update
- [Announcing The Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation): the future of open source reinforcement learning
- The Farama Foundation is a nonprofit designed to house reinforcement learning libraries in a neutral body.
- The foundation released the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) library, where future maintenance of OpenAI Gym will take place.
- Gym documentation is at [https://www.gymlibrary.dev/index.html](https://www.gymlibrary.dev/index.html)

#### Installing gymnasium
From the command prompt, run the command  
``pip install gymnasium``

#### After installing Gymnasium it can be imported

In [1]:
#import Gymnasium under the conventional alias gym
import gymnasium as gym

#### Create the Cart Pole environment
- Print information about the observation space
- Print information about the action space 

In [2]:
#create CartPole environment
env = gym.make('CartPole-v1')
print(f"Observation Space: {env.observation_space}")
print(f"Action Space: {env.action_space}")

Observation Space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
Action Space: Discrete(2)


The observation space printed above lists the lower and upper bounds of each of the four observation components:
- Cart position $\approx [-4.8,4.8]$
- Cart velocity $\approx [-3.4 \cdot 10^{38}, 3.4 \cdot 10^{38}]$
- Pole angle $\approx [-0.418,0.418]$ radians (about $\pm 24^\circ$)
- Pole angular velocity $\approx [-3.4 \cdot 10^{38}, 3.4 \cdot 10^{38}]$
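
The position and angle bounds are not arbitrary. As a minimal sketch, assuming the termination thresholds given in the CartPole documentation (the cart leaving $\pm 2.4$ and the pole tilting past $\pm 12^\circ$), the observation bounds are twice those thresholds, and the velocity bounds are the largest finite float32 value. The variable names below are illustrative and are not taken from the library.

```python
import math

# Assumed termination thresholds, per the CartPole documentation.
x_threshold = 2.4                    # cart position limit
theta_threshold = math.radians(12)   # pole angle limit, in radians

# The observation bounds are twice the termination thresholds.
x_bound = 2 * x_threshold
theta_bound = 2 * theta_threshold

# Largest finite float32 value, which appears as the velocity bounds.
float32_max = (2 - 2**-23) * 2.0**127

print(f"Cart position bound: {x_bound}")
print(f"Pole angle bound: {theta_bound:.4f} rad")
print(f"Velocity bound: {float32_max:.7e}")
```

An episode therefore ends well before an observation reaches the position or angle bounds reported by the space.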

The action space is discrete with two possible actions. The CartPole environment documentation defines the two actions as
- 0 - Push cart to the left with a fixed force
- 1 - Push cart to the right with a fixed force

#### Reset (start) the environment and get an Initial Observation
The reset function initializes (starts) the pole balancing episode. [^1]

In [3]:
#start an episode 
obs,info = env.reset()
print(f"Observation: {obs}")
print(f"Env info: {info}")

Observation: [0.0198604  0.02093358 0.01488885 0.00850336]
Env info: {}


The observation above corresponds to the initial "state" for the problem. Again, the four values are
1. Position of the cart
2. Velocity of the cart
3. Angular position of the pole
4. Angular velocity of the pole

#### Actions, Rewards, and a New Observation

The selected action below is 0, which pushes the cart to the left. The step function takes that action and returns a new observation, a reward, two Boolean flags indicating whether the episode has terminated (for example, because the pole fell) or has been truncated (for example, by a time limit), and an info dictionary. [^2]

In [4]:
#perform an action, which yields a new observation and a reward 
action = 0
obs,reward,terminated,truncated,info = env.step(action)
print(f"New Observation: {obs}")
print(f"Reward for action: {reward}")
print(f"Flag indicating whether episode has terminated: {terminated}")
print(f"Flag indicating whether episode has been truncated: {truncated}")
print(f"Info dictionary: {info}")

New Observation: [ 0.02027907 -0.1743987   0.01505892  0.30584648]
Reward for action: 1.0
Flag indicating whether episode has terminated: False
Flag indicating whether episode has been truncated: False
Info dictionary: {}
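
Repeating the step call until either flag becomes true plays out a full episode. The following is a minimal sketch rather than part of the original notebook: `run_episode` and `lean_policy` are hypothetical helper names, and the policy (push toward the side the pole is leaning, assuming a positive angle means the pole leans to the right) is a simple heuristic chosen for illustration. The function works with any environment that follows the `reset`/`step` API shown above.

```python
def lean_policy(obs):
    """Hypothetical heuristic: push the cart toward the side the pole leans."""
    pole_angle = obs[2]  # third observation component
    return 1 if pole_angle > 0 else 0  # 1 = push right, 0 = push left


def run_episode(env, policy, max_steps=500):
    """Run one episode and return (total_reward, steps_taken)."""
    obs, info = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        # The episode is over once either flag is set.
        if terminated or truncated:
            break
    return total_reward, steps
```

With Gymnasium installed, `run_episode(env, lean_policy)` can be called on the `env` created earlier. Because the reward is 1.0 per step, the total reward equals the number of steps the pole stayed up.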


### Footnotes

[^1]: In earlier versions of OpenAI Gym the ``env.reset()`` function had a different behavior

&nbsp;&nbsp;&nbsp;&nbsp;Currently the API is  
&nbsp;&nbsp;&nbsp;&nbsp;``observation,info = env.reset()``  

&nbsp;&nbsp;&nbsp;&nbsp;whereas before it was  
&nbsp;&nbsp;&nbsp;&nbsp;``observation = env.reset()``  

``reset`` now returns two values instead of one.

[^2]: In earlier versions of OpenAI Gym the ``env.step(action)`` function had a different behavior

&nbsp;&nbsp;&nbsp;&nbsp;Currently the API is  
&nbsp;&nbsp;&nbsp;&nbsp;``observation,reward,terminated,truncated,info = env.step(action)``  

&nbsp;&nbsp;&nbsp;&nbsp;whereas before it was  
&nbsp;&nbsp;&nbsp;&nbsp;``observation,reward,done,info = env.step(action)`` 
 
``step`` now returns five values, instead of four.

``done`` is deprecated. It has been split into ``terminated`` and ``truncated`` so that the different reasons an episode of the MDP can end are reported separately.
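
Code written against the older four-value API can be adapted with a small shim. This is a sketch under the assumption that the calling code only needs a single combined flag; `unpack_step` is a hypothetical helper and is not part of Gym or Gymnasium.

```python
def unpack_step(result):
    """Normalize a step result to (obs, reward, done, info).

    Accepts either the current five-value tuple or the older four-value one.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        done = terminated or truncated  # either flag ends the episode
    else:
        obs, reward, done, info = result
    return obs, reward, done, info
```

Collapsing the two flags discards the distinction between termination and truncation, so new code is better off using the five-value form directly.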
