## Reinforcement Learning

<p>Reinforcement learning is a machine learning approach in which an agent learns from experience: it interacts with an environment, receives rewards for its actions, and adjusts its decisions to maximize the total reward it collects.</p>
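
For an episodic task, one common way to make "maximize the reward" precise is to sum the rewards collected over a single episode. The notation below is an assumption introduced for illustration: $r_t$ is the reward received at step $t$, and $T$ is the number of steps in the episode.

$$
G = \sum_{t=1}^{T} r_t
$$

With CartPole's default reward of 1 per step, $G$ is simply the number of steps the episode lasts.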

<p>Libraries used:</p>

<p>gymnasium (imported as gym): the primary library used here for reinforcement learning environments. It provides a variety of environments and tools for developing and testing RL algorithms. Here, it is used to create and interact with the CartPole environment.</p>

<p>time: Python's standard library module, used here to pause briefly when an episode ends.</p>

In [1]:
import gymnasium as gym
import time

In [None]:
# initialise the CartPole environment with human rendering
env = gym.make('CartPole-v1', render_mode='human')

In [None]:
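# reset() returns the initial state and an info dictionary
state, info = env.reset()

# state contains: cart position, cart velocity, pole angle, pole angular velocity

# render the environment (with render_mode='human', reset() and step() already draw the window)
env.render()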

In [None]:
# test a single step in the environment by pushing the cart to the left (action 0)
# step() returns (observation, reward, terminated, truncated, info)
env.step(0)

In [None]:

env.observation_space  # state space details
env.observation_space.high  # max state values
env.observation_space.low  # min state values
env.action_space  # available actions
env.spec  # environment specifications

# maximum number of steps per episode before the episode is truncated
env.spec.max_episode_steps

# reward threshold at which the environment is considered solved
env.spec.reward_threshold
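
As a rough illustration of how `reward_threshold` might be used, the sketch below checks whether the average return over the most recent episodes reaches a given threshold. The helper name `meets_reward_threshold` and the 100-episode window are assumptions made for this example and are not part of Gymnasium. The returns are made-up values.

```python
from statistics import mean


def meets_reward_threshold(episode_returns, reward_threshold, window=100):
    """Return True if the mean of the last `window` returns reaches the threshold.

    Hypothetical helper: `window` is an assumed convention, not a Gymnasium setting.
    """
    if len(episode_returns) < window:
        return False  # not enough episodes yet to judge
    recent = episode_returns[-window:]
    return mean(recent) >= reward_threshold


# Example with made-up returns; in the notebook the threshold would be
# env.spec.reward_threshold (475.0 for CartPole-v1).
random_policy_returns = [22.0] * 100
print(meets_reward_threshold(random_policy_returns, reward_threshold=475.0))
```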

## State and Action Spaces

<p>Observation space: what the state vector contains. For CartPole these are the cart position, cart velocity, pole angle, and pole angular velocity.</p>

<p>Action space: the possible actions the agent can take. For CartPole these are pushing the cart left (0) or right (1).</p>
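
To make the two spaces more concrete, here is a minimal sketch that attaches names to the components of a CartPole observation and to the two actions. `CartPoleState`, `ACTION_NAMES`, and `describe_step` are hypothetical names introduced for this example, and the numbers are illustrative rather than taken from a real run.

```python
from collections import namedtuple

# Field order follows the CartPole state vector:
# cart position, cart velocity, pole angle, pole angular velocity.
CartPoleState = namedtuple(
    "CartPoleState",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# CartPole has two discrete actions.
ACTION_NAMES = {0: "push cart left", 1: "push cart right"}


def describe_step(observation, action):
    """Label a raw observation and action (hypothetical helper for illustration)."""
    state = CartPoleState(*observation)
    return state, ACTION_NAMES[action]


# Illustrative values only; a real observation would come from env.reset() or env.step().
state, action_name = describe_step([0.01, -0.02, 0.03, 0.04], 0)
print(state.pole_angle, action_name)
```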

In [None]:

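episodeNumber = 1000
timeSteps = 100  # maximum steps per episode

for episodeIndex in range(episodeNumber):
    # reset() returns (initial state, info)
    initial_state, _ = env.reset()
    print(episodeIndex)  # track episode number
    env.render()
    appendedObservations = []
    for timeIndex in range(timeSteps):
        print(timeIndex)  # track time step
        # sample a random action: 0 (push left) or 1 (push right)
        random_action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(random_action)
        appendedObservations.append(observation)
        # stop when the pole falls or the cart leaves the track (terminated)
        # or when the environment's step limit is reached (truncated)
        if terminated or truncated:
            time.sleep(.5)
            break
env.close()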
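
The loop above collects `appendedObservations` but does not inspect them. One possible way to summarize such a list is sketched below: it reports the smallest and largest value seen for each state component. The function name `summarize_observations` is hypothetical, and the sample values are invented for illustration.

```python
def summarize_observations(observations):
    """Return (minimums, maximums) per state component.

    `observations` is a non-empty list of equal-length sequences, such as the
    4-element CartPole observations appended during one episode.
    """
    if not observations:
        raise ValueError("observations must not be empty")
    columns = list(zip(*observations))  # transpose: one tuple per component
    minimums = [min(column) for column in columns]
    maximums = [max(column) for column in columns]
    return minimums, maximums


# Invented sample data standing in for appendedObservations.
sample = [
    [0.00, 0.10, 0.02, -0.30],
    [0.01, 0.30, 0.01, -0.60],
    [0.02, 0.10, 0.00, -0.25],
]
lows, highs = summarize_observations(sample)
print("min per component:", lows)
print("max per component:", highs)
```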

<img src="https://images.ctfassets.net/xjan103pcp94/5h5ZwNAqLHAIRZ9jPGvRU1/6ceb65f718883cf2e7b8ca9dcd0a5fc4/Blog_-_intro_reinforcement_2.png" width="50%" />


Flowchart of the simulation loop:

<p>
[Start Simulation]
  -> [For Each Episode]
    -> [Reset Environment]
    -> [Render]
    -> [For Each Time Step]
      -> [Sample Random Action]
      -> [Step Environment]
      -> [Store Observation]
      -> [Check Termination]
        -> [If Terminated: Pause and End Episode]
  -> [Close Environment]
</p>
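
The flowchart above can be sketched as a reusable function that runs a single episode. This is a minimal sketch, not the notebook's own code: `run_episode` and its `policy` argument are hypothetical names, and `env` is assumed to follow the Gymnasium API, where `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. `CountdownEnv` is a toy stand-in invented here so the example runs without opening a render window.

```python
def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, observations).

    Assumes `env` follows the Gymnasium reset()/step() API; `policy` is any
    callable mapping an observation to an action.
    """
    observation, _info = env.reset()
    observations = []
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        observations.append(observation)
        total_reward += reward
        if terminated or truncated:
            break  # episode ended: failure state or step limit reached
    return total_reward, observations


class CountdownEnv:
    """Toy stand-in (not a Gymnasium class): terminates after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0], {}

    def step(self, action):
        self.steps += 1
        terminated = self.steps >= self.length
        return [float(self.steps)], 1.0, terminated, False, {}


total, seen = run_episode(CountdownEnv(length=5), policy=lambda obs: 0)
print(total, len(seen))
```

With the notebook's CartPole environment, a random policy could be passed in the same way, for example `run_episode(env, lambda obs: env.action_space.sample())`.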

<https://livebook.manning.com/concept/reinforcement-learning/cart-pole-environment>