<a href="https://colab.research.google.com/github/adithya1010/Naan-Mudhalvan-Labs/blob/main/Reinforcement-Learning/RL_Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Reinforcement Learning Intro:**

Done with inputs from Gemini 2.0 Flash

**Links to Chats:**

1. https://g.co/gemini/share/632437b0bab4
2. https://extension.getmerlin.in/chat/share/a7bbbd23-42e1-402e-b294-70353b022b1d


**References:**

1. https://www.geeksforgeeks.org/what-is-reinforcement-learning/
2. https://www.imperial.ac.uk/media/imperial-college/faculty-of-engineering/computing/public/2021-ug-projects/State-space-decomposition-for-Reinforcement-Learning.pdf
3. https://medium.com/@walkerastro41/action-space-state-space-observation-space-demystified-6c9c00a355b4

### Importing libraries

In [None]:
import gym
import numpy as np
import warnings

In [None]:
# Suppress specific deprecation warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [None]:
# Load the environment with render mode specified
env = gym.make('CartPole-v1', render_mode="human")

In [None]:
# Initialize the environment to get the initial state
state = env.reset()
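
Depending on the installed Gym version, `env.reset()` returns either the observation alone or an `(observation, info)` tuple. The sketch below normalizes both shapes. `unpack_reset` is a hypothetical helper (not part of Gym), and the sample values are stand-ins rather than real `env.reset()` results.

```python
import numpy as np

def unpack_reset(reset_result):
    """Return (observation, info) for either reset() return shape.

    Hypothetical helper for illustration; it is not part of Gym.
    """
    # Newer API: a 2-tuple whose second element is the info dict
    if (isinstance(reset_result, tuple) and len(reset_result) == 2
            and isinstance(reset_result[1], dict)):
        observation, info = reset_result
    else:
        # Older API: the observation is returned on its own
        observation, info = reset_result, {}
    return np.asarray(observation, dtype=np.float32), info

# Stand-ins for the two shapes env.reset() may return
old_style = np.zeros(4, dtype=np.float32)
new_style = (np.zeros(4, dtype=np.float32), {})

obs_old, info_old = unpack_reset(old_style)
obs_new, info_new = unpack_reset(new_style)
print(obs_old.shape, obs_new.shape)
```

With a helper like this, `state, _ = unpack_reset(env.reset())` would leave `state` holding just the observation array in either case.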


In reinforcement learning, the state space encompasses all possible situations an agent can find itself in, while the action space represents all the valid moves or choices an agent can make within that environment.

Here's a more detailed explanation:

**State Space:** The state space is the set of all possible configurations of the environment. A state should provide enough information to predict future rewards and states accurately, adhering to the Markov property. In simpler terms, the state should contain all the relevant information to make a decision without needing to refer to the history of previous states or actions. The representation of the state is a crucial aspect of reinforcement learning. For complex problems, states are often represented as a vector of features. For example, in robotics, this could include positions, angles, and velocities of mechanical parts.
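
Stated as a formula, the Markov property says that the next state and reward depend only on the current state and action, not on the earlier history. The notation here ($S_t$, $A_t$, $R_t$ for the state, action, and reward at step $t$) follows a common textbook convention and is an assumption, since this notebook does not define its own symbols:

$$
P(S_{t+1}, R_{t+1} \mid S_t, A_t) = P(S_{t+1}, R_{t+1} \mid S_0, A_0, R_1, \dots, S_t, A_t)
$$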

**Action Space:** The action space is the set of all possible actions that the agent can take in a given state. Like state spaces, action spaces can be discrete or continuous. A discrete action space means the agent can only choose from a finite set of actions. A continuous action space means the agent can select actions from a continuous range of values.
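
To make the distinction concrete, here is a NumPy-only sketch of what sampling from each kind of action space amounts to. It does not use Gym's own space classes, and the continuous bounds of -1.0 and 1.0 are an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Discrete action space with 2 actions
# (as in CartPole: 0 = push left, 1 = push right)
n_actions = 2
discrete_action = int(rng.integers(n_actions))

# Continuous action space: one real-valued action in [low, high]
# (bounds are an assumption for illustration)
low, high = -1.0, 1.0
continuous_action = float(rng.uniform(low, high))

print("Discrete action:", discrete_action)
print("Continuous action:", continuous_action)
```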

**Key Considerations:**

- **Observation vs. State:** It's important to distinguish between observation and state. An observation is the data an agent perceives directly from the environment, while the state is a processed representation that should satisfy the Markov property.
- **Curse of Dimensionality:** In complex problems, the state space can become very large, leading to the "curse of dimensionality". This makes it difficult to explore and learn effectively.
- **Simplifying State Spaces:** If possible, simplifying the problem to have a smaller, discrete state space (fewer than a few million states) can allow the use of tabular learning algorithms, which offer theoretical guarantees of convergence.
- **Function Approximation:** When dealing with large state spaces, function approximation techniques (such as neural networks) are used to estimate values. Feature engineering becomes important in these cases.
- **Gym Standard:** It is beneficial to adhere to the Gym standard for defining action spaces to ensure compatibility with various reinforcement learning algorithms.

In essence, defining the state and action spaces correctly is fundamental to designing a reinforcement learning solution. The agent uses these spaces to learn an optimal policy, which maps states to actions to maximize rewards over time.
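
As one illustration of simplifying a state space, a continuous observation can be binned into a small discrete grid that a tabular method could index. This is a minimal sketch under stated assumptions: the clipping ranges, the bin count, and the `discretize` helper are all hypothetical choices, not values taken from Gym or from this notebook.

```python
import numpy as np

# Assumed clipping ranges for a 4-value observation
# (cart position, cart velocity, pole angle, pole angular velocity).
# The velocity limits are arbitrary choices for illustration.
lows = np.array([-4.8, -3.0, -0.42, -3.5])
highs = np.array([4.8, 3.0, 0.42, 3.5])
n_bins = 10  # bins per dimension, giving 10**4 discrete states

# Interior bin edges per dimension (n_bins - 1 edges define n_bins bins)
edges = [np.linspace(lo, hi, n_bins + 1)[1:-1] for lo, hi in zip(lows, highs)]

def discretize(observation):
    """Map a continuous 4-value observation to a tuple of bin indices."""
    clipped = np.clip(observation, lows, highs)
    return tuple(int(np.digitize(x, e)) for x, e in zip(clipped, edges))

# A value table indexed by the discrete state, plus one axis for 2 actions
q_table = np.zeros((n_bins,) * 4 + (2,))

sample_obs = np.array([0.0155, -0.0389, -0.0422, -0.0313])
state_index = discretize(sample_obs)
print(state_index, q_table[state_index].shape)
```

Coarser bins shrink the table but blur distinctions between nearby states, so the bin count is a trade-off rather than a fixed rule.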

In [None]:
# Print the state space and action space
print("State space:", env.observation_space)
print("Action space:", env.action_space)

State space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
Action space: Discrete(2)
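
For CartPole, the four observation components are cart position, cart velocity, pole angle (in radians), and pole angular velocity. The printed bounds can be sanity-checked with NumPy alone; the numbers below are copied from the output above.

```python
import numpy as np

# Upper bounds copied from the printed observation space
high = np.array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38],
                dtype=np.float32)

# The angle bound is given in radians; convert it to degrees
angle_limit_deg = np.degrees(high[2])
print(f"Pole angle bound: {angle_limit_deg:.1f} degrees")

# The velocity bounds equal the largest float32, i.e. effectively unbounded
print("Velocity bound is float32 max:", high[1] == np.finfo(np.float32).max)
```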


The code cell below unpacks the result of the environment's `step()` call, which returns a different number of values depending on the environment version or type. Here's a breakdown:

**Purpose:**

The primary goal is to reliably extract the essential information from the environment's `step()` output, regardless of whether the environment returns 4 or 5 values.

**Code Explanation:**

`if len(step_result) == 4:`

This checks whether the `step_result` tuple contains 4 elements. This usually indicates an older OpenAI Gym environment, or a similar environment that doesn't return a separate `truncated` flag. The tuple is unpacked with `next_state, reward, done, info = step_result`:

- `next_state`: The new state of the environment after the action.
- `reward`: The reward received for taking the action.
- `done`: A boolean indicating whether the episode has ended (for example, by reaching a terminal state).
- `info`: A dictionary containing additional information (e.g., debugging information).
- `terminated`: With no `truncated` flag available, `done` is the only end-of-episode signal, so `terminated` should mirror `done`.

`else:`

This handles the case where `step_result` contains 5 elements, which is typical for newer OpenAI Gym environments that include a `truncated` flag. The tuple is unpacked with `next_state, reward, done, truncated, info = step_result`:

- `next_state`, `reward`, `done`, and `info`: Same as above. In this format, `done` receives the environment's "terminated" flag.
- `truncated`: A boolean indicating whether the episode was truncated due to a time limit or other conditions.
- `terminated = done or truncated`: The `terminated` variable is set to `True` if either `done` or `truncated` is `True`, signifying the end of the episode.

`print(f"Action: {action}, Reward: {reward}, Next State: {next_state}, Done: {done}, Info: {info}")`

This line prints the key information obtained from `step()`, providing a trace of the agent's interaction with the environment.

`if terminated:`

This condition checks whether the episode has ended. If it has, `state = env.reset()` resets the environment to its initial state, preparing for a new episode.

`env.close()`

This line closes the environment after the loop finishes, releasing any resources it was using.
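
The branching described above can also be wrapped in a small function so that the rest of a loop sees one consistent shape. This is a sketch, not Gym's own API: `unpack_step` is a hypothetical helper, and the tuples below are stand-ins shaped like the two `step()` return formats.

```python
def unpack_step(step_result):
    """Return (next_state, reward, terminated, truncated, info).

    Hypothetical helper for illustration; it is not part of Gym.
    """
    if len(step_result) == 4:
        # Older format: a single done flag, no separate truncated flag
        next_state, reward, done, info = step_result
        return next_state, reward, bool(done), False, info
    # Newer format: terminated and truncated are reported separately
    next_state, reward, terminated, truncated, info = step_result
    return next_state, reward, bool(terminated), bool(truncated), info

# Stand-in tuples shaped like the two step() return formats
old_result = ([0.0, 0.1, 0.0, -0.1], 1.0, True, {})
new_result = ([0.0, 0.1, 0.0, -0.1], 1.0, False, True, {})

for result in (old_result, new_result):
    _, _, terminated, truncated, _ = unpack_step(result)
    episode_over = terminated or truncated
    print(f"{len(result)} values -> episode over: {episode_over}")
```

Keeping the two flags separate until the last moment preserves the difference between reaching a terminal state and running out of time, which some learning algorithms treat differently.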

In [None]:
# Run a few steps in the environment with random actions
for _ in range(10):
    env.render()  # Render the environment for visualization
    action = env.action_space.sample()  # Take a random action

    # Take a step in the environment
    step_result = env.step(action)

    # Check the number of values returned and unpack accordingly
    if len(step_result) == 4:
        # Older format: done is the only end-of-episode signal
        next_state, reward, done, info = step_result
        terminated = done
    else:
        # Newer format: the episode also ends when it is truncated
        next_state, reward, done, truncated, info = step_result
        terminated = done or truncated

    print(f"Action: {action}, Reward: {reward}, Next State: {next_state}, Done: {done}, Info: {info}")

    if terminated:
        state = env.reset()  # Reset the environment if the episode is finished

env.close()  # Close the environment when done

Action: 0, Reward: 1.0, Next State: [ 0.01550099 -0.03892491 -0.04219836 -0.03132464], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [ 0.01472249 -0.23341711 -0.04282485  0.24775131], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [ 0.01005415 -0.42790213 -0.03786983  0.5266247 ], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [ 0.0014961  -0.62247133 -0.02733733  0.8071382 ], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [-0.01095332 -0.8172081  -0.01119457  1.0910981 ], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [-0.02729749 -1.0121808   0.01062739  1.3802476 ], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [-0.0475411  -1.2074337   0.03823234  1.676235  ], Done: False, Info: {}
Action: 0, Reward: 1.0, Next State: [-0.07168978 -1.4029778   0.07175704  1.9805743 ], Done: False, Info: {}
Action: 1, Reward: 1.0, Next State: [-0.09974933 -1.2086802   0.11136852  1.7109562 ], Done: False, Info: {}
Action: 1, Reward: 