# RL in OpenAI Gym 🏋️‍♂️

This notebook demonstrates how to use **OpenAI Gym** environments for testing and training Reinforcement Learning (RL) agents.

We’ll cover:
- Introduction to OpenAI Gym
- Running a random policy
- Interacting with environments
- Training a simple agent using Q-learning
- Visualizing agent performance

## 1. Introduction to OpenAI Gym

**OpenAI Gym** is a toolkit for developing and comparing reinforcement learning algorithms. It provides environments for various tasks, from simple games to robotic control.

Let's start by installing and importing the required libraries.

In [None]:
# The cells below use the reset/step API introduced in gym 0.26
!pip install "gym>=0.26" numpy matplotlib --quiet

In [None]:
import gym
import numpy as np
import matplotlib.pyplot as plt

env = gym.make('CartPole-v1')
state, info = env.reset()  # reset() returns (observation, info) in gym>=0.26
print('Initial State:', state)
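
To make the `reset`/`step` contract concrete, here is a minimal sketch of a toy environment that follows the same calling convention as Gym 0.26 and later. `CountdownEnv` and everything inside it is hypothetical and is not part of Gym; it only mimics the shape of the return values.

```python
class CountdownEnv:
    """Hypothetical toy environment (not part of Gym) that mimics the
    reset/step return shapes used by gym>=0.26."""

    def __init__(self, start=3, max_steps=10):
        self.start = start
        self.max_steps = max_steps

    def reset(self):
        # reset() returns (observation, info)
        self.value = self.start
        self.steps = 0
        return self.value, {}

    def step(self, action):
        # Action 1 decrements the counter; any other action leaves it unchanged
        if action == 1:
            self.value -= 1
        self.steps += 1
        terminated = self.value == 0              # the task itself has ended
        truncated = self.steps >= self.max_steps  # the time limit was reached
        reward = 1.0 if terminated else 0.0
        return self.value, reward, terminated, truncated, {}


toy_env = CountdownEnv()
obs, info = toy_env.reset()
done = False
trajectory = [obs]
while not done:
    obs, reward, terminated, truncated, info = toy_env.step(1)
    done = terminated or truncated
    trajectory.append(obs)
print(trajectory)  # [3, 2, 1, 0]
```

An episode is over when either `terminated` or `truncated` is true, which is why the loop combines the two flags.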

## 2. Running a Random Policy

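A *policy* maps each observed state to an action (or to a distribution over actions). A **random policy** ignores the state and samples actions uniformly, which gives a useful baseline for the learned agent later on.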
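
As a rough sketch of the idea, a policy can be written as a function from a state to an action. The helper names `random_policy` and `always_left_policy` are hypothetical and use only the standard library; with Gym, `env.action_space.sample()` plays the role of the random policy.

```python
import random

# Assumption: CartPole has two actions (0 = push left, 1 = push right)
N_ACTIONS = 2

def random_policy(state, rng=random):
    """Ignore the state and pick an action uniformly at random."""
    return rng.randrange(N_ACTIONS)

def always_left_policy(state):
    """A deterministic policy that always chooses action 0."""
    return 0

rng = random.Random(0)  # seeded generator so the run is repeatable
sampled = [random_policy(None, rng) for _ in range(1000)]
print(sum(sampled) / len(sampled))  # fraction of "right" actions, close to 0.5
```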

In [None]:
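episodes = 5
for ep in range(episodes):
    state, info = env.reset()  # reset() returns (observation, info)
    done = False
    total_reward = 0
    while not done:
        action = env.action_space.sample()  # random action
        # step() returns (observation, reward, terminated, truncated, info)
        state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated  # stop on failure or at the time limit
        total_reward += reward
    print(f"Episode {ep+1}: Total Reward = {total_reward}")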

## 3. Building a Simple Q-Learning Agent

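`CartPole` observations are continuous, so tabular Q-learning needs them discretized first. We bin each of the four observation components into a fixed set of intervals and use the resulting tuple of bin indices as the key into the Q-table.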
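
To illustrate the binning step, the sketch below maps one continuous value to a bin index with the standard-library `bisect` module. For increasing edges this gives the same index as `np.digitize` with its default `right=False`. The helper names `linspace` and `to_bin` are hypothetical.

```python
from bisect import bisect_right

def linspace(lo, hi, n):
    """Plain-Python stand-in for np.linspace: n evenly spaced edges."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def to_bin(value, edges):
    """Index of the bin that value falls into (0 means below the first edge)."""
    return bisect_right(edges, value)

angle_edges = linspace(-0.418, 0.418, 10)
upright = to_bin(0.0, angle_edges)     # pole angle of 0 rad
too_low = to_bin(-1.0, angle_edges)    # below every edge
too_high = to_bin(1.0, angle_edges)    # above every edge
print(upright, too_low, too_high)      # 5 0 10
```

Ten edges produce eleven possible indices (0 through 10), because values outside the range still get their own bins at either end.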

In [None]:
from collections import defaultdict

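# Bin edges for cart position, cart velocity, pole angle, pole angular velocity
BINS = [np.linspace(-4.8, 4.8, 10), np.linspace(-4, 4, 10),
        np.linspace(-0.418, 0.418, 10), np.linspace(-4, 4, 10)]

def discretize(obs):
    """Map a continuous observation to a hashable tuple of bin indices."""
    return tuple(int(np.digitize(o, b)) for o, b in zip(obs, BINS))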

env = gym.make('CartPole-v1')
q_table = defaultdict(lambda: np.zeros(env.action_space.n))
alpha = 0.1
gamma = 0.99
epsilon = 0.1
episodes = 500
rewards = []

for ep in range(episodes):
    obs, _ = env.reset()
    state = discretize(obs)
    done = False
    total_reward = 0
    while not done:
        # Epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated  # stop on failure or at the time limit
        next_state = discretize(next_obs)
        # Do not bootstrap from the next state when the episode has terminated
        best_next = 0.0 if terminated else np.max(q_table[next_state])
        q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])
        state = next_state
        total_reward += reward
    rewards.append(total_reward)
    if (ep + 1) % 50 == 0:
        print(f"Episode {ep+1}, Average Reward: {np.mean(rewards[-50:]):.2f}")
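
A single update from the loop can be worked through by hand. The sketch below applies the tabular Q-learning rule once with the same `alpha` and `gamma`. The state labels and Q-values are made-up numbers for illustration, and `q_update` is a hypothetical helper. It drops the bootstrap term when the transition ends the episode, which is a common convention.

```python
def q_update(q, state, action, reward, next_state, terminated,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value."""
    # No future value is available after a terminal transition
    best_next = 0.0 if terminated else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Made-up Q-table with two states and two actions
q = {"s0": [0.0, 0.0], "s1": [0.2, 0.5]}

new_value = q_update(q, "s0", 1, reward=1.0, next_state="s1", terminated=False)
print(round(new_value, 4))  # 0.1 * (1.0 + 0.99 * 0.5 - 0.0) = 0.1495
```

The value moves only a fraction `alpha` of the way toward the target, so many visits to the same state-action pair are needed before the estimate settles.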

## 4. Visualizing Results

In [None]:
plt.plot(rewards)
plt.title('Q-Learning Rewards over Episodes')
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.show()
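
Per-episode rewards are noisy, so a moving average can make the trend easier to read. This is a minimal sketch in plain Python; `moving_average` is a hypothetical helper and the window size is an arbitrary choice.

```python
def moving_average(values, window=50):
    """Mean of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    averaged = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest
        running += values[i] - values[i - window]
        averaged.append(running / window)
    return averaged

demo_rewards = [10, 20, 30, 40, 50]
smoothed = moving_average(demo_rewards, window=2)
print(smoothed)  # [15.0, 25.0, 35.0, 45.0]
```

In this notebook, something like `plt.plot(moving_average(rewards))` could be drawn alongside the raw curve.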

## 5. Summary

In this notebook, we:
- Explored OpenAI Gym environments.
- Ran a random policy.
- Implemented a simple Q-learning algorithm.
- Visualized agent performance.

Next, we can move on to **Deep Q-Learning** and **Policy Gradient** methods for more complex environments!