# 🧠 Unit 5.1: Reinforcement Learning Fundamentals

**Course:** Advanced Machine Learning (AICC 303)  
**Topic:** 5.1 RL Fundamentals, MDP, Q-Learning (Tabular)

**Goal:** Understand how agents learn from interaction with an environment.

---

```python
import numpy as np
import gymnasium as gym
import matplotlib.pyplot as plt

# Create the environment (FrozenLake-v1)
# is_slippery=False makes transitions deterministic, which makes learning easier
env = gym.make('FrozenLake-v1', is_slippery=False, render_mode=None)

print("Action Space:", env.action_space)  # Discrete(4): 0=Left, 1=Down, 2=Right, 3=Up
print("State Space:", env.observation_space)  # Discrete(16): 4x4 grid, state = row * 4 + col
```
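To make the MDP components concrete, here is a minimal pure-Python sketch of the same deterministic 4x4 map: states are grid cells, actions are the four moves, and `step` returns the next state, reward, and termination flag. It assumes the default FrozenLake layout (`SFFF / FHFH / FFFH / HFFG`) and the action encoding listed above. `MAP`, `MOVES`, and `step` are hypothetical names for illustration, not part of the Gymnasium API, and the sketch ignores any episode time limit.

```python
# Hypothetical re-implementation of the deterministic (is_slippery=False) 4x4 map,
# assuming the default layout: S=start, F=frozen, H=hole, G=goal
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]
N_ROWS, N_COLS = len(MAP), len(MAP[0])

# Assumed action encoding: 0=Left, 1=Down, 2=Right, 3=Up -> (row delta, col delta)
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def step(state: int, action: int) -> tuple[int, float, bool]:
    """Return (next_state, reward, terminated) for one deterministic transition."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    # Moving off the grid leaves the agent where it is
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = row * N_COLS + col
    tile = MAP[row][col]
    reward = 1.0 if tile == "G" else 0.0   # reward only for reaching the goal
    terminated = tile in "HG"              # holes and the goal end the episode
    return next_state, reward, terminated

print(step(0, 2))   # (1, 0.0, False): Right from the start cell
print(step(4, 2))   # (5, 0.0, True): fell into a hole
print(step(14, 2))  # (15, 1.0, True): reached the goal
```

Note that the reward is sparse: every transition returns 0 except the one into `G`, which is why the Q-table stays all zeros until the agent first stumbles onto the goal.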

## 1. Q-Learning (Tabular)

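**Q-Learning Update Rule** (a sample-based form of the Bellman optimality equation):

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

When $s'$ is terminal, there is no future value to bootstrap from, so the target is just $R$.

*   $s, a$: Current state and the action taken in it
*   $R, s'$: Reward received and the resulting next state
*   $\alpha$: Learning rate (step size), $0 < \alpha \le 1$
*   $\gamma$: Discount factor, $0 \le \gamma < 1$, weighting future rewards against immediate ones
*   $\epsilon$: Exploration rate of the $\epsilon$-greedy behavior policy (used for action selection, not in the update itself)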
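As a worked example of the update, the helper below applies a single step with the notebook's $\alpha = 0.8$ and $\gamma = 0.95$. `q_update` is a hypothetical name used only for illustration. It shows how the goal reward first enters the table and then moves one step backwards on a later update.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.8, gamma=0.95, terminal=False):
    """Return the new Q(s, a) after one tabular Q-learning update."""
    # A terminal s' has no future value, so the target is just the reward
    target = reward if terminal else reward + gamma * max_q_next
    return q_sa + alpha * (target - q_sa)

# Step into the goal (R = 1, s' terminal): 0 + 0.8 * (1 - 0)
q_goal = q_update(0.0, reward=1.0, max_q_next=0.0, terminal=True)
# One step earlier (R = 0), bootstrapping from that value: 0 + 0.8 * (0 + 0.95 * 0.8 - 0)
q_before_goal = q_update(0.0, reward=0.0, max_q_next=q_goal)

print(f"{q_goal:.3f}")         # 0.800
print(f"{q_before_goal:.3f}")  # 0.608
```

Repeating this backward step over many episodes is how the goal's value spreads toward the start state, shrinking by a factor of $\gamma$ per step of distance once the values converge.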

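```python
# Initialize the Q-table with zeros: one row per state, one column per action
q_table = np.zeros([env.observation_space.n, env.action_space.n])

# Hyperparameters
alpha = 0.8       # learning rate
gamma = 0.95      # discount factor
epsilon = 0.1     # exploration rate for epsilon-greedy
episodes = 1000

# Single seeded RNG for all exploration randomness (the non-slippery env itself is deterministic)
rng = np.random.default_rng(42)
rewards_all = []

for episode in range(episodes):
    state, _ = env.reset()
    done = False
    total_reward = 0

    while not done:
        # Exploration vs. exploitation (epsilon-greedy)
        if rng.random() < epsilon:
            action = int(rng.integers(env.action_space.n))
        else:
            # Break ties randomly: np.argmax alone returns the first maximum,
            # so an all-zero row would always choose action 0 (Left)
            best_actions = np.flatnonzero(q_table[state] == q_table[state].max())
            action = int(rng.choice(best_actions))

        # Take the action
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Q-learning update: no bootstrap from a terminal state; a truncated
        # episode (time limit) still bootstraps because s' is not terminal
        td_target = reward + (0.0 if terminated else gamma * np.max(q_table[next_state]))
        q_table[state, action] += alpha * (td_target - q_table[state, action])

        state = next_state
        total_reward += reward

    rewards_all.append(total_reward)

print("Training Finished")
# Averaged over all training episodes, including exploratory ones, so it
# understates how well the final greedy policy performs
print(f"Training Success Rate: {np.mean(rewards_all):.2f}")
print("\nFinal Q-Table Values:\n", np.round(q_table, 2))
```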
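Once training finishes, the greedy policy can be read off the table with an argmax per row. The sketch below, a hypothetical helper `greedy_policy_grid`, lays each state's best action onto the 4x4 grid. It assumes the action encoding 0=Left, 1=Down, 2=Right, 3=Up, and marks with a dot any state whose Q-values are all equal (never updated, or the terminal holes and goal). It is demonstrated on a small hand-made table so it runs on its own.

```python
import numpy as np

# Assumed action encoding: 0=Left, 1=Down, 2=Right, 3=Up
ARROWS = np.array(["←", "↓", "→", "↑"])

def greedy_policy_grid(q_table: np.ndarray, n_cols: int = 4) -> np.ndarray:
    """Grid of arrows for argmax_a Q(s, a); '·' where a state's Q-values are all equal."""
    best = np.argmax(q_table, axis=1)
    # No preference: every action in the row has the same value as the first one
    undecided = np.all(q_table == q_table[:, :1], axis=1)
    return np.where(undecided, "·", ARROWS[best]).reshape(-1, n_cols)

# Hand-made toy table: prefer Right in state 0 and Down in state 1
toy_q = np.zeros((16, 4))
toy_q[0, 2] = 0.5
toy_q[1, 1] = 0.7

for row in greedy_policy_grid(toy_q):
    print(" ".join(row))
# → ↓ · ·
# · · · ·
# · · · ·
# · · · ·
```

Applied to the trained `q_table` from the cell above, the arrows starting at the top-left cell should trace a route to the goal if training reached it often enough; a row of dots along the way suggests those states were never updated.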