# Reinforcement Learning Example with Q-Learning

## What is **Reinforcement Learning**?
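In reinforcement learning, an agent (such as a robot or a program) learns to make decisions through **trial and error**: it acts in an environment and receives **rewards** or **penalties** that indicate how good each action was.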

## Example: Simple Maze
We’ll create a 4x4 maze with no interior walls, where the agent starts at (0,0) and must reach the goal at (3,3). Coordinates are given as (row, column).

### Code Steps:
1. Define the maze and rewards.
2. Implement the Q-Learning algorithm.
3. Train the agent and visualize the learned policy.
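
Step 2 relies on the standard tabular Q-learning update. As a reminder, for a transition from state $s$ to state $s'$ under action $a$ with reward $r$, learning rate $\alpha$, and discount factor $\gamma$, the rule is:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

As a worked instance with the values used in this notebook ($\alpha = 0.1$, $\gamma = 0.9$, an all-zero table), the first time the agent steps into the goal and receives $r = 10$, the update gives:

$$
Q(s, a) \leftarrow 0 + 0.1 \left[ 10 + 0.9 \cdot 0 - 0 \right] = 1
$$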


In [None]:
# Import libraries
import numpy as np

# Maze settings
n_rows = 4
n_cols = 4
goal = (3, 3)

# Q-Table (initialized with zeros)
Q = np.zeros((n_rows, n_cols, 4))  # 4 actions: up, down, left, right

# Q-Learning parameters
alpha = 0.1  # Learning rate
gamma = 0.9  # Discount factor
episodes = 1000

# Simulation
for _ in range(episodes):
    state = (0, 0)  # Initial state
    while state != goal:
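        # Behavior policy: uniformly random actions (pure exploration).
        # Q-learning is off-policy, so it still learns the greedy action values.
        action = np.random.randint(0, 4)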
        next_row, next_col = state
        
        if action == 0: next_row -= 1  # Up
        elif action == 1: next_row += 1  # Down
        elif action == 2: next_col -= 1  # Left
        else: next_col += 1  # Right
        
        # Keep agent within maze bounds
        next_row = max(0, min(next_row, n_rows - 1))
        next_col = max(0, min(next_col, n_cols - 1))
        
        # Reward: +10 for goal, -1 otherwise
        reward = 10 if (next_row, next_col) == goal else -1
        
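        # Update Q-Table: move Q(s, a) toward reward + discounted best next value
        row, col = state
        td_target = reward + gamma * np.max(Q[next_row, next_col])
        Q[row, col, action] += alpha * (td_target - Q[row, col, action])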
        state = (next_row, next_col)

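# Greedy policy (best learned action per state). The goal state is terminal and
# never updated, so its entry defaults to 0 and carries no meaning.
policy = np.argmax(Q, axis=2)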
print("Learned Policy (0=Up, 1=Down, 2=Left, 3=Right):")
print(policy)
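
A grid of integers is hard to read at a glance. One possible way to inspect a policy is to map each action index to an arrow and to trace the path obtained by following the policy from the start state. The sketch below is only an illustration and is not part of the original notebook: `ARROWS`, `MOVES`, `render_policy`, `greedy_path`, and `example_policy` are hypothetical names, and `example_policy` is a hand-written policy, not a training result.

```python
import numpy as np

# Arrow glyphs in the notebook's action order: 0=Up, 1=Down, 2=Left, 3=Right
ARROWS = ["↑", "↓", "←", "→"]
# (row, column) offsets for each action, in the same order
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def render_policy(policy, goal):
    """Return one string per grid row, with arrows for actions and 'G' for the goal."""
    lines = []
    for r in range(policy.shape[0]):
        cells = []
        for c in range(policy.shape[1]):
            cells.append("G" if (r, c) == goal else ARROWS[policy[r, c]])
        lines.append(" ".join(cells))
    return lines


def greedy_path(policy, start, goal, max_steps=100):
    """Follow the policy from start and return the visited states.

    Stops at the goal, or after max_steps moves if the policy never reaches it.
    """
    n_rows, n_cols = policy.shape
    state = start
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        d_row, d_col = MOVES[policy[state]]
        # Clamp to the grid, mirroring the bounds handling in the training loop
        next_row = max(0, min(state[0] + d_row, n_rows - 1))
        next_col = max(0, min(state[1] + d_col, n_cols - 1))
        state = (next_row, next_col)
        path.append(state)
    return path


# Hand-written example policy (an assumption for illustration, NOT trained output):
# move right along each row, then down the last column.
example_policy = np.array([
    [3, 3, 3, 1],
    [3, 3, 3, 1],
    [3, 3, 3, 1],
    [3, 3, 3, 0],
])
example_goal = (3, 3)

rendered = render_policy(example_policy, example_goal)
print("\n".join(rendered))

path = greedy_path(example_policy, (0, 0), example_goal)
print(path)
```

In the notebook itself, the same helpers could be called as `render_policy(policy, goal)` and `greedy_path(policy, (0, 0), goal)` on the trained policy. If the returned path does not end at the goal, the policy has probably not converged yet.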
