# Mini Projects: CartPole, FrozenLake, Q-learning, and DQN

## 📚 Learning Objectives

By completing this notebook, you will:
- Implement the Q-learning algorithm
- Apply Q-learning to the FrozenLake environment
- Apply Q-learning to the CartPole environment (with state discretization)
- Understand the core concepts behind Deep Q-Networks (DQN)
- Train and evaluate RL agents on classic environments

## 🔗 Prerequisites

- ✅ OpenAI Gym setup
- ✅ Understanding of states, actions, rewards
- ✅ Epsilon-greedy exploration strategy
- ✅ Python knowledge (functions, classes, loops, dictionaries)
- ✅ NumPy knowledge
- ✅ Basic understanding of neural networks (for the DQN section)

---

## Official Structure Reference

This notebook covers practical activities from **Course 09, Unit 1**:
- Mini projects: applying RL in games like CartPole and FrozenLake, implementing Q-learning and DQN
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 1 Practical Content

---

## Introduction

This notebook combines multiple mini projects:
1. **Q-learning on FrozenLake**: Classic grid-world problem with discrete states
2. **Q-learning on CartPole**: Continuous state space requiring discretization
3. **DQN Introduction**: Deep reinforcement learning for high-dimensional states

These projects demonstrate practical RL applications on classic benchmark environments.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import gym
import random
from collections import defaultdict

print("✅ Libraries imported!")
print("\nMini Projects: CartPole, FrozenLake, Q-learning, and DQN")
print("=" * 60)

## Part 1: Q-Learning Algorithm Implementation


In [None]:
print("=" * 60)
print("Part 1: Q-Learning Algorithm Implementation")
print("=" * 60)
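
The two building blocks of tabular Q-learning are action selection and the value update. The following is a minimal sketch, not a definitive implementation: the function names `epsilon_greedy` and `q_learning_update`, the NumPy array layout (`q_table[state, action]`), and the default `alpha` and `gamma` values are all assumptions made for illustration.

```python
import numpy as np


def epsilon_greedy(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    n_actions = q_table.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_table[state]))


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the TD error."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else np.max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error


# Worked example with 3 states and 2 actions (illustrative numbers).
q = np.zeros((3, 2))
q[1] = [0.5, 2.0]  # pretend state 1 already has learned values
td = q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                       done=False, alpha=0.5, gamma=0.9)
# TD error = 1.0 + 0.9 * 2.0 - 0.0 = 2.8, so Q(0, 1) moves to 0.5 * 2.8 = 1.4
print(round(td, 3), round(q[0, 1], 3))
```

The update uses the maximum over next-state actions regardless of which action the agent takes next, which is what makes Q-learning off-policy.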


## Part 2: Q-Learning on FrozenLake


In [None]:
print("\n" + "=" * 60)
print("Part 2: Q-Learning on FrozenLake")
print("=" * 60)


## Part 3: Q-Learning on CartPole (with State Discretization)


In [None]:
print("\n" + "=" * 60)
print("Part 3: Q-Learning on CartPole (with State Discretization)")
print("=" * 60)


## Part 4: Introduction to Deep Q-Network (DQN)


In [None]:
print("\n" + "=" * 60)
print("Part 4: Introduction to Deep Q-Network (DQN)")
print("=" * 60)

print("\nDQN Overview:")
print(" - Uses neural network to approximate Q-function (instead of Q-table)")
print(" - Handles high-dimensional state spaces (e.g., images)")
print(" - Key innovations:")
print(" 1. Experience Replay: Store transitions, sample randomly for training")
print(" 2. Target Network: Separate network for stable Q-targets")
print(" 3. Neural Network: Approximate Q(s,a) for continuous/high-dim states")

print("\nDQN Architecture:")
print(" Input: State (e.g., image, observation vector)")
print(" Network: Fully connected or CNN layers")
print(" Output: Q-values for each action")

print("\nDQN vs Q-Learning:")
print(" Q-Learning:")
print(" - Uses Q-table (discrete states only)")
print(" - Limited to small state spaces")
print(" - Fast for tabular problems")
print(" DQN:")
print(" - Uses neural network (continuous/high-dim states)")
print(" - Scales to complex problems (e.g., Atari games)")
print(" - Requires more computation and tuning")

print("\nDQN Algorithm (High-Level):")
print(" 1. Initialize Q-network and target network")
print(" 2. For each step of each episode:")
print(" a. Observe state")
print(" b. Choose action using epsilon-greedy (using Q-network)")
print(" c. Take action, store transition in replay buffer")
print(" d. Sample batch from replay buffer")
print(" e. Compute targets using target network")
print(" f. Update Q-network using loss: (Q(s,a) - target)^2")
print(" g. Periodically update target network")

print("\nNote: Full DQN implementation will be covered in Unit 3 (Deep RL)")
print("This is an introduction to the concepts.")

print("\n✅ DQN introduction complete!")
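
Two of the steps listed above, storing transitions in a replay buffer and computing targets with a target network, can be sketched in plain NumPy. This is a conceptual sketch under stated assumptions, not a full DQN: the `ReplayBuffer` class and `dqn_targets` function are hypothetical names, and the "networks" are stand-in linear maps (`W_online`, `W_target`) rather than trained neural networks.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones.astype(np.float64)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(rewards, next_states, dones, target_q, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a'), without bootstrapping at terminals."""
    next_q = target_q(next_states)  # shape: (batch, n_actions)
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)


# Stand-in linear "networks" for a 4-D state and 2 actions (placeholders only).
rng = np.random.default_rng(0)
W_online = rng.normal(size=(4, 2))
W_target = W_online.copy()  # the target network starts as a copy of the Q-network

replay = ReplayBuffer(capacity=1000)
for _ in range(64):  # fill with synthetic transitions
    replay.push(rng.normal(size=4), int(rng.integers(2)), 1.0,
                rng.normal(size=4), bool(rng.random() < 0.1))

states, actions, rewards, next_states, dones = replay.sample(32)
targets = dqn_targets(rewards, next_states, dones, lambda s: s @ W_target)

# Loss from the algorithm outline: mean of (Q(s, a) - target)^2 over the batch.
predicted = (states @ W_online)[np.arange(len(actions)), actions]
loss = np.mean((predicted - targets) ** 2)
print(targets.shape, loss >= 0.0)
```

In a real agent the gradient of this loss would update only the Q-network, while `W_target` would be refreshed from it every fixed number of steps.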

## Summary

### Key Concepts:
1. **Q-Learning**: Off-policy TD learning algorithm
   - Update rule: $Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$
   - Uses Q-table for discrete states
   - Epsilon-greedy exploration

2. **FrozenLake**: Discrete grid-world environment
   - Well suited to tabular Q-learning
   - 16 states and 4 actions on the default 4x4 map
   - Slippery and non-slippery variants (controlled by `is_slippery`)

3. **CartPole**: Continuous state space
   - Requires state discretization for Q-learning
   - 4 continuous state variables → discrete bins
   - Alternative: Use DQN for continuous states

4. **DQN**: Deep Q-Network
   - Neural network approximates Q-function
   - Handles high-dimensional/continuous states
   - Experience replay and target networks for stability

### Implementation Highlights:
- **Q-Learning**: Tabular method, fast for discrete problems
- **State Discretization**: Convert continuous to discrete for Q-learning
- **Epsilon Decay**: Reduce exploration over time
- **DQN**: Deep learning extension for complex problems

### Best Practices:
- Start with high epsilon (exploration), decay over time
- Tune learning rate (alpha) and discount factor (gamma)
- Monitor learning curves and success rates
- Use experience replay and target networks for DQN stability
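
The first practice, starting with high exploration and decaying it over time, can be expressed as a schedule that maps the episode index to an epsilon value. The two functions below are a sketch of common choices; their names and default values (`start`, `end`, `decay`, `decay_episodes`) are assumptions for illustration, not settings prescribed by this course.

```python
def exponential_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Multiply epsilon by `decay` each episode, never going below `end`."""
    return max(end, start * decay ** episode)


def linear_epsilon(episode, start=1.0, end=0.05, decay_episodes=500):
    """Interpolate from `start` to `end` over `decay_episodes`, then hold at `end`."""
    fraction = min(episode / decay_episodes, 1.0)
    return start + fraction * (end - start)


for episode in (0, 100, 500, 2000):
    print(episode,
          round(exponential_epsilon(episode), 3),
          round(linear_epsilon(episode), 3))
```

Exponential decay drops quickly at first and flattens out, while linear decay spends an equal share of episodes at each exploration level. Keeping a small floor such as 0.05 preserves some exploration late in training.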

### Next Steps:
- Unit 2: Advanced Q-learning (SARSA, TD methods)
- Unit 3: Deep RL (DQN, Actor-Critic, PPO)
- Unit 4: Exploration strategies
- Unit 5: Advanced applications

**Reference:** Course 09, Unit 1: "Introduction to Reinforcement Learning" - Mini projects practical content