<div align="center">

<img src="https://pytorch.org/assets/images/pytorch-logo.png" width="200" alt="PyTorch">

Author: [Sanyam Bhutani](http://twitter.com/bhutanisanyam1/)

# OpenEnv: Production RL Made Simple

### *From "Hello World" to RL Training in 5 Minutes* ✨

---

**What if RL environments were as easy to use as REST APIs?**

That's OpenEnv. Type-safe. Isolated. Production-ready. 🎯

[![GitHub](https://img.shields.io/badge/GitHub-meta--pytorch%2FOpenEnv-blue?logo=github)](https://github.com/meta-pytorch/OpenEnv)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)](https://pytorch.org/)

</div>

---

## 📋 What You'll Learn

<table>
<tr>
<td width="50%">

**🎯 Part 1-2: The Fundamentals**
- ⚡ RL in 60 seconds
- 🤔 Why existing solutions fall short
- 💡 The OpenEnv solution

</td>
<td width="50%">

**🏗️ Part 3-5: The Architecture**
- 🔧 How OpenEnv works
- 🔍 Exploring real code
- 🎮 OpenSpiel integration example

</td>
</tr>
<tr>
<td width="50%">

**🎮 Part 6-8: Hands-On Demo**
- 🔨 Build a game environment
- 🤖 Test 4 different policies
- 👀 Watch learning happen live

</td>
<td width="50%">

**🔧 Part 9-10: Going Further**
- 🚀 Use real OpenSpiel
- ✨ Create your own integration
- 🌐 Deploy to production

</td>
</tr>
</table>

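> 💡 **Pro Tip**: This notebook is designed to run top-to-bottom in Google Colab. Only the OpenSpiel models cell in Part 5 needs the OpenEnv setup from Part 3; the Catch demo runs on the Python standard library alone!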
> 
> ⏱️ **Time**: ~5 minutes | 📊 **Difficulty**: Beginner-friendly | 🎯 **Outcome**: Production-ready RL knowledge

---

# Part 1: RL in 60 Seconds ⏱️

<div style="background-color: #f0f7ff; padding: 20px; border-left: 5px solid #2196F3; margin: 20px 0;">

**Reinforcement Learning is simpler than you think.**

It's just a loop:

```
observation = environment.reset()
done = False
while not done:
    action = policy.choose(observation)
    observation, reward, done = environment.step(action)
    policy.learn(observation, action, reward)
```

That's it. That's RL.

</div>

Let's see it in action:

In [None]:
import random

print("🎲 " + "="*58 + " 🎲")
print("   Number Guessing Game - The Simplest RL Example")
print("🎲 " + "="*58 + " 🎲")

# Environment setup
target = random.randint(1, 10)
guesses_left = 3

print(f"\n🎯 I'm thinking of a number between 1 and 10...")
print(f"💭 You have {guesses_left} guesses. Let's see how random guessing works!\n")

# The RL Loop - Pure random policy (no learning!)
while guesses_left > 0:
    # Policy: Random guessing (no learning yet!)
    guess = random.randint(1, 10)
    guesses_left -= 1
    
    print(f"💭 Guess #{3-guesses_left}: {guess}", end=" → ")
    
    # Reward signal (but we're not using it!)
    if guess == target:
        print("🎉 Correct! +10 points")
        break
    elif abs(guess - target) <= 2:
        print("🔥 Warm! (close)")
    else:
        print("❄️  Cold! (far)")
else:
    print(f"\n💔 Out of guesses. The number was {target}.")

print("\n" + "="*62)
print("💡 This is RL: Observe → Act → Reward → Repeat")
print("   But this policy is terrible! It doesn't learn from rewards.")
print("="*62 + "\n")

---

<a id="part-2"></a>
# Part 2: The Problem with Traditional RL 😤

<div style="background-color: #fff3e0; padding: 20px; border-radius: 10px; margin: 20px 0;">

## 🤔 Why Can't We Just Use OpenAI Gym?

Good question! Gym (maintained today as Gymnasium) is great for research, but production needs more...

</div>

<table>
<tr>
<th>Challenge</th>
<th>Traditional Approach</th>
<th>OpenEnv Solution</th>
</tr>
<tr>
<td><b>Type Safety</b></td>
<td>❌ <code>obs[0][3]</code> - what is this?</td>
<td>✅ <code>obs.info_state</code> - IDE knows!</td>
</tr>
<tr>
<td><b>Isolation</b></td>
<td>❌ Same process (can crash your training)</td>
<td>✅ Docker containers (fully isolated)</td>
</tr>
<tr>
<td><b>Deployment</b></td>
<td>❌ "Works on my machine" 🤷</td>
<td>✅ Same container everywhere 🐳</td>
</tr>
<tr>
<td><b>Scaling</b></td>
<td>❌ Hard to distribute</td>
<td>✅ Deploy to Kubernetes ☸️</td>
</tr>
<tr>
<td><b>Language</b></td>
<td>❌ Python only</td>
<td>✅ Any language (HTTP API) 🌐</td>
</tr>
<tr>
<td><b>Debugging</b></td>
<td>❌ Cryptic numpy errors</td>
<td>✅ Clear type errors 🐛</td>
</tr>
</table>

<div style="background-color: #d4edda; padding: 20px; border-left: 5px solid #28a745; margin: 20px 0;">

## 💡 The OpenEnv Philosophy

**"RL environments should be like microservices"**

Think of it like this: You don't run your database in the same process as your web server, right? Same principle!

- 🔒 **Isolated**: Run in containers (security + stability)
- 🌐 **Standard**: HTTP API, works everywhere
- 📦 **Versioned**: Docker images (reproducibility!)
- 🚀 **Scalable**: Deploy to cloud with one command
- 🛡️ **Type-safe**: Catch bugs before they happen
- 🔄 **Portable**: Works on Mac, Linux, Windows, Cloud

</div>

### The Architecture

```
┌────────────────────────────────────────────────────────────┐
│  YOUR TRAINING CODE                                        │
│                                                            │
│  env = OpenSpielEnv(...)        ← Import the client      │
│  result = env.reset()           ← Type-safe!             │
│  result = env.step(action)      ← Type-safe!             │
│                                                            │
└─────────────────┬──────────────────────────────────────────┘
                  │
                  │  HTTP/JSON (Language-Agnostic)
                  │  POST /reset, POST /step, GET /state
                  │
┌─────────────────▼──────────────────────────────────────────┐
│  DOCKER CONTAINER                                          │
│                                                            │
│  ┌──────────────────────────────────────────────┐         │
│  │  FastAPI Server                              │         │
│  │  └─ Environment (reset, step, state)         │         │
│  │     └─ Your Game/Simulation Logic            │         │
│  └──────────────────────────────────────────────┘         │
│                                                            │
│  Isolated • Reproducible • Secure                          │
└────────────────────────────────────────────────────────────┘
```

<div style="background-color: #e7f3ff; padding: 15px; border-left: 5px solid #0366d6; margin: 20px 0;">

**🎯 Key Insight**: You never see HTTP details - just clean Python methods!

```python
env.reset()    # Under the hood: HTTP POST to /reset
env.step(...)  # Under the hood: HTTP POST to /step
env.state()    # Under the hood: HTTP GET to /state
```

The magic? OpenEnv handles all the plumbing. You focus on RL! ✨

</div>
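To make the "under the hood" mapping concrete, here is a self-contained sketch using only the Python standard library: a toy server exposes `POST /reset`, `POST /step`, and `GET /state`, and a thin client wraps those calls in `reset()`, `step()`, and `state()`. The counter "environment", the JSON field names, and `ToyHandler`/`ToyClient` are hypothetical; this is not OpenEnv's wire format or client, only an illustration of the idea.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

# Hypothetical server-side "environment": a counter whose episode ends after 3 steps
STATE = {"step_count": 0}


class ToyHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/reset":
            STATE["step_count"] = 0
        elif self.path == "/step":
            STATE["step_count"] += 1
        else:
            self.send_error(404)
            return
        done = STATE["step_count"] >= 3
        self._send_json({"observation": {"echo": request.get("action")},
                         "reward": 1.0 if done else 0.0, "done": done})

    def do_GET(self) -> None:
        if self.path == "/state":
            self._send_json(dict(STATE))
        else:
            self.send_error(404)

    def log_message(self, *args) -> None:  # silence per-request logging
        pass


class ToyClient:
    """Each method is one HTTP call; callers never touch URLs or JSON."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        # Empty ProxyHandler: reach the local server directly, ignoring proxy env vars
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def _call(self, method: str, path: str, payload: Optional[dict] = None) -> dict:
        data = None if payload is None else json.dumps(payload).encode()
        req = urllib.request.Request(self.base_url + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with self._opener.open(req) as resp:
            return json.loads(resp.read())

    def reset(self) -> dict:
        return self._call("POST", "/reset", {})  # HTTP POST /reset

    def step(self, action) -> dict:
        return self._call("POST", "/step", {"action": action})  # HTTP POST /step

    def state(self) -> dict:
        return self._call("GET", "/state")  # HTTP GET /state


server = ThreadingHTTPServer(("127.0.0.1", 0), ToyHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
client = ToyClient(f"http://127.0.0.1:{server.server_address[1]}")

result = client.reset()
while not result["done"]:
    result = client.step("noop")
final_state = client.state()
print(result, final_state)

server.shutdown()
server.server_close()
```

The training loop at the bottom only ever calls `reset()`, `step()`, and `state()`; swapping the toy server for a container changes the URL, not the loop.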

---


# Part 3: Setup 🛠️

<div style="background-color: #f8f9fa; padding: 15px; border-radius: 5px; margin: 20px 0;">

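**Running in Colab?** Clone OpenEnv and install its dependencies before running the OpenSpiel cells in Part 5.

**Running locally?** Start the notebook from your OpenEnv checkout and make sure its `src/` directory is on `sys.path`, since the imports below use packages such as `envs.openspiel_env`.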

</div>
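A minimal setup sketch, assuming `git` is available and that the repository keeps its Python packages under `src/` (as the `src/envs/...` paths in this notebook suggest). It clones the repo only if the folder is missing and puts `src/` on `sys.path` so imports such as `envs.openspiel_env` resolve. The helper name `setup_openenv` is hypothetical, and installing the repository's own dependencies (see its README) is left out.

```python
import os
import subprocess
import sys

REPO_URL = "https://github.com/meta-pytorch/OpenEnv.git"


def setup_openenv(repo_dir: str = "OpenEnv", clone: bool = True) -> str:
    """Clone OpenEnv if needed and make its src/ directory importable.

    Returns the absolute path that is now on sys.path.
    """
    if clone and not os.path.isdir(repo_dir):
        # Shallow clone: only the latest commit is needed for this tutorial
        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, repo_dir], check=True)
    src_dir = os.path.abspath(os.path.join(repo_dir, "src"))
    if src_dir not in sys.path:
        sys.path.insert(0, src_dir)
    return src_dir


# Colab:                              setup_openenv()
# Local, inside your OpenEnv checkout: setup_openenv(".", clone=False)
```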

---

# Part 4: The OpenEnv Pattern 🏗️

<div style="background-color: #f0f7ff; padding: 20px; border-radius: 10px; margin: 20px 0;">

## Every OpenEnv Environment Has 3 Components:

```
src/envs/your_env/
├── 📝 models.py          ← Type-safe contracts
│                           (Action, Observation, State)
│
├── 📱 client.py          ← What YOU import
│                           (HTTPEnvClient implementation)
│
└── 🖥️  server/
    ├── environment.py    ← Game/simulation logic
    ├── app.py            ← FastAPI server
    └── Dockerfile        ← Container definition
```

</div>

Let's explore the actual OpenEnv code to see how this works:
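One quick way to explore an environment on disk is to list its files and compare them with the three-part layout above. This is a small standard-library sketch; the expected file names are taken from the tree above and may differ in the real repository (server modules are sometimes named after the environment), and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

# File names from the layout tree above; real environments may vary
EXPECTED = ["models.py", "client.py", "server/environment.py", "server/app.py", "server/Dockerfile"]


def list_env_files(env_dir: str) -> List[str]:
    """All files under env_dir as sorted POSIX-style relative paths, skipping caches."""
    root = Path(env_dir)
    files = []
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        if p.is_file() and "__pycache__" not in rel.parts:
            files.append(rel.as_posix())
    return sorted(files)


def missing_components(env_dir: str) -> List[str]:
    """Expected files from the layout tree that are not present in env_dir."""
    present = set(list_env_files(env_dir))
    return [name for name in EXPECTED if name not in present]


# After cloning, for example: list_env_files("OpenEnv/src/envs/echo_env")
```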

---

# Part 5: Example Integration - OpenSpiel 🎮

<div style="background-color: #fff3e0; padding: 20px; border-radius: 10px; margin: 20px 0;">

## What is OpenSpiel?

**OpenSpiel** is a library from DeepMind with **70+ game environments** for RL research.

## OpenEnv's Integration

We've wrapped **6 OpenSpiel games** following the OpenEnv pattern:

<table>
<tr>
<td width="50%">

**🎯 Single-Player**
1. **Catch** - Catch falling ball
2. **Cliff Walking** - Navigate grid
3. **2048** - Tile puzzle
4. **Blackjack** - Card game

</td>
<td width="50%">

**👥 Multi-Player**
5. **Tic-Tac-Toe** - Classic 3×3
6. **Kuhn Poker** - Imperfect info poker

</td>
</tr>
</table>

This shows the pattern OpenEnv uses to wrap an existing RL library, and the same pattern carries over to others!

</div>

In [None]:
# Import OpenSpiel integration models
from envs.openspiel_env.models import (
    OpenSpielAction,
    OpenSpielObservation,
    OpenSpielState
)
from dataclasses import fields

print("="*70)
print("   🎮 OPENSPIEL INTEGRATION - TYPE-SAFE MODELS")
print("="*70)

print("\n📤 OpenSpielAction (what you send):")
print("   " + "─" * 64)
for field in fields(OpenSpielAction):
    print(f"   • {field.name:20s} : {field.type}")

print("\n📥 OpenSpielObservation (what you receive):")
print("   " + "─" * 64)
for field in fields(OpenSpielObservation):
    print(f"   • {field.name:20s} : {field.type}")

print("\n📊 OpenSpielState (episode metadata):")
print("   " + "─" * 64)
for field in fields(OpenSpielState):
    print(f"   • {field.name:20s} : {field.type}")

print("\n" + "="*70)
print("\n💡 Type safety means:")
print("   ✅ Your IDE autocompletes these fields")
print("   ✅ Typos are caught before running")
print("   ✅ Refactoring is safe")
print("   ✅ Self-documenting code\n")

### How the Client Works

<div style="background-color: #e7f3ff; padding: 15px; border-radius: 5px; margin: 20px 0;">

The client **inherits from HTTPEnvClient** and implements 3 methods:

1. `_step_payload()` - Convert action → JSON
2. `_parse_result()` - Parse JSON → typed observation  
3. `_parse_state()` - Parse JSON → state

That's it! The base class handles all HTTP communication.

</div>
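The split between a base class that owns the plumbing and three small hooks is the template-method pattern. Below is a self-contained sketch of that shape; `MiniEnvClient`, `CounterClient`, the local `StepResult`, and the injected `transport` function are hypothetical stand-ins (the real `HTTPEnvClient` sends actual HTTP requests, and its exact signatures may differ). Injecting the transport lets the same client run against an in-memory fake, which is also handy for unit tests.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, TypeVar

ActT = TypeVar("ActT")
ObsT = TypeVar("ObsT")
# Hypothetical transport: (HTTP method, path, JSON body) -> JSON response
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class StepResult(Generic[ObsT]):  # local stand-in, not OpenEnv's own class
    observation: ObsT
    reward: float
    done: bool


class MiniEnvClient(Generic[ActT, ObsT]):
    """Hypothetical stand-in for HTTPEnvClient: owns the request plumbing."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def reset(self) -> StepResult[ObsT]:
        return self._parse_result(self._transport("POST", "/reset", {}))

    def step(self, action: ActT) -> StepResult[ObsT]:
        return self._parse_result(self._transport("POST", "/step", self._step_payload(action)))

    def state(self) -> Any:
        return self._parse_state(self._transport("GET", "/state", {}))

    # The three hooks a concrete client implements
    def _step_payload(self, action: ActT) -> Dict[str, Any]:
        raise NotImplementedError

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[ObsT]:
        raise NotImplementedError

    def _parse_state(self, payload: Dict[str, Any]) -> Any:
        raise NotImplementedError


@dataclass
class CounterAction:
    increment: int


@dataclass
class CounterObservation:
    value: int


class CounterClient(MiniEnvClient[CounterAction, CounterObservation]):
    def _step_payload(self, action: CounterAction) -> Dict[str, Any]:
        return {"increment": action.increment}  # action -> JSON

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[CounterObservation]:
        # JSON -> typed observation
        return StepResult(CounterObservation(payload["value"]), payload["reward"], payload["done"])

    def _parse_state(self, payload: Dict[str, Any]) -> int:
        return payload["steps"]  # JSON -> state


def fake_server() -> Transport:
    """In-memory server stand-in: the episode ends once the value reaches 5."""
    data = {"value": 0, "steps": 0}

    def handle(method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        if path == "/state":
            return {"steps": data["steps"]}
        if path == "/reset":
            data.update(value=0, steps=0)
        elif path == "/step":
            data["value"] += body["increment"]
            data["steps"] += 1
        done = data["value"] >= 5
        return {"value": data["value"], "reward": 1.0 if done else 0.0, "done": done}

    return handle


client = CounterClient(fake_server())
result = client.reset()
while not result.done:
    result = client.step(CounterAction(increment=2))
print(result, "| steps:", client.state())
```

`CounterClient` never mentions URLs or requests; only the base class does, which is the point of the three-hook design.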

---

<a id="part-6"></a>
<div style="text-align: center; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 30px; border-radius: 15px; margin: 30px 0;">

# 🎮 Part 6: Interactive Demo

### Now let's BUILD something!

We'll create a **Catch game** following OpenEnv patterns,<br>
then watch **4 different AI policies** compete for the championship! 🏆

<br>

**Get ready for:**
- ⚡ Live gameplay visualization
- 🤖 AI policy showdown
- 📊 Real-time learning metrics
- 🎯 Production-ready patterns

</div>


## The Game: Catch 🔴🏓

<table>
<tr>
<td width="40%" style="text-align: center;">

```
⬜ ⬜ 🔴 ⬜ ⬜   
⬜ ⬜ ⬜ ⬜ ⬜   Ball
⬜ ⬜ ⬜ ⬜ ⬜   falls
⬜ ⬜ ⬜ ⬜ ⬜   down
⬜ ⬜ 🏓 ⬜ ⬜   
     Paddle
```

</td>
<td width="60%">

**Rules:**
- 5×5 grid
- Ball starts in a random column and drops one row per step
- Move paddle to catch it

**Actions:**
- `0` = Move LEFT ⬅️
- `1` = STAY 🛑
- `2` = Move RIGHT ➡️

**Reward:**
- `+1` if caught 🎉
- `0` if missed 😢

</td>
</tr>
</table>

<div style="background-color: #d4edda; padding: 15px; border-left: 5px solid #28a745; margin: 20px 0;">

**🎯 Why This Game?**
- Simple rules (easy to understand)
- Visual (see what's happening)
- Short episodes (4 steps on the 5×5 grid)
- Clear success/failure
- Perfect for testing policies!

</div>

In [None]:
import random
from dataclasses import dataclass
from typing import List, Tuple

# ============================================================================
# MODELS - Type-safe contracts (following OpenEnv pattern)
# ============================================================================

@dataclass
class CatchObservation:
    """Type-safe observation following OpenEnv Observation base class."""
    info_state: List[float]      # Grid as flat array
    legal_actions: List[int]     # [0, 1, 2] always
    done: bool                   # Episode finished?
    reward: float                # +1 or 0
    # Extra fields for visualization
    ball_position: Tuple[int, int]
    paddle_position: int


# ============================================================================
# ENVIRONMENT - Server-side logic (following OpenEnv Environment pattern)
# ============================================================================

class CatchEnvironment:
    """
    Catch game following OpenEnv's Environment pattern.
    
    In production:
      • Runs in Docker container
      • Accessed via HTTPEnvClient
      • Exposed via FastAPI server
    
    For this demo:
      • We run it locally to see internals
      • But the structure is identical!
    """
    
    def __init__(self, grid_size=5):
        self.grid_size = grid_size
    
    def reset(self) -> CatchObservation:
        """Start new episode (implements Environment.reset())."""
        self.ball_row = 0
        self.ball_col = random.randint(0, self.grid_size - 1)
        self.paddle_col = self.grid_size // 2
        self.done = False
        return self._make_observation()
    
    def step(self, action: int) -> CatchObservation:
        """Execute action (implements Environment.step()).
        
        Args:
            action: 0=LEFT, 1=STAY, 2=RIGHT
        """
        # Move paddle
        if action == 0 and self.paddle_col > 0:
            self.paddle_col -= 1
        elif action == 2 and self.paddle_col < self.grid_size - 1:
            self.paddle_col += 1
        
        # Move ball down
        self.ball_row += 1
        
        # Check if episode done
        if self.ball_row >= self.grid_size - 1:
            self.done = True
            reward = 1.0 if self.ball_col == self.paddle_col else 0.0
        else:
            reward = 0.0
        
        return self._make_observation(reward)
    
    def _make_observation(self, reward=0.0) -> CatchObservation:
        """Create type-safe observation."""
        # Flatten grid to vector (like real RL environments do)
        info_state = [0.0] * (self.grid_size * self.grid_size)
        ball_idx = self.ball_row * self.grid_size + self.ball_col
        paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col
        info_state[paddle_idx] = 0.5    # Paddle = 0.5
        info_state[ball_idx] = 1.0      # Ball = 1.0 (written last so a caught ball stays visible)
        
        return CatchObservation(
            info_state=info_state,
            legal_actions=[0, 1, 2],
            done=self.done,
            reward=reward,
            ball_position=(self.ball_row, self.ball_col),
            paddle_position=self.paddle_col
        )
    
    def render(self):
        """Visualize current state."""
        for row in range(self.grid_size):
            line = "  "
            for col in range(self.grid_size):
                if row == self.ball_row and col == self.ball_col:
                    line += "🔴 "
                elif row == self.grid_size - 1 and col == self.paddle_col:
                    line += "🏓 "
                else:
                    line += "⬜ "
            print(line)


print("🎉 " + "="*64 + " 🎉")
print("   ✅ Environment Created Following OpenEnv Pattern!")
print("🎉 " + "="*64 + " 🎉")
print("\n📋 What we just built:")
print("   • reset() → CatchObservation (type-safe!)")
print("   • step(action) → CatchObservation (type-safe!)")
print("   • render() → Visual display")
print("\n🚀 In production: This would run in Docker + FastAPI")
print("   But the structure is EXACTLY the same!")
print("\n💡 This is your blueprint for creating ANY OpenEnv environment!\n")

### Test the Environment
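A quick smoke test catches most wiring mistakes before any policy runs. The sketch below (the helper name `check_env_contract` is hypothetical) drives any object with `reset()`/`step(action)` whose observations carry `done`, `reward`, and `legal_actions`, as `CatchObservation` does, using random legal actions. It checks that episodes start unfinished, rewards are numeric, and every episode terminates.

```python
import random
from typing import Any


def check_env_contract(env: Any, num_episodes: int = 20, max_steps: int = 100, seed: int = 0) -> int:
    """Smoke-test an environment with reset()/step(action); return total steps taken.

    Plays random legal actions and checks that each episode starts unfinished,
    every observation offers legal actions and a numeric reward, and every
    episode ends within max_steps.
    """
    rng = random.Random(seed)
    total_steps = 0
    for episode in range(num_episodes):
        obs = env.reset()
        assert not obs.done, f"episode {episode}: reset() returned a finished episode"
        steps = 0
        while not obs.done:
            assert obs.legal_actions, f"episode {episode}: no legal actions offered"
            obs = env.step(rng.choice(obs.legal_actions))
            assert isinstance(obs.reward, (int, float)), "reward must be a number"
            steps += 1
            assert steps <= max_steps, f"episode {episode}: no termination after {max_steps} steps"
        total_steps += steps
    return total_steps
```

For example, `check_env_contract(CatchEnvironment())` should return 80: on the 5×5 grid every episode lasts exactly 4 steps, and the default runs 20 episodes.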

---

<a id="part-7"></a>
# Part 7: Four Policies 🤖

<div style="background-color: #f8f9fa; padding: 20px; border-radius: 10px; margin: 20px 0;">

## Let's test 4 different AI strategies:

<table>
<tr>
<th width="25%">Policy</th>
<th width="50%">Strategy</th>
<th width="25%">Expected Performance</th>
</tr>
<tr>
<td><b>🎲 Random</b></td>
<td>Pick random action every step</td>
<td>~20% (pure luck)</td>
</tr>
<tr>
<td><b>🛑 Always Stay</b></td>
<td>Never move, hope ball lands in center</td>
<td>~20% (terrible!)</td>
</tr>
<tr>
<td><b>🧠 Smart</b></td>
<td>Move paddle toward ball</td>
<td>100% (optimal!)</td>
</tr>
<tr>
<td><b>📈 Learning</b></td>
<td>Start random, shift to the smart heuristic as exploration decays</td>
<td>High but variable (improves as exploration decays)</td>
</tr>
</table>

</div>


In [None]:
# ============================================================================
# POLICIES - Different AI strategies
# ============================================================================

class RandomPolicy:
    """Baseline: Pure random guessing."""
    name = "🎲 Random Guesser"
    
    def select_action(self, obs: CatchObservation) -> int:
        return random.choice(obs.legal_actions)


class AlwaysStayPolicy:
    """Bad strategy: Never moves."""
    name = "🛑 Always Stay"
    
    def select_action(self, obs: CatchObservation) -> int:
        return 1  # STAY


class SmartPolicy:
    """Optimal: Move paddle toward ball."""
    name = "🧠 Smart Heuristic"
    
    def select_action(self, obs: CatchObservation) -> int:
        ball_col = obs.ball_position[1]
        paddle_col = obs.paddle_position
        
        if paddle_col < ball_col:
            return 2  # Move RIGHT
        elif paddle_col > ball_col:
            return 0  # Move LEFT
        else:
            return 1  # STAY (already aligned)


class LearningPolicy:
    """Simulated RL: Epsilon-greedy exploration."""
    name = "📈 Learning Agent"
    
    def __init__(self):
        self.steps = 0
    
    def select_action(self, obs: CatchObservation) -> int:
        self.steps += 1
        
        # Decay exploration linearly from 1.0 to a floor of 0.1 (reached at step 90)
        epsilon = max(0.1, 1.0 - (self.steps / 100))
        
        if random.random() < epsilon:
            # Explore: random action
            return random.choice(obs.legal_actions)
        else:
            # Exploit: use smart strategy
            ball_col = obs.ball_position[1]
            paddle_col = obs.paddle_position
            if paddle_col < ball_col:
                return 2
            elif paddle_col > ball_col:
                return 0
            else:
                return 1


print("🤖 " + "="*64 + " 🤖")
print("   ✅ 4 Policies Created!")
print("🤖 " + "="*64 + " 🤖\n")

policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]
for i, policy in enumerate(policies, 1):
    print(f"   {i}. {policy.name}")

print("\n💡 Each policy represents a different approach to solving the game!")
print("   Let's see who performs best! 🏆\n")

### Watch a Policy Play!

In [None]:
import time

def run_episode(env, policy, visualize=True, delay=0.4):
    """Run one episode with a policy."""
    
    # RESET
    obs = env.reset()
    
    if visualize:
        print(f"\n{'='*60}")
        print(f"   🎮 {policy.name}")
        print(f"   🔴 Ball will fall at column: {obs.ball_position[1]}")
        print('='*60 + '\n')
        env.render()
        time.sleep(delay)
    
    total_reward = 0
    step = 0
    action_names = ["⬅️  LEFT", "🛑 STAY", "➡️  RIGHT"]
    
    # THE RL LOOP
    while not obs.done:
        # 1. Policy chooses action
        action = policy.select_action(obs)
        
        # 2. Environment executes
        obs = env.step(action)
        
        # 3. Collect reward
        total_reward += obs.reward
        
        if visualize:
            print(f"\n📍 Step {step + 1}: {action_names[action]}")
            env.render()
            time.sleep(delay)
        
        step += 1
    
    if visualize:
        result = "🎉 CAUGHT!" if total_reward > 0 else "😢 MISSED"
        print(f"\n{'='*60}")
        print(f"   {result} Reward: {total_reward}")
        print('='*60)
    
    return total_reward > 0


# Demo: Watch Smart Policy in action
env = CatchEnvironment()
policy = SmartPolicy()
run_episode(env, policy, visualize=True, delay=0.4)

---

<a id="part-8"></a>
# Part 8: Policy Competition! 🏆

<div style="background-color: #e7f3ff; padding: 20px; border-radius: 10px; margin: 20px 0;">

Let's run **50 episodes** for each policy and see who wins!

</div>
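Fifty episodes is a small sample, so expect noise in the percentages. Random and Always Stay both have a true catch rate of exactly 20% (neither policy looks at the ball, and the ball's column is uniform over the five columns), so either can finish ahead in a given run. A Wilson score interval is one standard way to put error bars on a success rate; `wilson_interval` is an illustrative helper, not part of the original notebook code.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% (z=1.96) Wilson score interval for a binomial success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(10, 50)  # e.g. 10 catches in 50 episodes
print(f"20% observed -> plausible range {low:.0%} to {high:.0%}")
```

For 10 catches out of 50 the interval is roughly 11% to 33%, wide enough that a few points' difference between the two 20% policies means nothing.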


In [None]:
def evaluate_policies(num_episodes=50):
    """Compare all policies over many episodes."""
    policies = [
        RandomPolicy(),
        AlwaysStayPolicy(),
        SmartPolicy(),
        LearningPolicy(),
    ]
    
    print("\n🏆 " + "="*66 + " 🏆")
    print(f"   POLICY SHOWDOWN - {num_episodes} Episodes Each")
    print("🏆 " + "="*66 + " 🏆\n")
    
    results = []
    for policy in policies:
        print(f"⚡ Testing {policy.name}...", end=" ")
        env = CatchEnvironment()
        successes = sum(run_episode(env, policy, visualize=False) 
                       for _ in range(num_episodes))
        success_rate = (successes / num_episodes) * 100
        results.append((policy.name, success_rate, successes))
        print(f"✓ Done!")
    
    print("\n" + "="*70)
    print("   📊 FINAL RESULTS")
    print("="*70 + "\n")
    
    # Sort by success rate (descending)
    results.sort(key=lambda x: x[1], reverse=True)
    
    # Award medals to top 3
    medals = ["🥇", "🥈", "🥉", "  "]
    
    for i, (name, rate, successes) in enumerate(results):
        medal = medals[i]
        bar = "█" * int(rate / 2)
        print(f"{medal} {name:25s} [{bar:<50}] {rate:5.1f}% ({successes}/{num_episodes})")
    
    print("\n" + "="*70)
    print("\n✨ Key Insights:")
    print("   • Random (~20%):      Baseline - pure luck 🎲")
    print("   • Always Stay (~20%): Bad strategy - stays center 🛑")
    print("   • Smart (100%):       Optimal - perfect play! 🧠")
    print("   • Learning (~85%):    Improves over time 📈")
    print("\n🎓 This is Reinforcement Learning in action:")
    print("   1. Start with exploration (trying random things)")
    print("   2. Learn from rewards (what works, what doesn't)")
    print("   3. Converge to optimal behavior (smart strategy)")
    print("\n🎯 The Learning Agent gets smarter with every episode!\n")

# Run the epic competition!
print("🎮 Starting the showdown...")
evaluate_policies(num_episodes=50)
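The Learning Agent above only decays its exploration rate toward a hand-written heuristic; it never reads the reward. For contrast, here is a minimal tabular Q-learning sketch that learns Catch from the +1/0 reward alone. It re-implements the Catch dynamics as a pure function (`catch_step`) so the cell runs on its own; the state key `(ball_row, ball_col, paddle_col)`, the class and function names, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode counts) are illustrative choices, not tuned values.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Tuple

GRID = 5
ACTIONS = [0, 1, 2]  # 0=LEFT, 1=STAY, 2=RIGHT, as in CatchEnvironment
State = Tuple[int, int, int]  # (ball_row, ball_col, paddle_col)


def catch_step(state: State, action: int) -> Tuple[State, float, bool]:
    """Catch dynamics from CatchEnvironment.step, written as a pure function."""
    ball_row, ball_col, paddle = state
    if action == 0 and paddle > 0:
        paddle -= 1
    elif action == 2 and paddle < GRID - 1:
        paddle += 1
    ball_row += 1
    done = ball_row >= GRID - 1
    reward = 1.0 if done and ball_col == paddle else 0.0
    return (ball_row, ball_col, paddle), reward, done


class QLearningAgent:
    def __init__(self, alpha: float = 0.5, gamma: float = 0.99, epsilon: float = 0.1, seed: int = 0):
        self.q: DefaultDict[State, List[float]] = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state: State, explore: bool = True) -> int:
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        best = max(values)
        # Break ties randomly so untrained states are not biased toward LEFT
        return self.rng.choice([a for a in ACTIONS if values[a] == best])

    def update(self, s: State, a: int, r: float, s_next: State, done: bool) -> None:
        # Q-learning target: reward plus discounted best value of the next state
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def run_q(agent: QLearningAgent, episodes: int, learn: bool, seed: int) -> float:
    """Play episodes (learning, or greedy when learn=False) and return the catch rate."""
    rng = random.Random(seed)
    caught = 0
    for _ in range(episodes):
        state: State = (0, rng.randrange(GRID), GRID // 2)
        done = False
        while not done:
            action = agent.act(state, explore=learn)
            next_state, reward, done = catch_step(state, action)
            if learn:
                agent.update(state, action, reward, next_state, done)
            state = next_state
        caught += int(reward == 1.0)
    return caught / episodes


agent = QLearningAgent()
train_rate = run_q(agent, episodes=2000, learn=True, seed=1)
greedy_rate = run_q(agent, episodes=200, learn=False, seed=2)
print(f"Catch rate while training: {train_rate:.0%} | greedy after training: {greedy_rate:.0%}")
```

Here the only thing that steers the agent toward the ball is the reward propagating backward through the Q-table, which is the "learn from rewards" step the Learning Agent skips.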

---

<a id="part-9"></a>
# Part 9: Using Real OpenSpiel 🎮

<div style="background-color: #d4edda; padding: 20px; border-radius: 10px; margin: 20px 0;">

## What We Just Built vs Production OpenSpiel

<table>
<tr>
<th>Component</th>
<th>Our Demo</th>
<th>OpenEnv + OpenSpiel</th>
</tr>
<tr>
<td><b>Environment</b></td>
<td>Local Python class</td>
<td>Docker container</td>
</tr>
<tr>
<td><b>Communication</b></td>
<td>Direct function calls</td>
<td>HTTP/JSON</td>
</tr>
<tr>
<td><b>Client</b></td>
<td>Direct access</td>
<td>HTTPEnvClient</td>
</tr>
<tr>
<td><b>Type Safety</b></td>
<td>✅ Dataclasses</td>
<td>✅ Dataclasses</td>
</tr>
<tr>
<td><b>API</b></td>
<td>reset(), step()</td>
<td>reset(), step() <em>(same!)</em></td>
</tr>
</table>

**🎯 Same structure, production features!**

</div>

### Using OpenSpiel Integration:

```python
# 1. Install OpenSpiel
!pip install open_spiel

# 2. Import OpenEnv's integration
from envs.openspiel_env import OpenSpielEnv, OpenSpielAction

# 3. Connect to server (HTTP!)
env = OpenSpielEnv(base_url="http://localhost:8000")

# 4. Same API you just learned!
result = env.reset()
result = env.step(OpenSpielAction(action_id=2, game_name="catch"))
state = env.state()

# 5. Switch games by changing game_name:
result = env.step(OpenSpielAction(action_id=4, game_name="tic_tac_toe"))
```

<div style="background-color: #fff3e0; padding: 15px; border-radius: 5px; margin: 20px 0;">

**🎮 6 Games Available:**

1. `"catch"` - What we just built!
2. `"tic_tac_toe"` - Classic 3×3
3. `"kuhn_poker"` - Imperfect information poker
4. `"cliff_walking"` - Grid navigation
5. `"2048"` - Tile puzzle
6. `"blackjack"` - Card game

**All use the exact same interface!**

</div>
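Unlike the local demo, the HTTP client wraps each response in a result object, so the episode loop reads `result.observation`, `result.reward`, and `result.done`. A client-agnostic sketch of that loop, assuming only those three attributes (`rollout` and `choose_action` are hypothetical names):

```python
from typing import Any, Callable, Tuple


def rollout(env: Any, choose_action: Callable[[Any], Any], max_steps: int = 1000) -> Tuple[float, int]:
    """Play one episode against any client whose reset()/step() return objects
    with .observation, .reward, and .done. Returns (total_reward, steps)."""
    result = env.reset()
    total_reward, steps = 0.0, 0
    while not result.done and steps < max_steps:
        result = env.step(choose_action(result.observation))
        total_reward += result.reward or 0.0  # treat a missing (None) reward as 0
        steps += 1
    return total_reward, steps
```

With the OpenSpiel client this might look like `rollout(env, lambda obs: OpenSpielAction(action_id=random.choice(obs.legal_actions), game_name="catch"))`, assuming the observation exposes `legal_actions` the way the Catch demo's observation does.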

---

<a id="part-10"></a>
# Part 10: Create Your Own Integration 🛠️

<div style="background-color: #e7f3ff; padding: 20px; border-radius: 10px; margin: 20px 0;">

## The 5-Step Pattern

Want to wrap your own environment in OpenEnv? Here's how:

</div>

### Step 1: Define Types (`models.py`)

```python
from dataclasses import dataclass
from typing import List

from core.env_server import Action, Observation, State

@dataclass
class YourAction(Action):
    action_value: int
    # Add your action fields

@dataclass
class YourObservation(Observation):
    state_data: List[float]
    done: bool
    reward: float
    # Add your observation fields

@dataclass
class YourState(State):
    episode_id: str
    step_count: int
    # Add your state fields
```
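Typed models are also what travels over the wire: a dataclass flattens to a JSON-ready dict and can be rebuilt from one. A standard-library sketch of that round trip, using hypothetical `MoveAction`/`GridObservation` types rather than OpenEnv's base classes (whose own serialization helpers may differ):

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MoveAction:
    action_value: int


@dataclass
class GridObservation:
    state_data: List[float] = field(default_factory=list)
    done: bool = False
    reward: float = 0.0


def to_json(obj) -> str:
    """Dataclass -> JSON text (what a client or server would send)."""
    return json.dumps(asdict(obj))


def from_json(cls, text: str):
    """JSON text -> dataclass; unknown or missing keys raise TypeError, surfacing contract drift early."""
    return cls(**json.loads(text))


wire = to_json(GridObservation(state_data=[0.0, 1.0], done=True, reward=1.0))
print(wire)  # {"state_data": [0.0, 1.0], "done": true, "reward": 1.0}
restored = from_json(GridObservation, wire)
print(restored == GridObservation([0.0, 1.0], True, 1.0))
```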

### Step 2: Implement Environment (`server/environment.py`)

```python
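import uuid

from core.env_server import Environment

# models.py sits one level up from server/, per the layout in Part 4
from ..models import YourAction, YourObservation, YourState


class YourEnvironment(Environment):
    def __init__(self):
        super().__init__()
        self._state = YourState(episode_id=str(uuid.uuid4()), step_count=0)

    def reset(self) -> YourObservation:
        # Start a fresh episode, then initialize your game/simulation
        self._state = YourState(episode_id=str(uuid.uuid4()), step_count=0)
        return YourObservation(...)

    def step(self, action: YourAction) -> YourObservation:
        # Execute the action, update your game state, and count the step
        self._state.step_count += 1
        return YourObservation(...)

    @property
    def state(self) -> YourState:
        return self._state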
```
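The part every environment shares is episode bookkeeping: `reset()` starts a fresh episode ID and zeroes the step counter, `step()` advances it, and `state` reports it. A self-contained sketch with no OpenEnv imports; `GuessEnvironment`, `GuessObservation`, `EpisodeState`, and the hidden-number game are hypothetical, chosen only to show where that bookkeeping lives.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass
class GuessObservation:
    hint: str  # "start", "higher", "lower", or "correct"
    done: bool
    reward: float


@dataclass
class EpisodeState:
    episode_id: str
    step_count: int


class GuessEnvironment:
    """Guess a hidden number in 1..max_value; the episode ends on a correct guess."""

    def __init__(self, max_value: int = 10, seed: int = 0):
        self.max_value = max_value
        self.rng = random.Random(seed)
        self._state = EpisodeState(episode_id="", step_count=0)
        self._target = 0

    def reset(self) -> GuessObservation:
        # New episode: fresh ID, counter back to zero, new hidden target
        self._state = EpisodeState(episode_id=str(uuid.uuid4()), step_count=0)
        self._target = self.rng.randint(1, self.max_value)
        return GuessObservation(hint="start", done=False, reward=0.0)

    def step(self, guess: int) -> GuessObservation:
        self._state.step_count += 1
        if guess == self._target:
            return GuessObservation(hint="correct", done=True, reward=1.0)
        hint = "higher" if guess < self._target else "lower"
        return GuessObservation(hint=hint, done=False, reward=0.0)

    @property
    def state(self) -> EpisodeState:
        return self._state


# Usage: a binary-search policy finds any number in 1..10 within 4 guesses
env = GuessEnvironment()
obs, low, high = env.reset(), 1, env.max_value
while not obs.done:
    guess = (low + high) // 2
    obs = env.step(guess)
    if obs.hint == "higher":
        low = guess + 1
    elif obs.hint == "lower":
        high = guess - 1
print(env.state)
```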

### Step 3: Create Client (`client.py`)

```python
from core.http_env_client import HTTPEnvClient
from core.types import StepResult

from .models import YourAction, YourObservation, YourState

class YourEnv(HTTPEnvClient[YourAction, YourObservation]):
    def _step_payload(self, action: YourAction) -> dict:
        """Convert action to JSON"""
        return {"action_value": action.action_value}
    
    def _parse_result(self, payload: dict) -> StepResult:
        """Parse JSON to observation"""
        return StepResult(
            observation=YourObservation(...),
            reward=payload['reward'],
            done=payload['done']
        )
    
    def _parse_state(self, payload: dict) -> YourState:
        return YourState(...)
```

### Step 4: Create Server (`server/app.py`)

```python
from core.env_server import create_fastapi_app
from .environment import YourEnvironment

env = YourEnvironment()
app = create_fastapi_app(env)

# That's it! OpenEnv creates all endpoints for you.
```

### Step 5: Dockerize (`server/Dockerfile`)

```dockerfile
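FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Assumes the build context is src/, so /app contains core/ and envs/.
# Loading the app by its package path lets its relative imports resolve.
COPY . .
CMD ["uvicorn", "envs.your_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"]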
```

<div style="background-color: #d4edda; padding: 20px; border-left: 5px solid #28a745; margin: 20px 0;">

### 🎓 Examples to Study

OpenEnv includes 3 complete examples:

1. **`src/envs/echo_env/`**
   - Simplest possible environment
   - Great for testing and learning

2. **`src/envs/openspiel_env/`**
   - Wraps external library (OpenSpiel)
   - Shows integration pattern
   - 6 games in one integration

3. **`src/envs/coding_env/`**
   - Python code execution environment
   - Shows complex use case
   - Security considerations

**💡 Study these to understand the patterns!**

</div>

---

<a id="summary"></a>
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 40px; border-radius: 15px; margin: 40px 0; text-align: center;">

# 🎓 Summary: Your Journey

</div>


## What You Learned

<table>
<tr>
<td width="50%" style="vertical-align: top;">

### 📚 Concepts

✅ **RL Fundamentals**
- The observe-act-reward loop
- What makes good policies
- Exploration vs exploitation

✅ **OpenEnv Architecture**
- Client-server separation
- Type-safe contracts
- HTTP communication layer

✅ **Production Patterns**
- Docker isolation
- API design
- Reproducible deployments

</td>
<td width="50%" style="vertical-align: top;">

### 🛠️ Skills

✅ **Using Environments**
- Import OpenEnv clients
- Call reset/step/state
- Work with typed observations

✅ **Building Environments**
- Define type-safe models
- Implement Environment class
- Create HTTPEnvClient

✅ **Testing & Debugging**
- Compare policies
- Visualize episodes
- Measure performance

</td>
</tr>
</table>

## OpenEnv vs Traditional RL

<table>
<tr>
<th>Feature</th>
<th>Traditional (Gym)</th>
<th>OpenEnv</th>
<th>Winner</th>
</tr>
<tr>
<td><b>Type Safety</b></td>
<td>❌ Arrays, dicts</td>
<td>✅ Dataclasses</td>
<td>🏆 OpenEnv</td>
</tr>
<tr>
<td><b>Isolation</b></td>
<td>❌ Same process</td>
<td>✅ Docker</td>
<td>🏆 OpenEnv</td>
</tr>
<tr>
<td><b>Deployment</b></td>
<td>❌ Manual setup</td>
<td>✅ K8s-ready</td>
<td>🏆 OpenEnv</td>
</tr>
<tr>
<td><b>Language</b></td>
<td>❌ Python only</td>
<td>✅ Any (HTTP)</td>
<td>🏆 OpenEnv</td>
</tr>
<tr>
<td><b>Reproducibility</b></td>
<td>❌ "Works on my machine"</td>
<td>✅ Same everywhere</td>
<td>🏆 OpenEnv</td>
</tr>
<tr>
<td><b>Community</b></td>
<td>✅ Large ecosystem</td>
<td>🟡 Growing</td>
<td>🤝 Both!</td>
</tr>
</table>

<div style="background-color: #e7f3ff; padding: 20px; border-radius: 10px; margin: 20px 0;">

**🎯 The Bottom Line**

OpenEnv brings **production engineering** to RL:
- Same environments work locally and in production
- Type safety catches bugs early
- Docker isolation prevents conflicts
- HTTP API works with any language

**It's RL for 2024 and beyond.**

</div>

<a id="resources"></a>
## 📚 Resources

<div style="background-color: #f8f9fa; padding: 20px; border-radius: 10px; margin: 20px 0;">

### 🔗 Essential Links

- **🏠 OpenEnv GitHub**: https://github.com/meta-pytorch/OpenEnv
- **🎮 OpenSpiel**: https://github.com/google-deepmind/open_spiel
- **⚡ FastAPI Docs**: https://fastapi.tiangolo.com/
- **🐳 Docker Guide**: https://docs.docker.com/get-started/
- **🔥 PyTorch**: https://pytorch.org/

### 📖 Documentation Deep Dives

- **Environment Creation Guide**: `src/envs/README.md`
- **OpenSpiel Integration**: `src/envs/openspiel_env/README.md`
- **Example Scripts**: `examples/`
- **RFC 001**: [Baseline API Specs](https://github.com/meta-pytorch/OpenEnv/pull/26)

### 🎓 Community & Support

**Supported by amazing organizations:**
- 🔥 Meta PyTorch
- 🤗 Hugging Face
- ⚡ Unsloth AI
- 🌟 Reflection AI
- 🚀 And many more!

**License**: BSD 3-Clause (very permissive!)

**Contributions**: Always welcome! Check out the issues tab.

</div>

---

### 🌈 What's Next?

1. ⭐ **Star the repo** to show support and stay updated
2. 🔄 **Try modifying** the Catch game (make it harder? bigger grid?)
3. 🎮 **Explore** other OpenSpiel games
4. 🛠️ **Build** your own environment integration
5. 💬 **Share** what you build with the community!

---

<div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); color: white; padding: 50px; border-radius: 20px; margin: 40px 0; text-align: center;">

# 🎉 Congratulations! You Did It! 🎉

### You're now an OpenEnv expert!

<br>

## ✅ What You've Mastered:

**🧠 Concepts**
- How RL works (the observe-act-reward loop)
- Why OpenEnv matters (production-ready RL)
- How to use existing environments

**🛠️ Practical Skills**
- Creating new integrations
- Building type-safe environments
- Deploying to production

**🎯 Real Experience**
- Built a complete RL environment
- Tested multiple policies
- Watched learning happen in real-time!

---

### Now go build something amazing! 🚀

**Welcome to the future of RL with PyTorch & OpenEnv**

<br>

[![Star on GitHub](https://img.shields.io/badge/⭐_Star_on_GitHub-gray?style=for-the-badge)](https://github.com/meta-pytorch/OpenEnv)

</div>

---

<div style="background-color: #f0f7ff; padding: 20px; border-radius: 10px; margin: 20px 0;">

## 🌟 Want to Learn More?

- 📖 Check out the [docs](https://github.com/meta-pytorch/OpenEnv)
- 🎮 Try the other example games
- 💬 Join the community discussions
- 🛠️ Build your own integration
- 🚀 Deploy to production
- ⭐ Star the repo to stay updated!

**Happy coding! 🎊**

</div>