# OpenEnv: Production-Ready RL Environments

**Learn how OpenEnv standardizes RL environments for production use**

---

## What You'll Learn

This notebook teaches you:

1. **RL Fundamentals** - The core loop in 5 minutes
2. **OpenEnv Framework** - Why we built it and how it works
3. **Using Integrations** - Work with existing environments (OpenSpiel example)
4. **Interactive Demo** - See policies in action
5. **Adding Integrations** - Wrap your own environments

---

## Part 1: RL Fundamentals - The Core Loop

Reinforcement Learning boils down to a simple loop:

```
Agent observes → chooses action → gets reward → repeat
```

Let's see it:

In [None]:
import random

# Simple RL: Guess a number
target = random.randint(1, 10)
guesses = 3

print("🎯 Guess a number (1-10)\n")

while guesses > 0:
    guess = random.randint(1, 10)  # Policy: random
    guesses -= 1
    
    print(f"Guess: {guess}", end=" → ")
    
    if guess == target:
        print("🎉 Correct! Reward: +1")
        break
    elif abs(guess - target) <= 2:
        print("🔥 Warm")
    else:
        print("❄️ Cold")
else:
    print(f"\nIt was {target}. Reward: 0")

print("\n💡 That's RL: observe → act → reward → repeat")

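**The Problem**: How do we make this loop production-ready? A real system needs:
- **Type safety**: typed actions and observations instead of loose values
- **Isolation**: each environment running in its own container
- **Deployment**: a service that other processes can reach
- **Standardization**: one API across every environment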

**Enter OpenEnv.**

---

## Part 2: OpenEnv - The Framework

### What is OpenEnv?

OpenEnv is a **framework for creating, deploying, and using isolated RL environments**.

Think "Docker for RL environments" with:
- ✅ Standardized API (reset, step, state)
- ✅ Type-safe dataclasses
- ✅ Docker isolation
- ✅ HTTP communication (language-agnostic)
- ✅ Production-ready deployment

### The Architecture

```
┌────────────────────────────────────┐
│  Your Training Code                │  Python, Rust, Julia...
│                                    │
│  env = SomeEnv(...)                │  ← Import OpenEnv client
│  result = env.reset()              │  ← Type-safe!
│  result = env.step(action)         │  ← Type-safe!
└──────────┬─────────────────────────┘
           │
           │ HTTP/JSON
           │
┌──────────▼─────────────────────────┐
│  Docker Container                  │
│                                    │
│  FastAPI Server                    │
│  └─ Environment Logic              │
│     └─ Your game/simulation        │
└────────────────────────────────────┘
```
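
To make the HTTP/JSON hop in the diagram concrete, here is a minimal sketch of how a typed action could cross the wire. `DemoAction`, `encode_action`, `decode_action`, and the payload shape are illustrative assumptions, not OpenEnv's actual wire format; only the standard library is used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DemoAction:
    """Hypothetical action type used only for this illustration."""
    action_id: int
    game_name: str


def encode_action(action: DemoAction) -> str:
    """Client side: turn the typed action into a JSON request body."""
    return json.dumps(asdict(action))


def decode_action(body: str) -> DemoAction:
    """Server side: rebuild the typed action from the JSON body."""
    return DemoAction(**json.loads(body))


body = encode_action(DemoAction(action_id=2, game_name="catch"))
restored = decode_action(body)
print(body)
print(restored)
```

Because the two sides agree only on the JSON shape, the client does not have to be written in Python.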

### The Pattern - Every Environment Has:

```
src/envs/your_env/
├── models.py         ← Type-safe contracts (Action, Observation, State)
├── client.py         ← Client API (what you import)
└── server/
    ├── environment.py ← Environment logic
    ├── app.py         ← FastAPI server
    └── Dockerfile     ← Container
```

### Current Integrations

OpenEnv already integrates several environments:
- **OpenSpiel** (6 games from DeepMind)
- **Echo** (test environment)
- **Coding** (Python code execution)
- **Atari** (classic games)
- More coming!

Let's explore one integration to see how it all works...

---

## Part 3: Setup

In [None]:
# Check if in Colab
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !git clone https://github.com/meta-pytorch/OpenEnv.git
    %cd OpenEnv
    !pip install -q fastapi uvicorn requests
    import sys
    sys.path.insert(0, './src')
    print("✅ OpenEnv ready!")
else:
    import sys
    from pathlib import Path
    sys.path.insert(0, str(Path.cwd() / 'src'))
    print("✅ Using local OpenEnv")

---

## Part 4: Exploring OpenEnv's Structure

Let's look at the actual OpenEnv code to understand how it works.

### The Base Classes

In [None]:
from core.env_server import Environment, Action, Observation, State
from core.http_env_client import HTTPEnvClient

print("=" * 70)
print("OpenEnv Core Abstractions")
print("=" * 70)

print("""
SERVER SIDE (runs in Docker):

  class Environment(ABC):
      '''Base class for all environment implementations'''
      
      @abstractmethod
      def reset(self) -> Observation:
          '''Start new episode'''
      
      @abstractmethod
      def step(self, action: Action) -> Observation:
          '''Execute action'''
      
      @property
      def state(self) -> State:
          '''Episode metadata'''

CLIENT SIDE (your training code):

  class HTTPEnvClient(ABC):
      '''Base class for HTTP clients'''
      
      def reset(self) -> StepResult:
          # HTTP POST to /reset
      
      def step(self, action) -> StepResult:
          # HTTP POST to /step
      
      def state(self) -> State:
          # HTTP GET to /state
""")

print("=" * 70)
print("💡 Same interface, communication via HTTP")
print("=" * 70)
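
The printed interface can be exercised without Docker or HTTP. The sketch below uses simplified stand-in classes (`SketchEnvironment`, `CounterEnvironment`, `CounterObservation`, `CounterState` are all hypothetical names, not OpenEnv classes) to show the same `reset()` / `step()` / `state` shape on a toy task.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CounterObservation:
    """Stand-in observation; not an OpenEnv class."""
    value: int
    done: bool
    reward: float


@dataclass
class CounterState:
    """Stand-in episode metadata; not an OpenEnv class."""
    step_count: int


class SketchEnvironment(ABC):
    """Simplified stand-in for the Environment base class printed above."""

    @abstractmethod
    def reset(self) -> CounterObservation: ...

    @abstractmethod
    def step(self, action: int) -> CounterObservation: ...

    @property
    @abstractmethod
    def state(self) -> CounterState: ...


class CounterEnvironment(SketchEnvironment):
    """Toy task: add `action` to a counter until it reaches the target."""

    def __init__(self, target: int = 3):
        self.target = target
        self.value = 0
        self.steps = 0

    def reset(self) -> CounterObservation:
        self.value = 0
        self.steps = 0
        return CounterObservation(value=0, done=False, reward=0.0)

    def step(self, action: int) -> CounterObservation:
        self.value += action
        self.steps += 1
        done = self.value >= self.target
        return CounterObservation(self.value, done, 1.0 if done else 0.0)

    @property
    def state(self) -> CounterState:
        return CounterState(step_count=self.steps)


env = CounterEnvironment(target=3)
obs = env.reset()
while not obs.done:
    obs = env.step(1)
print(obs, env.state)
```

The training loop touches only `reset()`, `step()`, and `state`, which is why the same loop can later talk to a remote environment through a client with the same methods.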

---

## Part 5: Example Integration - OpenSpiel

### What is OpenSpiel?

OpenSpiel is a **library from DeepMind** with 70+ game environments for RL research.

### Our Integration

**OpenEnv wraps 6 OpenSpiel games** following our standard pattern:

1. **Catch** - Catch falling ball (single-player)
2. **Tic-Tac-Toe** - Classic 3×3 (2-player)
3. **Kuhn Poker** - Imperfect info poker (2-player)
4. **Cliff Walking** - Grid navigation (single-player)
5. **2048** - Tile puzzle (single-player)
6. **Blackjack** - Card game (single-player)

Let's see how the integration is structured:

In [None]:
# Import the OpenSpiel integration models
from envs.openspiel_env.models import (
    OpenSpielAction,
    OpenSpielObservation,
    OpenSpielState
)
from dataclasses import fields

print("=" * 70)
print("OpenSpiel Integration - Type-Safe Models")
print("=" * 70)

print("\n📤 OpenSpielAction (what you send):")
for field in fields(OpenSpielAction):
    print(f"   • {field.name}: {field.type}")

print("\n📥 OpenSpielObservation (what you receive):")
for field in fields(OpenSpielObservation):
    print(f"   • {field.name}: {field.type}")

print("\n📊 OpenSpielState (episode metadata):")
for field in fields(OpenSpielState):
    print(f"   • {field.name}: {field.type}")

print("\n" + "=" * 70)
print("💡 This is how OpenEnv integrates external libraries:")
print("   1. Wrap in standardized types")
print("   2. Expose via HTTPEnvClient")
print("   3. Package in Docker")
print("=" * 70)

### How the Client Works

In [None]:
from envs.openspiel_env.client import OpenSpielEnv

print("=" * 70)
print("OpenSpielEnv Client (HTTPEnvClient Implementation)")
print("=" * 70)

print("""
How OpenEnv wraps OpenSpiel:

class OpenSpielEnv(HTTPEnvClient[OpenSpielAction, OpenSpielObservation]):
    
    def _step_payload(self, action: OpenSpielAction) -> dict:
        '''Convert action to JSON for HTTP request'''
        return {
            "action_id": action.action_id,
            "game_name": action.game_name,
        }
    
    def _parse_result(self, payload: dict) -> StepResult:
        '''Parse HTTP response into typed observation'''
        return StepResult(
            observation=OpenSpielObservation(...),
            reward=payload['reward'],
            done=payload['done']
        )

Usage (same for ALL OpenEnv environments):

  env = OpenSpielEnv(base_url="http://localhost:8000")
  result = env.reset()  # Returns StepResult[OpenSpielObservation]
  result = env.step(OpenSpielAction(action_id=2, game_name="catch"))
  state = env.state()   # Returns OpenSpielState
""")

print("=" * 70)
print("💡 This pattern works for ANY environment you want to wrap!")
print("=" * 70)
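
The `_parse_result` step is the place where untyped JSON becomes a typed object. Here is a self-contained sketch of that conversion. `SketchStepResult`, `SketchObservation`, and the response layout are assumptions made for illustration; the real `StepResult` and the server's actual payload may differ.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

ObsT = TypeVar("ObsT")


@dataclass
class SketchStepResult(Generic[ObsT]):
    """Stand-in for StepResult; field names follow the listing above."""
    observation: ObsT
    reward: Optional[float]
    done: bool


@dataclass
class SketchObservation:
    """Hypothetical observation with two illustrative fields."""
    info_state: List[float]
    legal_actions: List[int]


def parse_result(payload: dict) -> SketchStepResult[SketchObservation]:
    """Turn a decoded JSON response into a typed result."""
    obs_data = payload["observation"]
    observation = SketchObservation(
        info_state=list(obs_data["info_state"]),
        legal_actions=list(obs_data["legal_actions"]),
    )
    return SketchStepResult(
        observation=observation,
        reward=payload.get("reward"),
        done=bool(payload.get("done", False)),
    )


# Hypothetical response body, already decoded from JSON.
payload = {
    "observation": {"info_state": [0.0, 1.0], "legal_actions": [0, 1, 2]},
    "reward": 0.0,
    "done": False,
}
result = parse_result(payload)
print(result.observation.legal_actions, result.reward, result.done)
```

Keeping the conversion in one function means a malformed response fails at the client boundary rather than deep inside training code.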

---

## Part 6: Interactive Demo - See It In Action

Let's build a **Catch game** environment following OpenEnv's pattern.

This shows you:
- How to structure an environment
- How the RL loop works
- How different policies perform

### The Game:
- 5×5 grid, ball falls from top 🔴
- Control paddle at bottom 🏓
- **Actions**: 0=LEFT, 1=STAY, 2=RIGHT
- **Reward**: +1 caught, 0 missed
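
A quick arithmetic check shows that every ball is catchable on this grid. The sketch assumes the rules used by the implementation in this notebook: the paddle starts in the centre column (`grid_size // 2`), moves at most one column per step, and the ball reaches the bottom row after `grid_size - 1` steps. `spare_steps` is a hypothetical helper name.

```python
def spare_steps(grid_size: int) -> int:
    """Steps left over after the paddle travels to the farthest possible column."""
    start_col = grid_size // 2                 # paddle starts in the centre
    steps_available = grid_size - 1            # ball falls one row per step
    farthest_ball = max(start_col, grid_size - 1 - start_col)
    return steps_available - farthest_ball


print(spare_steps(5))  # → 2
```

On the 5×5 grid the farthest ball is 2 columns away and 4 steps are available, so a perfect policy can always earn the +1 reward.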

In [None]:
import random
from dataclasses import dataclass
from typing import List, Tuple

# Define types (following OpenEnv pattern)
@dataclass
class CatchObservation:
    """Type-safe observation."""
    info_state: List[float]
    legal_actions: List[int]
    done: bool
    reward: float
    # For visualization
    ball_position: Tuple[int, int]
    paddle_position: int


class CatchEnvironment:
    """
    Catch game following OpenEnv Environment pattern.
    
    In production: This would run in Docker, accessed via HTTPEnvClient
    For demo: We run it locally to see the internals
    """
    
    def __init__(self, grid_size=5):
        self.grid_size = grid_size
    
    def reset(self) -> CatchObservation:
        """Start new episode (implements Environment.reset())."""
        self.ball_row = 0
        self.ball_col = random.randint(0, self.grid_size - 1)
        self.paddle_col = self.grid_size // 2
        self.done = False
        return self._make_observation()
    
    def step(self, action: int) -> CatchObservation:
        """Execute action (implements Environment.step())."""
        # Move paddle
        if action == 0 and self.paddle_col > 0:
            self.paddle_col -= 1
        elif action == 2 and self.paddle_col < self.grid_size - 1:
            self.paddle_col += 1
        
        # Move ball
        self.ball_row += 1
        
        # Check done
        if self.ball_row >= self.grid_size - 1:
            self.done = True
            reward = 1.0 if self.ball_col == self.paddle_col else 0.0
        else:
            reward = 0.0
        
        return self._make_observation(reward)
    
    def _make_observation(self, reward=0.0) -> CatchObservation:
        """Create type-safe observation."""
        info_state = [0.0] * (self.grid_size * self.grid_size)
        ball_idx = self.ball_row * self.grid_size + self.ball_col
        paddle_idx = (self.grid_size - 1) * self.grid_size + self.paddle_col
        info_state[ball_idx] = 1.0
        info_state[paddle_idx] = 0.5
        
        return CatchObservation(
            info_state=info_state,
            legal_actions=[0, 1, 2],
            done=self.done,
            reward=reward,
            ball_position=(self.ball_row, self.ball_col),
            paddle_position=self.paddle_col
        )
    
    def render(self):
        """Visualize."""
        for row in range(self.grid_size):
            line = "  "
            for col in range(self.grid_size):
                if row == self.ball_row and col == self.ball_col:
                    line += "🔴 "
                elif row == self.grid_size - 1 and col == self.paddle_col:
                    line += "🏓 "
                else:
                    line += "⬜ "
            print(line)


print("✅ Environment created following OpenEnv pattern!")
print("\n   Implements: reset(), step()")
print("   Returns: Type-safe observations")
print("   In production: Would run in Docker + FastAPI")

### Test It

In [None]:
env = CatchEnvironment()
obs = env.reset()

print("Initial State:")
print("=" * 50)
env.render()
print(f"\nBall: column {obs.ball_position[1]}")
print(f"Paddle: column {obs.paddle_position}")
print(f"Legal actions: {obs.legal_actions} (0=LEFT, 1=STAY, 2=RIGHT)")
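
The `info_state` vector flattens the grid row by row: the ball is marked with 1.0 and the paddle with 0.5, each at index `row * grid_size + col`. Below is a minimal sketch of decoding it back into coordinates, using a hand-built vector in place of a real observation. `decode_info_state` is a hypothetical helper, not part of OpenEnv.

```python
from typing import List, Tuple


def decode_info_state(
    info_state: List[float], grid_size: int = 5
) -> Tuple[Tuple[int, int], int]:
    """Recover (ball_row, ball_col) and paddle_col from the flat vector."""
    ball_idx = info_state.index(1.0)
    paddle_idx = info_state.index(0.5)
    ball_position = divmod(ball_idx, grid_size)  # (row, col)
    paddle_col = paddle_idx % grid_size
    return ball_position, paddle_col


# Hand-built example: ball at row 0, column 3; paddle at column 2.
vector = [0.0] * 25
vector[0 * 5 + 3] = 1.0
vector[4 * 5 + 2] = 0.5
print(decode_info_state(vector))  # → ((0, 3), 2)
```

One caveat: when the ball lands on the paddle, both marks share an index and the paddle's 0.5 overwrites the ball's 1.0, so this decoder raises `ValueError` on a caught-ball observation.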

---

## Part 7: Different Policies

A policy maps observations → actions. Let's test 4 strategies:

In [None]:
class RandomPolicy:
    name = "Random"
    def select_action(self, obs): 
        return random.choice(obs.legal_actions)

class AlwaysStayPolicy:
    name = "Always Stay"
    def select_action(self, obs): 
        return 1

class SmartPolicy:
    name = "Smart Heuristic"
    def select_action(self, obs):
        ball_col = obs.ball_position[1]
        paddle_col = obs.paddle_position
        if paddle_col < ball_col: return 2  # RIGHT
        elif paddle_col > ball_col: return 0  # LEFT
        else: return 1  # STAY

class LearningPolicy:
    name = "Learning Agent"
    def __init__(self):
        self.steps = 0
    
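    def select_action(self, obs):
        self.steps += 1
        # Epsilon decays linearly toward a floor of 0.1, reached at step 90.
        # Nothing is learned here: "exploit" simply reuses the smart heuristic.
        epsilon = max(0.1, 1.0 - (self.steps / 100))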
        
        if random.random() < epsilon:  # Explore
            return random.choice(obs.legal_actions)
        else:  # Exploit
            ball_col = obs.ball_position[1]
            paddle_col = obs.paddle_position
            if paddle_col < ball_col: return 2
            elif paddle_col > ball_col: return 0
            else: return 1

print("✅ 4 Policies created:")
print("   1. Random - Baseline")
print("   2. Always Stay - Bad strategy")
print("   3. Smart - Optimal heuristic")
print("   4. Learning - Simulated RL")
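
The exploration schedule in `LearningPolicy` is easier to judge with a few concrete values. This sketch re-implements only the epsilon formula from the class above; `epsilon_at` is a hypothetical helper name.

```python
def epsilon_at(step: int) -> float:
    """Exploration rate used by LearningPolicy after `step` actions."""
    return max(0.1, 1.0 - (step / 100))


for step in (1, 50, 90, 200):
    print(step, round(epsilon_at(step), 2))
```

An episode on the 5×5 grid lasts 4 steps and the step counter persists across episodes, so the policy reaches its 0.1 floor after roughly 23 episodes.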

### Watch Them Play

In [None]:
import time

def run_episode(env, policy, visualize=True, delay=0.4):
    obs = env.reset()
    
    if visualize:
        print(f"\n{'='*50}")
        print(f"Policy: {policy.name} | Ball: col {obs.ball_position[1]}")
        print('='*50 + '\n')
        env.render()
        time.sleep(delay)
    
    total_reward = 0
    step = 0
    
    while not obs.done:
        action = policy.select_action(obs)
        obs = env.step(action)
        total_reward += obs.reward
        
        if visualize:
            print(f"\nStep {step + 1}: {['LEFT','STAY','RIGHT'][action]}")
            env.render()
            time.sleep(delay)
        
        step += 1
    
    if visualize:
        print(f"\n{'🎉 CAUGHT!' if total_reward > 0 else '😢 MISSED'} Reward: {total_reward}")
    
    return total_reward > 0

# Demo
env = CatchEnvironment()
run_episode(env, SmartPolicy(), visualize=True, delay=0.3)

### Compare All Policies

In [None]:
def evaluate_policies(num_episodes=50):
    policies = [RandomPolicy(), AlwaysStayPolicy(), SmartPolicy(), LearningPolicy()]
    
    print("\n" + "="*70)
    print(f"🏆 POLICY COMPARISON ({num_episodes} episodes)")
    print("="*70 + "\n")
    
    results = []
    for policy in policies:
        env = CatchEnvironment()
        successes = sum(run_episode(env, policy, visualize=False) 
                       for _ in range(num_episodes))
        rate = (successes / num_episodes) * 100
        results.append((policy.name, rate))
        print(f"{policy.name:20s}: {rate:5.1f}%")
    
    print("\n" + "="*70)
    results.sort(key=lambda x: x[1], reverse=True)
    for name, rate in results:
        bar = "█" * int(rate / 2)
        print(f"{name:20s} [{bar:<50}] {rate:.1f}%")
    
    print("\n" + "="*70)
    print("💡 RL in action: Random → Learning → Optimal")
    print("="*70)

evaluate_policies(50)
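
The sampled percentages can be compared against exact values. The ball column is uniform over 5 columns, so any policy that ignores the ball (Random, Always Stay) should catch it with probability 1/5. The sketch below confirms this for the random policy by enumerating every action sequence; it mirrors the paddle rules of `CatchEnvironment`, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

GRID = 5
STEPS = GRID - 1  # the ball lands after grid_size - 1 steps


def final_paddle_col(actions, start=GRID // 2):
    """Apply a sequence of 0=LEFT, 1=STAY, 2=RIGHT moves with wall clamping."""
    col = start
    for action in actions:
        if action == 0 and col > 0:
            col -= 1
        elif action == 2 and col < GRID - 1:
            col += 1
    return col


def random_policy_catch_rate() -> Fraction:
    """Exact catch probability for uniformly random actions."""
    hits = 0
    total = 0
    for ball_col in range(GRID):
        for actions in product((0, 1, 2), repeat=STEPS):
            total += 1
            hits += final_paddle_col(actions) == ball_col
    return Fraction(hits, total)


print(random_policy_catch_rate())  # → 1/5
```

With only 50 episodes, the standard error of a 20% success rate is about 5.7 percentage points, so the sampled Random and Always Stay rates can land in either order on a given run.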

---

## Part 8: Using Real OpenSpiel Integration

What we just built follows **the same pattern OpenEnv uses**: typed observations behind a `reset()`/`step()` interface.

### Demo vs Production:

| Component | Our Demo | OpenEnv + OpenSpiel |
|-----------|----------|---------------------|
| Environment | Local class | Docker container |
| Communication | In-process method calls | HTTP |
| Client | The environment object itself | HTTPEnvClient |
| Type Safety | ✅ | ✅ |
| API | reset/step | reset/step |

### Using OpenSpiel Integration:

```python
# Install OpenSpiel
!pip install open_spiel

# Import OpenEnv's integration
from envs.openspiel_env import OpenSpielEnv, OpenSpielAction

# Connect to server
env = OpenSpielEnv(base_url="http://localhost:8000")

# Same API!
result = env.reset()
result = env.step(OpenSpielAction(action_id=2, game_name="catch"))
state = env.state()
```

### Available Games:
1. Catch (what we demoed!)
2. Tic-Tac-Toe
3. Kuhn Poker
4. Cliff Walking
5. 2048
6. Blackjack

---

## Part 9: Adding Your Own Integration

Want to wrap your own environment? Follow the pattern:

### 1. Define Types (models.py)
```python
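from dataclasses import dataclass

from core.env_server import Action, Observation

@dataclass
class YourAction(Action):
    # Your action fields go here
    ...

@dataclass
class YourObservation(Observation):
    # Your observation fields go here
    ...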
```

### 2. Implement Environment (server/environment.py)
```python
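from core.env_server import Action, Environment, Observation
from envs.your_env.models import YourObservation

class YourEnvironment(Environment):
    def reset(self) -> Observation:
        # Start a new episode and return the first observation
        return YourObservation(...)
    
    def step(self, action: Action) -> Observation:
        # Apply the action and return the resulting observation
        return YourObservation(...)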
```

### 3. Create Client (client.py)
```python
from core.http_env_client import HTTPEnvClient

class YourEnv(HTTPEnvClient[YourAction, YourObservation]):
    def _step_payload(self, action):
        return {"field": action.field}
    
    def _parse_result(self, payload):
        return StepResult(
            observation=YourObservation(...),
            reward=payload["reward"],
            done=payload["done"],
        )
```

### 4. Create Server (server/app.py)
```python
from core.env_server import create_fastapi_app

env = YourEnvironment()
app = create_fastapi_app(env)
```

### 5. Dockerize (server/Dockerfile)
```dockerfile
FROM python:3.11
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["uvicorn", "app:app", "--host", "0.0.0.0"]
```

### Examples to Study:
- `src/envs/echo_env/` - Simple test environment
- `src/envs/openspiel_env/` - Our OpenSpiel integration
- `src/envs/coding_env/` - Python code execution

---

## Summary

### What You Learned:

1. **RL Basics** - The core loop
2. **OpenEnv Framework** - Standardized, production-ready RL environments
3. **Example Integration** - How OpenSpiel is wrapped
4. **Interactive Demo** - Policies in action
5. **Adding Integrations** - The pattern to follow

### OpenEnv's Value:

| Feature | Traditional | OpenEnv |
|---------|------------|----------|
| Type Safety | ❌ | ✅ |
| Isolation | ❌ | ✅ Docker |
| Deployment | ❌ | ✅ K8s-ready |
| Language | Python only | Any (HTTP) |
| Reproducibility | ❌ | ✅ |

### Next Steps:

1. Try OpenSpiel integration
2. Implement real RL (Q-learning, DQN, PPO)
3. Wrap your own environments
4. Deploy to production
5. Use with RL libraries (TorchRL, etc.)

### Resources:

- **OpenEnv**: https://github.com/meta-pytorch/OpenEnv
- **Docs**: `src/envs/README.md`
- **Examples**: `examples/` directory

---

## 🎉 You're Ready!

You now understand:
- ✅ OpenEnv framework
- ✅ How integrations work
- ✅ Using existing environments
- ✅ Creating new integrations
- ✅ Production deployment

**Welcome to production-ready RL!** 🚀