# RL in Robotics Simulation

This notebook demonstrates how to use physics-based robotics simulators (PyBullet) with reinforcement learning. It covers:
- setting up a PyBullet environment,
- running a random agent to interact with the simulator,
- collecting trajectories,
- a short example of plugging in an RL library (optional),
- practical notes and tips for robotics experiments.

This notebook focuses on **practical, reproducible workflows** for learning and prototyping RL algorithms in robotics simulations.

## 🔧 Install (if needed)

Uncomment and run the following if your environment doesn't already have the packages. The `pybullet` package bundles the `pybullet_envs` Gym environments, so it does not need a separate install. These environments register with `gym` (not `gymnasium`), which is why `gym` is installed here.

`stable-baselines3` is optional and pulls in more dependencies; uncomment its install only if you want to run the quick baseline later in the notebook.

In [None]:
# !pip install pybullet gym --quiet
# Optional: an RL library for quick baselines (larger install)
# !pip install "stable-baselines3[extra]" --quiet

## 🧩 Imports and helper utilities

In [None]:
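import time
import numpy as np
import gym
import pybullet_envs  # noqa: F401 (importing registers the PyBullet environments with Gym)

def run_random_episode(env, render=False, max_steps=1000):
    """Run one episode with uniformly sampled actions; return (total_reward, steps)."""
    # Gym >= 0.26 returns (obs, info) from reset(); older Gym returns obs only.
    reset_out = env.reset()
    obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
    total_reward = 0.0
    steps = 0
    done = False
    while not done and steps < max_steps:
        action = env.action_space.sample()
        # Gym >= 0.26 returns 5 values from step(); older Gym returns 4.
        step_out = env.step(action)
        if len(step_out) == 5:
            obs, reward, terminated, truncated, info = step_out
            done = terminated or truncated
        else:
            obs, reward, done, info = step_out
        total_reward += reward
        steps += 1
        if render:
            env.render()
            # small sleep so a human can follow the visualization
            time.sleep(1 / 120)
    return total_reward, steps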

## ✅ Create and inspect a PyBullet robotics environment

Common lightweight robotics / continuous-control environments provided by `pybullet_envs` include:
- `InvertedPendulumBulletEnv-v0` (simple control)
- `InvertedDoublePendulumBulletEnv-v0`
- `AntBulletEnv-v0` (quadruped-like agent)
- `KukaBulletEnv-v0` (robot arm scenarios)

Pick one that matches your experiment complexity; start simple and scale up.

In [None]:
# Example: Inverted Pendulum (fast to run and observe)
env_id = 'InvertedPendulumBulletEnv-v0'  # change to AntBulletEnv-v0 or KukaBulletEnv-v0 for harder tasks
env = gym.make(env_id)
print('Environment:', env_id)
print('Observation space:', env.observation_space)
print('Action space:', env.action_space)

## ▶️ Run a few random episodes to sanity-check the simulator

In [None]:
rewards = []
for i in range(3):
    r, steps = run_random_episode(env, render=False, max_steps=1000)
    rewards.append(r)
    print(f'Episode {i+1}: reward={r:.2f}, steps={steps}')
print('Random policy mean reward:', np.mean(rewards))

## 📦 Collect trajectories: store transitions for offline learning or analysis

This snippet collects a fixed number of episodes and stores `(s, a, r, s', done)` tuples in an in-memory list, which is useful for debugging and for offline RL experiments. Here `done` marks the end of an episode, whether the task terminated or a time limit cut it short.

In [None]:
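def collect_episodes(env, policy_fn, n_episodes=5, max_steps=1000):
    """Roll out policy_fn and return a flat list of (obs, action, reward, next_obs, done)."""
    buffer = []
    for ep in range(n_episodes):
        # Gym >= 0.26 returns (obs, info) from reset(); older Gym returns obs only.
        reset_out = env.reset()
        obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
        done = False
        steps = 0
        while not done and steps < max_steps:
            action = policy_fn(obs)
            # Gym >= 0.26 returns 5 values from step(); older Gym returns 4.
            step_out = env.step(action)
            if len(step_out) == 5:
                next_obs, reward, terminated, truncated, info = step_out
                done = terminated or truncated
            else:
                next_obs, reward, done, info = step_out
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs
            steps += 1
    return buffer

# Example: collect 3 episodes using a random policy
def random_policy(obs):
    return env.action_space.sample()

dataset = collect_episodes(env, random_policy, n_episodes=3)
print(f'Collected {len(dataset)} transitions')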
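
As a hedged sketch of what can be done with the collected transitions, the helper below splits a flat buffer of `(s, a, r, s', done)` tuples back into episodes and computes each episode's discounted return. The name `episode_returns` and the discount `gamma` are illustrative assumptions, not part of the notebook above. The toy buffer uses placeholder observations so the example runs without a simulator. It relies on the `done` flag to find episode boundaries, so an episode that was cut off by `max_steps` in the middle of the buffer would be merged with the one that follows.

```python
def episode_returns(buffer, gamma=0.99):
    """Return one discounted return per episode in a flat transition buffer.

    `buffer` is a list of (obs, action, reward, next_obs, done) tuples in the
    order they were collected. `gamma` is a hypothetical discount factor.
    """
    returns = []
    current = 0.0         # discounted return of the episode in progress
    discount = 1.0        # gamma ** t for the current step t
    open_episode = False  # True while an episode has started but not ended
    for _obs, _action, reward, _next_obs, done in buffer:
        current += discount * reward
        discount *= gamma
        open_episode = True
        if done:
            returns.append(current)
            current, discount, open_episode = 0.0, 1.0, False
    # A trailing episode without done=True still contributes its partial return.
    if open_episode:
        returns.append(current)
    return returns


# Toy buffer: two episodes with placeholder observations and actions (None).
toy_buffer = [
    (None, None, 1.0, None, False),
    (None, None, 1.0, None, True),
    (None, None, 2.0, None, True),
]
print(episode_returns(toy_buffer, gamma=0.5))
```

With the real `dataset` from the cell above, `episode_returns(dataset)` would give one value per collected episode, which can be compared against the undiscounted rewards printed earlier as a sanity check.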

## ▶️ Quick example: plug in `stable-baselines3` (optional)

If you want to run a ready-made RL algorithm quickly, `stable-baselines3` provides clean implementations (PPO, SAC, TD3, DQN). The snippet below is optional: install the package and uncomment the code if you want a fast baseline.

⚠️ Note: installing `stable-baselines3[extra]` can be large and may require additional system packages for some environments.

In [None]:
# Optional: run a quick PPO baseline with stable-baselines3 (uncomment to run)
# Note: SB3 1.x targets `gym` and SB3 2.x targets `gymnasium`; use a version that matches your env.
# from stable_baselines3 import PPO
# from stable_baselines3.common.evaluation import evaluate_policy
# model = PPO('MlpPolicy', env, verbose=1)
# model.learn(total_timesteps=20000)
# print('PPO baseline trained (20k steps)')
# # Evaluate on SB3's wrapped env, which handles reset/step details internally
# mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=5, deterministic=True)
# print(f'Mean eval reward (5 eps): {mean_reward:.2f} +/- {std_reward:.2f}')

## 🛠 Practical tips for robotics simulation experiments

- **Start simple:** test algorithms on `InvertedPendulumBulletEnv-v0` before moving to `AntBulletEnv-v0` or KUKA arm tasks.
- **Deterministic seeds:** set seeds for NumPy, Python `random`, and the environment to reproduce results.
- **Rendering:** run rendering only during evaluation. Rendering slows training significantly.
- **Frame rate & physics steps:** tune the physics timestep and number of substeps if the simulator exposes them; both affect fidelity and numerical stability.
- **Domain randomization:** randomize simulation parameters (mass, friction, delays) to produce more robust policies for sim-to-real transfer.
- **Observation design:** if you plan sim-to-real transfer, restrict observations to quantities (velocities, angles, forces) that real sensors can actually measure.
- **Use lightweight wrappers** to normalize observations, clip actions, and scale rewards consistently across algorithms.
- **Monitor CPU/GPU:** complex simulators and parallel environments can be CPU-bound; profile to identify bottlenecks.
- **Consider vectorized envs** (many parallel simulations) to increase sample throughput; libraries such as `stable-baselines3` accept vectorized environments.
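
The seeding and domain-randomization tips can be combined. The sketch below draws simulation parameters from fixed ranges using a dedicated, seeded generator so that every run sees the same sequence of randomized physics. The function `sample_physics_params`, the parameter names, and the ranges are all hypothetical choices for illustration. Applying the sampled values to the simulator (for example through PyBullet's `changeDynamics`) is deliberately left out because it depends on the specific robot.

```python
import random


def sample_physics_params(rng, ranges):
    """Draw one value per parameter uniformly from its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


# Hypothetical parameter ranges; choose values that bracket your real robot.
PARAM_RANGES = {
    "mass_scale": (0.8, 1.2),        # multiplier on nominal link mass
    "lateral_friction": (0.5, 1.5),  # friction coefficient
    "sensor_delay_s": (0.0, 0.02),   # observation latency in seconds
}

# Two generators with the same seed produce the same parameter sequence,
# which is what makes a randomized experiment reproducible.
rng_a = random.Random(42)
rng_b = random.Random(42)
params_a = [sample_physics_params(rng_a, PARAM_RANGES) for _ in range(3)]
params_b = [sample_physics_params(rng_b, PARAM_RANGES) for _ in range(3)]

print("Same seed gives same parameters:", params_a == params_b)
for episode_params in params_a:
    print({name: round(value, 3) for name, value in episode_params.items()})
```

Using a separate `random.Random` instance, rather than the global `random` module, keeps the randomization stream independent of any other code that also draws random numbers.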

## ✅ Summary

- This notebook showed how to start RL experiments in robotics simulators using **PyBullet**.
- You learned how to create environments, run random agents, collect transitions, and (optionally) plug into `stable-baselines3` for quick baselines.
- Practical tips were provided to help scale experiments and prepare for sim-to-real transfer.
