# Freenove Robot Dog - RL Locomotion Training

Train a PPO policy to walk using **mjlab** (MuJoCo Warp + rsl_rl).

**Requirements**: GPU runtime (A100/V100/T4). Go to Runtime > Change runtime type > GPU.

## Architecture
```
MuJoCo Warp (GPU physics)  -->  mjlab (env managers)  -->  rsl_rl (PPO)  -->  Trained Policy
     |                              |                          |
  freenove_dog.xml          env_cfgs.py (rewards,        rl_cfg.py (network,
  (robot model)            sensors, terminations)        learning rate)
```

## 1. Setup

Clone the project from GitHub.

In [None]:
!rm -rf /content/quadruped
!git clone https://github.com/Solace-Stephane/freenove-quadruped-rl.git /content/quadruped

# Verify the project structure
!ls /content/quadruped/src/freenove_velocity/
!ls /content/quadruped/README.md
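
The same check can be written in Python so that a missing path is reported explicitly instead of being buried in `ls` output. This is a minimal sketch: the helper name `missing_paths` is hypothetical, and the two required paths are simply the ones listed by the `ls` commands above.

```python
from pathlib import Path


def missing_paths(root, required):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


# Paths checked by the `ls` commands in the cell above.
REQUIRED = ["src/freenove_velocity", "README.md"]

missing = missing_paths("/content/quadruped", REQUIRED)
if missing:
    print(f"Clone looks incomplete, missing: {missing}")
else:
    print("Project structure looks fine.")
```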

In [None]:
# Install uv (fast Python package manager)
!pip install uv

# Verify GPU is available
!nvidia-smi
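
To check the GPU from code rather than by reading the `nvidia-smi` table, the driver tool can be queried for device names only. A small sketch, assuming `nvidia-smi` is on the `PATH` whenever a GPU runtime is attached; `gpu_names` is a hypothetical helper, not part of the project.

```python
import shutil
import subprocess


def gpu_names():
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_names()
if gpus:
    print(f"GPUs visible: {gpus}")
else:
    print("No GPU found. Switch the runtime type to GPU.")
```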

## 2. Configure Weights & Biases (optional)

W&B tracks training metrics (rewards, losses, episode lengths).
Run ONE of the cells below.

In [None]:
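# Option A: Use W&B offline (no account needed, logs saved locally).
# WANDB_MODE is set in the notebook environment, so the training
# subprocess launched later inherits it.
%env WANDB_MODE=offline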

In [None]:
# Option B: Login to W&B for online tracking (enter API key when prompted)
# !wandb login
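
W&B also reads its mode from the `WANDB_MODE` environment variable, which makes it possible to choose the mode per subprocess instead of globally. The sketch below builds such an environment; `wandb_env` is a hypothetical helper, and the result would be passed as `env=` to `subprocess.Popen`.

```python
import os

VALID_WANDB_MODES = {"online", "offline", "disabled"}


def wandb_env(mode="offline", base=None):
    """Return a copy of the environment with WANDB_MODE set to `mode`."""
    if mode not in VALID_WANDB_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_WANDB_MODES)}")
    env = dict(os.environ if base is None else base)
    env["WANDB_MODE"] = mode
    return env


env = wandb_env("offline")
print(env["WANDB_MODE"])
```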

## 3. Sanity Check: Zero-Action Playback

Run the environment with an agent that always outputs zero actions.
The robot should stand briefly and then fall under gravity, which confirms that the MJCF model loads and simulates.

In [None]:
import subprocess
import sys

process = subprocess.Popen(
  [
    "uv", "run",
    "--project", "/content/quadruped",
    "play",
    "Mjlab-Velocity-Flat-Freenove-Dog",
    "--agent", "zero",
  ],
  stdout=subprocess.PIPE,
  stderr=subprocess.STDOUT,
  universal_newlines=True,
  bufsize=1,
  cwd="/content/quadruped",
)

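# Stream the output until the viewer reports that it is serving, then stop
# reading so the next cell can embed it. The process keeps running.
for line in process.stdout:
  print(line, end="")
  sys.stdout.flush()
  if "serving" in line.lower() or "running on" in line.lower() or "8081" in line:
    print("\n" + "=" * 50)
    print("Server is running! Execute the next cell to view.")
    print("=" * 50)
    break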
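
Matching on log text depends on the exact wording the viewer prints. An alternative is to probe the port directly until it accepts connections. This is a sketch under the assumption that the viewer listens on local TCP port 8081, as the cell above implies; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port(8081)` after starting the playback process would block until the viewer is reachable or a minute has passed.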

In [None]:
from google.colab import output
output.serve_kernel_port_as_iframe(8081)

In [None]:
# Stop the playback server before training
process.terminate()
process.wait()
print("Playback stopped.")
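
`process.wait()` blocks indefinitely if the server ignores the termination signal. A slightly more defensive sketch falls back to `kill()` after a timeout; `stop_process` is a hypothetical helper and the 10 second default is an arbitrary choice.

```python
import subprocess


def stop_process(proc, timeout=10.0):
    """Terminate `proc`; kill it if it has not exited after `timeout` seconds."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```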

## 4. Train the Policy

Train PPO on flat terrain. Key parameters:
- `--env.scene.num-envs`: number of parallel environments (higher = faster, more VRAM)
- `--agent.max-iterations`: number of PPO training iterations (3000-5000 is usually enough)

**Expected time**: ~30-60 min on A100, ~1-2 hours on T4.
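
To get a feel for the scale of a run, the number of environment transitions is the product of the parallel environments, the rollout length per iteration, and the iteration count. In the sketch below, 4096 and 3000 are the values used in this section's training command; the rollout length of 24 steps per environment is an assumption (rsl_rl calls this setting `num_steps_per_env`), and the actual value would come from the project's configuration.

```python
def total_env_steps(num_envs, steps_per_env, iterations):
    """Environment transitions collected over a whole training run."""
    return num_envs * steps_per_env * iterations


# steps_per_env=24 is an assumed rollout length, not read from the project.
steps = total_env_steps(num_envs=4096, steps_per_env=24, iterations=3000)
print(f"{steps:,} transitions")  # prints "294,912,000 transitions"
```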

In [None]:
import os
import subprocess
import sys

process = subprocess.Popen(
  [
    "uv", "run",
    "--project", "/content/quadruped",
    "train",
    "Mjlab-Velocity-Flat-Freenove-Dog",
    "--env.scene.num-envs", "4096",
    "--agent.max-iterations", "3000",
  ],
  stdout=subprocess.PIPE,
  stderr=subprocess.STDOUT,
  universal_newlines=True,
  bufsize=1,
  cwd="/content/quadruped",
  # Inherit the notebook environment and pin training to the first GPU
  env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"},
)

for line in process.stdout:
  print(line, end="")
  sys.stdout.flush()

process.wait()
print(f"\nTraining finished with return code: {process.returncode}")
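
The launch-and-stream pattern appears in several cells, so it could be factored into one function that also returns the exit code for checking. A minimal sketch; `run_and_stream` is a hypothetical helper, and the child command here is a stand-in for the `uv run ... train` invocation.

```python
import subprocess
import sys


def run_and_stream(cmd, cwd=None, env=None):
    """Run `cmd`, echo its combined stdout/stderr line by line, return the exit code."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,
        cwd=cwd,
        env=env,
    ) as proc:
        for line in proc.stdout:
            print(line, end="", flush=True)
    # Leaving the `with` block waits for the child, so returncode is set.
    return proc.returncode


code = run_and_stream([sys.executable, "-c", "print('hello from child')"])
if code != 0:
    raise RuntimeError(f"Command failed with return code {code}")
```

Raising on a non-zero code stops "Run all" from continuing into the playback and export cells after a failed training run.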

## 5. Play the Trained Policy

Visualize the trained policy in the simulator.

In [None]:
import subprocess
import sys
import glob

# Find the latest training run (assumes run directory names sort chronologically)
log_dirs = sorted(glob.glob("/content/quadruped/logs/freenove_dog_velocity/*/"))
if log_dirs:
  latest_run = log_dirs[-1]
  print(f"Latest run: {latest_run}")
else:
  print("No training runs found. Train first!")
  latest_run = None

if latest_run:
  process = subprocess.Popen(
    [
      "uv", "run",
      "--project", "/content/quadruped",
      "play",
      "Mjlab-Velocity-Flat-Freenove-Dog",
      "--log-dir", latest_run,
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
    bufsize=1,
    cwd="/content/quadruped",
  )

  for line in process.stdout:
    print(line, end="")
    sys.stdout.flush()
    if "serving" in line.lower() or "running on" in line.lower() or "8081" in line:
      print("\n" + "=" * 50)
      print("Server is running! Execute the next cell to view.")
      print("=" * 50)
      break
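
Sorting the glob results as strings picks the newest run only if the directory names sort chronologically, for example when they begin with a timestamp. If that is not guaranteed, the modification time is an alternative criterion. A sketch, with `latest_run_dir` as a hypothetical helper:

```python
from pathlib import Path


def latest_run_dir(log_root):
    """Return the most recently modified subdirectory of `log_root`, or None."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


print(latest_run_dir("/content/quadruped/logs/freenove_dog_velocity"))
```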

In [None]:
from google.colab import output
output.serve_kernel_port_as_iframe(8081)

## 6. Export Policy for Deployment

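Copy the latest training checkpoint into `deploy/` so it can be transferred to
the Raspberry Pi. The cell below saves the full PyTorch checkpoint as a `.pt` file; it does not convert the actor network to ONNX.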

In [None]:
import glob
import os
import torch

# Find the latest training run
log_dirs = sorted(glob.glob("/content/quadruped/logs/freenove_dog_velocity/*/"))
latest_run = log_dirs[-1] if log_dirs else None

if latest_run:
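  # Find the newest checkpoint. Sort by modification time, because sorting
  # the names as strings would put model_999.pt after model_2999.pt.
  ckpt_files = sorted(glob.glob(os.path.join(latest_run, "model_*.pt")), key=os.path.getmtime)
  if not ckpt_files:
    ckpt_files = sorted(
      glob.glob(os.path.join(latest_run, "**/*.pt"), recursive=True),
      key=os.path.getmtime,
    )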
  
  if ckpt_files:
    latest_ckpt = ckpt_files[-1]
    print(f"Latest checkpoint: {latest_ckpt}")
    
    # Load checkpoint
    checkpoint = torch.load(latest_ckpt, map_location="cpu", weights_only=False)
    print(f"Checkpoint keys: {list(checkpoint.keys())}")
    
    # Save a copy of the full checkpoint (all keys printed above) for deployment
    export_path = "/content/quadruped/deploy/policy_checkpoint.pt"
    os.makedirs(os.path.dirname(export_path), exist_ok=True)
    torch.save(checkpoint, export_path)
    print(f"\nCheckpoint saved to: {export_path}")
    print(f"File size: {os.path.getsize(export_path) / 1024:.1f} KB")
  else:
    print("No checkpoint files found!")
else:
  print("No training runs found!")
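
Checkpoint names of the form `model_<iteration>.pt` do not order correctly as plain strings, since `model_999.pt` compares greater than `model_2999.pt`. One option is to compare the iteration numbers numerically. The sketch below assumes that naming scheme, which matches the `model_*.pt` pattern used in the cell above; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

_CKPT_RE = re.compile(r"model_(\d+)\.pt$")


def latest_checkpoint(run_dir):
    """Return the model_<N>.pt path with the largest N in `run_dir`, or None."""
    best, best_iter = None, -1
    for path in Path(run_dir).glob("model_*.pt"):
        match = _CKPT_RE.search(path.name)
        if match and int(match.group(1)) > best_iter:
            best, best_iter = path, int(match.group(1))
    return best
```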

In [None]:
# Download the checkpoint and deploy script
from google.colab import files

# Download the policy checkpoint
files.download("/content/quadruped/deploy/policy_checkpoint.pt")

# Download the deployment script
files.download("/content/quadruped/deploy/deploy.py")

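# The user name and IP address below are one example setup;
# replace them with your own Raspberry Pi's login and address.
print("\nDownloaded! Copy these to your Raspberry Pi and run:")
print("  scp deploy.py policy_checkpoint.pt sxn@192.168.100.234:~/")
print("  ssh sxn@192.168.100.234 'python3 deploy.py'")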
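
Since the login and address differ per setup, the two commands could be generated from parameters rather than hard-coded. A sketch using `shlex.join` for shell quoting; `deploy_commands` is a hypothetical helper, and `pi` and `raspberrypi.local` are placeholder values, not the project's settings.

```python
import shlex


def deploy_commands(user, host, files, remote_dir="~/", entry="deploy.py"):
    """Build the scp and ssh command lines for copying and running the policy."""
    target = f"{user}@{host}"
    scp = shlex.join(["scp", *files, f"{target}:{remote_dir}"])
    ssh = shlex.join(["ssh", target, f"python3 {entry}"])
    return scp, ssh


# Placeholder login and host name; substitute your own.
scp_cmd, ssh_cmd = deploy_commands(
    "pi", "raspberrypi.local", ["deploy.py", "policy_checkpoint.pt"]
)
print(scp_cmd)
print(ssh_cmd)
```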