# 01. Environment Setup for OpenVLA

**Goal**: Set up a remote server environment with 4×40GB GPUs for OpenVLA inference and LIBERO simulation.

## Prerequisites
- Remote server with 4× NVIDIA A100 40GB (or similar)
- SSH access to the server
- Basic familiarity with conda/pip environments

## What We'll Cover
1. GPU and driver verification
2. Conda environment creation
3. Core dependency installation
4. LIBERO and MuJoCo setup
5. OpenVLA installation from the repository
6. Multi-GPU configuration
7. Model download
8. Final environment check

---
## Step 1: Verify GPU Setup

First, let's verify your GPU configuration on the remote server.

In [None]:
# Check NVIDIA driver and CUDA version
!nvidia-smi

In [None]:
# Verify we have 4 GPUs with 40GB each
!nvidia-smi --query-gpu=index,name,memory.total --format=csv

The output should list 4 GPUs with roughly 40 GB of memory each (e.g., A100 40GB; an RTX A6000 will report about 48 GB).
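The same check can be scripted. The sketch below parses the CSV printed by the query command above and flags GPUs that fall short. The helper names `parse_gpu_csv` and `check_gpus`, the 39 GiB threshold, and the sample text are illustrative assumptions, not output captured from a real server.

```python
import csv
import io

def parse_gpu_csv(text):
    """Parse the CSV printed by
    `nvidia-smi --query-gpu=index,name,memory.total --format=csv`."""
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    next(reader)  # skip the header row
    gpus = []
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        index, name, memory = row
        mem_mib = int(memory.split()[0])  # "40960 MiB" -> 40960
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "memory_gib": mem_mib / 1024,
        })
    return gpus

def check_gpus(gpus, min_count=4, min_memory_gib=39.0):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if len(gpus) < min_count:
        problems.append(f"expected at least {min_count} GPUs, found {len(gpus)}")
    for gpu in gpus:
        if gpu["memory_gib"] < min_memory_gib:
            problems.append(f"GPU {gpu['index']} has only {gpu['memory_gib']:.1f} GiB")
    return problems

# Hypothetical sample; replace with the real command output (e.g. via subprocess).
SAMPLE_OUTPUT = """index, name, memory.total [MiB]
0, NVIDIA A100-SXM4-40GB, 40960 MiB
1, NVIDIA A100-SXM4-40GB, 40960 MiB
2, NVIDIA A100-SXM4-40GB, 40960 MiB
3, NVIDIA A100-SXM4-40GB, 40960 MiB
"""

gpus = parse_gpu_csv(SAMPLE_OUTPUT)
problems = check_gpus(gpus)
print(problems or "GPU configuration looks good")
```

Returning a list of problems rather than raising lets the notebook print every shortfall at once.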

---
## Step 2: Create Conda Environment

We'll create a dedicated conda environment for OpenVLA.

In [None]:
# Create the environment from a terminal, then start Jupyter from inside it.
# `!conda activate` runs in a throwaway subshell and does not change this kernel.
#   conda create -n openvla python=3.10 -y
#   conda activate openvla

# Verify Python version
import sys
print(f"Python version: {sys.version}")
assert sys.version_info >= (3, 10), "Python 3.10+ required"
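Since activating an environment from a `!` cell does not affect the running kernel, it can help to confirm which interpreter the kernel is using. The sketch below reads `CONDA_DEFAULT_ENV`, which conda sets on activation, and `sys.executable`. The helper name `describe_kernel_env` and the expected environment name `openvla` are assumptions for illustration.

```python
import os
import sys

def describe_kernel_env(expected_env="openvla"):
    """Report which interpreter and conda environment this kernel is using."""
    active_env = os.environ.get("CONDA_DEFAULT_ENV")  # None outside conda
    return {
        "executable": sys.executable,
        "conda_env": active_env,
        "matches_expected": active_env == expected_env,
    }

info = describe_kernel_env()
for key, value in info.items():
    print(f"{key}: {value}")
```

If Jupyter was launched from another environment with a registered kernel, `conda_env` may not match even though `executable` points into the right environment, so check both fields.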

---
## Step 3: Install Core OpenVLA Dependencies

### 3.1 PyTorch with CUDA

In [None]:
# Install PyTorch with CUDA 12.1 support
# torch 2.2.0 wheels are published for cu118 and cu121; pick the one your driver supports
# torchvision 0.17.0 is the release paired with torch 2.2.0
!pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu121

In [None]:
# Verify PyTorch and CUDA
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Number of GPUs: {torch.cuda.device_count()}")

# List all GPUs
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name} ({props.total_memory / 1e9:.1f} GB)")

### 3.2 HuggingFace Transformers and Flash Attention

In [None]:
# Install transformers and related packages
!pip install transformers==4.40.1 accelerate timm==0.9.10 tokenizers

In [None]:
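# Install Flash Attention 2 for faster attention during inference
# The actual speedup depends on the GPU, sequence length, and batch size
# The build needs `packaging` and `ninja` and can take several minutes
!pip install packaging ninja
!pip install flash-attn==2.5.5 --no-build-isolation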

In [None]:
# Verify Flash Attention installation
try:
    import flash_attn
    print(f"Flash Attention version: {flash_attn.__version__}")
except ImportError:
    print("Flash Attention not installed - inference will still work but slower")
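When the model is loaded in later notebooks, the attention backend can be chosen from whether `flash_attn` is importable. The sketch below assumes the `attn_implementation` keyword accepted by `from_pretrained` in recent `transformers` releases. The helper name `pick_attn_implementation` and the fallback to `"eager"` are illustrative choices, not something the OpenVLA repository prescribes.

```python
import importlib.util

def pick_attn_implementation():
    """Return "flash_attention_2" if flash_attn is importable, else "eager"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"  # standard attention; slower but has no extra dependency

attn_impl = pick_attn_implementation()
print(f"attn_implementation = {attn_impl!r}")
# Later, something like:
#   AutoModelForVision2Seq.from_pretrained(..., attn_implementation=attn_impl)
```

Using `find_spec` avoids importing the package just to test for it, which keeps the check cheap.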

### 3.3 Additional Dependencies

In [None]:
# Install remaining dependencies
!pip install pillow numpy scipy matplotlib seaborn
!pip install einops sentencepiece protobuf

---
## Step 4: Install LIBERO and MuJoCo

LIBERO is a simulation benchmark built on robosuite/MuJoCo.

In [None]:
# Install MuJoCo
!pip install mujoco==2.3.7

In [None]:
# Verify MuJoCo installation
import mujoco
print(f"MuJoCo version: {mujoco.__version__}")

In [None]:
# Install robosuite (LIBERO dependency)
!pip install robosuite==1.4.1

In [None]:
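# Install LIBERO from source (the benchmark is distributed through its GitHub repository)
!git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
!pip install -e LIBERO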

In [None]:
# Verify LIBERO installation
try:
    from libero.libero import benchmark
    print("LIBERO installed successfully!")

    # get_benchmark_dict() maps each suite name to its benchmark class
    benchmark_dict = benchmark.get_benchmark_dict()
    print("\nAvailable task suites:")
    for suite_name in ["libero_spatial", "libero_object", "libero_goal", "libero_90"]:
        suite = benchmark_dict[suite_name]()
        print(f"  {suite_name}: {suite.n_tasks} tasks")
except ImportError as e:
    print(f"LIBERO import error: {e}")

### 4.1 For Headless Servers (No Display)

If your remote server has no display, configure an off-screen rendering backend (OSMesa or EGL) for MuJoCo.

In [None]:
# Install system packages for headless rendering from a terminal (needs sudo):
#   sudo apt-get update
#   sudo apt-get install -y xvfb libgl1-mesa-glx libosmesa6-dev

# Select MuJoCo's off-screen backend. Set this BEFORE importing mujoco or robosuite;
# if they were already imported in this kernel, restart the kernel first.
import os
os.environ['MUJOCO_GL'] = 'osmesa'  # CPU software rendering; use 'egl' for GPU rendering
print(f"MUJOCO_GL set to: {os.environ.get('MUJOCO_GL')}")

In [None]:
# Alternative: Use EGL for GPU-accelerated headless rendering
# os.environ['MUJOCO_GL'] = 'egl'
# os.environ['PYOPENGL_PLATFORM'] = 'egl'
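The two options above can be wrapped in one helper that validates the backend name and warns when MuJoCo has already been imported. This is a minimal sketch: the function name `configure_mujoco_rendering` is hypothetical, and it assumes `MUJOCO_GL` accepts `egl`, `osmesa`, or `glfw`.

```python
import os
import sys
import warnings

VALID_BACKENDS = {"egl", "osmesa", "glfw"}

def configure_mujoco_rendering(backend="osmesa"):
    """Set rendering-related environment variables for headless MuJoCo."""
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"backend must be one of {sorted(VALID_BACKENDS)}, got {backend!r}"
        )
    if "mujoco" in sys.modules:
        # The backend is read at import time, so a late change has no effect.
        warnings.warn("mujoco is already imported; restart the kernel first")
    os.environ["MUJOCO_GL"] = backend
    if backend in ("egl", "osmesa"):
        os.environ["PYOPENGL_PLATFORM"] = backend
    return backend

configure_mujoco_rendering("osmesa")
print(f"MUJOCO_GL={os.environ['MUJOCO_GL']}")
```

Call it in the first cell of a notebook, before any import that pulls in `mujoco` or `robosuite`.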

---
## Step 5: Install OpenVLA from Repository

In [None]:
# Navigate to the OpenVLA repository root on the server
import os
REPO_ROOT = os.path.expanduser("~/openvla")  # Example location; adjust to where you cloned the repo
os.chdir(REPO_ROOT)
print(f"Working directory: {os.getcwd()}")

In [None]:
# Install OpenVLA in development mode
!pip install -e .

In [None]:
# Verify OpenVLA installation
from prismatic.models import load_vla
print("OpenVLA package loaded successfully!")

---
## Step 6: Multi-GPU Configuration

With 4×40GB GPUs, we have several options for model loading and inference.

In [None]:
# Loading strategies for the 7B model (~14 GB of weights in BF16)
GPU_CONFIG = {
    # Option 1: Single GPU (simplest)
    "single_gpu": {
        "device_map": "cuda:0",
        "torch_dtype": "bfloat16",
        "description": "Load entire model on GPU 0"
    },
    
    # Option 2: Shard the model across GPUs (lowers per-GPU memory, leaving room for larger batches)
    "multi_gpu_parallel": {
        "device_map": "auto",
        "torch_dtype": "bfloat16",
        "description": "Automatically distribute model across available GPUs"
    },
    
    # Option 3: Multiple model instances (for parallel rollouts)
    "multi_instance": {
        "devices": ["cuda:0", "cuda:1", "cuda:2", "cuda:3"],
        "torch_dtype": "bfloat16",
        "description": "One model per GPU for parallel environment rollouts"
    },
    
    # Option 4: Quantized (for memory-constrained scenarios; requires the bitsandbytes package)
    "quantized_8bit": {
        "device_map": "auto",
        "load_in_8bit": True,
        "description": "8-bit quantization for ~50% memory savings"
    }
}

for name, config in GPU_CONFIG.items():
    print(f"{name}:")
    print(f"  {config['description']}")
    print()
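For the multi-instance option, rollout tasks have to be split across the per-GPU model copies. The sketch below shows one simple way to do that with round-robin assignment. The helper name `assign_tasks_to_devices` and the ten integer task ids are hypothetical; the actual evaluation scripts may schedule work differently.

```python
from collections import defaultdict
from itertools import cycle

def assign_tasks_to_devices(task_ids, devices):
    """Distribute task ids over devices in round-robin order."""
    if not devices:
        raise ValueError("at least one device is required")
    assignment = defaultdict(list)
    for task_id, device in zip(task_ids, cycle(devices)):
        assignment[device].append(task_id)
    return dict(assignment)

devices = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
plan = assign_tasks_to_devices(range(10), devices)  # 10 hypothetical task ids
for device, tasks in plan.items():
    print(f"{device}: tasks {tasks}")
```

Round-robin keeps the per-device task counts within one of each other, which is a reasonable default when tasks take similar time.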

In [None]:
# Calculate memory requirements
def estimate_model_memory(num_params_billions, dtype="bfloat16"):
    """Estimate GPU memory needed for model parameters."""
    bytes_per_param = {
        "float32": 4,
        "float16": 2,
        "bfloat16": 2,
        "int8": 1,
        "int4": 0.5
    }
    
    param_memory_gb = num_params_billions * bytes_per_param[dtype]
    # Add ~20% headroom for activations and runtime buffers (rough inference-only heuristic)
    total_memory_gb = param_memory_gb * 1.2
    
    return param_memory_gb, total_memory_gb

print("OpenVLA-7B Memory Estimates:")
print("="*50)
for dtype in ["float32", "bfloat16", "int8", "int4"]:
    param_mem, total_mem = estimate_model_memory(7, dtype)
    print(f"{dtype:>10}: {param_mem:.1f} GB params, ~{total_mem:.1f} GB total")
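The same heuristic gives a rough upper bound on how many full model copies fit on one 40 GB card. This is a back-of-the-envelope sketch: the helper `copies_per_gpu` is hypothetical, it reuses the 20% overhead factor from above, and it ignores memory taken by the simulator, fragmentation, and larger batches.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}

def copies_per_gpu(num_params_billions, dtype, gpu_memory_gb=40.0, overhead=1.2):
    """Number of full model copies that fit on one GPU under the overhead heuristic."""
    total_gb = num_params_billions * BYTES_PER_PARAM[dtype] * overhead
    return int(gpu_memory_gb // total_gb)

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>10}: {copies_per_gpu(7, dtype)} copies per 40 GB GPU")
```

Under these assumptions a BF16 copy needs about 16.8 GB, so two copies fit on one card on paper, though the margin is thin once rollouts are running.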

---
## Step 7: Download OpenVLA Model

Pre-download the model to avoid timeouts during inference.

In [None]:
# Download model (this may take a while - ~14GB)
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "openvla/openvla-7b"

print(f"Downloading {MODEL_ID}...")
print("This will download ~14GB of model weights. Please be patient.")

In [None]:
# Download processor (tokenizer + image processor)
processor = AutoProcessor.from_pretrained(
    MODEL_ID,
    trust_remote_code=True
)
print("Processor downloaded successfully!")

In [None]:
# Download and cache the weights; from_pretrained loads them into CPU RAM, not onto a GPU
# Later loads reuse the local cache instead of downloading again
import torch

vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    # Don't load Flash Attention yet - just caching
)

print("Model downloaded and cached!")
print(f"Model parameters: {sum(p.numel() for p in vla.parameters()) / 1e9:.2f}B")

In [None]:
# Release the model from CPU RAM (we'll load it properly in later notebooks)
import gc

del vla
gc.collect()
torch.cuda.empty_cache()  # harmless here; nothing was moved to the GPU
print("Model cleared from memory.")
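To confirm that the weights are cached before starting a long job, you can look for the repository in the Hugging Face hub cache. The sketch below assumes the standard cache layout (`models--<org>--<name>/snapshots/`) and the `HF_HUB_CACHE` and `HF_HOME` environment variables. The helper names are hypothetical, and a snapshot directory existing does not prove the download finished.

```python
import os
from pathlib import Path

def hub_cache_dir():
    """Resolve the Hugging Face hub cache directory from the environment."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    # Simplification: the default also honors XDG_CACHE_HOME, ignored here.
    hf_home = os.environ.get("HF_HOME", os.path.join("~", ".cache", "huggingface"))
    return Path(hf_home).expanduser() / "hub"

def cached_model_dir(repo_id):
    """Directory where the hub cache stores a model repo (models--org--name)."""
    return hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))

def is_model_cached(repo_id):
    """True if at least one snapshot of the repo exists in the local cache."""
    snapshots = cached_model_dir(repo_id) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())

print(cached_model_dir("openvla/openvla-7b"))
print("cached:", is_model_cached("openvla/openvla-7b"))
```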

---
## Step 8: Final Environment Check

In [None]:
def check_environment():
    """Comprehensive environment check."""
    import sys
    import torch
    
    results = {}
    
    # Python
    results["Python"] = sys.version.split()[0]
    
    # PyTorch
    results["PyTorch"] = torch.__version__
    results["CUDA Available"] = torch.cuda.is_available()
    results["GPU Count"] = torch.cuda.device_count()
    
    # Transformers
    import transformers
    results["Transformers"] = transformers.__version__
    
    # TIMM
    import timm
    results["TIMM"] = timm.__version__
    
    # Flash Attention
    try:
        import flash_attn
        results["Flash Attention"] = flash_attn.__version__
    except ImportError:
        results["Flash Attention"] = "Not installed"
    
    # MuJoCo
    try:
        import mujoco
        results["MuJoCo"] = mujoco.__version__
    except ImportError:
        results["MuJoCo"] = "Not installed"
    
    # LIBERO
    try:
        import libero
        results["LIBERO"] = "Installed"
    except ImportError:
        results["LIBERO"] = "Not installed"
    
    # OpenVLA
    try:
        from prismatic.models import load_vla
        results["OpenVLA"] = "Installed"
    except ImportError:
        results["OpenVLA"] = "Not installed"
    
    return results

# Run check
print("Environment Status")
print("="*50)
for key, value in check_environment().items():
    status = "✅" if value not in ["Not installed", False, 0] else "❌"
    print(f"{status} {key}: {value}")
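The check above confirms that packages import, but not that they match the versions pinned earlier in this notebook. The sketch below compares installed versions against those pins using the standard-library `importlib.metadata`. The helper name `compare_to_pins` is hypothetical, and the distribution names are assumed to match what `pip` installed.

```python
from importlib import metadata

# Versions pinned earlier in this notebook.
PINNED = {
    "torch": "2.2.0",
    "transformers": "4.40.1",
    "timm": "0.9.10",
    "flash-attn": "2.5.5",
    "mujoco": "2.3.7",
    "robosuite": "1.4.1",
}

def compare_to_pins(pins):
    """Return {package: (installed_or_None, pinned, status)} for each pin."""
    report = {}
    for package, pinned in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, pinned, "missing")
            continue
        base = installed.split("+")[0]  # drop local tags such as "+cu121"
        status = "ok" if base == pinned else "mismatch"
        report[package] = (installed, pinned, status)
    return report

for package, (installed, pinned, status) in compare_to_pins(PINNED).items():
    print(f"{package:>14}: installed={installed} pinned={pinned} -> {status}")
```

A `mismatch` is not necessarily an error, but it is worth noting before comparing results against numbers produced with the pinned versions.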

---
## Summary

You now have a complete environment for:
- Running OpenVLA-7B inference on 4×40GB GPUs
- Using Flash Attention for optimized inference
- LIBERO simulation for evaluation

### Next Steps
→ Continue to **02_architecture_overview.ipynb** to understand OpenVLA's model architecture.

### Troubleshooting

| Issue | Solution |
|-------|----------|
| CUDA out of memory | Use `load_in_8bit=True` (requires `bitsandbytes`) or shard the model with `device_map="auto"` |
| Flash Attention build fails | Install `packaging` and `ninja`, run `pip cache remove flash_attn`, then retry `pip install flash-attn==2.5.5 --no-build-isolation` |
| MuJoCo rendering fails | Set `MUJOCO_GL=osmesa` (CPU) or `MUJOCO_GL=egl` (GPU) before importing MuJoCo |
| LIBERO import error | Install from source: clone `https://github.com/Lifelong-Robot-Learning/LIBERO` and run `pip install -e .` inside it |
| Model download timeout | Use `huggingface-cli download openvla/openvla-7b` |