Pokemon Red VLA Agent Demo

A Vision-Language-Action (VLA) agent demo that plays Pokemon Red using LiquidAI's LFM2-VL-450M model, trained with TRL's GRPO algorithm and optimized via Unsloth.

Features

  • 🎮 Vision-Language Agent: Uses LFM2-VL-450M to analyze game screens and decide actions
  • 🚀 Memory Efficient: 4-bit quantization and LoRA adapters via Unsloth
  • 🏋️ GRPO Training: Group Relative Policy Optimization with custom game rewards
  • 📊 Interactive Demo: Watch the agent play with real-time statistics

Quick Start

Prerequisites

  1. GPU: NVIDIA GPU with 8GB+ VRAM (16GB+ recommended for training)
  2. Environment Server: The Pokemon Red OpenEnv server must be running

Installation

cd Pokemon_Red_OpenEnv/Agent_Demo

# Create virtual environment and install dependencies
uv sync

# Activate environment (optional, uv run handles this)
source .venv/bin/activate

Start the Environment Server

In a separate terminal:

cd Pokemon_Red_OpenEnv/pokemonred_env
uv sync
uv run python -m server.app

The server will start at http://localhost:8000.
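
To verify the server is reachable before launching the demo, a minimal check is a plain TCP probe (it assumes nothing about the server's routes, only the host and port above):

import socket

def server_is_up(host="localhost", port=8000, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("server reachable" if server_is_up() else "server not reachable")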

Run the Demo

# Run with base model (no training required)
uv run python demo.py --max-steps 100

# Run in headless mode for faster testing
uv run python demo.py --headless --max-steps 50

# Use a trained checkpoint
uv run python demo.py --checkpoint outputs/final

Train the Agent

# Start training
uv run python train.py --max-steps 500

# Use custom config
uv run python train.py --config configs/train_config.yaml

# Resume from checkpoint
uv run python train.py --resume outputs/checkpoint-200

Project Structure

Agent_Demo/
├── agent/
│   ├── __init__.py
│   └── vla_agent.py       # VLA agent with LFM2-VL-450M
├── env/
│   ├── __init__.py
│   └── env_wrapper.py     # Environment wrapper for training
├── training/
│   ├── __init__.py
│   └── trainer.py         # GRPO trainer configuration
├── configs/
│   ├── train_config.yaml  # Training hyperparameters
│   └── demo_config.yaml   # Demo settings
├── demo.py                # Interactive demo script
├── train.py               # Training entry point
├── pyproject.toml         # Dependencies
└── README.md

Configuration

Training Config (configs/train_config.yaml)

Parameter              Default                Description
model_id               LiquidAI/LFM2-VL-450M  HuggingFace model ID
lora_rank              16                     LoRA rank for fine-tuning
batch_size             1                      Per-device batch size
gradient_accumulation  8                      Effective batch = 1 × 8 = 8
learning_rate          5e-6                   Learning rate
num_generations        4                      GRPO completions per prompt
max_steps              1000                   Total training steps
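
For reference, a config with these defaults would look roughly like the sketch below. Key names mirror the table above; the shipped configs/train_config.yaml is authoritative for the exact schema:

model_id: LiquidAI/LFM2-VL-450M
lora_rank: 16
batch_size: 1
gradient_accumulation: 8
learning_rate: 5e-6
num_generations: 4
max_steps: 1000
load_in_4bit: true  # keeps memory within the budget in Memory Requirements below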

Demo Config (configs/demo_config.yaml)

Parameter    Default  Description
checkpoint   null     Path to a trained LoRA checkpoint
max_steps    1000     Steps to run the demo for
temperature  0.1      Sampling temperature (lower = more deterministic)
delay        0.1      Seconds between steps
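
A matching sketch of configs/demo_config.yaml (key names illustrative, per the table above):

checkpoint: null  # or e.g. outputs/final for a trained LoRA adapter
max_steps: 1000
temperature: 0.1
delay: 0.1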

How It Works

Agent Architecture

  1. Game Screen → LFM2-VL-450M processes the 160×144 pixel Game Boy screen
  2. Context Prompt → Agent receives HP, position, and battle status
  3. Action Prediction → Model outputs one of 7 actions: Down, Left, Right, Up, A, B, Start
  4. Environment Step → Action is executed, reward is returned
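
Put together, the perception-action loop looks roughly like the sketch below. The env and agent objects and their methods (reset, step, predict_action) are illustrative stand-ins for the actual env/env_wrapper.py and agent/vla_agent.py interfaces:

ACTIONS = ["Down", "Left", "Right", "Up", "A", "B", "Start"]

def run_episode(env, agent, max_steps=100):
    obs = env.reset()  # screen pixels plus HP/position/battle context
    total_reward = 0.0
    for _ in range(max_steps):
        # Steps 1-3: the model reads the screen and context, emits an action name
        action = agent.predict_action(obs["screen"], obs["context"])
        if action not in ACTIONS:
            action = "A"  # fall back to a harmless default on malformed output
        # Step 4: execute the action in the emulator and collect the reward
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward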

GRPO Training

The agent is trained using Group Relative Policy Optimization:

  1. Prompt Generation: Collect game states by playing randomly
  2. Multiple Completions: Generate 4 action predictions per state
  3. Reward Evaluation: Execute each action and get game reward
  4. Policy Update: Optimize model to favor higher-reward actions
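
The "group relative" part means each completion's reward is normalized against its siblings for the same prompt, so no learned value function is needed. A minimal, self-contained sketch of that normalization (illustrative, not TRL's internals):

from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Score each completion relative to its siblings for the same prompt."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # Above-average completions get positive advantages; the policy update
    # then raises their log-probability and lowers the rest.
    return [(r - mu) / (sigma + 1e-4) for r in rewards]

# Four completions for one game state (num_generations = 4)
print(group_relative_advantages([1.0, 0.0, 0.5, 0.0]))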

Reward Functions

Reward           Weight   Description
Game Reward      Primary  From the environment (exploration, badges, levels)
Action Validity  0.1      Bonus for a valid action format
Brevity          0.05     Bonus for concise responses
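
In TRL's GRPO API, a reward function receives a batch of completions and returns one score per completion. Hedged sketches of the two auxiliary rewards (the function names and exact brevity threshold are illustrative; the game reward itself comes from the environment):

VALID_ACTIONS = {"Down", "Left", "Right", "Up", "A", "B", "Start"}

def action_validity_reward(completions, **kwargs):
    # +0.1 when the model's output is exactly one valid button name
    return [0.1 if c.strip() in VALID_ACTIONS else 0.0 for c in completions]

def brevity_reward(completions, **kwargs):
    # +0.05 for terse outputs (a single action word rather than a paragraph)
    return [0.05 if len(c.strip()) <= 10 else 0.0 for c in completions]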

Memory Requirements

Mode                     VRAM     Notes
Demo (4-bit)             ~4GB     Base-model inference
Demo (bf16)              ~2GB     Feasible thanks to LFM2-VL-450M's small size
Training (4-bit + LoRA)  ~8-12GB  With gradient checkpointing
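
As a back-of-envelope check on why these budgets are small: 450M parameters cost about 0.9GB in bf16 and about 0.25GB packed at 4 bits; the remainder of each figure goes to activations, optimizer state (when training), and CUDA overhead:

params = 450e6
print(f"bf16 weights:  {params * 2 / 1e9:.2f} GB")    # 2 bytes per parameter
print(f"4-bit weights: {params * 0.5 / 1e9:.2f} GB")  # 0.5 bytes per parameter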

Troubleshooting

"Connection refused" error

Make sure the Pokemon Red environment server is running:

cd ../pokemonred_env && uv run python -m server.app

Out of Memory

  • Reduce batch_size to 1
  • Reduce num_generations to 2
  • Ensure load_in_4bit: true in config

Slow training

  • Enable GSPO: use_gspo: true
  • Use 8-bit optimizer: optim: adamw_8bit
  • Enable gradient checkpointing (enabled by default)

License

This demo is part of the Pokemon Red OpenEnv project for The OpenEnv Challenge hackathon.
