A Vision-Language-Action agent demo that plays Pokemon Red using LiquidAI's LFM2-VL-450M model, trained with TRL's GRPO algorithm and optimized via Unsloth.
- 🎮 Vision-Language Agent: Uses LFM2-VL-450M to analyze game screens and decide actions
- 🚀 Memory Efficient: 4-bit quantization and LoRA adapters via Unsloth
- 🏋️ GRPO Training: Group Relative Policy Optimization with custom game rewards
- 📊 Interactive Demo: Watch the agent play with real-time statistics
- GPU: NVIDIA GPU with 8GB+ VRAM (16GB+ recommended for training)
- Environment Server: The Pokemon Red OpenEnv server must be running
```bash
cd Pokemon_Red_OpenEnv/Agent_Demo

# Create virtual environment and install dependencies
uv sync

# Activate environment (optional, uv run handles this)
source .venv/bin/activate
```

In a separate terminal, start the environment server:

```bash
cd Pokemon_Red_OpenEnv/pokemonred_env
uv sync
uv run python -m server.app
```

The server will start at http://localhost:8000.
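To confirm the server is actually reachable before launching the agent, a quick connectivity check like the one below works. This is a convenience sketch, not part of the demo; any HTTP response from port 8000 means the server process is up.

```python
# Quick connectivity check for the Pokemon Red OpenEnv server (sketch, not part of the demo).
import urllib.error
import urllib.request

def server_is_up(url: str = "http://localhost:8000") -> bool:
    """Return True if anything is answering on the server port."""
    try:
        urllib.request.urlopen(url, timeout=2)
        return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 404 on "/") -- it is running.
        return True
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Server is up" if server_is_up() else "Server not reachable -- start it first")
```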
```bash
# Run with base model (no training required)
uv run python demo.py --max-steps 100

# Run in headless mode for faster testing
uv run python demo.py --headless --max-steps 50

# Use a trained checkpoint
uv run python demo.py --checkpoint outputs/final
```

```bash
# Start training
uv run python train.py --max-steps 500

# Use custom config
uv run python train.py --config configs/train_config.yaml

# Resume from checkpoint
uv run python train.py --resume outputs/checkpoint-200
```

```
Agent_Demo/
├── agent/
│   ├── __init__.py
│   └── vla_agent.py        # VLA agent with LFM2-VL-450M
├── env/
│   ├── __init__.py
│   └── env_wrapper.py      # Environment wrapper for training
├── training/
│   ├── __init__.py
│   └── trainer.py          # GRPO trainer configuration
├── configs/
│   ├── train_config.yaml   # Training hyperparameters
│   └── demo_config.yaml    # Demo settings
├── demo.py                 # Interactive demo script
├── train.py                # Training entry point
├── pyproject.toml          # Dependencies
└── README.md
```
| Parameter | Default | Description |
|---|---|---|
| `model_id` | `LiquidAI/LFM2-VL-450M` | HuggingFace model ID |
| `lora_rank` | 16 | LoRA rank for fine-tuning |
| `batch_size` | 1 | Per-device batch size |
| `gradient_accumulation` | 8 | Effective batch = 1 × 8 = 8 |
| `learning_rate` | 5e-6 | Learning rate |
| `num_generations` | 4 | GRPO completions per prompt |
| `max_steps` | 1000 | Total training steps |
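The defaults above live in `configs/train_config.yaml`. If you want to tweak a run without editing the file by hand, a small override script works; this sketch assumes the keys sit at the top level of the YAML file (check the file for the actual layout) and requires PyYAML.

```python
# Sketch: load the training config, override two values, and write a new file.
# Assumes top-level keys matching the table above; verify against configs/train_config.yaml.
import yaml

with open("configs/train_config.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["num_generations"] = 2   # fewer GRPO completions per prompt to save VRAM
cfg["learning_rate"] = 1e-5  # example override

with open("configs/train_config_custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

# Then launch: uv run python train.py --config configs/train_config_custom.yaml
```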
| Parameter | Default | Description |
|---|---|---|
| `checkpoint` | `null` | Path to trained LoRA checkpoint |
| `max_steps` | 1000 | Steps to run demo |
| `temperature` | 0.1 | Sampling temperature (lower = more deterministic) |
| `delay` | 0.1 | Seconds between steps |
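For a feel of how `temperature` and `delay` are consumed, here is a minimal demo-loop sketch. The `agent.generate_action` and `env.step` calls are hypothetical stand-ins for whatever `demo.py` actually does, not its real API.

```python
# Minimal demo-loop sketch showing where temperature and delay plug in.
# `agent` and `env` are hypothetical stand-ins for the objects built by demo.py.
import time

def run_demo(agent, env, max_steps: int = 1000, temperature: float = 0.1, delay: float = 0.1):
    obs = env.reset()
    for _ in range(max_steps):
        # Low temperature -> near-greedy decoding, so the agent sticks to its
        # highest-probability button; raise it for more exploratory play.
        action = agent.generate_action(obs, temperature=temperature)
        obs = env.step(action)
        time.sleep(delay)  # throttle so the gameplay is watchable
```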
- Game Screen → LFM2-VL-450M processes the 144×160 pixel Game Boy screen
- Context Prompt → Agent receives HP, position, and battle status
- Action Prediction → Model outputs one of 7 actions: Down, Left, Right, Up, A, B, Start
- Environment Step → Action is executed, reward is returned
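In terms of the Hugging Face API, one decision step looks roughly like the sketch below; the prompt wording and answer parsing are illustrative assumptions, and the real versions live in `agent/vla_agent.py`.

```python
# One inference step: Game Boy screen + text context -> one of the 7 buttons.
# Illustrative sketch; assumes `model` and `processor` were loaded with
# AutoModelForImageTextToText / AutoProcessor for LiquidAI/LFM2-VL-450M.
from PIL import Image

ACTIONS = ["Down", "Left", "Right", "Up", "A", "B", "Start"]

def choose_action(model, processor, screen: Image.Image, context: str) -> str:
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": screen},
            {"type": "text", "text": f"{context}\nAnswer with exactly one action: {', '.join(ACTIONS)}."},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=8, do_sample=True, temperature=0.1)
    reply = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Fall back to "A" if the reply doesn't name a valid button.
    return next((a for a in ACTIONS if a.lower() in reply.lower()), "A")
```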
The agent is trained using Group Relative Policy Optimization:
- Prompt Generation: Collect game states by playing randomly
- Multiple Completions: Generate 4 action predictions per state
- Reward Evaluation: Execute each action and get game reward
- Policy Update: Optimize model to favor higher-reward actions
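The loop above is driven by TRL's `GRPOTrainer`. A stripped-down configuration matching the defaults from the training table might look like the following; the project's actual setup (Unsloth 4-bit loading, LoRA adapters, vision inputs, environment-backed rewards) lives in `training/trainer.py`.

```python
# GRPO configuration sketch with TRL, mirroring the documented defaults.
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch = 1 x 8 = 8
    learning_rate=5e-6,
    num_generations=4,               # GRPO completions per prompt
    max_steps=1000,
    bf16=True,
    logging_steps=10,
)

# The trainer then ties the model, the collected game-state prompts, and the
# reward functions together (names below are placeholders for the real objects):
# trainer = GRPOTrainer(
#     model=model,                    # LFM2-VL-450M with LoRA adapters
#     args=config,
#     train_dataset=prompt_dataset,   # game states collected by random play
#     reward_funcs=[game_reward, action_validity_reward, brevity_reward],
# )
# trainer.train()
```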
| Reward | Weight | Description |
|---|---|---|
| Game Reward | Primary | From environment (exploration, badges, levels) |
| Action Validity | 0.1 | Bonus for valid action format |
| Brevity | 0.05 | Bonus for concise responses |
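Putting the table together, the shaped reward for one completion is roughly `game_reward + 0.1 * validity + 0.05 * brevity`. Here is a sketch; the validity check and brevity cutoff are assumptions, and only the 0.1 / 0.05 weights come from the table.

```python
# Composite reward sketch matching the weights above; the exact validity and
# brevity criteria are assumptions, not the project's reward code.
VALID_ACTIONS = {"down", "left", "right", "up", "a", "b", "start"}

def shaped_reward(completion: str, game_reward: float) -> float:
    text = completion.strip().lower()
    validity_bonus = 0.1 if text in VALID_ACTIONS else 0.0
    brevity_bonus = 0.05 if len(text.split()) <= 2 else 0.0
    return game_reward + validity_bonus + brevity_bonus
```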
| Mode | VRAM | Notes |
|---|---|---|
| Demo (4-bit) | ~4GB | Base model inference |
| Demo (bf16) | ~2GB | Uses LFM2-VL-450M's small size |
| Training (4-bit + LoRA) | ~8-12GB | With gradient checkpointing |
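The 4-bit rows correspond to loading the model with a bitsandbytes quantization config. Below is a loading sketch (requires a CUDA GPU plus `transformers` and `bitsandbytes`); the demo itself loads the model through Unsloth, so treat this as an approximation of the memory footprint rather than the demo's loading path.

```python
# 4-bit loading sketch for LFM2-VL-450M via transformers + bitsandbytes.
# The demo loads the model through Unsloth; this only approximates the
# "Demo (4-bit)" row above.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

model_id = "LiquidAI/LFM2-VL-450M"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```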
Make sure the Pokemon Red environment server is running:

```bash
cd ../pokemonred_env && uv run python -m server.app
```

If you run out of memory during training:

- Reduce `batch_size` to 1
- Reduce `num_generations` to 2
- Ensure `load_in_4bit: true` in the config
- Enable GSPO: `use_gspo: true`
- Use the 8-bit optimizer: `optim: adamw_8bit`
- Enable gradient checkpointing (enabled by default)
This demo is part of the Pokemon Red OpenEnv project for The OpenEnv Challenge hackathon.