# Optimized Pixel Art Learner (OPAL)

A Vector Quantized Variational Autoencoder (VQ-VAE) system for pixel art generation and photo-to-sprite conversion. Unlike traditional VQ-VAEs, which downsample spatially and lose pixel-level detail, OPAL's DirectVQVAE operates per-pixel to preserve the crisp edges and limited palettes that define pixel art.
Built entirely on a custom GPU tensor library - no PyTorch, no TensorFlow.
## Features

- **Direct VQ-VAE**: No spatial downsampling. A 64x64 input yields a 64x64 latent with 4096 code positions
- **Photo-to-Sprite**: Convert photographs to pixel art sprites via VQ-VAE + adaptive HOG downsampling
- **PixelCNN Prior**: Autoregressive generation of novel sprites
- **Custom CUDA Backend**: Hand-written tensor operations via CuPy
- **Memory Efficient**: Runs on consumer GPUs (8GB VRAM)
## Architecture

### Photo-to-Sprite Pipeline

```
Input Photo
     |
VQ-VAE Pixelization (preserves structure)
     |
HOG Adaptive Downsampling (preserves edges)
     |
Post-processing (palette, dithering)
     |
Output Sprite
```
### Training Pipeline

```
Sprite Dataset --> Encoder --> Vector Quantizer --> Decoder --> Reconstruction
                                      |
                              PixelCNN Prior (learns distribution)
```
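The training objective of a VQ-VAE combines a reconstruction term with codebook and commitment terms. A minimal NumPy sketch of the loss (names and signature are illustrative, not OPAL's actual API; the straight-through gradient trick is only noted in comments, since plain NumPy has no autograd):

```python
import numpy as np

def vq_vae_loss(x, x_recon, z_e, z_q, beta=0.25):
    """Illustrative VQ-VAE loss: reconstruction + codebook + commitment.

    x, x_recon : input and reconstructed images
    z_e        : continuous encoder outputs
    z_q        : quantized latents (nearest codebook entries)
    beta       : commitment weight
    """
    recon = np.mean((x - x_recon) ** 2)
    # Codebook term: pulls code vectors toward encoder outputs.
    # (In a real implementation, gradients here flow only to the codebook.)
    codebook = np.mean((z_e - z_q) ** 2)
    # Commitment term: keeps encoder outputs close to their chosen codes.
    # (Gradients here flow only to the encoder.)
    commitment = beta * np.mean((z_e - z_q) ** 2)
    return recon + codebook + commitment
```

In the diagram above, the PixelCNN prior is trained separately on the discrete code maps produced by the quantizer.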
## Requirements

- Python 3.8+
- CuPy (CUDA 11.x or 12.x)
- NumPy
- Pillow

## Installation

```bash
pip install cupy-cuda11x numpy pillow
```

## Quick Start

```bash
# Train VQ-VAE on sprite dataset
python cli.py train --config small --sprites ./sprites

# Generate new sprites from trained model
python cli.py generate --checkpoint ./checkpoints/best.pkl --num 16

# Convert photo to pixel art sprite
python cli.py pixelize --input photo.jpg --output sprite.png --size 64

# Full photo-to-sprite interpretation
python cli.py sprite --input photo.jpg --output sprite.png
```

## Python API

```python
from direct_vqvae import DirectVQVAE
from interpret import photo_to_sprite

# Load trained model
model = DirectVQVAE.load('checkpoints/best.pkl')

# Convert photo to sprite
sprite = photo_to_sprite('photo.jpg', model, target_size=64)
sprite.save('output.png')
```

## Project Structure

| File | Purpose |
|---|---|
| `tensor_gpu.py` | Custom autograd tensor library with CUDA support |
| `direct_vqvae.py` | VQ-VAE with per-pixel quantization (no spatial downsampling) |
| `vq_layer.py` | Vector quantization with EMA codebook updates |
| `pixelcnn_prior.py` | Autoregressive prior for sprite generation |
| `interpret.py` | Photo-to-sprite pipeline |
| `pixelize.py` | Image pixelization with tiled processing |
| `hog_gpu.py` | GPU-accelerated HOG for adaptive downsampling |
| `postprocess_pixelart.py` | Color quantization, dithering, palette matching |
| `cli.py` | Command-line interface |
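`vq_layer.py` updates the codebook with exponential moving averages rather than gradient descent. A minimal NumPy sketch of one EMA step (variable names and the Laplace-smoothing detail follow the standard EMA VQ-VAE recipe; they are illustrative, not OPAL's internals):

```python
import numpy as np

def ema_codebook_update(codebook, cluster_size, embed_sum, z_e, indices,
                        decay=0.99, eps=1e-5):
    """One EMA codebook update step.

    codebook     : (K, D) code vectors
    cluster_size : (K,)   EMA of per-code assignment counts
    embed_sum    : (K, D) EMA of summed encoder outputs per code
    z_e          : (N, D) flattened encoder outputs for this batch
    indices      : (N,)   nearest-code index for each vector
    """
    K = codebook.shape[0]
    onehot = np.eye(K)[indices]                  # (N, K) assignment matrix
    counts = onehot.sum(axis=0)                  # how many vectors hit each code
    sums = onehot.T @ z_e                        # (K, D) summed vectors per code
    cluster_size = decay * cluster_size + (1 - decay) * counts
    embed_sum = decay * embed_sum + (1 - decay) * sums
    # Laplace smoothing avoids division by zero for unused codes.
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + K * eps) * n
    codebook = embed_sum / smoothed[:, None]
    return codebook, cluster_size, embed_sum
```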
## Why Per-Pixel Quantization?

Traditional VQ-VAEs downsample spatially:

- 64x64 → 16x16 latent → 256 code positions (spatially lossy)

OPAL's DirectVQVAE maintains full resolution:

- 64x64 → 64x64 latent → 4096 code positions (spatially lossless)
The codebook learns common RGB patterns rather than spatial patches, preserving the pixel-perfect detail that defines pixel art.
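Per-pixel quantization reduces to a nearest-neighbor lookup over the codebook at every spatial position. A simplified NumPy sketch (not DirectVQVAE's actual internals, which run on the custom GPU tensor library):

```python
import numpy as np

def quantize_per_pixel(latents, codebook):
    """Map each spatial position's vector to its nearest codebook entry.

    latents  : (H, W, D) per-pixel encoder outputs
    codebook : (K, D)    learned code vectors
    Returns quantized latents (H, W, D) and code indices (H, W).
    """
    H, W, D = latents.shape
    flat = latents.reshape(-1, D)                          # (H*W, D)
    # Squared distance from every pixel vector to every code: (H*W, K)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                                # nearest code per pixel
    return codebook[idx].reshape(H, W, D), idx.reshape(H, W)
```

For a 64x64 input this produces a 64x64 index map, i.e. the 4096 code positions quoted above.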
## How Photo-to-Sprite Works

1. **VQ-VAE Pixelization**: The trained encoder maps continuous colors to discrete codebook entries, naturally quantizing the image
2. **HOG Adaptive Downsampling**: Histogram of Oriented Gradients identifies edge regions, so downsampling preserves strong gradients
3. **Post-processing**: Color palette mapping, optional dithering, edge sharpening
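The palette-mapping step amounts to snapping each pixel to its nearest palette color. A minimal sketch in NumPy (illustrative only, not `postprocess_pixelart.py`'s actual interface):

```python
import numpy as np

def map_to_palette(img, palette):
    """Snap every pixel of an (H, W, 3) uint8 image to its nearest palette color.

    palette : (P, 3) uint8 array of allowed RGB colors
    """
    pix = img.reshape(-1, 3).astype(np.float32)            # (H*W, 3)
    pal = palette.astype(np.float32)                       # (P, 3)
    # Squared RGB distance from each pixel to each palette entry: (H*W, P)
    d2 = ((pix[:, None, :] - pal[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return palette[nearest].reshape(img.shape)
```

Dithering (e.g. Floyd-Steinberg) would diffuse the quantization error of each pixel to its neighbors before this snap, trading flat color regions for smoother gradients.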
## License

MIT