feat: VLM primitives — vision ops, NN layers, encoder models#96

Merged
0xDaizz merged 1 commit into main from feat/vlm-primitives on Mar 17, 2026
Conversation


0xDaizz (Owner) commented Mar 17, 2026

Summary

  • Add VLM (Vision-Language Model) primitive operations: interpolate (bilinear/bicubic), pixel_normalize, pool2d with Metal kernel registration and buffer slot definitions
  • Add vision NN layers: PatchEmbedding, VisionPositionalEmbedding, VisionTransformerBlock, MultiModalProjector, and cache-free attention forward pass
  • Add vision encoder model configs: ViT-Base/16, ViT-Large/14, SigLIP SO400M/14, CLIP ViT-L/14-336 with HF safetensors weight mapping

Test plan

  • cargo check --workspace passes
  • cargo fmt --all --check passes
  • cargo clippy --workspace --all-targets passes (fixed large_enum_variant warning)
  • Integration test with actual vision encoder weights (future PR)

🤖 Generated with Claude Code

Core ops (rmlx-core):
- interpolate: bilinear/bicubic Metal kernels, align_corners support
- pixel_normalize: per-channel (pixel-mean)/std, ImageNet defaults
- pool2d: avg/max pooling with configurable kernel/stride/padding
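As a rough illustration of the `pixel_normalize` op above, here is a minimal pure-Rust sketch of per-channel `(pixel - mean) / std` with the standard ImageNet defaults. The function name and CHW layout assumption are illustrative only, not the actual rmlx-core API (the real op runs as a Metal kernel).

```rust
// Illustrative sketch only — not the rmlx-core API or its Metal kernel.
// Standard ImageNet normalization constants (per RGB channel).
const IMAGENET_MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const IMAGENET_STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Normalize a CHW-ordered pixel buffer in place: (pixel - mean) / std.
fn pixel_normalize(pixels: &mut [f32], channels: usize, mean: &[f32], std: &[f32]) {
    let per_channel = pixels.len() / channels;
    for c in 0..channels {
        let (m, s) = (mean[c], std[c]);
        for v in &mut pixels[c * per_channel..(c + 1) * per_channel] {
            *v = (*v - m) / s;
        }
    }
}

fn main() {
    // Two pixels per channel, already scaled to [0, 1], each set to its
    // channel mean — so every value should normalize to ~0.0.
    let mut img = vec![0.485, 0.485, 0.456, 0.456, 0.406, 0.406];
    pixel_normalize(&mut img, 3, &IMAGENET_MEAN, &IMAGENET_STD);
    assert!(img.iter().all(|v| v.abs() < 1e-6));
    println!("ok");
}
```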

NN layers (rmlx-nn):
- PatchEmbedding: Conv2d + reshape for image→patch sequence
- VisionPositionalEmbedding: learned + grid2D
- VisionTransformerBlock: pre-norm bidirectional attention
- MultiModalProjector: Linear or MLP (fc1→GELU→fc2)
- Attention::forward_no_cache() for vision (no RoPE, no mask, no KV cache)
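The cache-free attention path above differs from the text path in that every token attends to every other token with no causal mask, no RoPE, and no KV cache. A naive single-head sketch of that computation (scaled dot-product, then softmax over all positions) looks like the following — plain loops for clarity, not the rmlx-nn implementation:

```rust
// Illustrative single-head sketch of bidirectional, cache-free attention
// (no RoPE, no mask, no KV cache). Not the rmlx-nn kernel.

/// Numerically stable in-place softmax over one score row.
fn softmax(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for v in row.iter_mut() {
        *v = (*v - max).exp();
        sum += *v;
    }
    for v in row.iter_mut() {
        *v /= sum;
    }
}

/// q, k, v: [seq, dim] row-major. Returns the attention output, [seq, dim].
fn attention_no_cache(q: &[f32], k: &[f32], v: &[f32], seq: usize, dim: usize) -> Vec<f32> {
    let scale = 1.0 / (dim as f32).sqrt();
    let mut out = vec![0.0; seq * dim];
    for i in 0..seq {
        // Score query i against every position — bidirectional, no mask.
        let mut scores: Vec<f32> = (0..seq)
            .map(|j| (0..dim).map(|d| q[i * dim + d] * k[j * dim + d]).sum::<f32>() * scale)
            .collect();
        softmax(&mut scores);
        // Weighted sum of value rows.
        for j in 0..seq {
            for d in 0..dim {
                out[i * dim + d] += scores[j] * v[j * dim + d];
            }
        }
    }
    out
}

fn main() {
    // With identical tokens, attention weights are uniform and the
    // output equals the (shared) value row.
    let (seq, dim) = (3, 2);
    let x = vec![1.0f32; seq * dim];
    let out = attention_no_cache(&x, &x, &x, seq, dim);
    assert!(out.iter().all(|&v| (v - 1.0).abs() < 1e-6));
    println!("ok");
}
```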

Vision encoder models:
- ViT-Base/16 (768h, 12L), ViT-Large/14 (1024h, 24L)
- SigLIP SO400M/14 (1152h, 27L, no class token)
- CLIP ViT-L/14-336 (1024h, 24L, QuickGELU)
- HF weight name mapping for vision encoders
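The config names above encode image size and patch size, which together fix the token sequence length each encoder produces. A small sketch of that arithmetic (function name is illustrative, not part of the crates):

```rust
// Illustrative helper: sequence length of a square ViT-style patch grid,
// optionally counting a prepended class token. Not the rmlx API.
fn seq_len(image: usize, patch: usize, class_token: bool) -> usize {
    assert!(image % patch == 0, "image size must be divisible by patch size");
    let grid = image / patch;
    grid * grid + usize::from(class_token)
}

fn main() {
    // ViT-Base/16 at 224x224: a 14x14 grid = 196 patches, 197 with class token.
    assert_eq!(seq_len(224, 16, true), 197);
    // SigLIP SO400M/14 at 224: 16x16 = 256 patches, no class token.
    assert_eq!(seq_len(224, 14, false), 256);
    // CLIP ViT-L/14-336: 24x24 = 576 patches, 577 with class token.
    assert_eq!(seq_len(336, 14, true), 577);
    println!("ok");
}
```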

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@0xDaizz 0xDaizz merged commit db8d116 into main Mar 17, 2026
7 checks passed