feat: add GPU backends, quantization, and search optimizations #166
Open
cluster2600 wants to merge 6 commits into alibaba:main from
Add Metal Shading Language kernels for GPU-accelerated vector operations on Apple Silicon, including L2 distance, inner product, cosine similarity, vector normalization, matrix multiplication, and top-k selection. Includes a C API wrapper, a CMakeLists.txt for Metal compilation, and a comprehensive Google Test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
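To make the expected kernel semantics concrete, here is a minimal CPU reference in Python, the kind of ground truth a GPU test might compare against. It is a sketch under assumptions: it uses squared L2 (a kernel may instead return the square root), returns top-k results in ascending distance order, and omits matrix multiplication. None of these function names come from zvec_metal.h.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed convention)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vector) -> List[float]:
    norm = math.sqrt(inner_product(a, a))
    if norm == 0.0:
        return [0.0] * len(a)  # leave zero vectors as zeros instead of dividing by zero
    return [x / norm for x in a]


def cosine_similarity(a: Vector, b: Vector) -> float:
    return inner_product(normalize(a), normalize(b))


def top_k_l2(query: Vector, base: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return the k (index, distance) pairs with the smallest squared L2, ascending."""
    dists = ((i, l2_sq(query, v)) for i, v in enumerate(base))
    return heapq.nsmallest(k, dists, key=lambda p: p[1])
```

A test could run a Metal kernel and this reference on the same random inputs and compare the results within a float32 tolerance.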
Add a unified acceleration module supporting FAISS CPU and GPU backends with automatic hardware detection. Includes a backend benchmark suite for performance comparison and benchmarks on realistic datasets.

New files:
- python/zvec/accelerate.py: Unified accelerator interface
- python/zvec/backends/gpu.py: FAISS GPU backend
- python/zvec/backends/detect.py: Hardware detection
- python/zvec/backends/benchmark.py: Performance benchmarks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
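Automatic hardware detection typically probes for optional dependencies and then picks the fastest available backend. The sketch below assumes a priority order of FAISS GPU, then Apple MPS through PyTorch, then FAISS CPU, then a pure Python fallback. `Hardware`, `detect_hardware`, and `select_backend` are hypothetical names, not the API of detect.py.

```python
import importlib.util
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    has_faiss: bool
    faiss_gpus: int
    is_apple_silicon: bool
    has_torch_mps: bool


def detect_hardware() -> Hardware:
    """Probe optional packages without failing when they are absent."""
    has_faiss = importlib.util.find_spec("faiss") is not None
    gpus = 0
    if has_faiss:
        import faiss
        # CPU-only FAISS builds report 0 GPUs; guard in case the helper is missing.
        gpus = faiss.get_num_gpus() if hasattr(faiss, "get_num_gpus") else 0
    apple = platform.system() == "Darwin" and platform.machine() == "arm64"
    mps = False
    if apple and importlib.util.find_spec("torch") is not None:
        import torch
        mps = bool(getattr(torch.backends, "mps", None) and torch.backends.mps.is_available())
    return Hardware(has_faiss, gpus, apple, mps)


def select_backend(hw: Hardware) -> str:
    """Hypothetical priority: FAISS GPU > Apple MPS > FAISS CPU > pure Python."""
    if hw.has_faiss and hw.faiss_gpus > 0:
        return "faiss-gpu"
    if hw.is_apple_silicon and hw.has_torch_mps:
        return "mps"
    if hw.has_faiss:
        return "faiss-cpu"
    return "python"
```

Keeping detection separate from selection lets the selection policy be unit-tested without any GPU present.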
Add a Product Quantization (PQ) encoder, Optimized Product Quantization (OPQ) with rotation learning, and Scalar Quantization (8/16-bit) for efficient vector compression and approximate nearest neighbor search.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
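Scalar quantization maps each dimension independently onto a uniform integer grid. A minimal sketch, assuming per-dimension min/max training and round-to-nearest encoding (the Scalar Quantizer in opq.py may choose its ranges differently); `train_sq`, `encode_sq`, and `decode_sq` are hypothetical names. With 8 bits, each float32 component shrinks to one byte, a 4× reduction.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_sq(data: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Learn a per-dimension [lo, hi] range from the training vectors."""
    dims = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dims)]
    hi = [max(row[d] for row in data) for d in range(dims)]
    return lo, hi


def encode_sq(vec: Vector, lo: List[float], hi: List[float], bits: int = 8) -> List[int]:
    """Map each component to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for x, a, b in zip(vec, lo, hi):
        span = (b - a) or 1.0  # constant dimension: avoid dividing by zero
        t = min(max((x - a) / span, 0.0), 1.0)  # clamp values outside the trained range
        codes.append(round(t * levels))
    return codes


def decode_sq(codes: Sequence[int], lo: List[float], hi: List[float], bits: int = 8) -> List[float]:
    """Map integer codes back to the centre of their grid cell."""
    levels = (1 << bits) - 1
    return [a + (c / levels) * ((b - a) or 1.0) for c, a, b in zip(codes, lo, hi)]


# Usage: train on a tiny set, then round-trip one vector.
train = [[0.0, -1.0], [1.0, 1.0], [0.5, 0.0]]
lo, hi = train_sq(train)
codes = encode_sq([0.5, 0.0], lo, hi)
approx = decode_sq(codes, lo, hi)
```

Within the trained range, the per-dimension reconstruction error is at most half a grid step, (hi - lo) / (2 * (2**bits - 1)).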
Add a pure Python HNSW index with FAISS fallback, optimized search functions (ADC, batched search, reranking), and an Apple Silicon MPS backend that uses PyTorch for GPU-accelerated vector operations on macOS. Update pyproject.toml with accelerate/gpu optional dependencies and per-file-ignores for backends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
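ADC (asymmetric distance computation) compares an uncompressed query against PQ codes: for each sub-quantizer it precomputes the squared distance from the query's subvector to every centroid, so a code's approximate distance is a sum of m table lookups. Reranking then recomputes exact distances for a short ADC candidate list. A minimal sketch, assuming squared L2, equal-length subvectors, and codebooks given as nested lists; the function names are hypothetical and not those in search.py.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adc_tables(query: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """tables[j][c] = squared distance from query subvector j to centroid c of sub-quantizer j."""
    sub = len(query) // len(codebooks)
    return [
        [l2_sq(query[j * sub:(j + 1) * sub], c) for c in centroids]
        for j, centroids in enumerate(codebooks)
    ]


def adc_distances(tables: List[List[float]], codes: Sequence[Sequence[int]]) -> List[float]:
    """Approximate distance per code: one table lookup per sub-quantizer, summed."""
    return [sum(tables[j][cj] for j, cj in enumerate(code)) for code in codes]


def search_with_rerank(
    query: Vector,
    codebooks: Sequence[Sequence[Vector]],
    codes: Sequence[Sequence[int]],
    raw: Sequence[Vector],
    k: int,
    shortlist: int,
) -> List[Tuple[int, float]]:
    approx = adc_distances(adc_tables(query, codebooks), codes)
    # Keep the `shortlist` best candidates by approximate (ADC) distance...
    candidates = heapq.nsmallest(shortlist, range(len(codes)), key=approx.__getitem__)
    # ...then order only those by exact distance on the raw vectors.
    exact = sorted(((i, l2_sq(query, raw[i])) for i in candidates), key=lambda p: p[1])
    return exact[:k]
```

Because the tables depend only on the query, a batched search can build them once per query and reuse them across every stored code.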
cluster2600 added a commit to cluster2600/zvec that referenced this pull request on Feb 25, 2026:
Add header-only C++ implementations of Product Quantization (PQ) and Optimized Product Quantization (OPQ), and upgrade the Python OPQ rotation from QR decomposition to SVD-based Orthogonal Procrustes.

C++ Product Quantizer (product_quantizer.h):
- k-means training with configurable m sub-quantizers and k centroids
- encode/decode with distortion measurement
- Header-only, depends only on <algorithm>, <cmath>, <vector>

C++ OPQ (opq.h):
- SVD-based Procrustes rotation: R = V * U^T from SVD(X^T * Y)
- Self-contained Jacobi one-sided SVD solver (no LAPACK dependency)
- Iterative refinement of rotation + PQ codebooks

Python OPQ (_learn_rotation):
- Replace simplified QR decomposition with SVD Procrustes
- M = X^T @ decoded; U, _, Vt = svd(M); R = Vt.T @ U.T
- Produces orthogonal rotations (error ~4e-6)
- Benchmarked: ~1-10% reconstruction improvement over plain PQ

Follow-up to alibaba#166 ("Future Work: sophisticated OPQ optimization").

Tested on:
- macOS: clang++ C++17 compilation + runtime tests
- Linux (Blackwell GPU): Python OPQ + cuVS CAGRA integration

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
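The Procrustes formula in this commit follows from a short argument. Assuming the rotation is applied as x ↦ R x, so the rotated training matrix is X R^T (one vector per row), and Y is the PQ reconstruction (`decoded`), the orthogonal R that minimizes reconstruction error is:

```latex
\begin{aligned}
\|X R^\top - Y\|_F^2 &= \|X\|_F^2 + \|Y\|_F^2 - 2\,\operatorname{tr}\!\left(R\,X^\top Y\right)
  && \text{since } R^\top R = I \\
M = X^\top Y &= U \Sigma V^\top
  && \text{(SVD)} \\
\operatorname{tr}(R M) &= \operatorname{tr}\!\left(V^\top R\, U\, \Sigma\right) \le \operatorname{tr}(\Sigma)
  && V^\top R\, U \text{ is orthogonal} \\
\Rightarrow\quad R^\star &= V U^\top
  && \text{equality when } V^\top R\, U = I
\end{aligned}
```

If an implementation instead rotates with X @ R (that is, X R rather than X R^T), the same argument gives R = U V^T, so the formula must match how the rotation is applied.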
cluster2600 added a commit to cluster2600/zvec that referenced this pull request on Feb 25, 2026:
Add persistent vector storage backed by RocksDB for GPU pipeline integration, plus documentation for the Metal C++ backend.

VectorStorage (vector_storage.h):
- RocksDB column families: "vectors", "pq_codes", "metadata"
- Batch put/get for raw vectors and PQ codes
- load_all() streams vectors into a contiguous, GPU-ready float buffer
- Integrates with the existing RocksdbContext wrapper

Documentation (docs/METAL_CPP.md):
- Architecture overview: RocksDB → load_all() → Metal GPU buffers
- Complete kernel reference table (distance and utility kernels)
- Simdgroup optimization dispatch model
- C++ PQ/OPQ API examples
- RocksDB storage API examples

Follow-up to alibaba#166 ("Future Work: Integration with RocksDB storage").

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
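To illustrate the memory layout behind "streams vectors into a contiguous, GPU-ready float buffer", here is a Python sketch in which a dict stands in for the "vectors" column family. It assumes values are stored as native-endian float32 bytes keyed by vector ID and that rows come back in key order (RocksDB's default bytewise comparator sorts keys the same way Python sorts `bytes`). The real C++ `VectorStorage::load_all()` signature is not shown in this PR; `encode_vector` and this `load_all` are hypothetical.

```python
import struct
from array import array
from typing import Dict, List, Sequence, Tuple


def encode_vector(vec: Sequence[float]) -> bytes:
    """Serialize a vector as native-endian float32 values (assumed storage format)."""
    return struct.pack(f"={len(vec)}f", *vec)


def load_all(column: Dict[bytes, bytes], dim: int) -> Tuple[List[bytes], array]:
    """Concatenate every stored vector into one flat float32 buffer, in key order."""
    ids = sorted(column)  # mirrors bytewise key iteration order
    buf = array("f")
    for key in ids:
        value = column[key]
        if len(value) != 4 * dim:
            raise ValueError(f"vector {key!r} has {len(value)} bytes, expected {4 * dim}")
        row = array("f")
        row.frombytes(value)
        buf.extend(row)
    return ids, buf
```

Row i occupies buf[i * dim:(i + 1) * dim], so the whole buffer could be copied into a single GPU buffer in one call rather than one upload per vector.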
This was referenced Feb 25, 2026
Summary
Changes
C++ Metal Backend
- src/ailego/gpu/metal/zvec_metal.h: C API header
- src/ailego/gpu/metal/zvec_metal.cc: Objective-C++ implementation
- src/ailego/gpu/metal/zvec_metal.metal: Metal shaders (L2, IP, cosine, normalize, matmul, top-k)
- src/ailego/gpu/metal/CMakeLists.txt: Metal compilation
- tests/test_metal.cc: Google Test suite

Python Backends
- python/zvec/accelerate.py: Unified accelerator interface
- python/zvec/backends/gpu.py: FAISS GPU backend
- python/zvec/backends/detect.py: Hardware detection
- python/zvec/backends/quantization.py: PQ encoder
- python/zvec/backends/opq.py: OPQ encoder + Scalar Quantizer
- python/zvec/backends/hnsw.py: Pure Python HNSW with FAISS fallback
- python/zvec/backends/search.py: ADC, batch search, reranking
- python/zvec/backends/apple_silicon.py: Apple Silicon MPS backend
- python/zvec/backends/benchmark.py: Backend performance benchmarks

Configuration
- pyproject.toml: accelerate/gpu optional dependencies, per-file-ignores for backends

Docs
- docs/METAL_CPP.md: Metal backend documentation

Context
Split from #157. Aligns with cluster2600#2 content.
Test plan