bkahan/sphereQL

sphereQL

Project high-dimensional embeddings onto a 3D sphere for fast semantic search, spatial queries, category-aware exploration, and interactive visualization.

sphereQL maps vectors from any embedding model (OpenAI, Cohere, sentence-transformers, etc.) onto spherical coordinates via one of four projection families — linear PCA, kernel PCA with a Gaussian (RBF) kernel, Laplacian eigenmap over a k-NN similarity graph, or random projection — then indexes them with shell/sector partitioning for fast nearest-neighbor lookups. A Category Enrichment Layer computes inter-category relationships, classifies bridges (Genuine / OverlapArtifact / Weak), and builds inner spheres for high-resolution within-category search. sphereQL auto-tunes its pipeline per corpus against a scalar QualityMetric; a meta-model recalls winning configs from past tuner runs when a new corpus arrives. Callable from Rust, Python, or the browser via WASM.
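The projection-then-partition idea can be illustrated with a small standard-library sketch. It is not sphereQL's implementation: it assumes a plain Gaussian random projection to 3D, and the helper names (`random_projection_matrix`, `project_to_sphere`, `cell_key`) as well as the `shell_width` and `sectors` parameters are hypothetical.

```python
import math
import random


def random_projection_matrix(dim, seed=0):
    """Three Gaussian rows mapping a dim-vector to 3D (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]


def project_to_sphere(vec, matrix):
    """Project vec to 3D, then return spherical coordinates (r, theta, phi)."""
    x, y, z = (sum(w * v for w, v in zip(row, vec)) for row in matrix)
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    theta = math.atan2(y, x) % (2.0 * math.pi)    # azimuth wrapped to 0..2*pi
    phi = math.acos(max(-1.0, min(1.0, z / r)))   # polar angle in 0..pi
    return r, theta, phi


def cell_key(r, theta, phi, shell_width=0.5, sectors=8):
    """Bucket a point by radial shell and by angular sector (assumed scheme)."""
    shell = int(r // shell_width)
    theta_bin = min(int(theta / (2.0 * math.pi) * sectors), sectors - 1)
    phi_bin = min(int(phi / math.pi * sectors), sectors - 1)
    return shell, theta_bin, phi_bin


matrix = random_projection_matrix(dim=4)
r, theta, phi = project_to_sphere([0.1, 0.9, 0.3, 0.0], matrix)
key = cell_key(r, theta, phi)
```

Points that share a `key` land in the same shell/sector cell, so a nearest-neighbor lookup only needs to scan the query's cell and its neighbors rather than the whole corpus.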

Documentation

Full documentation lives under docs/.

Install

# Cargo.toml
[dependencies]
sphereql = { version = "0.1", features = ["full"] }

# Python
pip install sphereql

See architecture.md for feature-flag details.

Rust — minimal example

use sphereql::embed::*;

// 1. Build a pipeline from categorized embeddings.
let input = PipelineInput {
    categories: vec![
        "science".into(), "science".into(),
        "cooking".into(), "cooking".into(),
    ],
    embeddings: vec![
        vec![0.1, 0.9, 0.3, 0.0],
        vec![0.2, 0.8, 0.4, 0.1],
        vec![0.9, 0.1, 0.0, 0.5],
        vec![0.8, 0.2, 0.1, 0.4],
    ],
};
let pipeline = SphereQLPipeline::new(input).unwrap();

// 2. Query nearest neighbors.
let query = PipelineQuery { embedding: vec![0.15, 0.85, 0.35, 0.05] };
let results = pipeline.query(SphereQLQuery::Nearest { k: 3 }, &query);

See the Rust quickstart for spatial indexing, the layout engine, GraphQL, and the full embedding pipeline. auto-tuning.md covers the PipelineConfig + auto_tune + MetaModel workflow end-to-end.
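The general shape of tuning against a single scalar metric can be sketched in a few lines of Python. This is only an illustration of the idea, not the `auto_tune` API: the `tune` function, the configuration keys, and `toy_metric` are all hypothetical stand-ins.

```python
from itertools import product


def tune(candidates, score):
    """Return the (config, score) pair with the highest scalar score."""
    best_config, best_score = None, float("-inf")
    for config in candidates:
        value = score(config)
        if value > best_score:
            best_config, best_score = config, value
    return best_config, best_score


def toy_metric(config):
    """Made-up quality score: prefers 8 sectors and the 'pca' projection."""
    bonus = 0.5 if config["projection"] == "pca" else 0.0
    return bonus - abs(config["sectors"] - 8)


# Hypothetical search grid over two made-up configuration knobs.
grid = [
    {"projection": projection, "sectors": sectors}
    for projection, sectors in product(["pca", "random"], [4, 8, 16])
]
best_config, best_score = tune(grid, toy_metric)
```

A meta-model in this picture would be anything that remembers `best_config` per corpus and proposes it as a starting point for a similar corpus later; see auto-tuning.md for how sphereQL actually does this.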

Python — minimal example

import sphereql

categories = ["science", "science", "cooking", "cooking"]
embeddings = [
    [0.1, 0.9, 0.3, 0.0],
    [0.2, 0.8, 0.4, 0.1],
    [0.9, 0.1, 0.0, 0.5],
    [0.8, 0.2, 0.1, 0.4],
]

pipeline = sphereql.Pipeline(categories, embeddings)
results = pipeline.nearest([0.15, 0.85, 0.35, 0.05], k=3)

# Interactive 3D visualization in your browser
sphereql.visualize(categories, embeddings, title="My Embeddings")

The Python bindings cover the full Rust surface — PCA, Kernel PCA, Laplacian eigenmap, auto_tune, MetaModel, FeedbackAggregator, and the category enrichment layer. Type stubs (.pyi) are auto-generated via pyo3-stub-gen. See the Python quickstart for semantic search, 3D visualization, vector database bridges, and the core type surface.

WASM — minimal example

cd sphereql-wasm && wasm-pack build --target web

import init, { Pipeline } from './pkg/sphereql_wasm.js';
await init();

const pipeline = new Pipeline(JSON.stringify({
  categories: ["science", "cooking"],
  embeddings: [[0.1, 0.9, 0.3], [0.9, 0.1, 0.0]],
}));
const results = pipeline.nearest(JSON.stringify([0.15, 0.85, 0.35]), 1);

Same bindings coverage as Python. Every pipeline / category / metalearning method returns typed values via tsify: wasm-pack build emits a .d.ts with named interfaces, so no JSON.parse is required. See the WASM quickstart for category enrichment in the browser.

Workspace layout

| Crate | Role |
|---|---|
| sphereql | Umbrella crate with feature flags for selective imports. |
| sphereql-core | Spherical math: points, conversions, distance metrics, region types. |
| sphereql-index | Spatial indexing with shell + sector partitioning. |
| sphereql-layout | Layout engines (Fibonacci spiral, k-means, force-directed). |
| sphereql-embed | Projections, query pipeline, Category Enrichment Layer, metalearning framework. |
| sphereql-graphql | async-graphql schema: spatial queries (cone/shell/band/wedge), the full category enrichment surface, subscriptions, and a pluggable TextEmbedder trait for text query input. |
| sphereql-vectordb | Vector store bridge (InMemory, Qdrant, Pinecone) with hybrid search. |
| sphereql-python | Python bindings via PyO3/maturin. |
| sphereql-wasm | WASM bindings via wasm-bindgen. |
| sphereql-corpus | Shared example corpora (775-concept built-in + 300-concept stress). |

Full dependency graph and crate-by-crate description in architecture.md.
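To make the "distance metrics" and "cone" query terms concrete, here is a minimal Python sketch of great-circle (angular) distance and a cone membership test. It uses the standard spherical-geometry formulas, not the sphereql-core API; the function names and the `(theta, phi)` tuple convention (azimuth, polar angle) are assumptions made for the example.

```python
import math


def to_unit_vector(theta, phi):
    """Convert (azimuth theta, polar angle phi) to a Cartesian unit vector."""
    return (
        math.sin(phi) * math.cos(theta),
        math.sin(phi) * math.sin(theta),
        math.cos(phi),
    )


def angular_distance(a, b):
    """Great-circle angle in radians between two (theta, phi) points."""
    ax, ay, az = to_unit_vector(*a)
    bx, by, bz = to_unit_vector(*b)
    dot = ax * bx + ay * by + az * bz
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def in_cone(point, axis, half_angle):
    """True if point lies within half_angle radians of the cone's axis."""
    return angular_distance(point, axis) <= half_angle


north_pole = (0.0, 0.0)
on_equator = (0.0, math.pi / 2)
quarter_turn = angular_distance(north_pole, on_equator)
```

A cone query is then just a filter: keep every indexed point for which `in_cone(point, axis, half_angle)` holds. Shell and band queries are simpler range checks on `r` and `phi` respectively.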

Project status

sphereQL is at v0.2.0-alpha. The core API is functional and covered by 450+ tests, but may change before 1.0. Known limitations and roadmap are in project-status.md.

Binding parity is protected by a drift check (scripts/check-drift) — new public items in sphereql-embed / sphereql-layout must either have a Python/WASM binding or an allowlist entry with a reason in .bindings-ignore.toml.
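The rule a drift check enforces can be sketched as a set comparison. The snippet below is an illustration only and does not reproduce scripts/check-drift: the `find_unbound` function, the item names other than `auto_tune` and `MetaModel`, and the in-memory allowlist (standing in for .bindings-ignore.toml) are hypothetical.

```python
def find_unbound(public_items, bound_items, allowlist):
    """Return public items lacking both a binding and a reasoned allowlist entry."""
    missing = []
    for item in sorted(public_items):
        if item in bound_items:
            continue
        reason = allowlist.get(item, "").strip()
        if not reason:  # an allowlist entry without a reason does not count
            missing.append(item)
    return missing


# Hypothetical inputs; a real check would derive these from the crates.
public_items = {"auto_tune", "MetaModel", "internal_debug_dump", "new_helper"}
bound_items = {"auto_tune", "MetaModel"}
allowlist = {"internal_debug_dump": "Rust-only diagnostics"}

drift = find_unbound(public_items, bound_items, allowlist)
```

In this example `drift` contains only `new_helper`, which is the item a contributor would need to bind or allowlist before the check passes.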

Contributing

  1. Fork the repo and create a feature branch.
  2. Run cargo test --workspace --all-features and cargo clippy --workspace --all-features --all-targets.
  3. For Python changes, run cd sphereql-python && maturin develop && pytest -v.
  4. Open a PR against main.

The codebase uses Rust 2024 edition. All CI checks must pass before merge. See testing.md for the full pipeline.

License

MIT
