feat(semantic): support high-dimensional embeddings #55

Merged
ualtinok merged 1 commit into cortexkit:main from chrisolszewski:feat/semantic-high-dimensions
May 22, 2026

@chrisolszewski (Contributor) commented May 21, 2026

Closes #49

Summary

Raises the embedding dimension cap from 1024 to 4096 so larger models like
text-embedding-3-large (3072) can be indexed through the external backends.
The cache format and all defaults stay the same.

What changed

  • semantic_index.rs: MAX_DIMENSION goes from 1024 to 4096. Dimension
    checking is consolidated into one validate_embedding_dimension() that runs
    both when a backend returns a batch and when a cache is read back.
  • Tests: build, persist, reload, and search at 4096; reject 0 and 4097 on both
    build and deserialization.
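
To make the consolidated check concrete, here is a minimal sketch of how one validator could serve both call sites. Only MAX_DIMENSION and validate_embedding_dimension() are named in this PR; the String error type, the validate_batch and read_cached_dimension helpers, and the 4-byte little-endian header are assumptions made for illustration and may not match semantic_index.rs.

```rust
// Hypothetical sketch. Apart from MAX_DIMENSION and
// validate_embedding_dimension, every name and layout here is an assumption.

/// Upper bound on embedding width after this PR (previously 1024).
pub const MAX_DIMENSION: usize = 4096;

/// Single boundary check shared by the build path and the cache-read path.
pub fn validate_embedding_dimension(dimension: usize) -> Result<(), String> {
    if dimension == 0 {
        return Err("embedding dimension must be greater than 0".to_string());
    }
    if dimension > MAX_DIMENSION {
        return Err(format!(
            "embedding dimension {} exceeds maximum {}",
            dimension, MAX_DIMENSION
        ));
    }
    Ok(())
}

/// Build path (hypothetical helper): check a batch returned by a backend
/// and return the shared dimension.
pub fn validate_batch(batch: &[Vec<f32>]) -> Result<usize, String> {
    let first = batch
        .first()
        .ok_or_else(|| "empty embedding batch".to_string())?;
    let dimension = first.len();
    validate_embedding_dimension(dimension)?;
    // Every vector in the batch must have the same width as the first one.
    if let Some(bad) = batch.iter().find(|v| v.len() != dimension) {
        return Err(format!(
            "inconsistent dimension: expected {}, got {}",
            dimension,
            bad.len()
        ));
    }
    Ok(dimension)
}

/// Cache-read path (hypothetical layout): the dimension is assumed to be a
/// little-endian u32 stored in the first four bytes of the buffer.
pub fn read_cached_dimension(bytes: &[u8]) -> Result<usize, String> {
    if bytes.len() < 4 {
        return Err("cache header truncated".to_string());
    }
    let dimension = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    validate_embedding_dimension(dimension)?;
    Ok(dimension)
}
```

In this sketch, 3072 and 4096 pass on both paths, while 0 and 4097 are rejected on both, which mirrors the boundaries the tests above exercise.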

Why just the cap and one check

This PR intentionally stays small: a constant change plus a single boundary
check. Bigger questions like aggregate memory ceilings and corrupt-cache
hardening are left out so they can be handled on their own if you want them.
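
For a rough sense of why an aggregate memory ceiling is a separate question, the arithmetic can be sketched as below. It assumes each component is stored as a 4-byte f32, which this PR does not state, and embedding_payload_bytes is a hypothetical helper rather than part of the crate.

```rust
/// Raw payload size in bytes for `count` vectors of width `dimension`,
/// assuming 4-byte f32 components (an assumption, not confirmed by the PR).
/// Index metadata and allocator overhead are ignored.
pub fn embedding_payload_bytes(count: u64, dimension: u64) -> u64 {
    count * dimension * std::mem::size_of::<f32>() as u64
}
```

Under that assumption, a single vector grows from 4,096 bytes at 1024 dimensions to 16,384 bytes at 4096, and a hypothetical index of 100,000 vectors grows from about 410 MB to about 1.64 GB of raw payload.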

Validation

  • cargo fmt --check
  • cargo test -p agent-file-tools --test semantic_validation_test
  • cargo test -p agent-file-tools semantic_index

Summary by cubic

Raises the embedding dimension cap to 4096 to support high‑dimensional models like text-embedding-3-large (3072) and common 4096‑d local models. Cache format and defaults are unchanged.

  • New Features

    • Increase MAX_DIMENSION from 1024 to 4096.
    • Allow 4096‑d embeddings during build and read; reject 0 and >4096 with clear errors.
  • Refactors

    • Add validate_embedding_dimension() and use it in batch validation and deserialization.

Written for commit 32f7f25. Summary will update on new commits.

@cubic-dev-ai (Bot) left a comment

No issues found across 2 files

@ualtinok ualtinok merged commit aa1f378 into cortexkit:main May 22, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Support high-dimensional embedding models (3072 and 4096 dim)

2 participants