A command-line tool for managing and versioning embedding vectors. Think of it as "Git for Embeddings": it helps you track, compare, and manage semantic changes in your ML models.
- Store and version control your embedding vectors
- Compare semantic similarity between embeddings
- Track embedding history and changes
- Roll back to previous embedding versions
- Support for multiple embedding models
- Organize embeddings into sets for better management
make clean
make all
make install
# Download the release archive (tar.gz or zip):
# tar -xzf embedding_bridge-<version>.tar.gz
# # or
# unzip embedding_bridge-<version>.zip
# cd embedding_bridge-<version>
# Create a short-name `embr` launcher in this folder (symlink to the wrapper):
ln -s run_embedding_bridge.sh embr
# Add the release directory to your PATH so you can invoke the launcher:
export PATH="$PWD:$PATH"
# Now run the CLI (the wrapper will set LD_LIBRARY_PATH for you):
embr --help
# Initialize
embr init
# Register a model
embr model register text-embedding-3-small --dimensions 1536 --normalize
# Store an embedding
embr store --embedding vector.bin --dims 1536 document.txt
# Check status
embr status document.txt
# Compare embeddings
embr diff <hash1> <hash2>
# Roll back to previous version
embr rollback <hash> document.txt
# Create and manage sets
embr set create experimental
embr switch experimental
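The quick-start commands above refer to a `vector.bin` file. As a minimal sketch for producing a test file, the following writes 1536 values as headerless 32-bit floats using only the Python standard library. The on-disk layout is an assumption: this README does not specify the binary format that `embr store --embedding` expects, so check it against the tool before relying on it. The helper names `write_raw_float32` and `read_raw_float32` are hypothetical and not part of EmbeddingBridge.

```python
import array
import os
import random
import tempfile


def write_raw_float32(path, values):
    """Write values as consecutive 32-bit floats (native byte order, no header).

    ASSUMPTION: a headerless float32 layout; the README does not document
    the format embr actually reads.
    """
    vec = array.array("f", values)
    with open(path, "wb") as fh:
        vec.tofile(fh)
    return len(vec)


def read_raw_float32(path):
    """Read a headerless float32 file back into an array of floats."""
    vec = array.array("f")
    with open(path, "rb") as fh:
        vec.frombytes(fh.read())
    return vec


# Demo: a random 1536-dimensional vector, matching --dims 1536 above.
dims = 1536
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(dims)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vector.bin")
    write_raw_float32(path, values)
    size_bytes = os.path.getsize(path)  # 4 bytes per float32 value
    restored = read_raw_float32(path)
```

A file written this way has a size of exactly `dims * 4` bytes, which is a quick sanity check that the value passed to `--dims` matches the file.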
# Register a new model
embr model register <model-name> --dimensions <dims> [--normalize]
# List registered models
embr model list
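The README does not say what `--normalize` does internally. A common meaning for embedding models is L2 normalization, that is, scaling each vector to unit length so that dot products equal cosine similarities. The sketch below illustrates that interpretation only; `l2_normalize` is a hypothetical helper, not an EmbeddingBridge function.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length.

    ASSUMPTION: illustrates one plausible meaning of --normalize;
    the README does not confirm the tool's behaviour.
    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


unit = l2_normalize([3.0, 4.0])
unit_length = math.sqrt(sum(x * x for x in unit))
```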
# Store embedding from binary file
embr store --embedding vector.bin --dims 1536 file.txt
# Store embedding from numpy file
embr store --embedding vector.npy file.txt
# Check embedding status
embr status file.txt
embr status -v file.txt # verbose output
# Compare embeddings
embr diff <hash1> <hash2>
# Roll back to previous version
embr rollback <hash> file.txt
# View embedding log
embr log file.txt
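To make "comparing embeddings" concrete, here is a minimal sketch of cosine similarity, a widely used measure of how closely two vectors point in the same direction. Which metric `embr diff` actually reports is not stated in this README, so treat this as an illustration of the general idea rather than the tool's implementation; `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (range -1 to 1).

    ASSUMPTION: cosine is used here as a typical similarity measure;
    embr diff may compute something different.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # unrelated directions
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])       # opposite directions
```

Values near 1 suggest two versions of an embedding are semantically close, while values near 0 or below suggest the representation has changed substantially.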
# Create a new set of embeddings
embr set create <name>
# List available sets
embr set list
embr set list --verbose
# Switch between sets
embr switch <name>
# Show current set status
embr set status
# Compare differences between sets
embr set diff <set1> <set2>
# Delete a set
embr set -d <name> [--force]
# Merge embeddings from one set to another
embr merge <source-set> [<target-set>] [--strategy=<strategy>]
# Add a remote
embr remote add <name> <url>
# List remotes
embr remote list
# Push a set to remote
embr push <remote> [<set>]
# Pull a set from remote
embr pull <remote> [<set>]
# Download a file or directory from a repository
embr get <remote> <path>
# Garbage collect unreferenced embeddings
embr gc [options]
# Example: dry run
embr gc -n
# Remove embeddings from tracking
embr rm file.txt
embr rm --cached file.txt
embr rm -m openai-3 file.txt
Leverage EmbeddingBridge within your Python projects using our dedicated package. It's ideal for scripting, automation, and deeper integration with your existing ML workflows.
Python Package Highlights:
- Direct access to core storage and management functionalities via Python `ctypes` bindings to the C library.
- `EmbeddingStore` class: For fine-grained control over embedding storage, retrieval, and similarity searches.
- `EmbeddingBridge` class: Programmatically execute `embr` CLI commands from your Python scripts.
- Seamlessly integrate EmbeddingBridge's versioning and management capabilities into your Python applications.
Install the Python package in editable mode directly from the project root:
pip install -e ./python
This command assumes your `setup.py` for the Python package is located in the `python` subdirectory.
For comprehensive API documentation, usage examples, and advanced configurations, please consult the Python Package README.
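The package's own classes are documented in that README and are not reproduced here. As a package-independent sketch, the CLI commands listed earlier can also be driven from a script with the standard library's `subprocess` module. The helpers `build_embr_command` and `run_embr` are hypothetical names introduced for illustration; they are not part of the EmbeddingBridge Python package.

```python
import shutil
import subprocess


def build_embr_command(*args, executable="embr"):
    """Build the argument list for an embr invocation without running it."""
    return [executable, *map(str, args)]


def run_embr(*args, executable="embr"):
    """Run an embr command and capture its output.

    Returns a subprocess.CompletedProcess, or None when the launcher
    is not found on PATH (hypothetical convention for this sketch).
    """
    if shutil.which(executable) is None:
        return None
    return subprocess.run(
        build_embr_command(*args, executable=executable),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode yourself instead of raising
    )


# Mirrors: embr store --embedding vector.npy file.txt
cmd = build_embr_command("store", "--embedding", "vector.npy", "file.txt")

result = run_embr("status", "file.txt")
if result is not None and result.returncode == 0:
    print(result.stdout)
```

Passing arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.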
We welcome contributions! Please see our Contributing Guidelines for details.
This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.