Embedding Model Selection Framework

A framework for evaluating and selecting embedding models for NLP tasks. It reports performance metrics, cost estimates, and resource utilization for each model, and compares embedding providers including OpenAI, VoyageAI, Hugging Face models, and Sentence Transformers.

🌟 Features

Comprehensive embedding model evaluation framework
Support for multiple embedding providers:
- OpenAI (text-embedding-3-small, text-embedding-3-large)
- VoyageAI (voyage-2, voyage-3)
- Hugging Face models (BERT, DistilBERT)
- Sentence Transformers
Detailed performance metrics and resource monitoring
Cost analysis and estimation
Task-specific model recommendations
Automated model selection based on requirements

📋 Prerequisites

Python 3.8+
CUDA-compatible GPU (optional, but recommended for better performance)
API keys for OpenAI and VoyageAI (if using these providers)

🚀 Installation

Clone the repository:

git clone git@github.com:Ziv2/Embedding-selection-framework.git
cd Embedding-selection-framework

Install required packages:

pip install -r requirements.txt

Set up environment variables for API keys:

export OPENAI_API_KEY='your-openai-api-key'
export VOYAGE_API_KEY='your-voyage-api-key'
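
Before running an evaluation, it can help to confirm that both variables are visible to Python. The snippet below is a minimal sketch, not part of the framework; the helper name `missing_api_keys` is hypothetical.

```python
import os

# Environment variable names taken from the export commands above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "VOYAGE_API_KEY")

def missing_api_keys(environ=None):
    """Return the names of required API keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]

missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```

If you only use local Hugging Face or Sentence Transformers models, the keys are not needed and this check can be skipped.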

💻 Usage

Basic Usage

from sklearn.svm import SVC

from embedding_evaluator import EnhancedEmbeddingEvaluator
from embedding_providers import OpenAIEmbedding, VoyageAIEmbedding, HuggingFaceEmbedding

# Initialize providers
providers = [
    HuggingFaceEmbedding('bert-base-uncased'),
    OpenAIEmbedding('text-embedding-3-small'),
    VoyageAIEmbedding('voyage-2')
]

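# Compare embeddings; train_texts, train_labels, test_texts, and test_labels
# are your own lists of input strings and their class labels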
results = EnhancedEmbeddingEvaluator.compare_embeddings(
    embedding_providers=providers,
    classifier_factory=lambda: SVC(kernel='linear'),
    train_texts=train_texts,
    train_labels=train_labels,
    test_texts=test_texts,
    test_labels=test_labels
)
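
To illustrate the idea behind such a comparison, here is a minimal, dependency-free sketch: each provider embeds the train and test texts, a fresh classifier from the factory is fitted, and accuracy and wall-clock time are recorded. It is not the framework's implementation. The `embed` method, the `name` attribute, the toy `CharCountEmbedding` provider, and the `NearestCentroid` classifier are all assumptions made for illustration.

```python
import time

class CharCountEmbedding:
    """Toy stand-in for a provider: embeds text as [length, vowel count]."""
    name = "toy_char_count"

    def embed(self, texts):
        return [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))]
                for t in texts]

class NearestCentroid:
    """Tiny classifier: predicts the label whose mean training vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for vec, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {label: [s / counts[label] for s in acc]
                          for label, acc in sums.items()}
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids,
                    key=lambda label: squared_distance(vec, self.centroids[label]))
                for vec in X]

def compare(providers, classifier_factory,
            train_texts, train_labels, test_texts, test_labels):
    """Fit one fresh classifier per provider and record accuracy and elapsed time."""
    results = {}
    for provider in providers:
        start = time.perf_counter()
        clf = classifier_factory().fit(provider.embed(train_texts), train_labels)
        predictions = clf.predict(provider.embed(test_texts))
        elapsed = time.perf_counter() - start
        correct = sum(p == t for p, t in zip(predictions, test_labels))
        results[provider.name] = {"accuracy": correct / len(test_labels),
                                  "seconds": elapsed}
    return results

results = compare(
    [CharCountEmbedding()], NearestCentroid,
    ["ok", "no", "a very long positive review", "another lengthy glowing text"],
    ["short", "short", "long", "long"],
    ["hi", "this sentence is quite long"],
    ["short", "long"],
)
print(results)
```

Passing a factory rather than a classifier instance means every provider gets an untrained classifier, so results from one embedding cannot leak into the next.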

Task-Specific Model Selection

from nlp_embeddings import NLPEmbeddingHandler

handler = NLPEmbeddingHandler()
result = handler.get_embeddings('sentiment_analysis', "This product is amazing!")
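
One simple way to think about task-specific selection is a lookup table from task name to model, with a fallback for unknown tasks. The sketch below assumes exactly that; `TASK_DEFAULTS`, `FALLBACK_MODEL`, and `choose_model` are hypothetical names, and the pairings are placeholders rather than benchmark-backed recommendations or the handler's actual logic.

```python
# Hypothetical task-to-model table; the pairings are illustrative placeholders.
TASK_DEFAULTS = {
    "sentiment_analysis": "bert-base-uncased",
    "semantic_search": "text-embedding-3-small",
}
FALLBACK_MODEL = "text-embedding-3-small"

def choose_model(task):
    """Return the model mapped to a task name, or the fallback if unknown."""
    return TASK_DEFAULTS.get(task.strip().lower(), FALLBACK_MODEL)

print(choose_model("sentiment_analysis"))
```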

📊 Performance Metrics

The framework provides comprehensive metrics including:

Accuracy, Precision, Recall, F1 score
Processing speed (tokens/second)
Memory usage (RAM and GPU)
Cost estimates
Embedding dimensions
Total processing time
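
For reference, the classification metrics and throughput listed above can be computed as in the following sketch. It uses the standard binary definitions of precision, recall, and F1; the function names are hypothetical, and the framework may compute these differently (for example, with multi-class averaging).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def tokens_per_second(token_count, elapsed_seconds):
    """Processing speed as tokens embedded per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds

metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics)
print(tokens_per_second(10_000, 2.0))
```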

🛠️ Configuration

The framework can be configured through environment variables or a config file. Model specifications, for example, look like this:

MODEL_SPECS = {
    'OpenAI_text-embedding-3-small': {
        'dim': 1536,
        'cost_per_1k_tokens': 0.00002,
        'max_tokens': 8191,
        'suggested_batch': 2048
    },
    # ... more model specifications
}
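
As a rough illustration of how such a specification can drive cost estimation and batching, consider the sketch below. The helpers `estimate_cost` and `batch_count` are hypothetical, and the code assumes that `suggested_batch` means the number of texts sent per request.

```python
import math

# Same entry as in the configuration example above.
MODEL_SPECS = {
    "OpenAI_text-embedding-3-small": {
        "dim": 1536,
        "cost_per_1k_tokens": 0.00002,
        "max_tokens": 8191,
        "suggested_batch": 2048,
    },
}

def estimate_cost(model_name, token_count, specs=MODEL_SPECS):
    """Estimated cost in USD of embedding token_count tokens."""
    return token_count / 1000 * specs[model_name]["cost_per_1k_tokens"]

def batch_count(model_name, n_texts, specs=MODEL_SPECS):
    """Requests needed, assuming suggested_batch is the number of texts per request."""
    return math.ceil(n_texts / specs[model_name]["suggested_batch"])

name = "OpenAI_text-embedding-3-small"
print(f"${estimate_cost(name, 1_000_000):.2f} per 1M tokens")
print(batch_count(name, 5000), "batches for 5000 texts")
```

At 0.00002 USD per 1k tokens, one million tokens cost about 0.02 USD, which matches the figure for this model in the example results.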

📈 Example Results

Model                          Accuracy  F1-Score  Tokens/sec  Cost/1M tokens
-----------------------------  --------  --------  ----------  --------------
OpenAI_text-embedding-3-small  0.945     0.943     5000        $0.02
VoyageAI_voyage-2              0.932     0.930     4500        $0.12
HF_bert-base-uncased           0.918     0.915     2000        $0.00
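
Results like these can feed a requirements-based selection step. The sketch below picks the cheapest model that meets a minimum accuracy and a maximum cost, using the example figures from the table. The `select_model` function and its parameters are hypothetical and are not the framework's own selection logic.

```python
# Rows copied from the example results table above.
EXAMPLE_RESULTS = [
    {"model": "OpenAI_text-embedding-3-small", "accuracy": 0.945,
     "f1": 0.943, "tokens_per_sec": 5000, "cost_per_1m": 0.02},
    {"model": "VoyageAI_voyage-2", "accuracy": 0.932,
     "f1": 0.930, "tokens_per_sec": 4500, "cost_per_1m": 0.12},
    {"model": "HF_bert-base-uncased", "accuracy": 0.918,
     "f1": 0.915, "tokens_per_sec": 2000, "cost_per_1m": 0.00},
]

def select_model(results, min_accuracy=0.0, max_cost_per_1m=float("inf")):
    """Cheapest model meeting both constraints; ties go to higher accuracy."""
    candidates = [r for r in results
                  if r["accuracy"] >= min_accuracy
                  and r["cost_per_1m"] <= max_cost_per_1m]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (r["cost_per_1m"], -r["accuracy"]))
    return best["model"]

print(select_model(EXAMPLE_RESULTS, min_accuracy=0.93))
```

Returning `None` when nothing qualifies leaves it to the caller to decide whether to relax the constraints.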

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Hugging Face for their transformer models
Sentence Transformers team
OpenAI for their embedding models and API
VoyageAI for their embedding services

📞 Contact

https://www.deepkeep.ai/contact
