This project provides a service to convert text to vector embeddings using OpenAI's embedding models and perform similarity searches. It's designed to be used as a standalone CLI, or as a Model Context Protocol (MCP) server for integration with Claude Desktop and other MCP clients.
- Generate embeddings from text using OpenAI's embedding models
- Save embeddings with custom IDs
- Perform similarity searches across stored embeddings
- Compare semantic similarity between two texts
- Delete and manage stored embeddings
- MCP server integration for Claude Desktop
- Python 3.9+ (for standalone CLI)
- Python 3.10+ (for MCP server)
- OpenAI API key
- Clone this repository:
  git clone https://github.com/yourusername/mcp-text-to-embedding.git
  cd mcp-text-to-embedding
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Create a .env file with your OpenAI API key:
  cp .env.example .env
  Then edit the .env file and replace your_openai_api_key_here with your actual OpenAI API key.
- The .gitignore file is set up to exclude the .env file from Git
- Use .env.example as a template, but create your own .env file locally
- For production deployments, consider using a secrets management service
- If you accidentally commit your API key, rotate it immediately on the OpenAI dashboard
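The notes above can be backed by a small fail-fast check at startup. This is a minimal sketch, not the project's actual loading code: the helper name load_api_key is hypothetical, and only the OPENAI_API_KEY variable and the your_openai_api_key_here placeholder come from this README.

```python
import os
from typing import Mapping, Optional

# Placeholder text that ships in .env.example (taken from this README).
PLACEHOLDER = "your_openai_api_key_here"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment, or raise if unusable.

    Hypothetical helper: the project's real code may load the key differently.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; create a .env file from .env.example"
        )
    if key == PLACEHOLDER:
        raise RuntimeError(
            "OPENAI_API_KEY still holds the placeholder from .env.example"
        )
    return key
```

Passing the environment as a mapping keeps the check easy to test without touching real credentials; calling load_api_key() with no argument reads os.environ.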
The embedding_cli.py script provides a command-line interface for working with embeddings:
# Generate an embedding and save it
./embedding_cli.py generate "This is a test sentence" --id test-sentence
# List all saved embeddings
./embedding_cli.py list
# Compare two texts for similarity
./embedding_cli.py compare "This is a test" "This is a sample"
# Search for similar embeddings
./embedding_cli.py search "test query" --top-k 3 --threshold 0.7
# Delete an embedding
./embedding_cli.py delete test-sentence
Run ./embedding_cli.py --help for more information on each command.
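For readers curious how a command set like this can be wired up, here is a hedged sketch using argparse from the standard library. The subcommand names and the --id, --top-k, and --threshold flags come from the examples above; the positional argument names and the default values are assumptions, and the real embedding_cli.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the commands shown in this README."""
    parser = argparse.ArgumentParser(prog="embedding_cli.py")
    sub = parser.add_subparsers(dest="command", required=True)

    gen = sub.add_parser("generate", help="Generate an embedding and save it")
    gen.add_argument("text")
    gen.add_argument("--id", help="Custom ID to store the embedding under")

    sub.add_parser("list", help="List all saved embeddings")

    cmp_parser = sub.add_parser("compare", help="Compare two texts for similarity")
    cmp_parser.add_argument("text1")
    cmp_parser.add_argument("text2")

    search = sub.add_parser("search", help="Search for similar embeddings")
    search.add_argument("query")
    # Defaults below are illustrative assumptions, not the project's values.
    search.add_argument("--top-k", type=int, default=5)
    search.add_argument("--threshold", type=float, default=0.0)

    delete = sub.add_parser("delete", help="Delete an embedding")
    delete.add_argument("id")
    return parser


args = build_parser().parse_args(
    ["search", "test query", "--top-k", "3", "--threshold", "0.7"]
)
print(args.command, args.query, args.top_k, args.threshold)
# prints: search test query 3 0.7
```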
To run as an MCP server (requires Python 3.10+ and the MCP package):
- Ensure you have Python 3.10+ installed
- Install the MCP package:
  pip install "mcp[cli]"
- Run the MCP server:
  python mcp_server.py
To use this server with Claude Desktop:
- Make sure Claude Desktop is installed (download from anthropic.com)
- Create a Claude Desktop configuration file at ~/Library/Application Support/Claude/claude_desktop_config.json
- Configure the server using the example in claude_desktop_config.example.json:
{
  "mcpServers": {
    "text-embeddings": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-text-to-embedding/mcp_server.py"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here"
      },
      "cwd": "/absolute/path/to/mcp-text-to-embedding"
    }
  }
}

Make sure to:
- Replace /absolute/path/to/mcp-text-to-embedding with the actual path to your project
- Replace your_openai_api_key_here with your actual OpenAI API key
- Restart Claude Desktop after making changes
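Hand-editing paths in JSON is error-prone, so the configuration can also be generated. The sketch below builds the same structure as the example above from a project directory and a key; build_config is a hypothetical helper, not part of this repository, and it only prints the result so that an existing Claude Desktop configuration is never overwritten.

```python
import json
from pathlib import PurePosixPath


def build_config(project_dir: str, api_key: str, python_cmd: str = "python") -> dict:
    """Return a Claude Desktop config dict for this server (hypothetical helper)."""
    # PurePosixPath keeps forward slashes, matching the macOS-style paths above.
    project = PurePosixPath(project_dir)
    return {
        "mcpServers": {
            "text-embeddings": {
                "command": python_cmd,
                "args": [str(project / "mcp_server.py")],
                "env": {"OPENAI_API_KEY": api_key},
                "cwd": str(project),
            }
        }
    }


config = build_config(
    "/absolute/path/to/mcp-text-to-embedding", "your_openai_api_key_here"
)
print(json.dumps(config, indent=2))
```

If the configuration file already lists other servers, merge the text-embeddings entry into the existing mcpServers object rather than replacing the file.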
When run as an MCP server, the following tools are available:
- text_to_embedding: Convert text to an embedding vector and save it
- similarity_search: Find similar embeddings to a query text
- compare_texts: Compare two text strings and get a similarity score with interpretation
- list_embeddings: Show all saved embeddings
- delete_embedding: Remove an embedding from the repository
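MCP clients invoke tools such as these by sending JSON-RPC 2.0 requests with the tools/call method. The sketch below builds one such request as a plain dictionary to show the message shape. The argument names text1 and text2 are assumptions, since this README does not list each tool's parameters, and make_tool_call is a hypothetical helper.

```python
import json
from itertools import count

# Simple increasing request IDs, as JSON-RPC expects one ID per request.
_request_ids = count(1)


def make_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = make_tool_call(
    "compare_texts",
    # Argument names are assumed for illustration only.
    {"text1": "This is a test", "text2": "This is a sample"},
)
print(json.dumps(request, indent=2))
```

In practice Claude Desktop or another MCP client constructs these messages; the sketch is only meant to make the protocol exchange concrete.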
- text_to_embedding.py: Core embedding generation functionality
- similarity_search.py: Functions for similarity search and embedding management
- embedding_cli.py: Command-line interface for working with embeddings
- mcp_server.py: MCP server implementation
- embeddings/: Directory where embeddings are stored
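This README does not describe how files inside embeddings/ are laid out. As one plausible scheme, the sketch below stores each vector as a JSON file named after its ID. The file format, the function names, and the ID validation rule are all assumptions for illustration, not the project's actual storage code.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import List

# Restrict IDs to characters that cannot form a path outside the directory.
_SAFE_ID = re.compile(r"^[A-Za-z0-9_.-]+$")


def _path_for(directory: Path, embedding_id: str) -> Path:
    if not _SAFE_ID.match(embedding_id):
        raise ValueError(f"unsafe embedding ID: {embedding_id!r}")
    return directory / f"{embedding_id}.json"


def save_embedding(directory: Path, embedding_id: str, text: str,
                   vector: List[float]) -> Path:
    """Write one embedding to <directory>/<id>.json and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = _path_for(directory, embedding_id)
    record = {"id": embedding_id, "text": text, "embedding": vector}
    path.write_text(json.dumps(record))
    return path


def load_embedding(directory: Path, embedding_id: str) -> dict:
    return json.loads(_path_for(directory, embedding_id).read_text())


def list_embeddings(directory: Path) -> List[str]:
    return sorted(p.stem for p in directory.glob("*.json"))


def delete_embedding(directory: Path, embedding_id: str) -> bool:
    """Remove an embedding; return False if it did not exist."""
    path = _path_for(directory, embedding_id)
    if path.exists():
        path.unlink()
        return True
    return False


# Usage with a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "embeddings"
    save_embedding(store, "test-sentence", "This is a test sentence", [0.1, 0.2, 0.3])
    print(list_embeddings(store))                    # ['test-sentence']
    print(delete_embedding(store, "test-sentence"))  # True
```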
Text embeddings are vector representations of text that capture semantic meaning. They're useful for:
- Semantic search
- Text clustering
- Finding similar documents
- Building recommendation systems
- And more!
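To make the idea concrete, the sketch below ranks stored vectors against a query using cosine similarity, a common measure for comparing embeddings. Whether this project uses cosine similarity is an assumption, and the three-dimensional vectors are toy values rather than real model output. The top_k and threshold parameters echo the --top-k and --threshold flags of the search command.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Sequence[float], stored: Dict[str, List[float]],
           top_k: int = 3, threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return up to top_k (id, score) pairs scoring at least threshold."""
    scored = [(eid, cosine_similarity(query, vec)) for eid, vec in stored.items()]
    hits = [(eid, score) for eid, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]


# Toy vectors standing in for real embeddings.
stored = {
    "cats": [1.0, 0.0, 0.0],
    "kittens": [0.9, 0.1, 0.0],
    "stocks": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.0, 0.0], stored)
print([eid for eid, _ in results])  # ['cats', 'kittens']
```

The unrelated "stocks" vector scores 0.0 and falls below the threshold, which is how a threshold filters out weak matches even when fewer than top_k results remain.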
The Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools. Learn more at modelcontextprotocol.io.
MIT