
VectorNode

One-command tool to turn any VPS into a production-ready, OpenAI-compatible embedding API server.

✨ Features

  • OpenAI-compatible API: Drop-in replacement for /v1/embeddings
  • CPU-only inference: Runs on a low-end VPS (4GB RAM, 2 vCPU)
  • Hugging Face integration: Download and cache models from HF Hub
  • API key authentication: Secure your API with local key management
  • Concurrency control: Automatic concurrency limits based on model size and hardware
  • Request queueing: Handle traffic spikes gracefully
  • Memory safety: Pre-flight checks prevent OOM crashes

🚀 Quick Start

Prerequisites

  • Node.js 18+ (npm or npx)
  • A Hugging Face account with an API token
  • 4GB+ RAM (for small models; 8GB+ recommended)

Installation

npm install -g vps-vector-node

Or run without installing:

npx vps-vector-node --help

Step 1: Get Your Hugging Face Token

VectorNode requires a Hugging Face token to download models.

Get a token:

  1. Go to https://huggingface.co/join (create account if needed)
  2. Navigate to https://huggingface.co/settings/tokens
  3. Click "New token" → give it a name (e.g., "vectornode")
  4. Select "Read" permission
  5. Click "Generate token" and copy it immediately

Login to VectorNode:

npx vps-vector-node login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Or if installed globally:

vectornode login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The token is stored securely in ~/.vectornode/config.json.

Token troubleshooting:

  • Invalid token? Ensure you copied the entire token, including the hf_ prefix
  • Permission denied? Token must have "Read" permission
  • Lost token? Create a new one at https://huggingface.co/settings/tokens

Step 2: Download a Model

Download the model before starting the server (avoids delays on first request):

npx vps-vector-node models:download bge-small-en

Or if installed globally:

vectornode models:download bge-small-en

See Available Models for other options.

Step 3: Create an API Key

npx vps-vector-node key create --name dev

This outputs an API key like sk-abc123.... Store it securely; you'll need it for every API request.

Step 4: Start the Server

npx vps-vector-node serve --model bge-small-en --port 3000

Or if installed globally:

vectornode serve --model bge-small-en --port 3000

You should see:

[INFO] Starting VectorNode server { model: 'bge-small-en', port: '3000', host: '0.0.0.0' }
[INFO] Model loaded successfully { model: 'bge-small-en', dimensions: 384 }
[INFO] Server listening on 0.0.0.0:3000

Step 5: Test the API

Using curl:

curl -H "Authorization: Bearer sk-abc123..." \
  -H "Content-Type: application/json" \
  -X POST http://localhost:3000/v1/embeddings \
  -d '{
    "model": "bge-small-en",
    "input": "hello world"
  }'

Using Postman:

  • Method: POST
  • URL: http://localhost:3000/v1/embeddings
  • Headers:
    • Authorization: Bearer sk-abc123...
    • Content-Type: application/json
  • Body (raw JSON):
    {
      "model": "bge-small-en",
      "input": "hello world"
    }

Important Notes:

  • /v1/embeddings only accepts POST requests (GET will return 404)
  • Send the payload as raw JSON in the body, not as query parameters
  • VectorNode handles Unicode and whitespace (spaces, tabs, emoji) automatically:
    {
      "model": "bge-small-en",
      "input": "hello world 🚀\tTabbed text"
    }
  • If your JSON is wrapped in extra surrounding quotes (a " before { and after }), remove them
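
If you are calling the API from Node.js (18+, where fetch is built in), the curl request above translates to a short script. This is a minimal sketch; the key, port, and model are the placeholders from the previous steps:

// embed.js - minimal client for POST /v1/embeddings (Node.js 18+)
const API_KEY = "sk-abc123..."; // replace with the key from Step 3

async function embed(input) {
  const res = await fetch("http://localhost:3000/v1/embeddings", {
    method: "POST", // the endpoint only accepts POST
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "bge-small-en", input }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const body = await res.json();
  return body.data.map((d) => d.embedding); // one vector per input
}

embed("hello world").then(([vector]) => {
  console.log(`received ${vector.length} dimensions`); // 384 for bge-small-en
});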

📊 Available Models

Model Comparison Table

| Model ID | Dimensions | Size (GB) | Parameters | Min RAM | Rec RAM | Best Use Case | Multilingual | Latency |
|---|---|---|---|---|---|---|---|---|
| bge-micro | 384 | 0.1 | ~22M | 2GB | 4GB | Edge/frontend, ultra-low latency, real-time | No | ~5ms |
| minilm-l6-v2 | 384 | 0.4 | ~22M | 2GB | 4GB | General purpose, semantic search, fast inference | No | ~8ms |
| gte-small | 384 | 0.5 | ~33M | 2GB | 4GB | General embedding, retrieval | Yes | ~10ms |
| bge-small-en | 384 | 0.5 | ~110M | 3GB | 6GB | RAG, production search, English-focused | No | ~12ms |
| gemma-embedding-small | 768 | 1.6 | ~1.6M | 3GB | 6GB | Lightweight multilingual, fast | Yes | ~15ms |
| gte-base | 768 | 1.5 | ~137M | 4GB | 8GB | Higher-quality search, retrieval-augmented generation | Yes | ~25ms |
| bge-base-en | 768 | 1.5 | ~125M | 4GB | 8GB | Better accuracy than small, English RAG | No | ~30ms |
| qwen2.5-embedding-0.5b | 1024 | 2.0 | ~500M | 4GB | 8GB | Modern multilingual, strong semantic understanding | Yes | ~40ms |
| gemma-embedding-base | 768 | 3.0 | ~308M | 6GB | 12GB | High-quality multilingual embeddings | Yes | ~50ms |
| bge-large-en | 1024 | 3.0 | ~335M | 8GB | 16GB | Best accuracy for English, production RAG systems | No | ~80ms |

Quick Decision Tree

Choose your model based on your needs:

  • Ultra-fast (<10ms), edge device?
    → bge-micro or minilm-l6-v2

  • Production English RAG/search?
    → bge-small-en (balanced) or bge-large-en (best quality)

  • Multilingual support needed?
    → gemma-embedding-small (fast) or gte-base (quality)

  • General purpose, cost-conscious?
    → gte-small or minilm-l6-v2

  • Maximum accuracy, sufficient hardware?
    → bge-large-en or gemma-embedding-base

Recommended Defaults by Hardware

| Hardware Profile | Recommended Model | Why |
|---|---|---|
| 2GB RAM, 1-2 vCPU | bge-micro or minilm-l6-v2 | Minimal overhead, fast inference |
| 4GB RAM, 2-4 vCPU | bge-small-en or gte-small | Production-ready, good balance |
| 8GB RAM, 4-8 vCPU | bge-base-en or gte-base | Higher quality, still responsive |
| 16GB+ RAM, 8+ vCPU | bge-large-en or gemma-embedding-base | Best quality and multilingual support |

List All Available Models

npx vps-vector-node models
# or if installed globally:
vectornode models

🔌 API Reference

POST /v1/embeddings

Generate embeddings for input text(s).

Request:

{
  "model": "bge-small-en",
  "input": "hello world"
}

For multiple inputs:

{
  "model": "bge-small-en",
  "input": ["text1", "text2", "text3"]
}

Query Parameters:

  • normalize (boolean, default: true) - Whether to L2-normalize embeddings

Response (200 OK):

{
  "object": "list",
  "data": [
    {
      "index": 0,
      "embedding": [0.0123, -0.0034, ...],
      "object": "embedding"
    }
  ],
  "model": "bge-small-en",
  "usage": {
    "text_length": [11],
    "tokens_estimated": [3]
  },
  "inference_time_ms": 12
}

Status Codes:

  • 200 - Success
  • 202 - Request queued (includes queue_position in response)
  • 400 - Bad request (missing/invalid fields)
  • 401 - Unauthorized (invalid/missing API key)
  • 429 - Queue full
  • 500 - Server error
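
Under load the server may answer 202 instead of 200, so clients should expect to retry. Below is a hedged Node.js sketch that also shows the normalize query parameter; re-submitting a queued request and the one-second backoff are assumptions, not documented behavior:

// Sketch: disable L2 normalization via the query string and retry while queued.
async function embedWithRetry(input, apiKey, attempts = 5) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch("http://localhost:3000/v1/embeddings?normalize=false", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "bge-small-en", input }),
    });
    if (res.status === 202) {
      const { queue_position } = await res.json(); // documented for queued requests
      console.log(`queued at position ${queue_position}, retrying...`);
      await new Promise((r) => setTimeout(r, 1000)); // assumed backoff
      continue;
    }
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json(); // the 200 response shape shown above
  }
  throw new Error("still queued after retries");
}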

GET /v1/models

List available models on the server.

Headers:

  • Authorization: Bearer sk-... (required)

Response (200 OK):

{
  "object": "list",
  "data": [
    {
      "id": "bge-small-en",
      "dimensions": 384,
      "size_gb": 0.5,
      "quantization": "Q4_K",
      "source": "huggingface",
      "loaded": true
    }
  ]
}

GET /health

Health check endpoint (no authentication required).

Response (200 OK):

{
  "status": "ok",
  "model_loaded": "bge-small-en",
  "uptime_seconds": 1023
}
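
Because /health requires no API key, it is suitable for load-balancer and uptime probes, e.g.:

curl http://localhost:3000/health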

GET /metrics

Server performance metrics.

Headers:

  • Authorization: Bearer sk-... (required)

Response (200 OK):

{
  "active_requests": 3,
  "queue_depth": 2,
  "memory_free_mb": 2432,
  "cpu_load": 0.43,
  "requests_per_sec": 12.5,
  "avg_latency_ms": 45
}
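
For example, from the command line (the key is the placeholder from Step 3):

curl -H "Authorization: Bearer sk-abc123..." http://localhost:3000/metrics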

💻 CLI Commands

Login to Hugging Face

npx vps-vector-node login hf --token <token>
# or if installed globally:
vectornode login hf --token <token>

Token is stored in ~/.vectornode/config.json.

Download Models

npx vps-vector-node models:download bge-small-en
# or if installed globally:
vectornode models:download bge-small-en

Serve

Start the embedding API server:

npx vps-vector-node serve --model <model-id> [options]

Common Options:

  • --port <port> - Port to listen on (default: 3000)
  • --host <host> - Host to bind to (default: 0.0.0.0)
  • --max-queue-size <size> - Maximum queue size (default: 100)
  • --batch-size <size> - Batch size for processing (default: 10)
  • --threads <threads> - Number of threads (auto-detected)
  • --verbose - Enable verbose logging
  • --trace - Enable trace logging
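
These options can be combined; for example, to bind only to localhost with a larger queue and verbose logging:

npx vps-vector-node serve --model bge-small-en --host 127.0.0.1 --port 8080 --max-queue-size 200 --verbose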

Key Management

Create a new API key:

npx vps-vector-node key create --name <name>

List all keys:

npx vps-vector-node key list

Revoke a key:

npx vps-vector-node key revoke --name <name>

⚙️ Configuration

Configuration is stored in ~/.vectornode/:

~/.vectornode/
├── config.json          # HF token and settings
├── keys.json            # API keys
├── models/              # Downloaded models cache
│   ├── bge-small-en/
│   ├── bge-base-en/
│   └── ...
└── logs/                # Trace logs (if --trace enabled)

System Requirements

Minimum (for small models):

  • RAM: 4GB
  • CPU: 2 vCPU
  • Storage: 5GB free

Recommended (for base models):

  • RAM: 8GB
  • CPU: 4 vCPU
  • Storage: 10GB free

The server checks available memory before loading a model and refuses to start if there is not enough.
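
To check what your VPS actually has before picking a model (Linux):

free -h    # available RAM
nproc      # vCPU count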


🐳 Deployment

Docker

Dockerfile:

FROM node:18-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --production

COPY . .

EXPOSE 3000
CMD ["node", "cli/index.js", "serve", "--model", "bge-small-en"]

Build and run:

docker build -t vectornode .
docker run -p 3000:3000 vectornode
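
Note that the image above contains no downloaded models or API keys. One option, sketched here, is to mount the host's ~/.vectornode into the container; the /root/.vectornode target assumes the process runs as root inside node:18-alpine:

docker run -p 3000:3000 -v ~/.vectornode:/root/.vectornode vectornode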

Systemd Service (Linux)

Create /etc/systemd/system/vectornode.service:

[Unit]
Description=VectorNode Embedding API
After=network.target

[Service]
Type=simple
User=vectornode
WorkingDirectory=/opt/vectornode
ExecStart=/usr/bin/node /opt/vectornode/cli/index.js serve --model bge-small-en --port 3000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable vectornode
sudo systemctl start vectornode
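
Then confirm the service is running and follow its logs:

sudo systemctl status vectornode
sudo journalctl -u vectornode -f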

🔧 Troubleshooting

Model Download Fails

Invalid or missing token:

  • Log in again with a valid token: npx vps-vector-node login hf --token <token>
  • Make sure the full token was copied, including the hf_ prefix

Token permission issues:

  • Ensure your token has "Read" permission

Network/connectivity issues:

  • Check connection to huggingface.co: ping huggingface.co
  • Try again (may be temporary)

Not enough disk space:

  • Check available space: df -h
  • Clean up space or use a smaller model

Model Not Found

[ERROR] Model not found: gemma-embedding-small. 
Run: vectornode models:download gemma-embedding-small

Solution: Download the model first:

npx vps-vector-node models:download gemma-embedding-small

Out of Memory Errors

  • Use a smaller model (e.g., bge-small-en instead of bge-large-en)
  • Reduce the --max-queue-size value
  • Add swap space to the system (a quick sketch follows this list)
  • Increase available RAM
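
A quick sketch for adding a 2GB swap file on Linux (the size is arbitrary; add an /etc/fstab entry to persist it across reboots):

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile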

Slow Inference

  • Check CPU usage: top or htop
  • Try reducing --batch-size
  • Use a quantized model
  • Add more CPU resources

API Returns 401 Unauthorized

Missing or invalid API key:

  • List keys: npx vps-vector-node key list
  • Create new key: npx vps-vector-node key create --name dev
  • Verify header format: Authorization: Bearer sk-...

JSON Parse Errors

[ERROR] Unexpected token '"', ""{\\n  \\\"mo\"... is not valid JSON

Solution in Postman:

  • Use Body tab β†’ select raw β†’ pick JSON from dropdown
  • Place payload in body (not in query params)
  • Remove any extra quotes around the JSON

👨‍💻 Development

Setup

git clone <repo>
cd vectornode
npm install

Run Tests

npm test

Local Development with Verbose Logging

node cli/index.js serve --model bge-small-en --verbose --port 3000

Enable Trace Logging

node cli/index.js serve --model bge-small-en --trace --port 3000

Logs are saved to ~/.vectornode/logs/.
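
To find the most recent trace log:

ls -lt ~/.vectornode/logs/ | head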


📄 License

MIT

🤝 Contributing

Contributions welcome! Please:

  1. Open an issue to discuss your idea
  2. Fork the repository
  3. Submit a PR with a clear description
