One-command tool to turn any VPS into a production-ready, OpenAI-compatible embedding API server.
- Features
- Quick Start
- Available Models
- API Reference
- CLI Commands
- Configuration
- Deployment
- Troubleshooting
- Development
## Features

- OpenAI-compatible API: Drop-in replacement for `/v1/embeddings`
- CPU-only inference: Runs on a low-end VPS (4GB RAM, 2 vCPU)
- Hugging Face integration: Download and cache models from HF Hub
- API key authentication: Secure your API with local key management
- Concurrency control: Automatic concurrency limits based on model size and hardware
- Request queueing: Handle traffic spikes gracefully
- Memory safety: Pre-flight checks prevent OOM crashes
## Quick Start

Prerequisites:

- Node.js 18+ (npm or npx)
- Hugging Face Account with an API token
- 4GB+ RAM (for small models; 8GB+ recommended)
`npm install -g vps-vector-node`

Or run without installing:
`npx vps-vector-node --help`

VectorNode requires a Hugging Face token to download models.
Get a token:
- Go to https://huggingface.co/join (create account if needed)
- Navigate to https://huggingface.co/settings/tokens
- Click "New token" β Give it a name (e.g., "vectornode")
- Select "Read" permission
- Click "Generate token" and copy it immediately
Login to VectorNode:
`npx vps-vector-node login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`

Or if installed globally:
`vectornode login hf --token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`

The token is stored securely in `~/.vectornode/config.json`.
Token troubleshooting:
- Invalid token? Ensure you copied the entire token, including the `hf_` prefix
- Permission denied? Token must have "Read" permission
- Lost token? Create a new one at https://huggingface.co/settings/tokens
Download the model before starting the server (avoids delays on first request):
`npx vps-vector-node models:download bge-small-en`

Or if installed globally:
`vectornode models:download bge-small-en`

See Available Models for other options.
`npx vps-vector-node key create --name dev`

This outputs an API key like `sk-abc123...`. Store it securely; you'll need it for every API request.
`npx vps-vector-node serve --model bge-small-en --port 3000`

Or if installed globally:
`vectornode serve --model bge-small-en --port 3000`

You should see:
[INFO] Starting VectorNode server { model: 'bge-small-en', port: '3000', host: '0.0.0.0' }
[INFO] Model loaded successfully { model: 'bge-small-en', dimensions: 384 }
[INFO] Server listening on 0.0.0.0:3000
Using curl:
curl -H "Authorization: Bearer sk-abc123..." \
-H "Content-Type: application/json" \
-X POST http://localhost:3000/v1/embeddings \
-d '{
"model": "bge-small-en",
"input": "hello world"
}'

Using Postman:
- Method: POST
- URL: `http://localhost:3000/v1/embeddings`
- Headers:
  - `Authorization: Bearer sk-abc123...`
  - `Content-Type: application/json`
- Body (raw JSON):
{ "model": "bge-small-en", "input": "hello world" }
Important Notes:
- `/v1/embeddings` only accepts POST requests (GET will return 404)
- Send the payload as raw JSON in the body, not as query parameters
- VectorNode handles Unicode (spaces, tabs, emoji) automatically:
{ "model": "bge-small-en", "input": "hello world π\tTabbed text" }- If your JSON has extra surrounding quotes, remove them (e.g., remove
"before{and after})
## Available Models

| Model ID | Dimensions | Size (GB) | Parameters | Min RAM | Rec RAM | Best Use Case | Multilingual | Latency |
|---|---|---|---|---|---|---|---|---|
| `bge-micro` | 384 | 0.1 | ~22M | 2GB | 4GB | Edge/Frontend, ultra-low latency, real-time | No | ~5ms |
| `minilm-l6-v2` | 384 | 0.4 | ~22M | 2GB | 4GB | General purpose, semantic search, fast inference | No | ~8ms |
| `gte-small` | 384 | 0.5 | ~33M | 2GB | 4GB | General embedding, retrieval | Yes | ~10ms |
| `bge-small-en` | 384 | 0.5 | ~110M | 3GB | 6GB | RAG, production search, English-focused | No | ~12ms |
| `gemma-embedding-small` | 768 | 1.6 | ~1.6M | 3GB | 6GB | Lightweight multilingual, fast | Yes | ~15ms |
| `gte-base` | 768 | 1.5 | ~137M | 4GB | 8GB | Higher quality search, retrieval-augmented gen | Yes | ~25ms |
| `bge-base-en` | 768 | 1.5 | ~125M | 4GB | 8GB | Better accuracy than small, English RAG | No | ~30ms |
| `qwen2.5-embedding-0.5b` | 1024 | 2.0 | ~500M | 4GB | 8GB | Modern multilingual, strong semantic understanding | Yes | ~40ms |
| `gemma-embedding-base` | 768 | 3.0 | ~308M | 6GB | 12GB | High-quality multilingual embeddings | Yes | ~50ms |
| `bge-large-en` | 1024 | 3.0 | ~335M | 8GB | 16GB | Best accuracy for English, production RAG systems | No | ~80ms |
Choose your model based on your needs:
- Ultra-fast (<10ms), edge device? → `bge-micro` or `minilm-l6-v2`
- Production English RAG/search? → `bge-small-en` (balanced) or `bge-large-en` (best quality)
- Multilingual support needed? → `gemma-embedding-small` (fast) or `gte-base` (quality)
- General purpose, cost-conscious? → `gte-small` or `minilm-l6-v2`
- Maximum accuracy, sufficient hardware? → `bge-large-en` or `gemma-embedding-base`
| Hardware Profile | Recommended Model | Why |
|---|---|---|
| 2GB RAM, 1-2 vCPU | `bge-micro` or `minilm-l6-v2` | Minimal overhead, fast inference |
| 4GB RAM, 2-4 vCPU | `bge-small-en` or `gte-small` | Production-ready, good balance |
| 8GB RAM, 4-8 vCPU | `bge-base-en` or `gte-base` | Higher quality, still responsive |
| 16GB+ RAM, 8+ vCPU | `bge-large-en` or `gemma-embedding-base` | Best quality and multilingual support |
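If you provision hosts programmatically, the hardware table can be turned into a simple default picker. An illustrative sketch (the thresholds mirror the table; this helper is not part of VectorNode):

```ts
import { totalmem } from "node:os";

// Map total system RAM to a reasonable default model, following the
// hardware profile table above. Purely illustrative; adjust to your workload.
function defaultModelForHost(): string {
  const ramGb = totalmem() / 1024 ** 3;
  if (ramGb >= 16) return "bge-large-en";
  if (ramGb >= 8) return "bge-base-en";
  if (ramGb >= 4) return "bge-small-en";
  return "bge-micro";
}

console.log(`Suggested model: ${defaultModelForHost()}`);
```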
To list available model IDs from the CLI:

npx vps-vector-node models
# or if installed globally:
vectornode models

## API Reference

`POST /v1/embeddings`

Generate embeddings for input text(s).
Request:
{
"model": "bge-small-en",
"input": "hello world"
}

For multiple inputs:
{
"model": "bge-small-en",
"input": ["text1", "text2", "text3"]
}

Query Parameters:
- `normalize` (boolean, default: `true`) - Whether to L2-normalize embeddings
Response (200 OK):
{
"object": "list",
"data": [
{
"index": 0,
"embedding": [0.0123, -0.0034, ...],
"object": "embedding"
}
],
"model": "bge-small-en",
"usage": {
"text_length": [11],
"tokens_estimated": [3]
},
"inference_time_ms": 12
}

Status Codes:
- `200` - Success
- `202` - Request queued (includes `queue_position` in response)
- `400` - Bad request (missing/invalid fields)
- `401` - Unauthorized (invalid/missing API key)
- `429` - Queue full
- `500` - Server error
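Putting the request and response shapes together, here is a client sketch in Node.js/TypeScript that sends a batch, reacts to the documented status codes, restores input order via the `index` field, and compares two returned vectors with cosine similarity. The error handling and the similarity helper are illustrative, not part of VectorNode:

```ts
// Batch client sketch for POST /v1/embeddings. Error handling, retry advice,
// and the cosine helper are illustrative choices.
const ENDPOINT = "http://localhost:3000/v1/embeddings"; // append "?normalize=false" to skip L2 normalization

async function embedBatch(inputs: string[], apiKey: string): Promise<number[][]> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "bge-small-en", input: inputs }),
  });
  if (res.status === 429) throw new Error("Queue full - back off and retry");
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);

  const body = await res.json();
  if (res.status === 202) {
    // Queued rather than processed immediately; the response carries queue_position.
    throw new Error(`Queued at position ${body.queue_position}; retry later`);
  }
  // Each item carries an index, so input order can be restored explicitly.
  return body.data
    .sort((a: { index: number }, b: { index: number }) => a.index - b.index)
    .map((d: { embedding: number[] }) => d.embedding);
}

// Cosine similarity between two embeddings (useful for semantic search).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

embedBatch(["a cat sat on the mat", "a kitten rested on the rug"], "sk-abc123...")
  .then(([a, b]) => console.log("similarity:", cosine(a, b).toFixed(3)))
  .catch((err) => console.error(err.message));
```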
List available models on the server.
Headers:
- `Authorization: Bearer sk-...` (required)
Response (200 OK):
{
"object": "list",
"data": [
{
"id": "bge-small-en",
"dimensions": 384,
"size_gb": 0.5,
"quantization": "Q4_K",
"source": "huggingface",
"loaded": true
}
]
}

Health check endpoint (no authentication required).
Response (200 OK):
{
"status": "ok",
"model_loaded": "bge-small-en",
"uptime_seconds": 1023
}

Server performance metrics.
Headers:
- `Authorization: Bearer sk-...` (required)
Response (200 OK):
{
"active_requests": 3,
"queue_depth": 2,
"memory_free_mb": 2432,
"cpu_load": 0.43,
"requests_per_sec": 12.5,
"avg_latency_ms": 45
}

## CLI Commands

Log in to Hugging Face:

npx vps-vector-node login hf --token <token>
# or if installed globally:
vectornode login hf --token <token>

The token is stored in `~/.vectornode/config.json`.

Download a model:
npx vps-vector-node models:download bge-small-en
# or if installed globally:
vectornode models:download bge-small-en

Start the embedding API server:
npx vps-vector-node serve --model <model-id> [options]

Common Options:
- `--port <port>` - Port to listen on (default: `3000`)
- `--host <host>` - Host to bind to (default: `0.0.0.0`)
- `--max-queue-size <size>` - Maximum queue size (default: `100`)
- `--batch-size <size>` - Batch size for processing (default: `10`)
- `--threads <threads>` - Number of threads (auto-detected)
- `--verbose` - Enable verbose logging
- `--trace` - Enable trace logging
Create a new API key:
npx vps-vector-node key create --name <name>

List all keys:
npx vps-vector-node key list

Revoke a key:
npx vps-vector-node key revoke --name <name>

## Configuration

Configuration is stored in `~/.vectornode/`:
~/.vectornode/
├── config.json        # HF token and settings
├── keys.json          # API keys
├── models/            # Downloaded models cache
│   ├── bge-small-en/
│   ├── bge-base-en/
│   └── ...
└── logs/              # Trace logs (if --trace enabled)
Minimum (for small models):
- RAM: 4GB
- CPU: 2 vCPU
- Storage: 5GB free
Recommended (for base models):
- RAM: 8GB
- CPU: 4 vCPU
- Storage: 10GB free
The server checks available memory before loading models and will refuse to start if insufficient.
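For illustration, the kind of pre-flight check this describes could look roughly like the sketch below. It is not VectorNode's actual implementation; the minimum-RAM figures are taken from the model table above:

```ts
import { freemem } from "node:os";

// Illustrative pre-flight memory check: refuse to load a model when free RAM
// is below the minimum listed in the model table. Not the real implementation.
const MIN_RAM_GB: Record<string, number> = {
  "bge-micro": 2,
  "bge-small-en": 3,
  "bge-base-en": 4,
  "bge-large-en": 8,
};

function assertEnoughMemory(modelId: string): void {
  const freeGb = freemem() / 1024 ** 3;
  const requiredGb = MIN_RAM_GB[modelId] ?? 4;
  if (freeGb < requiredGb) {
    throw new Error(
      `Refusing to load ${modelId}: ${freeGb.toFixed(1)}GB free, ${requiredGb}GB minimum required`
    );
  }
}

assertEnoughMemory("bge-small-en");
```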
## Deployment

Dockerfile:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY . .
EXPOSE 3000
CMD ["node", "cli/index.js", "serve", "--model", "bge-small-en"]Build and run:
docker build -t vectornode .
docker run -p 3000:3000 vectornode

Create `/etc/systemd/system/vectornode.service`:
[Unit]
Description=VectorNode Embedding API
After=network.target
[Service]
Type=simple
User=vectornode
WorkingDirectory=/opt/vectornode
ExecStart=/usr/bin/node /opt/vectornode/cli/index.js serve --model bge-small-en --port 3000
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

Enable and start:
sudo systemctl enable vectornode
sudo systemctl start vectornode

## Troubleshooting

Invalid or missing token:
- Check if token is set: `cat ~/.vectornode/config.json`
- Re-login with correct token: `npx vps-vector-node login hf --token <your-token>`
- Get a new token: https://huggingface.co/settings/tokens
Token permission issues:
- Ensure your token has "Read" permission
Network/connectivity issues:
- Check connection to huggingface.co: `ping huggingface.co`
- Try again (may be temporary)
Not enough disk space:
- Check available space: `df -h`
- Clean up space or use a smaller model
[ERROR] Model not found: gemma-embedding-small.
Run: vectornode models:download gemma-embedding-small
Solution: Download the model first:
npx vps-vector-node models:download gemma-embedding-small

Out of memory or insufficient RAM:

- Use a smaller model (e.g., `bge-small-en` instead of `bge-large-en`)
- Reduce the `--max-queue-size` flag
- Add swap space to the system
- Increase available RAM
Slow inference:

- Check CPU usage: `top` or `htop`
- Try reducing `--batch-size`
- Use a quantized model
- Add more CPU resources
Missing or invalid API key:
- List keys: `npx vps-vector-node key list`
- Create new key: `npx vps-vector-node key create --name dev`
- Verify header format: `Authorization: Bearer sk-...`
[ERROR] Unexpected token '"', ""{\\n \\\"mo\"... is not valid JSON
Solution in Postman:
- Use the Body tab → select `raw` → pick `JSON` from the dropdown
- Place the payload in the body (not in query params)
- Remove any extra quotes around the JSON
## Development

git clone <repo>
cd vectornode
npm install

Run the test suite:

npm test

Run locally with verbose or trace logging:

node cli/index.js serve --model bge-small-en --verbose --port 3000
node cli/index.js serve --model bge-small-en --trace --port 3000

Logs are saved to `~/.vectornode/logs/`.
## License

MIT
## Contributing

Contributions welcome! Please:
- Open an issue to discuss your idea
- Fork the repository
- Submit a PR with a clear description