EPSILON0-dev/WordCrafter

WordCrafter

A full-stack application for exploring word embeddings! WordCrafter provides a FastAPI backend with word embeddings, vocabulary completion, and semantic search capabilities, paired with a React frontend that visualizes embeddings and enables semantic vector arithmetic.

Screenshots

Neighbour Explorer Semantic Essence Enricher

Features

Backend (FastAPI)

  • Word Completion: Fast prefix-based word completion with configurable result counts
  • Vocabulary Search: Check word existence in the database
  • Semantic Search: Find semantically similar words using embeddings with nearest neighbor search
  • High Performance: In-memory vocabulary trie for fast prefix matching
  • Vector Database: PostgreSQL with pgvector for efficient embedding similarity queries
  • Async API: FastAPI with async/await for high concurrency

Frontend (React)

  • Interactive Word Explorer: Search words and visualize their embeddings as 30×10 dot grids
  • Semantic Calculator: Perform vector arithmetic (e.g., king - man + woman)
  • Semantic Essence Enricher: Extract semantic deltas from word pairs and apply them to target words
  • Autocomplete: Fast prefix-based word completion with 250ms debounce
  • In-browser Vector Math: Add, subtract, normalize, and scale embeddings without server round-trips
  • Responsive Design: Built with Tailwind CSS and TypeScript
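The arithmetic behind the Semantic Calculator and the Semantic Essence Enricher can be sketched as follows. This is a minimal illustration in Python, not the project's `embeddings.ts`: the function names, the `TOY` vocabulary, and its 3-dimensional vectors are all assumptions chosen so the result is easy to check by hand.

```python
import math

Vector = list[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def subtract(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def scale(a: Vector, k: float) -> Vector:
    return [x * k for x in a]


def normalize(a: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in a))
    return list(a) if norm == 0 else [x / norm for x in a]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def apply_delta(target: Vector, source: Vector, dest: Vector, strength: float = 1.0) -> Vector:
    """Enricher-style step: add the (dest - source) delta to a target word."""
    delta = subtract(dest, source)
    return normalize(add(target, scale(delta, strength)))


def nearest(query: Vector, vocab: dict[str, Vector]) -> str:
    """Return the vocabulary word whose vector has the highest cosine similarity."""
    return max(vocab, key=lambda word: cosine(query, vocab[word]))


# Made-up toy embeddings with axes [royal, male, female].
TOY = {
    "king": [1.0, 1.0, 0.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [1.0, 0.0, 1.0],
}

# Calculator-style expression: king - man + woman
result = normalize(add(subtract(TOY["king"], TOY["man"]), TOY["woman"]))
print(nearest(result, TOY))  # prints "queen"
```

With real 300-dimensional embeddings the result rarely lands exactly on a vocabulary word, which is why the final step is a nearest-neighbor lookup rather than an equality check.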

Prerequisites

For Backend & Database

  • Python 3.10+
  • PostgreSQL 15+ with pgvector extension (provided via Docker)
  • Docker & Docker Compose

For Frontend

  • Node.js 18+
  • pnpm

Quick Start

Option 1: Full Stack with Docker (Recommended)

# Start all services: database, backend, and frontend
docker compose -f docker_compose.yml up --build

The app will be available at http://localhost:3939

Option 2: Development Setup

1. Start the database:

docker compose -f docker_compose_dev.yml up -d

2. Set up Python environment and install dependencies:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

3. Start the backend API:

fastapi dev api/main.py

API available at http://localhost:8000

4. Start the frontend (in a new terminal):

cd ui
pnpm install
pnpm dev

UI available at http://localhost:5173

Load Word Embeddings (First Time Only)

If you have your own embedding data, prepare and load it:

source .venv/bin/activate
cd scripts

# 1. Extract vocabulary from your data source
python extract_vocab.py --input <your_data> --output vocab.txt

# 2. Filter embeddings by vocabulary
python filter_embeddings.py --input <embeddings> --vocab vocab.txt --output filtered.txt

# 3. Normalize embeddings
python normalize_embeddings.py --input filtered.txt --output normalized.txt

# 4. Load into database
python prepare_db.py --embeddings normalized.txt --host localhost --port 5432

Note: This process can take 30+ minutes depending on vocabulary size.
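As a rough illustration of what a normalization step such as step 3 typically does, the sketch below scales one embedding line to unit L2 length. It assumes a plain-text format of `word v1 v2 ...` per line and L2 normalization; neither is confirmed by this README, and `normalize_line` is a hypothetical helper, not a function from `normalize_embeddings.py`.

```python
import math


def normalize_line(line: str) -> str:
    """L2-normalize one 'word v1 v2 ...' line (assumed file format)."""
    word, *values = line.split()
    vec = [float(v) for v in values]
    norm = math.sqrt(sum(v * v for v in vec))
    if norm > 0:  # leave all-zero vectors untouched to avoid dividing by zero
        vec = [v / norm for v in vec]
    return word + " " + " ".join(f"{v:.6f}" for v in vec)


print(normalize_line("hello 3.0 4.0"))  # prints "hello 0.600000 0.800000"
```

Unit-length vectors make cosine similarity and inner product rank neighbors identically, which is a common reason to normalize before loading into a vector database.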

Backend API Endpoints

Development: http://localhost:8000
Production: http://localhost:3939

Word Completion

GET /api/vocabulary/completion/{word}?count=20

Returns up to count words that start with the given prefix.

Example:

curl "http://localhost:8000/api/vocabulary/completion/hel?count=5"
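The same request can be made from Python using only the standard library. This is a sketch: `completion_url` and `fetch_completions` are hypothetical helper names, and because the response schema is not documented here, the parsed JSON is returned as-is.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # development address from this README


def completion_url(prefix: str, count: int = 20, base_url: str = BASE_URL) -> str:
    """Build the completion URL, percent-encoding the prefix as a path segment."""
    path = f"/api/vocabulary/completion/{quote(prefix, safe='')}"
    return f"{base_url}{path}?{urlencode({'count': count})}"


def fetch_completions(prefix: str, count: int = 20):
    """Call the endpoint and return the parsed JSON (shape not documented here)."""
    with urlopen(completion_url(prefix, count)) as response:
        return json.load(response)


print(completion_url("hel", 5))
# prints http://localhost:8000/api/vocabulary/completion/hel?count=5
```

Calling `fetch_completions("hel", 5)` requires the backend to be running.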

Word Existence Check

GET /api/vocabulary/check/{word}

Check if a word exists in the vocabulary.

Example:

curl http://localhost:8000/api/vocabulary/check/python

Get Word Embedding

GET /api/embeddings/{word}

Retrieve the 300-dimensional embedding vector for a word.

Find Nearest Words

GET /api/vocabulary/nearest/{word}?count=5

Find semantically similar words based on embedding vectors.

Nearest by Custom Embedding

POST /api/embeddings/nearest

Find nearest words for a custom embedding vector.

Request Body:

{
  "embedding": [0.1, 0.2, ..., 0.5],
  "count": 5
}
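A request like this can be assembled in Python as sketched below. The helper names are hypothetical, and the client-side length check assumes the endpoint expects the same 300 dimensions that `GET /api/embeddings/{word}` returns; the README does not state how the server validates the vector.

```python
import json
from urllib.request import Request

EMBEDDING_DIM = 300  # dimensionality stated for GET /api/embeddings/{word}


def nearest_request_body(embedding, count: int = 5) -> str:
    """Serialize the JSON body for POST /api/embeddings/nearest."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    if count < 1:
        raise ValueError("count must be at least 1")
    return json.dumps({"embedding": [float(x) for x in embedding], "count": count})


def nearest_request(embedding, count: int = 5, base_url: str = "http://localhost:8000") -> Request:
    """Build (but do not send) the POST request."""
    return Request(
        f"{base_url}/api/embeddings/nearest",
        data=nearest_request_body(embedding, count).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; the backend must be running for that call to succeed.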

Frontend Pages

| Route | Feature | Description |
|---|---|---|
| / | Home / Neighbor Explorer | Type a word and explore its nearest neighbors |
| /about | About | Learn about embeddings and how the app works |
| /calculate | Semantic Calculator | Perform vector arithmetic (e.g., king - man + woman) |
| /enrich | Semantic Essence Enricher | Extract semantic deltas from word pairs and apply to targets |

Backend Architecture

Database Layer (api/db.py)

  • Manages PostgreSQL connections
  • Executes queries for embeddings and vocabulary
  • Uses psycopg3 for async support
  • Supports vector similarity search via pgvector's HNSW index
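A nearest-neighbor query against pgvector might look like the sketch below. The table name `words`, its columns `word` and `embedding`, the index name, and the choice of cosine distance (`<=>`) are assumptions for illustration; the actual schema in `api/db.py` is not shown in this README.

```python
# Assumed schema: table `words` with columns `word` (text) and `embedding` (vector(300)).
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS words_embedding_hnsw "
    "ON words USING hnsw (embedding vector_cosine_ops)"
)

# `<=>` is pgvector's cosine-distance operator; `%s` are psycopg placeholders.
NEAREST_SQL = """
SELECT word
FROM words
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(vec) -> str:
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def nearest_query(vec, count: int = 5):
    """Return (sql, params) ready to pass to a psycopg cursor's execute()."""
    return NEAREST_SQL, (to_vector_literal(vec), count)


sql, params = nearest_query([0.1, 0.2, 1], 3)
print(params)  # prints ('[0.1,0.2,1.0]', 3)
```

Ordering by the distance operator with a `LIMIT` is the query shape that lets PostgreSQL use the HNSW index instead of scanning every row.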

Vocabulary Layer (api/vocab.py)

  • In-memory trie-like structure indexed by first two characters
  • Fast prefix matching for word completion (O(1) average lookup of the two-character bucket)
  • Efficient word existence checking
  • Handles edge cases and invalid inputs
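One way to build a structure indexed by the first two characters is sketched below. `PrefixIndex` is a hypothetical class written for this illustration, not the code in `api/vocab.py`; it assumes words are kept sorted inside each bucket and that one-character prefixes are answered by merging the matching buckets.

```python
from collections import defaultdict


class PrefixIndex:
    """Vocabulary bucketed by the first two characters of each word."""

    def __init__(self, words):
        self._words = set()
        self._buckets = defaultdict(list)
        for word in sorted(set(words)):
            if not word:  # ignore empty strings
                continue
            self._words.add(word)
            self._buckets[word[:2]].append(word)

    def contains(self, word: str) -> bool:
        return word in self._words

    def complete(self, prefix: str, count: int = 20) -> list[str]:
        if not prefix or count < 1:
            return []
        if len(prefix) >= 2:
            # One dictionary lookup selects the only bucket that can match.
            candidates = self._buckets.get(prefix[:2], [])
        else:
            # Single-character prefix: merge every bucket whose key starts with it.
            candidates = [
                word
                for key in sorted(self._buckets)
                if key.startswith(prefix)
                for word in self._buckets[key]
            ]
        return [word for word in candidates if word.startswith(prefix)][:count]


index = PrefixIndex(["hello", "help", "helm", "held", "hero", "he", "world"])
print(index.complete("hel", 2))  # prints ['held', 'hello']
```

In this sketch the bucket lookup is constant time on average, while filtering is linear in the size of the selected bucket, so total cost depends on how evenly words spread across two-character prefixes.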

API Layer (api/main.py)

  • FastAPI application with lifespan context manager
  • Request/response logging middleware
  • Pydantic models for request validation
  • JSON response formatting

Frontend Architecture

Components

  • EmbeddingViz: Renders embeddings as 30×10 dot grids with min-max normalization to [0, 1]
  • WordInput: Text input with autocomplete dropdown (250ms debounce)
  • Layout: Main shell with navigation and page outlet
  • Pages: Four independent page components managing their own state
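The min-max step used for the dot grid can be sketched as follows, in Python rather than the component's own TypeScript. The row-major layout of 10 rows by 30 columns, the 0.5 fallback for a constant vector, and the name `to_dot_grid` are assumptions; the README only states the 30×10 grid size and the [0, 1] range.

```python
def to_dot_grid(embedding, rows: int = 10, cols: int = 30):
    """Min-max scale values to [0, 1] and split them into rows of dots."""
    if len(embedding) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(embedding)}")
    lo, hi = min(embedding), max(embedding)
    span = hi - lo
    # A constant vector has no range; map every value to mid-intensity instead.
    scaled = [(v - lo) / span if span else 0.5 for v in embedding]
    return [scaled[r * cols:(r + 1) * cols] for r in range(rows)]


grid = to_dot_grid([float(i) for i in range(300)])
print(len(grid), len(grid[0]), grid[0][0], grid[-1][-1])  # prints 10 30 0.0 1.0
```

Because the scaling is relative to each vector's own minimum and maximum, dot intensities are comparable within one word's grid but not across words.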

Libraries

  • embeddings.ts: Pure vector math functions (add, subtract, scale, normalize, weighted combinations)
  • api.ts: Typed fetch wrappers for all backend endpoints
  • useDebounce: Generic debounce hook with configurable delay
  • useWordCompletion: Hook for fetching and caching word completion suggestions

Styling

  • Tailwind CSS v3: Utility-first CSS framework
  • shadcn/ui: Accessible, customizable components built on Radix UI primitives
  • Modern glassmorphic design: Dark theme with lime-green accents and smooth transitions

Development

Backend

source .venv/bin/activate

# Format code (Black, 140 char line length)
black api/ tests/ scripts/

# Run tests
pytest tests/

# Run tests with coverage
pytest tests/ --cov=api

Frontend

cd ui

# Lint code
pnpm lint

# Build for production
pnpm build

# Preview production build
pnpm preview

Environment Variables

Backend (api/.env)

DB_HOST=localhost
DB_PORT=5432
DB_USER=wc_user
DB_PASSWORD=wc_password
DB_NAME=wc_db

Frontend (ui/.env)

VITE_API_URL=http://localhost:8000
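For reference, the backend variables map naturally onto a libpq-style connection string, which psycopg accepts. The sketch below is an assumption about how they could be consumed, not the project's actual configuration code; `build_conninfo` and `DEFAULTS` are hypothetical names.

```python
import os

# Fallbacks mirror the sample values listed for api/.env.
DEFAULTS = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_USER": "wc_user",
    "DB_PASSWORD": "wc_password",
    "DB_NAME": "wc_db",
}


def build_conninfo(env=None) -> str:
    """Build a libpq keyword/value connection string from environment variables.

    Values containing spaces or quotes would need libpq quoting, omitted here.
    """
    env = os.environ if env is None else env

    def get(key: str) -> str:
        return env.get(key, DEFAULTS[key])

    return (
        f"host={get('DB_HOST')} port={get('DB_PORT')} user={get('DB_USER')} "
        f"password={get('DB_PASSWORD')} dbname={get('DB_NAME')}"
    )


print(build_conninfo({"DB_HOST": "db"}))
# prints host=db port=5432 user=wc_user password=wc_password dbname=wc_db
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test.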

License

Licensed under the MIT License. See LICENSE for details.

About

Semantic arithmetic for word embeddings - extract meaning from word pairs and apply it to targets.
