WordCrafter

A full-stack application for exploring word embeddings. WordCrafter pairs a FastAPI backend (embedding lookup, prefix-based vocabulary completion, and semantic search) with a React frontend that visualizes embeddings and supports semantic vector arithmetic.

Screenshots

Neighbor Explorer	Semantic Essence Enricher

Features

Backend (FastAPI)

Word Completion: Fast prefix-based word completion with configurable result counts
Vocabulary Search: Check word existence in the database
Semantic Search: Find semantically similar words using embeddings with nearest neighbor search
High Performance: In-memory vocabulary trie for fast prefix matching
Vector Database: PostgreSQL with pgvector for efficient embedding similarity queries
Async API: FastAPI with async/await for high concurrency

Frontend (React)

Interactive Word Explorer: Search words and visualize their embeddings as 30×10 dot grids
Semantic Calculator: Perform vector arithmetic (e.g., king - man + woman)
Semantic Essence Enricher: Extract semantic deltas from word pairs and apply them to target words
Autocomplete: Fast prefix-based word completion with 250ms debounce
In-browser Vector Math: Add, subtract, normalize, and scale embeddings without server round-trips
Responsive Design: Built with Tailwind CSS and TypeScript
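
The arithmetic behind the Semantic Calculator can be sketched in a few lines. This is an illustration in Python rather than the frontend's TypeScript; the function names and the toy 3-dimensional vectors are assumptions made for the example, not identifiers or data from the app.

```python
import math

Vector = list[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def subtract(a: Vector, b: Vector) -> Vector:
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return list(v)
    return [x / norm for x in v]


# Toy vectors (hypothetical values, real embeddings have 300 dimensions).
king = [0.8, 0.6, 0.1]
man = [0.7, 0.1, 0.1]
woman = [0.7, 0.1, 0.9]

# king - man + woman, re-normalized so it can be compared by cosine similarity.
result = normalize(add(subtract(king, man), woman))
```

The resulting vector would then be sent to the nearest-neighbor endpoint to find the closest vocabulary words.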

Prerequisites

For Backend & Database

Python 3.10+
PostgreSQL 15+ with pgvector extension (provided via Docker)
Docker & Docker Compose

For Frontend

Node.js 18+
pnpm

Quick Start

Option 1: Full Stack with Docker (Recommended)

# Start all services: database, backend, and frontend
docker compose -f docker_compose.yml up --build

The app will be available at http://localhost:3939

Option 2: Development Setup

1. Start the database:

docker compose -f docker_compose_dev.yml up -d

2. Set up Python environment and install dependencies:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

3. Start the backend API:

fastapi dev api/main.py

API available at http://localhost:8000

4. Start the frontend (in a new terminal):

cd ui
pnpm install
pnpm dev

UI available at http://localhost:5173

Load Word Embeddings (First Time Only)

If you have custom embeddings data, prepare and load it:

source .venv/bin/activate
cd scripts

# 1. Extract vocabulary from your data source
python extract_vocab.py --input <your_data> --output vocab.txt

# 2. Filter embeddings by vocabulary
python filter_embeddings.py --input <embeddings> --vocab vocab.txt --output filtered.txt

# 3. Normalize embeddings
python normalize_embeddings.py --input filtered.txt --output normalized.txt

# 4. Load into database
python prepare_db.py --embeddings normalized.txt --host localhost --port 5432

Note: This process can take 30+ minutes depending on vocabulary size.
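
For orientation, the normalization in step 3 amounts to scaling each vector to unit length, which makes cosine similarity equal to a plain dot product. The sketch below assumes a whitespace-separated `word v1 v2 ... vn` text format and a hypothetical helper name; the actual `normalize_embeddings.py` may read and write its data differently.

```python
import math


def normalize_line(line: str) -> str:
    """L2-normalize one 'word v1 v2 ... vn' line (assumed text format)."""
    word, *values = line.split()
    vector = [float(v) for v in values]
    norm = math.sqrt(sum(x * x for x in vector))
    if norm > 0:
        # Divide every component by the vector's Euclidean length.
        vector = [x / norm for x in vector]
    return word + " " + " ".join(f"{x:.6f}" for x in vector)


print(normalize_line("cat 3 4"))  # prints "cat 0.600000 0.800000"
```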

Backend API Endpoints

Development: http://localhost:8000
Production: http://localhost:3939

Word Completion

GET /api/vocabulary/completion/{word}?count=20

Returns up to count words that start with the given prefix.

Example:

curl "http://localhost:8000/api/vocabulary/completion/hel?count=5"
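
When calling this endpoint from a script, the prefix should be percent-encoded before it goes into the path. A minimal sketch using only the standard library; `completion_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import quote, urlencode


def completion_url(base_url: str, prefix: str, count: int = 20) -> str:
    """Build the completion URL, percent-encoding the prefix for the path."""
    path = "/api/vocabulary/completion/" + quote(prefix, safe="")
    return base_url.rstrip("/") + path + "?" + urlencode({"count": count})


url = completion_url("http://localhost:8000", "hel", count=5)
# Fetching it needs a running backend, for example:
#   urllib.request.urlopen(url)
```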

Word Existence Check

GET /api/vocabulary/check/{word}

Check if a word exists in the vocabulary.

Example:

curl http://localhost:8000/api/vocabulary/check/python

Get Word Embedding

GET /api/embeddings/{word}

Retrieve the 300-dimensional embedding vector for a word.

Find Nearest Words

GET /api/vocabulary/nearest/{word}?count=5

Find semantically similar words based on embedding vectors.
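
Conceptually, "nearest" means ranking words by how closely their vectors point in the same direction. The brute-force sketch below illustrates that idea with cosine similarity over a toy vocabulary; it is not how the backend does it (the backend delegates the search to pgvector), and the words and 2-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], vocab: dict[str, list[float]], count: int = 5) -> list[str]:
    """Return up to `count` words, most similar to `query` first."""
    ranked = sorted(vocab, key=lambda w: cosine_similarity(query, vocab[w]), reverse=True)
    return ranked[:count]


# Toy vocabulary (hypothetical values).
toy_vocab = {
    "queen": [0.9, 0.8],
    "king": [0.9, 0.7],
    "banana": [-0.5, 0.1],
}
closest = nearest([1.0, 0.8], toy_vocab, count=2)
```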

Nearest by Custom Embedding

POST /api/embeddings/nearest

Find nearest words for a custom embedding vector.

Request Body:

{
  "embedding": [0.1, 0.2, ..., 0.5],
  "count": 5
}
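
A request with this body could be assembled in Python as sketched below. `build_nearest_request` is a hypothetical helper, and the 3-value embedding is a placeholder: a real request would carry a full-length vector matching the stored embeddings.

```python
import json
from urllib.request import Request


def build_nearest_request(base_url: str, embedding: list[float], count: int = 5) -> Request:
    """Prepare (but do not send) a POST to /api/embeddings/nearest."""
    body = json.dumps({"embedding": embedding, "count": count}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/api/embeddings/nearest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_nearest_request("http://localhost:8000", [0.1, 0.2, 0.5], count=5)
# Sending it needs a running backend, for example:
#   urllib.request.urlopen(request)
```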

Frontend Pages

| Route | Feature | Description |
| --- | --- | --- |
| `/` | Home / Neighbor Explorer | Type a word and explore its nearest neighbors |
| `/about` | About | Learn about embeddings and how the app works |
| `/calculate` | Semantic Calculator | Perform vector arithmetic (e.g., `king - man + woman`) |
| `/enrich` | Semantic Essence Enricher | Extract semantic deltas from word pairs and apply to targets |

Backend Architecture

Database Layer (`api/db.py`)

Manages PostgreSQL connections
Executes queries for embeddings and vocabulary
Uses psycopg3 for async support
Supports vector similarity search via pgvector's HNSW index
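
As an illustration of what a pgvector similarity query can look like, the sketch below builds a parameterized statement for psycopg. The table name `embeddings`, the columns `word` and `embedding`, and the choice of the cosine-distance operator `<=>` are assumptions; the project's actual schema and distance metric are not documented here.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical table and column names. `<=>` is pgvector's cosine distance
# operator; an HNSW index built with vector_cosine_ops can serve this ORDER BY.
NEAREST_SQL = (
    "SELECT word FROM embeddings "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def nearest_query(embedding: list[float], count: int = 5) -> tuple[str, tuple[str, int]]:
    """Return the SQL text and its parameters, ready for cursor.execute()."""
    return NEAREST_SQL, (to_vector_literal(embedding), count)


sql, params = nearest_query([0.1, 0.2], count=3)
```

Passing the vector as a bound parameter rather than formatting it into the SQL string keeps the query safe from injection and lets the server reuse the plan.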

Vocabulary Layer (`api/vocab.py`)

In-memory trie-like structure indexed by first two characters
Fast prefix matching for word completion (constant-time bucket lookup by the first two characters)
Efficient word existence checking
Handles edge cases and invalid inputs
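
One way such a structure could work is sketched below: words are grouped into sorted buckets keyed by their first two characters, and a binary search finds where matches begin inside the bucket. The class name, method names, and handling of one-character prefixes are assumptions for illustration, not the code in `api/vocab.py`.

```python
from bisect import bisect_left
from collections import defaultdict


class PrefixIndex:
    """Words grouped into sorted buckets keyed by their first two characters."""

    def __init__(self, words):
        self._buckets = defaultdict(list)
        for word in sorted(set(words)):
            if word:  # skip empty strings
                self._buckets[word[:2]].append(word)

    def complete(self, prefix: str, count: int = 20) -> list[str]:
        """Return up to `count` words starting with `prefix`, in sorted order."""
        if not prefix or count <= 0:
            return []
        if len(prefix) < 2:
            # One-character prefix: gather from every bucket whose key matches.
            keys = sorted(k for k in self._buckets if k.startswith(prefix))
            return [w for k in keys for w in self._buckets[k]][:count]
        bucket = self._buckets.get(prefix[:2], [])
        matches = []
        # Binary search for the first candidate, then walk while it still matches.
        for word in bucket[bisect_left(bucket, prefix):]:
            if not word.startswith(prefix) or len(matches) == count:
                break
            matches.append(word)
        return matches

    def contains(self, word: str) -> bool:
        """Exact-match lookup inside the word's bucket."""
        bucket = self._buckets.get(word[:2], [])
        i = bisect_left(bucket, word)
        return i < len(bucket) and bucket[i] == word


index = PrefixIndex(["hello", "help", "helmet", "hero", "world"])
suggestions = index.complete("hel", count=2)
```

Finding the bucket is a dictionary lookup; the cost of a completion then depends on the bucket size and the number of results, not on the size of the whole vocabulary.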

API Layer (`api/main.py`)

FastAPI application with lifespan context manager
Request/response logging middleware
Pydantic models for request validation
JSON response formatting

Frontend Architecture

Components

EmbeddingViz: Renders embeddings as 30×10 dot grids with min-max normalization to [0, 1]
WordInput: Text input with autocomplete dropdown (250ms debounce)
Layout: Main shell with navigation and page outlet
Pages: Four independent page components managing their own state
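
The min-max step used by EmbeddingViz can be illustrated as follows, here in Python rather than the component's TypeScript. The layout of 10 rows by 30 columns and the fallback of 0.0 for a constant vector are assumptions; the component may orient the grid or handle that edge case differently.

```python
def to_dot_grid(embedding: list[float], rows: int = 10, cols: int = 30) -> list[list[float]]:
    """Min-max scale values to [0, 1] and split them into `rows` rows of `cols`."""
    if len(embedding) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(embedding)}")
    lo, hi = min(embedding), max(embedding)
    span = hi - lo
    # A constant vector has no range to scale; 0.0 is an arbitrary fallback.
    scaled = [(x - lo) / span if span else 0.0 for x in embedding]
    return [scaled[r * cols:(r + 1) * cols] for r in range(rows)]


# Placeholder input: 300 increasing values instead of a real embedding.
grid = to_dot_grid([float(i) for i in range(300)])
```

Each value in the grid can then drive a dot's brightness or size, so the smallest component maps to 0 and the largest to 1.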

Libraries

embeddings.ts: Pure vector math functions (add, subtract, scale, normalize, weighted combinations)
api.ts: Typed fetch wrappers for all backend endpoints
useDebounce: Generic debounce hook with configurable delay
useWordCompletion: Hook for fetching and caching word completion suggestions
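
To show how these helpers could combine in the Semantic Essence Enricher, the sketch below averages the differences of several word pairs into one "delta" and adds a scaled copy of it to a target vector. It is written in Python for illustration; the function names, the averaging step, and the `strength` parameter are assumptions, not the contents of `embeddings.ts`.

```python
Vector = list[float]


def scale(v: Vector, factor: float) -> Vector:
    """Multiply every component of v by factor."""
    return [x * factor for x in v]


def mean_delta(pairs: list[tuple[Vector, Vector]]) -> Vector:
    """Average of (second - first) over all pairs: the shared 'semantic delta'."""
    if not pairs:
        raise ValueError("at least one pair is required")
    dims = len(pairs[0][0])
    total = [0.0] * dims
    for first, second in pairs:
        for i in range(dims):
            total[i] += second[i] - first[i]
    return scale(total, 1.0 / len(pairs))


def enrich(target: Vector, delta: Vector, strength: float = 1.0) -> Vector:
    """Shift target along delta; strength controls how far."""
    return [t + strength * d for t, d in zip(target, delta)]


# Toy 2-dimensional pairs (hypothetical values).
pairs = [([1.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [0.0, 3.0])]
delta = mean_delta(pairs)
enriched = enrich([1.0, 1.0], delta, strength=0.5)
```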

Styling

Tailwind CSS v3: Utility-first CSS framework
shadcn/ui: Accessible components built on Radix UI primitives and styled with Tailwind CSS
Modern glassmorphic design: Dark theme with lime-green accents and smooth transitions

Development

Backend

source .venv/bin/activate

# Format code (Black, 140 char line length)
black api/ tests/ scripts/

# Run tests
pytest tests/

# Run tests with coverage
pytest tests/ --cov=api

Frontend

cd ui

# Lint code
pnpm lint

# Build for production
pnpm build

# Preview production build
pnpm preview

Environment Variables

Backend (`api/.env`)

DB_HOST=localhost
DB_PORT=5432
DB_USER=wc_user
DB_PASSWORD=wc_password
DB_NAME=wc_db
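
These variables could be turned into a PostgreSQL connection string roughly as sketched below. `database_dsn` is a hypothetical helper, and using the sample values above as fallbacks is an assumption; the backend may load its configuration differently.

```python
import os
from typing import Mapping, Optional


def database_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a libpq keyword/value connection string from DB_* variables."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    user = env.get("DB_USER", "wc_user")
    password = env.get("DB_PASSWORD", "wc_password")
    name = env.get("DB_NAME", "wc_db")
    return f"host={host} port={port} user={user} password={password} dbname={name}"


# Passing an explicit mapping keeps the example independent of the real environment.
dsn = database_dsn({"DB_HOST": "db", "DB_PORT": "5433"})
```

The sample password is only suitable for local development; a deployed instance should supply its own value through the environment.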

Frontend (`ui/.env`)

VITE_API_URL=http://localhost:8000

License

Licensed under the MIT License. See LICENSE for details.
