product-knowledge-vector-db


Semantic Product Knowledge Base

Tired of endlessly scrolling through online stores, trying to find the right product? Traditional keyword search and filters often miss the mark. This project brings semantic search to your product catalog, making it faster and more intuitive for users to find what they want.

How does it work?

All your products and their descriptions are converted into vectors (embeddings) using an embeddings API (like OpenAI).
When a user searches, their query is also embedded into a vector.
The system compares the query vector to all product vectors in your database, finding the most relevant matches—even if the keywords don’t exactly match.

This approach delivers more accurate, natural, and satisfying search results for your customers.
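To make the comparison step concrete, here is a minimal, dependency-free sketch of ranking products by cosine similarity. It is an illustration only: the three-dimensional vectors are made up, the helper names `cosine_similarity` and `top_k` are hypothetical, and in this project the ranking is done inside Postgres by pgvector rather than in Python. Whether the project uses cosine distance specifically is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, products, k=2):
    """Return the k product names most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in products.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy "embeddings", invented for illustration (real ones have many more dimensions).
PRODUCTS = {
    "waterproof earbuds": [0.9, 0.1, 0.0],
    "trail running shoes": [0.0, 0.2, 0.9],
    "bluetooth speaker": [0.7, 0.6, 0.1],
}

# Pretend this is the embedding of the query "wireless headphones".
query = [1.0, 0.2, 0.0]
print(top_k(query, PRODUCTS, k=2))
```

Note that the query shares no keyword with either of the two products it ranks highest; the match comes purely from the vectors pointing in similar directions.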

This project provides a complete open-source example stack for building a semantic product search and Q&A API over your own data:

Uses Postgres + pgvector for fast vector similarity search
Uses OpenAI for embeddings and answers
Exposes a simple FastAPI backend
All product data, embeddings, and retrieval logic are managed in your own database (no vendor lock-in)

Ideal for real-world RAG (retrieval-augmented generation), explainable product recommendations, and natural language search in commerce and catalog applications.

0 · What you get

Slice	Highlights
Vector DB	Postgres 15 image with `pgvector`, schema migrations tracked in `sql/`
Backend API	FastAPI (`app/**`) exposing `/embed`, `/similar`, `/answer`, `/langchain-answer`, protected by an API-Key header
Async helpers	Thin, reusable modules for OpenAI calls (`app/openai_utils.py`) and Postgres access (`app/db.py`)
Data & ETL	CSV → JSONL converter, batch loader that embeds + writes to Postgres
CLI tooling	Simple `curl` commands let you try semantic search from the terminal
Docker stack	`docker-compose.yml` spins up the DB and the API container
Tests / CI-ready	Pure-unit + integration tests (>90 % coverage for `app/`), examples for GitHub Actions

1 · First-time setup

# ① Install Python deps (isolated virtual-env handled by Poetry)
poetry install

# ② Start the full stack (db + api) in Docker
docker compose -f docker/docker-compose.yml up -d        # or:  docker-compose …

Heads-up: The first container start will run all SQL migrations automatically.

Optionally seed / update the catalog

# Convert vendor CSV → JSONL (one-off)
poetry run python scripts/csv_to_jsonl.py data/products_full.csv

# Embed every product & bulk-insert (async, OpenAI key required)
poetry run python scripts/load_full.py
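As a rough picture of what a CSV → JSONL conversion involves, the sketch below turns CSV text with a header row into one JSON object per line. It is not the code in scripts/csv_to_jsonl.py: the function name `csv_to_jsonl` and the column names in the sample are hypothetical.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Convert CSV text with a header row into JSON Lines (one object per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in reader)


# Hypothetical sample; a real vendor CSV will have its own columns.
sample = (
    "id,name,description\n"
    "1,Earbuds,Wireless and waterproof\n"
    "2,Trail shoe,Grippy outsole\n"
)
print(csv_to_jsonl(sample))
```

Every value stays a string here, because `csv.DictReader` does not infer types; a real converter would probably cast numeric fields such as prices.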

Try a quick CLI search

curl -G http://localhost:8000/similar \
     -H "X-API-Key:$API_KEY" \
     --data-urlencode "q=wireless waterproof earbuds" \
     --data-urlencode "k=8"

2 · Environment Variables

Copy .env_example to .env (both scripts and containers read it) and review/customize the values:

cp .env_example .env

# Postgres
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=postgres
PGDATABASE=products
# optional—if omitted, the app assembles one from the PG* vars
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/products

# Auth
API_KEY=dev123

# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_EMBED_MODEL=text-embedding-3-small        # 1536-dim output
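The fallback mentioned for DATABASE_URL could look roughly like the sketch below. It is not the code in app/db.py: the function name `build_database_url` and the default values used when a PG* variable is missing are assumptions.

```python
import os
from urllib.parse import quote


def build_database_url(env=None):
    """Return DATABASE_URL if set, otherwise assemble one from the PG* variables."""
    env = os.environ if env is None else env
    explicit = env.get("DATABASE_URL")
    if explicit:
        return explicit
    # Percent-encode credentials so characters like "@" or ":" cannot break the URL.
    user = quote(env.get("PGUSER", "postgres"), safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")  # assumed default
    port = env.get("PGPORT", "5432")  # assumed default
    database = env.get("PGDATABASE", "postgres")  # assumed default
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


example_env = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGUSER": "postgres",
    "PGPASSWORD": "postgres",
    "PGDATABASE": "products",
}
print(build_database_url(example_env))
```

With the values from .env_example shown above, this yields the same string as the explicit DATABASE_URL line.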

3 · Daily Workflow Cheatsheet

All docker compose commands below assume you’re in the project root. If you’re inside the docker/ folder instead, drop the -f docker/docker-compose.yml flag; Compose finds docker-compose.yml in the current directory.

Task	Command
Start the full stack	`docker compose -f docker/docker-compose.yml up -d`
Stop the full stack	`docker compose -f docker/docker-compose.yml down`
Reset and wipe all data	`docker compose -f docker/docker-compose.yml down -v`
Re-embed & reload catalog	`poetry run python scripts/load_full.py`
Open SQL shell	`docker exec -it vectordb psql -U postgres -d products`
FastAPI dev-server (no Docker)	`poetry run uvicorn app:create_app --factory --reload`

4 · API Quickstart

Assumes the containers are up and API_KEY is exported in your shell. All endpoints return JSON; for readability, every curl output is piped through `jq .` (see .env for credentials and settings).

# 0️⃣  Export your API key from .env (run this first)
export API_KEY=dev123

# Health
curl -H "X-API-Key:$API_KEY" http://localhost:8000/health | jq .

# 1️⃣  Single embedding
curl -X POST http://localhost:8000/embed \
     -H "X-API-Key:$API_KEY" -H "Content-Type:application/json" \
     -d '{"text":"wireless waterproof earbuds"}' | jq .

# 2️⃣  Top-k similarity search
curl -G http://localhost:8000/similar \
     -H "X-API-Key:$API_KEY" \
     --data-urlencode "q=wireless waterproof earbuds" \
     --data-urlencode "k=5" | jq .

# 3️⃣  Question-answering (RAG style)
curl -X POST http://localhost:8000/answer \
     -H "X-API-Key:$API_KEY" -H "Content-Type:application/json" \
     -d '{"question":"What running shoes do you stock?","k":3}' | jq .

# 4️⃣  LangChain-powered Q&A
curl -X POST http://localhost:8000/langchain-answer \
     -H "X-API-Key:$API_KEY" -H "Content-Type:application/json" \
     -d '{"question":"Do you carry trail shoes?","k":3}' | jq .

Note: If you haven’t installed jq, you can do so with sudo apt-get install jq (Debian/Ubuntu) or brew install jq (macOS).
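The same `/similar` call can be made from Python with only the standard library. The sketch below mirrors the curl example (endpoint, `q` and `k` parameters, `X-API-Key` header); the helper names `build_similar_request` and `fetch_similar` are hypothetical, and the shape of the JSON response is not assumed.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def build_similar_request(query, k, api_key, base_url=BASE_URL):
    """Build a GET request for /similar with URL-encoded q and k parameters."""
    params = urllib.parse.urlencode({"q": query, "k": k})
    return urllib.request.Request(
        f"{base_url}/similar?{params}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def fetch_similar(query, k, api_key):
    """Send the request and decode the JSON body (requires the stack to be running)."""
    request = build_similar_request(query, k, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Building the request needs no network access, so it is safe to inspect offline.
req = build_similar_request("wireless waterproof earbuds", 5, "dev123")
print(req.full_url)
```

Once the containers are up, `fetch_similar("wireless waterproof earbuds", 5, "dev123")` should return the same JSON that the curl command prints.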

Interactive API documentation (Swagger / ReDoc)

Because the backend is built with FastAPI, full OpenAPI/Swagger docs are auto-generated. Once the stack is running you can explore every endpoint in a browser:

# start services if they are not already up
docker compose -f docker/docker-compose.yml up -d

Open http://localhost:8000/docs for the Swagger-UI (try requests directly).
Or visit http://localhost:8000/redoc for the ReDoc style.

Both pages respect the same X-API-Key header—enter your key once in the UI and it will be added to each call automatically.

Front-end Demo

A minimal static UI with Alpine.js and Tailwind CSS lives in frontend/. Here's how to run the complete stack:

Quick Start (3 steps)

# 1. Start the backend (API + Database)
docker compose -f docker/docker-compose.yml up -d

# 2. Start the frontend
python -m http.server 8080 --directory frontend

# 3. Open in browser
open http://localhost:8080

What you get

Interactive search UI at http://localhost:8080
Semantic product search with real-time results
Q&A capabilities powered by LangChain + OpenAI
Responsive design that works on mobile and desktop

Features

Search input with loading spinner
Markdown-rendered answers from the AI
Product cards with similarity scores
Hover effects and smooth animations
Error handling for API failures

API Endpoints (for reference)

Backend API docs: http://localhost:8000/docs
Health check: curl -H "X-API-Key:dev123" http://localhost:8000/health

5 · Testing locally

Note: All docker compose commands below assume you’re in the project root. If you’re inside the docker/ folder instead, drop the -f docker/docker-compose.yml flag; Compose finds docker-compose.yml in the current directory.

# ① Start only the Postgres + pgvector service
docker compose -f docker/docker-compose.yml up -d db

# ② Run all tests (unit + integration)
poetry run pytest -q

# ③ (Optional) Tear down the database when you’re done
docker compose -f docker/docker-compose.yml down

If the database isn’t running when you invoke pytest, any integration tests that depend on it will be automatically skipped.
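One common way to get this kind of automatic skipping is to probe the database port before the tests run. The sketch below shows such a probe; it is an assumption about the mechanism, not the project's actual test code, and the name `db_is_reachable` is hypothetical.

```python
import socket


def db_is_reachable(host="localhost", port=5432, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


print("Postgres reachable:", db_is_reachable())
```

In a test module, the result could feed a marker such as `pytest.mark.skipif(not db_is_reachable(), reason="Postgres is not running")`, so integration tests are reported as skipped rather than failed.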

6 · Data durability

Vector data live in the named Docker volume pg_data.

Safe	Data lost
`docker compose down`	`docker compose down -v`
Container restarts	`docker volume rm pg_data`

Use standard docker volume commands (or pg_dump) to back up.
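For the pg_dump route, a small wrapper like the one below could write a plain-SQL dump to a local file. It is a sketch under assumptions: the container name `vectordb`, user `postgres`, and database `products` are taken from the SQL shell command in the cheatsheet, and the helper names `pg_dump_command` and `backup_to_file` are hypothetical.

```python
import shlex
import subprocess


def pg_dump_command(container="vectordb", user="postgres", database="products"):
    """Build the argv for dumping the database from inside the running container."""
    return ["docker", "exec", container, "pg_dump", "-U", user, "-d", database]


def backup_to_file(path, **kwargs):
    """Run pg_dump and write the plain-SQL dump to path (requires the stack to be up)."""
    with open(path, "wb") as fh:
        subprocess.run(pg_dump_command(**kwargs), stdout=fh, check=True)


# Print the command instead of running it, so this is safe without Docker.
print(shlex.join(pg_dump_command()))
```

Calling `backup_to_file("products_backup.sql")` while the db container is running would then produce a dump that psql can restore.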

7 · Stand-alone scripts & `import app`

Every script under scripts/ starts with this snippet:

import pathlib, sys
repo_root = pathlib.Path(__file__).resolve().parents[1]
sys.path.insert(0, str(repo_root))

so you can run them from any path without fiddling with PYTHONPATH.

8 · FAQ

“Vectors disappeared!” Did you use down -v or delete pg_data? Restore a backup or rerun load_full.py.
Change model dimension? Add a migration that resizes embedding to the new dimension, set OPENAI_EMBED_MODEL, and re-embed the catalog (scripts/load_full.py).
Port collision? Tweak ports in docker/docker-compose.yml.

Enjoy exploring semantic search with a real Postgres back-end! Pull requests & issues welcome.

`debug_prompt.py`

Use this helper to inspect the exact prompt built for the LangChain pipeline. It retrieves product context, fills the template and prints both the prompt and the model's answer.

poetry run python scripts/debug_prompt.py

The output shows the fully rendered prompt followed by the LLM response so you can iterate on prompt wording.
