product-knowledge-vector-db

Semantic Product Knowledge Base

Tired of endlessly scrolling through online stores, trying to find the right product? Traditional keyword search and filters often miss the mark. This project brings semantic search to your product catalog, making it faster and more intuitive for users to find what they want.

How does it work?

  • All your products and their descriptions are converted into vectors (embeddings) using an embeddings API (like OpenAI).
  • When a user searches, their query is also embedded into a vector.
  • The system compares the query vector to all product vectors in your database, finding the most relevant matches—even if the keywords don’t exactly match.

This approach delivers more accurate, natural, and satisfying search results for your customers.
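The three steps above can be sketched in a few lines of pure Python. This is only an illustration with hand-written three-dimensional toy vectors standing in for real embeddings; the product names, the vectors, and the helper names `cosine_similarity` and `top_k` are assumptions made for the example. In the actual stack the comparison is done inside Postgres by pgvector, not in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, product_vecs, k=2):
    """Return the k product names most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in product_vecs.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy "embeddings" (hypothetical); real ones have many more dimensions.
products = {
    "waterproof earbuds": [0.9, 0.1, 0.0],
    "trail running shoes": [0.0, 0.2, 0.9],
    "bluetooth speaker": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded search query

print(top_k(query, products))
```

The query shares no keywords with the products; the ranking comes purely from vector direction, which is the property the search relies on.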


This project provides a complete open-source example stack for building a semantic product search and Q&A API over your own data:

  • Uses Postgres + pgvector for fast vector similarity search
  • Uses OpenAI for embeddings and answers
  • Exposes a simple FastAPI backend
  • All product data, embeddings, and retrieval logic are managed in your own database (no vendor lock-in)

Ideal for real-world RAG (retrieval-augmented generation), explainable product recommendations, and natural language search in commerce and catalog applications.

Semantic Product Search UI


Table of Contents

  0. What you get
  1. First-time setup
  2. Environment Variables
  3. Daily Workflow Cheatsheet
  4. API Quickstart
  5. Testing locally
  6. Data durability
  7. Stand-alone scripts & import app
  8. FAQ
  9. Debugging prompts (debug_prompt.py)


0 · What you get

| Slice | Highlights |
| --- | --- |
| Vector DB | Postgres 15 image with pgvector, schema migrations tracked in sql/ |
| Backend API | FastAPI (app/**) exposing /embed, /similar, /answer, /langchain-answer, protected by an API-Key header |
| Async helpers | Thin, reusable modules for OpenAI calls (app/openai_utils.py) and Postgres access (app/db.py) |
| Data & ETL | CSV → JSONL converter, batch loader that embeds + writes to Postgres |
| CLI tooling | Simple curl commands let you try semantic search from the terminal |
| Docker stack | docker-compose.yml spins up the DB and the API container |
| Tests / CI-ready | Pure-unit + integration tests (>90 % coverage for app/), examples for GitHub Actions |

1 · First-time setup

# ① Install Python deps (isolated virtual-env handled by Poetry)
poetry install

# ② Start the full stack (db + api) in Docker
docker compose -f docker/docker-compose.yml up -d        # or:  docker-compose …

Heads-up: The first container start will run all SQL migrations automatically.

Optionally seed / update the catalog

# Convert vendor CSV → JSONL (one-off)
poetry run python scripts/csv_to_jsonl.py data/products_full.csv

# Embed every product & bulk-insert (async, OpenAI key required)
poetry run python scripts/load_full.py
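For orientation, here is a minimal sketch of what a CSV → JSONL conversion can look like. It is not the code in scripts/csv_to_jsonl.py; the function name `csv_to_jsonl` and the column names in the demo are assumptions chosen for illustration.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file):
    """Write one JSON object per CSV row; return the number of rows written."""
    count = 0
    for row in csv.DictReader(csv_file):
        jsonl_file.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


# In-memory demo; the columns (sku, name, description) are hypothetical.
source = io.StringIO("sku,name,description\nA1,Earbuds,Wireless and waterproof\n")
target = io.StringIO()
rows = csv_to_jsonl(source, target)
print(rows, target.getvalue(), end="")
```

JSONL keeps each product on its own line, so a loader can stream the file and embed products in batches without reading the whole catalog into memory.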

Try a quick CLI search

curl -G http://localhost:8000/similar \
     -H "X-API-Key:$API_KEY" \
     --data-urlencode "q=wireless waterproof earbuds" \
     --data-urlencode "k=8"

2 · Environment Variables

Copy .env_example to .env (both scripts and containers read it) and review/customize the values:

cp .env_example .env

# Postgres
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=postgres
PGDATABASE=products
# optional—if omitted, the app assembles one from the PG* vars
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/products

# Auth
API_KEY=dev123

# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_EMBED_MODEL=text-embedding-3-small        # 1536-dim output
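The fallback described for the optional DATABASE_URL can be pictured as follows. This is a sketch under assumptions, not the project's actual code: the function name `database_url` and the default values used when a PG* variable is missing are made up for the example.

```python
from urllib.parse import quote


def database_url(env):
    """Return DATABASE_URL if set, else build one from the PG* variables."""
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    # Percent-encode credentials so characters like "@" or ":" stay valid.
    user = quote(env.get("PGUSER", "postgres"), safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")  # assumed default
    port = env.get("PGPORT", "5432")       # assumed default
    name = env.get("PGDATABASE", "postgres")  # assumed default
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


settings = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGUSER": "postgres",
    "PGPASSWORD": "postgres",
    "PGDATABASE": "products",
}
print(database_url(settings))
```

In a real process you would pass `os.environ` instead of the literal dictionary.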

3 · Daily Workflow Cheatsheet

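All docker compose commands below are written to be run from the project root, which is why they pass -f docker/docker-compose.yml. If you’re inside the docker/ folder instead, drop the -f flag: Compose picks up docker-compose.yml from the current directory.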

| Task | Command |
| --- | --- |
| Start the full stack | docker compose -f docker/docker-compose.yml up -d |
| Stop the full stack | docker compose -f docker/docker-compose.yml down |
| Reset and wipe all data | docker compose -f docker/docker-compose.yml down -v |
| Re-embed & reload catalog | poetry run python scripts/load_full.py |
| Open SQL shell | docker exec -it vectordb psql -U postgres -d products |
| FastAPI dev-server (no Docker) | poetry run uvicorn app:create_app --factory --reload |

4 · API Quickstart

Assumes the containers are up and the API key has been exported. All endpoints return JSON; for readability, every curl command below pipes its output through jq. (See .env for credentials and settings.)

# 0️⃣  Export your API key from .env (run this first)
export API_KEY=dev123

# Health
curl -H "X-API-Key:$API_KEY" http://localhost:8000/health | jq .

# 1️⃣  Single embedding
curl -X POST http://localhost:8000/embed \
     -H "X-API-Key:$API_KEY" -H "Content-Type:application/json" \
     -d '{"text":"wireless waterproof earbuds"}' | jq .

# 2️⃣  Top-k similarity search
curl -G http://localhost:8000/similar \
     -H "X-API-Key:$API_KEY" \
     --data-urlencode "q=wireless waterproof earbuds" \
     --data-urlencode "k=5" | jq .

# 3️⃣  Question-answering (RAG style)
curl -X POST http://localhost:8000/answer \
     -H "X-API-Key:$API_KEY" -H "Content-Type:application/json" \
     -d '{"question":"What running shoes do you stock?","k":3}' | jq .

# 4️⃣  LangChain-powered Q&A
curl -X POST http://localhost:8000/langchain-answer \
     -H "X-API-Key:$API_KEY" -H "Content-Type:application/json" \
     -d '{"question":"Do you carry trail shoes?","k":3}' | jq .

Note: If you haven’t installed jq, you can do so with sudo apt-get install jq (Debian/Ubuntu) or brew install jq (macOS).
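The same calls can be made from Python with only the standard library. The sketch below builds the requests without sending them; the helper name `build_request` and the `BASE_URL` constant are assumptions for the example, while the paths, header name, and payloads mirror the curl commands above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumed local address, as in the curl examples


def build_request(path, api_key, params=None, json_body=None):
    """Build a GET request (query params) or a POST request (JSON body)."""
    url = f"{BASE_URL}{path}"
    if params:
        url += "?" + urlencode(params)
    headers = {"X-API-Key": api_key}
    data = None
    if json_body is not None:
        data = json.dumps(json_body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib switches to POST automatically when a body is present.
    return Request(url, data=data, headers=headers)


similar = build_request(
    "/similar", "dev123", params={"q": "wireless waterproof earbuds", "k": 5}
)
answer = build_request(
    "/answer", "dev123", json_body={"question": "What running shoes do you stock?", "k": 3}
)
print(similar.get_method(), similar.full_url)
print(answer.get_method(), answer.full_url)
```

To actually send one, pass it to `urllib.request.urlopen` and decode the body with `json.load`; the shape of the returned JSON is not assumed here.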

Interactive API documentation (Swagger / ReDoc)

Because the backend is built with FastAPI, full OpenAPI/Swagger docs are auto-generated. Once the stack is running you can explore every endpoint in a browser:

# start services if they are not already up
docker compose -f docker/docker-compose.yml up -d

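FastAPI serves Swagger UI at /docs and ReDoc at /redoc by default, so unless the app overrides those paths the two pages should be at http://localhost:8000/docs and http://localhost:8000/redoc. Both pages respect the same X-API-Key header: enter your key once in the UI and it will be added to each call automatically.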

Front-end Demo

A minimal static UI with Alpine.js and Tailwind CSS lives in frontend/. Here's how to run the complete stack:

Quick Start (3 steps)

# 1. Start the backend (API + Database)
docker compose -f docker/docker-compose.yml up -d

# 2. Start the frontend
python -m http.server 8080 --directory frontend

# 3. Open in browser
open http://localhost:8080

What you get

  • Interactive search UI at http://localhost:8080
  • Semantic product search with real-time results
  • Q&A capabilities powered by LangChain + OpenAI
  • Responsive design that works on mobile and desktop

Features

  • Search input with loading spinner
  • Markdown-rendered answers from the AI
  • Product cards with similarity scores
  • Hover effects and smooth animations
  • Error handling for API failures

API Endpoints (for reference)

FASTAPI Docs


5 · Testing locally

Note: All docker compose commands are written to be run from the project root. If you’re inside the docker/ folder, drop the -f docker/docker-compose.yml flag; Compose finds docker-compose.yml in the current directory.

# ① Start only the Postgres + pgvector service
docker compose -f docker/docker-compose.yml up -d db

# ② Run all tests (unit + integration)
poetry run pytest -q

# ③ (Optional) Tear down the database when you’re done
docker compose -f docker/docker-compose.yml down

If the database isn’t running when you invoke pytest, any integration tests that depend on it will be automatically skipped.
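One common way to get this kind of automatic skip is a quick TCP probe of the Postgres port. The sketch below shows the idea only; the helper name `db_is_reachable` and its defaults are assumptions, and the project's test suite may detect the database differently.

```python
import socket


def db_is_reachable(host="localhost", port=5432, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("Postgres reachable:", db_is_reachable())
```

In a test module, a result like this could feed `pytest.mark.skipif(not db_is_reachable(), reason="Postgres is not running")` so integration tests are skipped rather than failed.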


6 · Data durability

Vector data live in the named Docker volume pg_data.

| Safe | Data lost |
| --- | --- |
| docker compose down | docker compose down -v |
| Container restarts | docker volume rm pg_data |

Use standard docker volume commands (or pg_dump) to back up.
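As one possible backup routine, the sketch below runs pg_dump inside the database container and writes the dump to a local file. It assumes the container name vectordb, user postgres, and database products from the cheatsheet, and that pg_dump is available inside the container; the function names are hypothetical.

```python
import subprocess


def build_backup_command(container="vectordb", user="postgres", database="products"):
    """Return the argv list that dumps the database from inside the container."""
    return ["docker", "exec", container, "pg_dump", "-U", user, "-d", database]


def backup(path, **kwargs):
    """Run the dump and write the SQL output to path; raises if pg_dump fails."""
    with open(path, "wb") as out:
        subprocess.run(build_backup_command(**kwargs), stdout=out, check=True)


print(" ".join(build_backup_command()))
```

Calling `backup("products.sql")` would produce a plain SQL dump that can later be restored with psql.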


7 · Stand-alone scripts & import app

Every script under scripts/ starts with this snippet:

import pathlib, sys
repo_root = pathlib.Path(__file__).resolve().parents[1]
sys.path.insert(0, str(repo_root))

so you can run them from any path without fiddling with PYTHONPATH.


8 · FAQ

  • “Vectors disappeared!” Did you use down -v or delete pg_data? Restore a backup or rerun load_full.py.
  • Change model dimension? Add a migration resizing embedding, set OPENAI_EMBED_MODEL, re-embed.
  • Port collision? Tweak ports in docker/docker-compose.yml.

Enjoy exploring semantic search with a real Postgres back-end! Pull requests & issues welcome.


debug_prompt.py

Use this helper to inspect the exact prompt built for the LangChain pipeline. It retrieves product context, fills the template and prints both the prompt and the model's answer.

poetry run python scripts/debug_prompt.py

The output shows the fully rendered prompt followed by the LLM response so you can iterate on prompt wording.
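To make "fills the template" concrete, here is a minimal sketch of prompt rendering with plain string formatting. The template wording, the product fields, and the function name `render_prompt` are all assumptions for illustration; the real template lives in the project's LangChain pipeline and may look quite different.

```python
# Hypothetical template; the project's actual prompt wording is not shown here.
PROMPT_TEMPLATE = (
    "Answer the question using only the product context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def render_prompt(question, products):
    """Join retrieved products into a context block and fill the template."""
    context = "\n".join(f"- {p['name']}: {p['description']}" for p in products)
    return PROMPT_TEMPLATE.format(context=context, question=question)


retrieved = [
    {"name": "Trail Runner X", "description": "Grippy shoe for muddy trails"},
    {"name": "Road Glide", "description": "Lightweight road running shoe"},
]
prompt = render_prompt("Do you carry trail shoes?", retrieved)
print(prompt)
```

Printing the rendered prompt before it reaches the model is what makes it practical to iterate on wording.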
