VisionQuery

VisionQuery is a lightweight text-to-image semantic search system that serves a pretrained multimodal model behind an API, indexes image embeddings, and returns ranked results via cosine similarity.

It’s a backend-first MVP focused on clarity and explainability: simple vector storage, clean endpoints, containerized dev, and basic observability.

What it does

Ingest images into an embedding index
Embed text + images into the same CLIP vector space
Search by natural language and retrieve the most similar images
Expose metrics for monitoring (Prometheus-compatible)

Architecture

High-level system

flowchart LR
  C[Client<br/>curl / UI] -->|HTTP| API[FastAPI Backend]

  API --> E[CLIP Embedder<br/>text + image]
  API --> VS[Vector Store<br/>in-memory cosine]

  API -->|/metrics| P[Prometheus]
  P --> G[Grafana]

  subgraph Data
    IMG[(Local Images<br/>/data/images)]
  end

  IMG -->|ingest path| API
  E -->|embeddings| VS
  VS -->|top-k results| API

Request flow

sequenceDiagram
  autonumber
  participant U as User
  participant A as FastAPI
  participant M as CLIP Embedder
  participant V as Vector Store

  U->>A: POST /ingest/image (path)
  A->>M: embed(image)
  M-->>A: image_embedding
  A->>V: add(image_embedding, metadata)
  A-->>U: {status: ok}

  U->>A: POST /search (query, top_k)
  A->>M: embed(text)
  M-->>A: text_embedding
  A->>V: cosine_search(text_embedding, top_k)
  V-->>A: ranked_results
  A-->>U: results + similarity scores

Tech stack

API: FastAPI
Embeddings: pretrained CLIP (text + image)
Search: in-memory vector store (cosine similarity)
Observability: Prometheus + Grafana
Infra: Docker + Docker Compose

Repository layout

The layout below shows only the meaningful files; node_modules/, build artifacts, and other generated output are omitted.

VisionQuery/
├── backend/
│   ├── app/
│   │   ├── main.py          # API routes + orchestration + metrics
│   │   ├── embeddings.py    # CLIP embedding (text + image)
│   │   └── vector_store.py  # in-memory similarity search
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/                # optional React UI
├── data/
│   └── images/              # local images (not tracked)
├── monitoring/
│   └── prometheus/
│       └── prometheus.yml   # Prometheus scrape config
├── docker-compose.yml
└── README.md

API

Health

GET /health

Returns a simple status payload for Docker health checks and monitoring probes.
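
A health probe against this endpoint can be a few lines of standard-library Python. The sketch below assumes the payload looks like `{"status": "ok"}`; the exact shape is not specified here, and `payload_is_healthy` and `probe` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.error
import urllib.request


def payload_is_healthy(status_code: int, body: bytes) -> bool:
    """Return True for HTTP 200 with a JSON object whose "status" is "ok".

    The {"status": "ok"} shape is an assumption, not the confirmed payload.
    """
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """GET the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return payload_is_healthy(resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        return False
```

Keeping the payload check separate from the network call makes the decision logic testable without a running backend.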

Ingest image

POST /ingest/image

Body:

{ "path": "data/images/example.jpg" }

Embeds the image and stores its vector + metadata in the in-memory index.
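
As a rough illustration of the ingest step, the sketch below embeds a path, normalizes the vector, and appends it to a plain list together with its metadata. `IndexEntry`, `l2_normalize`, `ingest`, and the `embed_image` callable are hypothetical names; the real code in `main.py` and `vector_store.py` may be organized differently, and normalizing at ingest time is an assumption rather than a confirmed detail.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexEntry:
    vector: list   # unit-length embedding
    metadata: dict  # e.g. {"path": "data/images/example.jpg"}


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def ingest(index: list, path: str, embed_image) -> dict:
    """Embed the image at `path` and store vector + metadata in `index`.

    `embed_image` stands in for the CLIP image embedder: any callable
    that maps a path to a sequence of floats.
    """
    vector = l2_normalize(embed_image(path))
    index.append(IndexEntry(vector=vector, metadata={"path": path}))
    return {"status": "ok"}
```

Passing the embedder in as a callable keeps the storage logic independent of the model, so it can be exercised with a stub instead of loading CLIP.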

Search

POST /search

Body:

{ "query": "a red car on the street", "top_k": 5 }

Returns the top-k most similar images with cosine similarity scores.
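
The ranking itself is simple enough to show in full. This is a minimal pure-Python sketch of cosine top-k search over `(vector, metadata)` pairs; the function names and the result shape (metadata fields plus a `score` key) are assumptions for illustration, not the actual response schema.

```python
import heapq
import math


def cosine_similarity(a, b) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); defined as 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_search(query_vec, entries, top_k=5):
    """Score every entry against the query and return the best `top_k`.

    `entries` is an iterable of (vector, metadata) pairs. Results are
    ordered from most to least similar.
    """
    scored = (
        {"score": cosine_similarity(query_vec, vec), **meta}
        for vec, meta in entries
    )
    return heapq.nlargest(top_k, scored, key=lambda r: r["score"])
```

A linear scan like this costs O(n) per query, which is fine for a small in-memory index and is the main thing an approximate-nearest-neighbor library would replace.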

Metrics

GET /metrics

Exposes Prometheus-compatible metrics (request counts and latency) for scraping.
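
To show what a scrape of this endpoint might contain, here is a small hand-rolled renderer for the Prometheus text exposition format. The metric names (`http_requests_total`, `http_request_duration_seconds`) and the `RequestMetrics` class are assumptions for illustration; a real service would more likely rely on a client library such as `prometheus_client`.

```python
from collections import defaultdict


class RequestMetrics:
    """Track per-path request counts and total latency (illustrative only)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.latency_sum = defaultdict(float)

    def observe(self, path: str, seconds: float) -> None:
        """Record one handled request and how long it took."""
        self.count[path] += 1
        self.latency_sum[path] += seconds

    def render(self) -> str:
        """Render counters in the Prometheus text exposition format.

        Label values are not escaped here; that is acceptable only
        because the paths are fixed route names.
        """
        lines = [
            "# HELP http_requests_total Total HTTP requests.",
            "# TYPE http_requests_total counter",
        ]
        for path in sorted(self.count):
            lines.append(f'http_requests_total{{path="{path}"}} {self.count[path]}')
        name = "http_request_duration_seconds"
        lines += [
            f"# HELP {name} Request latency in seconds.",
            f"# TYPE {name} summary",
        ]
        for path in sorted(self.count):
            label = f'{{path="{path}"}}'
            lines.append(f"{name}_sum{label} {self.latency_sum[path]}")
            lines.append(f"{name}_count{label} {self.count[path]}")
        return "\n".join(lines) + "\n"
```

Average latency is then derivable on the Prometheus side by dividing the rate of `_sum` by the rate of `_count`.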

Quickstart (Docker)

Prerequisites

Docker
Docker Compose

Run everything

From the repo root:

docker compose up --build

Services

Backend API: http://localhost:8000
Frontend UI: http://localhost:5173
Prometheus: http://localhost:9090
Grafana: http://localhost:3000 (default: admin / admin)

Stop

docker compose down

Reset (including volumes)

docker compose down -v

Local development (without Docker)

Docker is recommended for a consistent environment. Run the services locally when you want faster iteration.

Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend (optional)

cd frontend
npm install
npm run dev

Usage examples

Ingest an image

curl -X POST http://localhost:8000/ingest/image \
  -H "Content-Type: application/json" \
  -d '{"path":"data/images/example.jpg"}'

Search

curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query":"a dog on a beach","top_k":5}'
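
The same two calls can be made from Python using only the standard library. This is a sketch that mirrors the curl commands above; `build_post` and `post_json` are hypothetical helper names, and the response is assumed to be JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl examples."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0):
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read())
```

With the backend running, `post_json("/ingest/image", {"path": "data/images/example.jpg"})` followed by `post_json("/search", {"query": "a dog on a beach", "top_k": 5})` reproduces the flow shown with curl. The generous default timeout allows for the first request triggering the model load.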

Notes and design choices

Lazy model loading: the CLIP model loads on first request to keep container startup fast and health checks reliable.
In-memory index: simple to understand and easy to swap later (FAISS / pgvector).
Backend-first: the core deliverable is a clean API and system design; the UI is optional.
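
The lazy-loading note can be sketched as a small thread-safe wrapper. `LazyModel` and its `loader` argument are hypothetical; the actual mechanism in `embeddings.py` is not shown here and may differ.

```python
import threading


class LazyModel:
    """Defer an expensive load until the first call to get()."""

    def __init__(self, loader):
        self._loader = loader          # callable that builds the model
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        if self._model is None:        # fast path once the model is loaded
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model
```

Because nothing is loaded at import time, the process can answer `/health` immediately; the cost moves to the first `/ingest/image` or `/search` request, which will be noticeably slower than later ones.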

Limitations (intentional)

Index resets on restart (no persistent vector store)
No authentication / access control
No batching or async inference
Not tuned for large-scale indexing

These tradeoffs keep the system small, explainable, and interview-friendly.

Possible extensions

Replace the in-memory store with FAISS or pgvector
Persist metadata + vectors in a database
Add image upload support (instead of file paths)
Add tracing + richer dashboards
Batch embedding + async job queue for higher throughput
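
Swapping the store is easier if callers depend on an interface rather than a concrete class. The sketch below uses `typing.Protocol` with method names borrowed from the request-flow diagram (`add`, `cosine_search`); the `VectorStore` protocol, the `InMemoryStore` class, and `run_search` are illustrative assumptions, not the repository's actual code.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Minimal interface a FAISS- or pgvector-backed store would also satisfy."""

    def add(self, vector: list, metadata: dict) -> None: ...

    def cosine_search(self, query: list, top_k: int) -> list: ...


class InMemoryStore:
    """List-backed store; assumes all vectors are already unit length."""

    def __init__(self):
        self._rows = []

    def add(self, vector, metadata):
        self._rows.append((vector, metadata))

    def cosine_search(self, query, top_k):
        # For unit-length vectors the dot product equals cosine similarity.
        scored = [
            {"score": sum(q * v for q, v in zip(query, vec)), **meta}
            for vec, meta in self._rows
        ]
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:top_k]


def run_search(store: VectorStore, query_vec, top_k=5):
    """Caller code depends only on the protocol, not on a concrete store."""
    return store.cosine_search(query_vec, top_k)
```

Any class with matching `add` and `cosine_search` methods satisfies the protocol structurally, so a persistent backend could be dropped in without touching the route handlers.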

References

CLIP (OpenAI): Learning Transferable Visual Models From Natural Language Supervision
FastAPI documentation
Prometheus instrumentation + exposition formats
Grafana documentation
Docker Compose documentation
