A local grocery/retail billing system that uses multimodal AI to identify products from photos and populate the bill automatically. Built for offline-first operation in small Indian retail shops.
| Desktop Dashboard | Mobile View |
|---|---|
| ![]() | ![]() |
Manual billing is slow and error-prone. Barcode scanners require label infrastructure. KiranaAI instead scans a product photo and adds it to the bill — no barcodes needed.
The harder sub-problem: AI models go stale. Gemma 4 knows common products from training, but a new local brand or a recently launched SKU is invisible to it. The system must handle these gracefully without needing retraining.
```mermaid
graph TD
    A[Scan Product Image] --> B{Parallel Inference}

    subgraph "Hybrid AI Engine"
        B --> C[Gemma 4: LLM]
        B --> D[SigLIP: Vision Encoder]
        C --> E[Static Catalog Lookup]
        D --> F[Qdrant: Vector Search]
    end

    E --> G[Outcome Engine]
    F --> G
    G --> H{Evaluation}

    H -- "Confident Match" --> I[Auto-Add to Cart]
    H -- "Gemma Unknown" --> J[Show SigLIP Suggestions]
    H -- "Conflict Detected" --> K[Manual Confirmation]

    J --> L[User Selects/Adds New]
    K --> L
    L --> M[Update Custom Catalog & Qdrant]
```
| Model | Strength | Weakness |
|---|---|---|
| Gemma 4 (via Ollama) | Fast, reasons from training on millions of products | Can't learn new products without retraining; sometimes maps visually similar products to the wrong SKU |
| SigLIP (google/siglip-base-patch16-224) | Purely visual, reasons from actual photos you've added | Only knows products you've explicitly indexed |
SigLIP acts as both a fallback (when Gemma fails) and a validator (when Gemma misclassifies confidently). If SigLIP has a photo of the correct product and its cosine similarity score is high enough, it challenges Gemma's guess by triggering a human confirmation step instead of letting the item auto-add.
When Gemma identifies a catalog item and SigLIP simultaneously finds a different product in the custom catalog at score ≥ 0.72 (with no significant word overlap in the names), the system stops and shows the user both candidates. Neither is auto-added. The user picks, and the correct item goes into the bill.
This catches the "Gemma confidently misidentifies visually similar products" failure mode — e.g., Gemma calling a Redmi Note 4 a Samsung Galaxy S23 because both are candybar smartphones.
| Component | Choice |
|---|---|
| Backend | FastAPI + Uvicorn |
| Primary classifier | Gemma 4 (gemma4:e4b) via Ollama |
| Image embeddings | SigLIP (google/siglip-base-patch16-224, 768-dim) |
| Vector DB | Qdrant (local file mode, no Docker) |
| Frontend | Vanilla HTML/CSS/JS |
| Image storage | Filesystem (./product_images/) |
```
Scan Samsung Galaxy S23 box
→ Gemma returns SKU-011, SigLIP has nothing conflicting
→ Product auto-added to bill with price
→ "Not this product?" button shown for manual override
```
```
Scan a new local brand snack
→ Gemma returns "unknown"
→ SigLIP searches custom catalog, finds similar items if any
→ User picks from visual matches OR clicks "Add as New Product"
→ Modal opens with live camera preview
→ User can capture multiple angles (up to 3) directly from webcam
→ Or upload photos from device by clicking upload zone
→ Fills name/price/category, reviews thumbnail previews
→ Removes any photo using X button if needed
→ Clicks "Save & Add to Cart"
→ Product indexed in Qdrant with all photos immediately
→ Next scan of same product: SigLIP finds it at high confidence
```
```
Adding new snack product
→ Click "Add as New Product"
→ Modal shows live camera feed in preview area
→ Click "📸 Capture from Camera" to capture first angle
→ Purple-bordered thumbnail appears below showing captured image
→ Rotate product, click capture again for second angle
→ Green-bordered thumbnail appears
→ Adjust lighting, click capture for third angle (max limit)
→ See all three thumbnails in preview strip (can remove any with X)
→ Alternatively click upload zone to add file images (default border)
→ Fill product details (name/price/category/brand/unit)
→ Click "Save & Add to Cart"
→ All images embedded and indexed in Qdrant simultaneously
```
```
Scan Xiaomi Redmi Note 4
→ Gemma returns Samsung Galaxy S23 (wrong, high confidence)
→ SigLIP runs in parallel, finds Redmi Note 4 at score 0.84
→ Score ≥ 0.72 threshold AND names have zero word overlap
→ Conflict triggered automatically
→ Modal shows: Gemma's guess (amber/tan) + SigLIP's match (purple)
→ User clicks the correct product (Redmi) → added to bill
→ System learns from user selection for future scans
```
How conflict detection works:
- Word-overlap check: names are split on whitespace
- Only words longer than 2 characters count (filtering short artifacts such as "a")
- "Samsung Galaxy S23" vs "Xiaomi Redmi Note 4": zero overlap → conflict
- "Apple iPhone 14" vs "Apple iPhone 14 Pro": significant overlap → no conflict
- Gemma's match is trusted if SigLIP doesn't have a high-confidence alternative
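The rules above can be sketched in Python. Note that `decide`, `name_overlap`, and the return labels are illustrative names, not the project's actual functions:

```python
# Hedged sketch of the conflict-detection/routing logic described above.
# All names here are assumptions for illustration, not the real main.py API.

CONFLICT_THRESHOLD = 0.72  # SigLIP score above which a disagreement escalates

def name_overlap(a: str, b: str) -> bool:
    """True if the two product names share any token longer than 2 chars."""
    ta = {w.lower() for w in a.split() if len(w) > 2}
    tb = {w.lower() for w in b.split() if len(w) > 2}
    return bool(ta & tb)

def decide(gemma_name, siglip_name, siglip_score):
    """Route a scan to auto-add, SigLIP suggestions, or manual confirmation."""
    if gemma_name is None:                       # Gemma returned "unknown"
        return "show_siglip_suggestions"
    if (siglip_name is not None
            and siglip_score >= CONFLICT_THRESHOLD
            and not name_overlap(gemma_name, siglip_name)):
        return "manual_confirmation"             # conflict: user picks
    return "auto_add"                            # Gemma's match is trusted
```

On this sketch, "Samsung Galaxy S23" vs "Xiaomi Redmi Note 4" at score 0.84 routes to manual confirmation, while "Apple iPhone 14" vs "Apple iPhone 14 Pro" shares tokens and auto-adds.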
```
Gemma auto-added Samsung S23 but it's wrong
→ User clicks "✏ Not this product?" button (orange)
→ Item removed from cart immediately
→ Vector search runs on the original captured image
→ Modal shows all SigLIP custom catalog matches + full static catalog
→ User picks correct product → added to bill
→ No permanent learning (user might have made a mistake)
```
```
User clicks "Manage Custom Catalog" button
→ Modal displays all manually-added products as cards
→ Each card shows: thumbnail, name, price, category, brand, unit
→ User clicks trash icon on a card to delete
→ Confirmation: "This also removes the stored photos"
→ Product is removed from Qdrant (all vectors with product_id)
→ Photos deleted from filesystem
→ Catalog updated in real-time
```
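The deletion flow can be sketched as below; `delete_vectors` is a stand-in for the real Qdrant delete-by-`product_id` call, and the function name is an assumption:

```python
# Illustrative sketch of the delete flow: drop all vectors tagged with the
# product_id, then remove the photo directory. Not the project's actual code.
import shutil
from pathlib import Path

IMAGE_ROOT = Path("./product_images")

def delete_product(product_id: str, delete_vectors, image_root: Path = IMAGE_ROOT) -> None:
    """Remove a custom product's vectors and all of its stored photos."""
    delete_vectors(product_id)            # e.g. Qdrant delete by payload filter
    photo_dir = image_root / product_id
    if photo_dir.exists():
        shutil.rmtree(photo_dir)          # thumbnail.jpg + image_*.jpg
```

Deleting the whole directory at once is what guarantees no orphaned images are left behind.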
- Python 3.10+
- Ollama running locally with `gemma4:e4b` pulled

```bash
cd /home/dedsec/Kirana/Plan2/amd_hackathon
bash run.sh
```

`run.sh` creates a venv, installs PyTorch CPU-only (to avoid a 2 GB CUDA download), installs the remaining dependencies, kills anything listening on port 6001, and starts the server.
Open http://localhost:6001 in a browser.
First run: SigLIP (~400 MB) downloads from HuggingFace once and caches. Qdrant collection is created empty.
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/inference` | Image → Gemma + SigLIP parallel classification |
| `POST` | `/api/vector-search` | Image → SigLIP-only search (used by correction flow) |
| `POST` | `/api/add-product` | Add new product with photos to vector DB |
| `GET` | `/api/catalog` | Fetch the static 12-SKU `products.json` |
| `GET` | `/api/vector-products` | List all custom products in Qdrant |
| `DELETE` | `/api/delete-product/{id}` | Remove product vectors + photos from disk |
| `GET` | `/api/images/{id}/{file}` | Serve product thumbnails/photos |
| Variable | Default | Description |
|---|---|---|
| `SIM_THRESHOLD` | `0.60` | Minimum cosine similarity for SigLIP to return a result |
| `CONFLICT_THRESHOLD` | `0.72` | SigLIP score above which a name disagreement triggers user confirmation |
| `TOP_K` | `5` | Max results returned from vector search |
| `MODEL_NAME` | `gemma4:e4b` | Ollama model tag |
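One plausible way these knobs are read from the environment with the documented defaults (the actual parsing in `main.py` may differ):

```python
# Hedged sketch: environment-variable configuration with the defaults
# from the table above. Variable handling in the real backend may differ.
import os

SIM_THRESHOLD = float(os.getenv("SIM_THRESHOLD", "0.60"))
CONFLICT_THRESHOLD = float(os.getenv("CONFLICT_THRESHOLD", "0.72"))
TOP_K = int(os.getenv("TOP_K", "5"))
MODEL_NAME = os.getenv("MODEL_NAME", "gemma4:e4b")
```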
- Input optimization: images are resized to 512×512 max, compressed to JPEG at 75–85% quality (12–40 KB typical)
- Parallel inference: Gemma 4 and SigLIP run simultaneously via `asyncio.gather()`:
  - Gemma receives a prompt with the static catalog and the image → returns JSON with SKU, name, confidence
  - SigLIP encodes the image to a 768-dim vector → searches Qdrant for similar products
- Result merging: the backend enriches Gemma's result with catalog metadata (price, category, brand, unit)
- Conflict resolution: if SigLIP finds a high-confidence alternative with no name overlap, the user confirms
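The parallel fan-out might look roughly like this; `classify_with_gemma` and `search_with_siglip` are stand-ins for the real Ollama and Qdrant calls:

```python
# Minimal sketch of the asyncio.gather() fan-out. The two coroutines below are
# placeholders, not the project's actual inference functions.
import asyncio

async def classify_with_gemma(image_bytes: bytes) -> dict:
    await asyncio.sleep(0.01)  # placeholder for the Ollama HTTP round-trip
    return {"sku": "SKU-011", "name": "Samsung Galaxy S23", "confidence": 0.9}

async def search_with_siglip(image_bytes: bytes) -> list:
    await asyncio.sleep(0.01)  # placeholder for SigLIP encode + Qdrant search
    return [{"product_id": "p1", "score": 0.84}]

async def infer(image_bytes: bytes):
    # Both run concurrently, so total latency ≈ max(Gemma, SigLIP), not the sum
    return await asyncio.gather(
        classify_with_gemma(image_bytes),
        search_with_siglip(image_bytes),
    )

gemma_result, siglip_hits = asyncio.run(infer(b"fake-jpeg-bytes"))
```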
- Collection: a single `products` collection with 768-dim cosine-distance vectors
- Deduplication: multiple images of the same product are stored as separate vectors, but queries deduplicate by `product_id`
- Per-product indexing: adding a product with 3 photos creates 3 separate vectors, all tagged with the same `product_id`
- Lookup time: ~5 ms for 1,000 vectors (negligible compared to Gemma/SigLIP latency)
- Scaling: local file-based storage (no Docker), suitable for 10K+ products
- Storage: photos saved to `./product_images/{product_id}/`:
  - `thumbnail.jpg` (256×256) — displayed in modals and the manage view
  - `image_1.jpg`, `image_2.jpg`, `image_3.jpg` (512×512) — used for embedding
- Cleanup: deleting a product removes all vectors + the entire image directory
- Multi-angle strategy: 3 photos from different angles → 3 vectors → higher recall across lighting/orientation variations
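The query-time deduplication can be sketched as follows (the `product_id`/`score` field names are assumptions about the search-hit payloads):

```python
# Sketch of result deduplication: multiple vectors per product collapse to one
# hit per product_id, keeping the highest score. Illustrative, not actual code.
def dedupe_hits(hits: list) -> list:
    best = {}
    for hit in hits:  # hits as returned by the vector search, any order
        pid = hit["product_id"]
        if pid not in best or hit["score"] > best[pid]["score"]:
            best[pid] = hit
    # One entry per product, best score first
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

This is why a product indexed with three angle photos still appears once in the suggestion list, at its strongest match score.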
```
amd_hackathon/
├── main.py            # FastAPI backend — inference, vector ops, file serving
├── script.js          # Frontend — scan flow, cart, modals, conflict UI
├── index.html         # UI markup
├── style.css          # Glassmorphism dark theme
├── products.json      # Static 12-SKU catalog (what Gemma is prompted with)
├── requirements.txt   # Python dependencies (torch installed separately)
├── run.sh             # One-command setup and start
├── qdrant_db/         # Qdrant persistent storage (gitignored)
└── product_images/    # Stored product photos (gitignored)
```
- Gemma inference: ~2–4s per image (CPU, local Ollama)
- SigLIP encoding: ~1s per image (CPU)
- Qdrant search: ~5ms for 1000 vectors
- Both models run in parallel — total latency is max(Gemma, SigLIP), not sum
- Shared stream: the modal camera reuses the same `MediaStream` as the main scan camera — no additional permissions
- Live preview: user sees what they're about to capture in 16:9 aspect ratio
- Real-time: No lag between main camera and modal preview
- Canvas capture: Captures are drawn to canvas, converted to JPEG blob, stored in-memory until form submit
- Multi-source capture: Can add photos from camera (live capture), file upload, and original scan in same product
- Visual indicators:
- Purple border = original scan image (from scan/upload that triggered "Add as New Product")
- Green border = camera-captured image (from modal preview)
- Default border = file-uploaded image
- Removable: Each thumbnail has X button in top-right corner — click to remove and retake
- Preview strip: Shows all selected images below form fields for review before submit
- Limit enforcement: Maximum 3 images total across all sources
- Card layout: Each product shown as card with thumbnail, name, price, category, brand, unit
- Delete action: Click trash icon to remove product (also deletes all photos from disk and vectors from Qdrant)
- Real-time sync: Catalog updates immediately after deletion
- Visual feedback: Smooth transitions when removing products
- Adding photos improves accuracy:
  - 1 photo: ~70% recall for SigLIP (limited angles, lighting sensitivity)
  - 2 photos: ~85% recall (covers some rotation/lighting variation)
  - 3 photos: ~95% recall (covers most real-world scanning scenarios)
  - Best practice: capture front, back, and an angled view
- Conflict threshold tuning (`CONFLICT_THRESHOLD`):
  - Current value (0.72) is conservative — catches most misclassifications
  - Too many false conflicts? Raise to 0.75–0.80 (requires higher SigLIP confidence)
  - Missing real conflicts? Lower to 0.65–0.70 (catches subtler disagreements)
  - Disabling conflicts? Set to 1.0 (not recommended — re-enables Gemma misclassification issues)
- Similarity threshold tuning (`SIM_THRESHOLD`):
  - Current value (0.60) is balanced
  - Raising it (0.65+) reduces false positives but may miss similar products
  - Lowering it (0.55−) increases recall but shows more unrelated items
- The custom vector store grows over time: each new product added increases coverage for future scans
  - After 20–30 products: the SigLIP fallback becomes very effective
  - After 100+ products: the Gemma misclassification override is rarely needed
- Qdrant persistence: all vectors and metadata survive server restarts (stored in `./qdrant_db/`)
- Image cleanup: deleting a product removes both vectors and photos — no orphaned images
- Backup strategy: copy the `./qdrant_db/` and `./product_images/` directories for backup
- Parallel inference: Gemma and SigLIP run simultaneously — total latency is ~max(Gemma, SigLIP), not sum
- Expected: 3–5s total per scan (not 4–5s sequential)
- Image optimization: Frontend compresses images to 75% JPEG quality before upload
- Typical size: 15–20 KB per image (40 KB max even for high-res photos)
- Reduces network latency for slower connections
- Vector search speed: Qdrant queries complete in ~5ms even with 1000+ indexed vectors
- Thumbnail generation: thumbnails are generated automatically from the first image, which further saves storage
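A backend-side sketch of the same resize-and-compress step, assuming Pillow is available; in this project the compression actually happens in `script.js` before upload, so this is illustrative only:

```python
# Hedged sketch: resize to a 512px max side and re-encode as JPEG at 75%
# quality, mirroring the frontend optimization described above.
from io import BytesIO

from PIL import Image

def optimize(image: Image.Image, max_side: int = 512, quality: int = 75) -> bytes:
    img = image.convert("RGB")
    img.thumbnail((max_side, max_side))  # in-place resize, preserves aspect ratio
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```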
- Check permissions: Browser must have camera access (check address bar)
- Ensure main camera is initialized: Click "Scan & Add Item" first to initialize video stream
- Try different browser: Some browsers have stricter MediaStream sharing policies
- Check count: Single photo is weaker signal — add 2–3 angles for better recall
- Check lighting: SigLIP is sensitive to lighting changes — capture in consistent light
- Wait for embedding: Qdrant indexing is synchronous, but may take 1–2s per image
- Lower SIM_THRESHOLD: If genuinely similar products are being confused, adjust threshold
- Add distinguishing photos: Capture unique angles that differ from similar products
- Delete and re-add: Remove product and re-add with better photos from different angles
- Raise CONFLICT_THRESHOLD: Current setting catches subtle disagreements — may be too aggressive
- Verify Gemma catalog: Some visually similar products in the base catalog may need manual correction
- Data stays local: All inference runs on-device via Ollama and local SigLIP. No images sent to cloud.
- Offline operation: Once models are cached, system works fully offline (no internet needed for inference)
- Product photos removed on delete: deleted products have their images fully removed from disk, so there is no storage bloat
- Vector deduplication: Same product from different photos creates multiple vectors (one per image) but deduplicated in results — you see product once with highest score

