JMkrish/TextileSearchApp

TextileSearchApp — Local Visual Search Engine

A fully local, web-based visual search engine for fabric design thumbnails. It uses OpenAI's CLIP for multimodal embeddings and FAISS for fast similarity search. You can search by text (e.g., "red floral fabric pattern") or by drag-and-drop image, and get the top 10 most similar thumbnails with their corresponding TIF paths and similarity scores.


Features

  • Dual query modes

    • Text search: Natural-language queries (e.g., "blue geometric stripes", "floral print"). Thumbnail filenames are treated as searchable metadata: if the query terms appear in the file name (e.g. dupatta_floral.jpg, 3inch_border_saree.png), those results are ranked at the top even when CLIP visual similarity is lower.
    • Image search: Drag-and-drop or file-pick an image to find visually similar thumbnails.
  • Fully local

    • No cloud APIs; CLIP model and FAISS index run on your machine. Internet needed only for initial setup (downloading CLIP weights and dependencies).
  • Efficient search

    • One-time indexing: all thumbnails are embedded with CLIP at index-build time.
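    • FAISS provides fast nearest-neighbor search on the embeddings: exact with the flat index (IndexFlatIP), with approximate index types as an option for very large collections (see Optimization & Scaling).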
  • Flexible image input

    • Handles mixed JPG/PNG thumbnails with non-uniform dimensions; CLIP preprocessing (resize/normalize) is applied automatically.
  • Clear results

    • Top 10 matches with similarity scores, thumbnail previews, and paths to the original TIF files.
  • Cross-platform

    • Python backend (Flask), HTML/CSS/JS frontend. Works on Linux, macOS, and Windows.
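
The filename boost described under Text search can be pictured as a two-key sort. What follows is a minimal sketch, not the project's actual code: the helper name rank_with_filename_boost and the rule that every query term must appear in the file stem are assumptions made for illustration. The README only states that filename matches are ranked ahead of results with higher CLIP similarity.

```python
import re
from pathlib import Path


def rank_with_filename_boost(query, scored_paths):
    """Order (path, clip_score) pairs so that filename matches come first.

    Hypothetical helper: a path counts as a filename match when every
    query term appears among the tokens of its file stem.
    """
    terms = [t for t in re.split(r"[^a-z0-9]+", query.lower()) if t]

    def is_match(path):
        tokens = set(re.split(r"[^a-z0-9]+", Path(path).stem.lower()))
        return bool(terms) and all(t in tokens for t in terms)

    # Sort key: filename matches first (False sorts before True),
    # then higher CLIP score first within each group.
    return sorted(scored_paths, key=lambda ps: (not is_match(ps[0]), -ps[1]))


results = rank_with_filename_boost(
    "dupatta floral",
    [
        ("thumbs/abstract_01.png", 0.31),
        ("thumbs/dupatta_floral.jpg", 0.22),
        ("thumbs/3inch_border_saree.png", 0.27),
    ],
)
print([path for path, _ in results])
```

Here dupatta_floral.jpg is listed first despite having the lowest score, and the two non-matching files follow in score order.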

Tech Stack

| Component | Technology |
| --- | --- |
| Embeddings | OpenAI CLIP (ViT-B/32 by default) |
| Vector search | FAISS (CPU; faiss-cpu) |
| Image I/O | Pillow (PIL) |
| Backend | Flask 3.x, Python 3.10+ |
| Frontend | HTML5, CSS3, vanilla JavaScript |
| Deep learning | PyTorch, torchvision |

All dependencies are open-source. Optional GPU support via PyTorch CUDA for faster indexing and querying.


Project Structure

TextileSearchApp/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── config.yaml               # Main configuration (paths, index, server)
├── config.example.yaml       # Template with comments
├── backend/
│   ├── app.py                # Flask web app & API
│   ├── config_loader.py      # Loads config.yaml + env overrides
│   ├── search_engine.py      # CLIP + FAISS search engine
│   ├── indexer.py            # CLI: full index build (one-time or rebuild)
│   ├── incremental_indexer.py # CLI: add new thumbnails to existing index
│   ├── download_clip_model.py # CLI: download CLIP model with retries
│   ├── data/                 # Generated at index time (create if missing)
│   │   ├── faiss_index.bin   # FAISS index
│   │   └── metadata.npy     # Thumbnail ↔ TIF path mapping
│   └── uploads/              # Unused; image search uses a temp file (deleted after each query)
├── static/
│   └── app.js                # Frontend logic (search, drag-drop, results)
└── templates/
    └── index.html            # Main UI
  • Full indexing: Run indexer.py once to build the index from all thumbnails. It writes backend/data/faiss_index.bin and backend/data/metadata.npy.
  • Incremental indexing: Run incremental_indexer.py periodically to add only new thumbnails without rebuilding the full index (see Adding New Images Periodically).
  • Serving: app.py loads the index at startup and serves the UI and API. Thumbnails are served from your configured thumbnails directory.

Prerequisites

  • Python: 3.10 or newer
  • Disk: Enough space for PyTorch, CLIP, and the FAISS index (roughly 2–4 GB for the stack; index size depends on number of thumbnails)
  • RAM: 4 GB minimum; 8 GB+ recommended for larger datasets
  • Optional: NVIDIA GPU + CUDA for faster CLIP inference (use device="cuda" when building/running)

Installation

1. Clone or copy the project

Ensure you have the TextileSearchApp directory with backend/, static/, templates/, and requirements.txt.

2. Create and activate a virtual environment

cd TextileSearchApp
python3 -m venv .venv
  • Linux / macOS:
    source .venv/bin/activate
  • Windows (PowerShell):
    .venv\Scripts\Activate.ps1

3. Install dependencies

pip install -r requirements.txt

This installs Flask, PyTorch, torchvision, FAISS (CPU), Pillow, NumPy, tqdm, and OpenAI CLIP from GitHub. The first run may download CLIP model weights; after that, the app can run offline.

4. Create index and upload directories (optional; indexer/app create as needed)

mkdir -p backend/data backend/uploads

Configuration

Configuration uses a config file for defaults and environment variables for overrides, a common practice (e.g. in 12-factor apps).

Config file (defaults)

Edit config.yaml in the project root (or copy from config.example.yaml). This is the single source of default values for both the web app and the indexer.

paths:
  thumbnails_dir: "/path/to/thumbnails"   # Your thumbnail images (JPG/PNG)
  tifs_dir: "/path/to/tifs"               # Optional; if empty, thumbnail filename is shown in results

index:
  index_path: "backend/data/faiss_index.bin"
  metadata_path: "backend/data/metadata.npy"
  upload_folder: "backend/uploads"

server:
  host: "0.0.0.0"
  port: 8000
  • Thumbnails directory: All JPG/PNG thumbnails; scanned recursively. If tifs_dir is set, each thumbnail is mapped to a TIF by filename stem (e.g. design_001.png → design_001.tif). If tifs_dir is empty, the thumbnail filename is shown in results instead.
  • Paths under index can be relative to the project root or absolute.
  • If your naming or folder layout differs, adjust _thumbnail_to_tif() in backend/search_engine.py.
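
The stem-based mapping can be sketched as below. This is an illustration under stated assumptions, not the contents of _thumbnail_to_tif() in backend/search_engine.py: the function name thumbnail_to_tif is hypothetical, the TIF directory is assumed to be flat, and no check is made that the TIF actually exists.

```python
from pathlib import Path


def thumbnail_to_tif(thumbnail_path, tifs_dir):
    """Map a thumbnail to a TIF path by filename stem (illustrative only).

    Returns the thumbnail's filename when tifs_dir is empty, mirroring
    the fallback described above.
    """
    thumb = Path(thumbnail_path)
    if not tifs_dir:
        return thumb.name
    # Assumption: TIFs sit directly in tifs_dir and use the .tif extension.
    return str(Path(tifs_dir) / f"{thumb.stem}.tif")


print(thumbnail_to_tif("/path/to/thumbnails/design_001.png", "/path/to/tifs"))
print(thumbnail_to_tif("/path/to/thumbnails/design_001.png", ""))
```

If your TIFs live in subfolders or use a different extension, this is the kind of logic you would adapt.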

Environment variable overrides

Any value can be overridden by environment variables (useful for deployment or different machines without editing the file):

| Variable | Overrides |
| --- | --- |
| THUMBNAILS_DIR | paths.thumbnails_dir |
| TIFS_DIR | paths.tifs_dir |
| INDEX_PATH | index.index_path |
| METADATA_PATH | index.metadata_path |
| UPLOAD_FOLDER | index.upload_folder |
| CLIP_MODEL_PATH | Local CLIP .pt file path |
| CONFIG_PATH | Path to a different config file |
| SERVER_HOST / SERVER_PORT | server.host / server.port |

Optional: put variables in a .env file in the project root; python-dotenv loads it automatically (do not commit secrets to .env if you add any later).

Precedence

Config file (defaults) → .env (if present) → Environment variables (override). For the indexer, CLI arguments override config and env (e.g. --thumbnails_dir overrides THUMBNAILS_DIR and the config file).
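
The precedence order can be sketched as a layered merge. This is a simplified illustration, not the code in config_loader.py: resolve_config and the shape of cli_args are hypothetical, only some of the variables are mapped, and .env handling is left out because python-dotenv places those values into the environment before a step like this would run.

```python
import os

# Environment variables and the config.yaml keys they override
# (a subset of the variables documented above).
ENV_OVERRIDES = {
    "THUMBNAILS_DIR": ("paths", "thumbnails_dir"),
    "TIFS_DIR": ("paths", "tifs_dir"),
    "INDEX_PATH": ("index", "index_path"),
    "METADATA_PATH": ("index", "metadata_path"),
    "SERVER_HOST": ("server", "host"),
    "SERVER_PORT": ("server", "port"),
}


def resolve_config(file_config, environ=None, cli_args=None):
    """Merge settings: config file < environment variables < CLI arguments."""
    environ = os.environ if environ is None else environ
    # Copy each section so the caller's dictionary is not mutated.
    merged = {section: dict(values) for section, values in file_config.items()}
    for var, (section, key) in ENV_OVERRIDES.items():
        if environ.get(var):
            # Environment values are strings; a real loader would cast the port.
            merged.setdefault(section, {})[key] = environ[var]
    for (section, key), value in (cli_args or {}).items():
        if value is not None:
            merged.setdefault(section, {})[key] = value
    return merged


file_config = {"paths": {"thumbnails_dir": "/from/config", "tifs_dir": ""}}
env = {"THUMBNAILS_DIR": "/from/env"}
cli = {("paths", "thumbnails_dir"): "/from/cli"}
print(resolve_config(file_config, env)["paths"]["thumbnails_dir"])
print(resolve_config(file_config, env, cli)["paths"]["thumbnails_dir"])
```

The first call picks up the environment value and the second the CLI value, while file_config itself stays unchanged.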


Building the Index

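Build the FAISS index and metadata before starting the web app. Re-run the full build whenever you remove, rename, or change thumbnails; if you have only added new thumbnails, the incremental indexer is enough (see Adding New Images Periodically).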

From the project root (TextileSearchApp/):

source .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
cd backend

If config.yaml has paths.thumbnails_dir and paths.tifs_dir set, you can run:

python indexer.py

Otherwise pass paths explicitly (CLI overrides config):

python indexer.py \
  --thumbnails_dir "/path/to/thumbnails" \
  --tifs_dir "/path/to/tifs"

Indexer arguments (all optional when config is set):

| Argument | Default (from config) | Description |
| --- | --- | --- |
| --thumbnails_dir | config paths.thumbnails_dir | Root directory of thumbnail images (JPG/PNG). |
| --tifs_dir | config paths.tifs_dir | Root directory of TIF files. |
| --index_path | config index.index_path | Where to save the FAISS index. |
| --metadata_path | config index.metadata_path | Where to save path metadata. |
| --device | auto (cuda if available) | cuda or cpu. |
| --clip_model_path | config / CLIP_MODEL_PATH | Local CLIP .pt file. |

Example (CPU only):

python indexer.py --thumbnails_dir /data/thumbnails --tifs_dir /data/tifs --device cpu

When finished, you should see data/faiss_index.bin and data/metadata.npy under backend/. The app reads the same paths from config.yaml.


Adding New Images Periodically

When you add new fabric designs (new thumbnails and optionally new TIFs) to your folders, you can update the search index without rebuilding from scratch by using the incremental indexer. This keeps your faiss_index.bin and metadata.npy in sync with the latest designs.

Think of it like this:

  • indexer.py (full index) = build everything from zero.
  • incremental_indexer.py (incremental) = only add the new pictures you just copied in.

You never edit faiss_index.bin manually; the scripts do it for you.

Quick recipe: update with new designs

  1. Copy new thumbnails

    • Put new JPG/PNG thumbnails into the folder configured as paths.thumbnails_dir in config.yaml.
      Example (Linux / Windows WSL):
      paths:
        thumbnails_dir: "/mnt/c/Users/you/Designs/thumbs"
        tifs_dir: "/mnt/c/Users/you/Designs/tifs"   # optional
    • If you also store TIFs, copy the matching TIFs into paths.tifs_dir using the same stem:
      • KD00256_2.png → KD00256_2.tif
  2. Run the incremental indexer

    From the project root (TextileSearchApp/), with your virtual environment activated:

    # 1) Activate the virtual environment (Linux/macOS)
    source .venv/bin/activate
    
    # 2) Go to the backend folder
    cd backend
    
    # 3) Update the index with only the new thumbnails
    python incremental_indexer.py

    On Windows PowerShell (if not using WSL), it looks like:

    .venv\Scripts\Activate.ps1
    cd backend
    python incremental_indexer.py

    What this script does (in simple terms):

    • Loads your existing data/faiss_index.bin and data/metadata.npy.
    • Scans paths.thumbnails_dir for all thumbnail files.
    • Compares them to what is already in metadata.npy.
    • For files that are new:
      • Calculates their CLIP embeddings.
      • Appends them to the FAISS index (faiss_index.bin).
      • Adds entries to metadata.npy (including thumbnail path, TIF path, and file date).
    • Leaves already-indexed files unchanged.

    If there are no new thumbnails, it prints a message and exits without modifying the index.

  3. Restart the web app

    The Flask app keeps a copy of the index in memory. To see the newly indexed designs:

    # In backend/, where app.py lives
    python app.py

    If the app is already running, stop it with Ctrl+C in that terminal, then run python app.py again.

    After restart:

    • The new designs participate in text search (including filename-based matching).
    • The new designs participate in image search.
    • The time filter (Designs: Show all / Up to 1 week / etc.) will treat them as “new” based on their file date.
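
The "compare against what is already indexed" step of the recipe above can be sketched as follows. This is a simplified illustration rather than the code in incremental_indexer.py: find_new_thumbnails is a hypothetical name, indexed_paths stands in for the thumbnail paths stored in metadata.npy, and accepting the .jpeg suffix alongside .jpg and .png is an assumption.

```python
from pathlib import Path

# Assumption: .jpeg is accepted in addition to the documented JPG/PNG.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_new_thumbnails(thumbnails_dir, indexed_paths):
    """Return (path, mtime) for thumbnails that are not yet indexed."""
    known = {Path(p).resolve() for p in indexed_paths}
    found = []
    # rglob("*") walks the directory recursively, as the indexer is said to do.
    for path in sorted(Path(thumbnails_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            if path.resolve() not in known:
                # Keep the modification time so the time filter can use it.
                found.append((str(path), path.stat().st_mtime))
    return found


# Example call (paths are placeholders):
# new_items = find_new_thumbnails("/path/to/thumbnails", already_indexed_paths)
```

Only the files returned here would need CLIP embeddings; an empty list corresponds to the "no new thumbnails" case in which the index is left untouched.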

How often should you run it?

  • Every time you add a batch of new thumbnails (e.g. once a day or once a week).
  • It is safe to run even if there are no new files; it will simply do nothing.

Automating updates (optional)

To add new images automatically on a schedule (e.g. nightly), run the incremental indexer from cron (Linux/macOS) or Task Scheduler (Windows), then restart the app or use a process manager that reloads after the script runs.

Example cron job (Linux, run at 2 a.m. every day):

0 2 * * * cd /path/to/TextileSearchApp/backend && /path/to/.venv/bin/python incremental_indexer.py >> /path/to/TextileSearchApp/incremental.log 2>&1

After this cron runs, make sure your app is restarted or configured to reload (for a simple setup, you can just restart python app.py manually each morning).

When to use full index vs incremental

  • Full index (indexer.py):

    • Use the first time you set up the project.
    • Use when you move/rename many files, or drastically change the contents of thumbnails_dir and tifs_dir.
    • Rebuilds the entire faiss_index.bin and metadata.npy from what is currently on disk.
  • Incremental (incremental_indexer.py):

    • Use when you add new designs but keep the old ones.
    • Only processes the new thumbnails and appends them to the existing index.
    • Much faster when you have thousands of existing images and only a few new ones.

Time filter (design age)

You can restrict search results by design age, which is based on each file’s modification time as recorded at index time.

In the UI

Next to Show X results, a Designs dropdown offers:

  • Show all (default) — no time filter
  • Up to 1 week — only designs whose file date (at index time) is within the last 7 days
  • Up to 1 month, Up to 3 months, Up to 6 months, Up to 1 year — same idea for 30, 90, 180, and 365 days

Changing the dropdown re-runs the current search with the selected filter. Pagination and result count apply to the filtered set.

How it works

  • At index time (full index or incremental), the indexer stores each thumbnail’s file modification time (mtime) in the metadata.
  • When you choose a time range, the backend only returns results whose stored mtime is within that many days from “now”.
  • Indices built before this feature have no mtime in metadata. Those entries are treated as very old: they appear under Show all but not under any “Up to X” filter. A full rebuild (python indexer.py) stores mtime for every entry. Running only the incremental indexer stores mtime for new entries, but older entries stay excluded from time-filtered results until you do a full rebuild.
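
The age check can be sketched like this. It is a minimal illustration and not the backend's actual code: filter_by_age and the "mtime" key are hypothetical names, and results are assumed to be dictionaries.

```python
import time

SECONDS_PER_DAY = 86400


def filter_by_age(results, max_age_days, now=None):
    """Keep results whose stored mtime is within the last max_age_days.

    A missing or zero max_age_days disables the filter. Entries without
    an "mtime" key (older indices) are dropped whenever a filter is
    active, which matches treating them as very old.
    """
    if not max_age_days:
        return list(results)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [r for r in results if r.get("mtime") is not None and r["mtime"] >= cutoff]


now = 1_700_000_000.0  # fixed timestamp so the example is reproducible
hits = [
    {"tif_path": "new.tif", "mtime": now - 2 * SECONDS_PER_DAY},
    {"tif_path": "old.tif", "mtime": now - 40 * SECONDS_PER_DAY},
    {"tif_path": "legacy.tif"},  # indexed before mtime was stored
]
print([h["tif_path"] for h in filter_by_age(hits, 7, now=now)])
print(len(filter_by_age(hits, 0, now=now)))
```

With a 7-day window only new.tif remains, while a value of 0 returns all three entries, including the legacy one.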

Running the Application

From the project root:

source .venv/bin/activate
cd backend
python app.py

The server starts at http://0.0.0.0:8000 (all interfaces). Open a browser to http://localhost:8000 (or your configured host and port).

Ensure config.yaml has valid paths.thumbnails_dir (paths.tifs_dir is optional; if empty, results show the thumbnail filename). Build the index so data/faiss_index.bin and data/metadata.npy exist; otherwise the app will exit at startup. Host and port come from config.yaml or SERVER_HOST / SERVER_PORT.


Web Interface

  • Text query: Type a phrase (e.g. "red floral fabric pattern" or "dupatta floral") and click Search by text or press Enter. Matches in thumbnail filenames (e.g. dupatta_floral.jpg) are shown first; remaining results are ordered by CLIP similarity.
  • Image query: Drag and drop an image onto the drop zone, or click the zone to choose a file (JPG/PNG).
  • Results: Matches are shown in a grid with thumbnail, similarity score, and TIF path. Use Show X results (10/20/50/100) and Designs: Show all | Up to 1 week | … | Up to 1 year to filter by design age (file date at index time). Pagination applies to the (possibly filtered) set.
  • Clear: Resets the text box and results.

Thumbnails in the results are served by the Flask app from THUMBNAILS_DIR. If thumbnails do not show, check Troubleshooting (path/relative path handling).


API Reference

Base URL: http://localhost:8000 (or your host/port).

Text search

  • Endpoint: POST /api/search/text
  • Headers: Content-Type: application/json
  • Body:
    {
      "query": "red floral fabric pattern",
      "top_k": 10
    }
  • Response (200):
    {
      "results": [
        {
          "thumbnail_path": "/path/to/thumbnails/design_001.png",
          "thumbnail_url": "/thumbnails/design_001.png",
          "tif_path": "/path/to/tifs/design_001.tif",
          "score": 0.312
        }
      ],
      "total": 1500
    }
    Optional request fields: offset (for pagination), top_k (1–500, default 10), max_age_days (optional; restrict to designs with file mtime within the last N days; omit or 0 for no filter).
  • Errors: 400 if query is missing/empty; 500 on server errors.
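
A client call can be sketched with the standard library alone. The helper build_text_search_request is a hypothetical name; the endpoint, field names, and base URL are taken from this section. The request is only constructed here, and the lines that send it are commented out because they need a running server.

```python
import json
import urllib.request


def build_text_search_request(query, top_k=10, offset=None, max_age_days=None,
                              base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/search/text."""
    payload = {"query": query, "top_k": top_k}
    # Optional fields are included only when the caller sets them.
    if offset is not None:
        payload["offset"] = offset
    if max_age_days is not None:
        payload["max_age_days"] = max_age_days
    return urllib.request.Request(
        f"{base_url}/api/search/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_text_search_request("red floral fabric pattern", top_k=10)

# With the server running, send it like this:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     for hit in body["results"]:
#         print(hit["score"], hit["tif_path"])
```

For later pages, pass offset together with the same top_k.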

Image search

  • Endpoint: POST /api/search/image
  • Content-Type: multipart/form-data
  • Fields:
    • image: image file (e.g. JPG/PNG)
    • top_k: optional integer (default 10)
    • offset: optional integer (for pagination)
    • max_age_days: optional integer (restrict to designs with file mtime within the last N days; omit or 0 for no filter)
  • Response: Same results structure as text search.
  • Errors: 400 if no file or unsupported type; 500 on server errors.
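
An image query needs a multipart/form-data body. The sketch below builds one with the standard library only. encode_multipart is a hypothetical helper, the placeholder bytes stand in for a real image file, and the request is constructed but not sent.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes, content_type):
    """Encode form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


# Placeholder bytes; in practice read them from a JPG/PNG file.
body, content_type = encode_multipart(
    {"top_k": 10}, "image", "query.png", b"\x89PNG", "image/png"
)
request = urllib.request.Request(
    "http://localhost:8000/api/search/image",
    data=body,
    headers={"Content-Type": content_type},
    method="POST",
)
# With the server running: urllib.request.urlopen(request)
```

The response can then be parsed the same way as for text search, since both endpoints return the same results structure.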

Thumbnail serving

  • Endpoint: GET /thumbnails/<path:filename>
  • Serves files from THUMBNAILS_DIR. filename should be the path relative to THUMBNAILS_DIR (e.g. if thumbnail is THUMBNAILS_DIR/abc/def.png, use /thumbnails/abc/def.png).

Troubleshooting

"Index file not found" / "Metadata file not found"

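  • Run the indexer first, and make sure it writes to the same index and metadata paths the app reads (index.index_path and index.metadata_path in config.yaml, or the INDEX_PATH / METADATA_PATH overrides).
  • Ensure those paths point to existing faiss_index.bin and metadata.npy files (e.g. under backend/data/).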

Thumbnails not loading in the browser

  • The frontend requests thumbnails via /thumbnails/<path>. If metadata stores absolute paths, the frontend may be requesting a path the server doesn’t recognize. Store relative paths in metadata (relative to THUMBNAILS_DIR) and use thumbnail_path as the <path> in /thumbnails/. Adjust search_engine.py in build_index() to save os.path.relpath(p, self.thumbnails_dir) and ensure serve_thumbnail serves with send_from_directory(THUMBNAILS_DIR, filename).
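
Turning a stored path into a /thumbnails/ URL can be sketched as follows. thumbnail_url is a hypothetical helper, not a function from this project; it uses os.path.relpath as suggested above and rejects paths that fall outside the thumbnails directory.

```python
import os
from urllib.parse import quote


def thumbnail_url(thumbnail_path, thumbnails_dir):
    """Build the /thumbnails/<path> URL for a file under thumbnails_dir."""
    rel = os.path.relpath(thumbnail_path, thumbnails_dir)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{thumbnail_path} is not inside {thumbnails_dir}")
    # URLs always use forward slashes; quote() escapes spaces and similar.
    return "/thumbnails/" + quote(rel.replace(os.sep, "/"))


print(thumbnail_url("/data/thumbs/abc/def.png", "/data/thumbs"))
```

On the server side, the same relative path is what send_from_directory(THUMBNAILS_DIR, filename) expects as filename.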

CUDA / GPU errors

  • To force CPU: in both indexer.py and app.py, pass device="cpu" when creating ClipSearchEngine.

Connection reset / URLError when downloading CLIP

  • The first run downloads the CLIP model from the internet; firewalls, proxies, or unstable networks can cause Connection reset by peer or URLError.
  • Fix 1 – Retry script: From backend/ run:
    python download_clip_model.py
    It retries the download with a longer timeout; if it still fails, it prints manual download instructions.
  • Fix 2 – Manual download: Open this URL in a browser (or on another machine with stable internet), then save the file as ~/.cache/clip/ViT-B-32.pt (create ~/.cache/clip if needed):
    https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
    
    Then run the indexer again; CLIP will use the cached file.
  • Fix 3 – Local path: Download the .pt file anywhere, then run the indexer with:
    python indexer.py ... --clip_model_path /path/to/ViT-B-32.pt
    Or set the environment variable: export CLIP_MODEL_PATH=/path/to/ViT-B-32.pt.

Leading spaces in paths

  • If you pass paths in quotes, avoid a leading space (e.g. use "/mnt/c/Users/..." not " /mnt/c/Users/..."). The indexer now strips whitespace from directory arguments.

CLIP or torch import errors

  • Activate the same venv and reinstall: pip install -r requirements.txt. Use Python 3.10+.

Poor or irrelevant search results

  • Use the same CLIP model (e.g. ViT-B/32) for indexing and querying (default in ClipSearchEngine).
  • Ensure images are valid (not corrupted); CLIP’s preprocess resizes/normalizes automatically.

Large dataset: slow indexing

  • Increase batch size in search_engine.py (build_index(batch_size=128) or 256) if you have enough RAM.
  • Use GPU: --device cuda for the indexer.

Optimization & Scaling

  • Batch size: In ClipSearchEngine.build_index(), increase batch_size (e.g. 128–256) for faster indexing when RAM allows.
  • GPU: Set device="cuda" in the indexer and in app.py for faster embedding and search.
  • Very large corpora (e.g. >100k images): Consider switching from IndexFlatIP to an approximate FAISS index (e.g. IndexIVFFlat or IndexHNSWFlat) for faster search at the cost of slightly lower recall; keep METRIC_INNER_PRODUCT for normalized vectors.
  • Maintenance: For new images, run incremental_indexer.py then restart the app. For a full refresh, re-run indexer.py. Back up data/faiss_index.bin and data/metadata.npy if needed.
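
To make the IndexFlatIP remark concrete: a flat inner-product index scores the query against every stored vector, and on L2-normalized vectors the inner product equals cosine similarity. The pure-Python sketch below reproduces that computation on toy 2-D vectors. It does not call FAISS, the function names are illustrative, and it assumes no vector is all zeros.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k_inner_product(query, vectors, k):
    """Exhaustive inner-product search over normalized vectors.

    Returns (score, index) pairs, best first. Cost grows linearly with
    the number of vectors, which is why approximate indexes become
    attractive for very large collections.
    """
    q = normalize(query)
    scores = [
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    ]
    return sorted(scores, reverse=True)[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for score, index in top_k_inner_product([2.0, 0.0], vectors, k=2):
    print(index, round(score, 3))
```

The query points the same way as vector 0, so that entry scores 1.0, followed by vector 2 at about 0.707.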

License & Credits

This project is for local use and does not send data to external services after initial dependency and model download.

About

App to match the Fabric Designs with the real images.
