Image Similarity Search Engine

A high-performance image similarity search engine built from scratch. This project uses a pre-trained CNN in Python/PyTorch for high-dimensional feature extraction and a custom k-d tree implementation in C for efficient nearest-neighbor search.

It's a practical demonstration of how modern AI pipelines and classic, high-performance data structures can work together to solve complex problems.

Core Concepts

This project is divided into two main components:

Feature Extraction (The "Visual Brain")
- A Python script uses a pre-trained Convolutional Neural Network (MobileNetV2) to convert images into meaningful 1280-dimensional feature vectors (embeddings).
- This process, known as Transfer Learning, leverages a model already trained on millions of images to understand the "content" of a new image and represent it numerically.
Efficient Search (The "Librarian Brain")
- A C program loads the 50,000 feature vectors generated by the Python script.
- To avoid a slow linear scan, it organizes these high-dimensional points into a k-d tree, a specialized binary search tree for spatial data.
- This enables a fast nearest-neighbor search for the image with the smallest Euclidean distance to a query image, using a pruning step that skips any subtree that cannot contain a closer point.
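A minimal sketch of how such a k-d tree could be built and searched in C, assuming all vectors sit in one flat, row-major float array (row i starts at points + i * dim). The names kd_build, kd_nearest and kd_free are hypothetical and are not taken from the project's c_search/src code. The tree splits at the median, cycles through dimensions by depth, and compares squared distances, which preserve the Euclidean ordering without calling sqrt.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* A tree node stores the row index of one point and its splitting axis. */
typedef struct KdNode {
    int point;                   /* row index into the flat points array */
    int axis;                    /* dimension this node splits on */
    struct KdNode *left, *right; /* coordinates <= / >= the split value */
} KdNode;

/* Squared Euclidean distance: same ordering as the true distance, no sqrt. */
static float sq_dist(const float *a, const float *b, int dim) {
    float s = 0.0f;
    for (int i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

/* Plain qsort has no user-data argument, so the comparator reads these
   file-scope variables (acceptable for a single-threaded sketch). */
static const float *g_points;
static int g_dim, g_axis;

static int cmp_on_axis(const void *a, const void *b) {
    float x = g_points[(size_t)*(const int *)a * g_dim + g_axis];
    float y = g_points[(size_t)*(const int *)b * g_dim + g_axis];
    return (x > y) - (x < y);
}

/* Sort idx[0..n) on this depth's axis and store the median in the node. */
static KdNode *build(int *idx, int n, int depth) {
    if (n <= 0) return NULL;
    KdNode *node = malloc(sizeof *node);
    if (node == NULL) abort();
    node->axis = depth % g_dim;
    g_axis = node->axis;
    qsort(idx, (size_t)n, sizeof *idx, cmp_on_axis);
    int mid = n / 2;
    node->point = idx[mid];
    node->left = build(idx, mid, depth + 1);
    node->right = build(idx + mid + 1, n - mid - 1, depth + 1);
    return node;
}

KdNode *kd_build(const float *points, int n, int dim) {
    assert(points != NULL && n > 0 && dim > 0);
    int *idx = malloc((size_t)n * sizeof *idx);
    if (idx == NULL) abort();
    for (int i = 0; i < n; i++) idx[i] = i;
    g_points = points;
    g_dim = dim;
    KdNode *root = build(idx, n, 0);
    free(idx);
    return root;
}

static void search(const KdNode *node, const float *points, int dim,
                   const float *q, int *best, float *best_d) {
    if (node == NULL) return;
    const float *p = points + (size_t)node->point * dim;
    float d = sq_dist(p, q, dim);
    if (d < *best_d) { *best_d = d; *best = node->point; }
    float diff = q[node->axis] - p[node->axis];
    const KdNode *nearer = diff < 0 ? node->left : node->right;
    const KdNode *farther = diff < 0 ? node->right : node->left;
    search(nearer, points, dim, q, best, best_d);
    /* Prune: any point beyond the splitting plane is at least |diff| away,
       so the far side is visited only if it could beat the best so far. */
    if (diff * diff < *best_d)
        search(farther, points, dim, q, best, best_d);
}

/* Returns the row index of the point closest to q (q has dim entries). */
int kd_nearest(const KdNode *root, const float *points, int dim, const float *q) {
    int best = -1;
    float best_d = FLT_MAX;
    search(root, points, dim, q, &best, &best_d);
    return best;
}

void kd_free(KdNode *node) {
    if (node == NULL) return;
    kd_free(node->left);
    kd_free(node->right);
    free(node);
}
```

With 50,000 points and median splits the tree is about 16 levels deep, so only the first 16 of the 1280 dimensions ever serve as split axes. This is one reason k-d tree pruning becomes much less effective on high-dimensional embeddings, and the speedup over a linear scan may be modest for this data.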

Project Structure

image-similarity-search/
├── c_search/              # C program for the k-d tree search
│   ├── src/
│   ├── include/
│   ├── data/
│   └── Makefile
│
├── python_extractor/      # Python script for feature extraction
│   ├── dataset/
│   ├── extract_features.py
│   └── requirements.txt
│
├── .gitignore             # Files and folders to ignore
├── .gitattributes         # Configures Git LFS for large files
└── README.md              # You are here

Setup and Installation

Prerequisites

- A C compiler (such as GCC) and make.
- Python 3.8+ and pip.
- Git LFS (for handling the large vectors.csv file).

Installation Steps

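Clone the repository. Git LFS must be installed and initialized (git lfs install) beforehand; otherwise vectors.csv is checked out as a small LFS pointer file instead of the actual data.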

git clone https://github.com/coderstale/image-similarity-search.git
cd image-similarity-search

Set up the Python environment:

cd python_extractor
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

The process is two steps: first generate the data, then run the search.

Step 1: Generate Feature Vectors

Run the Python script to download the CIFAR-10 dataset and generate the vectors.csv file.

# Make sure you are in the python_extractor/ directory with the venv active
python extract_features.py

This will create a large vectors.csv file in the c_search/data/ directory.
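The exact layout of vectors.csv is not documented here, so the loader below is a minimal sketch that assumes one image per line, each line holding exactly dim comma-separated floats, with no header row and no ID or label column; the parsing would need adjusting if the real file differs. load_vectors is a hypothetical name. It reads everything into a single flat, row-major array, which keeps the 50,000 x 1280 values (roughly 256 MB as 32-bit floats) contiguous.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads rows of `dim` comma-separated floats into one flat, row-major array
   (row i starts at result + i * dim) and stores the row count in *out_n.
   Returns NULL if allocation fails or no complete row was read. */
float *load_vectors(FILE *fp, int dim, int *out_n) {
    assert(fp != NULL && dim > 0 && out_n != NULL);
    size_t cap = 1024, n = 0;
    float *data = malloc(cap * (size_t)dim * sizeof *data);
    *out_n = 0;
    if (data == NULL) return NULL;
    for (;;) {
        if (n == cap) {                       /* grow geometrically */
            cap *= 2;
            float *grown = realloc(data, cap * (size_t)dim * sizeof *data);
            if (grown == NULL) { free(data); return NULL; }
            data = grown;
        }
        float *row = data + n * (size_t)dim;
        int j = 0;
        /* "%f" skips leading whitespace, including the previous newline.
           "%*[,]" consumes a separator; at the end of a line it matches
           nothing, but the float was already stored, so fscanf returns 1. */
        while (j < dim && fscanf(fp, "%f%*[,]", &row[j]) == 1) j++;
        if (j < dim) break;                   /* EOF or a short row: stop */
        n++;
    }
    if (n == 0) { free(data); return NULL; }
    *out_n = (int)n;
    return data;
}
```

The function takes an already-open FILE * so it can be exercised without fixed paths; from c_search/, a caller might open the file with fopen("data/vectors.csv", "r") and check the result for NULL. Note that this sketch counts values, not lines, so it does not detect a row with the wrong number of columns.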

Step 2: Compile and Run the C Search Program

Navigate to the C directory, compile the code with make, and run the application.

# From the project root (run cd .. first if you are still in python_extractor/), navigate to the C directory
cd c_search

# Compile the program
make

# Run the search application
./bin/search_app

The program will load the 50,000 vectors, build the k-d tree, and then prompt you to enter an image ID to find its most similar match.
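When the query is itself one of the indexed images, its own vector sits at distance zero, so a naive search would always return the query image. A minimal sketch of handling this, assuming the image ID is the 0-based row index in vectors.csv (the program's actual ID scheme is not described here): parse_query_id validates the typed ID, and nearest_other runs a linear scan that skips the query's own row. Both names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <float.h>
#include <stdlib.h>

/* Parses a typed image ID; returns it, or -1 unless the text is a single
   integer in [0, n), optionally surrounded by whitespace. */
int parse_query_id(const char *line, int n) {
    char *end;
    errno = 0;
    long id = strtol(line, &end, 10);
    if (end == line || errno == ERANGE) return -1;
    while (isspace((unsigned char)*end)) end++;
    if (*end != '\0' || id < 0 || id >= n) return -1;
    return (int)id;
}

/* Linear-scan nearest neighbor of row query_id, skipping the row itself
   (otherwise every query would match itself at distance 0). */
int nearest_other(const float *points, int n, int dim, int query_id) {
    assert(query_id >= 0 && query_id < n);
    const float *q = points + (size_t)query_id * dim;
    int best = -1;
    float best_d = FLT_MAX;
    for (int i = 0; i < n; i++) {
        if (i == query_id) continue;
        const float *p = points + (size_t)i * dim;
        float d = 0.0f;
        for (int j = 0; j < dim; j++) {
            float diff = p[j] - q[j];
            d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;   /* -1 only when n == 1 */
}
```

The same exclusion carries over to a k-d tree search: skip the best-so-far update when a node's point equals the query ID, but still descend into its subtrees. A linear scan like this also serves as a reference for checking the tree's answers.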

Future Work

Web Frontend: The C backend could be wrapped in a lightweight web server using an embeddable library such as Mongoose. A small HTML/JavaScript frontend could then provide a graphical interface for searching and displaying images.
K-Nearest Neighbors: The search algorithm could be extended to find the K nearest neighbors instead of just one, providing a gallery of similar images.
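One common way to extend the search to K neighbors is to keep the K best candidates in a small array sorted by distance and to prune against the K-th best distance instead of the single best one. The sketch below assumes a compile-time cap KMAX on K; KBest, kbest_init, kbest_offer, kbest_bound and knn_linear are hypothetical names, not part of the project.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define KMAX 32   /* hypothetical upper bound on K for this sketch */

/* The K best candidates seen so far, sorted by ascending squared distance. */
typedef struct {
    int k, count;
    int idx[KMAX];
    float dist[KMAX];
} KBest;

void kbest_init(KBest *kb, int k) {
    assert(k > 0 && k <= KMAX);
    kb->k = k;
    kb->count = 0;
}

/* Pruning bound: until K candidates exist, nothing may be pruned. */
float kbest_bound(const KBest *kb) {
    return kb->count < kb->k ? FLT_MAX : kb->dist[kb->k - 1];
}

/* One insertion-sort step: O(K) per accepted candidate. */
void kbest_offer(KBest *kb, int idx, float d) {
    if (kb->count == kb->k && d >= kb->dist[kb->k - 1]) return;
    int i = kb->count < kb->k ? kb->count++ : kb->k - 1; /* new slot or worst */
    while (i > 0 && kb->dist[i - 1] > d) {
        kb->dist[i] = kb->dist[i - 1];
        kb->idx[i] = kb->idx[i - 1];
        i--;
    }
    kb->dist[i] = d;
    kb->idx[i] = idx;
}

/* Example driver using a linear scan. In a k-d tree search, call
   kbest_offer at each node and visit the far side only when
   diff * diff < kbest_bound(kb). */
void knn_linear(const float *points, int n, int dim, const float *q, KBest *kb) {
    for (int i = 0; i < n; i++) {
        float d = 0.0f;
        for (int j = 0; j < dim; j++) {
            float diff = points[(size_t)i * dim + j] - q[j];
            d += diff * diff;
        }
        kbest_offer(kb, i, d);
    }
}
```

A max-heap would give O(log K) insertion, but for gallery-sized K a sorted array is simpler and returns the neighbors already in display order.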
