Image Finder CompTech2022

Image Finder is an image search service built as part of the CompTech2022 winter school.

You can try fast search by text or image over 2 million professional photos in the demo app.

About

The goal of the product is to help users quickly find the right images.

Survey

According to a survey conducted as part of CompTech2022, users typically look for images on their devices manually. The product will be useful for people whose professional activities or hobbies are related to photography. The survey also showed that it can be useful to any user.

The target audience

This product is aimed at all users.

System operation

Image search;
Search by text query (Russian/English);
Video search.

Structure

colabs - directory with research experiments in Colab notebooks;
test - directory with tests;
assets - directory with images;
main.py - main file that includes all classes and functions for the user-friendly web service;
faissindexer.py - contains the FAISS indexer, which stores image embeddings and searches for the nearest neighbors of a given text embedding;
dummyindexer.py - contains a simple indexer, which stores image embeddings and searches for the nearest neighbors of a given text embedding by one-vs-all comparison;
hnsw_indexer.py - contains the HNSW indexer, which stores image embeddings and searches for the nearest neighbors of a given text embedding by approximate nearest neighbor search;
embedder.py - contains wrapper classes for different CLIP models;
searchmodel.py - classes that connect indexers and CLIP embedders, and load and store indexed images and their paths;
CLIP_attention_maps.py - attention maps for the CLIP model;
ruCLIP_attention_maps.py - attention maps for the RuCLIP model;
requirements.txt - list of dependencies.
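
The one-vs-all comparison mentioned for dummyindexer.py can be sketched as follows. This is a minimal illustration using only numpy, not the repository's code: the class name BruteForceIndexer, its methods, and the toy 3-dimensional vectors are all hypothetical, and real embeddings would come from a CLIP model.

```python
import numpy as np


class BruteForceIndexer:
    """Hypothetical one-vs-all indexer (illustration only, not the repo's class)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.paths = []

    @staticmethod
    def _normalize(x):
        # Scale vectors to unit length so a dot product equals cosine similarity.
        x = np.asarray(x, dtype=np.float32)
        norms = np.linalg.norm(x, axis=-1, keepdims=True)
        return x / np.clip(norms, 1e-12, None)

    def add(self, embeddings, paths):
        # Store normalized image embeddings together with their file paths.
        embeddings = self._normalize(embeddings).reshape(-1, self.dim)
        self.vectors = np.vstack([self.vectors, embeddings])
        self.paths.extend(paths)

    def search(self, query, k=5):
        # Compare the query against every stored vector (one-vs-all).
        q = self._normalize(query)
        distances = 1.0 - self.vectors @ q  # cosine distance
        order = np.argsort(distances)[:k]
        return [(self.paths[i], float(distances[i])) for i in order]


# Toy data standing in for image embeddings and a text-query embedding.
indexer = BruteForceIndexer(dim=3)
indexer.add([[1, 0, 0], [0, 1, 0], [1, 1, 0]], ["cat.jpg", "dog.jpg", "both.jpg"])
results = indexer.search([1.0, 0.1, 0.0], k=2)
print(results)  # nearest first: cat.jpg, then both.jpg
```

An exhaustive scan like this is exact but grows linearly with the number of images, which is the usual motivation for approximate indexes such as FAISS or HNSW.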

Principle of operation

For example

Text query

Image

Usage

This project was tested on Python 3.7.

1. Clone the repository: git clone https://github.com/comptech-winter-school/image-finder.git
2. Install the required dependencies from requirements.txt.
3. Run the command: streamlit run main.py --server.port {PORT}
4. Copy IP-ADDRESS:PORT from the terminal and paste it into the browser.
5. Select the preferred indexer.
6. Select the search method: text query or image.
7. Select the number of output images.
8. To filter the output results, use the threshold slider.
9. The images are displayed sorted by cosine distance.
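
The last three steps (output count, threshold, sorting by cosine distance) can be illustrated with a small sketch. It assumes that the threshold is an upper bound on cosine distance; the app may define it differently (for example, as a minimum similarity). The function name filter_and_sort and the sample values are hypothetical.

```python
def filter_and_sort(results, threshold, count):
    """Keep (path, distance) pairs with distance <= threshold, nearest first.

    Assumption: `threshold` is a maximum cosine distance.
    """
    kept = [(path, dist) for path, dist in results if dist <= threshold]
    kept.sort(key=lambda item: item[1])
    return kept[:count]


# Made-up search output: (image path, cosine distance to the query).
raw = [("a.jpg", 0.42), ("b.jpg", 0.15), ("c.jpg", 0.80), ("d.jpg", 0.30)]
shown = filter_and_sort(raw, threshold=0.5, count=2)
print(shown)  # [('b.jpg', 0.15), ('d.jpg', 0.3)]
```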

Docker Build

To run the app in Docker, use these commands:

docker build -t streamlitapp:latest .
docker run -p 8501:8501 streamlitapp:latest

The app will be available at http://localhost:8501/

Models

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3.

RuCLIP (Russian Contrastive Language–Image Pretraining) is a multimodal model for computing image-text similarities and for rearranging captions and pictures. RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and multimodal learning.

ML experiments

RuCLIP vs RuCLIP-SB

To solve the problem of image search by Russian-language queries, two models were considered: RuCLIP and RuCLIP-SB.

For the analysis, the CIFAR-100 dataset was selected. It contains 100 classes with 600 images each, and every image is 32x32 pixels. Each class has 500 training images and 100 test images. This set is well suited for comparing models, as it has a wide variety of classes.

To evaluate the models, we solve a classification problem. Determining the class of an image can be treated as searching for that image with a query equal to the class label.
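
As a rough illustration of this setup, the sketch below assigns each image the label whose text embedding is most similar to the image embedding. The function classify and the 3-dimensional toy vectors are hypothetical; in the actual experiments the embeddings would come from RuCLIP or RuCLIP-SB.

```python
import numpy as np


def classify(image_embeddings, label_embeddings):
    """Return the index of the most similar label for each image, plus all scores."""
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    txt = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    similarity = img @ txt.T  # cosine similarity, shape (n_images, n_labels)
    return similarity.argmax(axis=1), similarity


# Toy stand-ins for text embeddings of class labels and for image embeddings.
labels = ["apple", "bus", "cat"]
label_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
image_vecs = np.array([[0.9, 0.1, 0.0], [0.2, 0.1, 0.7]])

predicted, scores = classify(image_vecs, label_vecs)
predicted_labels = [labels[i] for i in predicted]
print(predicted_labels)  # ['apple', 'cat']
```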

Precision, recall, accuracy and top-5 accuracy were analyzed. RuCLIP outperformed RuCLIP-SB on all of these metrics, so RuCLIP was chosen for image search by Russian-language queries.
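
For reference, top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The sketch below is one way to compute it with numpy; the function name top_k_accuracy and the score matrix are made up for illustration and are not the experiment's data.

```python
import numpy as np


def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of rows whose true label is among the k highest scores."""
    scores = np.asarray(scores)
    true_labels = np.asarray(true_labels)
    top_k = np.argsort(-scores, axis=1)[:, :k]  # indices of the k best classes
    hits = (top_k == true_labels[:, None]).any(axis=1)
    return float(hits.mean())


# Made-up similarity scores for 3 images over 3 classes.
scores = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
])
true = [1, 1, 0]

top1 = top_k_accuracy(scores, true, k=1)  # only the first image is correct
top2 = top_k_accuracy(scores, true, k=2)  # the second image is also recovered
print(top1, top2)
```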

RuCLIP vs RuCLIP-ONNX

Inference speed was compared between RuCLIP and the RuCLIP model optimized via ONNX. The optimized model is noticeably faster.
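
A comparison of this kind can be set up with a simple timing helper. The sketch below is an assumption about how such a measurement might look, not the procedure used in the experiment: mean_latency is a hypothetical helper, and slow_encode and fast_encode are stand-ins for the baseline and ONNX inference calls.

```python
import time


def mean_latency(fn, inputs, repeats=3):
    """Average wall-clock seconds per call of `fn` over `inputs`."""
    fn(inputs[0])  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        for item in inputs:
            fn(item)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))


def slow_encode(text):
    # Stand-in for baseline inference: does redundant work.
    return sum(ord(c) for c in text * 200)


def fast_encode(text):
    # Stand-in for optimized inference: same result, less work.
    return 200 * sum(ord(c) for c in text)


queries = ["a red car", "two dogs on the beach", "mountain lake"]
baseline = mean_latency(slow_encode, queries)
optimized = mean_latency(fast_encode, queries)
print(f"baseline: {baseline:.2e} s/call, optimized: {optimized:.2e} s/call")
```

When timing real models, it also helps to check that both variants return matching embeddings, so that the speed-up is not bought with a change in results.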

Attention maps

Processing results for queries in the singular and plural are almost the same

Processing results depending on the number of iterations

Libraries

pandas - Python library for data processing and analysis.
numpy - Python library that adds support for large multidimensional arrays and matrices.
faiss - library for efficient similarity search (nearest neighbors) over dense vectors.
nmslib - cross-platform similarity search library.
streamlit - open-source framework for building and sharing data apps.
torch - open-source deep learning framework.

Datasets:

CIFAR-100; URL: https://www.cs.toronto.edu/%7Ekriz/cifar.html
Unsplash; URL: https://unsplash.com/data

Team

Developers:
- Anna Glushkova,
- Vasiliy Dronov,
- Kirill Keller,
- Alexandr Minin,
- Maxim Mashtakov,
- Vladislav Kuznetsov,
- Dmitry Moskalev
Team Lead:
- Dmitry Moskalev
Mentors:
- Amir Uteuov,
- Vladimir Kilyazov.
