We have designed and implemented an image-based query search engine that strikes a good balance between efficiency and accuracy. Users can submit any image they want to search for, and the engine will return similar images from a custom database. The main components of the system are shown in the diagram below:
This work combines three master's thesis projects. Feel free to check out our theses via the links below:
- Yanan Hu (Feature Extraction): https://repository.tudelft.nl/islandora/object/uuid%3Ae0d2bc46-3caa-43ae-b83f-37b830757eac
- Yuanyuan Yao (Nearest neighbour search): https://repository.tudelft.nl/islandora/object/uuid:4a2c9c6f-b2b8-41d6-9b70-69c4f246c964
- Qi Zhang (Re-ranking): https://repository.tudelft.nl/islandora/object/uuid%3A32e02913-ba0d-446a-9807-1129ba4a314b
Note: there are many code files, but most of them exist only for testing individual modules. Only a small number of files are needed to use our model and reproduce our results; refer to the detailed descriptions below.
- Create and activate a virtual Python environment
- Install the packages listed in `requirements.txt` with `pip install -r requirements.txt`. Faiss needs to be installed manually: `pip install faiss-gpu`. Make sure the installed torch and CUDA versions are compatible.
- Download the pretrained network from https://drive.google.com/drive/folders/1JbGNvQgqKm7GiUvOqw1DSncSVR3k0xbm?usp=sharing and save it under `data/networks`
- Change the paths in the function `extr_selfmade_dataset` (`src/networks/imageretrievalnet.py`) to the paths of your datasets (which are just folders containing jpg images)
- Create symbolic links to map your datasets under `static/test/`
- Run offline.py to extract and save the features of images
python3 -m src.offline --datasets 'YOUR_DATASET_1, YOUR_DATASET_2, …, YOUR_DATASET_N' --gpu '0' --network 'resnet101-solar-best.pth' --K-nearest-neighbour 100
- The datasets will be merged into a single database. Given a query image, the engine finds the most similar images in this database.
- If no GPU is available, replace `--gpu '0'` with `--NoGPU`.
- If the database is large-scale (>100k images), you may need an approximate nearest neighbour search method, e.g., ANNOY. Select it by appending `--matching_method 'ANNOY' --ifgenerate` to the command above. It is normal for offline.py to run for a long time (even for days if the database is million-scale and HNSW or PQ_HNSW is chosen).
- Pay attention to the paths of the outputs. You can find and modify these settings in the functions `save_path_feature` and `load_path_feature` (`src/utils/general.py`).
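Before running offline.py, it can help to sanity-check that each dataset folder actually contains jpg images. A minimal stdlib sketch (the commented-out path is hypothetical):

```python
from pathlib import Path

def count_jpgs(folder):
    """Count the .jpg images directly inside a dataset folder."""
    return sum(1 for p in Path(folder).iterdir()
               if p.is_file() and p.suffix.lower() == ".jpg")

# Example (hypothetical dataset path):
# print(count_jpgs("static/test/YOUR_DATASET_1"))
```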
- Run online.py
python3 -m src.online --datasets 'YOUR_DATASET_1, YOUR_DATASET_2, …, YOUR_DATASET_N' --gpu '0' --network 'resnet101-solar-best.pth' --K-nearest-neighbour 100
- The datasets and network must be exactly the same as the ones you chose when running offline.py.
- If no GPU is available, replace `--gpu '0'` with `--NoGPU`.
- Use approximate nearest neighbour search methods if necessary, but do not include `--ifgenerate`, since the required data structures have already been generated.
- After it starts, a link will appear; open it to reach the GUI, upload a query image, and wait for the results.
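Under the hood, the online stage boils down to comparing L2-normalised feature vectors: retrieval is a top-K similarity ranking over the database. A minimal numpy sketch of that ranking step (not the project's actual code; `top_k` is a made-up name):

```python
import numpy as np

def top_k(query_vec, db_vecs, k=5):
    """Return indices of the k database vectors most similar to the query.

    Assumes query_vec and the rows of db_vecs are L2-normalised, so the
    dot product equals cosine similarity."""
    sims = db_vecs @ query_vec          # (N,) cosine similarities
    return np.argsort(-sims)[:k]        # best first

# Toy example with random unit vectors:
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[42]                              # query identical to database item 42
print(top_k(q, db, k=3)[0])             # item 42 ranks first → 42
```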
To retrain the model
If you want to retrain the model yourself, an example training script is located in `src/main_train.py`. First make sure you have downloaded the training dataset (Sfm120k or GoogleLandmarksv2) into `data/train/`; then start training by running
python3 -m src.main_train [-h] [--training-dataset DATASET] [--no-val]
[--test-datasets DATASETS] [--test-whiten DATASET]
[--test-freq N] [--arch ARCH] [--pool POOL]
[--local-whitening] [--regional] [--whitening]
[--not-pretrained] [--loss LOSS] [--loss-margin LM]
[--image-size N] [--neg-num N] [--query-size N]
[--pool-size N] [--gpu-id N] [--workers N] [--epochs N]
[--batch-size N] [--optimizer OPTIMIZER] [--lr LR] [--ld LD]
[--soa] [--weight-decay W] [--soa-layers N] [--sos] [--lambda N]
[--print-freq N] [--flatten-desc]
EXPORT_DIR
To reproduce the results
python3 -m src.test_rOP1m
- Add `--include1m` if you want to include the 1 million distractors. Before that, download the pre-extracted feature vectors of the 1 million distractors via https://drive.google.com/file/d/1A8CEAXkMZ_o3zl1IRzQ_RSclciLhkTVY/view?usp=sharing. (Save the file wherever you want, but do not forget to change the path in test_rOP1m.py.)
- Add `--ifextracted` if the features of the images in revisited Oxford and Paris have already been extracted.
- Choose the test mode by specifying `--mode`. Use `--mode 'mAP'` (the default) to reproduce the mean-average-precision results, and use `--mode 'num_images_to_be_retrieved'` to reproduce the retrieval-time results. E.g., with `--mode '100'` you get the retrieval time for the top-100 results. It is recommended to first use `--mode 'mAP'` and then `--mode 'num_images_to_be_retrieved' --ifextracted` to mimic the offline-online procedure.
List of nearest neighbour search methods you can choose from
Nearest neighbour search methods are necessary for large-scale datasets (>100k). Implementations of all nearest neighbour search methods can be found in src/utils/nnsearch.py. (Not all of them are integrated into the final system.)
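To see why approximate methods help, note that exact search compares the query against every database vector, while approximate methods only examine a small candidate set. A toy sketch of one underlying idea, random-hyperplane bucketing (this illustrates the random-projection principle behind ANNOY, not the actual ANNOY implementation):

```python
import numpy as np

def build_buckets(vecs, planes):
    """Hash each vector to a bucket by the signs of its random projections."""
    codes = (vecs @ planes.T > 0)                 # (N, n_planes) booleans
    buckets = {}
    for i, code in enumerate(map(tuple, codes)):
        buckets.setdefault(code, []).append(i)
    return buckets

def ann_query(q, vecs, planes, buckets):
    """Search only the candidates that share the query's hash bucket."""
    code = tuple(q @ planes.T > 0)
    cand = list(buckets.get(code, range(len(vecs))))  # fall back to full scan
    sims = vecs[cand] @ q
    return cand[int(np.argmax(sims))]

rng = np.random.default_rng(1)
vecs = rng.normal(size=(1000, 16))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
planes = rng.normal(size=(6, 16))                 # 6 hyperplanes → 64 buckets
buckets = build_buckets(vecs, planes)
print(ann_query(vecs[7], vecs, planes, buckets))  # finds the vector itself → 7
```

Real libraries such as ANNOY and HNSW use trees or graphs instead of a single hash table, which gives far better recall at the same speed.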
- Product Quantization (`--matching_method 'PQ'`):
  `matching_Nano_PQ(K, embedded_features_train, embedded_features_test, dataset, N_books=16, n_bits_perbook=8, ifgenerate=True)`
- ANNOY (`--matching_method 'ANNOY'`):
  `matching_ANNOY(K, embedded_features_train, embedded_features_test, metric, dataset, n_trees=100, ifgenerate=True)`
- Hierarchical Navigable Small World (`--matching_method 'HNSW'`):
  `matching_HNSW(K, embedded_features_train, embedded_features_test, dataset, m=4, ef=8, ifgenerate=True)`
- Product Quantization + Hierarchical Navigable Small World (`--matching_method 'PQ_HNSW'`):
  `matching_HNSW_NanoPQ(K, embedded_features, embedded_features_test, dataset, N_books=16, N_words=256, m=4, ef=8, ifgenerate=True)`
See the code comments for the meaning of the variables.
Recommendation: ANNOY (efficient), HNSW (accurate), PQ+HNSW (only when memory is an issue)
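For intuition on the memory savings: product quantization splits each vector into `N_books` sub-vectors and stores only one codeword index per sub-space. A toy numpy sketch (random database samples stand in for trained k-means centroids; the variable names mirror the parameters above, but the code is illustrative only):

```python
import numpy as np

def pq_encode(vecs, codebooks):
    """Encode each vector as one nearest-codeword index per sub-space."""
    n_books = len(codebooks)
    subs = np.split(vecs, n_books, axis=1)        # one slice per book
    return np.stack(
        [np.argmin(((s[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
         for s, cb in zip(subs, codebooks)], axis=1)

def pq_decode(codes, codebooks):
    """Reconstruct approximate vectors from their codes."""
    return np.hstack([cb[codes[:, b]] for b, cb in enumerate(codebooks)])

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 32))                    # 200 vectors, 32 dims
n_books, n_words = 4, 16                          # 4 sub-spaces, 16 centroids
# Toy codebooks: random database samples instead of trained k-means centroids.
codebooks = [X[rng.choice(200, n_words, replace=False), b*8:(b+1)*8]
             for b in range(n_books)]
codes = pq_encode(X, codebooks)
print(codes.shape)                                # → (200, 4)
```

Each vector now costs 4 small indices instead of 32 floats; `pq_decode` recovers an approximation good enough for coarse distance computations.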
List of re-ranking methods you can choose from
You can choose from three re-ranking methods (QGE, SAHA, and LoFTR), the implementations of which can be found in src/utils/Reranking.py. The default one is QGE.
- QGE: `QGE(ranks, qvecs, vecs, dataset, gnd, cache_dir, gnd_path2, AQE)`
- SAHA: `sift_online(query_num, qimages, sift_q_main_path, images, sift_g_main_path, ranks, dataset, gnd)`
- LoFTR: `loftr(loftr_weight_path, query_num, qimages, ranks, images, dataset, gnd)`
If you want to use QGE, you need to create a directory (named after the dataset) under `src/diffusion/tmp`.
If you change the database, delete the cached results of the previous random walk.
If you want to use LoFTR, download the pretrained LoFTR weights from https://github.com/zju3dv/LoFTR and put them under `src/utils/weights/`.
Useful information can be found in the code comments in src/utils/Reranking.py and src/test_reranking.py.
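For reference, the `AQE` argument in the QGE signature above refers to average query expansion: the query vector is averaged with its top-ranked results and the search is repeated with the expanded query. A minimal numpy sketch of the idea (illustrative only, not the project's implementation):

```python
import numpy as np

def average_query_expansion(qvec, vecs, ranks, n_qe=5):
    """Replace the query by the mean of itself and its top-n_qe results,
    then re-normalise; ranking is then redone with the expanded query."""
    expanded = (qvec + vecs[ranks[:n_qe]].sum(axis=0)) / (n_qe + 1)
    return expanded / np.linalg.norm(expanded)

# Toy example with random unit vectors:
rng = np.random.default_rng(3)
vecs = rng.normal(size=(50, 8))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
qvec = vecs[0]
ranks = np.argsort(-(vecs @ qvec))          # initial ranking, best first
q2 = average_query_expansion(qvec, vecs, ranks)
print(round(float(np.linalg.norm(q2)), 6))  # expanded query is unit-norm → 1.0
```

Pulling in the top results makes the expanded query more robust to viewpoint and lighting changes in the original query image, which is why it is a common cheap re-ranking baseline.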