# Image Retrieval

The image retrieval system is already implemented.

The system takes the following inputs:

(required)
* project_dir
* database
* vocab_tree
* database_image_list
* query_image_list

(optional)
* num_images
* num_verifications
* max_num_features


-------------------------------------

We assume your project directory has the following structure. (You can use symbolic links for the images.)
```
/path/to/project/...
+── images
│   +── image1.jpg
│   +── image2.jpg
│   +── ...
│   +── imageN.jpg
```
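
The symbolic link mentioned above can also be created from Python. This is a minimal sketch, assuming the source image folder already exists; the helper name `link_images` is hypothetical and not part of the project.

```python
import os

def link_images(src_dir, project_dir):
    """Create <project_dir>/images as a symlink to src_dir (hypothetical helper)."""
    os.makedirs(project_dir, exist_ok=True)
    link_path = os.path.join(project_dir, "images")
    if not os.path.lexists(link_path):
        # Use an absolute target so the link resolves from any working directory.
        os.symlink(os.path.abspath(src_dir), link_path, target_is_directory=True)
    return link_path
```

Calling it a second time leaves an existing `images` entry untouched, so it is safe to re-run the cell.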

## Vocabulary

You can download a pre-trained vocabulary tree from https://demuc.de/colmap/#download

Put the file into `WORK_DIR`.


In [None]:
import os
from shell_util import run_cmd, run_cmd_get_output

EXE_DIR_PATH = "../build/src/exe/"
WORK_DIR = "../ir_demo/"
if not os.path.exists(WORK_DIR):
    os.mkdir(WORK_DIR)

In [None]:
DB_PATH = os.path.join(WORK_DIR, "database.db")
IMAGE_PATH = os.path.join(WORK_DIR, "images")

# 200k vocab trained on Paris6k with Colmap's vocabulary builder.
# VOCAB_TREE_PATH = os.path.join(WORK_DIR, "vocab_tree-200k-paris6k.bin")
# SAVE_INDEX_PATH = os.path.join(WORK_DIR, "vocab_tree-200k-paris6k_image_indexed.bin")

# Trained Vocabulary available at https://demuc.de/colmap/#download

# 64k vocab from official site https://demuc.de/colmap/
# VOCAB_TREE_PATH = os.path.join(WORK_DIR, "vocab_tree-65536.bin")
# SAVE_INDEX_PATH = os.path.join(WORK_DIR, "vocab_tree-65536_image_indexed.bin")

# 256k vocab from official site https://demuc.de/colmap/
VOCAB_TREE_PATH = os.path.join(WORK_DIR, "vocab_tree-262144.bin")
SAVE_INDEX_PATH = os.path.join(WORK_DIR, "vocab_tree-262144_image_indexed.bin")

# 1M vocab from official site https://demuc.de/colmap/
# VOCAB_TREE_PATH = os.path.join(WORK_DIR, "vocab_tree-1048576.bin")
# SAVE_INDEX_PATH = os.path.join(WORK_DIR, "vocab_tree-1048576_image_indexed.bin")

## Alternatively, you can build it from scratch

Use `vocab_tree_builder`.

See `./tutorial/Generate Vocabulary with Paris6k.ipynb` for detailed instructions.

# (Only for the first run) Create Database

We use an SQLite3 database to keep intermediate data for image retrieval.

In [None]:
if not os.path.exists(DB_PATH):
    # Create the database file only when it does not exist yet.
    cmd = "{EXE_DIR_PATH}database_creator --database_path {DB_PATH}".format(
        EXE_DIR_PATH=EXE_DIR_PATH,
        DB_PATH=DB_PATH)
    run_cmd(cmd, echo=True)
else:
    print("Cannot overwrite existing database file. Please manually delete the file.")
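
Because the database is a plain SQLite3 file, it can be inspected with Python's built-in `sqlite3` module. The sketch below lists every table with its row count. It makes no assumption about the schema, and the helper name `summarize_database` is hypothetical.

```python
import sqlite3

def summarize_database(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        return {name: conn.execute(
                    'SELECT COUNT(*) FROM "{}"'.format(name)).fetchone()[0]
                for name in names}
    finally:
        conn.close()
```

Running `summarize_database(DB_PATH)` before and after feature extraction is a quick way to check that new rows were actually written.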

# Extracting Features from Images Using COLMAP

In [None]:
cmd = "{EXE_DIR_PATH}feature_extractor --database_path {DB_PATH} --image_path {IMAGE_PATH}".format( \
                EXE_DIR_PATH=EXE_DIR_PATH, \
                DB_PATH=DB_PATH, \
                IMAGE_PATH=IMAGE_PATH)
run_cmd(cmd)

# Timing 4 min for Oxford5k

# Import features
In most cases, you will want to use pre-generated features for a fair comparison with other models. COLMAP supports feature import.

Use `feature_importer`.

In [None]:
# TODO: how to import features? I want to use hessaff features. 

# Add Query Images to Database

If you want to use query images that are not in the current database (either cropped versions or new images), you have to add them to the database.

In [None]:
# QUERY_IMAGE_PATH = "../eval/oxford5k_query_images"
# QUERY_IMAGE_LIST_PATH = "../eval/oxford5k_query_image_list.txt"

QUERY_IMAGE_PATH = "../eval/oxford5k_query_images_crop"
QUERY_IMAGE_LIST_PATH = "../eval/oxford5k_query_image_crop_list.txt"

cmd = "{EXE_DIR_PATH}feature_extractor --database_path {DB_PATH} --image_path {QUERY_IMAGE_PATH}".format( \
                EXE_DIR_PATH=EXE_DIR_PATH, \
                DB_PATH=DB_PATH, \
                QUERY_IMAGE_PATH=QUERY_IMAGE_PATH)
run_cmd(cmd)
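
The query list file referenced above is assumed to be a plain text file with one image name per line, which is the format this notebook itself writes for the single-query test. The sketch below generates such a list from a folder; the helper name `write_image_list` and the extension filter are assumptions.

```python
import os

def write_image_list(image_dir, list_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write the image file names found in image_dir to list_path, one per line."""
    names = sorted(name for name in os.listdir(image_dir)
                   if name.lower().endswith(extensions))
    with open(list_path, "w") as f:
        for name in names:
            f.write(name + "\n")
    return names
```

Sorting keeps the list stable across runs, which makes result files easier to compare.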

# Run Image Retrieval

`vocab_tree_retriever` performs the following steps:

1. Indexing images.
    (check this) Using the image features stored in the DB, the images are indexed with the given vocab tree.
2. 

...

If you omit `query_image_list_path`, all images in the database are used as queries.


If the database is large, building the index on every run takes a long time. Can the generated index be saved?
For experiments, is it a good idea to save the index? Can the same index be reused if part of the algorithm changes?


```
args:
--project_path arg
--database_path arg
--vocab_tree_path arg
--database_image_list_path arg
--query_image_list_path arg
--num_images arg (=-1)
--num_verifications arg (=0)
--max_num_features arg (=-1)

```
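
The options listed above can be assembled into a command line programmatically. The sketch below assumes a hypothetical helper `build_retriever_cmd` that skips options left as `None` and quotes each value, on the assumption that the resulting string is later handed to a shell.

```python
import shlex

def build_retriever_cmd(exe_dir, **options):
    """Build a vocab_tree_retriever command line from keyword options."""
    parts = [shlex.quote(exe_dir + "vocab_tree_retriever")]
    for name, value in options.items():
        if value is None:
            continue  # omitted options fall back to the tool's defaults
        parts.append("--{} {}".format(name, shlex.quote(str(value))))
    return " ".join(parts)

cmd = build_retriever_cmd(
    "../build/src/exe/",
    database_path="../ir_demo/database.db",
    vocab_tree_path="../ir_demo/vocab_tree-262144.bin",
    query_image_list_path=None,  # left out of the command
    num_verifications=0)
print(cmd)
```

Passing `None` for an option is a convenient way to toggle it off without editing the format string.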

# Test Image Retrieval

## To get real output, use the commands below with file redirection



```bash
# Oxford5k. Query NoCrop. Vocab 64k. NoReranking
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-65536.bin --save_index_path ../ir_demo/vocab_tree-65536_image_indexed.bin --query_image_list_path ../eval/oxford5k_query_image_list.txt --num_verifications 0 | tee query_result_nocrop_vocab_64k_norerank.txt

# Oxford5k. Query NoCrop. Vocab 64k. Reranking Top 1000
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-65536.bin --save_index_path ../ir_demo/vocab_tree-65536_image_indexed.bin --query_image_list_path ../eval/oxford5k_query_image_list.txt --num_verifications 1000 | tee  query_result_nocrop_vocab_64k_geomverif_1000.txt

# Oxford5k. Query NoCrop. Vocab 200k Paris6k. NoReranking
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-200k-paris6k.bin --save_index_path ../ir_demo/vocab_tree-200k-paris6k_image_indexed.bin --query_image_list_path ../eval/oxford5k_query_image_list.txt --num_verifications 0 | tee  query_result_nocrop_vocab_200k_norerank.txt

# Oxford5k. Query NoCrop. Vocab 200k Paris6k. Reranking Top 1000
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-200k-paris6k.bin --save_index_path ../ir_demo/vocab_tree-200k-paris6k_image_indexed.bin --query_image_list_path ../eval/oxford5k_query_image_list.txt --num_verifications 1000 | tee  query_result_nocrop_vocab_200k_geomverif_1000.txt

# Oxford5k. Query NoCrop. Vocab 256k. NoReranking
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-262144.bin --save_index_path ../ir_demo/vocab_tree-262144_image_indexed.bin --query_image_list_path ../eval/oxford5k_query_image_list.txt --num_verifications 0 | tee  query_result_nocrop_vocab_256k_norerank.txt

# Oxford5k. Query NoCrop. Vocab 256k. Reranking Top 1000
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-262144.bin --save_index_path ../ir_demo/vocab_tree-262144_image_indexed.bin --query_image_list_path ../eval/oxford5k_query_image_list.txt --num_verifications 1000 | tee  query_result_nocrop_vocab_256k_geomverif_1000.txt
```
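
The six commands above differ only in the vocabulary and the number of verifications, so they can be generated in a loop. In this sketch the paths are copied from the commands above, while the `(label, file stem)` pairs and the helper name `retrieval_commands` are illustrative. The commands are only built as strings, not executed.

```python
import itertools

VOCABS = [("64k", "vocab_tree-65536"),
          ("200k", "vocab_tree-200k-paris6k"),
          ("256k", "vocab_tree-262144")]
VERIFICATIONS = [0, 1000]

def retrieval_commands():
    """Return one command string per (vocabulary, num_verifications) pair."""
    cmds = []
    for (label, stem), num_verif in itertools.product(VOCABS, VERIFICATIONS):
        suffix = "norerank" if num_verif == 0 else "geomverif_{}".format(num_verif)
        cmds.append(
            "../build/src/exe/vocab_tree_retriever"
            " --database_path ../ir_demo/database_oxf5k+query.db"
            " --database_image_list_path ../ir_demo/oxf5k_list.txt"
            " --vocab_tree_path ../ir_demo/{stem}.bin"
            " --save_index_path ../ir_demo/{stem}_image_indexed.bin"
            " --query_image_list_path ../eval/oxford5k_query_image_list.txt"
            " --num_verifications {n}"
            " | tee query_result_nocrop_vocab_{label}_{suffix}.txt".format(
                stem=stem, n=num_verif, label=label, suffix=suffix))
    return cmds

for cmd in retrieval_commands():
    print(cmd)
```

Generating the commands this way keeps the result file names consistent with the settings that produced them.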

---------------------------
**Changing the database requires re-indexing, because visual_index stores the database row ID instead of the filename.**
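
One way to avoid reusing a stale index is to derive the index file name from both the vocabulary and the database, so that each database gets its own index file. The sketch below uses a hypothetical helper `index_path_for`; the naming scheme is an assumption, not something the tool requires.

```python
import os

def index_path_for(vocab_tree_path, database_path):
    """Return an index file path that is unique per (vocabulary, database) pair."""
    vocab_stem = os.path.splitext(os.path.basename(vocab_tree_path))[0]
    db_stem = os.path.splitext(os.path.basename(database_path))[0]
    # Store the index next to the vocabulary file.
    return os.path.join(os.path.dirname(vocab_tree_path),
                        "{}_{}_image_indexed.bin".format(vocab_stem, db_stem))

print(index_path_for("../ir_demo/vocab_tree-65536.bin",
                     "../ir_demo/database_oxf5k+query_crop.db"))
```

The returned path can be passed as `--save_index_path`, which is the flag used in the commands in this section.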

```bash
# Oxford5k. Query Crop. Vocab 64k. NoReranking
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query_crop.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-65536.bin --save_index_path ../ir_demo/vocab_tree-65536_image_indexed_crop.bin --query_image_list_path ../eval/oxford5k_query_image_crop_list.txt --num_verifications 0 | tee  query_result_crop_vocab_64k_norerank.txt


# Oxford5k. Query Crop. Vocab 64k. Reranking Top 1000
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query_crop.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-65536.bin --save_index_path ../ir_demo/vocab_tree-65536_image_indexed_crop.bin --query_image_list_path ../eval/oxford5k_query_image_crop_list.txt --num_verifications 1000 | tee  query_result_crop_vocab_64k_geomverif_1000.txt

# Oxford5k. Query Crop. Vocab 200k Paris6k. NoReranking
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query_crop.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-200k-paris6k.bin --save_index_path ../ir_demo/vocab_tree-200k-paris6k_image_indexed_crop.bin --query_image_list_path ../eval/oxford5k_query_image_crop_list.txt --num_verifications 0 | tee  query_result_crop_vocab_200k_norerank.txt


# Oxford5k. Query Crop. Vocab 200k Paris6k. Reranking Top 1000
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query_crop.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-200k-paris6k.bin --save_index_path ../ir_demo/vocab_tree-200k-paris6k_image_indexed_crop.bin --query_image_list_path ../eval/oxford5k_query_image_crop_list.txt --num_verifications 1000 | tee  query_result_crop_vocab_200k_geomverif_1000.txt


# Oxford5k. Query Crop. Vocab 256k. NoReranking
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query_crop.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-262144.bin --save_index_path ../ir_demo/vocab_tree-262144_image_indexed_crop.bin --query_image_list_path ../eval/oxford5k_query_image_crop_list.txt --num_verifications 0 | tee  query_result_crop_vocab_256k_norerank.txt

# Oxford5k. Query Crop. Vocab 256k. Reranking Top 1000
../build/src/exe/vocab_tree_retriever --database_path ../ir_demo/database_oxf5k+query_crop.db --database_image_list_path ../ir_demo/oxf5k_list.txt --vocab_tree_path ../ir_demo/vocab_tree-262144.bin --save_index_path ../ir_demo/vocab_tree-262144_image_indexed_crop.bin --query_image_list_path ../eval/oxford5k_query_image_crop_list.txt --num_verifications 1000 | tee  query_result_crop_vocab_256k_geomverif_1000.txt
```

## Or, you can test with a dummy query list containing only a single query

In [None]:
DB_IMAGE_LIST_PATH = "../ir_demo/oxf5k_list.txt"
DB_PATH = "../ir_demo/database_oxf5k+query.db"

NUM_VERIFICATIONS = 0 # Number of top results to rerank (0 means no reranking)
NUM_IMAGES = -1 # Use all images in the DB. Query images stored in the DB are filtered out via database_image_list_path.

QUERY_IMAGE_LIST_PATH = "./oxford5k_query_image_list_test.txt"
SINGLE_QUERY_IMAGE_NAME = "all_souls_2.png"

# TODO: pass database_image_list_path so that query images stored in the database do not show up in the search results.
# An alternative is to use two different DBs, one for the queries and one for the data pool.

with open(QUERY_IMAGE_LIST_PATH, "w") as f:
    f.write(SINGLE_QUERY_IMAGE_NAME+"\n")
    
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

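def parse_query_result(output):
    """Parse retriever output into {query_name: [(image_name, score), ...]}."""
    result = {}
    query_name = None
    for line in output.split("\n"):
        if "Querying for image " in line:
            # Header line: the query image name is the fourth token.
            query_name = line.split(" ")[3]
            result[query_name] = []
            continue

        fields = line.strip().split(", ")
        if (query_name is None or len(fields) < 3
                or not all("=" in field for field in fields[1:3])):
            # Skip blank lines, log lines, and anything before the first query.
            continue

        # Result line: comma-separated key=value fields; the second holds the image name, the third the score.
        result_image_name = fields[1].split("=")[1]
        result_score = fields[2].split("=")[1]
        result[query_name].append((result_image_name, result_score))
    return result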

def show_image(image_name):
    
    image_path = os.path.join(IMAGE_PATH, image_name)
    # print("open image:", image_path)
    img_bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)
    if img_bgr is None:
        QUERY_IMAGE_PATH="../eval/oxford5k_query_images"
        image_path = os.path.join(QUERY_IMAGE_PATH, image_name)
        # print("open image:", image_path)
        img_bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)
        
    if img_bgr is None:
        print("cannot find image with name:", image_name)
        return
        
    
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    
    plt.figure()
    plt.imshow(img)
    plt.show()
    
    
cmd = "{EXE_DIR_PATH}vocab_tree_retriever --database_path {DB_PATH} --database_image_list_path {DB_IMAGE_LIST_PATH} --vocab_tree_path {VOCAB_TREE_PATH} --save_index_path {SAVE_INDEX_PATH} --query_image_list_path {QUERY_IMAGE_LIST_PATH} --num_images {NUM_IMAGES} --num_verifications {NUM_VERIFICATIONS}".format( \
                EXE_DIR_PATH=EXE_DIR_PATH, \
                DB_PATH=DB_PATH, \
                DB_IMAGE_LIST_PATH=DB_IMAGE_LIST_PATH, \
                VOCAB_TREE_PATH=VOCAB_TREE_PATH, \
                SAVE_INDEX_PATH=SAVE_INDEX_PATH, \
                QUERY_IMAGE_LIST_PATH=QUERY_IMAGE_LIST_PATH, \
                NUM_IMAGES=NUM_IMAGES, \
                NUM_VERIFICATIONS=NUM_VERIFICATIONS)    
print("cmd: ", cmd)
output = run_cmd_get_output(cmd)

result = parse_query_result(output)

for image_name, rank_list in result.items():
    
    print("QUERY image_name:", image_name)
    show_image(image_name)
    print("RESULT:")
    for rank, tup in enumerate(rank_list[:10]):
        filename, score = tup
        print("   top{}: {}. score: {}".format(rank+1, filename, score))
        show_image(filename)
    print()


In [None]:
NUM_VERIFICATIONS = 500 # Rerank the top 500 results

cmd = "{EXE_DIR_PATH}vocab_tree_retriever --database_path {DB_PATH} --vocab_tree_path {VOCAB_TREE_PATH} --save_index_path {SAVE_INDEX_PATH} --query_image_list_path {QUERY_IMAGE_LIST_PATH} --num_images {NUM_IMAGES} --num_verifications {NUM_VERIFICATIONS}".format( \
                EXE_DIR_PATH=EXE_DIR_PATH, \
                DB_PATH=DB_PATH, \
                VOCAB_TREE_PATH=VOCAB_TREE_PATH, \
                SAVE_INDEX_PATH=SAVE_INDEX_PATH, \
                QUERY_IMAGE_LIST_PATH=QUERY_IMAGE_LIST_PATH, \
                NUM_IMAGES=NUM_IMAGES, \
                NUM_VERIFICATIONS=NUM_VERIFICATIONS)    
print("cmd: ", cmd)
output = run_cmd_get_output(cmd)
print(output)

result = parse_query_result(output)

for image_name, rank_list in result.items():
    print("QUERY image_name:", image_name)
    show_image(image_name)
    print("RESULT:")
    for rank, tup in enumerate(rank_list[:10]):
        filename, score = tup
        print("   top{}: {}. score: {}".format(rank+1, filename, score))
        show_image(filename)
    print()