*Identify individual whales and dolphins through features unique to each animal*

![kaggle](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/dol1.jpeg)

# 1. First look at this competition
* As a Kaggle rookie, I wanted to do a bit of research first. I found that most of the articles were similar to each other, with each falling into one of two categories:

1. Using an EfficientNet backbone with a DOLG head, GeM pooling, and an ArcFace classifier.

2. A simple ensemble of the best public kernels.
> I could easily reach a score of 0.758 this way, but that is not what I want.


# 2. Overview of image detection/retrieval
* Based on prior work, I found that mapping the images into a latent space and clustering the resulting embedding vectors gives fairly good results. The figures below map out this process:

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_1.png)

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_2.png)
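
To make the idea in the figures concrete, here is a minimal sketch of nearest-neighbour retrieval in an embedding space. It uses plain Python and made-up 3-D vectors so it runs without any model or database; the labels, the vectors, and the helper names (`l2_normalize`, `l2_distance`, `top_k_neighbours`) are my own illustration, not part of any library used later.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length and return it as a new list."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def l2_distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_neighbours(query, gallery, k=5):
    """Return the k (label, distance) pairs in `gallery` closest to `query`."""
    scored = [(label, l2_distance(query, vec)) for label, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-D "embeddings" (hypothetical); real image embeddings are much longer.
gallery = {
    "whale_a_photo1": l2_normalize([1.0, 0.1, 0.0]),
    "whale_a_photo2": l2_normalize([0.9, 0.2, 0.1]),
    "dolphin_b_photo1": l2_normalize([0.0, 1.0, 0.2]),
}
query = l2_normalize([1.0, 0.15, 0.05])

neighbours = top_k_neighbours(query, gallery, k=2)
for label, distance in neighbours:
    print(f"{label}: {distance:.3f}")
```

A vector database does the same job as `top_k_neighbours`, but with an index so that it does not have to compare the query against every stored vector.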

# 3. A concrete approach
* Let's use ResNet-50 to map each image to a vector and then store the vectors in a database.

**Importing libraries**

In [None]:
import numpy as np
import os
from pathlib import Path
import torch
import matplotlib.pyplot as plt
from PIL import Image
from pymilvus import utility

**Config Setup**

In [None]:
dataset_path = './Kaggle/happy-whale-and-dolphin/train_images'
images = []
vectors = []
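# Embedding size: ResNet-50's pooled features are 2048-dimensional.
# Assumed to match the Towhee pipeline output; verify with len(vectors[0]) once embeddings exist.
vec_dim = 2048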

**Setup Database**
* The embedding vectors produced from the images need to be stored in a database that supports vector search. I use [**Milvus**](https://milvus.io/), a vector database, to store, index, and manage these embedding vectors. I first need to set up Docker to run Milvus locally.

In [None]:
# download the latest docker-compose file
$ wget https://github.com/milvus-io/milvus/releases/download/v2.0.0-pre-ga/milvus-standalone-docker-compose.yml -O docker-compose.yml
# start the Milvus service
$ docker-compose up -d
# check the state of the containers
$ docker-compose ps
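
Before connecting from Python, it can help to confirm that something is actually listening on the Milvus port. The sketch below is a generic TCP check using only the standard library; `is_port_open` is a hypothetical helper of mine, and an open port only shows that a service accepts connections, not that Milvus has finished starting up.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the setup in this notebook the call would be `is_port_open('127.0.0.1', 19530)`, using the same host and port as the connection cell that follows.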

* With the Python API of Milvus (pymilvus), I can easily connect to the local Milvus database and create a Milvus collection.

In [None]:
!pip install pymilvus
import pymilvus as milvus

# connect to local Milvus service
milvus.connections.connect(host='127.0.0.1', port=19530)

# create collection
collection_name = 'reverse_image_search'
id_field = milvus.FieldSchema(name="id", dtype=milvus.DataType.INT64, is_primary=True, auto_id=True)
vec_field = milvus.FieldSchema(name="vec", dtype=milvus.DataType.FLOAT_VECTOR, dim=vec_dim)
schema = milvus.CollectionSchema(fields=[id_field, vec_field])
collection = milvus.Collection(name=collection_name, schema=schema)

**Image embedding**
* Now we need to embed these images as vectors. Here I'll use **[Towhee](https://towhee.io/)**, an end-to-end library for generating embedding vectors from images.

In [None]:
!pip install towhee
from towhee import pipeline
embedding_pipeline = pipeline('towhee/image-embedding-resnet50')

**Vector storage and search in the database**

In [None]:
for img_path in Path(dataset_path).glob('*'):
    vec = embedding_pipeline(str(img_path))
    norm_vec = vec / np.linalg.norm(vec)
    vectors.append(norm_vec.tolist())
    images.append(str(img_path.resolve()))
    
# insert data to Milvus
res = collection.insert([vectors])
collection.load()
img_dict = {}

# maintain mappings between primary keys and the original images for image retrieval
for i, key in enumerate(res.primary_keys):
    img_dict[key] = images[i]
    
query_img_path = './Kaggle/happy-whale-and-dolphin/test_images'
query_images = []
query_vectors = []
top_k = 5

for img_path in Path(query_img_path).glob('*'):
    vec = embedding_pipeline(str(img_path))
    norm_vec = vec / np.linalg.norm(vec)
    query_vectors.append(norm_vec.tolist())
    query_images.append(str(img_path.resolve()))

query_results = collection.search(data=query_vectors, anns_field="vec", param={"metric_type": 'L2'}, limit=top_k)
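
Both the stored vectors and the query vectors are divided by their norm before use, so ranking by L2 distance gives the same order as ranking by cosine similarity: for unit vectors, the squared L2 distance equals 2 - 2·cos(a, b). Here is a small plain-Python check of that identity; the helper names are my own and not part of pymilvus or Towhee.

```python
import math


def normalize(vec):
    """Return `vec` scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cosine = dot(a, b)
sq_dist = squared_l2(a, b)

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
assert math.isclose(sq_dist, 2 - 2 * cosine)
```

A smaller distance therefore always means a higher cosine similarity, which is why the normalization step matters when the search uses the `L2` metric.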

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_4.png)

[Reverse image search workflow](https://docs.towhee.io/tutorials/reverse-image-search/)
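
The insert step in the cell above sends every vector in a single `collection.insert` call. For a large training set it may be gentler on memory and on the service to insert in chunks. The sketch below shows one way to do that; `batched` and `insert_in_batches` are hypothetical helpers of mine, the batch size is arbitrary, and `fake_insert` is a stand-in so the example runs without a Milvus service.

```python
import itertools


def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def insert_in_batches(insert_fn, vectors, batch_size=1000):
    """Call `insert_fn` once per chunk and collect the primary keys it returns."""
    primary_keys = []
    for chunk in batched(vectors, batch_size):
        primary_keys.extend(insert_fn(chunk))
    return primary_keys


# Stand-in for the database: hands out increasing ids, one per vector.
_next_id = itertools.count(start=100)


def fake_insert(chunk):
    return [next(_next_id) for _ in chunk]


toy_vectors = [[0.1, 0.2]] * 5
keys = insert_in_batches(fake_insert, toy_vectors, batch_size=2)
print(keys)
```

With the real collection, `insert_fn` could be `lambda chunk: collection.insert([chunk]).primary_keys`, which mirrors the insert call and the `primary_keys` attribute used earlier. The returned keys stay in insertion order, so the `img_dict` mapping can be built the same way as before.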

**Search results visualization**

In [None]:
!pip install matplotlib
import matplotlib.pyplot as plt
from PIL import Image

for i in range(len(query_results)):
    results = query_results[i]
    query_file = query_images[i]

    result_files = [img_dict[result.id] for result in results]
    distances = [result.distance for result in results]

    fig_query, ax_query = plt.subplots(1,1, figsize=(5,5))
    ax_query.imshow(Image.open(query_file))
    ax_query.set_title("Searched Image\n")
    ax_query.axis('off')

    fig, ax = plt.subplots(1,len(result_files),figsize=(20,20))
    for x in range(len(result_files)):
        ax[x].imshow(Image.open(result_files[x]))
        ax[x].set_title('dist: ' + str(distances[x])[0:5])
        ax[x].axis('off')

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_5.png)

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_6.png)

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_7.png)

![](https://github.com/poissonyzr/dol_kaggle/raw/805fa65c76ec0f9f0354692d6b66e399d65f6a9b/Figure_8.png)

                     👏 IF YOU FORK THIS OR FIND THIS HELPFUL 👏
                                 PLEASE UPVOTE!