In [1]:
import numpy as np
import pandas as pd
from pathlib import Path
data_dir = Path("../data").absolute()

In [2]:
df = pd.read_parquet(data_dir / "product_images.parquet")
df.sample(10)

| | asin | title | primary_image |
|---|---|---|---|
| 55729 | B01DZS4Q08 | PetSafe Stay & Play Wireless Pet Fence with Re... | https://m.media-amazon.com/images/I/41XO2xPvGp... |
| 84372 | B09KLLYNDD | DDgro Electronics Travel Packing Organizer Tec... | https://m.media-amazon.com/images/I/41wjQrfgTH... |
| 43462 | B07G2R8PCH | Elephant/Mouse Flannel Animals & Hearts Coral,... | https://m.media-amazon.com/images/I/51v5M5exsZ... |
| 46918 | B086DHTR65 | Sparkle Vinyl Caramel, Fabric by the Yard | https://m.media-amazon.com/images/I/61VypwjXVb... |
| 23963 | B07R1PSNY5 | IRIS USA 74 Quart WEATHERPRO Plastic Storage B... | https://m.media-amazon.com/images/I/41RbFu4iU-... |
| 44648 | B09N5JNVRP | CamelBak Arete 18 Hydration Backpack for Hikin... | https://m.media-amazon.com/images/I/413h1sJJ2v... |
| 60538 | B00UH1H23K | CARLSON'S Choke Tubes Remington Long Beard Por... | https://m.media-amazon.com/images/I/41h7U9PfNg... |
| 71340 | B00YDW1KW8 | Supracor SpaCells Facial Sponge - Face Scrubbe... | https://m.media-amazon.com/images/I/51fAjqrbqo... |
| 22935 | B07SP5KQNG | Under Armour womens Tech Short-Sleeve V-Neck -... | https://m.media-amazon.com/images/I/41mdT7CBlv... |
| 86123 | B09BDWGX6K | Cervical Neck Pillow for Sleeping, Memory Foam... | https://m.media-amazon.com/images/I/41Mo776VrX... |


# Task description
## The Data
The dataframe contains the top 100k best-selling items on Amazon (as of November 2022). It has 3 columns:

1. `asin` - The Amazon identifier.
1. `title` - The product title, as listed on the Amazon store.
1. `primary_image` - The image to be listed in search results.

## Goal
The goal of the task is to be able to search products both by textual similarity and by image similarity.

For example, a customer walking down the street could take a picture of a red dress she likes and get similar items from Amazon.

Alternatively, that same customer might open the Amazon website, search for "red dress", and find items that correspond to that query.

## Implementation

### Embedding
We will use [CLIP](https://github.com/openai/CLIP) embedding for this task.
<img src="https://openaiassets.blob.core.windows.net/$web/clip/draft/20210104b/overview-b.svg" width="400">

CLIP allows us to link images with their description and map them to the same embedding space.
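
A minimal sketch of how both modalities could be embedded into that shared space, assuming the `clip` package from the repository linked above, `torch`, and `Pillow` are installed. The model name `ViT-B/32` and the helper names `l2_normalize`, `load_clip`, `embed_texts`, and `embed_image` are illustrative choices, not part of the task code.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def load_clip(model_name: str = "ViT-B/32"):
    """Load a CLIP model; the model name is an assumption, not a task requirement."""
    # Heavy imports are deferred so l2_normalize stays usable without them.
    import clip
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)
    return model, preprocess, device


def embed_texts(texts, model, device) -> np.ndarray:
    """Embed a list of strings (e.g. product titles or a search query)."""
    import clip
    import torch

    # truncate=True cuts titles longer than CLIP's 77-token context.
    tokens = clip.tokenize(list(texts), truncate=True).to(device)
    with torch.no_grad():
        features = model.encode_text(tokens)
    return l2_normalize(features.float().cpu().numpy())


def embed_image(path, model, preprocess, device) -> np.ndarray:
    """Embed a single image file into the same space as the text vectors."""
    import torch
    from PIL import Image

    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        features = model.encode_image(image)
    return l2_normalize(features.float().cpu().numpy())
```

Normalizing the vectors up front means text and image embeddings can be compared with a plain dot product later on.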

### Similarity search

Once the embedding is done, we need to run a nearest-neighbor search using the `cosine` similarity measure.

The products closest to the query vector should (hopefully) match the customer's intent.

The query vector could be a result of either `CLIP` image embedding or `CLIP` textual embedding.

We will use the [vecsim](https://github.com/argmaxml/vecsim) module to do the similarity search.
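
To make the idea concrete, here is a brute-force cosine nearest-neighbor search written with plain `numpy`. It is only an illustration of the technique: it does not use `vecsim`, whose API is not reproduced here, and the function name `top_k_cosine` and the toy vectors are made up for the example.

```python
import numpy as np


def top_k_cosine(query, item_vectors, k=5):
    """Return (indices, scores) of the k items most cosine-similar to the query."""
    query = np.asarray(query, dtype=float)
    items = np.asarray(item_vectors, dtype=float)

    # Normalize both sides so the dot product is the cosine similarity.
    query = query / max(np.linalg.norm(query), 1e-12)
    norms = np.linalg.norm(items, axis=1, keepdims=True)
    items = items / np.clip(norms, 1e-12, None)

    scores = items @ query
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Toy data: three 2-d "product" vectors and a query pointing mostly along x.
item_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
indices, scores = top_k_cosine([2.0, 0.1], item_vectors, k=2)
print(indices, scores)
```

A full scan like this is linear in the number of products for every query, which is the cost a dedicated similarity-search index is meant to reduce.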

### Serving

We used [Flask](https://flask.palletsprojects.com/en/2.2.x/) to implement the web server; the code is in `server.py`.

**Note**: The server code contains several `TODO:` comments that you will need to implement. The server is currently functional, but it returns random results.
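
For orientation only, the sketch below shows one way a text-search endpoint could be wired in Flask. It is not the contents of `server.py`: the `create_app` factory, the `/search` route, its `q` and `k` parameters, and the injected `embed_text` callable are all hypothetical names chosen for this example.

```python
import numpy as np
from flask import Flask, jsonify, request


def create_app(embed_text, asins, item_vectors):
    """Build a Flask app; embed_text is any callable mapping a string to a vector."""
    app = Flask(__name__)

    # Pre-normalize the catalog once so each request is a single matrix product.
    items = np.asarray(item_vectors, dtype=float)
    norms = np.linalg.norm(items, axis=1, keepdims=True)
    items = items / np.clip(norms, 1e-12, None)

    @app.route("/search")
    def search():
        query = request.args.get("q", "")
        k = request.args.get("k", default=5, type=int)
        if not query:
            return jsonify(error="missing query parameter 'q'"), 400

        vector = np.asarray(embed_text(query), dtype=float)
        vector = vector / max(np.linalg.norm(vector), 1e-12)

        scores = items @ vector
        order = np.argsort(-scores)[: max(k, 0)]  # best matches first
        results = [{"asin": asins[i], "score": float(scores[i])} for i in order]
        return jsonify(results=results)

    return app
```

Passing the embedder in as an argument keeps the route independent of how the vectors are produced, so it can be exercised with a stub before the real model is plugged in.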

# Submission


1. Please fork this repo, and implement the missing parts.
1. Please fill in this form.
1. Once done, please schedule an interview with Uri to review the code.

## Good luck!
