# 🔎 Use CLIP to search product photos using text or image

Have you ever had a wonderful pair of shoes that were just getting worn out, and you wanted to find another similar pair? Sometimes it's tough to know exactly what words to use.

[Amazon has a feature that lets you snap a picture and find similar items in their store](https://www.hunker.com/13771848/amazon-photo-search). That's great for Jeff Bezos, but not so good if you have your own ecommerce store and want that feature.

So in this notebook we're going to build a simple search engine that will let you **use images (or text) to search through a catalog of fashion items**.

In later notebooks we'll flesh it out with endpoints (for remote access), scaling, fine-tuning and cloud deployment. But for now let's keep it dead simple.

You can find a fuller version of this example [here](https://examples.jina.ai/fashion?utm_source=notebook-ecommerce-1).

## 📑 Our dataset

We're using a small subset of the [Kaggle Fashion Product Images (small) dataset](https://www.kaggle.com/paramaggarwal/fashion-product-images-small). The full dataset (while still small) is less practical to get into our notebook environment.

It consists of:

- A CSV file that contains image ids and metadata
- 770 low-res color photos of fashion items like so:

|  |  | |
| --- | --- | --- |
| ![](https://github.com/jina-ai/example-multimodal-fashion-search/blob/main/data/subset/10001.jpg?raw=true) | ![](https://github.com/jina-ai/example-multimodal-fashion-search/blob/main/data/subset/10008.jpg?raw=true) | ![](https://github.com/jina-ai/example-multimodal-fashion-search/blob/main/data/subset/10009.jpg?raw=true) |

## 🤖 Our tech stack

### DocArray

We'll use [DocArray](https://docarray.jina.ai?utm_source=notebook-ecommerce-1) to convert our CSV file and images into Documents that we can then search through.

DocArray is a good fit for this, since whatever kind of data we use (text or image in our case), we only need to think about one data format: a Document can contain pretty much anything. We can also use DocArray to bulk-apply processing to our images, which is much faster than messing around with for-loops.

We can also use it to [push our dataset to the cloud](https://docarray.jina.ai/fundamentals/documentarray/serialization/#from-to-cloud?utm_source=notebook-ecommerce-1), making it easier to use it later in other notebooks.

### CLIP-as-service

We'll use CLIP-as-service to quickly generate [vector embeddings](https://docarray.jina.ai/fundamentals/document/embedding?utm_source=notebook-ecommerce-1) for our images, and thus be able to perform nearest-neighbor search using text or images as input.

CLIP-as-service is a good match for this, since it requires minimal dependencies and has fairly low latency. It's also free to use via Jina's cloud servers.

## 💬 Talk to us!

Want to find out more about neural search and the Jina AI ecosystem? Join us on [Slack](https://slack.jina.ai?utm_source=notebook-ecommerce-1)!

## 📝 Notes

- This is just part one in a multi-part series, where we'll use the Jina ecosystem to build an advanced text/image search engine for products.

## Downloading and processing our data

This is just some rough-and-ready code to download a toy dataset and unzip it to our `data` directory:

In [None]:
data_dir = "/content/data"

In [None]:
!rm -rf /content/data

In [None]:
import os

if not os.path.isdir(data_dir):
  os.makedirs(data_dir)

  os.chdir(data_dir)
  !wget -q https://github.com/jina-ai/example-multimodal-fashion-search/raw/main/data/subset/fashion_subset.csv
  !wget -q https://github.com/jina-ai/example-multimodal-fashion-search/raw/main/data/subset/fashion_subset.zip
  !unzip -qq fashion_subset.zip
  print("Downloaded!")
else:
  print("Data dir already exists, skipping download!")

os.chdir("/content")

Let's take a quick look at our CSV to see what kind of data we have:

In [None]:
!head -n 3 data/fashion_subset.csv

## Creating Documents

Every piece of data we work with in the [Jina ecosystem](https://github.com/jina-ai?utm_source=notebook-ecommerce-1) has to be in the form of a [Document](https://docarray.jina.ai/fundamentals/document?utm_source=notebook-ecommerce-1) or [DocumentArray](https://docarray.jina.ai/fundamentals/documentarray?utm_source=notebook-ecommerce-1). This means that whether we're dealing with text, images, audio, or whatever, we only have one data format to keep in mind.

Instead of manually creating all of our Documents we can simply load them from a CSV file with the [`from_csv()` method](https://docarray.jina.ai/datatypes/tabular?utm_source=notebook-ecommerce-1):

In [None]:
!pip install -qqq docarray

In [None]:
from docarray import DocumentArray

In [None]:
docs = DocumentArray.from_csv("./data/fashion_subset.csv")

Let's see what a DocumentArray looks like:

In [None]:
docs.summary()

As we can see, all Documents have `id` and `tags`.

And now we can dig into how an individual Document looks:

In [None]:
docs[0]

As we can see, the Document's tags (metadata) were brought in directly from the CSV file.
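
To make that mapping concrete, here's a minimal sketch in plain Python (no DocArray involved) of the idea: the `id` column becomes the record's id, and every other column ends up as a tag. The sample rows, the column names, and the `rows_to_records` helper are all assumptions for illustration, not the real contents of `fashion_subset.csv` or DocArray's actual implementation.

```python
import csv
import io

# Hypothetical two-row sample; column names and values are assumptions for illustration.
SAMPLE_CSV = """id,gender,articleType,baseColour
10001,Men,Shirts,Navy Blue
10008,Women,Watches,Silver
"""


def rows_to_records(csv_text):
    """Turn each CSV row into a dict with an `id` and a `tags` mapping."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = dict(row)
        doc_id = row.pop("id")  # the `id` column becomes the record's id
        records.append({"id": doc_id, "tags": row})  # remaining columns become tags
    return records


records = rows_to_records(SAMPLE_CSV)
for record in records:
    print(record["id"], record["tags"])
```

Note that everything read from a CSV this way is a string, including the id, which is why it can be dropped straight into a file path later on.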

Where are the images though? Because we only took data from the CSV (which doesn't *directly* specify where the images are), we'll have to create a URI for each image under `doc.uri`.

Luckily the URI is based on the `id` column of the CSV, which was mapped to `doc.id`:

In [None]:
for doc in docs:
  doc.uri = f"{data_dir}/{doc.id}.jpg"

Because we're dealing with images, we'll need to:

- Load the URI to an image tensor (so we have something to feed into our encoder later)
- Resize all the images to a consistent size
- Ensure all the tensors are in the same format

We can do this in a function and then call `docs.apply(process_images)`:

In [None]:
def process_images(doc):
  return doc.load_uri_to_image_tensor().set_image_tensor_shape((80, 60))

In [None]:
docs.apply(process_images, show_progress=True)

Great, now we can see each Document has an image tensor.

We can see what we've got with the `plot_image_sprites()` method:

In [None]:
docs.plot_image_sprites()

Let's also normalize them for consistency (we could have done this earlier, but then the plots wouldn't look as good). There's a nice explanation of normalization [here](https://inside-machinelearning.com/en/why-and-how-to-normalize-data-object-detection-on-image-in-pytorch-part-1/#Normalizing_data).

In [None]:
for doc in docs:
  doc.set_image_tensor_normalization()
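
If you're wondering what normalization does to the numbers, here's a rough sketch on a single pixel in plain Python. It assumes the common ImageNet per-channel mean and standard deviation; we believe these match DocArray's defaults, but treat the exact values (and the `normalize_pixel` helper, which is ours) as assumptions.

```python
# Assumed ImageNet per-channel statistics (RGB); treat these as illustrative defaults.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print("white:", [round(v, 3) for v in white])
print("black:", [round(v, 3) for v in black])
```

After this step pixel values are no longer in the 0-255 range (dark pixels go negative), which is why plotting normalized tensors looks odd.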

## Generating embeddings

Now we've loaded our images, we can feed them into our encoder. In our case we want to encode using the [CLIP model](https://openai.com/blog/clip?utm_source=notebook-ecommerce-1), so we can use [CLIP-as-service](https://clip-as-service.jina.ai/) since it's low-latency and simple to use.

Why is CLIP a good encoder for this? CLIP encodes both text and images to a common vector space, which means we can use text to search images, images to search text, images to search images, etc.

In our search engine we'll focus on:

- Image-to-image
- Text-to-image

---

❓ Want to learn more about CLIP? Check out our notebook on [finetuning CLIP with anime datasets](https://colab.research.google.com/drive/189LHTpYaefMhKNIGOzTLHHavlgmoIWg9?usp=sharing).

---

In [None]:
!pip install -qqq clip-client

Now we just need to set the client and encode all of our Documents. We can do this for free with Jina's CLIP-as-service server!

In [None]:
from clip_client import Client

c = Client('grpcs://demo-cas.jina.ai:2096')

docs = c.encode(docs)

---

💡 The embeddings we just generated were from a pre-trained general purpose CLIP model. That model is good for fashion products, teapots, faces, puppies, and lots of other things.

For more accurate search results we'll finetune our model specifically for our fashion dataset in a future notebook. Or you can check out the [finetuning CLIP for anime](https://colab.research.google.com/drive/189LHTpYaefMhKNIGOzTLHHavlgmoIWg9?usp=sharing) notebook if you want to learn how to do that now.

---

## Visualizing our data

Let's [plot out our data into a 3D graph](https://docarray.jina.ai/fundamentals/documentarray/visualization/#embedding-projector?utm_source=notebook-ecommerce-1) so we can better see how embeddings are clustered.

Note: You'll need to manually interrupt the cell below when you're done by hitting the "stop" button. Otherwise it will block the rest of the notebook from running.

In [None]:
docs.plot_embeddings(image_sprites=True)

We can see that different types of item are in different clusters: the watches are all together, the bags are all together and so on.

You may also notice that there are two groups of shirts: those with people wearing them, and those where it's just the shirt in shot. In a later notebook we'll finetune the CLIP model so all T-shirts are closer together (i.e. ignoring the humans, who we don't care about in this context).

## Pushing our data to the cloud

Jina allows free cloud hosting of DocumentArrays. Since we plan to use the same DocumentArray in later notebooks, let's [push it to the cloud](https://docarray.jina.ai/fundamentals/documentarray/serialization/?highlight=push%20pull#from-to-cloud?utm_source=notebook-ecommerce-1):

In [None]:
docs.push("770-fashion-small-with-clip-embeddings", show_progress=True)

## Search by image

Let's choose one random image from our DocumentArray and use that to search for similar images in the dataset:

In [None]:
image_query = docs.sample(1)
image_matches = docs.find(image_query)

Now let's see the results. The first image will be the query image, and the subsequent images will be the matches in order of similarity.

---

⚠️ Because we only have a limited dataset, you may not always get something *super* similar. In the [example with the full dataset](https://examples.jina.ai/fashion?utm_source=notebook-ecommerce-1) you'll get better matches.

---

In [None]:
image_matches[0].plot_image_sprites()
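
Why does the query image come back first? Because we sampled it from the very collection we're searching, it is its own nearest neighbor. Here's a small pure-Python sketch of that ranking. It assumes cosine distance (which we believe is the default metric for `find()`, but treat that as an assumption); the `find_nearest` helper, the catalog ids and the vectors are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, so 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_nearest(query, catalog, limit=3):
    """Return up to `limit` (item_id, distance) pairs, closest first."""
    ranked = sorted(
        ((item_id, cosine_distance(query, vec)) for item_id, vec in catalog.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:limit]


# Hypothetical 3-dimensional embeddings standing in for real CLIP vectors.
catalog = {
    "shirt_a": [1.0, 0.0, 0.1],
    "shirt_b": [0.9, 0.1, 0.2],
    "watch_a": [0.0, 1.0, 0.2],
    "bag_a": [0.1, 0.2, 1.0],
}

# The query is taken from the catalog itself, just like `docs.sample(1)` above.
results = find_nearest(catalog["shirt_a"], catalog, limit=3)
for item_id, distance in results:
    print(item_id, round(distance, 3))
```

If you'd rather not see the query among its own matches, you can simply skip the first result.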

## Search by text

As before, we're dealing with [Documents](https://docarray.jina.ai/fundamentals/document?utm_source=notebook-ecommerce-1) and [DocumentArrays](https://docarray.jina.ai/fundamentals/documentarray?utm_source=notebook-ecommerce-1). So the text query we search with should also be wrapped in a Document.

In [None]:
query_string = "women's red t-shirt"

from docarray import Document

text_query = DocumentArray([Document(text=query_string)])

Unlike our previous image search, we'll need to encode our query Document this time (since last time our query Document had already been encoded).

In [None]:
text_query = c.encode(text_query)

In [None]:
text_matches = docs.find(text_query)

In [None]:
text_matches[0].plot_image_sprites()

Not bad for a small dataset and pre-trained (un-finetuned) model!

## 🎁 Wrapping up

Great - we've built a simple search engine in our notebook, with very few lines of code. But what's next?

In future notebooks we'll see how you can:

- [Finetune](https://finetuner.jina.ai?utm_source=notebook-ecommerce-1) the CLIP model for better results on your dataset.
- Store your data on disk, spin up replicas/shards, start serving your search engine via a RESTful or gRPC gateway.
- Use pre-existing [building blocks](https://hub.jina.ai?utm_source=notebook-ecommerce-1) to speed up development.
- Host and deploy your search engine on [Jina Cloud](https://docs.jina.ai/fundamentals/jcloud?utm_source=notebook-ecommerce-1).

Stay tuned to [the Jina AI blog](https://medium.com/jina-ai) or join our [Slack community](https://slack.jina.ai?utm_source=notebook-ecommerce-1) to keep up-to-date.

## 📚 Learn more

Want to dig more into the Jina ecosystem? Here are some resources:

- [Developer portal](https://learn.jina.ai?utm_source=notebook-ecommerce-1) - tutorials, courses, videos on using Jina
- [Fashion search notebook](https://colab.research.google.com/github/alexcg1/neural-search-notebooks/blob/main/fashion-search/1_build_basic_search/basic_search.ipynb) - build an image-to-image fashion search engine
- [DALL-E Flow](https://colab.research.google.com/github/jina-ai/dalle-flow/blob/main/client.ipynb)/[Disco Art](https://colab.research.google.com/github/jina-ai/discoart/blob/main/discoart.ipynb) - create AI-generated art in your browser