# Fashion search with `docarray`

Let's build a simple image-matching search engine for fashion product images from the [Fashion Product Images (small) dataset](https://www.kaggle.com/paramaggarwal/fashion-product-images-small).

[DocArray](https://github.com/jina-ai/docarray) is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the data with a Pythonic API.

In [None]:
!pip install "docarray[full]"

In [None]:
from docarray import Document, DocumentArray

## Load images

In [None]:
import os
if not os.path.isdir("./data"):
    !wget -O data.zip https://github.com/alexcg1/neural-search-notebooks/blob/main/docarray/fashion-search/data.zip?raw=true
    !unzip data.zip
    !rm -f data.zip

In [None]:
docs = DocumentArray.from_files("./data/*.jpg")

In [None]:
docs.plot_image_sprites() # Preview the images

## Apply preprocessing

In [None]:
from docarray import Document

def preproc(d: Document):
    return (d.load_uri_to_image_tensor()  # load
             .set_image_tensor_shape((80, 60))  # resize all to 80x60
             .set_image_tensor_normalization()  # normalize color
             .set_image_tensor_channel_axis(-1, 0))  # switch color axis for the PyTorch model later

In [None]:
docs.apply(preproc)
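
To make the normalization and channel-axis steps concrete, here is a minimal NumPy sketch of what they do to a single image. It is an illustration, not DocArray's implementation: the function name `normalize_and_reorder` is hypothetical, and the mean and standard deviation are the common ImageNet statistics, which we assume match the defaults used by `set_image_tensor_normalization()`. We also assume `(80, 60)` means 80 pixels high and 60 pixels wide.

```python
import numpy as np

# Assumption: the usual ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_and_reorder(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: standardize an HxWx3 uint8 image and return it as 3xHxW."""
    scaled = image.astype(np.float32) / 255.0  # map pixel values to [0, 1]
    standardized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel standardization
    return np.moveaxis(standardized, -1, 0)  # channels-last -> channels-first


# An all-white dummy image stands in for a real product photo.
fake_image = np.full((80, 60, 3), 255, dtype=np.uint8)
tensor = normalize_and_reorder(fake_image)
print(tensor.shape)  # (3, 80, 60)
```

PyTorch vision models expect channels-first input, which is why the color axis moves from the last position to the first.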

## Embed images

In [None]:
!pip install torchvision

In [None]:
import torchvision
model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cpu')  # runs on the CPU; change "cpu" to "cuda" if a GPU is available

## Create query Document

Let's use one image from our dataset as the query:

In [None]:
query_doc = Document(uri="data/20000.jpg")
query_doc.display()

In [None]:
query_docs = DocumentArray([query_doc])

In [None]:
query_docs.apply(preproc)

In [None]:
query_docs.embed(model, device="cpu")  # runs on the CPU; change "cpu" to "cuda" if a GPU is available

## Get matches

In [None]:
query_docs.match(docs, limit=9)
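
Conceptually, matching is a nearest-neighbor search over the embeddings. The sketch below shows the idea in plain NumPy, assuming cosine distance as the metric (we believe this is the default for `match`, but treat that as an assumption). The function `cosine_top_k` and the toy vectors are hypothetical and only for illustration.

```python
import numpy as np


def cosine_top_k(query, index, k):
    """Hypothetical helper: return (row, distance) pairs for the k rows of `index` closest to `query`."""
    query = np.asarray(query, dtype=np.float64)
    index = np.asarray(index, dtype=np.float64)
    unit_query = query / np.linalg.norm(query)
    unit_index = index / np.linalg.norm(index, axis=1, keepdims=True)
    distances = 1.0 - unit_index @ unit_query  # cosine distance = 1 - cosine similarity
    order = np.argsort(distances)[:k]  # smallest distance first
    return [(int(i), float(distances[i])) for i in order]


# Toy 2-D "embeddings" instead of real ResNet outputs.
toy_index = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.1]
top = cosine_top_k(toy_query, toy_index, k=2)
print([i for i, _ in top])  # [0, 2]
```

This brute-force comparison is fine for a small dataset like ours; larger collections usually need an approximate nearest-neighbor index.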

## See the results

As you can see, the model finds matches based on the whole input image, including the person wearing the clothes! In practice we want to match the clothes themselves, so later we'll fine-tune the model using Jina AI's [finetuner](https://finetuner.jina.ai).

In [None]:
(DocumentArray(query_doc.matches, copy=True)
    .apply(lambda d: d.set_image_tensor_channel_axis(0, -1)
                      .set_image_tensor_inv_normalization())
    .plot_image_sprites())

### Next steps

- [Fine-tune](https://finetuner.jina.ai) our model to improve matching
- Build this into a real-world search engine with [Jina](https://github.com/jina-ai/jina) (example [here](http://examples.jina.ai/fashion))