# CLIP-as-service: Client

### Install prerequisites

In [None]:
!pip install -q clip-client
!pip install -q ipywidgets  # makes output look nicer in the notebook

## Text-to-image cross-modal search

Let's build a text-to-image search using CLIP-as-service: the user enters a sentence and the program returns the matching images. We will use the Totally Looks Like dataset and the DocArray package. DocArray is an upstream dependency of clip-client, so you don't need to install it separately.

In [None]:
# Workaround: suppress warnings, otherwise the output gets flashy
import warnings
warnings.filterwarnings('ignore')

# Workaround: install matplotlib, which is needed to render image sprites
!pip install -q matplotlib

### Pull pre-computed embeddings

Since you may be running this notebook on a laptop and not a GPU-powered beast, we'll skip the [dataset encoding](https://github.com/jina-ai/clip-as-service/#encode-images) and just download pre-computed embeddings from Jina Cloud.

In [None]:
from docarray import DocumentArray

img_da = DocumentArray.pull('ttl-embedding', show_progress=True, local_cache=True)

In [None]:
img_da.plot_image_sprites()

### Connect to CLIP server

Be sure to run [`server.ipynb`](./server.ipynb) and take note of the server settings there.

In [None]:
from clip_client import Client

host = "grpc://examples.jina.ai:51000"

c = Client(host)

### Find matches

In [None]:
input_texts = [
    "a happy potato",
    "professor cat is very serious",
    "there will be no tomorrow so lets eat unhealthy"
]

In [None]:
for txt in input_texts:
    print(txt)
    vec = c.encode([txt])
    r = img_da.find(query=vec, limit=9)
    r.plot_image_sprites()
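
To make the matching step less of a black box, here is a minimal sketch of a nearest-neighbour lookup over embeddings. It assumes the ranking is done by cosine similarity and uses random NumPy arrays in place of real CLIP embeddings; `top_k_cosine` and the toy data are hypothetical names for illustration, not part of DocArray or clip-client.

```python
import numpy as np


def top_k_cosine(query, embeddings, k=9):
    """Return indices and scores of the k rows of `embeddings` closest to `query`."""
    # Normalise both sides so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q
    # Sort by descending similarity and keep the first k.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy stand-ins (assumption): 100 "image" vectors and a "text" vector
# that is a slightly noisy copy of image 42.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(100, 512))
text_embedding = image_embeddings[42] + 0.01 * rng.normal(size=512)

idx, scores = top_k_cosine(text_embedding, image_embeddings, k=9)
print(idx, scores)
```

In the notebook itself, `img_da.find(...)` does this work for us and returns the matching Documents rather than bare indices.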

## Image-to-text cross-modal search

We can also swap the input and output of the previous program to get image-to-text search: given a query image, find the sentence that best describes it.

We'll sample 10 images from our image DocumentArray and, for each one, return the closest matching sentence from *Pride and Prejudice*.

### Download *Pride and Prejudice*

In [None]:
txt_da = DocumentArray.pull('ttl-textual', show_progress=True, local_cache=True)

### Plot matches

In [None]:
for d in img_da.sample(10):
    d.plot()
    results = txt_da.find(d.embedding, limit=1)
    
    for match in results:
        print(match.text)

### This...isn't great?

- We broke *Pride and Prejudice* down into sentences, but our parser treated things like `Mr.` as a complete sentence because it has a `.` at the end. So `Mr` ends up as a valid candidate sentence, and it's so vague that it just matches random dudes.
- Likewise, a lot of sentences have people's names. There are so many "Janes" that "Jane" could look like anyone!
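
One way to reduce both problems is to split sentences with a little awareness of abbreviations and then drop very short fragments before encoding. The sketch below is an illustration only, not the parser used to build the dataset; the `ABBREVIATIONS` set, the `min_words` threshold, and the sample text are all assumptions.

```python
import re

# Hypothetical abbreviation list; extend it for your own text.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "st"}


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace, skipping known abbreviations."""
    sentences, start = [], 0
    for m in re.finditer(r"[.!?]\s+", text):
        # Look at the word right before the punctuation mark.
        words = text[start:m.start()].split()
        last_word = words[-1] if words else ""
        if text[m.start()] == "." and last_word.lower() in ABBREVIATIONS:
            continue  # "Mr." and friends do not end a sentence
        sentences.append(text[start:m.start() + 1].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def keep_descriptive(sentences, min_words=4):
    """Drop fragments that are too short to describe an image."""
    return [s for s in sentences if len(s.split()) >= min_words]


# Made-up sample text in the style of the novel.
sample = "Mr. Darcy walked into the room. Jane! She smiled at Mr. Bingley across the crowded hall."
sentences = split_sentences(sample)
print(sentences)
print(keep_descriptive(sentences))
```

A word-count filter is blunt: it removes bare names like "Jane!" but keeps longer sentences that still mention names, so treat it as a first pass rather than a fix.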