# Notebook 3: Simple Search

In this notebook, you will learn how to compare texts using semantic search.

### Optional: Install the client
You can skip this step if you have already installed the `aleph_alpha_client`. Make sure you have the [latest pip version](https://pip.pypa.io/en/stable/installation/) installed before proceeding.

In [None]:
!pip install aleph-alpha-client

### Instantiate the client
To interact with our API, you have to instantiate a client. Here you also provide your token to authenticate yourself. If you don't have one already, create one in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile).

In [None]:
from aleph_alpha_client import Client
client = Client(token="AA_TOKEN")  # replace "AA_TOKEN" with your own API token

### Compare the similarity of two texts

To compare two texts, you need to embed both and calculate the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between the two embeddings. The helper function below performs this calculation.

In [None]:
from aleph_alpha_client import SemanticEmbeddingRequest, SemanticRepresentation, Prompt
from typing import Sequence
import math

# helper function to calculate similarity
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Compute the cosine similarity of v1 and v2: (v1 dot v2) / (||v1|| * ||v2||)."""
    sumxx, sumxy, sumyy = 0.0, 0.0, 0.0
    # accumulate the squared norms and the dot product in a single pass
    for x, y in zip(v1, v2):
        sumxx += x * x
        sumyy += y * y
        sumxy += x * y
    return sumxy / math.sqrt(sumxx * sumyy)
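
As a quick sanity check of the formula, the sketch below applies a compact version of the helper to small hand-made vectors. The vectors are illustrative values only and are not real embeddings. Vectors that point in the same direction should score 1, orthogonal vectors 0, and opposite vectors -1.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity: (v1 dot v2) / (||v1|| * ||v2||)."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1) * sum(y * y for y in v2))
    return dot / norm


# toy vectors, chosen by hand for illustration (not model output)
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

Note that the length of the vectors does not matter, only their direction: `[1.0, 2.0]` and `[2.0, 4.0]` are treated as identical.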

### Using semantic embeddings
Now we can use semantic similarity to find similar texts.

Let's compare two texts, one in English and one in Italian, to see how it works.

In this case, both sentences have the same meaning, so they should return a high similarity score.

In [None]:
# define the texts
texts = ["The sun is shining", "Il sole splende"]

symmetric_embeddings = []

for text in texts:
    symmetric_params = {
        "prompt": Prompt.from_text(text),
        "representation": SemanticRepresentation.Symmetric,
        "compress_to_size": 128
    }
    symmetric_request = SemanticEmbeddingRequest(**symmetric_params)
    symmetric_response = client.semantic_embed(request=symmetric_request, model="luminous-base")
    symmetric_embeddings.append(symmetric_response.embedding)

# show the similarity
print(cosine_similarity(symmetric_embeddings[0], symmetric_embeddings[1]))
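
The same comparison extends to a simple search: score every candidate against a query and sort by similarity. The following is a minimal sketch under stated assumptions. The function `rank_by_similarity`, the labels, and the three-dimensional toy vectors are hypothetical stand-ins; in practice the vectors would be embeddings returned by the API, as in the cell above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity: (v1 dot v2) / (||v1|| * ||v2||)."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1) * sum(y * y for y in v2))
    return dot / norm


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# hypothetical toy vectors standing in for real embeddings
toy_query = [0.9, 0.1, 0.0]
toy_candidates = {
    "weather": [1.0, 0.2, 0.0],
    "cooking": [0.0, 1.0, 0.3],
    "finance": [0.1, 0.0, 1.0],
}

ranking = rank_by_similarity(toy_query, toy_candidates)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

A linear scan like this is fine for a handful of texts; larger collections would typically need an index, which is outside the scope of this notebook.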

### The embeddings
Let's also take a look at the embeddings themselves and what they actually look like.

In the cell below we print the first 100 elements of the embedding.

The default behavior is to return the full embedding with 5120 dimensions. You can also compress the returned embeddings to 128 dimensions, as we did above by setting `compress_to_size` to 128. The compression is expected to result in a small drop in accuracy (4-6%), with the benefit that the embeddings are much smaller, which makes comparing them much faster for use cases where speed is critical.

In [None]:
%%time
print(symmetric_embeddings[0][:100], "\n")
print(symmetric_embeddings[1][:100])
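
To put the size difference in numbers, here is a back-of-the-envelope calculation comparing 5120 and 128 dimensions. It assumes each value is stored as a 4-byte float and uses a hypothetical collection of one million embeddings; both figures are assumptions for illustration and do not describe how the API stores or transmits embeddings.

```python
FULL_DIMS = 5120        # full embedding size, as stated above
COMPRESSED_DIMS = 128   # compressed embedding size, as stated above
BYTES_PER_VALUE = 4     # assumption: values stored as 32-bit floats
NUM_EMBEDDINGS = 1_000_000  # hypothetical collection size


def storage_bytes(dims: int, count: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Raw storage needed for `count` embeddings with `dims` dimensions each."""
    return dims * count * bytes_per_value


full_bytes = storage_bytes(FULL_DIMS, NUM_EMBEDDINGS)
compressed_bytes = storage_bytes(COMPRESSED_DIMS, NUM_EMBEDDINGS)
reduction_factor = FULL_DIMS // COMPRESSED_DIMS

print(f"full:       {full_bytes / 1e9:.2f} GB")
print(f"compressed: {compressed_bytes / 1e9:.2f} GB")
print(f"reduction:  {reduction_factor}x")
```

The cost of one cosine similarity also grows linearly with the number of dimensions, so the same factor applies roughly to comparison time.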

### Multimodal Similarity
Amazingly, as Luminous supports multimodal input, we can even semantically compare texts to images with Luminous Explore.

This is not an explicitly developed feature but rather an emergent property of the model.

*Please keep in mind that multimodal semantic similarity is probably less robust than text-to-text similarity.*

In [None]:
from aleph_alpha_client import Image, Text
from IPython.display import Image as ImageShow

url = "https://cdn-images-1.medium.com/max/1200/1*HunNdlTmoPj8EKpl-jqvBA.png"
image_prompt = Image.from_image_source(url)
positive_text = Text.from_text("A neural network Architecture with Attention and Embeddings")
negative_text = Text.from_text("An image of a beautiful beach")
symmetric_embeddings = []

items = [image_prompt, positive_text, negative_text]

for item in items:
    symmetric_params = {
        "prompt": Prompt([item]),
        "representation": SemanticRepresentation.Symmetric,
        "compress_to_size": 128
    }
    symmetric_request = SemanticEmbeddingRequest(**symmetric_params)
    symmetric_response = client.semantic_embed(request=symmetric_request, model="luminous-base")
    symmetric_embeddings.append(symmetric_response.embedding)

# calculate the similarity of the image to each text
positive_score = cosine_similarity(symmetric_embeddings[0], symmetric_embeddings[1])
negative_score = cosine_similarity(symmetric_embeddings[0], symmetric_embeddings[2])

# display the image and print the calculated similarity scores
display(ImageShow(url=url, width=200, height=300))
print(f"The score for the positive example is: {positive_score}")
print(f"The score for the negative example is: {negative_score}")