# Notebook 3: Simple Search

In this notebook, you will learn how to compare texts by meaning using semantic embeddings, the building block of semantic search.

### Optional: Install the client
You can skip this step if you have already installed the `aleph_alpha_client`. Make sure you have the [latest pip version](https://pip.pypa.io/en/stable/installation/) installed before proceeding.

In [1]:
!pip install aleph_alpha_client



### Instantiate the model
Instantiate a model by providing the `model_name` and a `token` for authentication. If you don't have a token yet, create one in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile). To use semantic embeddings ([luminous-explore](https://www.aleph-alpha.com/luminous-explore-a-model-for-world-class-semantic-representation)), we need to supply `luminous-base` as the model name.

In [5]:
from aleph_alpha_client import AlephAlphaModel
import os
model = AlephAlphaModel.from_model_name(model_name="luminous-base", token=os.getenv("API_TOKEN"))
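The cell above reads the token with `os.getenv("API_TOKEN")`, which silently returns `None` when the variable is not set. The following is a minimal sketch of a guard that fails early with a readable message instead. The helper name `read_api_token` and the `DEMO_TOKEN` variable are hypothetical and only used for illustration; they are not part of the `aleph_alpha_client`.

```python
import os


def read_api_token(env_var: str = "API_TOKEN") -> str:
    """Return the token stored in `env_var` or raise a clear error.

    Hypothetical helper for this notebook, not part of aleph_alpha_client.
    """
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Create a token in your Aleph Alpha profile and export it first."
        )
    return token


# Demonstration with a dummy variable so the sketch runs without a real token.
os.environ["DEMO_TOKEN"] = "dummy-value"
print(read_api_token("DEMO_TOKEN"))
```

With such a guard in place, the value returned by `read_api_token()` could be passed as `token` when instantiating the model.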

### Compare the similarity of two texts

To compare two texts, you need to embed both and calculate the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between them. For demonstration purposes we prepared helper functions.

In [6]:
from aleph_alpha_client import SemanticEmbeddingRequest, SemanticRepresentation, Prompt
from typing import Sequence
import math

# helper function that embeds a text with the given representation
def embed(text: str, representation: SemanticRepresentation):
    # Create a SemanticEmbeddingRequest with the requested representation (symmetric in this notebook)
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=representation)
    semantic_embedding = model.semantic_embed(request)
    return semantic_embedding.embedding

# helper function to calculate similarity
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    "Compute the cosine similarity of v1 and v2: (v1 dot v2) / (||v1|| * ||v2||)"
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)
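To get a feeling for the score range before calling the API, the sketch below applies the same formula to hand-picked toy vectors. The vectors and the name `toy_cosine_similarity` are assumptions made for illustration and are not model output. Vectors pointing in the same direction score close to 1, orthogonal vectors 0, and opposite vectors close to -1.

```python
import math
from typing import Sequence


def toy_cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity: (v1 dot v2) / (||v1|| * ||v2||)."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    return dot / (norm1 * norm2)


# Toy 2-dimensional vectors (illustrative only).
same_direction = toy_cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = toy_cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = toy_cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(f"same direction: {same_direction:.3f}")
print(f"orthogonal:     {orthogonal:.3f}")
print(f"opposite:       {opposite:.3f}")
```

Note that the score only depends on the direction of the vectors, not on their length, which is why `[1.0, 2.0]` and `[2.0, 4.0]` count as identical here.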

### Using semantic similarity
Now we can use the semantic similarity to find similar texts.

Let's compare two texts, one in English and one in Italian, to see how it works.

Both sentences have the same meaning, so they should receive a high similarity score.

In [7]:
# define the texts
text_a = "The sun is shining"
text_b = "Il sole splende"

# show the similarity
print(cosine_similarity(embed(text_a, SemanticRepresentation.Symmetric), embed(text_b, SemanticRepresentation.Symmetric)))

0.9123379711230551
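Comparing a pair of texts extends naturally to a simple search: embed a query, embed each candidate, and sort the candidates by their similarity to the query. The sketch below illustrates only the ranking step. It uses toy 3-dimensional vectors in place of real embeddings so that it runs without an API token, and the names `rank_by_similarity`, `toy_documents`, and `toy_query` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / math.sqrt(sum(x * x for x in v1) * sum(y * y for y in v2))


def rank_by_similarity(
    query: Sequence[float], documents: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, vector)) for name, vector in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption, not model output).
toy_documents = {
    "weather": [0.9, 0.1, 0.0],
    "cooking": [0.1, 0.9, 0.2],
    "sports": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.1]

ranking = rank_by_similarity(toy_query, toy_documents)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this notebook, the vectors would instead come from `embed(text, SemanticRepresentation.Symmetric)` for the query and for each candidate text.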


### The embedding
Let's also take a look at the embedding itself.

In the cell below, we print the first 100 elements of the embedding of each text.

Each embedding has 5120 elements, so printing all of them would produce a lot of output.

In [10]:
%%time
print(embed(text_a, SemanticRepresentation.Symmetric)[:100], "\n")
print(embed(text_b, SemanticRepresentation.Symmetric)[:100])

[1.2265625, 0.71875, -0.076660156, 0.79296875, -0.17480469, -1.9921875, 0.91015625, -1.1484375, 0.42578125, -1.1328125, 1.09375, 0.24804688, 0.85546875, -1.3359375, -0.60546875, -0.96484375, 0.07128906, -0.390625, -1.28125, -0.0009765625, 0.46679688, 1.46875, 0.29492188, 1.234375, 0.40625, -0.5078125, -2.078125, 2.703125, 0.17285156, -1.9609375, 0.6796875, 2.09375, -1.015625, -0.47851562, 1.109375, 0.076660156, -2.3125, -2.65625, -0.000831604, 0.9296875, 0.18945312, -1.109375, 0.44921875, -0.54296875, 1.359375, 0.7734375, 0.796875, 0.953125, -2.34375, -0.48632812, -0.42382812, -0.5625, 1.46875, 0.55078125, 0.18554688, 0.20214844, -0.040283203, -0.22558594, 0.4453125, 1.359375, 0.2734375, -2.578125, -0.25585938, -0.07519531, 0.15136719, -0.39453125, -1.1875, 0.87890625, 0.038330078, -1.5625, 0.048339844, 0.8203125, 1.2734375, 0.022094727, 1.1328125, -0.033935547, 1.6796875, -0.22070312, 0.06591797, -0.032226562, -0.67578125, -2.078125, -2.234375, -1.34375, -0.53515625, -1.4609375, -0.29
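Since an embedding is a plain list of floats, it can be pre-processed with ordinary Python. One common step, sketched here under the assumption that the same embedding is compared against many others, is to scale each vector to unit length once; the cosine similarity of two unit vectors is then just their dot product. `toy_a` reuses the first four values printed above, while `toy_b` is made up for illustration.

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(v1, v2))


# First four values of the embedding printed above, used as a short stand-in.
toy_a = [1.2265625, 0.71875, -0.076660156, 0.79296875]
# Made-up vector for illustration (assumption, not model output).
toy_b = [1.0, 0.5, 0.1, 0.9]

unit_a = normalize(toy_a)
unit_b = normalize(toy_b)

# For unit vectors, the dot product equals the cosine similarity.
similarity = dot(unit_a, unit_b)
print(f"length of unit_a: {math.sqrt(dot(unit_a, unit_a)):.3f}")
print(f"similarity: {similarity:.3f}")
```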

### Multimodal Similarity
Because Luminous supports multimodal input, we can even compare texts to images semantically with Luminous Explore.

This is not an explicitly developed feature but rather an emergent property of the model.

*Please keep in mind that multimodal semantic similarity is probably less robust than text-to-text similarity.*

In [16]:
from aleph_alpha_client import ImagePrompt
from IPython.display import Image

url = "https://cdn-images-1.medium.com/max/1200/1*HunNdlTmoPj8EKpl-jqvBA.png"
prompt = ImagePrompt.from_url(url)
positive_text = "A neural network Architecture with Attention and Embeddings"
negative_text = "An image of a beatuiful beach"
positive_embedding = cosine_similarity(embed(positive_text, SemanticRepresentation.Symmetric), embed(prompt, SemanticRepresentation.Symmetric))
negative_embedding = cosine_similarity(embed(negative_text, SemanticRepresentation.Symmetric), embed(prompt, SemanticRepresentation.Symmetric))

# print the image and the calculated similarity scores
display(Image(url=url, width=300, height=300))
print(f"The score for the positive example is: {positive_embedding}")
print(f"The score for the negative example is: {negative_embedding}")

The score for the positive example is: 0.562357064961456
The score for the negative example is: 0.05955637145813491
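Scores like these can be turned into a simple decision by picking the description with the highest similarity to the image. The sketch below does this for the two scores reported above; the helper `best_match` and the dictionary keys are hypothetical names chosen for illustration.

```python
from typing import Dict


def best_match(scores: Dict[str, float]) -> str:
    """Return the key with the highest similarity score."""
    return max(scores, key=lambda name: scores[name])


# Similarity scores copied from the output above.
caption_scores = {
    "positive_text": 0.562357064961456,
    "negative_text": 0.05955637145813491,
}

winner = best_match(caption_scores)
print(f"Best matching description: {winner}")
```

With more candidate descriptions, the same approach gives a rough form of image labelling, subject to the robustness caveat mentioned above.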
