# Notebook 3: A first look into search
In this notebook we take a look at Luminous-Explore.

Luminous-Explore is our model for semantic similarity.

With Luminous-Explore, you can use the meaning of a text to build awesome applications.

Try it out below!

In [10]:
!pip install aleph_alpha_client

# Some additional imports for search
from typing import Sequence
from aleph_alpha_client import ImagePrompt, AlephAlphaClient, AlephAlphaModel, SemanticEmbeddingRequest, SemanticRepresentation, Prompt
from IPython.display import Image  
import math
import os



## The model for embedding

Our embedding model is called luminous-explore.

To access its functionality, however, you need to specify luminous-base as the model.

In [14]:
# Instantiate the model (replace "API-TOKEN" with your own API token)
model = AlephAlphaModel.from_model_name(model_name="luminous-base", token="API-TOKEN")
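
Rather than hard-coding the token in the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `AA_TOKEN`; both that name and the helper `load_token` are illustrative choices for this example and are not part of `aleph_alpha_client`.

```python
import os


def load_token(var_name: str = "AA_TOKEN") -> str:
    """Return the API token stored in the environment variable `var_name`.

    `AA_TOKEN` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the environment variable {var_name} to your API token.")
    return token


# Possible usage with the cell above (not executed here):
# model = AlephAlphaModel.from_model_name(model_name="luminous-base", token=load_token())
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the first API call.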

### Simple functions for embedding and searching
Here we provide a simple function for embedding text and a simple function for calculating the similarity between two embeddings.

Please take a moment to understand what each function does.

Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them: a value close to 1 means the vectors point in nearly the same direction, while a value close to 0 means they are nearly orthogonal.

You don't have to understand the details of the code, but you should understand the general idea.
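
For reference, the cosine similarity described above can be written as follows, where $v_1$ and $v_2$ are two vectors of the same length $n$ (the symbols are chosen to match the argument names `v1` and `v2` in the code):

```latex
\cos(v_1, v_2)
  = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}
  = \frac{\sum_{i=1}^{n} v_{1,i}\, v_{2,i}}
         {\sqrt{\sum_{i=1}^{n} v_{1,i}^{2}} \; \sqrt{\sum_{i=1}^{n} v_{2,i}^{2}}}
```

The result always lies between -1 and 1.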

In [15]:
# Function for symmetric embedding.
# Note: the multimodal section further down also passes an ImagePrompt to this function.
def embed_symmetric(text: str):
    # Create an embedding request with the representation set to symmetric
    request = SemanticEmbeddingRequest(
        prompt=Prompt.from_text(text),
        representation=SemanticRepresentation.Symmetric,
    )
    # Send the request to the model and return the embedding vector
    result = model.semantic_embed(request)
    return result.embedding

# Function to calculate similarity
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Compute the cosine similarity of v1 and v2: (v1 dot v2) / (||v1|| * ||v2||)."""
    sumxx, sumxy, sumyy = 0.0, 0.0, 0.0
    # Accumulate both squared norms and the dot product in a single pass
    for x, y in zip(v1, v2):
        sumxx += x * x
        sumyy += y * y
        sumxy += x * y
    return sumxy / math.sqrt(sumxx * sumyy)
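
To build some intuition before calling the API, the same formula can be tried on small hand-made vectors. This is a minimal, self-contained sketch: the helper `cosine` and the toy vectors are made up for illustration and need neither the client nor a token.

```python
import math
from typing import Sequence


def cosine(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm_sq_1 = sum(x * x for x in v1)
    norm_sq_2 = sum(y * y for y in v2)
    return dot / math.sqrt(norm_sq_1 * norm_sq_2)


same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])                # vectors at a right angle
opposite = cosine([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])     # opposite directions

print(same_direction, orthogonal, opposite)  # 1.0 0.0 -1.0
```

Note that the second vector in the first pair is twice as long as the first one, yet the score is still 1: cosine similarity depends only on direction, not on length.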

### Using semantic similarity
Now we can use semantic similarity to find similar texts.

Let's compare two texts, one in English and one in Italian, to see how it works.

In this case, both sentences have the same meaning, so they should return a high similarity score.

In [16]:
# define the texts
text_a = "The sun is shining"
text_b = "Il sole splende"

# show the similarity
print(cosine_similarity(embed_symmetric(text_a), embed_symmetric(text_b)))

0.9123379711230551
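
Pairwise scores like the one above are the building block of a simple search: embed a query, embed each candidate text, and sort the candidates by score. The sketch below illustrates only the ranking step. The names `cosine` and `rank_by_similarity` are hypothetical, and the short toy vectors stand in for real embeddings, which in this notebook would come from `embed_symmetric`.

```python
import math
from typing import List, Sequence, Tuple


def cosine(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / math.sqrt(sum(x * x for x in v1) * sum(y * y for y in v2))


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings
query = [1.0, 0.0, 1.0]
documents = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.1, 2.0],  # almost parallel to the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, documents)
print([index for index, _ in ranking])  # [1, 2, 0]
```

With real data, each document would be embedded once and the resulting vectors reused for every query, since every call to the embedding function is an API request.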


## The embedding
Let's also take a look at the embedding itself.

In the cell below we print the first 100 elements of the embedding.

The full embedding is 5120 elements long, so printing all of it would be quite a lot.

In [17]:
%%time
print(embed_symmetric(text_a)[:100])
print("\n")
print(embed_symmetric(text_b)[:100])

[1.2265625, 0.71875, -0.076660156, 0.79296875, -0.17480469, -1.9921875, 0.91015625, -1.1484375, 0.42578125, -1.1328125, 1.09375, 0.24804688, 0.85546875, -1.3359375, -0.60546875, -0.96484375, 0.07128906, -0.390625, -1.28125, -0.0009765625, 0.46679688, 1.46875, 0.29492188, 1.234375, 0.40625, -0.5078125, -2.078125, 2.703125, 0.17285156, -1.9609375, 0.6796875, 2.09375, -1.015625, -0.47851562, 1.109375, 0.076660156, -2.3125, -2.65625, -0.000831604, 0.9296875, 0.18945312, -1.109375, 0.44921875, -0.54296875, 1.359375, 0.7734375, 0.796875, 0.953125, -2.34375, -0.48632812, -0.42382812, -0.5625, 1.46875, 0.55078125, 0.18554688, 0.20214844, -0.040283203, -0.22558594, 0.4453125, 1.359375, 0.2734375, -2.578125, -0.25585938, -0.07519531, 0.15136719, -0.39453125, -1.1875, 0.87890625, 0.038330078, -1.5625, 0.048339844, 0.8203125, 1.2734375, 0.022094727, 1.1328125, -0.033935547, 1.6796875, -0.22070312, 0.06591797, -0.032226562, -0.67578125, -2.078125, -2.234375, -1.34375, -0.53515625, -1.4609375, -0.29
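
Instead of reading raw numbers, it can be easier to look at a few summary statistics of an embedding. The helper below is a hypothetical sketch (the name `summarize_embedding` is not part of the client) and works on any list of floats, so it could be applied to the result of `embed_symmetric` as well.

```python
import math
from typing import Dict, Sequence


def summarize_embedding(vec: Sequence[float]) -> Dict[str, float]:
    """Return a few summary statistics instead of printing every element."""
    return {
        "length": len(vec),
        "min": min(vec),
        "max": max(vec),
        "mean": sum(vec) / len(vec),
        "l2_norm": math.sqrt(sum(x * x for x in vec)),
    }


# First five values from the output above, used here only as toy input
sample = [1.2265625, 0.71875, -0.076660156, 0.79296875, -0.17480469]
print(summarize_embedding(sample))
```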

## Multimodal Similarity
Amazingly, because Luminous supports multimodal input, we can even semantically compare texts to images.

This is not an explicitly developed feature, but rather an emergent property of the model.

*Please keep in mind that multimodal semantic similarity is probably less robust than text-to-text similarity.*

In [37]:
# get an image from a url
url = "https://cdn-images-1.medium.com/max/1200/1*HunNdlTmoPj8EKpl-jqvBA.png"

# put the image into a prompt
image_prompt = ImagePrompt.from_url(url)

# display the image from the url
Image(url=url, width=300, height=300)

In [35]:
positive_example = "A neural network Architecture with Attention and Embeddings"
negative_example = "An image of a beatuiful beach"

print(f"The score for the positive example is: {cosine_similarity(embed_symmetric(positive_example), embed_symmetric(image_prompt))}")
print(f"The score for the negative example is: {cosine_similarity(embed_symmetric(negative_example), embed_symmetric(image_prompt))}")

The score for the positive example is: 0.5610950285813592
The score for the negative example is: 0.05519817023842499
