[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aleph-Alpha/examples/blob/main/exercises/05_exercise_d.ipynb)

# Exercise D: A first look into search
In this notebook we will take a look at Luminous Explore.

Luminous Explore is our model for semantic similarity.

With Explore, you can build applications that work with the meaning of a text rather than its exact wording.

Try it out below!

In [None]:
!pip install aleph_alpha_client
# Some additional imports for search
from typing import Sequence
from aleph_alpha_client import ImagePrompt, Client, SemanticEmbeddingRequest, SemanticRepresentation, Prompt
import math
from scipy.spatial.distance import cosine
import os

## The model for embedding

Our embedding model is called Luminous Explore.

However, to access its functionality you need to pass `luminous-base` as the model name when sending the request.

In [None]:
# instantiate the client (replace the placeholder "AA_TOKEN" with your own API token)
client = Client(token="AA_TOKEN")

### Simple functions for embedding and searching
Here we provide a simple function for embedding text, as well as a function for calculating the similarity between two embeddings.

Please take a moment to understand what each function does.

Cosine similarity measures how similar two vectors are by taking the cosine of the angle between them: 1 means they point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.

You don't have to understand the details of the code, but you should understand the general idea.

In [None]:
# function for symmetric embedding
def embed_symmetric(text: str):
    # create an embedding request with the representation set to symmetric
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Symmetric)
    # create the embedding
    result = client.semantic_embed(request, model="luminous-base")
    return result.embedding

# function to calculate similarity
# scipy's `cosine` returns the cosine distance, so we subtract it from 1
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    return 1 - cosine(v1, v2)
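
To make the idea behind `cosine_similarity` more concrete, here is a minimal sketch that computes the same quantity by hand, using only the standard library. The name `cosine_similarity_manual` and the small example vectors are made up for illustration; they are not part of the client library or of real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity_manual(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine of the angle between v1 and v2: dot(v1, v2) / (|v1| * |v2|)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


# Illustrative toy vectors (not real embeddings)
same_direction = cosine_similarity_manual([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity_manual([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity_manual([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

Up to floating-point rounding, the three values should come out close to 1, 0, and -1. Note that the length of the vectors does not matter, only their direction.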

### Tasks:
1. Play around with the semantic similarity of the text embeddings
    - What difference does language make?
    - What difference does the size of the text make?
    - Can you find a semantic opposite of a text?


In [None]:
# define the texts
text_a = "The sun is shining"
text_b = "Il sole splende"

# show the similarity
print(cosine_similarity(embed_symmetric(text_a), embed_symmetric(text_b)))
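
Comparing one pair of texts at a time extends naturally to a very simple search: embed a query, embed a list of candidates, and sort the candidates by similarity. The sketch below shows one possible way to do this. The helper `rank_by_similarity` and the `toy_embed` function with its hand-written three-dimensional vectors are assumptions made up so the example runs without an API token; they are not real Luminous embeddings.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def _cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def rank_by_similarity(
    query: str,
    candidates: Sequence[str],
    embed: Callable[[str], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (text, similarity) pairs, most similar to the query first."""
    query_vec = embed(query)
    scored = [(text, _cosine_similarity(query_vec, embed(text))) for text in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in embeddings, invented for illustration only
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "The sun is shining": [0.9, 0.1, 0.0],
    "Il sole splende": [0.8, 0.2, 0.1],
    "I need to file my taxes": [0.0, 0.2, 0.9],
}


def toy_embed(text: str) -> List[float]:
    return TOY_EMBEDDINGS[text]


ranking = rank_by_similarity(
    "The sun is shining",
    ["I need to file my taxes", "Il sole splende"],
    toy_embed,
)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

In the notebook you could pass `embed_symmetric` instead of `toy_embed`. Keep in mind that each call to it sends a request to the API, so long candidate lists will take a while.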

## The embedding
Let's also take a look at the embedding itself.

In the cell below we print the first 100 elements of the embedding.

The embedding is 5120 elements long, so printing all of it would be quite a lot.

In [None]:
%%time
print(embed_symmetric(text_a)[:100])
print("\n")
print(embed_symmetric(text_b)[:100])
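
A raw slice of numbers is hard to interpret on its own, so a few summary values can help. The following is a minimal sketch with a hypothetical helper, `summarize_embedding`, applied to a short stand-in vector; in the notebook you could pass the result of `embed_symmetric(text_a)` instead.

```python
import math
from typing import Dict, Sequence


def summarize_embedding(embedding: Sequence[float]) -> Dict[str, float]:
    """Collect a few basic statistics about an embedding vector."""
    return {
        "length": len(embedding),  # number of elements
        "norm": math.sqrt(sum(x * x for x in embedding)),  # Euclidean (L2) norm
        "min": min(embedding),
        "max": max(embedding),
    }


# Stand-in vector for illustration; a real embedding is much longer
summary = summarize_embedding([3.0, -4.0, 0.0])
for name, value in summary.items():
    print(f"{name}: {value}")
```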