# Notebook 6: Combined applications

In this notebook we combine search, completion, and summarization to build a simple application.

In [1]:
from typing import Sequence
from aleph_alpha_client import (
    ImagePrompt,
    AlephAlphaClient,
    AlephAlphaModel,
    SemanticEmbeddingRequest,
    SemanticRepresentation,
    Prompt,
    SummarizationRequest,
    CompletionRequest,
    EvaluationRequest,
    Document,
)
import math
import os

Because we are using both completion and search in this notebook, we define two models:
- One for search with luminous-base(explore)
- One for completion with luminous-extended

In [2]:
# instantiate the client and model
search_model = AlephAlphaModel(
    AlephAlphaClient(host="https://api.aleph-alpha.com", token=os.getenv("API_TOKEN")),
    model_name = "luminous-base"
)

model = AlephAlphaModel(
    AlephAlphaClient(host="https://api.aleph-alpha.com", token=os.getenv("API_TOKEN")),
    model_name = "luminous-extended"
)

## Pre-defined functions
To make life a bit easier, we have defined a few helper functions that you can use in this notebook.


They are described in this table:
|function|description|
|---|---|
|`embed_symmetric`| Embeds a text using the symmetric model|
|`embed_query`| Embeds a query using the asymmetric model|
|`embed_document`| Embeds a document using the asymmetric model|
|`cosine_similarity`| Calculates the cosine similarity between two vectors|
|`generate_summary`| Generates a summary of a document|
|`split_text`| Splits a text into paragraphs|
|`evaluate`| Uses the evaluate functionality of luminous to measure how well a completion fits a query|
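
To get a feel for what `cosine_similarity` returns before calling the API, here is a minimal sketch on hand-made vectors. The vectors are invented for illustration and are not real embeddings; the function is a compact equivalent of the helper defined in the next cell.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity of v1 and v2: (v1 dot v2) / (||v1|| * ||v2||)."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    return dot / (norm1 * norm2)


# Toy vectors (assumed values, chosen only to show the three extreme cases)
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # unrelated directions
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])       # opposite directions

print(same_direction, orthogonal, opposite)
```

Scores close to 1 mean the two vectors point in nearly the same direction, scores near 0 mean they are unrelated, and negative scores mean they point in opposite directions. Only the direction matters, not the length of the vectors.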

In [3]:
# function for symmetric embeddings 
def embed_symmetric(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Symmetric)
    result = search_model.semantic_embed(request)
    return result.embedding

# function for asymmetric embeddings of Queries
def embed_query(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Query)
    result = search_model.semantic_embed(request)
    return result.embedding

# function for asymmetric embeddings of Documents
def embed_document(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Document)
    result = search_model.semantic_embed(request)
    return result.embedding

# function to calculate similarity
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Compute the cosine similarity of v1 and v2: (v1 dot v2) / (||v1|| * ||v2||)."""
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)

# function for getting a summary
def generate_summary(text: str):
    request = SummarizationRequest(document=Document.from_text(text))
    result = model.summarize(request)
    return result.summary

# function that splits text by paragraphs
def split_text(text: str):
    return text.split("\n\n")

# function that evaluates how well text2 fits as a completion of text1
def evaluate(text1: str, text2: str):
    request = EvaluationRequest(prompt=Prompt.from_text(text1), completion_expected=text2)
    result = model.evaluate(request)
    return result

### Using a real-life text
Next, we will load a longer text from a file.

This text will probably be too long to put in a single prompt, so we will have to deal with that.
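
One possible way to deal with a text that is too long for a single prompt is to split it into paragraphs and pack them into smaller chunks. The sketch below is only an illustration: `split_into_chunks` and `max_chars` are hypothetical names, and the character count is a crude stand-in for the model's actual token limit.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Split on blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept as its own chunk.
    """
    # Drop empty paragraphs that result from runs of several blank lines
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks: List[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # The paragraph still fits (or it is the first one in this chunk)
            current = candidate
        else:
            # Close the current chunk and start a new one
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\n\n\nThird paragraph."
chunks = split_into_chunks(sample, max_chars=40)
print(chunks)
```

Each chunk can then be embedded or summarized on its own, which is the idea the rest of this notebook builds on.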

Let's create a function that searches a list of texts based on a question and then generates an answer.
- Use search to find the most relevant text
- Use a completion request with your own prompt to generate an answer
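
The two steps above can be tried out offline with a rough sketch. Everything here is an assumption made for illustration: `toy_embed` replaces the real embedding calls with simple word counts, and `find_best_text` and `build_prompt` are hypothetical helper names. The real solution uses the model's embeddings instead.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for the embedding helpers: plain word counts."""
    return Counter(text.lower().split())


def toy_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def find_best_text(question: str, texts: List[str]) -> int:
    """Step 1: return the index of the text most similar to the question."""
    query = toy_embed(question)
    scores = [toy_cosine(query, toy_embed(text)) for text in texts]
    return scores.index(max(scores))


def build_prompt(context: str, question: str) -> str:
    """Step 2: put the best text and the question into a completion prompt."""
    return f"context: {context}\nquestion: {question}\nanswer:"


texts = [
    "the courts review most legal cases",
    "the player joined a baseball team",
]
question = "which baseball team"
best = find_best_text(question, texts)
prompt_text = build_prompt(texts[best], question)
print(prompt_text)
```

Ending the prompt with `answer:` invites the model to continue with the answer, and a stop sequence such as a newline keeps the completion to a single line.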

In [4]:
texts_to_search = [
    "The judiciary of Poland (Polish: sądownictwo w Polsce) are the authorities exercising the judicial power of the Polish state on the basis of Chapter 8 of the Constitution of Poland.[a] As in almost all countries of continental Europe, the Polish judiciary operates within the framework of civil law.The courts (sądy), designated by the Constitution as those exercising the administration of justice (wymiar sprawiedliwości), are the bodies that review the vast majority of cases, with the exception of those specifically assigned to the two tribunals (trybunały). ",
    "Nicholas Blake 'Nick' Solak (born January 11, 1995) is an American professional baseball second baseman and outfielder for the Texas Rangers of Major League Baseball (MLB). Solak attended Naperville North High School in Naperville, Illinois, and the University of Louisville in Louisville, Kentucky.",
    "Die Buddhistenkrise war ein Zeitraum politisch-religiöser Anspannungen in Südvietnam vom 8. Mai bis 2. November 1963. Sie wurde durch das Verbot der buddhistischen Flagge durch die Regierung Ngô Đình Diệms ausgelöst, und endete mit einem Putsch der Armee der Republik Vietnam, wobei Ngô Đình Diệm festgenommen und später getötet wurde."
]

In [5]:
text_embeddings = [embed_document(text) for text in texts_to_search]

def search_and_answer(question : str):
    
    # get the embeddings of the question
    query_embedding = embed_query(question)    
    
    # calculate the similarity between the question and the texts
    scores = [cosine_similarity(query_embedding, text_embedding) for text_embedding in text_embeddings]
    
    # select the text with the highest similarity
    best_text_index = scores.index(max(scores))
    
    # create a completion task with the best text
    prompt = Prompt.from_text(f"context: {texts_to_search[best_text_index]}\nquestion:{question}\nanswer:")
    
    request = CompletionRequest(prompt=prompt, stop_sequences=["\n"])
    
    # call the model to complete the task
    result = model.complete(request)    
    
    # return the answer
    return result.completions[0].completion

In [6]:
search_and_answer("Who was Nick?")

'Nick Solak (born January 11, 1995) is an American professional baseball second baseman and outfielder for the Texas Rangers of Major League Baseball (MLB). Solak attended Naperville North High School in Naperville, Illinois, and the University of Louisville in Louisville, Kentucky.'

### Task 2: 
Create a function that creates a guided summary of a text.
- The function should take a text as input as well as some form of guidance
- Tip: use multiple functions from the previous exercises
- There are several ways to solve this task. Try to find a solution that works for you.
- Tip: look at the available functions
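
One building block that may help with this task is selecting the most relevant paragraphs while keeping them in their original order, so the text passed to the summarizer still reads naturally. The sketch below assumes the similarity scores have already been computed; `select_top_k_in_order` is a hypothetical helper and the scores are made up.

```python
from typing import List, Sequence


def select_top_k_in_order(
    paragraphs: Sequence[str], scores: Sequence[float], k: int
) -> List[str]:
    """Keep the k highest-scoring paragraphs, returned in document order."""
    # Indices of the k best scores, highest first
    top_indices = sorted(
        range(len(paragraphs)), key=lambda i: scores[i], reverse=True
    )[:k]
    # Sorting the indices restores the original reading order
    return [paragraphs[i] for i in sorted(top_indices)]


paragraphs = ["intro", "history", "algorithms", "ethics"]
scores = [0.5, 0.1, 0.9, 0.2]  # invented relevance scores for the guidance
selected = select_top_k_in_order(paragraphs, scores, k=2)
print(selected)
```

Choosing `k` is a trade-off: a larger value gives the summarizer more context, a smaller value keeps the summary closer to the guidance.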


In [7]:
text_to_summarize = ""

# Load the text to summarize from the file
with open("text_to_summarize.txt", "r") as f:
    text_to_summarize = f.read()

print(text_to_summarize[:100] + "...")

Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural int...


In [15]:
# Create a function that creates a guided summary of a text.
# This is a pretty complicated task, so don't worry if you can't get it to work.
def guided_summary(text : str, guidance : str):
    # Split the text into paragraphs
    splits = split_text(text)

    # embed each of the splits
    embeddings = []
    for split in splits:
        embeddings.append(embed_document(split))
    
    # Embed the guidance
    embedded_guidance = embed_query(guidance)
    
    # calculate a similarity for each of the splits
    list_of_scored_splits = []   # list of tuples (split, score, position)
    for i in range(len(splits)):
        
        # calculate the similarity between the guidance and the split embedding
        similarity_score = cosine_similarity(embedded_guidance, embeddings[i])
        
        # store the split together with its score and its position in the text
        list_of_scored_splits.append((splits[i], similarity_score, i))
        
    
    # select the top 5 splits by score
    top_5_splits = sorted(list_of_scored_splits, key=lambda x: x[1], reverse=True)[:5]
    
    # sort the top 5 splits by position
    top_5_splits = sorted(top_5_splits, key=lambda x: x[2])
    
    # join the top 5 splits into a guided text
    guided_text = "\n".join([x[0] for x in top_5_splits])
        
    # use the guided text to generate a summary
    summary = generate_summary(guided_text)
    
    return summary
    

In [18]:
print(guided_summary(text_to_summarize, "What is an algorithm?"))

• Artificial intelligence is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals and humans.
• AI research has developed methods for dealing with uncertain or incomplete information, employing concepts from probability and economics. Many of these algorithms proved to be insufficient for solving large reasoning problems because they experienced a "combinatorial explosion". Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.
