# Notebook 6: Combined applications

In this notebook we will combine functionalities to create a simple application.

### Optional: Install the client
You can skip this step if you have already installed the `aleph_alpha_client`. Make sure you have the [latest pip version](https://pip.pypa.io/en/stable/installation/) installed before proceeding.

In [None]:
!pip install aleph_alpha_client

### Instantiate Luminous
Instantiate a model by providing the `model_name` and a `token` for authentication. If you don't have a token yet, create one in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile).

Because we use both completion and search in this notebook, we define two models: one for completion and one for search.

In [1]:
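from aleph_alpha_client import AlephAlphaModel
# Replace "API_TOKEN" with your own token.
# luminous-extended handles completion, summarization and evaluation; luminous-base computes the embeddings for search.
model = AlephAlphaModel.from_model_name(model_name="luminous-extended", token="API_TOKEN")
search_model = AlephAlphaModel.from_model_name(model_name="luminous-base", token="API_TOKEN")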

## Pre-defined functions
To make life a bit easier for you, we have defined a few functions that you can use in this notebook.

|function|description|
|:---|:---|
|`embed`| Embeds a text with `search_model`, using either the symmetric or an asymmetric (query/document) representation|
|`cosine_similarity`| Calculates the cosine similarity between two vectors|
|`generate_summary`| Generates a summary of a document|
|`split_text`| Splits a text into paragraphs at blank lines|
|`evaluate`| Uses the evaluation functionality of Luminous to score how well an expected completion fits a prompt|
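As a quick sanity check of what the `cosine_similarity` scores mean: the value depends only on the angle between two vectors, not on their lengths, and lies between -1 and 1. The block below runs a compact equivalent of that helper on made-up two-dimensional vectors; real embeddings have far more dimensions, so treat this purely as an illustration.

```python
import math

def cosine_similarity(v1, v2):
    # Dot product divided by the product of the two Euclidean norms
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / (math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> -1.0
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))   # 45 degrees -> about 0.7071
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))   # same direction, different length -> 1.0
```

The last line is why cosine similarity suits embedding search: a longer text does not win just because its vector is longer.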

In [9]:
import math
from typing import Sequence
from aleph_alpha_client import Prompt, SemanticEmbeddingRequest, SemanticRepresentation, SummarizationRequest, EvaluationRequest, Document

# helper function to embed text using the symmetric or asymmetric model
def embed(text: str, representation: SemanticRepresentation):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=representation)
    result = search_model.semantic_embed(request)
    return result.embedding

# function to calculate the cosine similarity of two vectors
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    "Compute the cosine similarity of v1 and v2: (v1 dot v2) / (||v1|| * ||v2||)"
    sumxx, sumxy, sumyy = 0.0, 0.0, 0.0
    for x, y in zip(v1, v2):
        sumxx += x * x
        sumyy += y * y
        sumxy += x * y
    return sumxy / math.sqrt(sumxx * sumyy)

# function for getting a summary
def generate_summary(text: str):
    request = SummarizationRequest(document=Document.from_text(text))
    result = model.summarize(request)
    return result.summary

# function that splits a text by paragraphs
def split_text(text: str):
    return text.split("\n\n")

# function to evaluate two texts
def evaluate(text1: str, text2: str):
    request = EvaluationRequest(prompt=Prompt.from_text(text1), completion_expected=text2)
    result = model.evaluate(request)
    return result
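`split_text` splits only on the exact sequence `\n\n`. Text with Windows line endings (`\r\n\r\n`) or whitespace-only blank lines is therefore not split at all, and runs of three or more newlines leave stray newlines or empty strings among the paragraphs. If your input looks like that, a regex-based splitter such as the hypothetical `split_paragraphs` below is one possible, more tolerant alternative; it is a sketch, not part of the `aleph_alpha_client` package.

```python
import re
from typing import List

def split_paragraphs(text: str) -> List[str]:
    # Hypothetical helper (not part of aleph_alpha_client).
    # Normalize Windows line endings, then treat any blank line, including
    # whitespace-only lines and runs of several blank lines, as a paragraph break.
    normalized = text.replace("\r\n", "\n")
    parts = re.split(r"\n\s*\n", normalized)
    # Strip surrounding whitespace and drop empty pieces
    return [part.strip() for part in parts if part.strip()]

print(split_paragraphs("First.\n\nSecond.\n\n\nThird.\r\n\r\nFourth.\n  \n"))
# ['First.', 'Second.', 'Third.', 'Fourth.']
```

Dropping empty pieces also avoids sending empty strings to the embedding endpoint.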

### Task 1: Combining Search and Completion

Combined, a collection of texts like the one below can easily be too long to fit into a single prompt, so we first have to narrow it down to the relevant part.

Let's create a function that searches a list of texts based on a question and then generates an answer.
- Use search to find the most relevant text
- Use a completion request with your own prompt to generate an answer
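The search step can be tried out offline before spending API calls. The sketch below follows the same structure as the `search` function defined later in this task (embed, score, pick the best), but swaps the Luminous embeddings for a hypothetical `toy_embed` helper that simply counts words over a fixed vocabulary. This bag-of-words stand-in is an assumption for illustration only: it matches on shared words and uses one representation for both queries and documents, whereas the notebook uses Luminous's separate `Query` and `Document` representations. The three texts are shortened, paraphrased stand-ins for the ones below.

```python
import math
import re
from collections import Counter
from typing import List, Sequence

def toy_cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    # Same formula as the notebook helper, but returns 0.0 for zero vectors
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0

def toy_embed(text: str, vocabulary: Sequence[str]) -> List[float]:
    # Hypothetical stand-in for `embed`: word counts over a fixed vocabulary
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [float(counts[word]) for word in vocabulary]

def toy_search(query: str, texts: Sequence[str], vocabulary: Sequence[str]) -> str:
    # 1. Embed every text and the query
    text_vectors = [toy_embed(text, vocabulary) for text in texts]
    query_vector = toy_embed(query, vocabulary)
    # 2. Score each text against the query
    scores = [toy_cosine_similarity(query_vector, vector) for vector in text_vectors]
    # 3. Return the text with the highest score (the first one on ties)
    best_index = max(range(len(texts)), key=lambda i: scores[i])
    return texts[best_index]

texts = [
    "The Polish judiciary operates within the framework of civil law.",
    "Nick Solak is a baseball player for the Texas Rangers.",
    "The Buddhist crisis took place in South Vietnam in 1963.",
]
# Vocabulary: every lower-cased word that occurs in the texts
vocabulary = sorted({word for text in texts for word in re.findall(r"\w+", text.lower())})

print(toy_search("Who was Nick?", texts, vocabulary))
# Nick Solak is a baseball player for the Texas Rangers.
```

The winning text would then be passed as context to the completion request. Real semantic embeddings are what let the notebook's `search` match a query to a text even when they share few or no words.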

In [10]:
texts_to_search = [
    "The judiciary of Poland (Polish: sądownictwo w Polsce) are the authorities exercising the judicial power of the Polish state on the basis of Chapter 8 of the Constitution of Poland. As in almost all countries of continental Europe, the Polish judiciary operates within the framework of civil law. The courts (sądy), designated by the Constitution as those exercising the administration of justice (wymiar sprawiedliwości), are the bodies that review the vast majority of cases, with the exception of those specifically assigned to the two tribunals (trybunały).",
    "Nicholas Blake 'Nick' Solak (born January 11, 1995) is an American professional baseball second baseman and outfielder for the Texas Rangers of Major League Baseball (MLB). Solak attended Naperville North High School in Naperville, Illinois, and the University of Louisville in Louisville, Kentucky.",
    "Die Buddhistenkrise war ein Zeitraum politisch-religiöser Anspannungen in Südvietnam vom 8. Mai bis 2. November 1963. Sie wurde durch das Verbot der buddhistischen Flagge durch die Regierung Ngô Đình Diệms ausgelöst, und endete mit einem Putsch der Armee der Republik Vietnam, wobei Ngô Đình Diệm festgenommen und später getötet wurde."
]

In [25]:
from aleph_alpha_client import CompletionRequest

text_embeddings = [embed(text, SemanticRepresentation.Document) for text in texts_to_search]

def search(query: str):
    query_embedding = embed(query, SemanticRepresentation.Query)    
    # calculate the similarity between the question and the texts
    scores = [cosine_similarity(query_embedding, text_embedding) for text_embedding in text_embeddings]
    # select the text with the highest similarity
    best_text_index = scores.index(max(scores))
    return texts_to_search[best_text_index]

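def answer(context: str, question: str):
    # Put the retrieved text in front of the question; the model continues after "answer:"
    prompt = Prompt.from_text(f"context: {context}\nquestion:{question}\nanswer:")
    # Stop at the first newline so that only a one-line answer is returned
    request = CompletionRequest(prompt=prompt, stop_sequences=["\n"])
    response = model.complete(request)
    return response.completions[0].completion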
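To see exactly what the model receives, the prompt layout used by `answer` can be rendered without an API call. The hypothetical helpers below (not part of `aleph_alpha_client`) build the same prompt string and imitate the effect of `stop_sequences=["\n"]` on a made-up continuation: without a stop sequence, the model may go on to write another `question:` line, while ending at the first newline keeps only the one-line answer. The API applies the stop sequence during generation; the string split here only imitates the result.

```python
def build_prompt_text(context: str, question: str) -> str:
    # Hypothetical helper: same prompt layout as the f-string in `answer`
    return f"context: {context}\nquestion:{question}\nanswer:"

def cut_at_stop_sequence(generated: str, stop: str = "\n") -> str:
    # Hypothetical helper: keep only the text before the first stop sequence
    return generated.split(stop, 1)[0]

print(build_prompt_text("Nick Solak is a baseball player.", "Who was Nick?"))
# context: Nick Solak is a baseball player.
# question:Who was Nick?
# answer:

# A made-up continuation in which the model starts a new question
print(cut_at_stop_sequence("A baseball player.\nquestion: Who else?"))
# A baseball player.
```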

In [24]:
query = "Who was Nick?"
result = search(query)
answer(result, query)

'Nick Solak (born January 11, 1995) is an American professional baseball second baseman and outfielder for the Texas Rangers of Major League Baseball (MLB). Solak attended Naperville North High School in Naperville, Illinois, and the University of Louisville in Louisville, Kentucky.'

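### Task 2: Guided Summary
Create a function that produces a guided summary of a text, i.e. a summary that focuses on the aspects the guidance asks about.
- The function should take a text as input as well as some form of guidance (for example, a question)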
- There are several ways to solve this task. Try to find a solution that works for you.
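One possible approach, and the one taken in the solution further down, is to rank the paragraphs by their similarity to the guidance, keep the best few and put them back into document order before summarizing. The selection step on its own can be sketched in plain Python. The `select_top_k_in_order` helper is hypothetical and the scores are made up for illustration; in the notebook they would come from `cosine_similarity` on the embeddings.

```python
from typing import List, Sequence

def select_top_k_in_order(paragraphs: Sequence[str], scores: Sequence[float], k: int = 5) -> List[str]:
    # Rank paragraph indices by score (highest first) and keep the k best ...
    ranked = sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)[:k]
    # ... then restore document order so the selected text still reads coherently
    return [paragraphs[i] for i in sorted(ranked)]

paragraphs = ["Intro", "Early history", "Breakthrough A", "Applications", "Breakthrough B"]
scores = [0.10, 0.35, 0.80, 0.20, 0.75]
print(select_top_k_in_order(paragraphs, scores, k=3))
# ['Early history', 'Breakthrough A', 'Breakthrough B']
```

Restoring the original order matters because the summarizer then sees the selected paragraphs in the sequence the author wrote them, not in score order.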

In [7]:
import requests

text_link = "https://raw.githubusercontent.com/Aleph-Alpha/examples/main/exercises/text_to_summarize.txt"
req = requests.get(text_link)
text_to_summarize = req.text

print(text_to_summarize[:100] + "...")

Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural int...


In [15]:
# Create a function that creates a guided summary of a text.
# This is a pretty complicated task, so don't worry if you can't get it to work.
def guided_summary(text: str, guidance: str, top_k: int = 5):
    # Split the text into paragraphs and drop empty ones
    splits = [split for split in split_text(text) if split.strip()]
    # Embed each paragraph as a document
    embeddings = [embed(split, SemanticRepresentation.Document) for split in splits]

    # Embed the guidance as a query
    embedded_guidance = embed(guidance, SemanticRepresentation.Query)

    # Score each paragraph against the guidance: list of tuples (split, score, position)
    scored_splits = [
        (split, cosine_similarity(embedded_guidance, embedding), position)
        for position, (split, embedding) in enumerate(zip(splits, embeddings))
    ]

    # Select the top_k paragraphs by score ...
    top_splits = sorted(scored_splits, key=lambda x: x[1], reverse=True)[:top_k]

    # ... and restore their original order so the text still reads coherently
    top_splits = sorted(top_splits, key=lambda x: x[2])

    # Join the selected paragraphs into a guided text
    guided_text = "\n".join(split for split, _, _ in top_splits)

    # Use the guided text to generate a summary
    return generate_summary(guided_text)

In [18]:
print(guided_summary(text_to_summarize, "What were breakthrough discoveries?"))

• Artificial intelligence is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals and humans.
• AI research has developed methods for dealing with uncertain or incomplete information, employing concepts from probability and economics. Many of these algorithms proved to be insufficient for solving large reasoning problems because they experienced a "combinatorial explosion". Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.
