[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aleph-Alpha/examples/blob/main/exercises/07_exercise_f.ipynb)

# Exercise F: Combined applications

In this notebook, we will combine search, completion, and summarization to build simple applications.

In [None]:
!pip install aleph_alpha_client
from typing import Sequence
from aleph_alpha_client import ImagePrompt, AlephAlphaClient, AlephAlphaModel, SemanticEmbeddingRequest, SemanticRepresentation, Prompt, SummarizationRequest, CompletionRequest, EvaluationRequest
import math
import os
import requests

Because we are using both completion and search in this notebook, we define two models:
- One for search with Luminous Explore (luminous-base)
- One for completion (luminous-extended)

In [None]:
# instantiate the search model and the completion model
# (replace "API_TOKEN" with your own token)
search_model = AlephAlphaModel.from_model_name("luminous-base", "API_TOKEN")

model = AlephAlphaModel.from_model_name("luminous-extended", "API_TOKEN")
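
Hard-coding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead; the variable name `AA_TOKEN` and the helper `read_api_token` are assumptions made for illustration, not something this notebook defines:

```python
import os

def read_api_token(env_var: str = "AA_TOKEN") -> str:
    # env_var is a hypothetical name; use whichever variable holds your token
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return token

# Usage in the cell above (commented out so this sketch runs without a token):
# search_model = AlephAlphaModel.from_model_name("luminous-base", read_api_token())
# model = AlephAlphaModel.from_model_name("luminous-extended", read_api_token())
```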

## Pre-defined functions
To make life a bit easier for you, we have defined a few functions that you can use in this notebook.


They are described in the table below:
|Function|Description|
|---|---|
|`embed_symmetric`|Embeds a text using the symmetric representation|
|`embed_query`|Embeds a query using the asymmetric (query) representation|
|`embed_document`|Embeds a document using the asymmetric (document) representation|
|`cosine_similarity`|Calculates the cosine similarity between two vectors|
|`generate_summary`|Generates a summary of a document|
|`split_text`|Splits a text into paragraphs (at blank lines)|
|`evaluate`|Uses the evaluate functionality of Luminous to score how well a completion fits a query|

In [None]:
# function for symmetric embeddings 
def embed_symmetric(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Symmetric)
    result = search_model.semantic_embed(request)
    return result.embedding

# function for asymmetric embeddings of Queries
def embed_query(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Query)
    result = search_model.semantic_embed(request)
    return result.embedding

# function for asymmetric embeddings of Documents
def embed_document(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Document)
    result = search_model.semantic_embed(request)
    return result.embedding

# function to calculate similarity
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    "compute cosine similarity of v1 to v2: (v1 dot v2) / (||v1|| * ||v2||)"
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)

# function for getting a summary
def generate_summary(text: str):
    request = SummarizationRequest(prompt=Prompt.from_text(text))
    result = model.summarize(request)
    return result.summary

# function that splits text by paragraphs
def split_text(text: str):
    return text.split("\n\n")

# function that evaluates how well text2 fits as a completion of text1
def evaluate(text1: str, text2: str):
    request = EvaluationRequest(prompt=Prompt.from_text(text1), completion_expected=text2)
    result = model.evaluate(request)
    return result
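
To get a feel for the helpers that need no API access, here is a small standalone sketch. It re-declares `cosine_similarity` and `split_text` with the same logic as above so that it runs on its own; the vectors and the sample text are made up for illustration.

```python
import math
from typing import List, Sequence

def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    # same formula as above: (v1 dot v2) / (||v1|| * ||v2||)
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    return dot / (norm1 * norm2)

def split_text(text: str) -> List[str]:
    # paragraphs are separated by a blank line
    return text.split("\n\n")

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
paragraphs = split_text("First paragraph.\n\nSecond paragraph.")

print(same_direction, orthogonal, paragraphs)
```

Parallel vectors score close to 1 and perpendicular vectors score 0, so the embedding of a relevant text should score closer to 1 against a query than the embedding of an unrelated text.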

### Task 1:
Create a function that searches a list of texts based on a question and then generates an answer.
- Use search to find the most relevant text
- Use a completion request with your own prompt to generate an answer
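
The two steps can be sketched without calling the API. The embeddings below are made-up three-dimensional vectors standing in for what `embed_query` and `embed_document` would return, and the `Text:` / `Question:` / `Answer:` layout in the hypothetical helper `build_qa_prompt` is only one possible prompt format, not a prescribed one.

```python
import math
from typing import List, Sequence

def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / math.sqrt(sum(x * x for x in v1) * sum(y * y for y in v2))

def most_similar_index(query_embedding: Sequence[float],
                       document_embeddings: List[Sequence[float]]) -> int:
    # score every document against the query and return the best position
    scores = [cosine_similarity(query_embedding, d) for d in document_embeddings]
    return scores.index(max(scores))

def build_qa_prompt(context: str, question: str) -> str:
    # hypothetical prompt layout; the trailing "Answer:" invites a one-line answer
    return f"Text: {context}\nQuestion: {question}\nAnswer:"

# toy data: real embeddings have many more dimensions
texts = ["Poland has courts and tribunals.", "Nick Solak plays baseball."]
toy_document_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]
toy_query_embedding = [0.2, 0.9, 0.2]

best = most_similar_index(toy_query_embedding, toy_document_embeddings)
prompt_text = build_qa_prompt(texts[best], "Who was Nick?")
print(prompt_text)
```

Ending the prompt with `Answer:` pairs naturally with a newline stop sequence, so the completion stops after a single line.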

In [None]:
texts_to_search = [
    "The judiciary of Poland (Polish: sądownictwo w Polsce) are the authorities exercising the judicial power of the Polish state on the basis of Chapter 8 of the Constitution of Poland.[a] As in almost all countries of continental Europe, the Polish judiciary operates within the framework of civil law.The courts (sądy), designated by the Constitution as those exercising the administration of justice (wymiar sprawiedliwości), are the bodies that review the vast majority of cases, with the exception of those specifically assigned to the two tribunals (trybunały). ",
    "Nicholas Blake 'Nick' Solak (born January 11, 1995) is an American professional baseball second baseman and outfielder for the Texas Rangers of Major League Baseball (MLB). Solak attended Naperville North High School in Naperville, Illinois, and the University of Louisville in Louisville, Kentucky.",
    "Die Buddhistenkrise war ein Zeitraum politisch-religiöser Anspannungen in Südvietnam vom 8. Mai bis 2. November 1963. Sie wurde durch das Verbot der buddhistischen Flagge durch die Regierung Ngô Đình Diệms ausgelöst, und endete mit einem Putsch der Armee der Republik Vietnam, wobei Ngô Đình Diệm festgenommen und später getötet wurde."
]

This is a complicated task, so try to solve one step at a time.

In [None]:
text_embeddings = [embed_document(text) for text in texts_to_search]

def search_and_answer(question : str):
    
    # TODO get the query embedding of the question and the document embeddings of the texts_to_search
    # (the document embeddings are already precomputed in text_embeddings above)
    query_embedding = None #ToDo
    document_embeddings = None #ToDo
    
    # TODO calculate the similarity between the question and the texts
    scores = []
    for document in document_embeddings:
        pass #ToDo
    
    # select the text with the highest similarity
    best_text_index = scores.index(max(scores))
    best_text = texts_to_search[best_text_index]
    
    # TODO create a completion task with the best text
    prompt = Prompt.from_text("""Write a good QA prompt here""")
    
    request = CompletionRequest(prompt=prompt, stop_sequences=["\n"])
    
    # call the model to complete the task
    result = model.complete(request)    
    
    # return the answer
    return result.completions[0].completion

In [None]:
search_and_answer("Who was Nick?")

### Task 2: 
Create a function that creates a guided summary of a text.
- The function should take a text as input as well as some form of guidance
- Tip: use multiple functions from the previous exercises
- There are several ways to solve this task. Try to find a solution that works for you.
- Tip: look at the available functions
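
One possible approach is to score each paragraph against the guidance, keep the best-scoring ones, and restore their original order before summarizing. The selection step can be sketched on its own; the helper name `select_top_k_in_order`, the paragraph labels, and the scores are made up for illustration, and in the exercise the scores would come from `cosine_similarity`.

```python
from typing import List

def select_top_k_in_order(splits: List[str], scores: List[float], k: int) -> List[str]:
    # attach score and original position to every split
    scored = [(split, score, position)
              for position, (split, score) in enumerate(zip(splits, scores))]
    # keep the k highest-scoring splits
    top_k = sorted(scored, key=lambda item: item[1], reverse=True)[:k]
    # restore document order so the selected text still reads coherently
    top_k_in_order = sorted(top_k, key=lambda item: item[2])
    return [item[0] for item in top_k_in_order]

toy_splits = ["intro", "background", "method", "results", "outlook"]
toy_scores = [0.2, 0.7, 0.1, 0.9, 0.4]

selected = select_top_k_in_order(toy_splits, toy_scores, k=3)
print(selected)  # ['background', 'results', 'outlook']
```

Sorting twice, first by score and then by position, keeps the most relevant paragraphs while preserving the flow of the original text.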


In [11]:
# This pulls the data from the web
text_link = "https://raw.githubusercontent.com/Aleph-Alpha/examples/main/exercises/text_to_summarize.txt"
req = requests.get(text_link)
text_to_summarize = req.text

In [None]:
# Create a function that creates a guided summary of a text.
# This is a pretty complicated task, so don't worry if you can't get it to work.
# The guidance is a string that contains the content that should be in the summary.
def guided_summary(text : str, guidance : str):
    
    # Split the text into paragraphs
    splits = split_text(text)

    # TODO embed each of the splits
    embeddings = []
    for split in splits:
        
        # TODO get the embedding of the split
        split_embedding = None
        
        embeddings.append(split_embedding)
    
    # TODO Embed the guidance
    embedded_guidance = None
    
    # calculate a similarity for each of the splits
    list_of_scored_splits =  []   # list of tuples (split, score, position)
    list_of_scores = []
    for i in range(len(splits)):
        
        # TODO calculate the similarity between the guidance and the split embedding
        similarity_score = None
        
        # add the similarity score to the list of scored splits
        list_of_scored_splits.append((splits[i], similarity_score, i))
        
        # add the similarity score to the list of scores for accessing the top 5
        list_of_scores.append(similarity_score)
        
    
    # select the top 5 splits by score
    top_5_splits = sorted(list_of_scored_splits, key=lambda x: x[1], reverse=True)[:5]
    
    # sort the top 5 splits by position
    top_5_splits = sorted(top_5_splits, key=lambda x: x[2])
    
    # join the top 5 splits into a guided text
    guided_text = "\n".join([x[0] for x in top_5_splits])
        
    # TODO use the guided text to generate a summary
    summary = None # ToDo call the generate_summary function
    
    return summary
    