[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aleph-Alpha/examples/blob/main/exercises/08_exercise_g.ipynb)

# Exercise G: Create an evaluation for a prompt
Finally, in this notebook we will create an evaluation script for a simple Q&A task.

This helps to assess how well the task is completed.

In [None]:
!pip install aleph_alpha_client
from typing import Sequence
from aleph_alpha_client import ImagePrompt, AlephAlphaClient, AlephAlphaModel, SemanticEmbeddingRequest, SemanticRepresentation, Prompt, SummarizationRequest, CompletionRequest, EvaluationRequest, Document
import math
import os

In [None]:
# instantiate the search model and the completion model (replace "API_TOKEN" with your own token)
search_model = AlephAlphaModel.from_model_name("luminous-base","API_TOKEN")

model = AlephAlphaModel.from_model_name("luminous-extended","API_TOKEN")

## Pre-defined functions
To make life a bit easier for you, we have defined a few functions that you can use in this notebook.

They are described in this table:
|function|description|
|---|---|
|`embed_symmetric`| Embeds a text using the symmetric model|
|`embed_query`| Embeds a query using the asymmetric model|
|`embed_document`| Embeds a document using the asymmetric model|
|`cosine_similarity`| Calculates the cosine similarity between two vectors|
|`generate_summary`| Generates a summary of a document|
|`split_text`| Splits a text into paragraphs|
|`evaluate`| Uses the evaluate functionality of Luminous to assess how well an expected completion fits a prompt|

In [None]:
# function for symmetric embeddings 
def embed_symmetric(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Symmetric)
    result = search_model.semantic_embed(request)
    return result.embedding

# function for asymmetric embeddings of Queries
def embed_query(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Query)
    result = search_model.semantic_embed(request)
    return result.embedding

# function for asymmetric embeddings of Documents
def embed_document(text: str):
    request = SemanticEmbeddingRequest(prompt=Prompt.from_text(text), representation=SemanticRepresentation.Document)
    result = search_model.semantic_embed(request)
    return result.embedding

# function to calculate similarity
def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    "compute cosine similarity of v1 to v2: (v1 dot v2)/(||v1||*||v2||)"
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x = v1[i]; y = v2[i]
        sumxx += x*x
        sumyy += y*y
        sumxy += x*y
    return sumxy/math.sqrt(sumxx*sumyy)

# function for getting a summary
def generate_summary(text: str):
    request = SummarizationRequest(document=Document.from_text(text))
    result = model.summarize(request)
    return result.summary

# function that splits text by paragraphs
def split_text(text: str):
    return text.split("\n\n")

# function that evaluates how well text2 fits as a completion of text1
def evaluate(text1: str, text2: str):
    request = EvaluationRequest(prompt=Prompt.from_text(text1), completion_expected=text2)
    result = model.evaluate(request)
    return result
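
As a quick sanity check of `cosine_similarity` that needs no API token, the sketch below re-declares the same formula and applies it to hand-picked toy vectors. The vectors are illustrative assumptions, not real embeddings. Parallel vectors should score close to 1, orthogonal vectors close to 0, and opposite vectors close to -1.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Same formula as above: (v1 dot v2) / (||v1|| * ||v2||)."""
    sumxx = sum(x * x for x in v1)
    sumyy = sum(y * y for y in v2)
    sumxy = sum(x * y for x, y in zip(v1, v2))
    return sumxy / math.sqrt(sumxx * sumyy)


# Toy vectors chosen by hand for illustration only.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(parallel, orthogonal, opposite)
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.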

## Additional functions
Here are three additional functions.

They make search and question answering easier to use.

In [None]:
# pre-prompt used for the question answering task
pre_prompt = "This system answers questions based on a text.\nText: A concept of design science was introduced in 1957 by R. Buckminster Fuller[1][2] who defined it as a systematic form of designing.\nQuestion: When was the concept of design science introduced?\nAnswer: 1957\n###\nText: The Function-Behaviour-Structure ontology – or short, the FBS ontology – is an ontology of design objects, i.e. things that have been or can be designed. The Function-Behaviour-Structure ontology conceptualizes design objects in three ontological categories.\nQuestion: How many categories are there?\nAnswer: 3\n###\n"

# embeds a list of texts
def embed_document_texts(texts: Sequence[str]):
    return [embed_document(text) for text in texts]

# returns the top-scoring documents for a query
def search(query: str, embedded_documents, top_n: int = 1):
    embedded_query = embed_query(query)
    scores = [cosine_similarity(embedded_query, embedded_document) for embedded_document in embedded_documents]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]

# function that returns the answer to a question and uses the search function
def answer_question(question: str, embedded_documents):
    # search returns document indices, so look up the matching text
    # (assumes embedded_documents was built from qa_texts, defined in the Resources section)
    context_index = search(question, top_n=1, embedded_documents=embedded_documents)[0]
    context = qa_texts[context_index]
    req = CompletionRequest(prompt=Prompt.from_text(pre_prompt + f"Context: \"{context}\"\nQuestion: \"{question}\"\nAnswer:"), stop_sequences=["###", "\n"])
    result = model.complete(req)
    return result.completions[0].completion
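
Note that `search` returns positions in the document list, not the documents themselves. The sketch below shows the same ranking step offline, with hand-made two-dimensional vectors standing in for real embeddings. The vectors, the placeholder texts, and the `rank_documents` name are assumptions for illustration only; the returned indices are then used to look up the matching texts.

```python
import math
from typing import List, Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(v1, v2))
    return dot / math.sqrt(sum(x * x for x in v1) * sum(y * y for y in v2))


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]],
                   top_n: int = 1) -> List[int]:
    """Hypothetical offline stand-in for `search`: takes a ready-made query vector."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    # Sort document indices by descending score and keep the best top_n.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]


# Placeholder texts and toy vectors, not real embeddings.
texts = ["text about messaging", "text about cities", "text about startups"]
toy_doc_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query_vector = [0.9, 0.1]

top_indices = rank_documents(toy_query_vector, toy_doc_vectors, top_n=2)
top_texts = [texts[i] for i in top_indices]  # map indices back to texts
print(top_indices, top_texts)
```

Returning indices rather than texts makes it easy to compare a search result against the `correct_context` field of a sample.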


## Resources

Here are some texts and samples that can be used to test your evaluation function.

The *qa_samples* object contains a few sample questions about the texts, each with the correct answer and the index of the correct context in *qa_texts*.

Feel free to add more samples if you like.

In [None]:
qa_texts = [
    "WhatsApp Messenger, or simply WhatsApp, is an internationally available freeware, cross-platform centralized instant messaging (IM) and voice-over-IP (VoIP) service owned by American company Meta Platforms (formerly Facebook). It was first released in 2009.",
    "Mountain View is a city in Santa Clara County, California,[11] United States. Named for its views of the Santa Cruz Mountains,[12] it has a population of 82,376.[8]",
    "The opposite of high tech is low technology, referring to simple, often traditional or mechanical technology; for example, a slide rule is a low-tech calculating device.[5][6][7] When high tech becomes old, it becomes low tech, for example vacuum tube electronics.",
    "A startup or start-up is a company or project undertaken by an entrepreneur to seek, develop, and validate a scalable business model.[1][2] While entrepreneurship refers to all new businesses, including self-employment and businesses that never intend to become registered, startups refer to new businesses that intend to grow large beyond the solo founder.",
    ]


embedded_texts = embed_document_texts(qa_texts)

qa_samples = [{
    "question": "When was WhatsApp released?", 
    "correct_answer" : "2009", 
    "correct_context" : 0},
    {
    "question": "In which state is Mountain View located?", 
    "correct_answer" : "California", 
    "correct_context" : 1},
    {
    "question": "What is the opposite of high tech?", 
    "correct_answer" : "Low tech", 
    "correct_context" : 2},
    {
    "question": "Who undertakes a startup?", 
    "correct_answer" : "Entrepreneur", 
    "correct_context" : 3},
    ]


Let's check if our function works as expected.

In [None]:
answer_question("What is the opposite of high tech?", embedded_texts)

### Task: 
Write an automated evaluation for both the search function and the answering function.


In [None]:
# Create a function that runs through all the qa samples and checks if they return the correct answer
def eval_test_set():
    pass
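
One possible shape for such an evaluation is sketched below. So that it runs without an API token, `fake_search` and `fake_answer` are hypothetical stand-ins for `search` and `answer_question`, and `toy_samples` is a small copy of the `qa_samples` format. The case-insensitive exact-match comparison is just one assumed scoring choice; the `evaluate` helper or an embedding similarity would be softer alternatives.

```python
from typing import Callable, Dict, List


def eval_test_set_sketch(
    samples: List[dict],
    search_fn: Callable[[str], int],
    answer_fn: Callable[[str], str],
) -> Dict[str, float]:
    """Return the fraction of samples with the correct context and the correct answer."""
    context_hits = 0
    answer_hits = 0
    for sample in samples:
        # Search check: did retrieval pick the expected document index?
        if search_fn(sample["question"]) == sample["correct_context"]:
            context_hits += 1
        # Answer check: case-insensitive exact match after trimming whitespace.
        predicted = answer_fn(sample["question"]).strip().lower()
        if predicted == sample["correct_answer"].strip().lower():
            answer_hits += 1
    total = len(samples)
    return {
        "search_accuracy": context_hits / total,
        "answer_accuracy": answer_hits / total,
    }


# Toy data in the qa_samples format, plus hypothetical stand-in functions.
toy_samples = [
    {"question": "When was WhatsApp released?", "correct_answer": "2009", "correct_context": 0},
    {"question": "Who undertakes a startup?", "correct_answer": "Entrepreneur", "correct_context": 3},
]
fake_search = lambda question: 0  # always returns document 0
fake_answer = lambda question: " 2009" if "WhatsApp" in question else "an entrepreneur"

report = eval_test_set_sketch(toy_samples, fake_search, fake_answer)
print(report)
```

With the notebook's own functions, `search_fn` could be `lambda q: search(q, embedded_texts)[0]` and `answer_fn` could be `lambda q: answer_question(q, embedded_texts)`. Exact match is strict: in the toy run, "an entrepreneur" counts as a miss against "Entrepreneur", which is the kind of case a softer metric would handle differently.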