---

### Reminder: this 📘 `.NET Interactive` notebook needs to be run from VS Code with [these prerequisites](../PREREQS.md).

---

<!-- #### How to use this notebook: 

* Just read the text and scroll along until you run into code blocks.
* Code blocks have computer code inside them — hover over the block and you can run the code.
* Run the code by hitting the ▶️ "play" button to the left. If the code runs you'll see a ✔️. If not, you'll get a ❌.
* The output and status of the code block will appear just below itself — you need to scroll down further to see it.
* Sometimes a code block will ask you for input in a hard-to-notice dialog box 👆 at the top of your notebook window.  -->

<!-- --- -->

# Recipe IV. 🍝 Memories
<!-- ## 🧑‍🍳 Cook well beyond the model's memory limits -->


The maximum length of a prompt depends on the LLM in use.\
Newer models accept longer prompts, while older ones accept only shorter ones.\
As a result, there is a limit to how much context you can provide in a single prompt.

| Model | Maximum Tokens** |
|---|---|
| ada | 2049 |
| babbage | 2049 |
| curie-001 | 2049 |
| davinci-003 | 4097 |
| GPT-4 | 8192 |

** _The limit covers the prompt and the completion together. 1 token is roughly 3 to 4 characters of English text; 1 page of a book is roughly 500 tokens._
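The prompt and the completion share the model's limit. That means the estimated prompt tokens plus the `maxTokens` reserved for the answer must fit inside it. Below is a rough budget check under stated assumptions. It uses the footnote's figure of about 3 characters per token, which tends to overestimate the count. `EstimateTokens` and `FitsInWindow` are hypothetical helpers, and a real tokenizer would give exact counts.

```csharp
using System;
using System.Collections.Generic;

// Context limits copied from the table above (prompt + completion share this budget).
var limits = new Dictionary<string, int>
{
    ["ada"] = 2049,
    ["babbage"] = 2049,
    ["curie-001"] = 2049,
    ["davinci-003"] = 4097,
    ["GPT-4"] = 8192,
};

// Hypothetical helper: rough estimate only, assuming ~3 characters per token.
static int EstimateTokens(string text, double charsPerToken = 3.0) =>
    (int)Math.Ceiling(text.Length / charsPerToken);

// Hypothetical helper: does the prompt plus the reserved completion fit the limit?
static bool FitsInWindow(string prompt, int maxCompletionTokens, int modelLimit) =>
    EstimateTokens(prompt) + maxCompletionTokens <= modelLimit;

string prompt = new string('x', 6000); // 6,000 characters, about 4 pages by the footnote's estimate
Console.WriteLine(EstimateTokens(prompt));                           // 2000
Console.WriteLine(FitsInWindow(prompt, 100, limits["ada"]));         // False
Console.WriteLine(FitsInWindow(prompt, 100, limits["davinci-003"])); // True
```

With 100 tokens reserved for the answer, the same four pages overflow `ada` (2,100 > 2,049) but fit comfortably in `davinci-003`.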

An increasingly popular way to work around this limit is to use so-called "**embeddings**", which represent a piece of text as a large vector of numbers.
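Two pieces of text count as "similar" when their vectors point in roughly the same direction. A common measure for this is cosine similarity. We assume, without checking the library source here, that the relevance scores printed later in this notebook are of this kind. The sketch below uses made-up 3-dimensional vectors. Real `text-embedding-ada-002` embeddings have 1536 dimensions.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy "embeddings" invented for illustration only.
float[] dogPost  = { 0.9f, 0.1f, 0.0f };
float[] petQuery = { 0.8f, 0.2f, 0.1f };
float[] bankJob  = { 0.0f, 0.3f, 0.9f };

Console.WriteLine($"dog post vs pet question: {CosineSimilarity(dogPost, petQuery):F3}");
Console.WriteLine($"dog post vs bank job:     {CosineSimilarity(dogPost, bankJob):F3}");
```

The first pair scores about 0.98 and the second about 0.03. A search keeps the texts whose vectors score highest against the vector of the question.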

When using OpenAI or Azure OpenAI Service models, the `ada` embedding model (`text-embedding-ada-002`) is an inexpensive choice that is good enough for most use cases.\
Let's start by generating some embeddings and see how they work in practice.

## Step 1. Instantiate a 🔥 kernel for both text completion and embedding generation

Note that the code below includes a few new lines that register the `text-embedding-ada-002` model. This model generates the vector of numbers for a piece of text.

```csharp
#r "nuget: Microsoft.SemanticKernel, 0.9.61.1-preview"

#!import ../config/Settings.cs

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.KernelExtensions;
using System.IO;
using Microsoft.SemanticKernel.Configuration;
using Microsoft.SemanticKernel.SemanticFunctions;
using Microsoft.SemanticKernel.CoreSkills;
using Microsoft.SemanticKernel.Memory;

// Load the model, endpoint and keys from the local settings file
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

var kernel = Microsoft.SemanticKernel.Kernel.Builder
    .Configure(c =>
    {
        // "davinci" and "ada" are the labels the two services are registered under
        if (useAzureOpenAI)
        {
            c.AddAzureOpenAITextCompletion("davinci", model, azureEndpoint, apiKey);
            // Embedding generation uses the text-embedding-ada-002 model
            c.AddAzureOpenAIEmbeddingGeneration("ada", "text-embedding-ada-002", azureEndpoint, apiKey);
        }
        else
        {
            c.AddOpenAITextCompletion("davinci", model, apiKey, orgId);
            c.AddOpenAIEmbeddingGeneration("ada", "text-embedding-ada-002", apiKey, orgId);
        }
    })
    // In-memory store: saved memories are lost when the notebook kernel restarts
    .WithMemoryStorage(new VolatileMemoryStore())
    .Build();
```


## Step 2. Add 🍝 memories so the 🔥 kernel can cook richer dishes

Imagine a collection of facts about you, gathered from the Internet, like the following:

```csharp
const string memoryCollectionName = "Facts About Me";

// Each call generates an embedding for the text and saves it in the collection under the given id
await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "LinkedIn Bio",
    text: "I currently work in the hotel industry at the front desk. I won the best team player award.");

await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "LinkedIn History",
    text: "I have worked as a tourist operator for 8 years. I have also worked as a banking associate for 3 years.");

await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "Recent Facebook Post",
    text: "My new dog Trixie is the cutest thing you've ever seen. She's just 2 years old.");

await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "Old Facebook Post",
    text: "Can you believe the size of the trees in Yellowstone? They're huge! I'm so committed to forestry concerns.");

Console.WriteLine("Four GIGANTIC vectors were generated just now from those 4 pieces of text above.");
```

```text
Four GIGANTIC vectors were generated just now from those 4 pieces of text above.
```


> ✅ You need access to the `text-embedding-ada-002` model to run the code above successfully. Step 1 of this unit differs from all the other notebooks because of this extra requirement.

Now imagine asking your LLM a question about yourself. What would it do?\
Since it knows nothing about you, it will simply **make something up**.

```csharp
// Create the semantic function
var myFunction = kernel.CreateSemanticFunction(@"
Tell me about me and {{$input}} in less than 70 characters.
", maxTokens: 100, temperature: 0.8, topP: 1);

// Invoke the semantic function passing "my work history" as the input
var result = await myFunction.InvokeAsync("my work history");

Console.WriteLine(result);
```

```text
You're an experienced professional with a passion for excellence.
```


For example, the semantic function above might say:

`You are a creative problem solver with a varied work history.`

That could be true of anyone, of course :).

Instead of hoping that the LLM comes up with the right answer, we can use 🍝 **memories** to get a more accurate completion.\
We do this by searching the saved memories for the ones **most similar** to the question. `limit` caps the number of results, and `minRelevanceScore` sets a relevance threshold for the search.
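The two parameters act as a filter followed by a cut-off. The sketch below illustrates the idea and is not Semantic Kernel's implementation. It keeps only scores at or above the threshold, sorts them best first, and returns at most `limit` results. `TopMatches` is a hypothetical helper. The two LinkedIn scores are the ones printed by the search cell below. The two Facebook scores are invented values below the threshold.

```csharp
using System;
using System.Linq;

// Hypothetical helper: filter by minRelevanceScore, sort best first, keep at most `limit`.
static (string Id, double Relevance)[] TopMatches(
    (string Id, double Relevance)[] scored, int limit, double minRelevanceScore) =>
    scored.Where(m => m.Relevance >= minRelevanceScore)
          .OrderByDescending(m => m.Relevance)
          .Take(limit)
          .ToArray();

var scored = new (string Id, double Relevance)[]
{
    ("LinkedIn Bio", 0.8025544060686295),
    ("LinkedIn History", 0.8252466106558247),
    ("Recent Facebook Post", 0.70), // invented for illustration
    ("Old Facebook Post", 0.68),    // invented for illustration
};

// Prints "LinkedIn History", then "LinkedIn Bio"
foreach (var match in TopMatches(scored, limit: 5, minRelevanceScore: 0.77))
    Console.WriteLine(match.Id);
```

Raising `minRelevanceScore` drops weak matches even when `limit` would allow more results. Lowering `limit` keeps only the strongest ones.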

```csharp
string ask = "Tell me about me and my work history.";
var relatedMemory = "I know nothing.";
var counter = 0;

var memories = kernel.Memory.SearchAsync(memoryCollectionName, ask, limit: 5, minRelevanceScore: 0.77);

await foreach (MemoryQueryResult memory in memories)
{
    // The first result is the most relevant
    if (counter == 0) { relatedMemory = memory.Text; }
    Console.WriteLine($"Result {++counter}:\n  >> {memory.Id}\n  Text: {memory.Text}  Relevance: {memory.Relevance}\n");
}
```

```text
Result 1:
  >> LinkedIn History
  Text: I have worked as a tourist operator for 8 years. I have also worked as a banking associate for 3 years.  Relevance: 0,8252466106558247

Result 2:
  >> LinkedIn Bio
  Text: I currently work in the hotel industry at the front desk. I won the best team player award.  Relevance: 0,8025544060686295
```



Now we can ask the same question, this time adding the most relevant context (stored in `relatedMemory`) to get a more accurate answer:

```csharp
// {{$input}} now carries the most relevant memory, placed before the question
var myFunction = kernel.CreateSemanticFunction(@"
{{$input}}
Tell me about me and my work history in less than 70 characters.
", maxTokens: 100, temperature: 0.1, topP: 0.1);

var result = await myFunction.InvokeAsync(relatedMemory);

Console.WriteLine(result);
```

```text
You have a diverse work history with a variety of skills and experiences.
```

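The cell above works because the `{{$input}}` variable in the prompt template is replaced by the text passed to `InvokeAsync`, here the content of `relatedMemory`. Here is a naive sketch of that substitution, assuming plain string replacement. Semantic Kernel's template engine supports more syntax than this, and `RenderPrompt` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: replace the {{$input}} placeholder with the given text.
// It only mimics the basic idea of a prompt template.
static string RenderPrompt(string template, string input) =>
    template.Replace("{{$input}}", input);

const string template =
    "{{$input}}\nTell me about me and my work history in less than 70 characters.";

// The most relevant memory found by the search above
string relatedMemory =
    "I have worked as a tourist operator for 8 years. I have also worked as a banking associate for 3 years.";

Console.WriteLine(RenderPrompt(template, relatedMemory));
```

The model therefore reads the stored facts before the question, which is what lets the answer draw on them.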

## Step 3. Get ready for the **WOW** moment

<!-- ### Manipulating 🍝 memories is how the token window limitation is addressed. -->

<!-- Recall the table showing the maximum tokens that can be used per model:

| Model | Maximum Tokens** |
|---|---|
| ada | 2049 |
| babbage | 2049 |
| curie-001 | 2049 |
| davinci-003 | 4097 |
| GPT-4 | 8192 |

** _1 token is approximately 3 characters; 1 page of book is roughly 500 tokens_ -->

<!-- Given this same basic technique of gathering the most similar memories that are appropriate to a prompt, it's possible to have many more memories stored and available on-hand to compare with a given prompt. And it's not necessary to include just the top hit, but also more hits that are just as similar to the "most relevant" memory available. 

This is how an entire book can be used by Semantic Kernel as a memory source to feed into a prompt by only selecting the relevant chunks of text — i.e. that which relates to the prompt. To do so you would:

1. Generate embeddings for each of the paragraphs in the book.
2. For a given prompt, find the most similar paragraphs within the book.
3. Staying within the limitation of the token size window, gather all the related paragraphs.
4. You now have a prompt with a great deal of relevant 🥑 context to send to the model.
5. Reap the benefits of an "informed" LLM AI weighing in on a particular subject for you.

Let's review this in practice. Say I have a 500-page book. 

1. I take each page and generate the embedding with `Memory.SaveInformationAsync`
2. I then take my prompt, `the best scenes are ones with flowers in it and deserve to be summarized` and use `Memory.SearchAsync` to locate the pages with flower scenes in them.
3. Let's say there are three pages that are relevant. Those three pages will be used to compose a new prompt that's simply the three pages appended to each other along with the original prompt. If instead you need to include ten pages, and exceed the token window, then summarize each of the ten pages separately into ten shorter passages. Do this until you meet the token window requirements.
4. You have the prompt to give to the model you've chosen. It has pulled the relevant information out of the 500-page book, and will do its best to summarize what you care about the most.
5. Ta-da! You'll get what you've asked for. -->

#### Extra

To illustrate this point, we can take Abraham Lincoln's famous Gettysburg Address and use it to generate a new answer "according to Lincoln":

- 🧩 We split the whole speech into "chunks";
- 🔢 We use `ada` to generate an embedding for each chunk through `kernel.Memory.SaveInformationAsync()`;
- 🔍 We use `kernel.Memory.SearchAsync()` to find the chunks most similar to our question.

This gives an idea of how a large file can be processed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Split a text file into chunks of roughly `recommendedLength` characters,
// extending each chunk to the next whitespace so that words are not cut in half.
public static List<string> ChunkTextFile(string filePath, int recommendedLength)
{
    if (recommendedLength <= 0)
        throw new ArgumentOutOfRangeException(nameof(recommendedLength));

    var chunks = new List<string>();

    // Read in the whole text file
    string text = File.ReadAllText(filePath);

    int startIndex = 0;
    while (startIndex < text.Length)
    {
        // Tentative end of the chunk, clamped to the end of the text
        int endIndex = Math.Min(startIndex + recommendedLength, text.Length);

        // Move forward to the next whitespace character (a word boundary)
        while (endIndex < text.Length && !char.IsWhiteSpace(text[endIndex]))
        {
            endIndex++;
        }

        // Take the chunk and strip whitespace at its start and end
        string chunk = text.Substring(startIndex, endIndex - startIndex).Trim();

        // Skip chunks that contained only whitespace
        if (chunk.Length > 0)
        {
            chunks.Add(chunk);
        }

        // Continue from where this chunk ended
        startIndex = endIndex;
    }

    return chunks;
}

// Split the speech into chunks of about 140 characters
List<string> chunks = ChunkTextFile("./lincoln.txt", 140);

const string lincolnMemoryCollectionName = "Abe's Words";

// Embed each chunk and save it in the memory collection
int counter = 0;
foreach (string chunk in chunks)
{
    Console.WriteLine($"Chunk {counter}: {chunk}");

    await kernel.Memory.SaveInformationAsync(lincolnMemoryCollectionName, id: $"Chunk {counter++}",
        text: chunk);
}
```

```text
Chunk 0: Four score and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition
Chunk 1: that all men are created equal. (Applause.) Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived
Chunk 2: and so dedicated, can long endure. We are met on a great battle field of that war; we are met to dedicate a portion of it as the final resting
Chunk 3: place of those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this, but in a larger
Chunk 4: sense, we cannot dedicate, we cannot consecrate, we cannot hallow this ground.
The brave men, living and dead, who struggled here have consecrated
Chunk 5: it far above our poor power to add or to detract. (Applause.) The world will little note, nor long remember, what we say here; but it can never
Chunk 6: forget what they did here. (Applause.) It is for us, the living, rathe
```

We can now search these chunks for the ones that best match a simple question: `"What should the people do?"`.

```csharp
var aCounter = 0;
var myPrompt = "What should the people do?";
var myMemory = "";

// Find up to 5 chunks whose relevance to the question is at least 0.77
var memories = kernel.Memory.SearchAsync(lincolnMemoryCollectionName, myPrompt, limit: 5, minRelevanceScore: 0.77);

// Print each match and append its text to the context for the prompt
await foreach (MemoryQueryResult memory in memories)
{
    Console.WriteLine($"Result {++aCounter}:\n  >> {memory.Id}\n  Text: {memory.Text}  Relevance: {memory.Relevance}\n");
    myMemory += memory.Text + " ";
}

Console.WriteLine("Memory to feed back into the prompt will be:\n  >> " + myMemory + "\n");

// The retrieved chunks replace {{$input}} between the --- separators
var myLincolnFunction = kernel.CreateSemanticFunction(@"
Lincoln said:
---
{{$input}}
---
So what should the people do?
", maxTokens: 100, temperature: 0.1, topP: 0.1);

var lincolnResult = await myLincolnFunction.InvokeAsync(myMemory);

Console.WriteLine("Generated response ... 'according to Lincoln':\n" + lincolnResult);
```

```text
Result 1:
  >> Chunk 10
  Text: the people and for the people, shall not perish from the earth. (Long applause.)  Relevance: 0,8015134934072992

Result 2:
  >> Chunk 6
  Text: forget what they did here. (Applause.) It is for us, the living, rather to be dedicated here to the unfinished work that they have thus far  Relevance: 0,7704788293903422

Memory to feed back into the prompt will be:
  >> the people and for the people, shall not perish from the earth. (Long applause.) forget what they did here. (Applause.) It is for us, the living, rather to be dedicated here to the unfinished work that they have thus far 

Generated response ... 'according to Lincoln':

The people should continue the unfinished work of those who have gone before them, and strive to create a better future for all.
```

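Here only two short chunks come back, so they all fit in the prompt. With a whole book, you would also stop adding chunks once the prompt approaches the model's token window. Below is a minimal sketch of that step, assuming the rough "1 token ≈ 3 characters" estimate and chunks already sorted best first. `BuildContext` is a hypothetical helper, and the two strings are shortened excerpts of the chunks retrieved above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Rough token estimate based on ~3 characters per token (an assumption, not a tokenizer).
static int EstimateTokens(string text) => (int)Math.Ceiling(text.Length / 3.0);

// Hypothetical helper: append chunks in relevance order until the next one
// would push the estimated token count over the budget.
static string BuildContext(IEnumerable<string> rankedChunks, int tokenBudget)
{
    var context = new StringBuilder();
    int usedTokens = 0;
    foreach (string chunk in rankedChunks)
    {
        int cost = EstimateTokens(chunk) + 1; // +1 for the separating space
        if (usedTokens + cost > tokenBudget) break;
        context.Append(chunk).Append(' ');
        usedTokens += cost;
    }
    return context.ToString().TrimEnd();
}

var ranked = new List<string>
{
    "the people and for the people, shall not perish from the earth.",
    "It is for us, the living, rather to be dedicated here to the unfinished work",
};

Console.WriteLine(BuildContext(ranked, tokenBudget: 30));
```

With a budget of 30 estimated tokens, only the first excerpt fits. With 100, both do. Stopping at the first chunk that does not fit keeps the most relevant text. The alternative described earlier in this notebook is to summarize the extra chunks until they fit.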

A larger-scale example is the GitHub Q&A sample app available at [https://aka.ms/sk/repo](https://aka.ms/sk/repo).\
It takes an entire code repo, turns it into embeddings, and lets you "chat" with the repo. It would generally be impossible to fit an entire repo into an LLM's context window, and that is where 🍝 memories come in.

# ⏭️ Next steps

<!-- Run through more advanced examples in the notebooks that are available in our GitHub repo at [https://aka.ms/sk/repo](https://aka.ms/sk/repo). -->

[Let's look at 🧄 Connectors!](../e5-connectors/notebook.ipynb)

<!-- Or stay a longer while and add more facts about yourself in the `MemoryCollection`. -->