# Notebook 04: Multi-Connector Optimization with Explicit Summarization Prompts

## Objective

In this notebook, we aim to demonstrate the power of multi-connector optimization. Specifically, we'll use a straightforward summarization prompt to show how the system can intelligently choose between different connectors (ChatGPT and Oobabooga models) based on performance metrics like speed and cost.

## Prerequisites

- **OpenAI Key**: Make sure you've run `00-AI-settings.ipynb` to set up your OpenAI key. We'll use ChatGPT as our primary connector.
- **Oobabooga Model**: Make sure you have at least one Oobabooga model up and running. This model should be configured in the `MultiConnector` section of your `settings.json` file. The number and size of the models you can run depend on your available VRAM. The multistart scripts and default settings suggest suitable models.
- **Prior Knowledge**: A basic understanding of the multi-connector pipeline is required. If you're new to this, consider going through `03-multiConnector-intro-with-arithmetic-mocks.ipynb` first.

## 1. Setup

### 1.1 Import Required Libraries

First, let's import the necessary libraries. They handle loading the configuration and running the multi-connector.

In [1]:
// Import packages for loading hierarchical settings from settings.json
#r "nuget: Microsoft.Extensions.Configuration"
#r "nuget: Microsoft.Extensions.Configuration.Json"
#r "nuget: Microsoft.Extensions.Configuration.Binder"

// Import Oobabooga connector package
#r "nuget: MyIA.SemanticKernel.Connectors.AI.Oobabooga"
// Import Multiconnector package
#r "nuget: MyIA.SemanticKernel.Connectors.AI.MultiConnector"

Let's import the Semantic Kernel package too. This is optional, since the MultiConnector package already references it, but it tells us which version is in use.

In [2]:
// Import Semantic Kernel
#r "nuget: Microsoft.SemanticKernel"

### 1.2 Load Settings

Here, we load the OpenAI and Multiconnector configurations from the settings file.

In [3]:
// Load configuration using builder package
using System.IO;
using Microsoft.Extensions.Configuration;
using MyIA.SemanticKernel.Connectors.AI.MultiConnector.Configuration;

var builder = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("config/settings.json", optional: false, reloadOnChange: true);

IConfiguration configuration = builder.Build();

var openAIConfiguration = configuration.GetSection("OpenAI").Get<OpenAIConfiguration>();
var multiOobaboogaConnectorConfiguration = configuration.GetSection("MultiConnector").Get<MultiOobaboogaConnectorConfiguration>();
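For orientation, here is a minimal sketch of what the `OpenAI` section of `settings.json` might look like and how it can be read. The property names are the ones this notebook reads from `openAIConfiguration`; every value is a placeholder, and the in-memory JSON string stands in for the real file. The actual schema, including the `MultiConnector` section, is defined by the package and is not reproduced here.

```csharp
#r "nuget: Microsoft.Extensions.Configuration"
#r "nuget: Microsoft.Extensions.Configuration.Json"
#r "nuget: Microsoft.Extensions.Configuration.Binder"

using System;
using System.IO;
using System.Text;
using Microsoft.Extensions.Configuration;

// Placeholder values only (assumption): replace them with your own settings.
var sampleJson = @"{
  ""OpenAI"": {
    ""ChatModelId"": ""gpt-3.5-turbo"",
    ""ApiKey"": ""<your-api-key>"",
    ""MaxTokens"": 4096,
    ""CostPer1000Token"": 0.0015
  }
}";

// Load the JSON from memory instead of config/settings.json.
var sampleStream = new MemoryStream(Encoding.UTF8.GetBytes(sampleJson));
IConfiguration sampleConfiguration = new ConfigurationBuilder()
    .AddJsonStream(sampleStream)
    .Build();

// Read individual values from the "OpenAI" section.
var openAiSection = sampleConfiguration.GetSection("OpenAI");
var chatModelId = openAiSection["ChatModelId"];
int maxTokens = openAiSection.GetValue<int>("MaxTokens");
decimal costPer1000Token = openAiSection.GetValue<decimal>("CostPer1000Token");

Console.WriteLine($"Model: {chatModelId}, max tokens: {maxTokens}");
```

Reading values one by one keeps the sketch free of custom types. The notebook itself binds the whole section to `OpenAIConfiguration` with `Get<T>()`.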

### 1.3 Set up MultiTextCompletion settings

Many parameters control how the multiconnector works and performs its optimization. We need to create an instance of the corresponding settings class.

Also, because we'll be measuring costs to drive the optimization, we need to create a creditor object dedicated to that. We'll configure the settings to account only for completion request costs and to ignore duration.

We'll stick to default parameters for everything else.

In [4]:
using MyIA.SemanticKernel.Connectors.AI.MultiConnector;
using System.Text.Json;
using System.Text.Json.Serialization;
using MyIA.SemanticKernel.Connectors.AI.MultiConnector.PromptSettings;

var creditor = new CallRequestCostCreditor();

// The most common settings for a MultiTextCompletion are illustrated below. Most of them have default values and are optional.
var settings = new MultiTextCompletionSettings()
{
    Creditor = creditor,
    // We set the connector comparer to weigh completion cost only
    ConnectorComparer = MultiTextCompletionSettings.GetWeightedConnectorComparer(0, 1),
    AnalysisSettings = new()
    {
        // We set the maximum number of tests to perform to 1.
        // Alternatively, we could define a temperature transformation to set a positive temperature and gather distinct test results from our single prompt
        NbPromptTests = 1
    } 
};

string jsonString = JsonSerializer.Serialize(settings, new JsonSerializerOptions() { WriteIndented = true });
display(jsonString);

{
  "FreezePromptTypes": false,
  "PromptTruncationLength": 20,
  "AdjustPromptStarts": false,
  "EnablePromptSampling": true,
  "MaxInstanceNb": 10,
  "AnalysisSettings": {
    "EnableAnalysis": false,
    "AnalysisFilePath": ".\\MultiTextCompletion-analysis.json",
    "AnalysisDelay": "00:00:01",
    "AnalysisAwaitsManualTrigger": false,
    "EnableConnectorTests": true,
    "TestPrimaryCompletion": true,
    "TestsPeriod": "00:00:10",
    "MaxDegreeOfParallelismTests": 1,
    "MaxDegreeOfParallelismConnectorsByTest": 3,
    "EnableTestEvaluations": true,
    "EvaluationPeriod": "00:00:10",
    "MaxDegreeOfParallelismEvaluations": 5,
    "UseSelfVetting": false,
    "EnableSuggestion": true,
    "SuggestionPeriod": "00:01:00",
    "UpdateSuggestedSettings": true,
    "SaveSuggestedSettings": false,
    "DeleteAnalysisFile": true,
    "MultiCompletionSettingsFilePath": ".\\MultiTextCompletionSettings.json",
    "NbPromptTests": 1,
    "VettingPromptTransform

## 2. Initialization

With all the settings created, we can now create the semantic kernel that we'll use to run our tests.

### 2.1 Create primary and secondary completions

In this step, we'll initialize our primary and secondary text completions. The primary completion will be based on the OpenAI configuration, while the secondary ones will be based on the Oobabooga models.

In [5]:
using System.Threading;
using Microsoft.SemanticKernel.AI.TextCompletion;
using Microsoft.SemanticKernel.Connectors.AI.OpenAI.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.AI.OpenAI.TextCompletion;

// Creating a cancellation token source to be able to cancel the request
CancellationTokenSource cleanupToken = new();

// Creating the primary connector. We use the OpenAI connector here, either text or chat completion depending on the configuration
ITextCompletion openAiConnector;

string testOrChatModelId;
if (openAIConfiguration.ChatModelId != null)
{
    testOrChatModelId = openAIConfiguration.ChatModelId;
    openAiConnector = new OpenAIChatCompletion(testOrChatModelId, openAIConfiguration.ApiKey);
}
else
{
    testOrChatModelId = openAIConfiguration.ModelId;
    openAiConnector = new OpenAITextCompletion(testOrChatModelId, openAIConfiguration.ApiKey);
}

// Creating the corresponding named completion
var openAiNamedCompletion = new NamedTextCompletion(testOrChatModelId, openAiConnector)
{
    MaxTokens = openAIConfiguration.MaxTokens,
    CostPer1000Token = openAIConfiguration.CostPer1000Token,
    TokenCountFunc = MultiOobaboogaConnectorConfiguration.TokenCountFunctionMap[openAIConfiguration.TokenCountFunction],
    //We did not observe any limit on Open AI concurrent calls
    MaxDegreeOfParallelism = 5,
};

display($"Name of Primary Completion: {openAiNamedCompletion.Name}");

// Creating the secondary connectors. We use a dedicated helper, but you can create them manually if you want.
var oobaboogaCompletions = multiOobaboogaConnectorConfiguration.CreateNamedCompletions();

for (int i = 0; i < oobaboogaCompletions.Count; i++)
{
    display($"Name of Secondary Completion #{i}: {oobaboogaCompletions[i].Name}");
}
 

Name of Primary Completion: gpt-3.5-turbo

Name of Secondary Completion #0: TheBloke_Synthia-13B-v1.2-GGUF

Name of Secondary Completion #1: TheBloke_Mistral-7B-OpenOrca-GGUF

### 2.2 Create Kernel

Now that we have our primary and secondary completions, we can create a semantic kernel and instantiate our multiconnector completion from settings, primary and secondary completions.

We use the dedicated helper for that.

In [6]:
var builder = Microsoft.SemanticKernel.Kernel.Builder;

builder.WithMultiConnectorCompletionService(
    serviceId: null,
    settings: settings,
    mainTextCompletion: openAiNamedCompletion,
    setAsDefault: true,
    analysisTaskCancellationToken: cleanupToken.Token,
    otherCompletions: oobaboogaCompletions.ToArray());

var kernel = builder.Build();

### 2.3 Create simple inline semantic function

We'll create a simple inline semantic function that takes a long text and summarizes it. This function will serve as our test case for multi-connector optimization.

In the next notebook, we'll move on to skills, inputs of varying complexity, static plans, and finally dynamic plans.

In [7]:
using Microsoft.SemanticKernel;
using MyIA.SemanticKernel.Connectors.AI.MultiConnector;

var text = @"A long time ago, people wanted to tell others their stories. First, they wrote letters with their hands. They would send these letters to friends far away. Sometimes, people waited a lot of days to get a letter.

After that, a big machine called the printing press was made. It could make many copies of a story quickly. More people could read the same thing without waiting.

Next, there was a telephone. With it, people could talk and listen to friends who were far. They didn’t have to wait for letters anymore.

Then, there was a thing called television. People could watch stories on it, like a play. They didn’t need to go outside.

Lastly, came mobile phones and computers. People could send messages fast. With the internet, they could also use something called social media to share stories with many people at once.";

var prompt = $"Summarize the following text in one sentence:\n{text}\n\nSummary:";

var simpleSemanticFunction = kernel.CreateSemanticFunction(prompt, requestSettings: new MultiCompletionRequestSettings() { MaxTokensMulti = 100 });

## 3. Running and optimizing settings

Now that everything is in place, we'll follow this workflow:

- The function is run once. The primary connector (ChatGPT) is used to generate our completion.
    - Performance in cost and in duration is recorded.
    - Samples are collected automatically during the run.
    - The result of the function is shown.
- An analysis task is run on the samples collected during the run.
    - Each connector is tested on the samples.
    - The primary connector (ChatGPT) evaluates the test runs, vetting each connector's capability to handle each corresponding prompt type.
    - New settings are computed from the evaluation. Vetted connectors are promoted to handle the corresponding prompt types.
    - MultiCompletion settings are updated according to the analysis results.
- The original function is run again. This time, the secondary connectors may be used to generate some or all of the completions according to the updated settings.
    - Performance in cost and in duration is recorded.
    - The result of the function is shown.
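The steps above can be sketched as a toy simulation. Everything in it is hypothetical: the connector names, costs, and canned answers are invented, and the vetting rule (a non-empty answer) merely stands in for the evaluation that the primary connector performs in the real pipeline. It illustrates the run, test, vet, promote sequence only, not the package's implementation.

```csharp
using System;
using System.Linq;

// Hypothetical connectors with invented per-call costs.
var primary = (Name: "primary", Cost: 0.0003m);
var secondaries = new[]
{
    (Name: "secondary-a", Cost: 0.00006m),
    (Name: "secondary-b", Cost: 0.00004m),
};

// Canned completions standing in for real model calls (assumption).
string Complete(string connector, string prompt) => connector switch
{
    "secondary-a" => "Communication evolved from letters to the internet.",
    "secondary-b" => "", // simulates a failed or empty answer
    _ => "Reference summary from the primary connector."
};

// Step 1: the first run goes to the primary connector and the prompt is kept as a sample.
var sample = "Summarize the following text in one sentence: ...";
var reference = Complete(primary.Name, sample);

// Step 2: each secondary connector is tested on the sample, then vetted.
// Toy vetting rule: accept any non-empty answer.
var vetted = secondaries
    .Where(s => !string.IsNullOrWhiteSpace(Complete(s.Name, sample)))
    .ToList();

// Step 3: promote the cheapest vetted secondary; otherwise keep the primary.
var chosenName = vetted.Count > 0
    ? vetted.OrderBy(s => s.Cost).First().Name
    : primary.Name;

Console.WriteLine($"Connector selected for this prompt type: {chosenName}");
// prints "Connector selected for this prompt type: secondary-a"
```

Note that `secondary-b` is cheaper but is never promoted, because it fails vetting: cost is only compared among connectors that passed.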

### 3.1 Run function with Primary Connector

For this first example, we want our multiconnector to do the job automatically for us, so we'll configure our settings accordingly before we run our function.

We'll simply subscribe to the final optimization event to find out when our multiconnector has finished vetting our secondary connectors.

Note that because the optimization task is only triggered by new prompts, the events won't be raised and the cell won't terminate if you run it twice. To rerun the optimization, first rerun the code cell that creates the settings.

In [8]:
using MyIA.SemanticKernel.Connectors.AI.MultiConnector.Analysis;

// We enable prompt sampling and analysis so that the multiconnector tests our prompt on our secondary connectors after it is run on the primary connector
settings.EnablePromptSampling = true;
settings.AnalysisSettings.EnableAnalysis = true;

// Subscribe to the Evaluation completed event
TaskCompletionSource<EvaluationCompletedEventArgs> evaluationCompletedTaskSource = new();
settings.AnalysisSettings.EvaluationCompleted += (sender, args) =>
{
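    // TrySetResult ignores repeated notifications, whereas SetResult throws if the event is raised a second time
    evaluationCompletedTaskSource.TrySetResult(args);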
};

// Subscribe to the SuggestionCompleted event
TaskCompletionSource<SuggestionCompletedEventArgs> suggestionCompletedTaskSource = new();
settings.AnalysisSettings.SuggestionCompleted += (sender, args) =>
{
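    // TrySetResult ignores repeated notifications, whereas SetResult throws if the event is raised a second time
    suggestionCompletedTaskSource.TrySetResult(args);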
};

// System.Diagnostics.Debugger.Launch();
// System.Diagnostics.Debugger.Break();

// Run the semantic function with our primary connector
var result = await kernel.RunAsync(simpleSemanticFunction, cancellationToken: cleanupToken.Token).ConfigureAwait(false);
display($"Result from primary connector: {result}");

display($"Cost from running primary connector's completion: {creditor.OngoingCost}");

// Wait for the evaluation completed event to be raised
var analysisResults = await evaluationCompletedTaskSource.Task.ConfigureAwait(false);
display($"Evaluation for secondary connectors finished");

// Wait for the suggestion completed event to be raised
var optimizationResults = await suggestionCompletedTaskSource.Task.ConfigureAwait(false);
display($"Optimization task finished");

Result from primary connector: The text describes the evolution of communication methods from handwritten letters to the printing press, telephone, television, and finally mobile phones and computers with internet and social media capabilities.

Cost from running primary connector's completion: 0,000324

Evaluation for secondary connectors finished

Optimization task finished
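Since this cell waits on events that may never be raised on a second run, it can help to bound the wait. The sketch below shows the general pattern with a stand-in delegate instead of the multiconnector's events, so it runs on its own. The 5-second timeout and the `evaluationCompleted` delegate are illustrative assumptions, not part of the package.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for an event such as EvaluationCompleted (assumption: a plain delegate).
Action<string> evaluationCompleted = _ => { };

var completionSource = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

// TrySetResult keeps the first notification and ignores later ones.
evaluationCompleted += args => completionSource.TrySetResult(args);

// Simulate the event being raised twice.
evaluationCompleted("first");
evaluationCompleted("second");

// Wait for the event, but give up after a timeout instead of hanging the cell.
var finished = await Task.WhenAny(completionSource.Task, Task.Delay(TimeSpan.FromSeconds(5)));
var message = finished == completionSource.Task
    ? $"Completed with: {await completionSource.Task}"
    : "Timed out waiting for the event";

Console.WriteLine(message); // prints "Completed with: first"
```

In the notebook, the same `Task.WhenAny` guard could wrap `evaluationCompletedTaskSource.Task` and `suggestionCompletedTaskSource.Task`, with a timeout long enough for your local models to answer.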

### 3.2 Optimization results

Let's see the results of the evaluation and the suggested new settings.

We want at least one secondary connector to be vetted on the same prompt and to perform better, that is, to respond faster and/or cost less.

By default, those two criteria are weighted equally when selecting the best connector for a given prompt type, but this is one of the many parameters you can change.
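To make the weighting concrete, here is a minimal sketch of how a weighted comparison between duration and cost could rank connectors. It is not the package's comparer: the scoring formula is an assumption chosen for illustration, the durations are invented, and only the two costs come from this notebook's outputs. It also assumes that the two arguments of `GetWeightedConnectorComparer` are a duration weight followed by a cost weight, as the comment in the settings cell suggests.

```csharp
using System;
using System.Linq;

// Costs are taken from this notebook's outputs; durations (in seconds) are invented.
var candidates = new[]
{
    (Name: "primary",     Seconds: 1.2, Cost: 0.000324),
    (Name: "secondary-a", Seconds: 3.0, Cost: 0.0000615),
};

// Normalise each metric by the best observed value, then blend with the two weights.
// A lower score is better.
double Score((string Name, double Seconds, double Cost) c, double durationWeight, double costWeight)
{
    double minSeconds = candidates.Min(x => x.Seconds);
    double minCost = candidates.Min(x => x.Cost);
    return durationWeight * (c.Seconds / minSeconds) + costWeight * (c.Cost / minCost);
}

string Best(double durationWeight, double costWeight) =>
    candidates.OrderBy(c => Score(c, durationWeight, costWeight)).First().Name;

Console.WriteLine(Best(0, 1)); // cost only: prints "secondary-a"
Console.WriteLine(Best(1, 0)); // duration only: prints "primary"
Console.WriteLine(Best(1, 1)); // equal weights: prints "secondary-a"
```

With these numbers, the secondary connector wins on cost alone and on equal weights, while the primary wins when only duration counts. This is why the choice of weights matters when local models are slower than a hosted one.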

We'll serialize both the evaluation results and the suggested settings. Since the displayed output is likely to be truncated, use your notebook viewer's options to display the entire content.

#### 3.2.1 Analysis results

First, let's see the results of our tests and vetting evaluations.

In [9]:
var strAnalysisResults = JsonSerializer.Serialize(analysisResults.CompletionAnalysis, new JsonSerializerOptions() { WriteIndented = true });
display($"Analysis results: {strAnalysisResults}");

Analysis results: {
  "Samples": [],
  "TestTimestamp": "2023-11-14T02:03:49.6683475+01:00",
  "Tests": [],
  "EvaluationTimestamp": "2023-11-14T02:03:53.3419153+01:00",
  "Evaluations": [
    {
      "Test": {
        "ConnectorName": "gpt-3.5-turbo",
        "Prompt": "Summarize the following text in one sentence:\nA long time ago, people wanted to tell others their stories. First, they wrote letters with their hands. They would send these letters to friends far away. Sometimes, people waited a lot of days to get a letter.\n\nAfter that, a big machine called the printing press was made. It could make many copies of a story quickly. More people could read the same thing without waiting.\n\nNext, there was a telephone. With it, people could talk and listen to friends who were far. They didn\u2019t have to wait for letters anymore.\n\nThen, there was a thing called television. People could watch stories on it, like a play. They didn\u2019t need to go outside.\n\nLastly, came mo

#### 3.2.2 Suggested settings

Based on our analysis, the multicompletion engine can update its settings with the results for each secondary connector.

In [10]:
var strSuggestedSettings = JsonSerializer.Serialize(optimizationResults.SuggestedSettings, new JsonSerializerOptions() { WriteIndented = true });
display($"Updated settings: {strSuggestedSettings}");

Updated settings: {
  "FreezePromptTypes": false,
  "PromptTruncationLength": 20,
  "AdjustPromptStarts": false,
  "EnablePromptSampling": true,
  "MaxInstanceNb": 10,
  "AnalysisSettings": {
    "EnableAnalysis": true,
    "AnalysisFilePath": ".\\MultiTextCompletion-analysis.json",
    "AnalysisDelay": "00:00:01",
    "AnalysisAwaitsManualTrigger": false,
    "EnableConnectorTests": true,
    "TestPrimaryCompletion": true,
    "TestsPeriod": "00:00:10",
    "MaxDegreeOfParallelismTests": 1,
    "MaxDegreeOfParallelismConnectorsByTest": 3,
    "EnableTestEvaluations": true,
    "EvaluationPeriod": "00:00:10",
    "MaxDegreeOfParallelismEvaluations": 5,
    "UseSelfVetting": false,
    "EnableSuggestion": true,
    "SuggestionPeriod": "00:01:00",
    "UpdateSuggestedSettings": true,
    "SaveSuggestedSettings": false,
    "DeleteAnalysisFile": true,
    "MultiCompletionSettingsFilePath": ".\\MultiTextCompletionSettings.json",
    "NbPromptTests": 1,
    "Vetti

### 3.3 Run function with updated settings

After confirming that at least one of our secondary connectors was vetted with better performance, we can run the same function again with the optimized settings.

Although this is not strictly necessary here, because running the same prompt won't trigger a new sample collection, we disable analysis and prompt sampling so that the multiconnector does not test our prompt on the secondary connectors again after the run.

In [11]:
// By disabling prompt sampling and automatic analysis, we freeze the settings to the ones suggested by the optimization task 
settings.EnablePromptSampling = false;
settings.AnalysisSettings.EnableAnalysis = false;

creditor.Reset();

// Run the semantic function again; the multiconnector now routes it according to the optimized settings
var secondaryResult = await kernel.RunAsync(simpleSemanticFunction, cancellationToken: cleanupToken.Token).ConfigureAwait(false);

display($"Result from optimized connector: {secondaryResult}");

display($"Cost from running secondary connector's completion: {creditor.OngoingCost}");

Result from optimized connector: The history of communication has evolved from handwritten letters to modern technology like telephones, television, and the internet.

Cost from running secondary connector's completion: 0,0000615
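As a quick check on the two costs displayed above, the following sketch computes the cost ratio and the relative saving. The literals are the values from the cell outputs, written with a decimal point; the outputs themselves show a decimal comma because of the culture of the machine that ran the notebook.

```csharp
using System;
using System.Globalization;

// Costs copied from the two cell outputs above.
decimal primaryCost = 0.000324m;
decimal secondaryCost = 0.0000615m;

// How many times cheaper the optimized run was, and the relative saving in percent.
decimal ratio = primaryCost / secondaryCost;
decimal savingPercent = (1 - secondaryCost / primaryCost) * 100;

var ratioText = ratio.ToString("F2", CultureInfo.InvariantCulture);
var savingText = savingPercent.ToString("F1", CultureInfo.InvariantCulture);

Console.WriteLine($"Cost ratio: {ratioText}x");  // prints "Cost ratio: 5.27x"
Console.WriteLine($"Saving: {savingText}%");     // prints "Saving: 81.0%"
```

For this single prompt, the optimized run cost roughly one fifth of the primary run.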

## Conclusion

In this notebook, we've walked through the process of setting up and optimizing a multi-connector system. We've seen how it can intelligently offload tasks from a primary, more expensive connector to a secondary, more cost-effective one without sacrificing performance. 

In the next notebooks, we'll delve into more complex scenarios involving skills, varying input data complexities, and dynamic plans.

In the meantime, these advanced use cases are illustrated in the integration tests.