# Notebook 3: Understanding Multi-Connector Text Completion with Arithmetic Mocks

## Introduction

In this notebook, we'll explore the capabilities of the Multi-Connector package for text completion, using simple test connectors that were introduced to demonstrate its general behavior.

We create arithmetic-capable text completion connectors that support all or some of the four basic arithmetic operations. They can also respond to a prompt asking them to vet the result of another such connector, by computing the result themselves and comparing it with that connector's answer.

A multiconnector is then created with one slow arithmetic connector capable of all operations, and several cheaper connectors capable of only one basic operation each.
The multiconnector is capable of testing the various connectors, and eventually routing arithmetic operations away from the slow connector to the cheapest specialized connectors.

This will set the stage for more advanced use-cases involving real large and small LLMs performing real world semantic functions.

## Setup

Import the Semantic Kernel SDK and the Multi-Connector package from NuGet.

In [None]:
// Import Semantic Kernel
#r "nuget: Microsoft.SemanticKernel, [Your_Version]"
// Import Multi-Connector
#r "nuget: MyIA.SemanticKernel.Connectors.AI.MultiConnector, [Your_Version]"

## Creating the connectors characteristics

We define a series of variables to represent the capabilities of our connectors. We want a primary connector that is more capable but slower and more expensive than our specialized secondary connectors.
We also define how the multiconnector should prioritize performance gains, by assigning weights to cost and duration gains.
We'll see later how this affects the multiconnector's routing decisions.

In [None]:
//Defining primary and secondary characteristics
var primaryDuration = 20; // simulated call duration in milliseconds, actually awaited by the connector
var secondaryDuration = 2;
var primaryCost = 0.02m;
var secondaryCost = 0.01m;

// Defining how the multiconnector should prioritize performance gains
var durationWeight = 1;
var costWeight = 1;
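
To build intuition for how such weights can trade duration against cost, here is a minimal, self-contained sketch. It does not reproduce the package's actual comparer (`GetWeightedConnectorComparer`, used in the settings further down); the `WeightedScore` helper and its formula, a weighted average of duration and cost ratios relative to the primary connector, are assumptions made purely for illustration.

```csharp
using System;

// Hypothetical scoring helper (lower is better). NOT the package's implementation.
// Duration and cost are each divided by the primary connector's values,
// so that both weights apply to comparable, unitless ratios.
static double WeightedScore(double durationMs, decimal cost,
    double referenceDurationMs, decimal referenceCost,
    double durationWeight, double costWeight)
{
    double durationRatio = durationMs / referenceDurationMs;
    double costRatio = (double)(cost / referenceCost);
    return (durationWeight * durationRatio + costWeight * costRatio) / (durationWeight + costWeight);
}

// Same characteristics as above: primary 20 ms / 0.02, secondary 2 ms / 0.01, both weights set to 1
double primaryScore = WeightedScore(20, 0.02m, 20, 0.02m, 1, 1);
double secondaryScore = WeightedScore(2, 0.01m, 20, 0.02m, 1, 1);

Console.WriteLine($"Primary score: {primaryScore}, secondary score: {secondaryScore}");
```

Under this assumed formula, raising the duration weight favors the faster secondary connectors even more, while a cost weight of 0 would make the comparison ignore price entirely.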

## Creating the Settings

Here, we configure the settings to enable analysis and let the multiconnector discover the best settings. We only require one test for vetting, since it is easy to pick operands that properly discriminate between the four basic arithmetic operations.

In [None]:
using MyIA.SemanticKernel.Connectors.AI.MultiConnector;
using MyIA.SemanticKernel.Connectors.AI.MultiConnector.Analysis;
using System.IO;

// We configure settings to enable analysis and let the connector discover the best settings, updating them on the fly and keeping the analysis file for inspection
var settings = new MultiTextCompletionSettings()
{
    AnalysisSettings = new MultiCompletionAnalysisSettings()
    {
        EnableAnalysis = true,
        NbPromptTests = 1,
        AnalysisAwaitsManualTrigger = true,
        AnalysisDelay = TimeSpan.Zero,
        TestsPeriod = TimeSpan.Zero,
        EvaluationPeriod = TimeSpan.Zero,
        SuggestionPeriod = TimeSpan.Zero,
        UpdateSuggestedSettings = true,
        // Keep the analysis file and save the suggested settings, for additional debugging information
        DeleteAnalysisFile = false,
        SaveSuggestedSettings = true
    },
    PromptTruncationLength = 11,
    ConnectorComparer = MultiTextCompletionSettings.GetWeightedConnectorComparer(durationWeight, costWeight),
    // Additional logging: call results are logged; uncomment LogTestCollection to also log test sample collection
    LogCallResult = true,
    //LogTestCollection = true,
};

// Cleanup in case the previous test failed to delete the analysis file
if (File.Exists(settings.AnalysisSettings.AnalysisFilePath))
{
    File.Delete(settings.AnalysisSettings.AnalysisFilePath);

    display($"Deleted preexisting analysis file: {settings.AnalysisSettings.AnalysisFilePath}");
}

## Creating Arithmetic Connectors

We create one slow, omnipotent arithmetic connector and four fast, specialized arithmetic connectors.
To do so, we define a method that creates the connectors from their characteristics.

In [None]:
using Microsoft.SemanticKernel.AI.TextCompletion;
using MyIA.SemanticKernel.Connectors.AI.MultiConnector;
using MyIA.SemanticKernel.Connectors.AI.MultiConnector.ArithmeticMocks;

//Method to build the completion connectors according to characteristics and settings

public List<NamedTextCompletion> CreateCompletions(MultiTextCompletionSettings settings, TimeSpan primaryCallDuration, 
    decimal primaryCostPerRequest, TimeSpan secondaryCallDuration, decimal secondaryCostPerRequest, CallRequestCostCreditor? creditor)
{
    var toReturn = new List<NamedTextCompletion>();

    // Build the primary connector, supporting all four operations with the default engine
    var primaryConnector = new ArithmeticCompletionService(settings,
        new List<ArithmeticOperation>() { ArithmeticOperation.Add, ArithmeticOperation.Divide, ArithmeticOperation.Multiply, ArithmeticOperation.Subtract },
        new(),
        primaryCallDuration,
        primaryCostPerRequest, creditor);
    var primaryCompletion = new NamedTextCompletion("Primary", primaryConnector)
    {
        CostPerRequest = primaryCostPerRequest,
    };

    toReturn.Add(primaryCompletion);

    // Build the secondary connectors, each specialized in a single operation
    foreach (var operation in primaryConnector.SupportedOperations)
    {
        var secondaryConnector = new ArithmeticCompletionService(settings,
            new List<ArithmeticOperation>() { operation },
            new ArithmeticEngine()
            {
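                // The requested operation is ignored on purpose: this engine always applies its own single operation
                ComputeFunc = (arithmeticOperation, operand1, operand2) => ArithmeticEngine.Compute(operation, operand1, operand2)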
            },
            secondaryCallDuration,
            secondaryCostPerRequest, creditor);
        var secondaryCompletion = new NamedTextCompletion($"Secondary - {operation}", secondaryConnector)
        {
            CostPerRequest = secondaryCostPerRequest
        };

        toReturn.Add(secondaryCompletion);
    }

    return toReturn;
}


// With the values defined above, the secondary completions are 10 times faster and half the price of the primary completion, but each of them can only handle a single operation

var creditor = new CallRequestCostCreditor();

var completions = CreateCompletions(settings, TimeSpan.FromMilliseconds(primaryDuration), primaryCost, 
    TimeSpan.FromMilliseconds(secondaryDuration), secondaryCost, creditor);

## Creating the operation computing completion Jobs

We create the jobs that will be used to test the connectors.
We first define a method that returns completion jobs from the operands.
We then call that method with operands that properly discriminate across the basic operations.
We picked 8 and 2 because all four results differ: 8+2=10, 8*2=16, 8-2=6 and 8/2=4.
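
The choice of operands can be checked with a few lines of plain C#. The sketch below is independent of the package: `ComputeAll` and `Discriminates` are hypothetical helpers written for this illustration, and they use ordinary integer arithmetic rather than the package's `ArithmeticEngine`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain C# stand-in for the four operations; it does not use the package's ArithmeticEngine
static Dictionary<string, int> ComputeAll(int a, int b) => new()
{
    ["Add"] = a + b,
    ["Subtract"] = a - b,
    ["Multiply"] = a * b,
    ["Divide"] = a / b, // integer division, exact for 8 and 2
};

// Operands discriminate when no two operations share the same result
static bool Discriminates(int a, int b) => ComputeAll(a, b).Values.Distinct().Count() == 4;

Console.WriteLine(Discriminates(8, 2)); // True: 10, 6, 16 and 4 are all different
Console.WriteLine(Discriminates(2, 2)); // False: 2 + 2 and 2 * 2 both give 4
```

With operands such as 2 and 2, a connector that only multiplies would pass the vetting of an addition prompt, which is why a single test is only sufficient with well-chosen operands.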

In [None]:
// Method to create sample prompts, one per operation, all using the same operands
public CompletionJob[] CreateSampleJobs(ArithmeticOperation[] operations, int operand1, int operand2)
{
    // Deterministic settings: the answer is a short number, so a few tokens are enough
    var requestSettings = new CompleteRequestSettings()
    {
        Temperature = 0,
        MaxTokens = 10
    };
    var prompts = operations.Select(op => ArithmeticEngine.GeneratePrompt(op, operand1, operand2)).ToArray();
    return prompts.Select(p => new CompletionJob(p, requestSettings)).ToArray();
}

// Using the method with 8 and 2, for every available operation
var completionJobs = CreateSampleJobs(Enum.GetValues(typeof(ArithmeticOperation)).Cast<ArithmeticOperation>().ToArray(), 8, 2);

## Creating a Job Running Helper Method

This helper method will run the completion jobs and collect the results.

In [None]:
using System.Diagnostics;

public static async Task<List<(string result, TimeSpan duration, decimal expectedCost)>> RunPromptsAsync(CompletionJob[] completionJobs, MultiTextCompletion multiConnector, Func<string, string, decimal> completionCostFunction)
{
    List<(string result, TimeSpan duration, decimal expectedCost)> toReturn = new();
    foreach (var job in completionJobs)
    {
        var stopWatch = Stopwatch.StartNew();
        var result = await multiConnector.CompleteAsync(job.Prompt, job.RequestSettings).ConfigureAwait(false);
        stopWatch.Stop();
        var duration = stopWatch.Elapsed;
        var cost = completionCostFunction(job.Prompt, result);
        toReturn.Add((result, duration, cost));
    }

    return toReturn;
}

## Creating the MultiCompletion and Setting Up Events

Here, we create the MultiCompletion instance and set up events to capture optimization results.

In [None]:
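using System.Threading;

// Token source used to cancel pending analysis tasks once we are done
var cleanupToken = new CancellationTokenSource();

// The primary completion comes first; the specialized ones are passed as otherCompletions.
// The loggerFactory named argument is assumed to be optional and is omitted here.
var multiConnector = new MultiTextCompletion(settings, completions[0], cleanupToken.Token, otherCompletions: completions.Skip(1).ToArray());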

// Create a task completion source to signal the completion of the optimization
var optimizationCompletedTaskSource = new TaskCompletionSource<SuggestionCompletedEventArgs>();

// Subscribe to the SuggestionCompleted event, raised once optimized settings have been suggested
settings.AnalysisSettings.SuggestionCompleted += (sender, args) =>
{
    // Signal the completion of the optimization
    optimizationCompletedTaskSource.SetResult(args);
};

// Subscribe to the AnalysisTaskCrashed event
settings.AnalysisSettings.AnalysisTaskCrashed += (sender, args) =>
{
    // Propagate the failure to whoever awaits the optimization task
    optimizationCompletedTaskSource.SetException(args.CrashEvent.Exception);
};
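
The cell above bridges events to an awaitable task with a `TaskCompletionSource`. The following self-contained sketch shows the same pattern in isolation. `FakeAnalysis` and its two events are hypothetical stand-ins for the analysis settings, not part of the package. The sketch uses the `TrySetResult` and `TrySetException` variants, which return `false` instead of throwing if the event fires again after the task has already completed.

```csharp
using System;
using System.Threading.Tasks;

var analysis = new FakeAnalysis();
var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

// The first event to fire completes the task; later ones are ignored by the TrySet* variants
analysis.Completed += (sender, payload) => tcs.TrySetResult(payload);
analysis.Crashed += (sender, exception) => tcs.TrySetException(exception);

analysis.RaiseCompleted("first");
analysis.RaiseCompleted("second"); // ignored: the task is already completed

string outcome = await tcs.Task;
Console.WriteLine(outcome); // first

// Hypothetical event source standing in for the analysis settings
class FakeAnalysis
{
    public event EventHandler<string>? Completed;
    public event EventHandler<Exception>? Crashed;

    public void RaiseCompleted(string payload) => Completed?.Invoke(this, payload);
    public void RaiseCrashed(Exception exception) => Crashed?.Invoke(this, exception);
}
```

Whether to prefer `SetResult` or `TrySetResult` depends on whether a repeated event should be treated as an error; in this notebook the event is only awaited once.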

## Running the First Pass and Collecting Results

The first pass runs the jobs on the primary connector and collects samples. Once the analysis is released, those samples are tested on the secondary connectors, the results are evaluated using the vetting capabilities, and the settings are optimized accordingly.

In [None]:
settings.EnablePromptSampling = true;

var primaryResults = await RunPromptsAsync(completionJobs, multiConnector, completions[0].GetCost).ConfigureAwait(false);

var firstPassEffectiveCost = creditor.OngoingCost;
decimal firstPassExpectedCost = primaryResults.Sum(tuple => tuple.expectedCost);
// We exclude the first prompt from the time measurement, because it takes longer on the first pass due to warmup
var firstPassDurationAfterWarmup = TimeSpan.FromTicks(primaryResults.Skip(1).Sum(tuple => tuple.duration.Ticks));

// We disable prompt sampling to ensure no other tests are generated
settings.EnablePromptSampling = false;

// release optimization task
settings.AnalysisSettings.AnalysisAwaitsManualTrigger = false;
settings.AnalysisSettings.ReleaseAnalysisTasks();
// Get the optimization results
var optimizationResults = await optimizationCompletedTaskSource.Task.ConfigureAwait(false);

## Running the Second Pass and Collecting Results

After optimization, we run the second pass to see how well the MultiConnector performs, using optimized settings.

In [None]:
creditor.Reset();

// Redo the same requests with the new settings.
// Each request is now expected to be routed to a secondary connector, hence the expected cost per request.
var secondaryResults = await RunPromptsAsync(completionJobs, multiConnector, (prompt, result) => secondaryCost).ConfigureAwait(false);
decimal secondPassExpectedCost = secondaryResults.Sum(tuple => tuple.expectedCost);
var secondPassEffectiveCost = creditor.OngoingCost;

// We also exclude the first prompt from the time measurement on the second pass, to align the comparison
var secondPassDurationAfterWarmup = TimeSpan.FromTicks(secondaryResults.Skip(1).Sum(tuple => tuple.duration.Ticks));

## Asserting the optimization succeeded

From all the collected evidence, we can now verify that our multiconnector did optimize the computation cost using the appropriate secondary connectors. 

In [None]:
using System.Globalization;

// Checking that results are correct: both passes should match the real result of each operation
for (int index = 0; index < completionJobs.Length; index++)
{
    string? prompt = completionJobs[index].Prompt;
    var parsed = ArithmeticEngine.ParsePrompt(prompt);
    var realResult = ArithmeticEngine.Compute(parsed.operation, parsed.operand1, parsed.operand2).ToString(CultureInfo.InvariantCulture);
    display($"Prompt: {prompt}, Real result: {realResult}, First pass result: {primaryResults[index].result}, Second pass result: {secondaryResults[index].result}");
}

// Asserting cost gains are as expected

display($"First pass expected cost: {firstPassExpectedCost}, First pass effective cost: {firstPassEffectiveCost}");
display($"Second pass expected cost: {secondPassExpectedCost}, Second pass effective cost: {secondPassEffectiveCost}");
 
// Asserting duration gains are as expected

display($"First pass duration: {firstPassDurationAfterWarmup}, Second pass duration: {secondPassDurationAfterWarmup}");
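
The cell above only displays the collected values. To turn such a comparison into an actual check, a small helper along the following lines could be used. This is a minimal sketch: `AssertGain` is a hypothetical helper, and the figures are the ones implied by the configuration (four jobs at 0.02 each on the primary connector, then four jobs at 0.01 each on the secondary connectors), not measured results.

```csharp
using System;

// Hypothetical helper: throws when the second pass is not cheaper by at least the expected factor
static void AssertGain(string label, decimal firstPass, decimal secondPass, decimal minimumGain)
{
    if (secondPass <= 0 || firstPass / secondPass < minimumGain)
    {
        throw new InvalidOperationException(
            $"{label}: expected a gain of at least {minimumGain}, got {firstPass} vs {secondPass}");
    }
}

// Illustrative values implied by the configuration, not measured results
decimal firstPassCost = 4 * 0.02m;
decimal secondPassCost = 4 * 0.01m;

AssertGain("Cost", firstPassCost, secondPassCost, 2m);
Console.WriteLine($"Cost gain: {firstPassCost / secondPassCost}");
```

Durations are noisier than costs, so a check on measured durations would typically use a lower threshold than the theoretical ratio between the configured call durations.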

## Conclusion

We've successfully demonstrated how the Multi-Connector can optimize costs and performance by routing operations to the most specialized connectors. This sets the stage for more advanced scenarios where we'll offload semantic functions from larger models to smaller, specialized ones.