## Embedding texts that are longer than the model's maximum context length
OpenAI's embedding models cannot embed text that exceeds a maximum length. The maximum length varies by model, and is measured by _tokens_, not string length. If you are unfamiliar with tokenization, check out [How to count tokens with tiktoken](How_to_count_tokens_with_tiktoken.ipynb).

This notebook shows how to handle texts that are longer than a model's maximum context length. We'll demonstrate using embeddings from `text-embedding-ada-002`, but the same ideas can be applied to other models and tasks. To learn more about embeddings, check out the [OpenAI Embeddings Guide](https://beta.openai.com/docs/guides/embeddings).

## Installation
Install the Azure Open AI SDK using the below command.

In [11]:
#r "nuget: Azure.AI.OpenAI, *-*"
#r "nuget: Microsoft.DeepDev.TokenizerLib, 1.3.2"

In [2]:
using Microsoft.DotNet.Interactive;

In [4]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var deployment = await Kernel.GetInputAsync("Provide deployment name");

### Import namesapaces and create an instance of `OpenAiClient` using the `azureOpenAIEndpoint` and the `azureOpenAIKey`

In [5]:
using Azure;
using Azure.AI.OpenAI;
using System.Collections.Generic;

In [6]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey));

In [17]:
var longText = string.Join(" ", Enumerable.Repeat("AGI", 5000));

In [19]:
var embeddingResponse = await client.GetEmbeddingsAsync(deployment, new EmbeddingsOptions(longText));


Error: Azure.RequestFailedException: This model's maximum context length is 8191 tokens, however you requested 10000 tokens (10000 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
Status: 400 (model_error)

Content:
{
  "error": {
    "message": "This model's maximum context length is 8191 tokens, however you requested 10000 tokens (10000 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}


Headers:
Access-Control-Allow-Origin: REDACTED
apim-request-id: REDACTED
X-Content-Type-Options: REDACTED
openai-processing-ms: REDACTED
x-ms-region: REDACTED
X-Request-ID: REDACTED
ms-azureml-model-error-reason: REDACTED
ms-azureml-model-error-statuscode: REDACTED
x-ms-client-request-id: 7e293405-8c6a-4a58-8115-515161dc49be
Strict-Transport-Security: REDACTED
Date: Tue, 03 Oct 2023 14:00:03 GMT
Content-Length: 294
Content-Type: application/json

   at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
   at Azure.AI.OpenAI.OpenAIClient.GetEmbeddingsAsync(String deploymentOrModelName, EmbeddingsOptions embeddingsOptions, CancellationToken cancellationToken)
   at Submission#18.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

Clearly we want to avoid these errors, particularly when handling programmatically with a large number of embeddings. Yet, we still might be faced with texts that are longer than the maximum context length. Below we describe and provide recipes for the main approaches to handling these longer texts: (1) simply truncating the text to the maximum allowed length, and (2) chunking the text and embedding each chunk individually.

## 1. Truncating the input text
The simplest solution is to truncate the input text to the maximum allowed length. Because the context length is measured in tokens, we have to first tokenize the text before truncating it. The API accepts inputs both in the form of text or tokens, so as long as you are careful that you are using the appropriate encoding, there is no need to convert the tokens back into string form. Below is an example of such a truncation function.

In [20]:
using System.Collections.Generic;
using Microsoft.DeepDev;


var tokenizer = await TokenizerBuilder.CreateByModelNameAsync("text-embedding-ada-002");

public string TruncateTextTokens(string text, int maxTokens){
    if (string.IsNullOrWhiteSpace(text)){
        return text;
    }
    var encoded = tokenizer.Encode(text, Array.Empty<string>()).Take(maxTokens).ToArray();
    return tokenizer.Decode(encoded);
}

In [21]:
var maxTokens = 8191;

var truncated = TruncateTextTokens(longText, maxTokens);
longText.Length.Display();
truncated.Length.Display();


## 2. Chunking the input text
Though truncation works, discarding potentially relevant text is a clear drawback. Another approach is to divide the input text into chunks and then embed each chunk individually. Then, we can either use the chunk embeddings separately, or combine them in some way, such as averaging (weighted by the size of each chunk).

Now we define a function that encodes a string into tokens and then breaks it up into chunks.

Finally, we can write a function that safely handles embedding requests, even when the input text is longer than the maximum context length, by chunking the input tokens and embedding each chunk individually. The `average` flag can be set to `True` to return the weighted average of the chunk embeddings, or `False` to simply return the unmodified list of chunk embeddings.

In [23]:
public IEnumerable<string> ChunkedText(string text, int maxTokens, bool average = false){
    if (string.IsNullOrWhiteSpace(text)){
        yield break;
    }
    var chunkSize = maxTokens;
    var encoded = tokenizer.Encode(text, Array.Empty<string>()).ToArray();
    var batchCount = (encoded.Length / chunkSize) + (encoded.Length % chunkSize == 0 ? 0 : 1);
    var encodedChucks = new List<int[]>();
    for(var i = 0; i < batchCount; i++)
    {
        var slice = encoded.Skip(i * maxTokens).Take(maxTokens).ToArray();
        encodedChucks.Add(slice);
        
    }
    if(average){
        var averageChunk = encodedChucks.Sum(x => x.Length) / (double)(encodedChucks.Count);
        chunkSize = (int)Math.Ceiling(averageChunk);
        batchCount = (encoded.Length / chunkSize) + (encoded.Length % chunkSize == 0 ? 0 : 1);
        encodedChucks = new List<int[]>();
        for(var i = 0; i < batchCount; i++)
        {
            var slice = encoded.Skip(i * maxTokens).Take(maxTokens).ToArray();
            encodedChucks.Add(slice);
        }
    }

    foreach(var chunk in encodedChucks){
        yield return tokenizer.Decode(chunk);
    }
}

In [25]:
var batchSize =  16;
var chunks = ChunkedText(longText, 3000, true).ToList();
var batchCount = (chunks.Count / batchSize) + (chunks.Count % batchSize == 0 ? 0 : 1);
for(var i = 0; i < batchCount; i++)
{
   var embeddings = await client.GetEmbeddingsAsync(deployment, new EmbeddingsOptions(chunks.Skip(i*batchSize).Take(batchSize)));
    embeddings.Value.Data.Display();
}

index,value
,
,
,
,
0,"Azure.AI.OpenAI.EmbeddingItemEmbedding[ -0.01849635, 0.007244624, -0.007131842, -0.024108943, -0.012545407, -0.00031160342, -0.022848431, -0.0016079815, -0.013507376, -0.025714437, 0.016957197, 0.010263218, -0.011019525, -0.012366282, 0.0016204208, 0.032720227, 0.01857596, 0.01042244, 0.007821806, -0.014356563 ... (1516 more) ]Index0"
,
Embedding,"[ -0.01849635, 0.007244624, -0.007131842, -0.024108943, -0.012545407, -0.00031160342, -0.022848431, -0.0016079815, -0.013507376, -0.025714437, 0.016957197, 0.010263218, -0.011019525, -0.012366282, 0.0016204208, 0.032720227, 0.01857596, 0.01042244, 0.007821806, -0.014356563 ... (1516 more) ]"
Index,0
1,"Azure.AI.OpenAI.EmbeddingItemEmbedding[ -0.015453695, 0.0014483652, -0.0033909671, -0.02639057, -0.00070449687, 0.0024443779, -0.013000941, -0.00092062075, -0.011459593, -0.029861955, 0.015359875, 0.011446189, -0.01529286, -0.016164057, -0.0026420727, 0.029406251, 0.01582898, 0.006755128, 0.0080820285, -0.015721757 ... (1516 more) ]Index1"
,

Unnamed: 0,Unnamed: 1
Embedding,"[ -0.01849635, 0.007244624, -0.007131842, -0.024108943, -0.012545407, -0.00031160342, -0.022848431, -0.0016079815, -0.013507376, -0.025714437, 0.016957197, 0.010263218, -0.011019525, -0.012366282, 0.0016204208, 0.032720227, 0.01857596, 0.01042244, 0.007821806, -0.014356563 ... (1516 more) ]"
Index,0

Unnamed: 0,Unnamed: 1
Embedding,"[ -0.015453695, 0.0014483652, -0.0033909671, -0.02639057, -0.00070449687, 0.0024443779, -0.013000941, -0.00092062075, -0.011459593, -0.029861955, 0.015359875, 0.011446189, -0.01529286, -0.016164057, -0.0026420727, 0.029406251, 0.01582898, 0.006755128, 0.0080820285, -0.015721757 ... (1516 more) ]"
Index,1

Unnamed: 0,Unnamed: 1
Embedding,"[ -0.015453695, 0.0014483652, -0.0033909671, -0.02639057, -0.00070449687, 0.0024443779, -0.013000941, -0.00092062075, -0.011459593, -0.029861955, 0.015359875, 0.011446189, -0.01529286, -0.016164057, -0.0026420727, 0.029406251, 0.01582898, 0.006755128, 0.0080820285, -0.015721757 ... (1516 more) ]"
Index,2

Unnamed: 0,Unnamed: 1
Embedding,"[ -0.013779954, 0.0037425405, -0.004922829, -0.027216071, -0.0010910232, -0.0009786148, -0.01724478, 0.003451601, -0.014599875, -0.03089249, 0.019307805, 0.014441181, -0.017205106, -0.010652354, 0.0004682308, 0.029913874, 0.012966646, 0.008430635, 0.0052997284, -0.014996611 ... (1516 more) ]"
Index,3


In some cases, it may make sense to split chunks on paragraph boundaries or sentence boundaries to help preserve the meaning of the text.