# 02 REST API | 02 Other Models (Whisper, Embedding, DALL-E)

## Whisper - Speech to Text

Whisper is a speech-to-text model from OpenAI that you can use to transcribe audio files. It is trained on a large dataset of audio and text and is optimized for English speech, but it can also process audio that contains speech in other languages. With the transcription endpoint used in this notebook, the output is text in the spoken language; the separate translation endpoint produces English text.

- See the [documentation](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/whisper-overview) for more information about the Whisper model.
- Get started with the Whisper model in this [Quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/whisper-quickstart?tabs=command-line).


## Embedding - Text to Vector

Embeddings are vector representations of words or phrases that capture their ***meaning*** and ***context***. Each embedding is a vector of floating-point numbers, and the distance between two embeddings in the vector space reflects the semantic similarity of the corresponding inputs: the closer the vectors, the more similar the inputs. In machine learning terminology, an embedding is a feature vector. Other examples of feature vectors include one-hot encodings and bag-of-words representations.

- See the [documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=console) for more information about the Embedding model.
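To make the idea of "distance as similarity" concrete, the following is a minimal sketch that compares vectors with cosine similarity, one common measure for embeddings. The three-dimensional vectors and the helper name `CosineSimilarity` are made up for illustration; real embeddings have far more dimensions and come from the model, not from hand-written values.

```csharp
using System;

// Cosine similarity: dot product divided by the product of the vector lengths.
// Values close to 1 mean the vectors point in nearly the same direction.
static double CosineSimilarity(double[] a, double[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy vectors (hypothetical values, not real model output)
double[] cat = { 0.9, 0.1, 0.0 };
double[] kitten = { 0.8, 0.2, 0.1 };
double[] car = { 0.0, 0.2, 0.9 };

Console.WriteLine($"cat vs kitten: {CosineSimilarity(cat, kitten):F3}");
Console.WriteLine($"cat vs car:    {CosineSimilarity(cat, car):F3}");
```

With these toy values, the first pair scores much higher than the second, which mirrors how semantically related inputs end up close together in the embedding space.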

## Prerequisites

To run this sample, complete the following prerequisites:

- Deploy the Whisper model to your Azure OpenAI resource. See the [documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/whisper-quickstart?tabs=command-line#prerequisites) for more information.
- Deploy the Embedding model to your Azure OpenAI resource. See the [documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=console#prerequisites) for more information.

## Samples

The following samples demonstrate how to use the Whisper model to transcribe audio files and how to use the Embedding model to get the vector representation of words or phrases.



### Step 1:  Set Up Parameters

In [None]:
#r "nuget: DotNetEnv, 2.5.0"
#r "nuget: System.Text.Json, 7.0.3"
#r "nuget: Newtonsoft.Json, 13.0.1"
using DotNetEnv;

using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json.Nodes;
using System.Text.Json;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.IO;
using System.Text; // Encoding, used when building the embeddings request

static string _configurationFile = @"../01_DemoEnvironment/conf/application.env";
Env.Load(_configurationFile);

string apiBase = Environment.GetEnvironmentVariable("SKIT_AOAI_ENDPOINT"); 
string apiKey = Environment.GetEnvironmentVariable("SKIT_AOAI_APIKEY"); 
string whisperDeploymentName = Environment.GetEnvironmentVariable("SKIT_WHISPER_DEPLOYMENTNAME"); 
string adaDeploymentName = Environment.GetEnvironmentVariable("SKIT_EMBEDDING_DEPLOYMENTNAME"); 

string apiVersion = "2023-09-01-preview";
static int maxResponseToken = 200; 
int overallMaxTokens = 4096; 
string assetsFolder = Path.Combine(Directory.GetCurrentDirectory(), "..", "..", "assets");

Expected output
```
Installed Packages
    DotNetEnv, 2.5.0
    Newtonsoft.Json, 13.0.1
    System.Text.Json, 7.0.3
```
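Because every later step depends on the values loaded from `application.env`, it can help to check them before sending any request. The following is a minimal sketch under that assumption; the helper name `FindMissingSettings` is hypothetical and not part of the notebook, while the variable names are the ones read in the cell above.

```csharp
using System;
using System.Collections.Generic;

// Returns the names of the environment variables that are missing or empty.
static List<string> FindMissingSettings(IEnumerable<string> names)
{
    var missing = new List<string>();
    foreach (string name in names)
    {
        if (string.IsNullOrWhiteSpace(Environment.GetEnvironmentVariable(name)))
            missing.Add(name);
    }
    return missing;
}

// The settings this notebook reads in Step 1
string[] required =
{
    "SKIT_AOAI_ENDPOINT",
    "SKIT_AOAI_APIKEY",
    "SKIT_WHISPER_DEPLOYMENTNAME",
    "SKIT_EMBEDDING_DEPLOYMENTNAME"
};

List<string> missingSettings = FindMissingSettings(required);
Console.WriteLine(missingSettings.Count == 0
    ? "All required settings are present."
    : $"Missing settings: {string.Join(", ", missingSettings)}");
```

Running a check like this right after `Env.Load` surfaces a wrong path or a misspelled key early, instead of as an HTTP error in a later step.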

### Step 2:  Define a Helper Method

In [None]:
// Simple helper method to print a string to the console, wrapping lines at a given length
static void PrintToConsole(string input, int maxLineLength)
{
    int currentIndex = 0;
    while (currentIndex < input.Length)
    {
        // Determine the length of the substring to print
        int length = Math.Min(maxLineLength, input.Length - currentIndex);
        
        // Find the last whitespace character in the substring
        for (int i = currentIndex + length - 1; i >= currentIndex; i--)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                length = i - currentIndex + 1;
                break;
            }
        }
        
        // Print the substring to the console
        Console.WriteLine(input.Substring(currentIndex, length));
        
        // Update the current index
        currentIndex += length;
    }
}

### Step 3:  Transcribe audio file with the Whisper model

For this sample, we have included a `wikipediaOcelot.wav` file to be used as an example. You can use your own audio file by changing the `filePath` variable.

Additional sample audio files can be found in the [Cognitive Services Speech sample data](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/sampledata).


In [None]:

string filePath = Path.Combine(assetsFolder, "docs", "02_REST_API", "wikipediaOcelot.wav");
// Note: apiBase is expected to end with a trailing slash
string endpoint = $"{apiBase}openai/deployments/{whisperDeploymentName}/audio/transcriptions?api-version={apiVersion}";

using (HttpClient httpClient = new HttpClient())
using (var formData = new MultipartFormDataContent())
using (var fileContent = new StreamContent(File.OpenRead(filePath)))
{
    httpClient.DefaultRequestHeaders.Add("api-key", apiKey);

    // The file part carries the audio; MultipartFormDataContent sets the
    // multipart/form-data header (including the boundary) on the request itself
    fileContent.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
    formData.Add(fileContent, "file", Path.GetFileName(filePath));

    var response = await httpClient.PostAsync(endpoint, formData);

    if (response.IsSuccessStatusCode)
    {
        var responseBody = await response.Content.ReadAsStringAsync();
        PrintToConsole(responseBody, 80);
    }
    else
    {
        // Print the status and the response body, which usually holds the error details
        string errorBody = await response.Content.ReadAsStringAsync();
        Console.WriteLine($"Error: {(int)response.StatusCode} {response.ReasonPhrase}");
        Console.WriteLine(errorBody);
    }
}


Expected output
```
{"text":"The ocelot, Lepardus paradalis, is a small wild cat native to the 
southwestern United States, Mexico, and Central and South America. This 
medium-sized cat is characterized by solid black spots and streaks on its coat, 
round ears, and white neck and undersides. It weighs between 8 and 15.5 
kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters – 16 to 20 inches 
– at the shoulders. It was first described by Carl Linnaeus in 1758. Two 
subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active 
during twilight and at night, the ocelot tends to be solitary and territorial. 
It is efficient at climbing, leaping, and swimming. It preys on small 
terrestrial mammals such as armadillo, opossum, and 
lagomorphs."}
```
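The response is a JSON object with a single `text` property, so the transcript itself can be extracted with `System.Text.Json`. The following is a minimal sketch: the helper name `ExtractTranscript` and the shortened sample body are hypothetical, chosen only to match the shape of the expected output above.

```csharp
using System;
using System.Text.Json;

// Hypothetical, shortened response body in the same shape as the expected output
string sampleBody = "{\"text\":\"The ocelot is a small wild cat.\"}";

// Returns the value of the "text" property, or an empty string if it is absent.
static string ExtractTranscript(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    return doc.RootElement.TryGetProperty("text", out JsonElement text)
        ? text.GetString() ?? string.Empty
        : string.Empty;
}

Console.WriteLine(ExtractTranscript(sampleBody));
// prints: The ocelot is a small wild cat.
```

In the notebook, the same helper could be applied to `responseBody` before calling `PrintToConsole`, so that only the transcript is wrapped and printed.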

### Step 4:  Get Embedding for a word or phrase

In [None]:
string query = "What is the capital of Germany?";
endpoint = $"{apiBase}openai/deployments/{adaDeploymentName}/embeddings?api-version={apiVersion}";

using (HttpClient httpClient = new HttpClient())
{
    httpClient.DefaultRequestHeaders.Add("api-key", apiKey);

    // Serialize the request body so that quotes or backslashes in the query are escaped correctly
    string jsonQuery = JsonSerializer.Serialize(new { input = query });
    HttpContent content = new StringContent(jsonQuery, Encoding.UTF8, "application/json");
    HttpResponseMessage response = await httpClient.PostAsync(endpoint, content);

    if (response.IsSuccessStatusCode)
    {
        string result = await response.Content.ReadAsStringAsync();
        Console.WriteLine(result);
    }
    else
    {
        Console.WriteLine($"Error: {(int)response.StatusCode} {response.ReasonPhrase}");
    }
}
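The cell above prints the raw JSON response. To work with the vector, the numbers have to be read out of the response first. The following is a minimal sketch that assumes the usual response shape, in which the vector is stored under `data[0].embedding`; the helper name `ExtractEmbedding` and the tiny three-value sample are hypothetical, and a real response contains far more values (1536 for `text-embedding-ada-002`).

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical, heavily shortened response body (assumed shape: data[0].embedding)
string sampleResponse =
    "{\"object\":\"list\",\"data\":[{\"object\":\"embedding\",\"index\":0," +
    "\"embedding\":[0.25,-0.5,0.75]}],\"model\":\"ada\"}";

// Reads the first embedding vector from the response JSON.
static double[] ExtractEmbedding(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    return doc.RootElement
        .GetProperty("data")[0]
        .GetProperty("embedding")
        .EnumerateArray()
        .Select(value => value.GetDouble())
        .ToArray(); // materialize before the JsonDocument is disposed
}

double[] vector = ExtractEmbedding(sampleResponse);
Console.WriteLine($"Dimensions: {vector.Length}");
// prints: Dimensions: 3
```

Once extracted, vectors for different inputs can be compared with a similarity measure to find the most closely related texts.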

## Next Steps

See how you can use JSON mode to get JSON-formatted results from an LLM call: [JSON Mode](./03_JsonMode.ipynb)