# Automatic Speech Recognition - OpenAI Whisper Models

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

---

---
Welcome to [Amazon SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html)! You can use Amazon SageMaker JumpStart to solve many machine learning tasks with one click in SageMaker Studio, or through the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#use-prebuilt-models-with-sagemaker-jumpstart).

In this demo notebook, we demonstrate how to use the SageMaker Python SDK for automatic speech recognition (ASR). ASR, also known as speech-to-text, is a capability that enables a program to convert human speech into written text. Here, we show how to use state-of-the-art pre-trained OpenAI Whisper models for ASR. The following OpenAI Whisper models are currently available in SageMaker JumpStart.

| Model Name | Parameters | Multilingual |
|------------|------------|--------------|
| tiny       | 39 M       | ✓            |
| base       | 74 M       | ✓            |
| small      | 244 M      | ✓            |
| medium     | 769 M      | ✓            |
| large      | 1550 M     | ✓            |
| large-v2   | 1550 M     | ✓            |


---

1. [Set Up](#1.-Set-Up)
2. [Select a pre-trained model](#2.-Select-a-pre-trained-model)
3. [Deploy an Endpoint](#3.-Deploy-an-Endpoint)
4. [Query endpoint and parse response](#4.-Query-endpoint-and-parse-response)
5. [Supported Parameters](#5.-Supported-Parameters)
6. [Clean up the endpoint](#6.-Clean-up-the-endpoint)

### 1. Set Up

---
Before running the notebook, complete the initial setup: upgrade the SageMaker Python SDK.

---

In [None]:
%pip install --upgrade sagemaker --quiet

### 2. Select a pre-trained model

In [None]:
model_id = "huggingface-asr-whisper-large-v2"

In [None]:
from IPython.display import Markdown, display
import ipywidgets as widgets
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker.jumpstart.filters import And


# Retrieve all Hugging Face automatic speech recognition models available in JumpStart.
filter_value = And("task == asr", "framework == huggingface")
asr_models = list_jumpstart_models(filter=filter_value)

# Show the models in a dropdown, pre-selecting the model_id chosen above.
dropdown = widgets.Dropdown(
    value=model_id,
    options=asr_models,
    description="SageMaker Pre-Trained Automatic Speech Recognition Models:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(Markdown("## Select a pre-trained model from the dropdown below"))
display(dropdown)

### 3. Deploy an Endpoint

***

Using SageMaker, we can perform inference on the pre-trained model by deploying it to an endpoint.

***

In [None]:
# Deploying the model

from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serializers import JSONSerializer

# The model is deployed on an ml.g5.2xlarge instance. For all parameters supported by the JumpStartModel
# class, see https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.jumpstart.model.JumpStartModel
my_model = JumpStartModel(model_id=dropdown.value)
predictor = my_model.deploy()

### 4. Query endpoint and parse response

---
We download a sample audio file and pass it to the predictor for inference.

---

In [None]:
import json
import boto3
from sagemaker.jumpstart import utils

# The WAV file must be sampled at 16 kHz (required by the automatic speech recognition models), so resample it if necessary.
# The input audio must be shorter than 30 seconds.
s3_bucket = utils.get_jumpstart_content_bucket()
key_prefix = "training-datasets/asr_notebook_data"
input_audio_file_name = "sample1.wav"

s3_client = boto3.client("s3")
s3_client.download_file(s3_bucket, f"{key_prefix}/{input_audio_file_name}", input_audio_file_name)

with open(input_audio_file_name, "rb") as file:
    wav_file_read = file.read()

# If you receive a client error (413), check the size of the payload sent to the endpoint.
# Payloads for SageMaker invoke endpoint requests are limited to about 5 MB.
response = predictor.predict(wav_file_read)
print(response["text"])
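
The constraints mentioned in the comments above (16 kHz sample rate, under 30 seconds, payload of roughly 5 MB at most) can be checked locally before calling the endpoint. The following is a minimal sketch using only the Python standard library. The helper names (`inspect_wav`, `find_problems`, `make_silent_wav`) are hypothetical and not part of the SageMaker SDK, the 5 MB figure is the approximate limit quoted in this notebook, and the `wave` module only reads uncompressed PCM WAV files.

```python
import io
import wave

MAX_SECONDS = 30                      # duration limit stated in this notebook
REQUIRED_RATE_HZ = 16000              # sample rate stated in this notebook
MAX_PAYLOAD_BYTES = 5 * 1024 * 1024   # approximate; the notebook says "about 5MB"


def inspect_wav(wav_bytes):
    """Return (sample_rate_hz, duration_seconds) for PCM WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def find_problems(wav_bytes):
    """List the ways the audio violates the constraints (empty list if none)."""
    problems = []
    rate, duration = inspect_wav(wav_bytes)
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if duration >= MAX_SECONDS:
        problems.append(f"duration is {duration:.1f} s, expected under {MAX_SECONDS} s")
    if len(wav_bytes) > MAX_PAYLOAD_BYTES:
        problems.append(f"file is {len(wav_bytes)} bytes, above roughly 5 MB")
    return problems


def make_silent_wav(rate, seconds):
    """Build an in-memory mono 16-bit WAV of silence, used only for this demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buffer.getvalue()


print(find_problems(make_silent_wav(16000, 2)))  # a clip that meets the constraints
print(find_problems(make_silent_wav(44100, 2)))  # a clip with the wrong sample rate
```

In the notebook, the same check could be applied to the downloaded file with `find_problems(wav_file_read)` before `predictor.predict` is called.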

### 5. Supported Parameters

***
This model supports many parameters while performing inference. They include:

* **max_length:** The model generates text until the output reaches this length. If specified, it must be a positive integer.
* **language and task:** The language of the audio and the task to perform. The model supports the tasks of transcription and translation.
* **max_new_tokens:** The maximum number of tokens to generate.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** The model ensures that no sequence of `no_repeat_ngram_size` words is repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output sequence, while a lower temperature favors high-probability words. As `temperature` approaches 0, decoding becomes greedy. If specified, it must be a positive float.
* **early_stopping:** If True, text generation finishes when all beam hypotheses reach the end-of-sentence token. If specified, it must be a boolean.
* **do_sample:** If True, the next word is sampled according to its likelihood. If specified, it must be a boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
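
To make the `top_k` and `top_p` descriptions in the list above concrete, here is a toy sketch that filters a hand-made next-word distribution. The function names and the probabilities are invented for illustration. The deployed model applies this kind of filtering internally over its full vocabulary, and its actual implementation may differ in details.

```python
def top_k_filter(probs, k):
    """Keep the k most likely words and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely words whose cumulative
    probability reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, p in ranked:
        kept.append((word, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}


# Toy next-word distribution (made-up numbers, not model output).
next_word_probs = {"cat": 0.5, "dog": 0.25, "car": 0.125, "cap": 0.125}

print(top_k_filter(next_word_probs, k=2))        # keeps "cat" and "dog"
print(top_p_filter(next_word_probs, top_p=0.7))  # also keeps "cat" and "dog"
```

With `do_sample` enabled, the next word would then be drawn from the filtered distribution rather than from the full one.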

We may specify any subset of the parameters above when invoking an endpoint. Next, we show an example of how to invoke the endpoint with these arguments.

***

In [None]:
# The file must be sampled at 16 kHz (required by the automatic speech recognition models), so resample it if necessary.
# The input audio must also be shorter than 30 seconds.
input_audio_file_name = "sample_french1.wav"

s3_client.download_file(s3_bucket, f"{key_prefix}/{input_audio_file_name}", input_audio_file_name)

with open(input_audio_file_name, "rb") as file:
    wav_file_read = file.read()

payload = {"audio_input": wav_file_read.hex(), "language": "french", "task": "translate"}

predictor.serializer = JSONSerializer()
predictor.content_type = "application/json"

# If you receive a client error (413), check the size of the payload sent to the endpoint.
# Payloads for SageMaker invoke endpoint requests are limited to about 5 MB.
response = predictor.predict(payload)
# The output is the English translation of the French audio file.
print(response["text"])
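
Because the JSON payload carries the audio as a hex string, every audio byte becomes two characters, so the request body is roughly twice the size of the file. The sketch below builds such a payload, applies a few client-side checks that mirror the constraints listed in this section, and reports the serialized size. `build_payload` and `payload_size_bytes` are hypothetical helpers, not SDK functions. The `audio_input` key comes from the cell above, the 5 MB figure is the approximate limit quoted in this notebook, and the endpoint may validate parameters differently.

```python
import json

# Approximate request limit quoted in this notebook ("about 5MB").
MAX_PAYLOAD_BYTES = 5 * 1024 * 1024


def build_payload(audio_bytes, **params):
    """Hex-encode the audio and attach inference parameters after basic checks."""
    for name in ("max_length", "max_new_tokens", "num_return_sequences", "top_k"):
        value = params.get(name)
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if value is not None and (not is_int or value <= 0):
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    num_beams = params.get("num_beams")
    num_return_sequences = params.get("num_return_sequences")
    if num_beams is not None and num_return_sequences is not None:
        if num_beams < num_return_sequences:
            raise ValueError("num_beams must be >= num_return_sequences")

    top_p = params.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError(f"top_p must be between 0 and 1, got {top_p!r}")

    temperature = params.get("temperature")
    if temperature is not None and temperature <= 0:
        raise ValueError(f"temperature must be positive, got {temperature!r}")

    return {"audio_input": audio_bytes.hex(), **params}


def payload_size_bytes(payload):
    """Size in bytes of the JSON-serialized request body."""
    return len(json.dumps(payload).encode("utf-8"))


# Stand-in for real audio: 1,000 zero bytes, used only to show the size effect.
fake_audio = bytes(1000)
payload = build_payload(
    fake_audio, language="french", task="translate", num_beams=4, num_return_sequences=2
)

print(len(fake_audio), len(payload["audio_input"]))       # raw size vs. hex size
print(payload_size_bytes(payload) <= MAX_PAYLOAD_BYTES)   # within the approximate limit?
```

Under these assumptions, an audio file larger than about 2.5 MB would exceed the approximate limit once hex-encoded.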

### 6. Clean up the endpoint

In [None]:
# Delete the SageMaker model and endpoint to avoid incurring further charges
predictor.delete_model()
predictor.delete_endpoint()

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|automatic-speech-recognition.ipynb)