# Deploy KanjuTech Transcription and Speaker Diarization Model Package from AWS Marketplace 

KanjuTech's Transcription and Diarization model ensures secure end-to-end recognition of multi-participant conversations. The model efficiently handles over 12 hours of recording in just one hour on ml.p3.2xlarge. The Stable version of the model supports 10 languages with human-level accuracy (WER 3-8%). The Confusion Error Rate (CER) for audio with 6+ speakers is 2.2%.

The Release Candidate version supports an additional 19 languages with lower and unstable quality.

This sample notebook shows you how to deploy the [**KanjuTech Transcription and Diarization Model**](https://aws.amazon.com/marketplace/pp/prodview-ngtdx4ayt4emo) using Amazon SageMaker.

> **Note**: This reference notebook cannot run unless you make the suggested changes in the notebook.

## Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open it from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions, and you have the authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. Or your AWS account already has a [**KanjuTech Transcription and Diarization Model**](https://aws.amazon.com/marketplace/pp/prodview-ngtdx4ayt4emo) subscription. If so, skip the [Subscribe to the model package](#1.-Subscribe-to-the-model-package) step.

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
    1. [Create an endpoint](#A.-Create-an-endpoint)
    2. [Create input payload](#B.-Create-input-payload)
    3. [Perform real-time inference](#C.-Perform-real-time-inference)
    4. [Delete endpoint and model](#D.-Delete-endpoint-and-model)
3. [Perform batch inference](#3.-Perform-batch-inference) 
    1. [Create encoded input data](#A.-Create-encoded-input-data)
    2. [Run batch transform job](#B.-Run-batch-transform-job)
    3. [Delete encoded input data](#C.-Delete-encoded-input-data)
    4. [Delete the model](#D.-Delete-the-model)
4. [Visualize output](#4.-Visualize-output)
5. [Release Candidate version (optional)](#5.-Release-Candidate-version-(optional))
6. [Troubleshooting](#6.-Troubleshooting)
7. [Questions](#7.-Questions)
    
We recommend using an ml.p3.2xlarge instance for both real-time and batch inference.

The maximum audio file size is 15 MB for real-time inference and 75 MB per file for a batch transform job. For real-time inference, the duration of a single audio file is limited to about:
- 7 mins on ml.g4dn.xlarge,
- 11 mins on ml.p3.2xlarge.

There is no duration limit for batch inference.
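
The size limits above can be checked before any data is sent. The following is a minimal sketch, not part of the product's API: the helper name `fits_size_limit` is hypothetical, and treating "MB" as 1024 * 1024 bytes is an assumption.

```python
# Size limits quoted in this notebook. Interpreting "MB" as 1024 * 1024 bytes
# is an assumption; the listing does not say which definition applies.
MB = 1024 * 1024
MAX_BYTES = {"real-time": 15 * MB, "batch": 75 * MB}


def fits_size_limit(size_bytes: int, mode: str) -> bool:
    """Return True if a raw audio file of size_bytes is within the limit for mode."""
    if mode not in MAX_BYTES:
        raise ValueError(f"mode must be one of {sorted(MAX_BYTES)}, got {mode!r}")
    return size_bytes <= MAX_BYTES[mode]


print(fits_size_limit(10 * MB, "real-time"))  # True
print(fits_size_limit(20 * MB, "real-time"))  # False
print(fits_size_limit(20 * MB, "batch"))      # True
```

With `s3fs`, the size of an object can be read with `fs.size(path)` and passed to this helper.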

## 1. Subscribe to the model package

To subscribe to the model package:
1. Open the model package listing page [**KanjuTech Transcription and Diarization Model**](https://aws.amazon.com/marketplace/pp/prodview-ngtdx4ayt4emo).
1. On the AWS Marketplace listing, click on the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **"Accept Offer"** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click on the **Continue to configuration** button and choose a **region**, you will see a **Product Arn** displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

In [None]:
model_package_arn = "<Specify the Model package ARN that corresponds to your AWS region>"
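
Because the ARN is region-specific, it may be worth confirming that the region embedded in it matches the region you work in. This is a small sketch that relies only on the general ARN layout (`arn:partition:service:region:account-id:resource`); the helper name `arn_region` and the example ARN are made up for illustration.

```python
def arn_region(arn: str) -> str:
    """Extract the region field from an ARN string."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


# Hypothetical ARN, used only to show the expected shape.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package"
print(arn_region(example_arn))  # us-east-1
```

Once a SageMaker session exists, the result can be compared with its `boto_region_name` attribute. An unedited placeholder value raises `ValueError`, which also catches a forgotten ARN early.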

In [None]:
import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role
import boto3
import s3fs
import base64
import os
import json

In [None]:
role = get_execution_role()

sagemaker_session = sage.Session()

bucket = 's3://<Name-of-your-existing-S3-bucket>' # Write the name of your S3 bucket where you store your input files and want to save the output
runtime = boto3.client("sagemaker-runtime") # SageMaker Runtime client used to invoke the endpoint

real_time_content_type = "application/json"
batch_transform_content_type = "application/json"
accept = "application/json"

## 2. Create an endpoint and perform real-time inference

See [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) if you want to understand how real-time inference with Amazon SageMaker works.

### A. Create an endpoint

In [None]:
model_name = "kanjutech-transcription-speaker-diarization" # Write the endpoint name

In [None]:
# Specify instance type
real_time_inference_instance_type = "ml.p3.2xlarge"

>  **Note**: We recommend using an ml.p3.2xlarge instance for real-time inference.

In [None]:
# Create a deployable model from the model package.
model = ModelPackage(role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session)

# Deploy the model
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

# Wait until it prints "!" after "----------"

Once the endpoint has been created, you can perform real-time inference.

If you get an error here, please see the [Troubleshooting](#6.-Troubleshooting) section.

**WARNING!** 

**Remember to** [**Delete your endpoint and resources**](#D.-Delete-endpoint-and-model) whenever you finish your work with real-time inference to stop incurring charges!

For more information, please visit this [page](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-delete-resources.html).

### B. Create input payload

Read an audio file from the S3 bucket and encode it in base64. The maximum audio file size for real-time inference is 15 MB per file.

The duration limits for one audio file for real-time inference are about:
- 7 mins on ml.g4dn.xlarge,
- 11 mins on ml.p3.2xlarge.

In [None]:
# Specify S3 folders
endpoint_input = bucket+'/'+'endpoint-audio' # Your folder on the S3 bucket where you store input audio
endpoint_output = bucket+'/'+'endpoint-transcript' # Folder for results of real-time transcription

In [None]:
fs = s3fs.S3FileSystem()
fs_ls = fs.ls(endpoint_input)
paths = list(filter(lambda k: '.' in k, fs_ls))

# For this example, we encode only one file from paths
input_file_path = paths[0]

with fs.open(input_file_path, "rb") as f:
    edata = base64.b64encode(f.read())

Create JSON input with encoded data and transcription parameters.

**language** (*str*) - Use "auto" to automatically detect the language or specify the language code:
- English: "en"
- Spanish: "es"
- French: "fr"
- Portuguese: "pt"
- Russian: "ru"
- Indonesian: "id"
- German: "de"
- Japanese: "ja"
- Turkish: "tr"
- Italian: "it"

**num_speakers** (*int* or *str*) - Use "auto" to automatically identify the number of speakers or specify the exact number.
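
The two parameters above can be checked locally before the request is built. The following is a sketch only: the field names (`file`, `language`, `num_speakers`, `f_name`) come from this notebook, while the helper `build_payload` and its client-side validation are assumptions and may not match how the endpoint itself handles bad input.

```python
import base64
import json
import os

# Language codes listed above for the Stable version.
STABLE_LANGUAGES = {"en", "es", "fr", "pt", "ru", "id", "de", "ja", "tr", "it"}


def build_payload(audio_bytes, f_name, language="auto", num_speakers="auto"):
    """Return the JSON request body as a string, after basic local checks."""
    if language != "auto" and language not in STABLE_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if num_speakers != "auto" and not (isinstance(num_speakers, int) and num_speakers >= 1):
        raise ValueError("num_speakers must be 'auto' or a positive integer")
    return json.dumps({
        "file": base64.b64encode(audio_bytes).decode("utf-8"),
        "language": language,
        "num_speakers": num_speakers,
        "f_name": os.path.basename(f_name),
    })


# Two dummy bytes stand in for real audio content.
payload = json.loads(build_payload(b"\x00\x01", "folder/call.wav", "en", 2))
print(payload["f_name"])  # call.wav
print(payload["file"])    # AAE=
```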

In [None]:
# Specify audio parameters
language = "en"
num_speakers = 2

In [None]:
json_file = {}
json_file['file'] = edata.decode('utf-8')
json_file['language'] = language
json_file['num_speakers'] = num_speakers
json_file['f_name'] = os.path.basename(input_file_path)
data = json.dumps(json_file)

### C. Perform real-time inference

Invoke the endpoint for real-time inference.

In [None]:
results = runtime.invoke_endpoint(
    EndpointName=model_name,
    Body=data, 
    ContentType=real_time_content_type,  
    Accept=accept,  
)

# Save the transcript to S3
with fs.open(endpoint_output+'/'+json_file['f_name'].split('.')[0]+'_'+language+'_'+str(num_speakers)+'.json', "wb") as f:
    f.write((results['Body']).read())
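
The transcript name above keeps only the text before the first dot of the input file name, so a name such as `meeting.2024-01-05.wav` would be shortened to `meeting`. A possible alternative, sketched here with a hypothetical helper `transcript_name`, removes only the final extension by using `os.path.splitext`.

```python
import os


def transcript_name(f_name, language, num_speakers):
    """Build '<stem>_<language>_<num_speakers>.json', dropping only the last extension."""
    stem, _ext = os.path.splitext(os.path.basename(f_name))
    return f"{stem}_{language}_{num_speakers}.json"


print(transcript_name("call.wav", "en", 2))
# call_en_2.json
print(transcript_name("meeting.2024-01-05.wav", "auto", "auto"))
# meeting.2024-01-05_auto_auto.json
```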

### D. Delete endpoint and model

Now that you have successfully performed a real-time inference, you no longer need the endpoint. You can terminate the endpoint to avoid being charged.

In [None]:
model.sagemaker_session.delete_endpoint(model_name)
model.sagemaker_session.delete_endpoint_config(model_name)
model.delete_model()

**WARNING!** 

**Remember to** [**Delete your endpoint and resources**](#D.-Delete-endpoint-and-model) whenever you finish your work with real-time inference to stop incurring charges!

For more information, please visit this [page](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-delete-resources.html).

## 3. Perform batch inference

See [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) if you want to understand how batch inference with Amazon SageMaker works.

### A. Create encoded input data

Upload the input files to S3 for the batch transform job. The maximum audio file size for the batch transform job is 75 MB per file.
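
Base64 encoding turns every 3 bytes into 4 characters, so the encoded request is about one third larger than the audio file. The short calculation below illustrates this; the helper `base64_len` is hypothetical, and "MB" is assumed to mean 1024 * 1024 bytes.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check against the standard library.
assert base64_len(10) == len(base64.b64encode(b"x" * 10))

MB = 1024 * 1024
print(base64_len(75 * MB) / MB)  # 100.0
print(base64_len(15 * MB) / MB)  # 20.0
```

A 75 MB file therefore grows to roughly 100 MB once encoded, which appears consistent with the `max_payload=100` value used for the transform job in this notebook. The other JSON fields add a small amount on top, so files very close to the limit may still be rejected.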

In [None]:
# Specify S3 folders
batch_input = bucket+'/'+'batch-audio' # Your folder on the S3 bucket where you store input audio
batch_encode = bucket+'/'+'batch-audio-encode' # Folder for encoded audio files
batch_output = bucket+'/'+'batch-transcript' # Folder for results of batch transcription

Create JSON files with encoded data and transcription parameters.

**language** (*str*) - Use "auto" to automatically detect the language or specify the language code:
- English: "en"
- Spanish: "es"
- French: "fr"
- Portuguese: "pt"
- Russian: "ru"
- Indonesian: "id"
- German: "de"
- Japanese: "ja"
- Turkish: "tr"
- Italian: "it"

**num_speakers** (*int* or *str*) - Use "auto" to automatically identify the number of speakers or specify the exact number.

In [None]:
# Specify batch parameters
language = "auto"
num_speakers = "auto"

In [None]:
fs = s3fs.S3FileSystem()
fs_ls = fs.ls(batch_input)
paths = list(filter(lambda k: '.' in k, fs_ls))
for input_file_path in paths:
    with fs.open('s3://' + input_file_path, "rb") as f:
        edata = base64.b64encode(f.read())
    json_file = {}
    json_file['file'] = edata.decode('utf-8')
    json_file['language'] = language
    json_file['num_speakers'] = num_speakers
    json_file['f_name'] = os.path.basename(input_file_path)
    data = json.dumps(json_file)
    with fs.open(batch_encode+'/'+json_file['f_name'].split('.')[0]+'_'+language+'_'+str(num_speakers)+'.json', "w") as f:
        f.write(data)

### B. Run batch transform job

In [None]:
# Specify instance type
batch_transform_inference_instance_type = "ml.p3.2xlarge"

>  **Note**: We recommend using an ml.p3.2xlarge instance for batch inference.

In [None]:
# Create a deployable model from the model package.
model = ModelPackage(role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session)

In [None]:
transformer = model.transformer(1, 
                                batch_transform_inference_instance_type, 
                                output_path=batch_output, 
                                accept=accept, 
                                max_payload=100)
transformer.transform(batch_encode, content_type=batch_transform_content_type)
transformer.wait()

# Wait until it prints text after ".........." and finishes the batch inference after that

If you get an error here, please see the [Troubleshooting](#6.-Troubleshooting) section.
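
SageMaker batch transform writes one result per input object and names it after the input file with `.out` appended. The sketch below derives the expected result location for an encoded input file; the helper `batch_output_path` and the bucket name are hypothetical, and a flat input folder is assumed.

```python
def batch_output_path(output_prefix: str, input_path: str) -> str:
    """Return the expected result path for one batch transform input file."""
    name = input_path.rstrip("/").rsplit("/", 1)[-1]
    return f"{output_prefix.rstrip('/')}/{name}.out"


print(batch_output_path(
    "s3://my-bucket/batch-transcript",
    "s3://my-bucket/batch-audio-encode/call_auto_auto.json",
))
# s3://my-bucket/batch-transcript/call_auto_auto.json.out
```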

### C. Delete encoded input data

Now that you have successfully performed a batch inference, you no longer need the encoded input files. Clean up the folder to prevent processing finished data again.

In [None]:
fs_ls = fs.ls(batch_encode)
paths = list(filter(lambda k: '.' in k, fs_ls))
for file in paths:
    fs.rm(file)

### D. Delete the model

In [None]:
model.delete_model()

## 4. Visualize output

To explore examples of visualization and converting transcription, see this [file](https://github.com/KanjuTech/aws-marketplace/blob/main/results_converter.ipynb).

## 5. Release Candidate version (optional)

The Release Candidate version supports additional languages with lower and unstable quality. If you want to access these languages, please specify the model version in your request, as in the example.

Additional languages and codes:
- Chinese: "zh"
- Vietnamese: "vi"
- Tagalog: "tl"
- Korean: "ko"
- Thai: "th"
- Polish: "pl"
- Ukrainian: "uk"
- Dutch: "nl"
- Romanian: "ro"
- Hungarian: "hu"
- Greek: "el"
- Swedish: "sv"
- Czech: "cs"
- Bulgarian: "bg"
- Slovak: "sk"
- Croatian: "hr"
- Danish: "da"
- Finnish: "fi"
- Norwegian: "no"

In [None]:
# Specify parameters
language = "pl"
num_speakers = "auto"
model_version = "rc" # "stable" or "rc"

In [None]:
json_file = {}
json_file['file'] = edata.decode('utf-8')
json_file['language'] = language
json_file['num_speakers'] = num_speakers
json_file['f_name'] = os.path.basename(input_file_path)
json_file['version'] = model_version # Optional. If not specified, input will be processed using the Stable version.
data = json.dumps(json_file)
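
If requests mix Stable and Release Candidate languages, the `version` field could be chosen from the language code. This is a sketch under assumptions: the helper `pick_version` is hypothetical, the two code sets are copied from the lists in this notebook, and mapping `"auto"` to the Stable version is a guess rather than documented behavior.

```python
STABLE_LANGUAGES = {"en", "es", "fr", "pt", "ru", "id", "de", "ja", "tr", "it"}
RC_LANGUAGES = {
    "zh", "vi", "tl", "ko", "th", "pl", "uk", "nl", "ro", "hu",
    "el", "sv", "cs", "bg", "sk", "hr", "da", "fi", "no",
}


def pick_version(language: str) -> str:
    """Return 'stable' or 'rc' depending on which list contains the language code."""
    if language == "auto" or language in STABLE_LANGUAGES:
        return "stable"  # assumption: automatic detection uses the Stable version
    if language in RC_LANGUAGES:
        return "rc"
    raise ValueError(f"unknown language code: {language!r}")


print(pick_version("en"))  # stable
print(pick_version("pl"))  # rc
```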

## 6. Troubleshooting

### Cannot create already existing endpoint configuration

This error occurs when the user interrupts the inference deployment and tries to rerun it. To restart the deployment, first delete the previously created configurations. You can find this command in the [Delete endpoint and model](#D.-Delete-endpoint-and-model) cell.

Please wait for the deployment to complete. This process may take several minutes.
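
After an interrupted deployment, the endpoint or its configuration may exist only partially, so one of the delete calls can fail. The sketch below attempts each deletion independently. The helper `delete_leftovers` is hypothetical; it expects a Boto3 SageMaker client and assumes, as in this notebook, that the endpoint configuration shares the endpoint's name.

```python
def delete_leftovers(sm_client, name):
    """Try to delete the endpoint and its configuration; return the steps that succeeded."""
    steps = [
        ("endpoint", sm_client.delete_endpoint, {"EndpointName": name}),
        ("endpoint config", sm_client.delete_endpoint_config, {"EndpointConfigName": name}),
    ]
    done = []
    for label, call, kwargs in steps:
        try:
            call(**kwargs)
            done.append(label)
        except Exception as exc:  # for example, the resource was never created
            print(f"skipped {label}: {exc}")
    return done


# Usage (requires AWS credentials):
#   import boto3
#   delete_leftovers(boto3.client("sagemaker"), model_name)
```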

### ResourceLimitExceeded

If you receive an error due to the lack of a quota for your instance type, you can increase it by sending a request:
1. Open the **Amazon SageMaker** [**Service Quotas**](https://console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas) page.
2. Filter **Service quotas** by "ml.p3.2xlarge for endpoint usage" for real-time inference or by "ml.p3.2xlarge for transform job usage" for batch inference.
3. Select the quota and click the **Request increase at account-level** button.
4. Enter the total amount you want the quota to be and click the **Request** button.
5. Wait until AWS Support increases your quotas for this instance type.

> **Note**: To speed up the processing of your request, please indicate in your correspondence with AWS Support that this type of instance is required for this product.

For more information about requesting a quota increase, visit this [page](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html).

## 7. Questions

If you have any questions about our product, feel free to email us at aws@kanju.tech or schedule a [meeting](https://calendly.com/kanjutech).