This tutorial can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/blob/main/wallaroo-model-cookbooks/hf-whisper).

## ER Whisper Demo

The following tutorial demonstrates deploying the [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) model on a `Wallaroo` pipeline and performing inferences on it using the [BYOP](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-arbitrary-python/) feature.


## Requirements

The following Python libraries are required to run this tutorial:

* [librosa](https://pypi.org/project/librosa/)
* [datasets](https://pypi.org/project/datasets/)

These can be installed with the following command:

```bash
pip install librosa datasets --user
```
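Before continuing, it can help to confirm that both libraries are importable from the notebook kernel. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the tutorial.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Libraries required by this tutorial.
missing = missing_packages(["librosa", "datasets"])
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are available.")
```

If anything is reported as missing, rerun the install command above and restart the kernel.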

## Tutorial Steps

### Import Libraries

The first step is to import the libraries we'll be using.  These are included by default in the Wallaroo instance's JupyterHub service or are installed with the Wallaroo SDK.

* References
  * [Wallaroo SDK Guides](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/)

In [None]:
import json
import os

import wallaroo
from wallaroo.pipeline   import Pipeline
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework

import pyarrow as pa
import numpy as np
import pandas as pd

# the librosa library: https://pypi.org/project/librosa/

import librosa
from datasets import load_dataset

# ignoring warnings for demonstration
import warnings
warnings.filterwarnings('ignore')

In [2]:
wallaroo.__version__

'2023.4.0+5d935fefc'

## Open a Connection to Wallaroo

The next step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  If logging in externally, update the `wallarooPrefix` and `wallarooSuffix` variables with the proper DNS information.  For more information on Wallaroo DNS settings, see the [Wallaroo DNS Integration Guide](https://docs.wallaroo.ai/wallaroo-operations-guide/wallaroo-configuration/wallaroo-dns-guide/).

For this tutorial, the `request_timeout` option is increased to allow the model conversion and pipeline deployment to proceed without any warning messages.

* References
  * [Wallaroo SDK Essentials Guide: Client Connection](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/)

In [3]:
wl = wallaroo.Client(request_timeout=60000)
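For external connections, the `wallarooPrefix` and `wallarooSuffix` values mentioned above are typically combined into endpoint URLs. The sketch below only builds those strings. The URL pattern, the `api_endpoint`, `auth_endpoint`, and `auth_type` arguments shown in the trailing comment, and the helper name `build_endpoints` are all assumptions; verify them against the Client Connection guide for your SDK version.

```python
def build_endpoints(prefix: str, suffix: str) -> dict:
    """Build API and auth URLs from a DNS prefix and suffix (assumed URL pattern)."""
    return {
        "api_endpoint": f"https://{prefix}api.{suffix}",
        "auth_endpoint": f"https://{prefix}keycloak.{suffix}",
    }

# Hypothetical DNS values; replace them with those of your own instance.
wallarooPrefix = "doc-test."
wallarooSuffix = "example.com"

endpoints = build_endpoints(wallarooPrefix, wallarooSuffix)
print(endpoints)

# Assumed usage for an external connection (not run here):
# wl = wallaroo.Client(api_endpoint=endpoints["api_endpoint"],
#                      auth_endpoint=endpoints["auth_endpoint"],
#                      auth_type="sso",
#                      request_timeout=60000)
```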

### Set Variables and Helper Functions

We'll set the name of our workspace, pipeline, models and files.  Workspace names must be unique across the Wallaroo instance.  To prevent collisions with other users' workspaces, a randomly generated 4-character suffix can be appended to the workspace name.  If running this tutorial repeatedly, we recommend hard coding the workspace name so it uses the same workspace each time it's run.

We'll set up some helper functions that will either use existing workspaces and pipelines, or create them if they do not already exist.

In [4]:
def get_workspace(name, client):
    # return the workspace with the given name, creating it if it does not exist
    workspace = None
    for ws in client.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = client.create_workspace(name)
    return workspace
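The get-or-create pattern used by this helper can be tried without a Wallaroo connection. The sketch below uses stand-in classes (`FakeWorkspace` and `FakeClient`, both hypothetical) that mimic only the `list_workspaces`, `create_workspace`, and `name()` calls used above. It shows that calling the helper twice does not create a duplicate workspace.

```python
class FakeWorkspace:
    """Stand-in for a workspace object; exposes name() like the code above."""
    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class FakeClient:
    """Stand-in for the Wallaroo client, for illustration only."""
    def __init__(self):
        self._workspaces = []

    def list_workspaces(self):
        return list(self._workspaces)

    def create_workspace(self, name):
        ws = FakeWorkspace(name)
        self._workspaces.append(ws)
        return ws


def get_or_create_workspace(name, client):
    """Return the first workspace called `name`, creating it if none exists."""
    for ws in client.list_workspaces():
        if ws.name() == name:
            return ws
    return client.create_workspace(name)


client = FakeClient()
first = get_or_create_workspace("whisper-tiny-demo", client)
second = get_or_create_workspace("whisper-tiny-demo", client)
print(first is second)                # True: the second call reuses the workspace
print(len(client.list_workspaces()))  # 1
```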

The names for our workspace, pipeline, model, and model files are set here to make updating this tutorial easier.  Workspace names must be unique across the Wallaroo instance.  To keep names unique, the randomization code below generates a random suffix for the workspace name.  If this is not required, set `suffix` to `''`.

In [5]:
import string
import random

# make a random 4 character suffix to prevent overwriting other users' workspaces
suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))

# override the random suffix with an empty one; remove this line to keep the random suffix
suffix = ''

workspace_name = f'whisper-tiny-demo{suffix}'
pipeline_name = 'whisper-hf-byop'
model_name = 'whisper-byop'
model_file_name = './models/model-auto-conversion_hugging-face_complex-pipelines_asr-whisper-tiny.zip'
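The choice between a fixed and a randomized workspace name can also be wrapped in a small function. This is a minimal sketch; `make_workspace_name` and its parameters are hypothetical and not part of the Wallaroo SDK.

```python
import random
import string

def make_workspace_name(base, randomize=False, length=4):
    """Return `base`, optionally followed by `length` random lowercase letters."""
    if not randomize:
        return base
    suffix = ''.join(random.choice(string.ascii_lowercase) for _ in range(length))
    return f"{base}{suffix}"

print(make_workspace_name("whisper-tiny-demo"))                  # fixed name
print(make_workspace_name("whisper-tiny-demo", randomize=True))  # name with random suffix
```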


### Create Workspace and Pipeline

We will now create the Wallaroo workspace to store our model and set it as the current workspace.  Future commands will default to this workspace for pipeline creation, model uploads, etc.  We'll create our Wallaroo pipeline that is used to deploy our arbitrary Python model.

* References
  * [Wallaroo SDK Essentials Guide: Workspace Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-workspace/)

In [6]:
workspace = get_workspace(workspace_name, wl)
wl.set_current_workspace(workspace)

pipeline = wl.build_pipeline(pipeline_name)

display(wl.get_current_workspace())

{'name': 'whisper-tiny-demo', 'id': 5, 'archived': False, 'created_by': '3a089938-572e-4a76-a78f-de64fd2317a7', 'created_at': '2023-12-18T21:20:03.326534+00:00', 'models': [{'name': 'whisper-byop', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 12, 18, 21, 20, 15, 49238, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 12, 18, 21, 20, 15, 49238, tzinfo=tzutc())}], 'pipelines': [{'name': 'whisper-hf-byop', 'create_time': datetime.datetime(2023, 12, 18, 21, 20, 3, 353219, tzinfo=tzutc()), 'definition': '[]'}]}

## Configure & Upload Model

For this example, we will use the `openai/whisper-tiny` model for the `automatic-speech-recognition` pipeline task from the official `🤗 Hugging Face` [hub](https://huggingface.co/openai/whisper-tiny/tree/main).

To manually create an `automatic-speech-recognition` pipeline from the `🤗 Hugging Face` hub link above:

1. Download the original model from the official `🤗 Hugging Face` [hub](https://huggingface.co/openai/whisper-tiny/tree/main).

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
pipe.save_pretrained("asr-whisper-tiny/")
```

As a last step, you can `zip` the folder containing all needed files as follows:

```bash
zip -r asr-whisper-tiny.zip asr-whisper-tiny/
```
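If the `zip` command is not available, the same archive can be produced from Python. The following is a sketch using the standard library's `shutil.make_archive`; the helper name `zip_model_folder` is hypothetical. It keeps the folder name as the top-level entry in the archive, matching the `zip -r` command above.

```python
import shutil
from pathlib import Path

def zip_model_folder(folder):
    """Zip `folder` into `<folder>.zip` beside it, keeping the folder as the top-level entry."""
    folder = Path(folder).resolve()
    archive = shutil.make_archive(
        base_name=str(folder),        # output path without the .zip extension
        format="zip",
        root_dir=str(folder.parent),  # archive paths are relative to this directory
        base_dir=folder.name,         # only this folder is added
    )
    return Path(archive)

# Only runs if the saved pipeline folder from the previous step exists.
if Path("asr-whisper-tiny").is_dir():
    print(zip_model_folder("asr-whisper-tiny"))
```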

### Configure PyArrow Schema

You can find more info on the available inputs for the `automatic-speech-recognition` pipeline under the [official source code](https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/automatic_speech_recognition.py#L294) from `🤗 Hugging Face`.

The input and output schemas are defined in Apache Arrow schema format using the `pyarrow` library.

The model is then uploaded with the `wallaroo.client.upload_model` method, where we define:

* The name to assign the model.
* The model file path.
* The model framework.
* The input and output schemas.

The model is uploaded to the Wallaroo instance, where it is containerized to run with the Wallaroo Inference Engine.

* References
  * [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Arbitrary Python](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-arbitrary-python/)
  * [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Hugging Face](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-hugging-face/)

In [7]:
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float32())), # required: the audio stored in numpy arrays of shape (num_samples,) and data type `float32`
    pa.field('return_timestamps', pa.string()) # optional: return start & end times for each predicted chunk
]) 

output_schema = pa.schema([
    pa.field('text', pa.string()), # required: the output text corresponding to the audio input
    pa.field('chunks', pa.list_(pa.struct([('text', pa.string()), ('timestamp', pa.list_(pa.float32()))]))), # required (if `return_timestamps` is set), start & end times for each predicted chunk
])

In [9]:
model = wl.upload_model(model_name, 
                        model_file_name, 
                        framework=Framework.HUGGING_FACE_AUTOMATIC_SPEECH_RECOGNITION, 
                        input_schema=input_schema, 
                        output_schema=output_schema)
model

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime............................................successful

Ready


| Field | Value |
|---|---|
| Name | whisper-byop |
| Version | 42b1709e-1c0b-41e1-9125-ae0b73186d96 |
| File Name | model-auto-conversion_hugging-face_complex-pipelines_asr-whisper-tiny.zip |
| SHA | ddd57c9c8d3ed5417783ebb7101421aa1e79429365d20326155c9c02ae1e8a13 |
| Status | ready |
| Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mlflow-deploy:v2023.4.0-4297 |
| Architecture | |
| Updated At | 2023-18-Dec 21:42:37 |


### Deploy Pipeline

The model is deployed with the `wallaroo.pipeline.deploy(deployment_config)` command.  For the deployment configuration, we allocate 8 GB of memory and 4 CPUs to the containerized model runtime (the `sidekick`) to accommodate the size of the model.  To improve performance, a GPU could be assigned to the containerized model.

* References
  * [Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-deployment-config/)
  * [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)
  * [GPU Support](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-deployment-config/#gpu-support)

In [11]:
deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_memory(model, '8Gi') \
    .sidekick_cpus(model, 4.0) \
    .build()

In [12]:
pipeline = wl.build_pipeline(pipeline_name)
pipeline.add_model_step(model)

pipeline.deploy(deployment_config=deployment_config)

Waiting for deployment - this will take up to 60000s ....................... ok


| Field | Value |
|---|---|
| name | whisper-hf-byop |
| created | 2023-12-18 21:20:03.353219+00:00 |
| last_updated | 2023-12-18 21:42:37.326830+00:00 |
| deployed | True |
| arch | |
| tags | |
| versions | 9075f7f9-0df7-4ea5-84e9-7ac3ff9ea084, e1bb243c-d30f-4b55-9985-58c2c22cc0e3, 95617b4f-2895-46b1-a429-be52153a92c1, ef56a3a3-8612-4d5b-baf6-6f960e250c98, f2cd64fc-93cc-4e96-bf20-7f37ccfd3e5a, 9a57b52f-0b99-42d9-bd67-8d2db2a21a49, 9cd7a58a-61b2-461e-a008-545833a3091f, edddded5-9856-4aaf-b8fd-bb965b08577f |
| steps | whisper-byop |
| published | False |


After a couple of minutes we verify the pipeline deployment was successful.

In [13]:
pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.3.222',
   'name': 'engine-9d89d5fcd-x7vw4',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'whisper-hf-byop',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'whisper-byop',
      'version': '42b1709e-1c0b-41e1-9125-ae0b73186d96',
      'sha': 'ddd57c9c8d3ed5417783ebb7101421aa1e79429365d20326155c9c02ae1e8a13',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.244.4.222',
   'name': 'engine-lb-584f54c899-zlhlg',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.244.3.223',
   'name': 'engine-sidekick-whisper-byop-2-6bd955bbf4-sf58q',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}
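Rather than checking the status by hand, the check can be repeated until the top-level `status` field reads `Running`. The following is a minimal polling sketch. `wait_until_running` is a hypothetical helper, and it assumes only that the callable passed in returns a dictionary shaped like the output above.

```python
import time

def wait_until_running(get_status, timeout=300.0, interval=5.0):
    """Poll `get_status()` until its 'status' field is 'Running' or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Last status: {status.get('status')!r}")
        time.sleep(interval)

# Demonstration with canned statuses standing in for pipeline.status().
canned = iter([{"status": "Starting"}, {"status": "Running"}])
print(wait_until_running(lambda: next(canned), timeout=5, interval=0))

# Assumed usage with the pipeline from this tutorial:
# wait_until_running(pipeline.status, timeout=600, interval=10)
```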

### Load example dataset

We will use the following libraries to load our sample data.

* [datasets](https://huggingface.co/docs/datasets/index) from `🤗 Hugging Face`, a library for accessing and sharing datasets for Audio, CV, and NLP tasks - in particular we're interested in the `Narsil/asr_dummy` [dataset](https://huggingface.co/datasets/Narsil/asr_dummy/tree/main) containing sample speech audio for `ASR` tasks;
* [librosa](https://github.com/librosa/librosa), a Python package for music and audio analysis that will help us parse some audio files for pipeline inference.

Now we can load the dataset of our choice and use `librosa` to load the audio files into NumPy arrays. We load the first two audio files from the dataset.

In [15]:
dataset = load_dataset("Narsil/asr_dummy")

audio_1, sr_1 = librosa.load(dataset["test"][0]["file"])
audio_2, sr_2 = librosa.load(dataset["test"][1]["file"])

audio_files = [(audio_1, sr_1), (audio_2, sr_2)]
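Each loaded file is a one-dimensional `float32` NumPy array paired with its sampling rate, which matches the `inputs` field of the input schema. The sketch below inspects such an array using a synthetic tone in place of a real file; `describe_audio` is a hypothetical helper. By default, `librosa.load` resamples audio to 22050 Hz, so that rate is used here.

```python
import numpy as np

def describe_audio(audio, sr):
    """Return basic facts about a mono waveform stored as a 1-D NumPy array."""
    return {
        "dtype": str(audio.dtype),
        "num_samples": int(audio.shape[0]),
        "duration_s": audio.shape[0] / sr,
    }

# Synthetic one-second 440 Hz tone standing in for a loaded audio file.
sr = 22050  # default sampling rate of librosa.load
t = np.arange(sr, dtype=np.float32) / sr
tone = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

print(describe_audio(tone, sr))
```

The same function can be applied to `audio_1` and `sr_1` to confirm the shape and length of the real samples.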

The following allows us to listen to the audio files before performing an inference on their converted numpy array values.

In [16]:
import IPython
from IPython.display import Audio

def display_audio(audio: np.ndarray, sr: int) -> None:
    # render an inline audio player for the waveform
    IPython.display.display(Audio(data=audio, rate=sr))

In [17]:
for audio, sr in audio_files:
    display_audio(audio, sr)

We can now create the pandas DataFrame to be passed for inference according to the input schema we have defined in the previous step.

In [18]:
input_data = {
        "inputs": [audio_1, audio_2],
        "return_timestamps": ["word", "word"],
}
dataframe = pd.DataFrame(input_data)
dataframe

| | inputs | return_timestamps |
|---|---|---|
| 0 | [0.00032296643, 0.0003370901, 0.00028548433, 0... | word |
| 1 | [0.0010076487, 0.0012469155, 0.00080459623, 0.... | word |


### Run inference on the example dataset

We perform a sample inference with the provided DataFrame, and display the results.

In [19]:
%%time
result = pipeline.infer(dataframe, timeout=10000)

CPU times: user 108 ms, sys: 24 ms, total: 132 ms
Wall time: 2.15 s


In [20]:
display(result)

| | time | in.inputs | in.return_timestamps | out.chunks | out.text | check_failures |
|---|---|---|---|---|---|---|
| 0 | 2023-12-18 21:43:04.663 | [0.0003229664, 0.0003370901, 0.0002854843, 0.0... | word | [{'text': ' He', 'timestamp': [0.0, 1.08]}, {'... | He hoped there would be Stu for dinner, turni... | 0 |
| 1 | 2023-12-18 21:43:04.663 | [0.0010076487, 0.0012469155, 0.0008045962, 0.0... | word | [{'text': ' Stuff', 'timestamp': [29.78, 29.78... | Stuff it into you. His belly calcled him. | 0 |
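The `out.chunks` column holds one list per row, where each entry has a `text` and a `[start, end]` `timestamp`. The sketch below turns such a list into readable lines; `format_chunks` is a hypothetical helper. The first sample chunk is taken from the output above and the second is made up for illustration.

```python
def format_chunks(chunks):
    """Turn a list of {'text', 'timestamp': [start, end]} dicts into printable lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"{start:.2f}s - {end:.2f}s: {chunk['text'].strip()}")
    return lines

# Sample chunks shaped like the `out.chunks` column; the second entry is invented.
example = [
    {"text": " He", "timestamp": [0.0, 1.08]},
    {"text": " hoped", "timestamp": [1.08, 1.5]},
]
for line in format_chunks(example):
    print(line)

# Assumed usage with the inference result from this tutorial:
# for line in format_chunks(result.loc[0, "out.chunks"]):
#     print(line)
```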


### Evaluate results

Let's compare the results side by side with the audio inputs.

In [21]:
for (audio, sr), transcription in zip(audio_files, result['out.text'].values):
    print("Input audio:\n")
    display_audio(audio, sr)
    
    print(f"Transcription: {transcription}\n")

Input audio:



Transcription:  He hoped there would be Stu for dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick, peppered, flour-fat and sauce.

Input audio:



Transcription:  Stuff it into you. His belly calcled him.



### Undeploy Pipelines

With the demonstration complete, we undeploy the pipeline to return its resources to the Wallaroo instance.

In [22]:
pipeline.undeploy()

Waiting for undeployment - this will take up to 60000s .................................... ok


| Field | Value |
|---|---|
| name | whisper-hf-byop |
| created | 2023-12-18 21:20:03.353219+00:00 |
| last_updated | 2023-12-18 21:42:37.326830+00:00 |
| deployed | False |
| arch | |
| tags | |
| versions | 9075f7f9-0df7-4ea5-84e9-7ac3ff9ea084, e1bb243c-d30f-4b55-9985-58c2c22cc0e3, 95617b4f-2895-46b1-a429-be52153a92c1, ef56a3a3-8612-4d5b-baf6-6f960e250c98, f2cd64fc-93cc-4e96-bf20-7f37ccfd3e5a, 9a57b52f-0b99-42d9-bd67-8d2db2a21a49, 9cd7a58a-61b2-461e-a008-545833a3091f, edddded5-9856-4aaf-b8fd-bb965b08577f |
| steps | whisper-byop |
| published | False |
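To make sure resources are released even when an inference fails partway through, deployment and undeployment can be paired in a context manager. This is a sketch, not part of the Wallaroo SDK; `deployed` is a hypothetical helper that relies only on the `deploy` and `undeploy` calls used in this tutorial.

```python
from contextlib import contextmanager

@contextmanager
def deployed(pipeline, deployment_config=None):
    """Deploy `pipeline` on entry and undeploy it on exit, even if an error occurs."""
    pipeline.deploy(deployment_config=deployment_config)
    try:
        yield pipeline
    finally:
        pipeline.undeploy()

# Assumed usage with the objects from this tutorial:
# with deployed(pipeline, deployment_config) as p:
#     result = p.infer(dataframe, timeout=10000)
```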
