# NOTE
This notebook, and all others involving Docker, cannot be run on Colab. Run it either on your local machine or on your Vertex AI Workbench instance, after cloning the TIL repository from GitHub.

# API Model Integration

In this notebook we will look at integrating a Whisper model for automatic speech recognition (ASR) into a real-world application using FastAPI.
FastAPI is a convenient way to serve the model in a production environment. The setup involves creating an API that receives base64-encoded audio as input and returns the predicted transcription. We will then create a Docker container for the FastAPI app.


#NOTE: Code does not work on Colab

### Recap: Saving a model

In [None]:
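from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# `model` and `processor` are the fine-tuned objects from the earlier training notebook.
# save_pretrained() writes a directory, so use the directories that app.py loads from.
model.save_pretrained("src/models/whisper_model")
processor.save_pretrained("src/models/whisper_processor")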

In [None]:
## Install the serving dependencies.
## Note: running the uvicorn server inside a notebook runs into some asyncio issues.

!pip install uvicorn librosa fastapi

In [None]:
from fastapi import FastAPI, Request
from pydantic import BaseModel
from typing import Optional
from transformers import WhisperProcessor, WhisperForConditionalGeneration, pipeline
import torch
import os
import numpy as np
import librosa
import base64
import io

app = FastAPI()

# Model directory, relative to the working directory (/workspace in the container)
model_directory = "src/models"
whisper_directory = os.path.join(model_directory, "whisper_model")
processor_directory = os.path.join(model_directory, "whisper_processor")

# Check if we have a fine-tuned model
if os.path.exists(os.path.join(whisper_directory, "model.safetensors")):
    # Load fine-tuned model
    processor = WhisperProcessor.from_pretrained(processor_directory)
    model = WhisperForConditionalGeneration.from_pretrained(whisper_directory)
else:
    # Fail early with a clear message instead of a NameError on `model` below
    raise FileNotFoundError(f"No fine-tuned model found in {whisper_directory}")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=device,
)

@app.post("/stt")
async def stt(request: Request):
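    """
    Performs ASR on base64-encoded audio.
    Expects a JSON body of the form {"instances": [{"b64": "<base64 audio>"}, ...]}
    Returns {"predictions": [...]} with one transcription per instance
    """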
    input_json = await request.json()

    predictions = []
    for instance in input_json["instances"]:
        audio_bytes = base64.b64decode(instance["b64"])
        audio_np, _ = librosa.load(io.BytesIO(audio_bytes), sr=16000)
        fmt_input = {'raw': audio_np, "sampling_rate": 16000}
        result = asr(fmt_input)

        transcription = result['text']
        predictions.append(transcription)
    return {"predictions": predictions}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)


INFO:     Started server process [1525]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1525]
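
The endpoint above trusts the request body: a missing `instances` key or a malformed base64 string surfaces as an unhandled server error. As a minimal sketch, assuming the same `{"instances": [{"b64": ...}]}` request shape, the body could be validated before any audio reaches the model. `decode_instances` is a hypothetical helper and is not part of the original app.

```python
import base64
import binascii


def decode_instances(payload: dict) -> list[bytes]:
    """Validate an /stt request body and return the raw audio bytes per instance.

    Hypothetical helper: raises ValueError on a malformed body so the caller
    can turn it into an HTTP 4xx response instead of an unhandled 500.
    """
    instances = payload.get("instances") if isinstance(payload, dict) else None
    if not isinstance(instances, list):
        raise ValueError('request body must contain an "instances" list')

    audio_blobs = []
    for index, instance in enumerate(instances):
        if not isinstance(instance, dict) or not isinstance(instance.get("b64"), str):
            raise ValueError(f'instance {index} is missing a "b64" string')
        try:
            # validate=True rejects characters outside the base64 alphabet
            audio_blobs.append(base64.b64decode(instance["b64"], validate=True))
        except binascii.Error as exc:
            raise ValueError(f"instance {index} is not valid base64") from exc
    return audio_blobs
```

Inside the endpoint, one option is to catch the `ValueError` and re-raise it as `fastapi.HTTPException(status_code=400, detail=str(exc))`, so that clients receive a readable error message.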


### Create a Dockerfile
Create a `Dockerfile` in your project root, the parent directory of `src` (which contains your FastAPI app, `app.py`). This file defines the Docker image that includes your app and all its dependencies.

```Docker
# example deep learning VM
# for a full list see us-docker.pkg.dev/deeplearning-platform-release/gcr.io/
# and for details see https://cloud.google.com/deep-learning-vm/docs/images#supported-frameworks
FROM us-docker.pkg.dev/deeplearning-platform-release/gcr.io/pytorch-gpu.2-2.py310

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

# pip gives a warning if you install packages as root
# set this flag to just ignore the warning
ENV PIP_ROOT_USER_ACTION=ignore

RUN pip install -U pip
WORKDIR /workspace

# install other requirements
COPY requirements.txt .
RUN pip install -r requirements.txt

# copy the source code maintaining the directory structure
COPY src /workspace/src

# Add src to Python path
ENV PYTHONPATH=/workspace

# Expose the port
EXPOSE 8000

# start model service
CMD uvicorn src.app:app --port 8000 --host 0.0.0.0
```

### Ensure your directory tree is similar to the following image. Take note of the directories

`app.py` should be in the `src` directory, and `models` should also be inside `src` so that `app.py` can load them easily.

`test.py` contains the client code given at the end of this notebook, which sends requests to test the server.

The `Dockerfile` (not shown here) should be at the same level as `test.py`, in the parent directory of `src`.

<img src ="https://i.imgur.com/E8PyWTZ.png"/>
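
Before building, it can help to check the layout programmatically. The following is a rough sketch rather than part of the original project: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the path list is inferred from the Dockerfile and `app.py` shown earlier in this notebook.

```python
from pathlib import Path

# Paths that the Dockerfile and app.py rely on, relative to the project root.
# requirements.txt is included because the Dockerfile copies it.
EXPECTED_PATHS = [
    "Dockerfile",
    "requirements.txt",
    "src/app.py",
    "src/models/whisper_model",
    "src/models/whisper_processor",
]


def missing_paths(project_root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Calling `missing_paths()` from the project root should return an empty list when everything is in place. Any entries it returns are the files or directories to fix before running `docker build`.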

### Remove outdated generation config in Whisper (if you use Whisper)

`forced_decoder_ids` is no longer supported, so you need to remove it to get Whisper running.

<img src="https://i.imgur.com/ERGi2RC.png" />
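
The same edit can be scripted instead of done by hand. This is a minimal sketch that assumes the key sits at the top level of a JSON config file in the saved model directory, such as `generation_config.json`. The exact file in the screenshot is not reproduced here, so check which file actually contains the key. `strip_forced_decoder_ids` is a hypothetical helper name.

```python
import json
from pathlib import Path


def strip_forced_decoder_ids(config_path: str) -> bool:
    """Remove the `forced_decoder_ids` key from a JSON config file.

    Returns True if the key was present and the file was rewritten,
    False if the file was left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "forced_decoder_ids" not in config:
        return False
    del config["forced_decoder_ids"]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

For example, `strip_forced_decoder_ids("src/models/whisper_model/generation_config.json")` would clean the saved model used by `app.py`, assuming that is where the key lives in your checkpoint.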

### Build the Docker Image
From your project directory (where your `Dockerfile` is located, one level above `src/app.py`), run the following command to build the Docker image:
```bash
docker build -t stt_app:1.0.0 .
```

<img src="https://i.imgur.com/SGtqAfH.png" alt="Building Docker" />

### Run the Docker Container
```bash
docker run -p 8000:8000 stt_app:1.0.0
```

Docker runs the container and maps port 8000 of the container to port 8000 on your host, allowing us to access the FastAPI application using the browser, the `requests` library, or Postman.
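
The container needs some time to load the model before it can answer requests, so a client that starts immediately may see connection errors. As a rough sketch, the helper below polls the server until it responds. It uses the interactive docs page that FastAPI serves at `/docs` by default. `wait_for_server` is a hypothetical name, and the timeout values are arbitrary assumptions.

```python
import http.client
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/docs", timeout: float = 120.0) -> bool:
    """Poll url until it answers with HTTP 200 or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not ready yet (refused, reset, or timed out); retry
        time.sleep(1.0)
    return False
```

Calling `wait_for_server()` before sending the first transcription request avoids racing the model load. An HTTP-level check is used rather than a bare TCP connect because Docker's port forwarding can accept connections before the app inside the container is listening.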

In [None]:
import requests
import base64

# The endpoint URL
url = "http://localhost:8000/stt"

# Path to an audio file
audio_file_path = "audio.mp3"  # Replace with your actual file path

# Read the file and encode it to base64
with open(audio_file_path, "rb") as audio_file:
    audio_content = audio_file.read()
    base64_encoded = base64.b64encode(audio_content).decode("utf-8")

# Create payload with base64 encoded audio
data = {"instances": [{"b64": base64_encoded}]}

# Sending the POST request with JSON data
response = requests.post(url, json=data)

# Print the response from the server
print("Status Code:", response.status_code)
print("Response:", response.json())


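
Because the endpoint loops over `instances`, several audio files can be sent in one request. The helper below is a small sketch of building that payload. `build_payload` is a hypothetical name, and any file names passed to it are placeholders.

```python
import base64
from pathlib import Path


def build_payload(audio_paths: list[str]) -> dict:
    """Base64-encode each audio file into the {"instances": [...]} request body."""
    instances = []
    for audio_path in audio_paths:
        audio_bytes = Path(audio_path).read_bytes()
        instances.append({"b64": base64.b64encode(audio_bytes).decode("utf-8")})
    return {"instances": instances}
```

It would be used as `requests.post(url, json=build_payload(["first.wav", "second.wav"]))`. The server appends one transcription per instance, so `predictions` comes back in the same order as the input paths.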