# Deepcamp: Codelab 5

**In this tutorial we will cover**:

- Deployment of apps based on ML or DL models


**Author**:
- Alessio Devoto (alessio.devoto@uniroma1.it)


**Duration**: 50 mins 



# Deployment of Machine Learning models 

Let's say you designed and trained a very cool and effective deep learning model. 

The model is now ready to be tested on new data and by real users. What will you do?

- You could get in touch with someone who takes care of designing the infrastructure for your system ➡ expensive and time-consuming
- You could use a framework for deploying ML apps and start it on any server (of course, you need access to a server)

---

There are a few libraries we can use for this purpose: [Gradio](https://gradio.app/quickstart/) and [Streamlit](https://streamlit.io/) even come with UI features. If you want lower-level control (at the cost of more complexity), you can use [FastAPI](https://fastapi.tiangolo.com/) or [Flask](https://flask.palletsprojects.com/en/2.3.x/).
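
To give a feeling for what "lower-level control" costs, here is a minimal sketch of a prediction endpoint written with only Python's standard library `http.server` (deliberately not FastAPI or Flask, so that nothing extra needs to be installed). The `predict` function, the `/predict` route and the JSON shape are hypothetical placeholders, not part of this lab.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> dict:
    """Hypothetical stand-in for a trained model: the 'score' is just the text length."""
    return {"input": text, "score": len(text)}


def handle_predict(body: bytes) -> tuple:
    """Turn a raw JSON request body into (HTTP status code, JSON response body)."""
    try:
        payload = json.loads(body)
        text = payload["text"]
        if not isinstance(text, str):
            raise TypeError("text must be a string")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a string field 'text'"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps(predict(text)).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Routing, parsing and serialization are all written by hand here.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000) -> None:
    """Blocking call: serve POST /predict on the given port until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` would start the server. Note that there is still no UI at all, which is exactly the part that Gradio and Streamlit provide for us.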

Today, we are going to use Gradio, an open-source library maintained by Hugging Face and commonly used to showcase models.

Install the necessary libraries as usual.

In [None]:
%%capture
!pip install -U openai-whisper
!pip install gradio

In [None]:
import requests           
from PIL import Image         # to deal with images
import torch

## 1. Gradio basics

Gradio lets you create intuitive user interfaces in plain Python by simply listing panels, buttons, text areas, etc.

All [components](https://gradio.app/docs/#components) must be created inside a `gr.Blocks` context, which acts as the container for the interface.

Let's start with a simple example taken from [here](https://gradio.app/docs/#components).



In [None]:
import gradio as gr

with gr.Blocks() as demo:   # main gradio block, always necessary

    # from here we can populate the UI as we wish
    gr.Markdown("DeepCamp: the best place in the **world**")
    
    with gr.Tab(label="Flip Text"):
        text_input = gr.Textbox()
        text_output = gr.Textbox()
        text_button = gr.Button("Flip")
   
    with gr.Tab(label="Flip Image"):
        with gr.Row():
            image_input = gr.Image()
            image_output = gr.Image()
        image_button = gr.Button("Flip")

demo.launch()

Of course, nothing happens if we click the buttons, as there is no function implemented yet.

Let's add simple functions which manipulate the input data and return an output.

In [None]:
import numpy as np
import gradio as gr

# a function that flips a string
def flip_text(x):
    return x[::-1]

# a function that flips an image
def flip_image(x):
    return np.fliplr(x)



with gr.Blocks() as demo:   # main gradio block, always necessary

    # from here we can populate the UI as we wish
    gr.Markdown("DeepCamp: the best place in the **world**")
    
    with gr.Tab(label="Flip Text"):
        text_input = gr.Textbox()
        text_output = gr.Textbox()
        text_button = gr.Button("Flip")
   
    with gr.Tab(label="Flip Image"):
        with gr.Row():
            image_input = gr.Image()
            image_output = gr.Image()
        image_button = gr.Button("Flip")

    # assign corresponding function to each button
    text_button.click(flip_text, inputs=text_input, outputs=text_output)

    # notice that the image is passed to the function as a numpy array
    image_button.click(flip_image, inputs=image_input, outputs=image_output)

demo.launch() # you can control the ports and the number of concurrent threads here
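
The `click` wiring above generalizes to several inputs and outputs: the function receives one argument per input component, in order, and returns one value per output component. The sketch below illustrates this with a hypothetical `describe` function and made-up component labels. The UI is built inside `build_demo` so that the plain Python function can be tried without starting a server.

```python
def describe(text: str, repeat: float):
    """Repeat the text and also report the length of the result."""
    repeated = text * int(repeat)   # sliders deliver numbers, so we cast to int
    return repeated, len(repeated)  # one return value per output component


def build_demo():
    """Build (but do not launch) a demo with two inputs and two outputs."""
    import gradio as gr  # imported here so `describe` works even without Gradio

    with gr.Blocks() as demo:
        text_in = gr.Textbox(label="Text")
        repeat = gr.Slider(minimum=1, maximum=5, step=1, value=2, label="Repeat")
        text_out = gr.Textbox(label="Repeated")
        length_out = gr.Number(label="Length")
        run_btn = gr.Button("Run")

        # inputs and outputs are lists, matched positionally with the
        # function's arguments and return values
        run_btn.click(describe, inputs=[text_in, repeat], outputs=[text_out, length_out])
    return demo
```

To try it, call `build_demo().launch()`. The port can be chosen with the `server_port` argument of `launch`.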


### Exercise 🏋: write a basic gradio demo

Write a Gradio demo that, given an image, blurs it by an amount chosen by the user.

For the blur, use the provided function 



```
gaussian_filter(image, sigma)
```

where `image` is the image and `sigma` is the amount of blur.

Allow the user to select the amount of blur in the range [0, 5] via a Gradio `Slider` (see [here](https://gradio.app/docs/#slider-header)).





In [None]:
from scipy.ndimage import gaussian_filter

# your code here

In [None]:
#@title Peek solution 👀

from scipy.ndimage import gaussian_filter


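def blur(image, amount):
  # image is an RGB array of shape (H, W, 3): blur along height and width only,
  # because a scalar sigma would also mix the colour channels
  return gaussian_filter(image, sigma=(amount, amount, 0))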

with gr.Blocks() as demo:   # main gradio block, always necessary

  # from here we can populate the UI as we wish
  gr.Markdown("DeepCamp: the best place in the **world**")

  blur_amount = gr.Slider(minimum=0, maximum=5)
  with gr.Row():
      image_input = gr.Image()
      image_output = gr.Image()
  blur_btn = gr.Button('Blur')

  # note how we pass the params
  blur_btn.click(blur, inputs=[image_input, blur_amount], outputs=image_output)


demo.launch()
    


## 2. Gradio Demo 2.0

Now that we have an idea of how Gradio works, let's place a real Neural Network behind the UI.

**Goal**: write a Gradio demo that exposes a *pretrained* ResNet34 for image classification. 

ResNet34 is pretrained on the ImageNet-1k dataset, so it can classify images into 1000 different classes. We can get the pretrained weights from PyTorch via torchvision (as we saw in lab 3).


In [None]:
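# download the list of ImageNet classes and store it in a Python dictionary
# (ast.literal_eval parses the dict literal without executing arbitrary code, unlike eval)
import ast
labels = ast.literal_eval(requests.get('https://raw.githubusercontent.com/alessiodevoto/deepers/main/data/imagenet1000_clsidx_to_labels.txt').text)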
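
The downloaded text is a Python dictionary literal mapping class indices to names. As a small aside, the sketch below shows on a two-entry excerpt of the same shape (the excerpt and the `is_safe_literal` helper are ours, for illustration only) how `ast.literal_eval` parses such text while refusing anything that is not a plain literal.

```python
import ast

# illustrative excerpt in the same shape as the downloaded file (assumed)
sample = "{0: 'tench, Tinca tinca', 1: 'goldfish, Carassius auratus'}"
labels_sample = ast.literal_eval(sample)


def is_safe_literal(text: str) -> bool:
    """Return True if text is a plain Python literal that literal_eval accepts."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False


print(labels_sample[1])
print(is_safe_literal("__import__('os').getcwd()"))  # a function call, not a literal
```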


In [None]:
from torchvision.models import resnet34, ResNet34_Weights

# get the model with the pretrained weights
resnet34 = resnet34(weights=ResNet34_Weights.DEFAULT)
resnet34.eval()     # why ?

# transforms 
preprocess = ResNet34_Weights.DEFAULT.transforms()

Next, we write a function that, given an image, returns a dictionary where keys are the labels and values are the prediction scores.

In [None]:
def classify(img):
  # open the image and make sure it has 3 channels
  image = Image.open(img).convert('RGB')
  # apply transforms
  img_transformed = preprocess(image)
  # get unnormalized scores (no gradients are needed at inference time)
  with torch.no_grad():
    logits = resnet34(img_transformed.unsqueeze(0))
  # turn the scores into probabilities
  predictions = torch.nn.functional.softmax(logits, dim=1)[0]
  # create dictionary
  confidences = {labels[i]: float(predictions[i]) for i in range(1000)}
  return confidences
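
To see what the softmax step and the label dictionary amount to, here is a toy sketch in plain Python with four made-up classes and logits (none of these values come from ResNet34). It also mimics, under our own `top_k` helper, the idea of showing only the top 3 classes.

```python
import math


def softmax(logits):
    """Convert unnormalized scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(confidences, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    return sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)[:k]


# toy stand-ins for the ImageNet labels and the network output
toy_labels = {0: "cat", 1: "dog", 2: "fish", 3: "bird"}
toy_logits = [2.0, 1.0, 0.1, -1.0]

probs = softmax(toy_logits)
confidences = {toy_labels[i]: probs[i] for i in range(len(probs))}

for label, p in top_k(confidences):
    print(f"{label}: {p:.3f}")
```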


Finally, our demo

In [None]:

with gr.Blocks() as demo:   # main gradio block, always necessary
  gr.Markdown("DeepCamp: the best AI camp in the **world**")
  
  with gr.Row():
      input_image = gr.Image(type='filepath')
      pred = gr.Label(num_top_classes=3)
  with gr.Row():
    pred_btn = gr.Button('Classify')
  
  pred_btn.click(
      classify, 
      inputs=input_image, 
      outputs=pred)
  
demo.launch(debug=True)






## 3. Final Exercise: Speech to Text 🔥

**Scenario:** we developed an automatic speech recognition (ASR) model which is able to transcribe human speech, and we want to make this service available to the world.


### 3.1 Meet Whisper
Meet OpenAI's Whisper!

![whisper](https://openaicom.imgix.net/d9c13138-366f-49d3-b8bd-cb3f5a973a5b/asr-summary-of-model-architecture-desktop.svg?fm=auto&auto=compress,format&fit=min&w=1919&h=1551)


[Whisper](https://arxiv.org/pdf/2212.04356.pdf) is an automatic speech recognition (ASR) system trained on *680,000 hours of multilingual and multitask* supervised data collected from the web 🤯.

The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. As you may know, Encoder and Decoder blocks are neural networks based on the attention mechanism. 
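
As a reminder of what "based on the attention mechanism" means, here is a minimal sketch of scaled dot-product attention in plain Python. It is a toy for intuition only: the vectors are made up, and it leaves out everything else a real Transformer block (and Whisper) contains, such as learned projections and multiple heads.

```python
import math


def softmax(scores):
    """Convert scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of the values.

    The weights come from the similarity (dot product) between the query
    and every key, scaled by sqrt(d) and normalized with a softmax.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# one query that is more similar to the first key than to the second
toy_out, toy_weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(toy_weights[0], toy_out[0])
```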

As you can see in the [official repository](https://github.com/openai/whisper), the model is available in different "sizes": `tiny`, `base`, `small`, `medium`, `large`, `large-v2`.




Let's start by downloading an audio sample and listening to it.

In [None]:
from torchaudio.utils import download_asset
from IPython.display import Audio   # audio player widget for notebooks

speech_file = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")
Audio(speech_file)

Can the ***tiny*** Whisper version transcribe this? 

In [None]:
import whisper

# the model is downloaded locally (that's why we picked the smallest version)

model = whisper.load_model("tiny") # we load the tiny version

result = model.transcribe(
    speech_file,        # this can be a path to a file or a numpy array
    language='en',      # a lot of languages available
    temperature = 0,    # where have we met this before ? 🤔
    task='transcribe')  # or 'translate'

print(result["text"])



 I had that curiosity beside me at this moment.


In [None]:
# a bunch of additional fields
result

{'text': ' I had that curiosity beside me at this moment.',
 'segments': [{'id': 0,
   'seek': 0,
   'start': 0.0,
   'end': 3.2,
   'text': ' I had that curiosity beside me at this moment.',
   'tokens': [50364,
    286,
    632,
    300,
    18769,
    15726,
    385,
    412,
    341,
    1623,
    13,
    50524],
   'temperature': 0.0,
   'avg_logprob': -0.2618454419649564,
   'compression_ratio': 0.8679245283018868,
   'no_speech_prob': 0.01015368103981018}],
 'language': 'en'}
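
The `segments` field is handy when you want timestamps next to the text. Below is a small sketch that formats it. `result_sample` is a trimmed copy of the output above (only the fields we use), and `format_segments` is our own helper, not part of Whisper.

```python
# trimmed copy of the result printed above
result_sample = {
    "text": " I had that curiosity beside me at this moment.",
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 3.2,
            "text": " I had that curiosity beside me at this moment.",
        }
    ],
    "language": "en",
}


def format_segments(result):
    """Return one 'start -> end' line per transcribed segment."""
    lines = []
    for seg in result["segments"]:
        text = seg["text"].strip()  # segment text starts with a space
        lines.append(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {text}")
    return lines


for line in format_segments(result_sample):
    print(line)
```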

### Whisper + Gradio

Ok, so we know how Whisper works now. Can we integrate this into a Gradio demo ?

The demo should:

- allow the user to upload an audio file or record from the microphone (see `gradio.Audio`)
- allow the user to pick language and task (see `gradio.Radio`) and temperature (see `gr.Slider`)
- show the transcription

Once you are done:
- Try to record some samples and decode them with different temperatures.
- Try to force the transcription into a wrong language.
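
To build some intuition for what the temperature changes before you experiment, here is a toy sketch of temperature-based token selection over three made-up logits. It is not Whisper's decoder: we only assume the usual convention that temperature 0 means "always pick the most likely token", while higher temperatures flatten the distribution and make sampling more random.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (temperature > 0)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Greedy choice at temperature 0, random sampling otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [1.0, 3.0, 2.0]
sharp = softmax_with_temperature(toy_logits, 0.1)  # almost all mass on index 1
flat = softmax_with_temperature(toy_logits, 1.0)   # mass is more spread out

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(pick_token(toy_logits, 0, random.Random(0)))
```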



In [None]:
# your code here

In [None]:
#@title Peek solution 👀

import gradio as gr

def whisper_transcribe(audio, temp, lang):

  if lang == 'Detect':
    lang = None

  result = model.transcribe(
    audio, 
    language=lang, 
    temperature=temp,    
    task='transcribe')

  return result["text"]


with gr.Blocks() as demo:   # main gradio block, always necessary

  # from here we can populate the UI as we wish
  gr.Markdown("DeepCamp: the best AI camp in the **world**")
  with gr.Row():
      temp = gr.Slider(minimum=0, maximum=1)
      lang = gr.Radio(choices=['en', 'it', 'pt', 'es', 'fr', 'Detect'], value='Detect')
  with gr.Tab("Upload"):
      # Gradio 4+ takes a list via `sources`; Gradio 3.x used `source="upload"`
      audio_up = gr.Audio(sources=["upload"], type='filepath')
      transcr_up = gr.Button('Transcribe')

  with gr.Tab("Record"):
      audio_mic = gr.Audio(sources=["microphone"], type='filepath')
      transcr_mic = gr.Button('Transcribe')
  
  trascript = gr.Text()
  
  
  transcr_up.click(
      whisper_transcribe, 
      inputs=[audio_up, temp, lang], 
      outputs=trascript)
  
  transcr_mic.click(
      whisper_transcribe, 
      inputs=[audio_mic, temp, lang], 
      outputs=trascript)


demo.launch(debug=True)
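
The exercise also asks for a way to pick the task, which the solution above leaves fixed to `'transcribe'`. One possible way to add it is sketched below. `build_transcribe_kwargs` is a hypothetical helper of ours that turns the UI values into keyword arguments for `model.transcribe`.

```python
def build_transcribe_kwargs(lang, temp, task):
    """Map the values coming from the UI to keyword arguments for transcribe."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    return {
        "language": None if lang == "Detect" else lang,  # None lets the model detect it
        "temperature": float(temp),
        "task": task,
    }


print(build_transcribe_kwargs("Detect", 0, "transcribe"))
```

In the demo, you could add `task = gr.Radio(choices=['transcribe', 'translate'], value='transcribe')`, pass it as a fourth input to both `click` calls, and call `model.transcribe(audio, **build_transcribe_kwargs(lang, temp, task))` inside the handler.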