## Example script illustrating sound classification on an audio stream
This notebook shows how to use DeGirum PySDK to run sound classification AI inference on an audio stream from a local microphone.

This script works with the following inference options:

1. [DeGirum Cloud Platform](https://cs.degirum.com),
1. DeGirum-hosted AI server node shared via Peer-to-Peer VPN,
1. AI server node hosted by you in your local network,
1. AI server running on your local machine,
1. DeGirum ORCA accelerator directly installed on your local machine.

To try a different option, change the value of `inference_option` in the code below.
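
To make the numbering concrete, here is a minimal sketch of how an option number could be mapped to its description. The names `INFERENCE_OPTIONS` and `describe_inference_option` are hypothetical and are not part of `mytools` or PySDK; the descriptions are copied from the list above, and the real `mytools.connect_model_zoo` may work differently.

```python
# Hypothetical lookup table (not part of mytools or PySDK);
# the descriptions are copied from the numbered list above.
INFERENCE_OPTIONS = {
    1: "DeGirum Cloud Platform",
    2: "DeGirum-hosted AI server node shared via Peer-to-Peer VPN",
    3: "AI server node hosted by you in your local network",
    4: "AI server running on your local machine",
    5: "DeGirum ORCA accelerator directly installed on your local machine",
}


def describe_inference_option(option):
    """Return the description for an option number; raise ValueError if it is not in the table."""
    try:
        return INFERENCE_OPTIONS[option]
    except KeyError:
        raise ValueError(
            f"inference_option must be one of {sorted(INFERENCE_OPTIONS)}, got {option!r}"
        ) from None


print(f"Inference option = '{describe_inference_option(1)}'")
```

Validating the number up front like this gives a clear error message for an out-of-range value before any connection is attempted.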

### Specify where you want to run your inferences

In [2]:
inference_option = 1  # <<< set this to the number (1-5) of the option you want from the list above

### The rest of the cells below should run without any modifications

In [3]:
import degirum as dg # import DeGirum PySDK
import mytools
import sys
import numpy as np
from IPython.display import clear_output

if sys.platform == "darwin":
    # Python 3 cannot raise a plain string; raise a proper exception instead
    raise RuntimeError("Sorry, macOS is currently not supported due to problems with the pyaudio package")

In [4]:
# connect to model zoo according to selected inference option
zoo = mytools.connect_model_zoo(inference_option)

Inference option = 'DeGirum Cloud Platform'


In [5]:
# load YAMNET sound classification model for DeGirum Orca AI accelerator
# (change model name to "...n2x_cpu_1" to run it on CPU)
model = zoo.load_model("mobilenet_v1_yamnet_sound_cls--96x64_quant_n2x_orca_1")

In [6]:
# Define model-specific audio streaming function.
# TL;DR:
# This context manager opens a PyAudio stream on entry and yields a generator function that reads the stream
# and produces audio waveforms of the proper type and size, with 50% overlap between consecutive waveforms.
# The PyAudio stream is closed on exit.
# Arguments: the model parameters and a check-for-abort function.

import pyaudio
from contextlib import contextmanager

@contextmanager 
def AudioStream(model_info, check_abort):
    chunk_length = model_info.InputWaveformSize[0] // 2
    audio = pyaudio.PyAudio()
    stream = audio.open(format = pyaudio.paInt16, channels = 1,
            rate = int(model_info.InputSamplingRate[0]), input = True, frames_per_buffer = chunk_length)
    
    data = np.zeros(2 * chunk_length, dtype = np.int16)
    try:
        def out_stream():
            while not check_abort():
                data[:chunk_length] = data[chunk_length:]
                data[chunk_length:] = np.frombuffer(stream.read(chunk_length), dtype = np.int16)
                yield data
        yield out_stream
    finally:
        stream.stop_stream() # stop audio streaming
        stream.close() # close audio stream
        audio.terminate() # terminate audio library
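
The overlap logic in `AudioStream` can be tried without a microphone. The following is a minimal sketch that applies the same buffer-shifting scheme to a synthetic array; the function name `overlapping_windows` and the tiny chunk length are illustrative assumptions, not part of PySDK or PyAudio. Unlike the cell above, it yields a copy of the buffer so that the collected windows stay independent of each other.

```python
import numpy as np


def overlapping_windows(samples, chunk_length):
    """Yield windows of 2 * chunk_length samples, each shifted by chunk_length (50% overlap)."""
    window = np.zeros(2 * chunk_length, dtype=np.int16)
    # Consume the input one full chunk at a time; a trailing partial chunk is ignored.
    for start in range(0, len(samples) - chunk_length + 1, chunk_length):
        # Move the newer half of the window into the older half...
        window[:chunk_length] = window[chunk_length:]
        # ...and fill the newer half with the next chunk of samples.
        window[chunk_length:] = samples[start:start + chunk_length]
        yield window.copy()


# Synthetic "audio": the integers 1..12, processed in chunks of 4 samples.
samples = np.arange(1, 13, dtype=np.int16)
windows = list(overlapping_windows(samples, chunk_length=4))
for w in windows:
    print(w.tolist())
```

The first window is half zeros because no earlier audio exists yet; every later window repeats the second half of its predecessor as its first half.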


In [None]:
abort = False # stream abort flag
N = 5 # inference results history depth
history = [] # list of N consecutive inference results

# Acquire model input stream object
with AudioStream(model.model_info, lambda: abort) as stream:
    #
    # AI prediction loop.
    # To stop it, type on the keyboard near the microphone:
    # the loop ends once the top classification result is "Typing".
    #
    for res in model.predict_batch(stream()):
        # clear Jupyter output cell
        clear_output(wait = True) 
        
        # add top inference result to history
        history.insert(0, f"{res.results[0]['label']}: {res.results[0]['score']}" )
    
        # keep only N last elements in history
        if len(history) > N:
            history.pop()
    
        # print history
        for m in history:
            print(m)
        
        # check for stop condition
        if res.results[0]['label'] == "Typing":
            abort = True
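
The insert-then-pop history handling in the loop above can also be expressed with `collections.deque` and its `maxlen` argument, which discards the oldest entry automatically. This is an alternative sketch, not the notebook's method; the labels and scores below are made-up placeholders, not real model output.

```python
from collections import deque

N = 5  # inference results history depth
history = deque(maxlen=N)  # the oldest entry is dropped once N entries are stored

# Placeholder (label, score) pairs standing in for top inference results.
fake_results = [
    ("Speech", 0.9), ("Music", 0.8), ("Silence", 0.7), ("Speech", 0.6),
    ("Dog", 0.5), ("Music", 0.4), ("Typing", 0.3),
]

for label, score in fake_results:
    # Newest result goes to the front, as with history.insert(0, ...)
    history.appendleft(f"{label}: {score}")

for entry in history:
    print(entry)
```

With `maxlen` set, `appendleft` drops entries from the right end, so the explicit length check and `pop()` are not needed.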
    