## Example script illustrating asynchronous parallel execution of sound classification on audio stream and object detection on video stream
This notebook is an example of how to use DeGirum PySDK to perform parallel inferences on two asynchronous data streams with different frame rates. To achieve maximum performance, this example uses non-blocking batch prediction mode.

**The pyaudio package with PortAudio is required to run this sample.**

**Access to a microphone is required to run this sample.**

**Access to a camera is required to run this sample.**

This script works with the following inference options:

1. [DeGirum Cloud Platform](https://cs.degirum.com),
1. DeGirum-hosted AI server node shared via Peer-to-Peer VPN,
1. AI server node hosted by you in your local network,
1. AI server running on your local machine,
1. DeGirum ORCA accelerator directly installed on your local machine.

To try different options, you just need to change the `inference_option` in the code below.

The script needs either a web camera or a local camera connected to the machine running this code. Specify the camera index or URL in one of two ways: assign it to `camera_id` in the code below, or define the `CAMERA_ID` variable in the .env file and set `camera_id = None`.
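The fallback from `camera_id` to `CAMERA_ID` described above could look roughly like the sketch below. It is an illustration only, not the code used by `mytools`: the helper name `resolve_camera_id` is hypothetical, and it assumes the contents of the .env file have already been loaded into the process environment.

```python
import os


def resolve_camera_id(camera_id, env=None):
    """Return camera_id, falling back to the CAMERA_ID variable when it is None.

    Hypothetical helper for illustration; `env` defaults to os.environ and is
    assumed to already contain the variables from the .env file.
    """
    if camera_id is not None:
        return camera_id

    env = os.environ if env is None else env
    value = env.get("CAMERA_ID")
    if value is None:
        raise ValueError("camera_id is None and CAMERA_ID is not defined")

    value = value.strip()
    # A purely numeric value is treated as a local camera index,
    # anything else is passed through as a URL.
    return int(value) if value.isdigit() else value
```

With this sketch, `resolve_camera_id(0)` returns the index unchanged, while `resolve_camera_id(None)` reads `CAMERA_ID` from the environment.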

### Specify where you want to run your inferences

In [1]:
inference_option = 1  # <<< change it according to your needs, selecting from the list of inference options above
camera_id = 0         # camera index or URL; 0 to use default local camera, None to take from .env file

### The rest of the cells below should run without any modifications

In [2]:
import degirum as dg # import DeGirum PySDK
import mytools
import sys

In [3]:
# connect to model zoo according to selected inference option
zoo = mytools.connect_model_zoo(inference_option)

Inference option = 'DeGirum Cloud Platform'


In [4]:
# load YAMNET sound classification model for DeGirum Orca AI accelerator
# (change model name to "...n2x_cpu_1" to run it on CPU)
sound_model = zoo.load_model("mobilenet_v1_yamnet_sound_cls--96x64_quant_n2x_orca_1")

In [5]:
# load MobileNetv2+SSD object detection model for DeGirum Orca AI accelerator
# (change model name to "...n2x_cpu_1" to run it on CPU)
detection_model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

# set model parameters
detection_model.image_backend = 'opencv' # select OpenCV backend: needed to have overlay image in OpenCV format
detection_model.input_numpy_colorspace = 'BGR'

In [6]:
# set non-blocking mode for both models
sound_model.non_blocking_batch_predict = True
detection_model.non_blocking_batch_predict = True

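To see why non-blocking mode matters when two streams run at different rates, the following pure-Python sketch simulates the polling pattern. It does not use PySDK: `non_blocking_results` and `poll_both` are hypothetical stand-ins, and the only behavior taken from this notebook is that a non-blocking result generator may yield `None` when no result is ready yet.

```python
def non_blocking_results(items, every):
    """Simulated non-blocking source: yields an item on every `every`-th poll
    and None on all other polls (hypothetical stand-in, not a PySDK API)."""
    polls = 0
    for item in items:
        while True:
            polls += 1
            if polls % every == 0:
                break  # the simulated result is ready on this poll
            yield None  # not ready yet, but do not block the caller
        yield item


def poll_both(fast, slow, n_polls):
    """Poll two generators in lockstep and collect only the ready results."""
    fast_out, slow_out = [], []
    for _ in range(n_polls):
        fast_result = next(fast, None)
        slow_result = next(slow, None)
        if fast_result is not None:
            fast_out.append(fast_result)
        if slow_result is not None:
            slow_out.append(slow_result)
    return fast_out, slow_out


video = non_blocking_results(["v0", "v1", "v2", "v3", "v4", "v5"], every=1)
audio = non_blocking_results(["a0", "a1"], every=3)
video_out, audio_out = poll_both(video, audio, n_polls=6)
```

In this simulation the slow stream never stalls the fast one: all six video items are collected in six polls, while the audio stream contributes a result only on every third poll.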
In [8]:
audio_sampling_rate_hz = sound_model.model_info.InputSamplingRate[0]
audio_buffer_size = sound_model.model_info.InputWaveformSize[0] // 2 # two read buffers in waveform for half-length overlapping

with mytools.Display("Async Streams") as display, \
    mytools.open_audio_stream(audio_sampling_rate_hz, audio_buffer_size) as audio_stream, \
    mytools.open_video_stream(camera_id) as video_stream:
    
    # create prediction result generators:
    sound_predictor = sound_model.predict_batch(mytools.audio_overlapped_source(audio_stream, lambda: False, True))
    detection_predictor = detection_model.predict_batch(mytools.video_source(video_stream))
    
    sound_label = ""
    while True: # press 'x' or 'q' to abort
        
        # do asynchronous ML inferences for both models (each one can be None if not ready):
        sound_result = next(sound_predictor)
        detection_result = next(detection_predictor)

        # process sound classification result (just remember the text)
        if sound_result is not None:
            sound_label = f"{sound_result.results[0]['label']}: {sound_result.results[0]['score']}"
        
        # process video detection result (just display the annotated frame)
        if detection_result is not None:
            img = detection_result.image_overlay
            mytools.Display.put_text(
                img,
                sound_label,
                (1, img.shape[0] // 4),
                (0, 0, 0),
                (255, 255, 255),
            )
            display.show(img)

Successfully opened video stream '0'
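The `audio_buffer_size` computation in the cell above sets each read buffer to half of the model's input waveform, so that consecutive waveforms overlap by half their length. The sketch below illustrates that idea on plain lists. It is an assumption about the general technique, not the implementation of `mytools.audio_overlapped_source`, and the name `overlapped_windows` is hypothetical.

```python
def overlapped_windows(buffers):
    """Yield full-length waveforms built from each pair of consecutive
    half-length buffers, so neighboring waveforms share one buffer."""
    previous = None
    for current in buffers:
        if previous is not None:
            # Concatenate the previous and current half-length buffers.
            yield previous + current
        previous = current


# Three half-length buffers produce two overlapping full-length waveforms.
windows = list(overlapped_windows([[1, 2], [3, 4], [5, 6]]))
```

Here the second waveform starts with the same samples that end the first one, which is the half-length overlap mentioned in the code comment.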
