# Sound Demo 1
This demo illustrates how to train a binary and a multilabel sound classifier using your microphone.

#### Run global setup

In [1]:
try:
    with open("../global_setup.py") as setupfile:
        exec(setupfile.read())
except FileNotFoundError:
    print('Setup already completed')

#### Run local setup

In [2]:
from notebooks.experiments.src.sound_demos.live_predictions import LivePredictions, run_livepred
from notebooks.experiments.src.sound_demos.multilabel_classifier import Recorder, SoundClassifier
from src.audio.mini_recorder import miniRecorder

# Binary classifier
We are going to use a pretrained model to construct a binary sound classifier. The model is a deep convolutional neural network in the same style as the [VGG16 network](https://arxiv.org/pdf/1409.1556.pdf), with slightly different settings and fewer layers. The VGG16 network was originally designed for images, so how can we use it for sound? Instead of working directly with the waveform, we work with its spectrogram, which is a 2D image with time along one axis and frequency along the other.  
The pretraining was done on the [UrbanSound8K dataset](https://urbansounddataset.weebly.com/urbansound8k.html). It contains more than 8000 sound clips, each up to 4 seconds long, from 10 different classes (children playing and dogs barking, among others). Our hope is that the network has learned general audio features from the spectrograms that we can use to distinguish between two sound classes of your choice.  
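
To make the waveform-to-image step concrete, here is a minimal sketch of a magnitude spectrogram computed with a short-time Fourier transform in plain NumPy. The frame length, hop size, sample rate and the synthetic test tone are assumptions chosen for illustration; the preprocessing used by the pretrained model in this demo may differ.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (STFT)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, time_frames)

sample_rate = 8000                          # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate    # one second of sample times
tone = np.sin(2 * np.pi * 1000 * t)         # synthetic 1 kHz test tone
spec = spectrogram(tone)
print(spec.shape)                           # (frequency bins, time frames)
```

Each column of `spec` is the frequency content of one short slice of audio, so the array can be fed to a convolutional network like any other single-channel image.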

### Create dataset
First we create our own dataset as the basis for training. The cell below starts a recording process where you first record $n$ examples of the first class, followed by $n$ examples of the second class. Each recording starts immediately after the previous one finishes.  

Things to keep in mind:
- You can either make the class 0 sound continuously throughout the class 0 recordings, or try to fit exactly one instance of the sound into each recording. Whichever you choose, do the same for the second class. What do you think will happen if there is a lot of silence in the class 0 recordings, but not in the class 1 recordings?
- How many examples do we need in each class in order to get a good classifier?
- How will background noise affect performance?
- What happens, and what should happen, at test time if sounds from both classes are present in a recording?
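
One way to check the silence question empirically is to measure how much of each clip is quiet. The sketch below is an illustration only: the frame length, the RMS threshold and the synthetic clips are assumptions, and it is not part of the `Recorder` class.

```python
import numpy as np

def silent_fraction(clip, frame_len=400, threshold=0.01):
    """Fraction of frames whose RMS energy falls below `threshold`."""
    n_frames = len(clip) // frame_len
    frames = clip[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.mean(rms < threshold))

rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal(8000)   # stand-in for a clip full of sound
half_silent = noise.copy()
half_silent[4000:] = 0.0                  # second half is digital silence

print(silent_fraction(noise), silent_fraction(half_silent))
```

If the two classes differ strongly in this number, the classifier may learn to separate "quiet" from "loud" rather than the two sounds themselves.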

The recorded files will be located in the folder you specify as `wav_dir`. For example, if you wish to record 10 files for each class and store them in `/Users/me/sound` on your computer, the first line of code should look like this:
```python
recorder = Recorder(n_classes=2, n_files=10, prefix='binary', wav_dir='/Users/me/sound')
```
By default, the files will be saved in the tmp directory in the root folder of the repository.
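
One possible cause of a `FileNotFoundError` when the first file is written is that the target folder does not exist yet; whether `Recorder` creates missing folders is not stated here. A minimal sketch for creating the folder up front, where the folder name `sound_demo_recordings` is a hypothetical example:

```python
from pathlib import Path

# Hypothetical folder name, chosen for illustration only
wav_dir = Path("sound_demo_recordings")
wav_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists

# Absolute path, so it is clear where the recordings end up
print(wav_dir.resolve())
```

The printed absolute path can then be passed as the `wav_dir` argument.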

In [3]:
# Record 12 two-second clips per class, then bundle them into a dataset
recorder = Recorder(n_classes=2, n_files=12, prefix='binary', wav_dir='tmp')
recorder.record(seconds=2)
data = recorder.create_dataset()

Recording class 0, file no 1/12...
Finished recording, writing file...
binary0-001.wav


FileNotFoundError: [Errno 2] No such file or directory: '/tmp\\binary0-001.wav'

Or, if you have already recorded a dataset, you can reload it with the following code:

In [None]:
# Rebuild the dataset from previously recorded files (no new recording)
recorder = Recorder(n_classes=2, n_files=12, prefix='binary', wav_dir='tmp')
data = recorder.create_dataset()

### Train the binary classifier
Run the cell below to train the binary classifier using the pre-trained weights.
<!--You need to download the pre-trained weights for the neural network from [here](https://drive.google.com/file/d/1BXe5KZcZVqFzBMJo6FZ78hQ8j2CCRr3X/view?usp=sharing).  
Place the file containing the weights somewhere on your computer and then specify the full path below. For example, if the location of the file is `/Users/me/sound/sound_classification_weights.hdf5`, the first line of code should look like this:
```python
binary_classifier = SoundClassifier(weights_path='/Users/me/sound/sound_classification_weights.hdf5')
```-->
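
The idea of reusing pre-trained weights can be illustrated with a toy example: keep a feature extractor fixed and train only a small classification head on top of it. This is a sketch under strong assumptions, not the implementation of `SoundClassifier`. The "pretrained" layer is just a fixed random projection, the inputs are synthetic stand-ins for spectrograms, and the head is a logistic regression trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two classes of 8-dimensional inputs (stand-ins for spectrograms)
n_per_class = 50
x = np.vstack([rng.normal(-2.0, 1.0, (n_per_class, 8)),
               rng.normal(+2.0, 1.0, (n_per_class, 8))])
y = np.repeat([0.0, 1.0], n_per_class)

# "Pretrained" feature extractor: fixed weights that are never updated here
w_frozen = rng.normal(0.0, 1.0, (8, 16)) / np.sqrt(8)
features = np.maximum(x @ w_frozen, 0.0)  # ReLU activations

# New classification head: logistic regression trained by gradient descent
w_head = np.zeros(16)
b_head = 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(features @ w_head + b_head)))
    grad = p - y  # gradient of the cross-entropy loss w.r.t. the logits
    w_head -= 0.1 * features.T @ grad / len(y)
    b_head -= 0.1 * grad.mean()

accuracy = np.mean((features @ w_head + b_head > 0) == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```

Because only the head is trained, a handful of examples per class can be enough, provided the frozen features already separate the two sounds reasonably well.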

In [None]:
binary_classifier = SoundClassifier()
binary_classifier.train(data=data)
binary_classifier.plot_training()

### Test the trained model
Now we have a model that is trained to discriminate between two sounds. Record a sound from one of the classes (or something completely different) and see how the model classifies it.

In [None]:
rec = miniRecorder(seconds=1.5)
_ = rec.record()

In [None]:
binary_classifier.predict(sound_clip=rec.sound)

In [None]:
rec.playback()

### Live Predictions
Let us use the live spectrogram to visualize the microphone input continuously and get running predictions from the model.
What happens?
- Does your binary classifier work?
- Does the model predict one of the classes even when there is silence / background noise? Why? Do you have any ideas how to mitigate this?
- What happens if you produce sound from both classes at the same time? What should ideally happen? 
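
A small numerical sketch may help with the last two questions. It compares a softmax output, where class probabilities must sum to 1, with independent sigmoid outputs, where each class is scored on its own. Which output layer `SoundClassifier` actually uses is not stated here, and the logit values below are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

logits_both = np.array([3.0, 3.0])        # hypothetical scores: both sounds present
logits_silence = np.array([-3.0, -3.0])   # hypothetical scores: neither sound present

# Softmax cannot tell these two situations apart
print(softmax(logits_both), softmax(logits_silence))
# Independent sigmoids can: high for both classes vs. low for both classes
print(sigmoid(logits_both), sigmoid(logits_silence))
```

With a softmax output, "both" and "neither" look the same, which is one motivation for a multilabel classifier with one independent output per class.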

In [None]:
run_livepred(predictor=binary_classifier)