# Intelligent Audio and Music Analysis Assignment 7

This assignment accounts for the last 50 points of the third and final assignment block (100 points in total).

This assignment is mainly **free form**; the goal is to apply what has been practiced so far. When implementing assignment 7, it is best to follow the code structure of previous assignments and reuse as much code as possible (this makes it easier for us to review). You may use any libraries, but we recommend the ones we have used so far: madmom, librosa, PyTorch, etc.


### GPU Support
Unfortunately, our JupyterHub does not yet provide GPU support. This assignment can still be run as-is on JupyterHub, but training the neural network will take a long time.

To speed up training, you can run this notebook on any local machine with a GPU and CUDA support, or use infrastructure like [Google Colab](https://colab.research.google.com/) and Google Drive if you have a Google account.

Afterwards, upload your solved notebook and any other necessary files, such as the saved model file, back to JupyterHub for your submission.
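
As a minimal sketch of how the same notebook code can run both with and without a GPU, PyTorch can select the device at runtime. The names `device` and `batch`, as well as the tensor shape, are illustrative and not part of the assignment template.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors (and models) have to be moved to the selected device explicitly.
batch = torch.zeros(4, 1, 40, 100).to(device)
print(f'Using device: {device}, batch lives on: {batch.device}')
```

On JupyterHub this falls back to the CPU, so the code itself does not need to change between environments.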

In [1]:
import os
# This code block enables this notebook to run on google colab.
try:
    from google.colab import drive
    print('Running in colab...\n===================')
    COLAB = True
    !pip install madmom torch==1.4.0 torchvision==0.5.0 librosa --upgrade
    print('Installed dependencies!\n=======================')

    if not os.path.exists('data'):
        print('Downloading data...\n===================')
        !mkdir data
        # Use %cd (not !cd): every "!" command runs in its own subshell,
        # so a directory change made with "!cd" would not persist.
        %cd data
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.1.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.2.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.3.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.4.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.5.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.6.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.7.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.audio.8.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.doc.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.error.zip?download=1
        !wget https://zenodo.org/record/45739/files/TUT-acoustic-scenes-2016-development.meta.zip?download=1

        !wget https://zenodo.org/record/165995/files/TUT-acoustic-scenes-2016-evaluation.audio.1.zip?download=1
        !wget https://zenodo.org/record/165995/files/TUT-acoustic-scenes-2016-evaluation.audio.2.zip?download=1
        !wget https://zenodo.org/record/165995/files/TUT-acoustic-scenes-2016-evaluation.audio.3.zip?download=1
        !wget https://zenodo.org/record/165995/files/TUT-acoustic-scenes-2016-evaluation.doc.zip?download=1
        !wget https://zenodo.org/record/165995/files/TUT-acoustic-scenes-2016-evaluation.meta.zip?download=1

        # Strip the "?download=1" suffix that wget keeps in the file names.
        !for file in *.*; do mv $file ${file%?download=1}; done

        !unzip "*.zip"
        !rm *.zip
        %cd ..

    print('===================\nMake sure you activated GPU support: Edit->Notebook settings->Hardware acceleration->GPU\n==================')
except ImportError:
    print('=======================\nNOT running in colab...\n=======================')
    COLAB = False

NOT running in colab...


## Audio Scene Classification

Your task is to implement a solution to an auditory scene classification challenge, specifically the DCASE 2016 Acoustic Scene Classification task. Details about the challenge are provided on the [task website](http://dcase.community/challenge2016/task-acoustic-scene-classification).
1. You are **free in choosing the strategy that you apply** and can also reuse and modify your implementation of previous assignments, e.g., by modifying the architecture to handle 30-second clips.
2. **Follow the given evaluation strategies of the task**, in particular with respect to the development and evaluation datasets and the cross-validation settings.
3. Consider **reducing the amount of data** in a reasonable way, if necessary.
4. **Compare your results** to the numbers reported on the task website and comment on your main findings.
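
One possible way to handle 30-second clips with a network designed for shorter inputs is to cut each clip's feature matrix into fixed-length segments. The following is only a sketch under assumed shapes (a `(frames, bins)` array at 100 frames per second); `segment_features`, `segment_len`, and `hop` are hypothetical names, not part of the assignment.

```python
import numpy as np

def segment_features(features, segment_len, hop):
    """Cut a (frames, bins) feature matrix into overlapping segments.

    Returns an array of shape (n_segments, segment_len, bins). Trailing
    frames that do not fill a whole segment are dropped.
    """
    n_frames = features.shape[0]
    if n_frames < segment_len:
        raise ValueError('clip is shorter than one segment')
    starts = range(0, n_frames - segment_len + 1, hop)
    return np.stack([features[s:s + segment_len] for s in starts])

# Example: 3000 frames (30 s at an assumed 100 frames per second), 40 bins.
clip = np.random.rand(3000, 40)
segments = segment_features(clip, segment_len=500, hop=250)
print(segments.shape)  # (11, 500, 40)
```

A larger `hop` yields fewer segments per clip, which is one simple knob for reducing the amount of data.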

Remark: The goal is not to outperform the state of the art, but to experiment with a classification task in the general audio domain. Therefore, you can apply your existing solutions from the music domain and reflect upon the capabilities and limitations of your approach.

The overall goal of this assignment is to implement the method in an elegant way and present your implementation in this notebook:
1. **Illustrate your chosen architecture**, e.g., by printing the individual layers and the output shapes of the forward function if you choose a neural network approach (as we have done in previous assignments).
2. **Use plots** to showcase features and evaluation results.
3. Output your **final performance** and set it into context.

The rough distribution of points is as follows:
* 10 Points data preprocessing and data handling
* 10 Points machine learning architecture (e.g. neural network and data loader)
* 10 Points training method and evaluation
* 10 Points results and conclusion
* 10 Points overall presentation throughout the notebook


## Task 1: Data Processing (10 Points)

If you work on JupyterHub, you can find the audio files in the shared folder as indicated in the cell below.
Think about **reasonable features** to use and extract them for the audio files.
The DCASE dataset is already split into **a development and an evaluation** set. The idea is to use the evaluation set only **once**, at the very end, when you are confident about your trained system.
Draw your train/valid/test splits from the development set only.
The dataset comes with **predefined splits** for four-fold cross-validation. Feel free to use your own training setup, but read and **follow the guidelines** that come in the documentation of the dataset!
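
As a hedged sketch of how a predefined split could be read, the snippet below parses tab-separated lines of the form `<audio path><TAB><scene label>`. This line format is an assumption that should be checked against the dataset's readme files; `parse_fold_lines` and the example file names are hypothetical.

```python
import csv
import io

def parse_fold_lines(lines):
    """Parse tab-separated '<audio path>\t<scene label>' lines into pairs."""
    reader = csv.reader(lines, delimiter='\t')
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

# Made-up excerpt standing in for one fold's training list.
example = io.StringIO('audio/example_1.wav\tpark\n'
                      'audio/example_2.wav\tbeach\n')
train_items = parse_fold_lines(example)

# Map scene labels to integer class indices in a reproducible order.
labels = sorted({label for _, label in train_items})
label_to_index = {label: i for i, label in enumerate(labels)}
print(train_items[0], label_to_index)
```

With the real dataset, the same function could be given an open file handle for a split file inside the split definition folder instead of the in-memory example.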

**Note**: Check the readme files in the dataset folder for more details!

In [2]:
import os
import numpy as np

# get dataset path
dataset_path = os.path.join(os.environ['HOME'], 'shared', 'data', 'assignment_7')
if os.path.exists('data'):
    dataset_path = 'data'

development_path = os.path.join(dataset_path, 'TUT-acoustic-scenes-2016-development')
evaluation_path = os.path.join(dataset_path, 'TUT-acoustic-scenes-2016-evaluation')

development_audio_path = os.path.join(development_path, 'audio')
development_annotation_file = os.path.join(development_path, 'meta.txt')
development_error_file = os.path.join(development_path, 'error.txt')
split_definition_path = os.path.join(development_path, 'evaluation_setup')

evaluation_annotation_file = os.path.join(evaluation_path, 'meta.txt')
evaluation_audio_path = os.path.join(evaluation_path, 'audio')

# collect list of audio files:
development_audio_files = [af for af in os.listdir(development_audio_path) if af.endswith('.wav')]
evaluation_audio_files = [af for af in os.listdir(evaluation_audio_path) if af.endswith('.wav')]

dev_audio_total_count = len(development_audio_files)
eval_audio_total_count = len(evaluation_audio_files)

print(f'Total number of development audio files: {dev_audio_total_count}')
print(f'Total number of evaluation audio files: {eval_audio_total_count}')

Total number of development audio files: 1170
Total number of evaluation audio files: 390
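
As one example of a simple feature, the sketch below computes a log-magnitude spectrogram with plain NumPy on a synthetic signal. Frame size, hop size, and sample rate are assumptions chosen for illustration, and `log_spectrogram` is a hypothetical helper; the feature extraction tools used in earlier assignments could serve the same purpose.

```python
import numpy as np

def log_spectrogram(signal, frame_size=2048, hop_size=1024):
    """Return a (frames, bins) log-magnitude spectrogram of a mono signal."""
    if len(signal) < frame_size:
        raise ValueError('signal is shorter than one frame')
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    window = np.hanning(frame_size)
    frames = np.stack([signal[i * hop_size:i * hop_size + frame_size] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)  # log(1 + x) avoids log(0)

# Synthetic 1-second, 440 Hz sine at an assumed sample rate of 44100 Hz.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (42, 1025)
```

The energy of the sine concentrates around the frequency bin closest to 440 Hz, which makes this a quick sanity check before processing real recordings.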


### 1.1 Implementation

In [None]:
# Put your data handling code here. 
# You can add additional cells below this one for structuring the notebook.
# Feel free to add markdown cells / plots / tests / etc. if it helps your presentation.

### BEGIN SOLUTION
print('this is the reference implementation')
### END SOLUTION

### 1.2 Discussion

Write down the choices you made regarding data structuring and feature extraction. Feel free to refer to code/plots/etc. in the cells above.

YOUR ANSWER HERE

## Task 2: Machine Learning Approach (10 Points)

Implement your audio scene classification method here. You are free to use any approach you find appropriate. As a hint: the easiest way to succeed is to adapt the neural network approach from assignment 6 (or 5), since convolutional neural networks have been shown to work very well for this task, and you can start with a running code base.
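
To illustrate what printing the layers and shapes of a network can look like, here is a small convolutional network sketch in PyTorch. The architecture, the class name `SceneCNN`, and the input shape `(batch, 1, bins, frames)` are assumptions for illustration only and not the reference solution; the 15 output units correspond to the 15 scene classes of the task.

```python
import torch
import torch.nn as nn

class SceneCNN(nn.Module):
    """Hypothetical small CNN for (batch, 1, bins, frames) spectrogram input."""

    def __init__(self, n_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Global average pooling makes the classifier independent of clip length.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.classifier(x)

model = SceneCNN()
print(model)

# Print the output shape after each feature layer for a dummy input.
x = torch.zeros(2, 1, 40, 500)
for layer in model.features:
    x = layer(x)
    print(f'{layer.__class__.__name__:<10} -> {tuple(x.shape)}')

logits = model(torch.zeros(2, 1, 40, 500))
print('logits', tuple(logits.shape))
```

Because of the global pooling layer, the same model accepts inputs with a different number of frames, which is one way to deal with longer clips.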

### 2.1 Implementation

In [None]:
# Implement your machine learning architecture in the cells below. 
# You can add additional cells below this one for structuring the notebook.
# Feel free to add markdown cells / plots / tests / etc. if it helps your presentation.

### BEGIN SOLUTION
print('this is the reference implementation')
### END SOLUTION

### 2.2 Discussion
Write down your choices and findings. Feel free to refer to code/plots/etc. in cells above.

In [None]:
YOUR ANSWER HERE

## Task 3: Training, Inference, and Evaluation (10 Points)

Depending on your choices for the machine learning model, implement the appropriate code to train and test it.
For developing and training the model only use the development set. 
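
If the model classifies short segments rather than whole clips, the segment-level outputs have to be combined into one decision per audio file. Averaging the class probabilities per clip is one common option, sketched here with NumPy; `aggregate_clip_predictions` and the toy data are hypothetical.

```python
import numpy as np

def aggregate_clip_predictions(segment_probs, clip_ids):
    """Average segment-level class probabilities per clip and pick the argmax.

    segment_probs: (n_segments, n_classes) array of probabilities.
    clip_ids: length n_segments sequence naming the clip of each segment.
    Returns a dict mapping clip id to predicted class index.
    """
    clip_ids = np.asarray(clip_ids)
    predictions = {}
    for clip in np.unique(clip_ids):
        mean_probs = segment_probs[clip_ids == clip].mean(axis=0)
        predictions[str(clip)] = int(mean_probs.argmax())
    return predictions

probs = np.array([[0.6, 0.4], [0.2, 0.8], [0.1, 0.9],   # segments of clip 'a'
                  [0.7, 0.3], [0.6, 0.4]])              # segments of clip 'b'
clip_predictions = aggregate_clip_predictions(probs, ['a', 'a', 'a', 'b', 'b'])
print(clip_predictions)  # {'a': 1, 'b': 0}
```

Majority voting over segment decisions would be an alternative; averaging keeps the information about how confident each segment prediction was.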

### 3.1 Implementation

In [None]:
# Put your training and evaluation code in the cells below.
# You can add additional cells below this one for structuring the notebook.
# Feel free to add markdown cells / plots / tests / etc. if it helps your presentation.

### BEGIN SOLUTION
print('this is the reference implementation')
### END SOLUTION

### 3.2 Discussion
Write down your choices and findings. Feel free to refer to code/plots/etc. in cells above.

YOUR ANSWER HERE

## Task 4: Results and Conclusion (10 Points)

Use the code cells below to calculate the final performance of the developed approach on the evaluation part of the dataset. 

### 4.1 Implementation

In [None]:
# Put the code for evaluating on the evaluation dataset in these code cells.
# You can add additional cells below this one for structuring the notebook.
# Feel free to add markdown cells / plots / tests / etc. if it helps your presentation.

### BEGIN SOLUTION
print('this is the reference implementation')
### END SOLUTION

### 4.2 Discussion

Compare your performance to the results reported on the DCASE website, and discuss possible reasons for any differences.
Discuss your approach in the context of the other methods presented on the DCASE website.
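
For such a comparison, overall and class-wise accuracy can be derived from a confusion matrix. The sketch below uses NumPy and made-up labels, and assumes the numbers to compare against are classification accuracies; `confusion_matrix` here is a hypothetical helper written for this example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes as rows and predicted classes as columns."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Made-up clip-level labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()       # correct clips / all clips
class_accuracy = np.diag(cm) / cm.sum(axis=1)    # per-class recall
print(cm)
print(overall_accuracy, class_accuracy)
```

Plotting the matrix as a heat map shows which scenes are confused with each other, which can support the discussion of performance differences.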

YOUR ANSWER HERE

## Task 5: Overall Presentation (10 Points)

Make sure your notebook **clearly presents your chosen approach** to the problem solution. If necessary, revisit the individual tasks and **add plots, outputs, code comments**, etc. to clearly explain what is going on.

There is no need to overdo it (no endless prints or plots that bloat the notebook); less is sometimes more. As a goal, think of your peers in the lecture and make sure they could easily follow what is going on in the notebook. Exemplary plots with overall metrics are usually a nice compromise.