# <center>**Principles of Machine Learning**</center>
## <center>**Coursework 3 (advanced implementation)**</center>

**Table of Contents**

Declaration

1. Section 1: Author

2. Section 2: Problem Formulation
  * Section 2.1: Outline
  * Section 2.2: Sampling in Relation to Audio Files
  * Section 2.3: Melody vs. Harmony: Similarities and Differences
  * Section 2.4: Proposed Methodology

3. Section 3: Machine Learning Pipeline
  * Section 3.1: Loading the Data
  * Section 3.2: Data Preparation

4. Section 4: Transformation Stage 
  * Section 4.1: Feature Extraction

5. Section 5: Modelling

6. Section 6: Methodology
  * Section 6.1: Model Fitting, Training and Validation
  * Section 6.2: Testing
    * Section 6.2.1: Class 1
    * Section 6.2.2: Class 0

7. Section 7: Dataset

8. Section 8: Results

9. Section 9: Conclusions

**Declaration:** Some of the code used in this assignment has been adapted and customized from www.docs.python.org/, www.matplotlib.org/stable/, www.pandas.pydata.org/docs, www.stackoverflow.com/questions/, www.geeksforgeeks.org/, www.kite.com/python/, www.codegrepper.com/, www.stats.stackexchange.com/questions/, www.machinelearningmind.com/, www.kaggle.com/, www.scikit-learn.org, www.towardsdatascience.com/, www.github.com/, www.librosa.org/blog/2019/07/17/resample-on-load/#resample-on-load/, www.librosa.org/doc/main/generated/librosa.feature.mfcc/, www.librosa.org/doc/main/generated/librosa.stft/, ML Foundations with Laurence Moroney at https://www.youtube.com/playlist?list=PLOU2XLYxmsII9mzQ-Xxug4l2o04JBrkLV, and Principles of Machine Learning Lab, Tutorial and Lecture Notes.<br> 

Non-coding information on "melodies" and "harmonies" is learned, surmised, summarised, and quoted from www.masterclass.com/articles/melody-vs-harmony-similarities-and-differences-with-musical-examples/, www.en.wikipedia.org/wiki/Melody/, and www.en.wikipedia.org/wiki/Harmony/.

### **Section 1: Author**<br> 
**Student Name**: Kweku Esuon Acquaye<br> 

### **Section 2: Problem Formulation**<br> 
This report uses modern data science methods to analyse audio files from the MLEnd Hums and Whistles dataset, and to build a machine learning pipeline that takes as input an audio segment from the dataset and predicts its classification as either a melody or a harmony. It constitutes Coursework 3 in fulfilment of the requirements of the Principles of Machine Learning module.<br> 

#### **Section 2.1: Outline**<br>
The MLEndHW dataset consists of participant-submitted 15-second humming and whistling recordings of fragments of 8 different movie songs. Each participant submitted 2 humming and 2 whistling renditions per song (32 per participant), along with their demographic data. Demographic data is currently unavailable for this task.<br> 

With 210 participants there are 210 x 4 x 8 = 6720 audio files, anonymised with sample numbers, e.g. S12. The task in this notebook is to<br>
1. understand data
2. decide which data to extract and how to represent it
3. inspect a few recordings for anything unexpected
4. decide what can be automated and what needs to be done manually
5. figure out how to automate what needs to be automated
6. formulate a machine learning problem that can be attempted using the MLEndHW dataset and build a solution model.

#### **Section 2.2: _Sampling_ in Relation to Audio Files**<br> 
The term 'sample' has two distinct meanings in this report, which can cause confusion. In machine learning, each item in a dataset (each row of a dataframe) is referred to as a 'sample'. In audio processing, a 'sample' is a single discrete measurement of the amplitude of the continuous sound wave, and the 'sample rate' (or sampling frequency) is the number of such measurements captured per second, measured in Hertz (Hz). Basic information on this distinction is obtained from, among other sources, https://techterms.com/definition/sampling, https://techterms.com/definition/sample_rate, https://www.vocitec.com/docs-tools/blog/sampling-rates-sample-depths-and-bit-rates-basic-audio-concepts, https://en.wikipedia.org/wiki/Sampling_(signal_processing).<br>  
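To make the two meanings concrete, the short sketch below counts how many audio samples a single machine learning sample (one 15-second recording) contains. The helper name `n_audio_samples` and the two sample rates are illustrative assumptions: 22050 Hz is the rate librosa resamples to by default, and 44100 Hz is a common recording rate; the actual rates of the MLEndHW files are not stated here.

```python
def n_audio_samples(duration_s, sample_rate_hz):
    """Number of discrete amplitude values in a mono recording."""
    return int(round(duration_s * sample_rate_hz))

# One ML sample (one 15-second recording) holds many audio samples.
for rate in (22050, 44100):
    print(rate, "Hz ->", n_audio_samples(15, rate), "audio samples")
# 22050 Hz -> 330750 audio samples
# 44100 Hz -> 661500 audio samples
```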


#### **Section 2.3: Melody vs. Harmony: Similarities and Differences**<br> 
Music has 3 primary elements: melody, harmony, and rhythm. Lyrics constitute a 4th element when there is singing. Melody and harmony, which work in tandem but are distinct from one another, are based on the arrangement of pitches.<br> 

A melody is a collection of musical tones that are grouped together as a single entity$^1$. Most compositions consist of multiple melodies working in conjunction with one another. A melody has 2 primary components: pitch and duration. Pitch is the actual audio vibration produced by an instrument, voice, hum, whistle or other expression of the music$^2$. It is essentially the frequency of the sound wave. Duration is the time that each pitch lasts and is divided into lengths of whole notes, half notes, quarter-note triplets, etc.<br> 

A harmony is the result of the amalgamation of musical notes to form a cohesive whole$^3$. It is typically analysed as a series of chords. Harmonies are referred to as the vertical aspect of music, as opposed to melodies, which are referred to as horizontal$^4$. Although harmonies consist of simultaneously occurring frequencies, pitches, tones, notes, or chords, both melodic and harmonic outputs can be decomposed into constituent pitches, frequencies, power, and other properties of music. It is these properties that are used to classify the dataset into melodies and harmonies.

#### **Section 2.4: Proposed Methodology**<br> 
In this task the methodology is to divide the 8 songs of the dataset into melodic and harmonic classes. Each class is then divided 70% and 30%, the former for training and the latter for testing. A pipeline is then built to train a neural network model that takes as input an audio segment of a hum and outputs a prediction of its melodic or harmonic class.<br> 

Due to the disparate acoustic properties of hums and whistles, it is decided to use hums only to obtain better model accuracy in this task. Hums from all 16 parts of the dataset would be combined into one dataset to train the model.
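The selection and split described above can be sketched as follows. This is a minimal illustration, assuming file names of the form `<participant>_<type>_<number>_<song>.wav`; the helpers `select_hums` and `split_70_30` and the example file names are hypothetical and are not part of the notebook's pipeline.

```python
import random

def select_hums(paths):
    """Keep only files whose second underscore-separated field is 'hum'."""
    return [p for p in paths if p.split('/')[-1].split('_')[1] == 'hum']

def split_70_30(items, seed=0):
    """Shuffle reproducibly, then cut at 70% for training and 30% for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(0.7 * len(items))
    return items[:cut], items[cut:]

# Hypothetical file names, for illustration only
paths = [f"S{i}_hum_1_Rain.wav" for i in range(10)]
paths += [f"S{i}_whistle_1_Rain.wav" for i in range(5)]

hums = select_hums(paths)
train, test = split_70_30(hums)
print(len(hums), len(train), len(test))
```

Fixing the seed keeps the split reproducible between runs, which makes later comparisons of model accuracy easier to interpret.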

### **Section 3: Machine Learning Pipeline**<br> 
The following steps constitute the machine learning pipeline built to achieve the purpose of this task:<br> 

#### **Section 3.1: Loading the Data**<br> 
The following cell imports the necessary dependencies and mounts the drive (i.e. makes the drive directly available to Colab) where the original audio data files are stored.

In [None]:
# Importing libraries
from google.colab import drive

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os, sys, re, pickle
import urllib.request
import zipfile
from glob import glob  # glob() lists the files matching a path pattern

import IPython.display as ipd
from tqdm import tqdm
import librosa

# Mounting Google Drive 
drive.mount('/content/drive')

Mounted at /content/drive


The following function is defined to download MLEndHW dataset files:

In [None]:
# Creating function
def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())

print("Zip download function created.")

Zip download function created.
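For large archives it can help to copy the response to disk in chunks rather than reading it into memory in one call. The variant below is a sketch under that assumption; `download_url_streamed` and its `chunk_size` parameter are hypothetical names, and the function is not used elsewhere in this notebook.

```python
import shutil
import urllib.request

def download_url_streamed(url, save_path, chunk_size=1024 * 1024):
    """Copy the response to disk chunk by chunk instead of reading it all at once."""
    with urllib.request.urlopen(url) as response, open(save_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return save_path
```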


Files of the MLEndHW dataset are found to be no longer available at the original location, hence a copy of the dataset made available on OneDrive and shared on the student forum by a colleague is downloaded to Google Drive.<br> 

The next step extracts the outer archive, which itself contains one zip file per song part:

In [None]:
# Extracting files at 1st level
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/Data.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

print("First level extraction completed.")

First level extraction completed.


The next cell checks the presence of extracted zip files:

In [None]:
# Listing files of 1st level extraction
import os
path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data'
os.listdir(path)

['Frozen_1.zip',
 'Frozen_2.zip',
 'Hakuna_1.zip',
 'Hakuna_2.zip',
 'Mamma_1.zip',
 'Mamma_2.zip',
 'Panther_1.zip',
 'Panther_2.zip',
 'Potter_1.zip',
 'Potter_2.zip',
 'Rain_1.zip',
 'Rain_2.zip',
 'Showman_1.zip',
 'Showman_2.zip',
 'StarWars_1.zip',
 'StarWars_2.zip']

In the next few steps, audio files are extracted into named folders (there is reassignment of variable names):

In [None]:
# Extracting Panther1 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/panther1/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Panther_1.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/panther1/*.wav'
files1 = glob(sample_path)
print("There are", len(files1), "audio files in panther1 folder.")

There are 208 audio files in panther1 folder.


In [None]:
# Extracting Panther2 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/panther2/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Panther_2.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/panther2/*.wav'
files2 = glob(sample_path)
print("There are", len(files2), "audio files in panther2 folder.")

There are 205 audio files in panther2 folder.


In [None]:
# Extracting Rain1 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/rain1/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Rain_1.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/rain1/*.wav'
files3 = glob(sample_path)
print("There are", len(files3), "audio files in rain1 folder.")

There are 208 audio files in rain1 folder.


In [None]:
# Extracting Rain2 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/rain2/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Rain_2.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/rain2/*.wav'
files4 = glob(sample_path)
print("There are", len(files4), "audio files in rain2 folder.")

There are 205 audio files in rain2 folder.


In [None]:
# Extracting Hakuna1 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/hakuna1/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Hakuna_1.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/hakuna1/*.wav'
files5 = glob(sample_path)
print("There are", len(files5), "audio files in hakuna1 folder.")

There are 213 audio files in hakuna1 folder.


In [None]:
# Extracting Hakuna2 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/hakuna2/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Hakuna_2.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/hakuna2/*.wav'
files6 = glob(sample_path)
print("There are", len(files6), "audio files in hakuna2 folder.")

There are 199 audio files in hakuna2 folder.


In [None]:
# Extracting Mamma1 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/mamma1/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Mamma_1.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/mamma1/*.wav'
files7 = glob(sample_path)
print("There are", len(files7), "audio files in mamma1 folder.")

There are 217 audio files in mamma1 folder.


In [None]:
# Extracting Mamma2 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/mamma2/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Mamma_2.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/mamma2/*.wav'
files8 = glob(sample_path)
print("There are", len(files8), "audio files in mamma2 folder.")

There are 195 audio files in mamma2 folder.


In [None]:
# Extracting Showman1 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/showman1/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Showman_1.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/showman1/*.wav'
files9 = glob(sample_path)
print("There are", len(files9), "audio files in showman1 folder.")

There are 207 audio files in showman1 folder.


In [None]:
# Extracting Showman2 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/showman2/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Showman_2.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/showman2/*.wav'
files10 = glob(sample_path)
print("There are", len(files10), "audio files in showman2 folder.")

There are 203 audio files in showman2 folder.


In [None]:
# Extracting Frozen1 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/frozen1/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Frozen_1.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/frozen1/*.wav'
files11 = glob(sample_path)
print("There are", len(files11), "audio files in frozen1 folder.")

There are 200 audio files in frozen1 folder.


In [None]:
# Extracting Frozen2 files
directory_to_extract_to = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/frozen2/'
zip_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data/Frozen_2.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# Counting files
sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/all_samples/frozen2/*.wav'
files12 = glob(sample_path)
print("There are", len(files12), "audio files in frozen2 folder.")

There are 210 audio files in frozen2 folder.
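The twelve extraction cells above repeat the same two steps (extract an archive, then count the `.wav` files) with different names. A loop could express this more compactly. The sketch below is one possible way to do so; `extract_and_count` and `extract_all` are hypothetical helpers, and the folder naming rule (lower case, underscore removed) is inferred from the folder names used above.

```python
import os
import zipfile
from glob import glob

def extract_and_count(zip_path, out_dir):
    """Extract one archive into out_dir and return the number of .wav files found there."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(out_dir)
    return len(glob(os.path.join(out_dir, "*.wav")))

def extract_all(data_dir, target_root, parts):
    """Extract '<part>.zip' for every part and return {folder name: .wav count}."""
    counts = {}
    for part in parts:
        folder = part.replace("_", "").lower()   # e.g. 'Panther_1' -> 'panther1'
        zip_path = os.path.join(data_dir, part + ".zip")
        counts[folder] = extract_and_count(zip_path, os.path.join(target_root, folder))
    return counts

# Example call (paths as used in this notebook):
# extract_all('/content/drive/MyDrive/Data2/MLEndHW2/all_samples/Data',
#             '/content/drive/MyDrive/Data2/MLEndHW2/all_samples',
#             ['Panther_1', 'Panther_2', 'Rain_1', 'Rain_2'])
```

Returning the counts as a dictionary keeps the per-folder totals available for later checks instead of only printing them.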


#### **Section 3.2: Data Preparation**

To proceed further, a determination is made as to which tunes are melodies and which are harmonies. Using the original YouTube music tunes and the information summarised in Section 2.3 above, the following demarcation is made:<br> 

**Melodies** = Potter, Rain, Mamma, and Frozen.<br> 
**Harmonies** = StarWars, Panther, Hakuna, and Showman.<br> 

Although Mamma qualifies as both melody and harmony, it was judged to be marginally more melodic than harmonic and assigned as a melody.<br> 

All audio files, including preprocessed Potter and StarWars files from the earlier basic implementation, are then manually transferred to one of two folders labelled "melodies_train" or "harmonies_train". A total of 76 files are found to be unloadable due to errors in their formatting and are deleted. 120 files from each folder are then transferred to corresponding "melodies_test" and "harmonies_test" folders.



In [None]:
# Loading melodies training and validation data
melodies_sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/melodies_train/*.wav'
files = glob(melodies_sample_path)
len(files)

1432

In [None]:
# Extracting melodies info from filenames
melody_table = [] 

for file in files:
  file_name = file.split('/')[-1]
  try:
    # Expected pattern: <participant>_<type>_<number>_<song>.wav
    participant_ID, interpretation_type, interpretation_number, song = file_name.split('_')[:4]
    song = song.split('.')[0]
    melody_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])
  except ValueError:
    # Fewer than four underscore-separated fields: report the file and skip it
    print(file_name)
    
melody_table[:5]

[['S73_hum_1_Rain.wav', 'S73', 'hum', '1', 'Rain'],
 ['S73_hum_4_Rain.wav', 'S73', 'hum', '4', 'Rain'],
 ['S74_hum_1_Rain.wav', 'S74', 'hum', '1', 'Rain'],
 ['S74_hum_4_Rain.wav', 'S74', 'hum', '4', 'Rain'],
 ['S75_hum_1_Rain.wav', 'S75', 'hum', '1', 'Rain']]

In [None]:
# Creating melodies dataframe of training files
melody_df = pd.DataFrame(melody_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
melody_df.head(5)

| file_id | participant | interpretation | number | song |
|---|---|---|---|---|
| S73_hum_1_Rain.wav | S73 | hum | 1 | Rain |
| S73_hum_4_Rain.wav | S73 | hum | 4 | Rain |
| S74_hum_1_Rain.wav | S74 | hum | 1 | Rain |
| S74_hum_4_Rain.wav | S74 | hum | 4 | Rain |
| S75_hum_1_Rain.wav | S75 | hum | 1 | Rain |


In [None]:
# Appending classification column
melody_df['classfcn'] = 'melody'
melody_df.head(5)

| file_id | participant | interpretation | number | song | classfcn |
|---|---|---|---|---|---|
| S73_hum_1_Rain.wav | S73 | hum | 1 | Rain | melody |
| S73_hum_4_Rain.wav | S73 | hum | 4 | Rain | melody |
| S74_hum_1_Rain.wav | S74 | hum | 1 | Rain | melody |
| S74_hum_4_Rain.wav | S74 | hum | 4 | Rain | melody |
| S75_hum_1_Rain.wav | S75 | hum | 1 | Rain | melody |


In [None]:
# Loading harmonies training and validation data
harmonies_sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/harmonies_train/*.wav'
files = glob(harmonies_sample_path)
len(files)

1439

In [None]:
# Extracting harmonies info from filenames
harmony_table = [] 

for file in files:
  file_name = file.split('/')[-1]
  try:
    # Expected pattern: <participant>_<type>_<number>_<song>.wav
    participant_ID, interpretation_type, interpretation_number, song = file_name.split('_')[:4]
    song = song.split('.')[0]
    harmony_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])
  except ValueError:
    # Fewer than four underscore-separated fields: report the file and skip it
    print(file_name)
    
harmony_table[:5]

[['S72_whistle_1_Panther.wav', 'S72', 'whistle', '1', 'Panther'],
 ['S73_hum_2_Panther.wav', 'S73', 'hum', '2', 'Panther'],
 ['S73_hum_4_Panther.wav', 'S73', 'hum', '4', 'Panther'],
 ['S74_hum_2_Panther.wav', 'S74', 'hum', '2', 'Panther'],
 ['S74_hum_4_Panther.wav', 'S74', 'hum', '4', 'Panther']]

In [None]:
# Creating harmonies dataframe of training files
harmony_df = pd.DataFrame(harmony_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
harmony_df.head(5)

| file_id | participant | interpretation | number | song |
|---|---|---|---|---|
| S72_whistle_1_Panther.wav | S72 | whistle | 1 | Panther |
| S73_hum_2_Panther.wav | S73 | hum | 2 | Panther |
| S73_hum_4_Panther.wav | S73 | hum | 4 | Panther |
| S74_hum_2_Panther.wav | S74 | hum | 2 | Panther |
| S74_hum_4_Panther.wav | S74 | hum | 4 | Panther |


In [None]:
# Appending classification column
harmony_df['classfcn'] = 'harmony'
harmony_df.head(5)

| file_id | participant | interpretation | number | song | classfcn |
|---|---|---|---|---|---|
| S72_whistle_1_Panther.wav | S72 | whistle | 1 | Panther | harmony |
| S73_hum_2_Panther.wav | S73 | hum | 2 | Panther | harmony |
| S73_hum_4_Panther.wav | S73 | hum | 4 | Panther | harmony |
| S74_hum_2_Panther.wav | S74 | hum | 2 | Panther | harmony |
| S74_hum_4_Panther.wav | S74 | hum | 4 | Panther | harmony |


With classification assigned, all audio files are transferred to a single folder labelled "combined_train". The associated dataframes are concatenated row-wise to form a single dataframe "combined_df".

In [None]:
# Concatenating dataframes
frames = [melody_df, harmony_df]
combined_df = pd.concat(frames)

print("Combined dataframe has shape:", combined_df.shape)
print("\nThe confluence of concatenation is:\n")
combined_df[1429:1439]

Combined dataframe has shape: (2871, 5)

The confluence of concatenation is:



| file_id | participant | interpretation | number | song | classfcn |
|---|---|---|---|---|---|
| S71_hum_4_Rain.wav | S71 | hum | 4 | Rain | melody |
| S72_hum_1_Rain.wav | S72 | hum | 1 | Rain | melody |
| S72_whistle_2_Rain.wav | S72 | whistle | 2 | Rain | melody |
| S72_whistle_1_Panther.wav | S72 | whistle | 1 | Panther | harmony |
| S73_hum_2_Panther.wav | S73 | hum | 2 | Panther | harmony |
| S73_hum_4_Panther.wav | S73 | hum | 4 | Panther | harmony |
| S74_hum_2_Panther.wav | S74 | hum | 2 | Panther | harmony |
| S74_hum_4_Panther.wav | S74 | hum | 4 | Panther | harmony |
| S75_hum_2_Panther.wav | S75 | hum | 2 | Panther | harmony |
| S75_whistle_1_Panther.wav | S75 | whistle | 1 | Panther | harmony |


Because the kernel repeatedly failed to finish executing, even after many hours, the approach is changed. The classification column in the dataframe does not affect the audio files themselves, so the notebook now works directly with the folder of combined files and assigns the class inside the function "getXy" using the statement<br> 

``yi = labels_file.loc[fileID]['song'] in ['Potter', 'Rain', 'Mamma', 'Frozen']``<br> 

instead of the separate melody and harmony folders used above.

In [None]:
# Resetting path for combined data
combined_sample_path = '/content/drive/MyDrive/Data2/MLEndHW2/combined_train/*.wav'
files = glob(combined_sample_path)
len(files)

2870

In [None]:
# Splitting files
for file in files:
  file.split('/')[-1]

print("File splitting completed.")

File splitting completed.


In [None]:
# Extracting combined info from filenames
combined_table = [] 

for file in files:
  file_name = file.split('/')[-1]
  try:
    # Expected pattern: <participant>_<type>_<number>_<song>.wav
    participant_ID, interpretation_type, interpretation_number, song = file_name.split('_')[:4]
    song = song.split('.')[0]
    combined_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])
  except ValueError:
    # Fewer than four underscore-separated fields: report the file and skip it
    print(file_name)
    
combined_table[:5]

[['S78_hum_2_Mamma.wav', 'S78', 'hum', '2', 'Mamma'],
 ['S78_whistle_2_Mamma.wav', 'S78', 'whistle', '2', 'Mamma'],
 ['S79_hum_1_[mamma].wav', 'S79', 'hum', '1', '[mamma]'],
 ['S79_hum_3_[Mamma].wav', 'S79', 'hum', '3', '[Mamma]'],
 ['S80_hum_2_Mamma.wav', 'S80', 'hum', '2', 'Mamma']]

In [None]:
# Creating combined dataframe of training files
combined_df = pd.DataFrame(combined_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
combined_df.head(5)

| file_id | participant | interpretation | number | song |
|---|---|---|---|---|
| S78_hum_2_Mamma.wav | S78 | hum | 2 | Mamma |
| S78_whistle_2_Mamma.wav | S78 | whistle | 2 | Mamma |
| S79_hum_1_[mamma].wav | S79 | hum | 1 | [mamma] |
| S79_hum_3_[Mamma].wav | S79 | hum | 3 | [Mamma] |
| S80_hum_2_Mamma.wav | S80 | hum | 2 | Mamma |
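The rows above include song fields such as `[mamma]` and `[Mamma]`, which would not match an exact comparison with `'Mamma'`. One way to handle this is to normalise the song field before it is used for labelling. The sketch below is an assumption about how that could be done; `CANONICAL` and `normalise_song` are hypothetical names.

```python
# Canonical spellings of the eight songs, keyed by their lower-case form
CANONICAL = {name.lower(): name for name in
             ('Potter', 'Rain', 'Mamma', 'Frozen',
              'StarWars', 'Panther', 'Hakuna', 'Showman')}

def normalise_song(raw):
    """Map variants such as '[mamma]' or '[Mamma]' to 'Mamma'; return None if unknown."""
    return CANONICAL.get(raw.strip('[] ').lower())

print(normalise_song('[mamma]'))   # Mamma
print(normalise_song('StarWars'))  # StarWars
print(normalise_song('unknown'))   # None
```

Applied to the dataframe, this could be written as `combined_df['song'] = combined_df['song'].map(normalise_song)`, after which unknown names appear as missing values and can be inspected by hand.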


### **Section 4: Transformation Stage**<br> 
In this analysis, 4 features are extracted from each audio file and assembled in an array described by a dataframe. These 4 features, used as input for the task model, are:<br> 

1.   Power.
2.   Pitch mean.
3.   Pitch standard deviation.
4.   Fraction of voiced region.

The extraction of these features requires first a Fourier transformation of the waveform into discrete frequencies over the range of the segment.<br> 
**Power** is "the sum of the absolute squares of a signal's time-domain samples divided by the signal length, or, equivalently, the square of its root mean square level"$^4$. From general knowledge that power is energy  divided by time taken, the power of a sound signal is basically its loudness, i.e. how quickly the signal's energy is delivered.<br> 
**Pitch mean**: Pitch is the property of sound that enables its judgement as low or high (and anywhere in between) on the musical scale, and is measured as frequency (in Hz). Pitch mean is the mean frequency in a segment.<br> 
**Pitch standard deviation** is the _spread_ of the main frequencies of the audio segment.<br> 
**Fraction of voiced region** is the proportion of the total audio segment time that actually contains a sound signal. Its converse is the silent regions, i.e. regions with no signal output (silence or just noise). The pattern of voiced and silent regions may contain discernible information and is thus used as a feature for audio analysis.<br> 
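The four features can be illustrated on a toy input. The sketch below uses only the standard library and mirrors the NaN-aware mean and standard deviation used later in this notebook. It assumes that unvoiced frames are marked as NaN in the pitch track, so the voiced fraction is the share of non-NaN frames. The function name `summarise` and the toy signal are hypothetical.

```python
import math
import statistics

def summarise(signal, f0):
    """Return [power, pitch_mean, pitch_std, voiced_fraction] for one recording.

    signal: amplitude values; f0: per-frame pitch in Hz, NaN for unvoiced frames.
    """
    power = sum(v * v for v in signal) / len(signal)
    voiced = [f for f in f0 if not math.isnan(f)]
    pitch_mean = statistics.fmean(voiced) if voiced else 0.0
    pitch_std = statistics.pstdev(voiced) if voiced else 0.0  # population std
    voiced_fraction = len(voiced) / len(f0)
    return [power, pitch_mean, pitch_std, voiced_fraction]

# Toy input: a unit-amplitude 100 Hz sine sampled at 8000 Hz for one second,
# and four pitch frames of which two are unvoiced.
signal = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
f0 = [100.0, float("nan"), 200.0, float("nan")]
features = summarise(signal, f0)
print([round(v, 3) for v in features])  # [0.5, 150.0, 50.0, 0.5]
```

The power of 0.5 is the expected value for a unit-amplitude sine, which gives a quick sanity check on the definition.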

#### **Section 4.1: Feature Extraction**
The next step defines a function for determining the pitch of an audio segment:

In [None]:
def getPitch(x,fs,winLen=0.02):
  p = winLen*fs
  frame_length = int(2**int(p-1).bit_length())
  hop_length = frame_length//2
  f0, voiced_flag, voiced_probs = librosa.pyin(y=x, fmin=80, fmax=450, sr=fs, frame_length=frame_length,hop_length=hop_length)
  return f0,voiced_flag

print("Pitch function created.")

Pitch function created.
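The frame length used by `getPitch` depends on the sampling rate: the 20 ms window is converted to a number of audio samples and then rounded up to a power of two. The sketch below reproduces only that arithmetic so the resulting frame and hop sizes can be inspected. `frame_and_hop` is a hypothetical helper and the two sampling rates are illustrative assumptions.

```python
def frame_and_hop(fs, win_len=0.02):
    """Round win_len * fs up to a power of two (same formula as getPitch); hop is half a frame."""
    p = win_len * fs
    frame_length = 2 ** int(p - 1).bit_length()
    return frame_length, frame_length // 2

for fs in (22050, 44100):
    frame_length, hop_length = frame_and_hop(fs)
    print(fs, frame_length, hop_length, round(1000 * frame_length / fs, 1), "ms")
# 22050 512 256 23.2 ms
# 44100 1024 512 23.2 ms
```

At both rates the effective window is about 23 ms, slightly longer than the nominal 20 ms because of the rounding.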


In [None]:
def getXy(files,labels_file, scale_audio=False, onlySingleDigit=False):
  melodies = {'potter', 'rain', 'mamma', 'frozen'}
  X,y =[],[]
  for file in tqdm(files):
    fileID = file.split('/')[-1]
    # A membership test gives one Boolean per file (True = melody, False = harmony);
    # brackets and case are normalised so that e.g. '[mamma]' is treated as 'Mamma'
    song = labels_file.loc[fileID]['song'].strip('[]').lower()
    yi = song in melodies # this establishes the classification

    fs = None # sr=None keeps the file's native sampling rate (librosa's default would be 22050)
    x, fs = librosa.load(file,sr=fs)
    if scale_audio: x = x/np.max(np.abs(x))
    f0, voiced_flag = getPitch(x,fs,winLen=0.02)
      
    power = np.sum(x**2)/len(x)
    pitch_mean = np.nanmean(f0) if np.mean(np.isnan(f0))<1 else 0
    pitch_std  = np.nanstd(f0) if np.mean(np.isnan(f0))<1 else 0
    voiced_fr = np.mean(voiced_flag)

    xi = [power,pitch_mean,pitch_std,voiced_fr]
    X.append(xi)
    y.append(yi)

  return np.array(X),np.array(y)

print("Feature extraction function created.")

Feature extraction function created.


### **Section 5: Modelling**<br> 

Classification is set such that 'Potter', 'Rain', 'Mamma', and 'Frozen' evaluate to 1 (True) and StarWars, Panther, Hakuna, and Showman to 0 (False) during definition of the above function.<br> 

A NumPy predictor array `X` and a binary label vector `y` are obtained next from the audio data, and their shapes are output as follows:

In [None]:
# Creating predictor array and label vector
X,y = getXy(files, labels_file=combined_df, scale_audio=True, onlySingleDigit=True)

100%|██████████| 2870/2870 [2:46:03<00:00,  3.47s/it]


In [None]:
# Outputting shapes and arrays
print('The shape of X is', X.shape) 
print('The shape of y is', y.shape)
print('The features matrix is', X)
print('The labels vector is', y)

The shape of X is (2870, 4)
The shape of y is (2870, 4)
The features matrix is [[3.29445416e-02 1.63847849e+02 2.33039344e+01 6.05075337e-01]
 [1.69751481e-02 4.10724167e+02 2.57669233e+01 5.53311793e-01]
 [5.38656931e-02 3.27666777e+02 4.29379334e+01 6.60246533e-01]
 ...
 [2.96142031e-02 2.13179964e+02 5.90148448e+01 8.71062271e-01]
 [3.11814236e-02 3.94748263e+02 2.79719036e+01 7.72241993e-01]
 [3.54416091e-02 1.44451579e+02 2.77358846e+01 7.48224152e-01]]
The labels vector is [['False' 'Rain' 'Mamma' 'Frozen']
 ['False' 'Rain' 'Mamma' 'Frozen']
 ['False' 'Rain' 'Mamma' 'Frozen']
 ...
 ['False' 'Rain' 'Mamma' 'Frozen']
 ['False' 'Rain' 'Mamma' 'Frozen']
 ['False' 'Rain' 'Mamma' 'Frozen']]


In [None]:
# Checking for class imbalance
print('The number of melodic recordings in the training dataset is', np.count_nonzero(y))
print('The number of harmonic recordings in the training dataset is', y.size - np.count_nonzero(y))

The number of melodic recordings in the training dataset is 11480
The number of harmonic recordings in the training dataset is 0
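The recorded output above shows a label array of shape (2870, 4) in which every row reads `['False' 'Rain' 'Mamma' 'Frozen']`, and a non-zero count of 11480 = 2870 x 4. That pattern is what Python produces when a comparison is followed by comma-separated strings, because the whole expression is parsed as a tuple. The sketch below contrasts that form with a membership test on example song names; `tuple_label` and `membership_label` are hypothetical helper names, not part of the notebook.

```python
MELODIES = ('Potter', 'Rain', 'Mamma', 'Frozen')

def tuple_label(song):
    # Parsed as ((song == 'Potter'), 'Rain', 'Mamma', 'Frozen'): a 4-tuple, not a Boolean
    return song == 'Potter', 'Rain', 'Mamma', 'Frozen'

def membership_label(song):
    # One Boolean per recording: True for a melody, False for a harmony
    return song in MELODIES

print(tuple_label('Panther'))       # (False, 'Rain', 'Mamma', 'Frozen')
print(membership_label('Panther'))  # False
print(membership_label('Rain'))     # True
```

With one Boolean per recording, `y` has shape (n,) and the two class counts add up to the number of files, which makes a class imbalance check meaningful.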


It is decided not to normalise the data before passing it to the model, on the judgement that this would benefit model training and give better test and deployment accuracy with less likelihood of overfitting.<br>

### **Section 6: Methodology**<br> 

#### **Section 6.1: Model Fitting, Training and Validation**<br> 

In the next few steps the data is split 75% and 25% respectively for training and validation, and a simple neural network model created:

In [None]:
from sklearn.model_selection import train_test_split

# Splitting data
X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.25)
X_train.shape, X_val.shape, y_train.shape, y_val.shape

((2152, 4), (718, 4), (2152, 4), (718, 4))

Following a review of the literature proportionate to the scope of this task, the model of choice for this notebook is a simple neural network (NN), as this method is widely employed with good success for audio analysis. With the nature of the data (i.e. features derived from time-series audio signals, 4 features, 2870 items) and the goal of the analysis in mind (i.e. distinguishing between melodies and harmonies, a binary classification), it is decided to build a sequential NN with an input layer consisting of 12 perceptrons, 3 hidden layers of 24:12:6 perceptrons, and an output layer of 1 perceptron. <br> 

The following model design is created:<br> 

**Sequential** : Defines a sequence of layers in the neural network.

**Flatten** : Turns each input into a 1-dimensional array. The 4-feature vectors used here are already 1-dimensional, so this layer leaves them unchanged.

**Dense** : Adds a fully connected layer of neurons.

**Activation function** : A nonlinearity that defines the output of each node. Each layer of neurons needs an activation function to determine what it passes on.

**Relu** : Rectified Linear Unit, a ramp function that keeps the positive part of its argument. It effectively means "if X>0 return X, else return 0", hence it only passes values of 0 or greater to the next layer in the model. It is computationally cheap and works well in hidden layers.

**Sigmoid** : A nonlinear activation function that maps any real-valued input to a value between 0 and 1, which can be read as the probability of the positive class. It is one of the recommended final-layer activation functions for binary classification.

**Optimizer**: In this case, the RMSprop optimization algorithm is preferred to plain stochastic gradient descent (SGD), because RMSprop adapts the step size for each parameter using a moving average of recent squared gradients.

**Loss function**: BinaryCrossentropy is chosen as the loss function because it quantifies the difference between the predicted probability and the true label (melody or harmony).
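The activation and loss functions described above can be written out in a few lines of plain Python. This is an illustrative sketch for single values, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

def relu(x):
    """Pass positive values through unchanged; map everything else to 0."""
    return x if x > 0 else 0.0

def sigmoid(x):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one example: y_true is 0 or 1, p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

print(relu(-2.0), relu(3.0))                      # 0.0 3.0
print(sigmoid(0.0))                               # 0.5
print(round(binary_cross_entropy(1, 0.5), 4))     # 0.6931
```

A confident correct prediction gives a loss near 0, while a confident wrong prediction is penalised heavily, which is what drives the output towards the true class during training.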

In [None]:
# Importing library
import tensorflow as tf
print(tf.__version__)

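# Defining callback function
class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    # 'accuracy' is only present when it is listed in the compiled metrics
    accuracy = (logs or {}).get('accuracy')
    if accuracy is not None and accuracy > 0.95:
      print("\nReached 95% accuracy so cancelling training!")
      self.model.stop_training = True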

callbacks = myCallback()
print("Callback function defined.")

2.7.0
Callback function defined.


In [None]:
# Passing np array data into tf Dataset
training_data = tf.data.Dataset.from_tensor_slices((X_train, y_train))
validation_data = tf.data.Dataset.from_tensor_slices((X_val, y_val))
print("Training and validation data passed, ready for loading.")

Training and validation data passed, ready for loading.


In [None]:
# Shuffling and batching data
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 50

training_data = training_data.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_data = validation_data.batch(BATCH_SIZE)

print("Dataset shuffled 50 at a time, and to transit in batches of 32.")

Dataset shuffled 50 at a time, and to transit in batches of 32.
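With 2152 training rows and a batch size of 32, one epoch consists of ceil(2152 / 32) = 68 batches, the last of which holds 8 rows. A shuffle buffer of 50 is much smaller than the dataset, so the shuffling is only local: an element can move at most about one buffer length towards the front. The pure-Python sketch below is a simplified imitation of a buffer-based shuffle, written to illustrate that effect; it is not the TensorFlow implementation, and `buffered_shuffle` and `n_batches` are hypothetical names.

```python
import math
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Draw each output element at random from a buffer refilled from the input stream."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        out.append(buffer[idx])          # emit a random buffered element...
        buffer[idx] = item               # ...and replace it with the new one
    rng.shuffle(buffer)                  # drain what is left at the end
    out.extend(buffer)
    return out

def n_batches(n_rows, batch_size):
    """Number of batches per epoch when the last batch may be partial."""
    return math.ceil(n_rows / batch_size)

order = buffered_shuffle(range(2152), buffer_size=50)
print(n_batches(2152, 32))   # 68
print(order[:10])            # early outputs can only come from the first 59 inputs
```

If the files were ordered by class or by song, a buffer this small would leave long runs of the same class in each batch; a buffer at least as large as the dataset gives a full shuffle.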


In [None]:
# Building NN model architecture
model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dense(12, activation='relu'),
                                    tf.keras.layers.Dense(24, activation='relu'),
                                    tf.keras.layers.Dense(12, activation='relu'),
                                    tf.keras.layers.Dense(6, activation='relu'), 
                                    tf.keras.layers.Dense(1, activation='sigmoid')])

print("Simple NN model building completed.")

Simple NN model building completed.
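
To give a sense of the size of this architecture, the number of trainable parameters in the stack of Dense layers can be counted by hand: each layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes a hypothetical input width `N_FEATURES`, since the true number of features depends on the feature extraction stage.

```python
def dense_parameter_count(n_inputs, layer_units):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += (n_inputs + 1) * units  # weights plus one bias per unit
        n_inputs = units                 # this layer feeds the next one
    return total

N_FEATURES = 4  # hypothetical number of input features
n_params = dense_parameter_count(N_FEATURES, [12, 24, 12, 6, 1])
print(n_params)  # 757
```

Once the model has been fitted, `model.summary()` reports the actual count for comparison.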


In [None]:
# Defining NN compiler
# The output layer already applies a sigmoid, so the loss receives probabilities, not logits
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])

print("Simple NN model compiler defined.")

Simple NN model compiler defined.


In [None]:
# Training the model
model.fit(training_data, epochs=30, callbacks=[callbacks])
print("Model training completed.")

In [None]:
# Validating the model
model.evaluate(validation_data)
print("Model validation completed.")

#### **Section 6.2: Testing**<br> 
The following steps test the model on held-out data to estimate how it would perform in real deployment.

In [None]:
# Loading combined test data
test_path = '/content/drive/MyDrive/Data2/MLEndHW2/combined_test/*.wav'
test_files = glob(test_path)
len(test_files)

In [None]:
# Splitting files
for file in test_files:
  file.split('/')[-1]

print("File splitting completed.")

In [None]:
# Extracting combined test info from filenames
test_table = [] 

for file in test_files:
  file_name = file.split('/')[-1]
  try:
    # Expected pattern: participant_interpretation_number_song.wav
    parts = file_name.split('_')
    participant_ID = parts[0]
    interpretation_type = parts[1]
    interpretation_number = parts[2]
    song = parts[3].split('.')[0]
    test_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])
  except IndexError:
    # Report files whose names do not follow the expected pattern
    print(file_name)
    
test_table[:5]

In [None]:
# Creating dataframe of test files
test_df = pd.DataFrame(test_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
test_df.head(5)

In [None]:
# Creating test arrays
X1_test,y1_test = getXy(test_files, labels_file=test_df, scale_audio=True, onlySingleDigit=True)

In [None]:
# Outputting test shapes and arrays
print('The shape of X1_test is', X1_test.shape) 
print('The shape of y1_test is', y1_test.shape)
print('The features matrix is', X1_test)
print('The labels vector is', y1_test)

In [None]:
# Checking for class imbalance
print('The number of melodic recordings in the testing dataset is', np.count_nonzero(y1_test))
print('The number of harmonic recordings in the testing dataset is', y1_test.size - np.count_nonzero(y1_test))

In [None]:
# Passing test array data into tf Dataset, batched in the same way as the training data
testing_data = tf.data.Dataset.from_tensor_slices((X1_test, y1_test)).batch(BATCH_SIZE)
print("Test data passed and batched, ready for loading.")

In [None]:
# Testing the model
test_classifications = model.predict(testing_data)
print(test_classifications[5:35])
print(y1_test[5:35])
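
The printed values above are sigmoid probabilities, so they need to be thresholded before they can be compared with the labels. A minimal sketch of that step is shown below, reporting overall accuracy and accuracy per class. The function name, the 0.5 threshold and the example numbers are illustrative assumptions, not results from this model.

```python
def evaluate_predictions(probabilities, labels, threshold=0.5):
    """Threshold probabilities and return overall and per-class accuracy."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    overall = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
    per_class = {}
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if idx:  # skip a class that does not occur in the labels
            per_class[cls] = sum(predicted[i] == cls for i in idx) / len(idx)
    return overall, per_class

example_probs = [0.91, 0.12, 0.55, 0.40, 0.73]  # hypothetical model outputs
example_labels = [1, 0, 0, 0, 1]                # hypothetical true classes
overall, per_class = evaluate_predictions(example_probs, example_labels)
print(overall)  # 0.8
print(per_class)
```

With the real outputs, the flattened predictions (for example `test_classifications.ravel()`) and `y1_test` would take the place of the example lists.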

### **Section 7: Dataset**<br> 
The dataset for this analysis is the MLEnd Hums and Whistles public dataset version 0. It consists of participant-submitted 15-second humming and whistling recordings of fragments of 8 different movie songs. Each participant submitted 2 humming and 2 whistling renditions per song (32 per participant), along with their demographic data. Demographic data is currently unavailable for this task.<br> 

With 210 participants there are 210 x 4 x 8 = 6720 audio files, anonymised with sample numbers, e.g. S12.

### **Section 8: Results**<br> 
The results obtained with the dataset and pipeline above can be summarised as:<br> 

**i. Training Accuracy** 

**ii. Validation Accuracy**

**iii. Support Vectors**

**iv. Test Accuracy**

### <center>**References**</center> 
1. https://www.masterclass.com/articles/melody-vs-harmony-similarities-and-differences-with-musical-examples#consonance-and-dissonance

2. Mikio Tohyama, *Waveform Analysis of Sound: 3 (Mathematics for Industry)*, 1$^{st}$ edn (Springer, 2015), p. 90.

3. https://en.wikipedia.org/wiki/Harmony

4. https://www.kdnuggets.com/2020/02/audio-data-analysis-deep-learning-python-part-1.html