# Demonstration Notebook
## Train model to recognize notes from input sounds

By Ben Walsh \
For Liloquy

&copy; 2021 Ben Walsh <ben@liloquy.io>

## Contents

1. [Import Libraries](#lib_import)
1. [Data Import](#data_import)
1. [Train Model](#model_train)
1. [Evaluate Model](#model_eval)
1. [Save Model](#model_save)

TO DO
- Model registry
- Generalize training functions to look for any files matching corresponding note tag
  - Add _Male2 recordings
- Submodule repo into simple_gui
- Explore different models - try adding Neural Network
- Feature importance with xgboost
- Optimize hyper-parameters - use gridsearch
- For model selection / parameter optimization, plot train/test errors, consider kfolds
- Make dedicated train.py outside of notebook
- Add model_params in model folder with notes, t_len/hum_len


In [1]:
%load_ext autoreload
%autoreload 2

## <a id = "lib_import"></a>1. Import Libraries

In [2]:
import sys
import os
import datetime
import time

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import ipywidgets as widgets

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import svm
from sklearn.preprocessing import LabelEncoder

import xgboost as xgb

import pickle

from scipy.io import wavfile as wav
from IPython.display import Audio

import sqlite3

# Add custom modules to path
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from util.music_util import note_to_freq, add_noise, melody_transcribe, melody_write, Note
from util.ml_util import feat_extract, load_training_data
from util import DATA_FOLDER, MODEL_FOLDER


..\1_audio\hum\Hum_Db4.wav does not exist
..\1_audio\hum\Hum_Eb4.wav does not exist
..\1_audio\hum\Hum_Gb4.wav does not exist
..\1_audio\hum\Hum_Ab4.wav does not exist
..\1_audio\hum\Hum_Bb4.wav does not exist
..\1_audio\hum\Hum_B4.wav does not exist


## <a id = "data_import"></a>2. Data Import

In [3]:
SCALE = ('C4', 'D4', 'E4', 'F4', 'G4', 'A4')
X, y, fs = load_training_data(SCALE)



### From CSV

In [4]:
X_train = pd.read_csv(os.path.join(DATA_FOLDER,'X_train.csv'))
X_test = pd.read_csv(os.path.join(DATA_FOLDER,'X_test.csv'))
y_train = pd.read_csv(os.path.join(DATA_FOLDER,'y_train.csv'))
y_test = pd.read_csv(os.path.join(DATA_FOLDER,'y_test.csv'))

### From new SQL database (in development)

FEAT_DB_NAME = 'features.db'
TABLE_NAME = 'X_all'
conn = sqlite3.connect(FEAT_DB_NAME)

# List the tables available in the database
c = conn.cursor()
c.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
print('Tables in {}: {}'.format(FEAT_DB_NAME, c.fetchall()))
c.close()

# Import every row of the feature table
c = conn.cursor()
sql = "SELECT * FROM {}".format(TABLE_NAME)
c.execute(sql)
imported_data = c.fetchall()
print('Number of entries imported: {}'.format(len(imported_data)))
c.close()
conn.close()
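
The import step above can be wrapped in a small helper that checks the table exists before reading it. This is a minimal sketch, not the project's loader: the function name `fetch_table`, the in-memory database, and the column names `f0`, `f1`, `note` are all assumptions made for illustration, since the real schema of `features.db` is not shown in this notebook.

```python
import sqlite3


def fetch_table(conn, table_name):
    """Return (column_names, rows) for a table, checking that it exists first."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        )
        if cur.fetchone() is None:
            raise ValueError(f"No table named {table_name!r}")
        # Table names cannot be bound as SQL parameters, so the name is only
        # interpolated after the existence check above.
        cur.execute(f'SELECT * FROM "{table_name}"')
        columns = [description[0] for description in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


# Hypothetical in-memory stand-in for features.db (schema is an assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X_all (f0 REAL, f1 REAL, note TEXT)")
conn.executemany(
    "INSERT INTO X_all VALUES (?, ?, ?)",
    [(0.1, 0.2, "C4"), (0.3, 0.4, "D4")],
)
conn.commit()

columns, rows = fetch_table(conn, "X_all")
print(columns, len(rows))
conn.close()
```

Returning the column names alongside the rows makes it straightforward to rebuild a labelled table later, for example with `pd.DataFrame(rows, columns=columns)`.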

## <a id = "model_train"></a>3. Train Model

### Define Models

#### SVM model

In [5]:
svm_model = svm.SVC(gamma='scale')

#### XGBoost model

In [6]:
# Hyper-parameters kept in one dict so they can be logged or tuned later
xgb_params = {
    'n_estimators': 100,
    'max_depth': 3,
    'reg_lambda': 1,
}

# A regressor is fit on integer-encoded notes; its output is rounded to a class index during evaluation
xgb_model = xgb.XGBRegressor(gamma=0, **xgb_params)

### Fit Model

In [7]:
# Flatten the single-column label DataFrame to the 1-D array scikit-learn expects
svm_model.fit(X_train, y_train.values.ravel())


SVC()

XGBoost requires numeric targets, so the note labels must first be encoded as integers.

In [8]:
label_encoder = LabelEncoder()
# Fit and transform in one step; ravel() passes the labels as a 1-D array
label_encoded_y = label_encoder.fit_transform(y_train.values.ravel())
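
To make the encoding step concrete, the sketch below reproduces the mapping in plain Python: each distinct label gets an integer according to its position in sorted order, which is how `LabelEncoder` assigns its classes. The helper name `fit_label_mapping` and the example note list are assumptions for illustration only.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer index, in sorted order."""
    classes = sorted(set(labels))
    return classes, {label: index for index, label in enumerate(classes)}


# Example labels (made up for illustration)
notes = ["C4", "D4", "E4", "F4", "G4", "A4", "C4"]

classes, to_index = fit_label_mapping(notes)
encoded = [to_index[note] for note in notes]
decoded = [classes[index] for index in encoded]

print(classes)
print(encoded)
```

Note that the codes follow alphabetical order rather than pitch order, so `A4` is encoded as 0 even though it is the highest note in the scale. A classifier is indifferent to this, but a regressor treats the codes as ordered numbers, which is worth keeping in mind when interpreting its errors.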

In [9]:
xgb_model.fit(X_train, label_encoded_y)



XGBRegressor()

## <a id = "model_eval"></a>4. Evaluate Model

In [10]:
y_predict_svm = svm_model.predict(X_test)
print(f"Accuracy on test set: {100*accuracy_score(y_test, y_predict_svm)}")

Accuracy on test set: 100.0


In [11]:
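y_predict_xgb = xgb_model.predict(X_test)
# The regressor returns continuous values; round each to the nearest encoded class
predictions = [round(value) for value in y_predict_xgb]
# Compare against the integer-encoded test labels
print(f"Accuracy on test set: {100*accuracy_score(label_encoder.transform(y_test.values.ravel()), predictions)}")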

Accuracy on test set: 93.33333333333333
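
Because the XGBoost model here is a regressor, a rounded prediction can land outside the range of encoded classes. Such a value simply counts as an error in `accuracy_score`, but it cannot be decoded back to a note name with `label_encoder.inverse_transform`. The following is a minimal sketch of one way to guard against that by clipping; the helper `to_class_index` and the raw prediction values are made up for illustration.

```python
def to_class_index(value, n_classes):
    """Round a continuous prediction to the nearest valid class index."""
    return min(max(int(round(value)), 0), n_classes - 1)


# Made-up regressor outputs, including values below 0 and above the top class
raw_predictions = [-0.6, 0.2, 2.4, 4.6, 5.7]
n_classes = 6  # six notes in SCALE

clipped = [to_class_index(value, n_classes) for value in raw_predictions]
print(clipped)
```

Clipping keeps every prediction decodable, at the cost of hiding how far off the raw output was. Switching to a classifier would avoid the issue altogether, and is one option to weigh when exploring different models.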




## <a id = "model_save"></a>5. Save Model

In [12]:
# Add a zero-padded timestamp to ensure a unique, sortable name
timestamp_str = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S-%f')

# Use a context manager so the file handle is closed after writing
with open(os.path.join(MODEL_FOLDER, 'model-{}'.format(timestamp_str)), 'wb') as model_file:
    pickle.dump(svm_model, model_file)
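
A saved model is only useful if it can be loaded again, so the sketch below shows the full save and load round trip. It is an illustration under stated assumptions: `save_model` and `load_model` are hypothetical helpers, a plain dictionary stands in for the trained model, and a temporary directory stands in for `MODEL_FOLDER`.

```python
import datetime
import os
import pickle
import tempfile


def save_model(model, folder):
    """Pickle a model under a timestamped name and return the file path."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")
    path = os.path.join(folder, f"model-{stamp}")
    with open(path, "wb") as model_file:
        pickle.dump(model, model_file)
    return path


def load_model(path):
    """Load a pickled model from disk."""
    with open(path, "rb") as model_file:
        return pickle.load(model_file)


# A dictionary stands in for the trained model in this sketch
stand_in_model = {"kind": "svm", "notes": ["C4", "D4"]}

with tempfile.TemporaryDirectory() as folder:
    saved_path = save_model(stand_in_model, folder)
    restored = load_model(saved_path)

print(os.path.basename(saved_path))
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is generally expected to be loaded with the same library version that saved it.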