# Linear regression - audio

Use linear regression to recover, or 'fill in', a completely deleted portion of an audio file!
This notebook uses the [FSDD, Free-Spoken-Digits-Dataset](https://github.com/Jakobovski/free-spoken-digit-dataset), an audio dataset put together by Zohar Jackson:
cleaned-up audio samples (no dead space, roughly the same length, same bitrate, same samples-per-second rate, same speaker, etc.) ready for machine learning.

In [2]:
'''
! pip install pandas matplotlib librosa
! pip install katonic[ml] -q
'''
!pip install katonic



# get the data

In [3]:
import os
import scipy.io.wavfile as wavfile


zero = []  # raw audio samples of every "zero" clip spoken by jackson
directory = "/home/katonic/free-spoken-digit-dataset/recordings/"
for fname in os.listdir(directory):
    print('-->', fname)
    if fname.startswith("0_jackson"):
        fullname = os.path.join(directory, fname)
        # wavfile.read returns the sample rate and the samples as a NumPy array
        sample_rate, data = wavfile.read(fullname)
        zero.append(data)


--> 6_jackson_25.wav
--> 5_lucas_7.wav
--> 8_jackson_49.wav
--> 2_theo_13.wav
--> 9_nicolas_45.wav
--> 3_george_36.wav
--> 8_yweweler_10.wav
--> 4_yweweler_13.wav
--> 8_theo_17.wav
--> 0_george_25.wav
--> 3_lucas_4.wav
--> 1_theo_19.wav
--> 7_theo_25.wav
--> 0_nicolas_22.wav
--> 5_nicolas_19.wav
--> 2_lucas_24.wav
--> 0_theo_31.wav
--> 9_jackson_7.wav
--> 9_lucas_26.wav
--> 5_jackson_49.wav
--> 2_theo_27.wav
--> 0_theo_21.wav
--> 3_lucas_37.wav
--> 3_yweweler_20.wav
--> 5_jackson_1.wav
--> 3_george_33.wav
--> 7_jackson_5.wav
--> 4_yweweler_28.wav
--> 6_theo_2.wav
--> 3_george_3.wav
--> 2_nicolas_29.wav
--> 6_yweweler_1.wav
--> 9_george_0.wav
--> 5_lucas_34.wav
--> 8_nicolas_35.wav
--> 1_lucas_44.wav
--> 8_george_36.wav
--> 1_nicolas_23.wav
--> 7_jackson_36.wav
--> 2_lucas_39.wav
--> 8_lucas_31.wav
--> 0_theo_3.wav
--> 9_yweweler_30.wav
--> 8_lucas_10.wav
--> 5_lucas_38.wav
--> 2_yweweler_1.wav
--> 0_yweweler_37.wav
--> 5_yweweler_13.wav
--> 8_lucas_22.wav
--> 2_yweweler_35.wav
--> 0_ja
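The file names in the listing follow the pattern `<digit>_<speaker>_<index>.wav`, which is what the `startswith("0_jackson")` filter relies on. As a minimal sketch (the helper `parse_fsdd_name` and the third file name are made up for illustration), the same selection can be written by parsing the name explicitly:

```python
from pathlib import Path

def parse_fsdd_name(fname):
    """Split '<digit>_<speaker>_<index>.wav' into its three parts."""
    digit, speaker, index = Path(fname).stem.split("_")
    return int(digit), speaker, int(index)

# The first two names come from the listing above; the last one is hypothetical.
names = ["6_jackson_25.wav", "0_george_25.wav", "0_jackson_3.wav"]

# Keep only the digit 0 spoken by jackson.
zeros_by_jackson = [n for n in names if parse_fsdd_name(n)[:2] == (0, "jackson")]
print(zeros_by_jackson)
```

Parsing the name also avoids accidental prefix matches if a speaker name ever starts with the same letters as another.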

The dataset holds 50 recordings of each digit per speaker; the loop above keeps only the 50 recordings of "zero" spoken by jackson.  
Each .wav file is really just a sequence of numeric samples, "sampled"
from the analog signal. [Sampling](https://en.wikipedia.org/wiki/Sampling_%28signal_processing%29) is a type of discretization. When we mention 'samples', we mean observations (whole clips). When we mention 'audio samples', we mean the actual "features" of an audio file.
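To make the idea of sampling concrete, here is a small sketch that discretizes a pure tone into 16-bit audio samples. The 8000 samples-per-second rate, the 440 Hz tone and the 10 ms duration are assumptions chosen for the example, not values read from the dataset:

```python
import numpy as np

sample_rate = 8000   # samples per second (assumed for this sketch)
duration = 0.01      # seconds
n = int(sample_rate * duration)

t = np.arange(n) / sample_rate        # time of each sample, in seconds
wave = np.sin(2 * np.pi * 440 * t)    # "analog" 440 Hz tone evaluated at those times

# Quantize to 16-bit integers, the same sample format the clips use
audio_samples = np.round(wave * 32767).astype(np.int16)

print(audio_samples.shape, audio_samples.dtype)
```

Each entry of `audio_samples` is one 'audio sample' in the sense used above, i.e. one feature of the clip.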

The goal of this notebook is to use multi-target linear regression to generate, by extrapolation, the missing portion of the test audio file.

Each missing audio sample will be the output of an equation,
which is a function of the provided portion of the audio samples:

   missing_samples = f(provided_samples)
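With a linear model, `f` is one weighted sum of the provided samples per missing sample. The sketch below illustrates this on noise-free toy data using plain NumPy least squares; the sizes and variable names are invented for the example, and unlike scikit-learn's `LinearRegression` it fits no intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
n_clips, n_provided, n_missing = 20, 4, 3

X = rng.normal(size=(n_clips, n_provided))          # provided samples, one row per clip
W_true = rng.normal(size=(n_provided, n_missing))   # one weight column per missing sample
Y = X @ W_true                                      # missing samples (noise-free toy data)

# Solve for all targets at once in the least-squares sense
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

x_new = rng.normal(size=(1, n_provided))            # provided part of an unseen clip
y_new = x_new @ W_hat                               # predicted missing samples
print(y_new.shape)
```

Each column of `W_hat` is an independent regression for one missing audio sample, which is what "multi-target" means here.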

# prepare the data

Convert zero into a DataFrame and set the dtype to np.int16, since the input audio files use 16 bits per sample. This is important: otherwise the produced audio samples will be encoded as 64 bits per sample and the clip will be too short.

In [4]:
import numpy as np
import pandas as pd

zeroDF = pd.DataFrame(zero, dtype=np.int16)


In [5]:
zeroDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Columns: 6273 entries, 0 to 6272
dtypes: float64(2186), int16(4087)
memory usage: 1.2 MB


Unfortunately, these audio clips are not length-normalized, so we have to hard-chop them all to the same length.
Pandas inserted NaNs wherever needed to make zero a
perfectly rectangular [n_observed_samples, n_audio_samples] array, so drop every column that contains a NaN (dropna with axis=1). Then convert zeroDF back into an ndarray using .values

In [6]:
if zeroDF.isnull().values.any():
  print("Preprocessing data: dropping all NaN")
  zeroDF.dropna(axis=1, inplace=True)
else:
  print("Preprocessing data: No NaN found!")

zero = zeroDF.values # this is a NumPy ndarray of shape [n_observed_samples, n_audio_samples]

Preprocessing data: dropping all NaN
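The effect of the NaN padding and of `dropna(axis=1)` is easier to see on a tiny example. This is a sketch with three made-up "clips" of unequal length, not the real data:

```python
import numpy as np
import pandas as pd

clips = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]   # toy "clips" of unequal length

df = pd.DataFrame(clips)       # shorter rows are padded with NaN
print(df.shape)

trimmed = df.dropna(axis=1)    # drop every column that contains a NaN
arr = trimmed.values.astype(np.int16)
print(arr)
```

Every clip ends up cut to the length of the shortest one, which is why the 6273 columns above shrink to 4087.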


In [7]:
n_audio_samples = zero.shape[1]

In [8]:
n_audio_samples

4087

# split the data into training and testing sets

There are 50 takes of the clip. Pull out just one of them at random; that one will NOT be used to train the model. In other words, the one file we'll be testing / scoring on will be an unseen sample, independent of the rest of the training set.

In [9]:
from sklearn.utils.validation import check_random_state

rng   = check_random_state(7) 
random_idx = rng.randint(zero.shape[0])

test  = zero[random_idx] # the test sample
train = np.delete(zero, [random_idx], axis=0)

In [10]:
print(train.shape)
print(test.shape)

(49, 4087)
(4087,)


Save the original 'test' clip, the one you're about to delete part of, so that you can compare it to the 'patched' clip once you've generated it.  
This assumes the sample rate is the same for all clips.

In [11]:
wavfile.write('/home/katonic/regression/OriginalTestClip.wav', sample_rate, test)

Embed the audio file.  
Note that the player does not work when the notebook is viewed directly on GitHub (I think all JavaScript is stripped out); fork or download the notebook to play the audio.

In [12]:
from IPython.display import Audio
Audio("/home/katonic/regression/OriginalTestClip.wav")

# carve out the labels Y

The data will have two parts: X and y (the true labels).  
X is going to be the first portion of the audio file, which we will give the computer as input (the "chopped" audio).  
y, the "label", is going to be the remaining portion of the audio file. In this way the computer will use linear regression to derive the missing portion of the sound file from the training data it has received!

_Provided_Portion_ is the fraction of the audio file that will be provided (0.5 means 50%). The remaining part of the file will be generated via linear extrapolation.  

In [13]:
Provided_Portion = 0.5 # let's delete half of the audio

test_samples = int(Provided_Portion * n_audio_samples)
X_test = test[0:test_samples] # first ones
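The cut point is simply the provided fraction times the clip length, truncated to an integer. A minimal sketch, where `split_clip` is a hypothetical helper and the clip is a stand-in array with the same length as the real ones:

```python
import numpy as np

def split_clip(clip, provided_portion):
    """Return the (provided, missing) parts of a 1-D clip."""
    cut = int(provided_portion * len(clip))   # int() truncates toward zero
    return clip[:cut], clip[cut:]

clip = np.arange(4087, dtype=np.int16)        # stand-in for a real clip
provided, missing = split_clip(clip, 0.25)
print(len(provided), len(missing))            # 1021 3066
```

The two parts always add up to the full clip, so nothing is lost or duplicated at the boundary.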

In [14]:
import IPython
IPython.display.Audio(data=X_test, rate= sample_rate)

Can you hear it? Now it's only the first syllable, "ze" ...  
But we can even delete more and leave only the first quarter!

In [15]:
Provided_Portion = 0.25 # let's delete three quarters of the audio!

test_samples = int(Provided_Portion * n_audio_samples)
X_test = test[0:test_samples] # first ones

In [16]:
wavfile.write('/home/katonic/audio_regression/outputs/ChoppedTestClip.wav', sample_rate, X_test)
IPython.display.Audio("/home/katonic/audio_regression/outputs/ChoppedTestClip.wav")

Almost unrecognisable.  
Will the linear regression model be able to reconstruct the audio?

In [17]:
y_test = test[test_samples:] # remaining audio part is the label

Duplicate the same process for X_train, y_train.

In [18]:
X_train = train[:, 0:test_samples] # first ones: data
y_train = train[:, test_samples:]  # remaining ones: label

SciKit-Learn gets mad if you don't supply your data in the form of a 2D array: [n_samples, n_features].

So if you only have one SAMPLE, as is the case with X_test and y_test, calling .reshape(1, -1) turns [n_features] into [1, n_features].

In [19]:
X_train

array([[  -332,   -396,   -502, ...,  -1636,  -2131,  -1767],
       [   354,    442,    610, ...,  -1731,  -1581,   -461],
       [   382,    459,    530, ...,  -3760,  -3084,  -1245],
       ...,
       [   301,    394,    507, ...,   -104,    165,    -48],
       [  -336,    160,     65, ..., -11790, -14011, -12078],
       [  -326,   -362,   -376, ...,    142,    -48,    604]], dtype=int16)

In [20]:
y_train

array([[-1578,  -761,  -781, ...,  -256,   713,  1265],
       [ 1782,  2855,  2986, ...,   263,   223,   385],
       [ -391,   787,  2983, ...,  -206,  -920, -1501],
       ...,
       [  248,   -18,   214, ...,  2471, -1658, -3648],
       [-8359, -6814, -4490, ...,  -315,  -343,  -319],
       [  316,   494,   589, ..., -3727, -4336, -5194]], dtype=int16)

In [21]:
X_test = X_test.reshape(1,-1)
y_test = y_test.reshape(1,-1)
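As a quick illustration of what `reshape(1, -1)` does (toy values, not the real clip): the `1` asks for a single row, and `-1` lets NumPy work out the number of columns from the data.

```python
import numpy as np

one_clip = np.array([-302, -337, -371], dtype=np.int16)   # shape: [n_features]
row = one_clip.reshape(1, -1)                             # shape: [1, n_features]

print(one_clip.shape, row.shape)   # (3,) (1, 3)
```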

In [22]:
X_test

array([[-302, -337, -371, ..., -231, -135, -195]], dtype=int16)

In [23]:
y_test

array([[-454, -652, -734, ..., 1105,  559,  477]], dtype=int16)

# Create and train the linear regression model 

In [24]:
from sklearn import linear_model

model = linear_model.LinearRegression()

In [25]:
model.fit(X_train, y_train)

Use the model to predict the 'label' of X_test.  
SciKit-Learn generates the predictions as float64, so let's convert those values back to int16.

In [26]:
y_test_prediction = model.predict(X_test)

In [29]:
y_test_prediction = y_test_prediction.astype(dtype=np.int16)
y_test_prediction

array([[-1149,  -843,  -250, ...,  -829,   482,   779]], dtype=int16)
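A plain `astype(np.int16)` drops the fractional part, and predictions that fall outside the int16 range may not convert to meaningful values. One possible safeguard, shown here as a sketch with a hypothetical helper `to_int16` and made-up predictions, is to round and clip before casting:

```python
import numpy as np

def to_int16(pred):
    """Round float predictions and clip them into the int16 range before casting."""
    info = np.iinfo(np.int16)   # min = -32768, max = 32767
    return np.clip(np.rint(pred), info.min, info.max).astype(np.int16)

pred = np.array([[-40000.0, -1.6, 0.4, 1.5, 40000.0]])
print(to_int16(pred))
```

Clipping saturates loud predictions at the format's limits instead of letting them turn into unrelated values.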

In [29]:
X_test.tolist()

[[-302,
  -337,
  -371,
  -416,
  -463,
  -533,
  -589,
  -580,
  -577,
  -587,
  -595,
  -577,
  -579,
  -559,
  -518,
  -502,
  -485,
  -490,
  -532,
  -570,
  -585,
  -608,
  -659,
  -634,
  -555,
  -401,
  -182,
  92,
  426,
  794,
  1152,
  1501,
  1833,
  2058,
  2172,
  2217,
  2195,
  2021,
  1789,
  1525,
  1205,
  926,
  634,
  415,
  256,
  176,
  122,
  125,
  120,
  83,
  69,
  -17,
  -132,
  -310,
  -523,
  -791,
  -1047,
  -1263,
  -1451,
  -1585,
  -1646,
  -1655,
  -1630,
  -1520,
  -1349,
  -1211,
  -1086,
  -968,
  -875,
  -847,
  -896,
  -989,
  -1099,
  -1273,
  -1446,
  -1555,
  -1550,
  -1452,
  -1273,
  -953,
  -493,
  76,
  619,
  1194,
  1744,
  2218,
  2629,
  2860,
  2954,
  2951,
  2826,
  2610,
  2356,
  2033,
  1695,
  1315,
  936,
  566,
  231,
  -50,
  -275,
  -433,
  -516,
  -490,
  -441,
  -354,
  -242,
  -97,
  -30,
  -13,
  3,
  -30,
  -105,
  -158,
  -250,
  -333,
  -364,
  -402,
  -430,
  -469,
  -481,
  -493,
  -536,
  -577,
  -587,
  -625,
  -68

In [30]:
import pickle

with open('audio_regression.pickle','wb') as f:
    pickle.dump(model,f)

# Katonic SDK

In [None]:
from katonic.ml.regression import Regressor

exp_name = 'audio_prediction'
reg = Regressor(X_train,X_test,y_train,y_test, exp_name)

## Get registered experiment details

In [None]:
exp_id = reg.id
print("experiment name : ", reg.name)
print("experiment location : ", reg.location)
print("experiment id : ", reg.id)
print("experiment status : ", reg.stage)

In [None]:
run_list = reg.search_runs(exp_id)['run_id'].tolist()
if run_list:
    reg.delete_run_by_id(run_list)

## Random Forest

In [None]:
# params = {
# 'n_estimators': {
#     'low': 80,
#     'high': 120,
#     'step': 10,
#     'type': 'int'
#     },
# 'criterion':{
#     'values': ['mse', 'mae'],
#     'type': 'categorical'
#     },
# 'min_samples_split': {
#     'low': 2,
#     'high': 5,
#     'type': 'int'
#     },
# 'min_samples_leaf':{
#     'low': 1,
#     'high': 5,
#     'type': 'int'
#     }
# }

In [None]:
reg.RandomForestRegressor()#is_tune=True, params=params)

## Gradient Boosting Regressor

In [None]:
# params = {
# 'n_estimators': {
#     'low': 80,
#     'high': 120,
#     'step': 10,
#     'type': 'int'
#     },
# 'learning_rate':{
#     'low': 0.6,
#     'high':1.0,
#     'type': 'float'
#     },
# 'min_samples_split': {
#     'low': 2,
#     'high': 5,
#     'type': 'int'
#     },
# 'min_samples_leaf':{
#     'low': 1,
#     'high': 5,
#     'type': 'int'
#     },
# 'max_depth': {
#     'low': 2,
#     'high': 4,
#     'type': 'int'
#     }
# }
reg.GradientBoostingRegressor()#is_tune=True, params=params)

## LGBM Regressor

In [None]:
# params={
#     'num_leaves':{
#         'low':25,
#         'high':35,
#         'type':'int'
#     },
#     'learning_rate':{
#         'low':0.1,
#         'high':0.5,
#         'type':'float'
#     },
#     'n_estimators':{
#         'low':80,
#         'high':120,
#         'step':10,
#         'type':'int'
#     },
#     'min_child_samples':{
#         'low': 10,
#         'high':20,
#         'type': 'int'
#     }
# }
reg.LGBMRegressor()#is_tune=True, params=params)

## Support Vector Regression

In [None]:
# params={
#     'C':{
#         'low': 0.5,
#         'high':1.0,
#         'type': 'float'
#     },
#     'kernel':{
#         'values': ['linear', 'rbf', 'poly'],
#         'type':'categorical'
#     },
#     'degree':{
#         'low':2,
#         'high': 4,
#         'type': 'int'
#     }
# }
reg.SupportVectorRegressor()#is_tune=True, params=params)

## XGB Regressor

Parameters of the model that need to be tuned:

In [None]:
params={
    'n_estimators':{
        'low': 10,
        'high': 40,
        'step':10,
        'type': 'int'
    },
    'max_depth':{
        'low':1,
        'high':5,
        'type':'int'
    },
    'learning_rate':{
        'low':0.2,
        'high':0.5,
        'type':'float'
    },
    'objective':{
        'values': ['reg:squarederror'],
        'type':'categorical'
    }
}
reg.XGBRegressor(is_tune=True, params=params)
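To get a feel for what a search space like `params` describes, the sketch below draws one random configuration from it. This is purely illustrative and says nothing about how the Katonic SDK actually tunes: the helper `sample_params` is hypothetical, and treating `high` as inclusive is an assumption.

```python
import random

def sample_params(space, rng):
    """Draw one configuration from a search-space dict (illustrative only)."""
    config = {}
    for name, spec in space.items():
        if spec["type"] == "int":
            step = spec.get("step", 1)
            # Assumption: 'high' is inclusive
            config[name] = rng.randrange(spec["low"], spec["high"] + 1, step)
        elif spec["type"] == "float":
            config[name] = rng.uniform(spec["low"], spec["high"])
        elif spec["type"] == "categorical":
            config[name] = rng.choice(spec["values"])
        else:
            raise ValueError(f"unknown type: {spec['type']}")
    return config

space = {
    "n_estimators": {"low": 10, "high": 40, "step": 10, "type": "int"},
    "max_depth": {"low": 1, "high": 5, "type": "int"},
    "learning_rate": {"low": 0.2, "high": 0.5, "type": "float"},
    "objective": {"values": ["reg:squarederror"], "type": "categorical"},
}
config = sample_params(space, random.Random(0))
print(config)
```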

In [None]:
pd.set_option('display.max_columns', None)  # show every column of the runs table

## Runs of the experiment

In [None]:
df_runs = reg.search_runs(exp_id)
print("Number of runs done : ", len(df_runs))
df_runs

## Selecting top runs on the basis of the R2 metric

In [None]:
top_runs = df_runs.sort_values(['metrics.R2'],ascending=False)
top_runs.head()

## Selecting Best Model

In [None]:
artifacts = top_runs.iloc[0]["artifact_uri"]
run_id = top_runs.iloc[0]["run_id"]
model_name = top_runs.iloc[0]["run_name"] 


print('Best model_artifacts :',artifacts)
print("=" * 100)
print('Best model run_id :',run_id)
print("=" * 100)
print('Best model :',model_name)
print("=" * 100)
print("Best model experiment id :",exp_id)

## Registering Best Model

In [None]:
reg.register_model(model_name = model_name,run_id=run_id)

In [None]:
result = reg.change_stage(
    ver_list=["1"],
    model_name = model_name,
    stage="Production"
)

In [None]:
reg.location

## Fetching Model

In [None]:
location = f"{artifacts}/{model_name}"
model = reg.load_model(location)

In [None]:
model

## Predict

In [None]:
y_pred = model.predict(X_test)

In [None]:
y_pred= y_pred.reshape(-1)

In [None]:
# Prepare variable as DataFrame in pandas
df = pd.DataFrame(X_test)

# Add the target variable to df
# df["y_pred"] = y_pred

In [None]:
df

# Evaluate the result

In [None]:
score = model.score(X_test, y_test) # test samples X and true values for X
print("Extrapolation R^2 Score: ", score)
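`model.score` computes R² across the rows (clips) for each output, and here the test set has a single row. As an alternative view, and only as a sketch, R² can be computed across the audio samples of one clip, treating every audio sample as an observation. The helper `r2_over_time` and the toy values are made up for the example:

```python
import numpy as np

def r2_over_time(y_true, y_pred):
    """R^2 of one clip, treating every audio sample as an observation."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([[1.0, 2.0, 3.0, 4.0]])
perfect = r2_over_time(y_true, y_true)                     # perfect prediction
baseline = r2_over_time(y_true, np.full_like(y_true, 2.5)) # always predicting the mean
print(perfect, baseline)   # 1.0 0.0
```

A value below zero means the prediction is worse than a flat line at the clip's mean value.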

In [None]:
X_test

In [34]:
y_test

array([[-9524, -7084, -6587, ...,  -341,  -291,  -254]], dtype=int16)

Obviously, if you look only at R², the result seems totally useless.  
But let's listen to the generated audio.

First, take the first Provided_Portion of the test clip, the part you fed into your linear regression model. Then stitch that
together with the abomination the predictor model generated for you,
and save the completed audio clip:

In [None]:
completed_clip = np.hstack((X_test, y_test_prediction))
wavfile.write('/home/katonic/image-classification/output-aud/ExtrapolatedClip.wav', sample_rate, completed_clip[0])
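The stitching step relies on both pieces being 2D rows of the same dtype. A toy sketch with made-up values shows what `np.hstack` produces and why the clip is written as `completed_clip[0]`:

```python
import numpy as np

provided = np.array([[1, 2, 3]], dtype=np.int16)   # shape [1, n_provided]
generated = np.array([[4, 5]], dtype=np.int16)     # shape [1, n_missing]

completed = np.hstack((provided, generated))       # shape [1, n_provided + n_missing]
print(completed.shape, completed[0])
```

Indexing with `[0]` turns the single row back into a 1-D array of audio samples, which is the shape a mono clip has on disk.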

In [None]:
IPython.display.Audio("/home/katonic/image-classification/output-aud/ExtrapolatedClip.wav")

Well, not bad!

In [26]:
API_PREDICT = "https://devenv.katonic.ai/6385d5b56674e0a1f647e9eb/models/md-a56fe347-b7bb-4c2e-8e9a-f8b17fe30bbc/api/v1/predict"
API_FEEDBACK = "https://devenv.katonic.ai/6385d5b56674e0a1f647e9eb/models/md-a56fe347-b7bb-4c2e-8e9a-f8b17fe30bbc/api/v1/feedback"

In [27]:
SECURE_TOKEN = "md-a56fe347-b7bb-4c2e-8e9a-f8b17fe30bbc-6385d5b56674e0a1f647e9eb-audio-regression eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJmOGIxN2ZlMzBiYmMtYjAyOWM1MmI3MmNhNDAzZjllZDMxYzc4M2ZlYzdkYjYiLCJleHAiOjMzMjAzODA5NTY0NzU2fQ.5Fht7J5Tt5MIhC26bgXfaRFA74M8IDddTndkyAuL9fA"

In [28]:
import requests

In [44]:
# Run inference through the deployed model API.
pred_labels = []
for x_i in X_test:
    # Send one row of features as a JSON body
    data = {"data": [[float(v) for v in x_i]]}
    result = requests.post(API_PREDICT, json=data, verify=False, headers={"Authorization": SECURE_TOKEN})
    print(result.text)
    # The response body looks like "[[v1,v2,...]]": strip the brackets and parse the values
    pred_labels.append([float(v) for v in result.text[2:-2].split(',')])

[[-4699.71,-3404.13,-2939.9,-4440.01,-4485.98,-2581.3,-878.69,-228.15,160.71,1080.69,3152.33,4671.09,3779.79,3122.36,3811.56,4416.43,3019.46,1325.11,1131.84,-1514.9,-3424.13,-4993.07,-6455.39,-5801.61,-6234.4,-6288.0,-4888.63,-3396.22,-922.27,341.04,2388.06,3939.27,4724.42,7197.65,7711.54,6796.19,5761.96,4543.81,4358.61,1457.33,-1385.62,-3327.3,-4637.5,-5606.09,-7378.28,-7245.85,-6553.03,-5657.55,-3461.79,-1470.7,74.35,1826.26,4227.31,5707.76,5844.62,6354.53,6616.15,5658.15,3675.8,1932.72,1160.93,-317.92,-2401.41,-4138.17,-4681.27,-4819.54,-5130.73,-5095.83,-4309.21,-2802.63,-1311.64,-281.97,1181.8,2702.82,3722.38,4400.06,5003.16,4508.94,3820.81,3815.1,3286.19,2235.55,-229.46,-2769.14,-4389.67,-6055.23,-6107.68,-6270.82,-7046.08,-5966.67,-4655.67,-2767.33,-269.22,1078.21,3094.58,4091.44,5912.96,7291.89,6661.17,6391.02,4639.76,3918.08,2687.51,41.8,-1910.91,-3819.24,-4195.7,-5656.47,-6834.3,-6171.79,-5049.69,-2713.75,-1993.11,-1067.74,1816.78,4104.42,5808.6,5023.59,5306.17,6900.82,6389.0



In [38]:
type(result.text)

str

In [39]:
test_list = []
print(result.text[2:-2])

-4699.71,-3404.13,-2939.9,-4440.01,-4485.98,-2581.3,-878.69,-228.15,160.71,1080.69,3152.33,4671.09,3779.79,3122.36,3811.56,4416.43,3019.46,1325.11,1131.84,-1514.9,-3424.13,-4993.07,-6455.39,-5801.61,-6234.4,-6288.0,-4888.63,-3396.22,-922.27,341.04,2388.06,3939.27,4724.42,7197.65,7711.54,6796.19,5761.96,4543.81,4358.61,1457.33,-1385.62,-3327.3,-4637.5,-5606.09,-7378.28,-7245.85,-6553.03,-5657.55,-3461.79,-1470.7,74.35,1826.26,4227.31,5707.76,5844.62,6354.53,6616.15,5658.15,3675.8,1932.72,1160.93,-317.92,-2401.41,-4138.17,-4681.27,-4819.54,-5130.73,-5095.83,-4309.21,-2802.63,-1311.64,-281.97,1181.8,2702.82,3722.38,4400.06,5003.16,4508.94,3820.81,3815.1,3286.19,2235.55,-229.46,-2769.14,-4389.67,-6055.23,-6107.68,-6270.82,-7046.08,-5966.67,-4655.67,-2767.33,-269.22,1078.21,3094.58,4091.44,5912.96,7291.89,6661.17,6391.02,4639.76,3918.08,2687.51,41.8,-1910.91,-3819.24,-4195.7,-5656.47,-6834.3,-6171.79,-5049.69,-2713.75,-1993.11,-1067.74,1816.78,4104.42,5808.6,5023.59,5306.17,6900.82,6389.02,
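Slicing the brackets off `result.text` works for the response shown here but is fragile. Assuming the body is valid JSON, as the printed output suggests, it can be parsed directly. The sketch uses a shortened stand-in string rather than a live request:

```python
import json

body = "[[-4699.71,-3404.13,-2939.9]]"   # shortened stand-in for result.text

rows = json.loads(body)                  # list of rows, one per input sample
pred_labels = [[float(v) for v in row] for row in rows]
print(pred_labels)
```

With a real response object, `result.json()` from `requests` does the same parsing in one call.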

In [49]:
len(y_test[0])

3066

In [50]:
len(pred_labels[0])

3066

In [51]:
data = {"predicted_label":pred_labels, "true_label" : y_test.tolist()}
result = requests.post(f"{API_FEEDBACK}", json=data,verify=False, headers = {"Authorization":SECURE_TOKEN})
print(result.text)

Internal Server Error


