In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import os
import re
import tensorflow as tf
from pathlib import Path
from tensorflow.keras import models, layers
from IPython.display import Audio
from keras.utils import to_categorical
import warnings
warnings.filterwarnings("ignore")

Some of these imports are audio-specific: librosa and librosa.display help read audio data, trim it, create spectrograms, and so on.

"from IPython.display import Audio" allows playing audio signals directly in Python environments.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#ls /content/drive/MyDrive/

# Dataset

### Background on data

The following data includes 2065 audio samples (Kicks, Snares, 808s, Open hats, and Closed hats) from my own music production library, which was built from several sources including, but not limited to, music production and sample-sharing websites such as Landr, Splice, and Reddit. The audio library can be accessed [Here](https://drive.google.com/drive/folders/1Dl2wvDMLQip063K0ncE7Anv7zzn25a0L?usp=sharing). Some of the sounds also came from the [drum classifier github project](https://github.com/aabalke33/drum-audio-classifier/tree/main). These audio samples are typically short (0 to 4 seconds) and stored in WAV format. I am looking to explore how to convert these samples into features, including arrays of time, pitch, and amplitude, to train the model to accurately classify the instrument for any given audio sample inside or outside of this library. Before starting to work with the data, I reviewed the individual audio files to check data integrity and the assumptions made at this early stage. This will be explored further in the cleaning section.

---
### Data Dictionary ###


| Feature | Type | Dataset | Description |
|---------|------|---------|-------------|
|**File_path**|*str*| My data | The file path or directory where the audio file is located.|
|**File_name**|*str*| My data | The name of the file, without the file extension. |
|**Instrument**|*str*| My data | The name of the instrument. Instruments are: Kick, Snare, 808, Open Hat, Closed Hat|
|**Start_time**|*int*| My data | The time in seconds at which the audio sample starts (0 for every sample). |
|**End_time**|*float*| My data | The time in seconds at which the audio sample ends (its duration). |
|**Index**|*int*| My data | An index value assigned to each instrument based on the Instrument column: Kick 0, 808 1, Snare 2, Open Hat 3, Closed Hat 4. |
|**Instrument_encoded**|*int*| My data | An encoded value representing the instrument type, numbered in order of first appearance: Snare 0, Kick 1, Open Hat 2, 808 3, Closed Hat 4. |
|**Instrument_0**|*int*| My data | A binary indicator (0 or 1), representing whether the sample is a Snare drum or not.|
|**Instrument_1**|*int*| My data | A binary indicator (0 or 1), representing whether the sample is a Kick drum or not. |
|**Instrument_2**|*int*| My data | A binary indicator (0 or 1), representing whether the sample is an Open hat or not. |
|**Instrument_3**|*int*| My data | A binary indicator (0 or 1), representing whether the sample is an 808 drum or not. |
|**Instrument_4**|*int*| My data | A binary indicator (0 or 1), representing whether the sample is a Closed hat or not. |


---

To start, I want to create a dataframe that has the following columns: File_path, File_name, Instrument, Start_time, and End_time. While I thought about labeling this a "drum classifier", meaning that my column would technically have been "Drums", I decided to keep it general as "Instrument" because I am interested in exploring this model with various instruments in the future.

I may later assign variables that drop some of these columns, but for organizational purposes I am making this my main df and saving it as such.

In [4]:
# Set the absolute path to the 'Drums' folder
abs_path = '/content/drive/MyDrive/Drums/'

all_files = os.listdir(abs_path)

# Create a list of full file paths
file_paths = [os.path.join(abs_path, f) for f in all_files]

df = pd.DataFrame({'File_path': file_paths})

# Extract the filenames from the file paths and instrument types
df['File_name'] = df['File_path'].apply(lambda x: os.path.basename(x).split('.')[0])

df['Instrument'] = df['File_name'].apply(lambda x: x.split()[0])

Now that I have created the dataframe from the file paths, I want to create the Instrument column, which holds the instrument type. In this case there are 5 different instruments: Kick, Snare, Open Hat, Closed Hat, and 808. I also want to be sure that I capture all the different naming variants, as many files are labeled with something other than the plain category name. The following function helps sort through the different names of each instrument.

In [6]:
instrument_mapping = {
    'kick': 'Kick',
    'snare': 'Snare',
    '808': '808',
}
def extract_instrument(file_path, instrument_str=None):
    # Extract the filename from the file path
    filename = os.path.basename(file_path)

    pattern = r'(kick|snare|closedhat|closed-hat|closed-hi-hat|openhat|open hat|OH|Oh|open\shat\s\d+|\(open\shat\)|\d+\s-\sOH|808|HH|OPEN-HAT)(_\d+)?'
    match = re.search(pattern, filename, re.IGNORECASE)

    if match:
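        # The match is lowercased here, so only the lowercase entries in the
        # lists below can ever match.
        instrument = match.group(1).lower()
        if instrument in ['closedhat', 'closed-hat', 'closed-hi-hat']:
            return 'Closed Hat'
        elif instrument in ['openhat', 'open hat', 'OH', 'oh', 'Oh', 'OPEN-HAT', 'HH', 'Hh']:
            return 'Open Hat'
        else:
            # Everything else goes through the mapping or is capitalized,
            # e.g. 'hh' -> 'Hh' and 'open-hat' -> 'Open hat'.
            return instrument_mapping.get(instrument, instrument.replace('-', ' ').capitalize())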
    elif instrument_str:
        # Extract the instrument name from the 'Instrument' column
        match = re.search(r'(kick|snare|closedhat|closed-hat|closed-hi-hat|openhat|open hat|OH|Oh|open\shat\s\d+|\(open\shat\)|HH|OPEN-HAT)', instrument_str, re.IGNORECASE)
        if match:
            instrument = match.group(1).lower()
            if instrument in ['closedhat', 'closed-hat', 'closed-hi-hat']:
                return 'Closed Hat'
            elif instrument in ['openhat', 'open hat', 'OH', 'oh', 'Oh', 'OPEN-HAT', 'HH', 'Hh']:
                return 'Open Hat'
            else:
                return instrument_mapping.get(instrument, instrument.replace('-', ' ').capitalize())

    # Note: this is the string 'NaN', not a missing value, so isnull() will not flag it
    return 'NaN'

# Apply the extract_instrument function to the 'File_path' and 'Instrument' columns
df['Instrument'] = df.apply(lambda row: extract_instrument(row['File_path'], row.get('Instrument')), axis=1)
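
As a point of comparison, the filename matching can also be sketched as an ordered list of patterns, each tied to one canonical label. This is not the function used in this notebook: the names `LABEL_PATTERNS` and `label_from_filename` are hypothetical, the patterns only cover the naming variants mentioned above, and a bare "HH" is deliberately left unlabeled (`None`) because the filename does not say whether the hat is open or closed.

```python
import os
import re

# Hypothetical lookup table: patterns are tried in order, first match wins.
LABEL_PATTERNS = [
    (re.compile(r"closed[\s_-]*(hi[\s_-]*)?hat", re.IGNORECASE), "Closed Hat"),
    (re.compile(r"open[\s_-]*(hi[\s_-]*)?hat", re.IGNORECASE), "Open Hat"),
    (re.compile(r"kick", re.IGNORECASE), "Kick"),
    (re.compile(r"snare", re.IGNORECASE), "Snare"),
    (re.compile(r"808"), "808"),
]


def label_from_filename(file_path):
    """Return a canonical instrument label, or None if nothing matches."""
    name = os.path.basename(file_path)
    for pattern, label in LABEL_PATTERNS:
        if pattern.search(name):
            return label
    return None


examples = [
    "/content/drive/MyDrive/Drums/closedhat_0041.wav",
    "/content/drive/MyDrive/Drums/OPEN-HAT (35).wav",
    "/content/drive/MyDrive/Drums/Kick 37 (Roy Ayers - Can't You See Me).wav",
]
labels = [label_from_filename(path) for path in examples]
```

Because every pattern returns a fixed label, the casing of the result does not depend on how the file was named.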

In [8]:
# Extract audio start to finish
df['Start_time'] = 0  # Initialize start time to 0
df['End_time'] = 0

for index, row in df.iterrows():
    audio, sr = librosa.load(row['File_path'])
    duration = librosa.get_duration(y=audio, sr=sr)
    df.at[index, 'End_time'] = duration

I am extracting the audio data using librosa. This creates the Start_time and End_time columns, where End_time is the duration of each sample in seconds.
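
The duration is the number of audio frames divided by the sample rate. For uncompressed PCM WAV files, the same value can be read from the file header without decoding the audio. The following is a minimal sketch using the standard-library `wave` module instead of librosa; the helper names and the generated demo file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def write_silent_wav(path, seconds, sample_rate=44100):
    """Write a mono, 16-bit silent WAV file (demo data only)."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silent_wav(demo_path, seconds=0.5)
demo_duration = wav_duration_seconds(demo_path)
```

The `wave` module only handles PCM WAV files, so librosa remains the more general choice for mixed or compressed formats.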

In [10]:
df.head(2000)

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
0,/content/drive/MyDrive/Drums/Snare 40 (Roy Aye...,Snare 40 (Roy Ayers - Ebony Blaze (Album Versi...,Snare,0,0.100136
1,/content/drive/MyDrive/Drums/Snare 41 (Roy Aye...,Snare 41 (Roy Ayers - The Old One Two (Move To...,Snare,0,0.154694
2,/content/drive/MyDrive/Drums/Snare 42 (Roy Aye...,Snare 42 (Roy Ayers - No Question (Album Versi...,Snare,0,0.189342
3,/content/drive/MyDrive/Drums/Kick 37 (Roy Ayer...,Kick 37 (Roy Ayers - Can't You See Me),Kick,0,0.417596
4,/content/drive/MyDrive/Drums/Kick 38 (Roy Ayer...,Kick 38 (Roy Ayers - Can't You See Me),Kick,0,0.258730
...,...,...,...,...,...
1995,/content/drive/MyDrive/Drums/Kick 35 (Roy Ayer...,Kick 35 (Roy Ayers - Show Us A Feeling (Album ...,Kick,0,0.210794
1996,/content/drive/MyDrive/Drums/Open hat 15 (Roy ...,Open hat 15 (Roy Ayers - Show Us A Feeling (Al...,Open Hat,0,0.360272
1997,/content/drive/MyDrive/Drums/Snare 38 (Roy Aye...,Snare 38 (Roy Ayers - 2000 Black),Snare,0,0.140454
1998,/content/drive/MyDrive/Drums/Kick 36 (Roy Ayer...,Kick 36 (Roy Ayers - Time And Space),Kick,0,0.322948


# Cleaning Process

Now that I have created the dataframe, I am interested in cleaning up the data, starting with checks for missing data and nulls.

Note: The instruments that need to be included in the Instrument column are Snare, Kick, Open Hat, 808, and Closed Hat.

In [11]:
df.shape

(2068, 5)

As of now, there are 2068 rows and 5 columns.

In [12]:
df.head(20)

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
0,/content/drive/MyDrive/Drums/Snare 40 (Roy Aye...,Snare 40 (Roy Ayers - Ebony Blaze (Album Versi...,Snare,0,0.100136
1,/content/drive/MyDrive/Drums/Snare 41 (Roy Aye...,Snare 41 (Roy Ayers - The Old One Two (Move To...,Snare,0,0.154694
2,/content/drive/MyDrive/Drums/Snare 42 (Roy Aye...,Snare 42 (Roy Ayers - No Question (Album Versi...,Snare,0,0.189342
3,/content/drive/MyDrive/Drums/Kick 37 (Roy Ayer...,Kick 37 (Roy Ayers - Can't You See Me),Kick,0,0.417596
4,/content/drive/MyDrive/Drums/Kick 38 (Roy Ayer...,Kick 38 (Roy Ayers - Can't You See Me),Kick,0,0.25873
5,/content/drive/MyDrive/Drums/Snare 43 (Roy Aye...,Snare 43 (Roy Ayers - Everytime I See You (Alb...,Snare,0,0.164082
6,/content/drive/MyDrive/Drums/Open hat 16 (Roy ...,Open hat 16 (Roy Ayers - Everytime I See You (...,Open Hat,0,0.662268
7,/content/drive/MyDrive/Drums/Snare 44 (Roy Aye...,Snare 44 (Roy Ayers - Everytime I See You (Alb...,Snare,0,0.342222
8,/content/drive/MyDrive/Drums/Snare 45 (Roy Aye...,Snare 45 (Roy Ayers - Everytime I See You (Alb...,Snare,0,0.332381
9,/content/drive/MyDrive/Drums/Snare 46 (Roy Aye...,Snare 46 (Roy Ayers - Everytime I See You (Alb...,Snare,0,0.161224


I noticed a few things that I need to clean, including making sure the instruments in the "Instrument" column are labeled correctly. I want this column to be especially clean because I want to be able to create an Index for each instrument.

I want to start off by checking the dataframe for nulls, duplicates, and other abnormalities.

In [None]:
df.isnull().sum()

File_path     0
File_name     0
Instrument    0
Start_time    0
End_time      0
dtype: int64

In [None]:
null_indices = df[df['Instrument'].isnull()].index
if len(null_indices) > 0:
    null_rows = df.loc[null_indices]
else:
    print("There are no null values in the 'Instrument' column.")

There are no null values in the 'Instrument' column.


In [None]:
df.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
2063    False
2064    False
2065    False
2066    False
2067    False
Length: 2068, dtype: bool
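
`df.duplicated()` returns one boolean per row, which is hard to read at this size. Summing the booleans gives a single count instead. The sketch below uses a small made-up frame (the `toy` data is hypothetical) and also shows the `subset` argument for checking duplicates on one column only.

```python
import pandas as pd

# Hypothetical rows: the first two are exact duplicates of each other.
toy = pd.DataFrame({
    "File_name": ["Kick 1", "Kick 1", "Snare 1"],
    "End_time": [0.2, 0.2, 0.1],
})

# True counts as 1, so the sum is the number of rows that repeat an earlier row.
n_duplicate_rows = int(toy.duplicated().sum())

# Restrict the comparison to a single column.
n_duplicate_names = int(toy.duplicated(subset="File_name").sum())
```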

In [None]:
df.dtypes

File_path      object
File_name      object
Instrument     object
Start_time      int64
End_time      float64
dtype: object

In [None]:
hh_values = df[df['Instrument'].str.contains('Hh', case=False)]
hh_values

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
1005,/content/drive/MyDrive/Drums/730! HH (20)🛰🥤.wa...,730! HH (20)🛰🥤,Hh,0,0.760317
1018,/content/drive/MyDrive/Drums/730! HH (37)🛰🥤.wa...,730! HH (37)🛰🥤,Hh,0,0.760317
1023,/content/drive/MyDrive/Drums/730! HH (11)🛰🥤.wa...,730! HH (11)🛰🥤,Hh,0,0.5
1026,/content/drive/MyDrive/Drums/730! HH (21)🛰🥤.wa...,730! HH (21)🛰🥤,Hh,0,0.760317
1032,/content/drive/MyDrive/Drums/730! HH (26)🛰🥤.wa...,730! HH (26)🛰🥤,Hh,0,0.760317
1035,/content/drive/MyDrive/Drums/730! HH (14)🛰🥤.wa...,730! HH (14)🛰🥤,Hh,0,0.212018
1037,/content/drive/MyDrive/Drums/730! HH (40)🛰🥤.wa...,730! HH (40)🛰🥤,Hh,0,0.760317
1042,/content/drive/MyDrive/Drums/730! HH (36)🛰🥤.wa...,730! HH (36)🛰🥤,Hh,0,0.760317
1056,/content/drive/MyDrive/Drums/730! HH (46)🛰🥤.wa...,730! HH (46)🛰🥤,Hh,0,0.069796
1057,/content/drive/MyDrive/Drums/730! HH (6)🛰🥤.wav...,730! HH (6)🛰🥤,Hh,0,0.06254


Viewing all the files that have "Hh" in the Instrument column.

In [None]:
hh_sum = df.loc[df['Instrument'].str.contains('Hh', case=False), 'Instrument'].count()
hh_sum

50

Getting the count of the "Hh" (meaning Hi-hat) rows to see how many we would potentially be dropping, and to decide whether the audio files are worth listening through and classifying one by one. Since there are only 50, I have decided to drop all the rows labeled "Hh", which are the unclassified Hi-hats.

In [None]:
df = df[~df['Instrument'].str.contains('Hh', case=False)]

Reassigning the dataframe with those rows dropped.

In [None]:
df.shape

(2018, 5)

Verifying that the rows were dropped correctly.

In [None]:
Instrument = {
    'Kick': 0,
    '808': 1,
    'Snare': 2,
    'Open Hat': 3,
    'Closed Hat': 4,
}

indexed_instruments = {instrument: index for index, instrument in enumerate(Instrument)}

df['Index'] = df['Instrument'].map(indexed_instruments)

To use particular instruments later, I may need to index into them. Assigning each instrument a different numerical value makes it easier to retrieve the information for particular audio files.
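
One property of `Series.map` with a dictionary is worth keeping in mind: any label that is not a key in the dictionary becomes NaN. The sketch below reuses the mapping from the cell above on a small made-up frame (the `toy` rows are hypothetical) and lists the labels that failed to map.

```python
import pandas as pd

# Same label-to-index mapping as in the cell above.
instrument_to_index = {"Kick": 0, "808": 1, "Snare": 2, "Open Hat": 3, "Closed Hat": 4}

# Hypothetical rows; "Open hat" differs from the key "Open Hat" only by case.
toy = pd.DataFrame({"Instrument": ["Kick", "Open Hat", "Open hat", "808"]})
toy["Index"] = toy["Instrument"].map(instrument_to_index)

# Labels with no entry in the mapping end up as NaN in the Index column.
unmapped_labels = sorted(toy.loc[toy["Index"].isna(), "Instrument"].unique())
```

Checking `unmapped_labels` right after the mapping step surfaces spelling or casing variants before they spread into later cells.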

In [13]:
df.head(200)

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
0,/content/drive/MyDrive/Drums/Snare 40 (Roy Aye...,Snare 40 (Roy Ayers - Ebony Blaze (Album Versi...,Snare,0,0.100136
1,/content/drive/MyDrive/Drums/Snare 41 (Roy Aye...,Snare 41 (Roy Ayers - The Old One Two (Move To...,Snare,0,0.154694
2,/content/drive/MyDrive/Drums/Snare 42 (Roy Aye...,Snare 42 (Roy Ayers - No Question (Album Versi...,Snare,0,0.189342
3,/content/drive/MyDrive/Drums/Kick 37 (Roy Ayer...,Kick 37 (Roy Ayers - Can't You See Me),Kick,0,0.417596
4,/content/drive/MyDrive/Drums/Kick 38 (Roy Ayer...,Kick 38 (Roy Ayers - Can't You See Me),Kick,0,0.258730
...,...,...,...,...,...
195,/content/drive/MyDrive/Drums/closedhat_0041.wav,closedhat_0041,Closed Hat,0,2.000000
196,/content/drive/MyDrive/Drums/openhat_0051.wav,openhat_0051,Open Hat,0,1.283628
197,/content/drive/MyDrive/Drums/closedhat_0055.wav,closedhat_0055,Closed Hat,0,0.336190
198,/content/drive/MyDrive/Drums/closedhat_0135.wav,closedhat_0135,Closed Hat,0,0.512290


Dataframe with the Index column.

In [14]:
non_zero_count = len(df[df['Start_time'] != 0])

if non_zero_count > 0:
    print(f"There are {non_zero_count} rows with non-zero values in the 'Start_time' column.")
else:
    print("All values in the 'Start_time' column are 0.")

All values in the 'Start_time' column are 0.


I am now checking all the columns for abnormalities, including Start_time.

In [None]:
null_indices = df[df['Index'].isnull()].index.tolist()[1]

null_rows = df.loc[null_indices]
null_rows

File_path     /content/drive/MyDrive/Drums/OPEN-HAT (35).wav
File_name                                      OPEN-HAT (35)
Instrument                                          Open hat
Start_time                                                 0
End_time                                            1.075918
Index                                                    NaN
Name: 1019, dtype: object

Inspecting one of the rows with a null value in the Index column.

In [None]:
print(df[df.isna().any(axis=1)])

                                           File_path      File_name  \
1016  /content/drive/MyDrive/Drums/OPEN-HAT (33).wav  OPEN-HAT (33)   
1019  /content/drive/MyDrive/Drums/OPEN-HAT (35).wav  OPEN-HAT (35)   
1022   /content/drive/MyDrive/Drums/OPEN-HAT (1).wav   OPEN-HAT (1)   
1050   /content/drive/MyDrive/Drums/OPEN-HAT (8).wav   OPEN-HAT (8)   
1051  /content/drive/MyDrive/Drums/OPEN-HAT (30).wav  OPEN-HAT (30)   
1055  /content/drive/MyDrive/Drums/OPEN-HAT (37).wav  OPEN-HAT (37)   
1090  /content/drive/MyDrive/Drums/OPEN-HAT (39).wav  OPEN-HAT (39)   
1096  /content/drive/MyDrive/Drums/OPEN-HAT (29).wav  OPEN-HAT (29)   
1099  /content/drive/MyDrive/Drums/OPEN-HAT (40).wav  OPEN-HAT (40)   
1112  /content/drive/MyDrive/Drums/OPEN-HAT (48).wav  OPEN-HAT (48)   
1125  /content/drive/MyDrive/Drums/OPEN-HAT (46).wav  OPEN-HAT (46)   
1129  /content/drive/MyDrive/Drums/OPEN-HAT (47).wav  OPEN-HAT (47)   
1130  /content/drive/MyDrive/Drums/OPEN-HAT (44).wav  OPEN-HAT (44)   
1151  

There are many null values in the Index column because these samples were labeled "Open hat" rather than "Open Hat", so the mapping did not match them.

In [None]:
unique_instruments = df['Instrument'].unique()
print(f"Unique instruments: {unique_instruments}")

Unique instruments: ['Snare' 'Kick' 'Open Hat' '808' 'Closed Hat' 'Open hat']


In [None]:
df.loc[df['Instrument'] == 'Open hat', 'Index'] = df.loc[df['Instrument'] == 'Open hat', 'Index'].fillna(4.0)

In [None]:
df['Instrument'] = df['Instrument'].replace('Open hat', 'Open Hat')

Relabeling every "Open hat" as "Open Hat" and filling in the missing Index values for those rows.
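
An alternative order of operations, sketched here on made-up rows, is to normalize the label first and apply the mapping afterwards, so that no fill value has to be typed by hand. This is an illustration, not the code that produced the outputs in this notebook; the `toy` frame is hypothetical and the mapping is the one defined earlier, in which Open Hat resolves to 3.

```python
import pandas as pd

# Same label-to-index mapping as defined earlier in the notebook.
instrument_to_index = {"Kick": 0, "808": 1, "Snare": 2, "Open Hat": 3, "Closed Hat": 4}

# Hypothetical rows containing the inconsistent label.
toy = pd.DataFrame({"Instrument": ["Open hat", "Open Hat", "Closed Hat"]})

# Step 1: fix the label so it matches a key in the mapping.
toy["Instrument"] = toy["Instrument"].replace({"Open hat": "Open Hat"})

# Step 2: derive the index from the corrected label.
toy["Index"] = toy["Instrument"].map(instrument_to_index)
```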

In [None]:
print(df[df.isna().any(axis=1)])

Empty DataFrame
Columns: [File_path, File_name, Instrument, Start_time, End_time, Index]
Index: []


In [None]:
num_instruments = df['Instrument'].nunique()
print(f"Number of unique instruments: {num_instruments}")

Number of unique instruments: 5


In [None]:
unique_instruments = df['Instrument'].unique()
print(f"Unique instruments: {unique_instruments}")

Unique instruments: ['Snare' 'Kick' 'Open Hat' '808' 'Closed Hat']


The "Open hat" labels were merged into "Open Hat" successfully.

In [15]:
df.head(5)

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
0,/content/drive/MyDrive/Drums/Snare 40 (Roy Aye...,Snare 40 (Roy Ayers - Ebony Blaze (Album Versi...,Snare,0,0.100136
1,/content/drive/MyDrive/Drums/Snare 41 (Roy Aye...,Snare 41 (Roy Ayers - The Old One Two (Move To...,Snare,0,0.154694
2,/content/drive/MyDrive/Drums/Snare 42 (Roy Aye...,Snare 42 (Roy Ayers - No Question (Album Versi...,Snare,0,0.189342
3,/content/drive/MyDrive/Drums/Kick 37 (Roy Ayer...,Kick 37 (Roy Ayers - Can't You See Me),Kick,0,0.417596
4,/content/drive/MyDrive/Drums/Kick 38 (Roy Ayer...,Kick 38 (Roy Ayers - Can't You See Me),Kick,0,0.25873


In [None]:
df['Index'] = df['Index'].astype(int)

In [None]:
index_counts = df['Index'].value_counts()
index_counts

Index
0    530
1    481
2    435
4    302
3    270
Name: count, dtype: int64

In [None]:
df.shape

(2018, 6)

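From this, we can tell that every row was assigned an Index value, as the counts total 2018. The total alone does not confirm that each row received the right value, though: Index 3 (Open Hat) has 270 rows and Index 4 (Closed Hat) has 302, whereas the per-instrument counts further down show 326 Open Hat and 246 Closed Hat samples. The difference of 56 is consistent with the relabeled "Open hat" rows having been filled with 4 rather than 3.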
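
A matching total shows that every row has an Index, but not that each label maps to a single Index. A cross-tabulation of the two columns makes that visible. The following is a minimal sketch on made-up rows; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: one "Open Hat" sample carries the index of "Closed Hat".
toy = pd.DataFrame({
    "Instrument": ["Kick", "Open Hat", "Open Hat", "Closed Hat"],
    "Index": [0, 3, 4, 4],
})

# Rows are labels, columns are index values, cells are sample counts.
label_vs_index = pd.crosstab(toy["Instrument"], toy["Index"])

# A consistent encoding has exactly one distinct index per label.
indices_per_label = toy.groupby("Instrument")["Index"].nunique()
inconsistent_labels = indices_per_label[indices_per_label > 1].index.tolist()
```

An empty `inconsistent_labels` list would indicate that each instrument maps to exactly one Index value.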

In [None]:
nan_counts = df.isnull().sum()
nan_counts

File_path     0
File_name     0
Instrument    0
Start_time    0
End_time      0
Index         0
dtype: int64

In [16]:
df.head(100)

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
0,/content/drive/MyDrive/Drums/Snare 40 (Roy Aye...,Snare 40 (Roy Ayers - Ebony Blaze (Album Versi...,Snare,0,0.100136
1,/content/drive/MyDrive/Drums/Snare 41 (Roy Aye...,Snare 41 (Roy Ayers - The Old One Two (Move To...,Snare,0,0.154694
2,/content/drive/MyDrive/Drums/Snare 42 (Roy Aye...,Snare 42 (Roy Ayers - No Question (Album Versi...,Snare,0,0.189342
3,/content/drive/MyDrive/Drums/Kick 37 (Roy Ayer...,Kick 37 (Roy Ayers - Can't You See Me),Kick,0,0.417596
4,/content/drive/MyDrive/Drums/Kick 38 (Roy Ayer...,Kick 38 (Roy Ayers - Can't You See Me),Kick,0,0.258730
...,...,...,...,...,...
95,/content/drive/MyDrive/Drums/Kick 58 (Vern Bla...,Kick 58 (Vern Blair Debate - Super Funk),Kick,0,0.219229
96,/content/drive/MyDrive/Drums/Kick 59 (Area - G...,Kick 59 (Area - Guardati dal mese vicino all'a...,Kick,0,0.244807
97,/content/drive/MyDrive/Drums/Snare 80 (Banco D...,Snare 80 (Banco Del Mutuo Soccorso - Interno C...,Snare,0,0.162313
98,/content/drive/MyDrive/Drums/Snare 81 (Banco D...,Snare 81 (Banco Del Mutuo Soccorso - Interno C...,Snare,0,0.519546


In [None]:
instrument_counts = df.groupby('Instrument').size()
print(f"Number of samples for each instrument:\n{instrument_counts}")

Number of samples for each instrument:
Instrument
808           481
Closed Hat    246
Kick          530
Open Hat      326
Snare         435
dtype: int64


Halfway through, I decided that I wanted two ways to make sure I would not run into issues in the modeling process, so I decided to encode the Instrument column. Which one I use will depend on whether I approach the model as a standard classification model or as a CNN (Convolutional Neural Network). As of now, I am most interested in applying the CNN model. The encoding does the same thing as the Index column; however, it separates the assigned numbers into different columns (one-hot encoding). Note that Instrument_encoded numbers the instruments in order of first appearance, so its values differ from those in the Index column.

In [None]:
instruments = df['Instrument'].unique()
num_classes = len(instruments)

instrument_to_index = {instrument: i for i, instrument in enumerate(instruments)}

df['Instrument_encoded'] = df['Instrument'].map(instrument_to_index)
one_hot_encoded = to_categorical(df['Instrument_encoded'], num_classes=num_classes)

for i in range(num_classes):
    df[f'Instrument_{i}'] = one_hot_encoded[:, i]
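
To make the one-hot step concrete, here is a minimal sketch with NumPy only. It is an illustration rather than the method used above: `np.unique` orders the classes alphabetically, whereas the cell above numbers them in order of first appearance, and the `labels` array is made up.

```python
import numpy as np

# Hypothetical labels for four samples.
labels = np.array(["Snare", "Kick", "Open Hat", "Kick"])

# classes: sorted unique labels; codes: integer position of each label in classes.
classes, codes = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the codes yields one one-hot row per sample.
one_hot = np.eye(len(classes), dtype=int)[codes]

# Taking the argmax of each row recovers the original labels.
decoded = classes[one_hot.argmax(axis=1)]
```

Each row of `one_hot` contains a single 1, in the column of that sample's class.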

In [17]:
df.head(5)

Unnamed: 0,File_path,File_name,Instrument,Start_time,End_time
0,/content/drive/MyDrive/Drums/Snare 40 (Roy Aye...,Snare 40 (Roy Ayers - Ebony Blaze (Album Versi...,Snare,0,0.100136
1,/content/drive/MyDrive/Drums/Snare 41 (Roy Aye...,Snare 41 (Roy Ayers - The Old One Two (Move To...,Snare,0,0.154694
2,/content/drive/MyDrive/Drums/Snare 42 (Roy Aye...,Snare 42 (Roy Ayers - No Question (Album Versi...,Snare,0,0.189342
3,/content/drive/MyDrive/Drums/Kick 37 (Roy Ayer...,Kick 37 (Roy Ayers - Can't You See Me),Kick,0,0.417596
4,/content/drive/MyDrive/Drums/Kick 38 (Roy Ayer...,Kick 38 (Roy Ayers - Can't You See Me),Kick,0,0.25873


In [None]:
df.to_csv('/content/drive/MyDrive/Instrument_classifier.csv', index=False)

Saved the dataframe as a CSV, ready to be used for the EDA process and further modeling.

To summarize, my goal is to create an instrument classifier that can predict which instrument a given sample is. It will work with 5 instruments: Kick, Snare, Closed Hat, Open Hat, and 808. I am using real audio data, which I extracted into a dataframe, and in the process I cleaned up certain discrepancies such as null values in columns, duplicates, incorrect labeling, and more. I chose these particular columns as features because I believe they could potentially best help identify the instrument. In the EDA and modeling process I may discover other features that I want to engineer; however, this is what I have so far. Lastly, I decided to encode my Instrument column, as I believe it may be the foundation of my model.

End of Data/Cleaning notebook.