# Music Genre Labelling Model

Hello! I am Daniel Lim, and this is a notebook outlining the end-to-end project I carried out
to train a Music Genre Labelling Model. The dataset used is the [million song
dataset](http://millionsongdataset.com/) (T. Bertin-Mahieux, D. P. W. Ellis, B. Whitman, and P. Lamere).

## Notebook Setup: Imports + Reading Data

In [1]:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import nltk
import os
import re
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
```

In [2]:
```python
df_inputs = pd.read_csv('data/features.csv')
df_labels = pd.read_csv('data/labels.csv')
```

### Details About This Project:

In this project, the task is to develop and serve a classification model for musical
genres. The data provided has been split into **features** and **labels**, which are in
their own `.csv` files.

The task can be split into the following parts:
1. Exploratory Data Analysis (EDA) On the Dataset
2. Data Processing for Training
3. Choosing & Training a Model
4. Serving the Model to a Web Service

Most of the actual implementation of this project will be in `.py` scripts, but for
exploration and preliminary tests, I will be using this notebook.

## Part 1: EDA On the Dataset

The beginning of any Machine Learning / Data Science / AI project should include building an
understanding of the dataset before any solutions are applied. This also helps the downstream
steps, as an understanding of the data guides decisions in **data preprocessing** and **model training**.

#### Dataset Schema:

Below is the dataset schema, copied from the `README` file:

**Features**

* trackID: unique identifier for each song (Maps features to their labels)
* title: title of the song. Type: text.
* tags: A comma-separated list of tags representing the words that appeared in the lyrics of the song and are assigned by human annotators. Type: text / categorical.
* loudness: overall loudness in dB. Type: float / continuous.
* tempo: estimated tempo in beats per minute (BPM). Type: float / continuous.
* time_signature: estimated number of beats per bar. Type: integer.
* key: key the track is in. Type: integer / nominal.
* mode: major or minor. Type: integer / binary.
* duration: duration of the song in seconds. Type: float / continuous.
* vect_1 ... vect_148: 148 columns containing pre-computed audio features of each song.
    - These features were pre-extracted (NO TEMPORAL MEANING) from the 30 or 60 second snippets, and capture timbre, chroma, and mfcc aspects of the audio.
    - Each feature takes a continuous value. Type: float / continuous.

**Labels**

* trackID: unique id for each song (Maps features to their labels)
* genre: the genre label
    1. Soul and Reggae
    2. Pop
    3. Punk
    4. Jazz and Blues
    5. Dance and Electronica
    6. Folk
    7. Classic Pop and Rock
    8. Metal


These details are very helpful, as they help us understand the features we are
working with, especially their data types, and how each feature might relate to
the label.

Here's a quick run-through of the features and some early impressions:

* `trackID`: Only used to join the features to their labels; it carries no information
  about the genre.
* `title`: Might relate to the label; for example, more aggressive titles might indicate
  a more "aggressive" genre.
* `tags`: The lyrics are very likely to relate to the genre.
* `loudness`, `tempo`, `time_signature`... `duration`: The rest of these features could
  be related to the genre as well.
* `vect_1` to `vect_148`: These are pre-extracted features, and from the description,
  they seem to capture a lot of finer detail in the songs. It's also good to know that
  they don't have any temporal meaning, as that would have other implications, such as
  having to choose a model that can handle sequential data.

### 1a. Quick observation of the data:

In [3]:
```python
df_inputs.head()
```

Unnamed: 0,trackID,title,tags,loudness,tempo,time_signature,key,mode,duration,vect_1,...,vect_139,vect_140,vect_141,vect_142,vect_143,vect_144,vect_145,vect_146,vect_147,vect_148
0,6654,Beside the Yellow Line,"i, the, to, and, a, me, it, not, in, my, is, o...",-8.539,104.341,3.0,7.0,1.0,298.73587,44.462048,...,0.000308,0.000302,0.000302,0.000315,0.000297,0.000305,0.000266,0.000225,0.130826,1.071914
1,5883,Ooh Na Na,"i, you, to, and, a, me, it, not, in, my, is, y...",-4.326,141.969,3.0,6.0,0.0,236.09424,46.069761,...,0.001751,0.001855,0.00192,0.00195,0.001937,0.001912,0.001836,0.00174,0.148765,0.882304
2,3424,Calabria 2008,"i, the, you, to, and, a, me, it, not, in, of, ...",-9.637,126.003,4.0,10.0,0.0,412.94322,40.376622,...,0.000951,0.001039,0.001116,0.001166,0.001159,0.00111,0.001015,0.000895,0.116206,0.306846
3,5434,Verbal Abuse (Just an American Band),"i, you, to, and, a, me, it, not, my, is, your,...",-10.969,197.625,4.0,2.0,1.0,64.78322,45.598532,...,0.000233,0.000284,0.000313,0.000325,0.000324,0.000299,0.000273,0.000236,0.163738,1.247803
4,516,Helen Of Troy,"i, the, to, a, me, it, not, in, is, your, we, ...",-5.369,170.008,4.0,0.0,1.0,191.97342,47.159148,...,0.000853,0.000927,0.000994,0.001037,0.001051,0.001011,0.000962,0.000898,0.108193,0.366419


In [4]:
```python
df_labels.head()
```

Unnamed: 0,trackID,genre
0,8424,metal
1,7923,folk
2,2314,folk
3,810,jazz and blues
4,439,folk


Before any further EDA, check the total number of entries and whether there are any duplicates or missing values:

In [5]:
```python
len(df_inputs)
```

8128

In [6]:
```python
df_inputs.duplicated().sum()
```

0

Great! No duplicates. Now for missing values in each column:

In [7]:
```python
missing_values = df_inputs.isnull().sum()
missing_columns = missing_values[missing_values > 0].sort_values(ascending=False)
print(missing_columns)
```

key         15
vect_12     13
vect_4      12
vect_6      12
tags        12
            ..
vect_84      1
vect_118     1
vect_120     1
vect_121     1
vect_73      1
Length: 115, dtype: int64


The good news is that no single column has many missing values (the most affected, `key`,
is missing only 15). Now I'll check how many rows would be dropped if I dropped all rows
with missing values.

In [8]:
```python
rows_with_missing_values = df_inputs.isnull().any(axis=1)
print(f'Number of rows with missing values: {rows_with_missing_values.sum()}')
```

Number of rows with missing values: 404


Dropping all 404 of these rows would leave us with 7724 rows, which is still a decent amount,
even though we lose around 5% of the rows. In my opinion, the tradeoff of performing imputation,
such as mean, median or even KNN imputation, and potentially introducing noise or biases into
the dataset is not worth it, given the number of rows we can still work with.

In [9]:
```python
df_inputs = df_inputs.dropna()
len(df_inputs)
```

7724

Checking the number of columns we have:

In [10]:
```python
print(f"Number of columns: {df_inputs.shape[1]}")
```

Number of columns: 157


157 is quite a lot of columns; even excluding `trackID`, we still have 156, so the
dimensionality of our problem might be quite high. This is before we even one-hot-encode
the `key` column, which I intend to do. We'll have to keep the dimensionality in mind as
we move along.

Now, looking at some descriptive statistics about the inputs:

In [11]:
```python
df_inputs.describe()
```

Unnamed: 0,trackID,loudness,tempo,time_signature,key,mode,duration,vect_1,vect_2,vect_3,...,vect_139,vect_140,vect_141,vect_142,vect_143,vect_144,vect_145,vect_146,vect_147,vect_148
count,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,...,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0,7724.0
mean,4275.560979,-9.533505,125.778853,3.576644,5.240031,0.685785,238.656432,43.675231,3.802539,8.809924,...,0.000715,0.000763,0.000789,0.000813,0.000809,0.000778,0.000743,0.000694,0.194231,5.254273
std,2480.518111,4.390825,34.716748,1.194295,3.594905,0.464233,88.938726,5.647718,48.420691,29.641771,...,0.000649,0.000684,0.000711,0.000719,0.000714,0.000707,0.000682,0.000645,0.086668,42.457995
min,0.0,-35.726,0.0,0.0,0.0,0.0,5.27628,17.606993,-289.862566,-140.558193,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2113.75,-12.20525,99.88525,3.0,2.0,0.0,186.33098,40.038735,-26.435137,-8.403524,...,0.000278,0.000298,0.000304,0.000319,0.000317,0.000299,0.000292,0.000273,0.129483,0.730204
50%,4278.5,-8.755,121.7895,4.0,5.0,1.0,227.826485,44.159327,8.532998,10.07459,...,0.000556,0.000592,0.000609,0.000632,0.00063,0.000596,0.000566,0.000528,0.17996,1.590191
75%,6422.25,-6.122,147.018,4.0,9.0,1.0,275.41506,48.0044,38.050538,27.163832,...,0.000915,0.000983,0.001019,0.001051,0.00105,0.000998,0.000943,0.000869,0.245426,3.756182
max,8555.0,-0.414,253.036,7.0,11.0,1.0,1271.71873,55.564543,150.885303,157.48321,...,0.006545,0.006613,0.006698,0.006682,0.006645,0.006777,0.00677,0.006632,0.767182,3193.622527


#### Some findings from the descriptive statistics:

A simple observation is that the values are not normalised, and both the ranges and the
distributions of the values can be quite wide. For example, in `vect_2` the minimum is
-289.9, the mean is only 3.8, and the max is 150.9. We'll definitely have to scale the
data if we are working with a deep learning model.

### Part 1a. Findings:

Some important findings from this part are:
1. No duplicates in data
2. 404 rows with missing values
3. Total of 157 columns before data preprocessing
4. Features are not normalised

### 1b. Deeper Dive Into EDA

This section will aim to get more insights on specific columns to decide the best course
of action for the data preprocessing.

Before looking at the inputs, we should find out the distribution of the genres:

In [12]:
```python
print(f'Number of genres: {len(df_labels["genre"].value_counts())}')

print(f"Value Counts: {df_labels['genre'].value_counts()}")
```


Number of genres: 8
Value Counts: genre
classic pop and rock     1684
folk                     1665
metal                    1209
soul and reggae           988
punk                      981
pop                       731
dance and electronica     523
jazz and blues            347
Name: count, dtype: int64


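Alright, we have 8 classes with an imbalanced number of samples per class: the largest
class, **"Classic Pop and Rock"** (1684), has almost five times as many samples as the
smallest, **"Jazz and Blues"** (347). A stratified train/test split will at least keep these
class proportions the same in the training and validation sets, although it does not
remove the imbalance itself. Next, let's look at the input data.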
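A stratified split keeps the class proportions intact but does not rebalance them. One common complement, sketched here as an assumption rather than something this project is stated to do, is to weight the loss by inverse class frequency using the same formula as scikit-learn's `class_weight="balanced"`: `n_samples / (n_classes * count)`. The counts are the ones printed above; `balanced_class_weights` is a hypothetical helper.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {genre: n_samples / (n_classes * count) for genre, count in counts.items()}


# Genre counts from df_labels['genre'].value_counts() above
genre_counts = {
    "classic pop and rock": 1684,
    "folk": 1665,
    "metal": 1209,
    "soul and reggae": 988,
    "punk": 981,
    "pop": 731,
    "dance and electronica": 523,
    "jazz and blues": 347,
}

weights = balanced_class_weights(genre_counts)
# Print from the most common (smallest weight) to the rarest (largest weight)
for genre, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{genre:<22} {w:.3f}")
```

In PyTorch, these weights could be passed as a tensor, in label-index order, to `torch.nn.CrossEntropyLoss(weight=...)`, so a mistake on *Jazz and Blues* costs roughly 4.9 times as much as a mistake on *Classic Pop and Rock*.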

Let's start by looking at the `tags` column.

#### `tags` column

In [13]:
```python
df_inputs[['trackID','tags']]
```

Unnamed: 0,trackID,tags
0,6654,"i, the, to, and, a, me, it, not, in, my, is, o..."
2,3424,"i, the, you, to, and, a, me, it, not, in, of, ..."
3,5434,"i, you, to, and, a, me, it, not, my, is, your,..."
4,516,"i, the, to, a, me, it, not, in, is, your, we, ..."
5,4906,"i, the, you, to, and, a, me, it, not, in, is, ..."
...,...,...
8123,1802,"i, the, you, to, and, a, me, it, not, in, is, ..."
8124,3397,"i, the, you, to, and, a, me, it, not, in, my, ..."
8125,1760,"i, the, you, me, it, not, in, my, is, your, do..."
8126,2114,"i, the, you, and, it, in, my, is, of, your, th..."


Right off the bat, it looks like the tags contain many **stop words** (words commonly used
in a language that have little value for tasks like text classification). Before
carrying on, I'll remove them so I can better analyse the `tags` column.

To do this, I'll utilise the Natural Language Toolkit library `nltk`, which has a list of
common English stop words, to identify the stop words and remove them. Take note that
the values are comma separated, and we can leverage that to accurately check each
individual word.

I'll define a function `remove_stop_words()` that takes in the string, splits it on
commas, removes the stop words, and rejoins the remaining words with commas in between.

In [14]:
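```python
nltk.download('stopwords')
# NLTK's list of common English stop words, as a set for fast lookups
stop_words = set(nltk.corpus.stopwords.words('english'))

def remove_stop_words(text):
    """Drop stop words from a comma-separated string of words."""
    words = text.split(',')
    # Strip surrounding whitespace and compare case-insensitively
    filtered_words = [word.strip() for word in words if word.strip().lower() not in stop_words]
    return ", ".join(filtered_words)


df_inputs["tags"] = df_inputs["tags"].apply(remove_stop_words)
```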

[nltk_data] Downloading package stopwords to /Users/danny/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Now that the stop words are removed, we can take a look at the remaining words to see
what we're working with.

In [15]:
```python
df_inputs['tags'][0]
```

'know, time, way, take, yeah, back, heart, caus, man, mind, wo, still, long, girl, hand, call, leav, face, gone, alon, chang, god, stand, friend, hard, gotta, star, year, fear, black, line, onc, came, understand, goe, road, land, voic, guess, meet, bout, doubt, track, glad, ago, five, path, besid, float, yellow, unknown, except, pierc, railroad'

Just looking at the first entry in the dataframe, we can see that the words have been
stemmed (for example, "cause" has become "caus", and "alone" or "along" has become "alon").

Stemming is a form of text preprocessing that simplifies the vocabulary of the input
text by stripping suffixes. However, this comes with the risk that some words are
stemmed incorrectly, reducing the information in the text. A good example is "alon"
above, which could be either "alone" or "along": two words with very different
meanings, but because the text was already stemmed, we can't say for sure.

An alternative to stemming would be lemmatisation, which takes into account the context
of the word and uses a dictionary to reduce words to their base forms, instead of just
cutting the suffixes out. However, this is more computationally expensive.
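To make the difference concrete, here is a deliberately crude, hypothetical suffix-stripping stemmer next to a tiny dictionary-based lemmatiser. Neither is the Porter/Snowball stemmer or NLTK's `WordNetLemmatizer`; the suffix rules and the lookup table are made up for illustration. The stemmer collapses *universe*, *university* and *universal* into one stem, while the lemmatiser keeps them apart and can map an irregular form like *sang* back to *sing*, which suffix stripping cannot do.

```python
# Hypothetical suffix rules, checked in order; a real stemmer (e.g. Porter) is far more careful
SUFFIXES = ("ity", "ing", "al", "e")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary; a real lemmatiser uses a full lexicon plus part of speech
LEMMAS = {"sang": "sing", "singing": "sing", "leaving": "leave"}

def toy_lemmatise(word: str) -> str:
    """Look the word up in the dictionary, otherwise return it unchanged."""
    return LEMMAS.get(word, word)

words = ["universe", "university", "universal", "cause", "leaving", "sang"]
for w in words:
    print(f"{w:<11} stem={toy_stem(w):<9} lemma={toy_lemmatise(w)}")
```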

Since our text is already stemmed, I'll leave it as it is.


#### Text Preprocessing Approach:

Given that we have text data in our `tags` column, and that this text represents the
lyrics of the song, the context of the text is quite important. To capture the semantics
of the text better, I am going to use word embeddings to represent the text data fed into
the model, rather than something like TF-IDF (Term Frequency - Inverse Document Frequency).

This approach also keeps the dimensionality of the text features low, since each text field
becomes a fixed-size embedding instead of one column per vocabulary word, which is a great
benefit.

Note that we will also have to perform similar preprocessing for the `title` column.
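One common way to turn a variable-length list of words into a single fixed-size vector is to average the Word2Vec embeddings of the words the model knows. The sketch below assumes that approach (the 100-dimensional default matches the embedding size given in Part 3; `mean_embedding` is a hypothetical helper, not the project's script). With gensim, `model.wv` from a trained `Word2Vec` model could be passed as `vectors`, since it supports `word in model.wv` and `model.wv[word]`; a plain dict stands in for it here.

```python
import numpy as np

def mean_embedding(tags: str, vectors, dim: int = 100) -> np.ndarray:
    """Average the vectors of the comma-separated words found in `vectors`.

    Returns a zero vector when none of the words are in the vocabulary,
    so every song still gets an input of the same length.
    """
    words = [w.strip() for w in tags.split(",") if w.strip()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0).astype(np.float32)

# Toy 3-dimensional "embeddings" standing in for model.wv
toy_vectors = {
    "heart": np.array([1.0, 0.0, 0.0]),
    "road": np.array([0.0, 1.0, 0.0]),
}
print(mean_embedding("heart, road, unknownword", toy_vectors, dim=3))
print(mean_embedding("nothing, known", toy_vectors, dim=3))
```

Titles are not comma-separated, so for `title` the word list would come from a tokeniser such as gensim's `simple_preprocess` instead of `split(",")`.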

This will be implemented in a data preprocessing script. For now, let's move on to the
`time_signature` column.

#### `time_signature`

This feature is described as the `estimated number of beats per bar`, which implies
discrete values. Let's confirm that by checking the value counts in the column.

In [16]:
```python
df_inputs["time_signature"].value_counts()
```

time_signature
4.0    5152
1.0    1052
3.0     976
5.0     371
7.0     169
0.0       4
Name: count, dtype: int64

Interestingly, 4 songs have 0 beats in a bar. Since the schema describes this value as
*estimated*, these may simply be cases where the estimate failed, although music, being a
creative output, doesn't necessarily follow any rules.

One thing to note is that we have 6 categories of time signatures, all contained in 1
column. Since this isn't an ordinal feature, where a higher or lower number necessarily
means something is better or worse, we should convert it into a one-hot-encoded feature
to avoid letting the eventual model learn an incorrect relationship between
`time_signature` and the genre. We can achieve this by simply running `pd.get_dummies()`
on the column.

In [17]:
```python
df_inputs = pd.get_dummies(df_inputs, columns=['time_signature'], prefix='time_signature')
df_inputs.head()
```

Unnamed: 0,trackID,title,tags,loudness,tempo,key,mode,duration,vect_1,vect_2,...,vect_145,vect_146,vect_147,vect_148,time_signature_0.0,time_signature_1.0,time_signature_3.0,time_signature_4.0,time_signature_5.0,time_signature_7.0
0,6654,Beside the Yellow Line,"know, time, way, take, yeah, back, heart, caus...",-8.539,104.341,7.0,1.0,298.73587,44.462048,-13.499814,...,0.000266,0.000225,0.130826,1.071914,False,False,True,False,False,False
2,3424,Calabria 2008,"know, like, come, go, one, never, make, say, n...",-9.637,126.003,10.0,0.0,412.94322,40.376622,27.546188,...,0.001015,0.000895,0.116206,0.306846,False,False,False,True,False,False
3,5434,Verbal Abuse (Just an American Band),"time, come, go, one, see, want, day, away, nee...",-10.969,197.625,2.0,1.0,64.78322,45.598532,10.249509,...,0.000273,0.000236,0.163738,1.247803,False,False,False,True,False,False
4,516,Helen Of Troy,"go, see, back, wo, face, end, place, far, onc,...",-5.369,170.008,0.0,1.0,191.97342,47.159148,-52.070941,...,0.000962,0.000898,0.108193,0.366419,False,False,False,True,False,False
5,4906,Only Him Or Me - Original,"know, like, time, come, go, see, got, never, f...",-16.516,142.254,4.0,1.0,146.75546,36.712606,-1.896533,...,0.000107,0.000114,0.131246,0.693531,False,False,False,False,True,False


#### `key`

This feature is similar in nature to `time_signature`, as it seems to be a nominal feature
encoded as integers in a single column. Let's confirm this by checking the value counts again.

In [18]:
```python
df_inputs["key"].value_counts()
```

key
0.0     1008
7.0      934
2.0      913
9.0      908
4.0      667
11.0     602
1.0      589
5.0      547
10.0     470
6.0      442
8.0      408
3.0      236
Name: count, dtype: int64

Sure enough, it is encoded the same way. We can run `pd.get_dummies()` here as well
to convert this feature into a one-hot-encoded one.

In [19]:
```python
df_inputs = pd.get_dummies(df_inputs, columns=['key'], prefix='key')
df_inputs.head()
```

Unnamed: 0,trackID,title,tags,loudness,tempo,mode,duration,vect_1,vect_2,vect_3,...,key_2.0,key_3.0,key_4.0,key_5.0,key_6.0,key_7.0,key_8.0,key_9.0,key_10.0,key_11.0
0,6654,Beside the Yellow Line,"know, time, way, take, yeah, back, heart, caus...",-8.539,104.341,1.0,298.73587,44.462048,-13.499814,26.257028,...,False,False,False,False,False,True,False,False,False,False
2,3424,Calabria 2008,"know, like, come, go, one, never, make, say, n...",-9.637,126.003,0.0,412.94322,40.376622,27.546188,-24.231226,...,False,False,False,False,False,False,False,False,True,False
3,5434,Verbal Abuse (Just an American Band),"time, come, go, one, see, want, day, away, nee...",-10.969,197.625,1.0,64.78322,45.598532,10.249509,52.666606,...,True,False,False,False,False,False,False,False,False,False
4,516,Helen Of Troy,"go, see, back, wo, face, end, place, far, onc,...",-5.369,170.008,1.0,191.97342,47.159148,-52.070941,18.232822,...,False,False,False,False,False,False,False,False,False,False
5,4906,Only Him Or Me - Original,"know, like, time, come, go, see, got, never, f...",-16.516,142.254,1.0,146.75546,36.712606,-1.896533,3.454569,...,False,False,True,False,False,False,False,False,False,False
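One caveat with calling `pd.get_dummies()` separately on the training data and on a new inference file: if the new file happens not to contain, say, a `time_signature` of 0.0, the `time_signature_0.0` column is simply not created, and the input width no longer matches the model's. A minimal sketch of one fix, assuming the training column list is saved alongside the model (`training_columns` and `align_to_training` are hypothetical names), is to reindex the new frame to those columns:

```python
import pandas as pd

def align_to_training(df: pd.DataFrame, training_columns: list[str]) -> pd.DataFrame:
    """Add any missing dummy columns as False and drop unseen ones, in training order."""
    return df.reindex(columns=training_columns, fill_value=False)

train = pd.DataFrame({"time_signature": [0.0, 3.0, 4.0]})
train_dummies = pd.get_dummies(train, columns=["time_signature"], prefix="time_signature")
training_columns = list(train_dummies.columns)

# A new file whose songs are all in 4/4: get_dummies only creates time_signature_4.0
new = pd.DataFrame({"time_signature": [4.0, 4.0]})
new_dummies = pd.get_dummies(new, columns=["time_signature"], prefix="time_signature")

aligned = align_to_training(new_dummies, training_columns)
print(aligned)
```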


#### Some Thoughts on `vect_n` features:

As mentioned before, we have 148 columns of vectors representing audio features
extracted from each song. While these are undoubtedly useful, since they are essentially
feature engineering done from audio to vectors, they also increase our dimensionality by
a lot. Given more time, I would look at the correlation between the `vect` features,
observe which `vect`s are highly correlated with each other, and perhaps do some feature
engineering or feature selection to combine the effects of those `vect`s and reduce the
dimensionality. With the text embeddings for `title` and `tags` added, the model's input
size might already be over 300, close to 400 features. For now, though, I'll leave them
as they are.
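A minimal sketch of the correlation-based selection described above: compute the absolute pairwise correlations, look only at the upper triangle so each pair is counted once, and flag any column that is highly correlated with an earlier one. The 0.95 threshold and the `drop_correlated` name are assumptions, not values chosen for this project; on the real data it could be called as `drop_correlated(df_inputs.filter(like="vect_"))`.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> list[str]:
    """Return the columns whose |correlation| with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strictly upper triangle so each pair appears once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]

toy = pd.DataFrame({
    "vect_a": [1, 2, 3, 4, 5, 6],
    "vect_b": [2, 4, 6, 8, 10, 12],   # exactly 2 * vect_a
    "vect_c": [1, -1, 1, -1, 1, -1],  # only weakly correlated with the others
})
to_drop = drop_correlated(toy)
print(to_drop)
print(toy.drop(columns=to_drop).columns.tolist())
```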

## Part 2: Data Cleaning and Preprocessing

This involves writing scripts to perform the cleaning and preprocessing tasks identified
in the EDA on the dataset.

As the code is all contained within the scripts `data_preparation.py`,
`data_preprocess.py` and `torch_datasets.py` within
`src/genrelabeller/data_preprocessing`, I will not go through the code, but I'll just
list the steps taken:

<ins>Data Preparation</ins>
1. Takes in Raw Data
2. Rows with null values are dropped
3. Stop words are removed from the `title` and `tags` columns
4. Word embeddings are retrieved for `title` and `tags`, using **Word2Vec**
5. `time_signature` and `key` columns are one-hot-encoded
6. Outputs a single pandas DataFrame of cleaned data

<ins>Data Preprocess</ins>
1. Takes in cleaned DataFrame from Data Preparation Output
2. Train Test Split the dataset (Stratify on classes)
3. One-hot-encode labels
4. Separate the embeddings to adjust the shape later
5. Fit a standard scaler on the training set, and use it to scale both the training and validation sets
6. Drop Unique Identifiers (make sure the indexes for inputs and labels line up)
7. Combine the features into a row-wise numpy array
8. Convert the data into PyTorch Tensors and into PyTorch datasets and dataloaders
9. Outputs PyTorch Dataloaders, one for Training Set and one for Validation Set
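Steps 2 and 5 matter most for avoiding leakage: the split keeps each genre's share the same in both sets, and the scaler's mean and standard deviation come from the training set only. scikit-learn's `train_test_split(..., stratify=...)` and `StandardScaler` provide both steps; the NumPy-only sketch below, with hypothetical helper names and toy data, only illustrates what they compute and is not the code in `data_preprocess.py`.

```python
import numpy as np

def stratified_split(y: np.ndarray, test_size: float = 0.2, seed: int = 42):
    """Return train/validation indices with each class split in the same ratio."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_val = int(round(len(idx) * test_size))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

def fit_standard_scaler(X_train: np.ndarray):
    """Compute per-feature mean and (population) std on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return mean, std

# Toy data: 80 samples of class 0, 20 of class 1, 3 features on very different scales
rng = np.random.default_rng(0)
y = np.array([0] * 80 + [1] * 20)
X = rng.normal(loc=[0.0, 100.0, -300.0], scale=[1.0, 50.0, 5.0], size=(100, 3))

train_idx, val_idx = stratified_split(y, test_size=0.25)
mean, std = fit_standard_scaler(X[train_idx])  # statistics from the training set only
X_train = (X[train_idx] - mean) / std
X_val = (X[val_idx] - mean) / std              # the same statistics are reused
print(np.bincount(y[train_idx]), np.bincount(y[val_idx]))
```

Like `StandardScaler`, this uses the population standard deviation (`ddof=0`); the validation set will not be exactly zero-mean, which is expected.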

## Part 3: Training the Model

After the EDA, it was apparent that the data we're working with for this project is
complex, given the dimensionality, as well as the fact that we're working with natural
language text. 

To capture the semantics of the text data, I chose to use `Word2Vec` to convert the
tokenised words into 100-dimensional embeddings. This increases the dimensionality of the
problem by a lot, which classical models tend to struggle with.

In addition, classical models like a Random Forest, which are relatively robust against
high dimensionality, split on one feature at a time, so they do not directly leverage the
relationships or semantic meanings spread across the dimensions of the text embeddings.

Therefore, a simple Feed Forward Neural Network (Multi Layer Perceptron) was chosen as
the model for this project. Neural networks are able to learn complex relationships
between features: not only the relationships within the embeddings, but also their
interactions with other features like the key or the time signature.

#### Model Architecture
![neuralnet archi](imgs/model_architecture.png)

After a few runs of the training pipeline to tune the hyperparameters (MLflow was not
used due to a lack of time to implement it), I settled on the following parameters:
* Inputs: 370
* Hidden Neurons: 8
* Output Neurons: 8
* Hidden Layers: 1
* Epochs: 200
* Learning Rate: 0.001
* Loss: Cross-entropy (`nn.CrossEntropyLoss()`)
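As a sanity check, the 370 inputs can be accounted for from the EDA, assuming `title` and `tags` are each reduced to one 100-dimensional Word2Vec vector and that `loudness`, `tempo`, `mode` and `duration` are kept as single columns. The same arithmetic gives the number of trainable parameters in fully connected layers (what `torch.nn.Linear` holds: `in * out` weights plus `out` biases). The 128/64 comparison network is hypothetical.

```python
# Input width, assuming the feature breakdown described above
embedding_dims = 2 * 100          # title + tags Word2Vec vectors
numeric_cols = 4                  # loudness, tempo, mode, duration
audio_vects = 148                 # vect_1 ... vect_148
time_signature_dummies = 6        # 0, 1, 3, 4, 5, 7
key_dummies = 12                  # keys 0 to 11
n_inputs = embedding_dims + numeric_cols + audio_vects + time_signature_dummies + key_dummies

def mlp_param_count(layer_sizes: list[int]) -> int:
    """Weights + biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(n_inputs)                                  # 370
print(mlp_param_count([n_inputs, 8, 8]))         # the chosen 1-hidden-layer network
print(mlp_param_count([n_inputs, 128, 64, 8]))   # a hypothetical larger network
```

Under these assumptions the final model has about 3,000 parameters, roughly 18 times fewer than the hypothetical 128/64 network.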

This is actually a very small, simple neural network. I started out with a model with
more hidden neurons and hidden layers, but found that the train and validation loss
plots showed very severe overfitting. Therefore, I decided to prune the model down to the
current small architecture. **Train and validation losses are plotted below:**

![loss plots](artifacts/model/losses.png)

From observation, it is clear that the model is overfitting: the validation loss
diverges from the training loss very early, around epoch 10, and plateaus, while the
training loss continues to fall.
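Since the validation loss bottoms out early, one standard way to act on this (not something the training script is stated to do) is early stopping: keep the weights from the epoch with the lowest validation loss and stop once it has not improved for `patience` epochs. The hypothetical helper below works on a list of per-epoch validation losses; in a PyTorch loop one would also save a copy of `model.state_dict()` whenever the best loss improves. The losses are illustrative values, not this project's.

```python
def early_stopping_epoch(val_losses: list[float], patience: int = 5, min_delta: float = 0.0):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops at the first epoch where the validation loss has failed to
    improve on the best value by more than `min_delta` for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Illustrative losses that fall, then plateau and creep upwards
losses = [2.0, 1.6, 1.4, 1.3, 1.25, 1.26, 1.27, 1.27, 1.28, 1.30, 1.31]
print(early_stopping_epoch(losses, patience=3))
```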

There is a possibility that the train test
split being stratified on the classes was not enough to make the validation set
representative of the training set, and that the model is learning features that are in
the training set that are absent in the validation set.

There is also a possibility that the model is just plain overfitting to the data,
learning noise in the training set and not being able to generalise well to the
validation set.

If time permits, we can perform error analysis on the model to gain insights into why
this is happening, and try to curb it to improve model performance.

#### Model Evaluation
Scikit-learn Classification Report:

![eval metrics](imgs/model_eval.png)

Given that this is an imbalanced multi-class classification problem, I have chosen to focus
on the macro-averaged F1 score as the evaluation metric.

The macro-averaged F1 score represents the model's classification performance across all
classes, as it is the unweighted average of the per-class F1 scores. This means that
each class contributes equally to the final score, regardless of how many samples it
has. By giving equal weight to each class, the macro-averaged F1 score provides a more
balanced view of model performance, particularly when some classes are
underrepresented, such as **"Dance and Electronica"** and **"Jazz and Blues"**.
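To make the definition concrete, here is a small pure-Python computation of per-class and macro-averaged F1; it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`. The labels are invented for illustration: a classifier that always predicts the majority class reaches 80% accuracy but only 0.44 macro F1, because the minority class contributes an F1 of 0.

```python
def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 for each class seen in y_true or y_pred (0 when precision + recall is 0)."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[cls] = 2 * precision * recall / denom if denom else 0.0
    return scores

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

y_true = ["folk"] * 8 + ["jazz and blues"] * 2
y_pred = ["folk"] * 10  # always predict the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_class_f1(y_true, y_pred))
print(f"accuracy={accuracy:.2f}, macro F1={macro_f1(y_true, y_pred):.2f}")
```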

From the scikit-learn classification report, it is apparent that the model performs very
poorly on songs belonging to the classes with fewer samples, like
**"Dance and Electronica"** and **"Jazz and Blues"**, which have class F1 scores of 0.39
and 0.27 respectively. On the other hand, the model does well on songs from classes with
more samples than these two, like **"Pop"**, **"Metal"** and **"Punk"**. This is a
typical finding, probably because the model was not trained on enough samples from the
underrepresented classes to learn how to classify them well.

Interestingly, the model also doesn't perform very well on the most represented classes,
**"Classic Pop and Rock"** and **"Folk"**. Purely as speculation, it is possible that
samples from these genres are similar to samples from other genres, which would make them
harder to classify. For example, songs in **"Classic Pop and Rock"** might be similar to
those in **"Pop"**. However, this is just speculation, and further error analysis would be
needed to confirm this theory.
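A first step for that error analysis could be to count which (true, predicted) pairs make up the mistakes, for instance whether *Classic Pop and Rock* songs are mostly predicted as *Pop*. Below is a minimal sketch with a hypothetical helper and invented labels; scikit-learn's `confusion_matrix` gives the same information as a full matrix.

```python
from collections import Counter

def top_confusions(y_true: list[str], y_pred: list[str], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Most frequent (true, predicted) pairs among the misclassified samples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(n)

y_true = ["classic pop and rock"] * 4 + ["pop"] * 3 + ["folk"] * 3
y_pred = ["pop", "pop", "classic pop and rock", "folk",
          "pop", "classic pop and rock", "pop",
          "folk", "folk", "classic pop and rock"]
for (true, pred), count in top_confusions(y_true, y_pred):
    print(f"{true!r} predicted as {pred!r}: {count}")
```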

#### Trained Model Saved in `artifacts/model`, Inference Output `predictions.csv` Saved in `data`

Model training was run and the trained model weights are saved in `artifacts/model`,
along with the loss plot.

## Part 4: Deploying the Model Using FastAPI

The app is designed to carry out a full inference pipeline on an input `.csv` file
when the `/predict/` endpoint is called. Additionally, `/genre/` and `/titles/{genre}`
endpoints are available, which return the unique genres predicted for the test file and
the titles predicted to be in a given genre, respectively.
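The queries behind `/genre/` and `/titles/{genre}` could look something like the sketch below. It assumes the `inference_results` table with the `id`, `trackID`, `title` and `predicted_genre` columns that `/predict/` writes (as in the query output further down); the function names are hypothetical and the FastAPI route wiring is left out. The `?` placeholder binds the `genre` path parameter instead of formatting it into the SQL string.

```python
import sqlite3

def unique_genres(conn: sqlite3.Connection) -> list[str]:
    """Distinct predicted genres, sorted alphabetically."""
    rows = conn.execute(
        "SELECT DISTINCT predicted_genre FROM inference_results ORDER BY predicted_genre"
    ).fetchall()
    return [genre for (genre,) in rows]

def titles_for_genre(conn: sqlite3.Connection, genre: str) -> list[str]:
    """Titles predicted to be in `genre`; the value is bound, not string-formatted."""
    rows = conn.execute(
        "SELECT title FROM inference_results WHERE predicted_genre = ? ORDER BY id",
        (genre,),
    ).fetchall()
    return [title for (title,) in rows]

# In-memory stand-in for artifacts/inference_results.db
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inference_results ("
    "id INTEGER PRIMARY KEY, trackID INTEGER, title TEXT, predicted_genre TEXT)"
)
conn.executemany(
    "INSERT INTO inference_results (trackID, title, predicted_genre) VALUES (?, ?, ?)",
    [(6732, "You Get What You Give", "Classic Pop and Rock"),
     (5415, "Greedee", "Metal"),
     (1854, "Michoacan", "Classic Pop and Rock")],
)
genres = unique_genres(conn)
titles = titles_for_genre(conn, "Classic Pop and Rock")
conn.close()
print(genres)
print(titles)
```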

Prior to writing the `app.py` script, which runs when the FastAPI server starts, I
also wrote an `inference_pipeline.py` to get a sense of the requirements and orchestration
of the inference pipeline. Since this part mostly involves running the Docker container and
interacting with the API, I will only go through reading the SQLite database that the
`/predict/` endpoint outputs.

In [20]:
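```python
import sqlite3
import pandas as pd

# Read the predictions written by the /predict/ endpoint
conn = sqlite3.connect('artifacts/inference_results.db')
query = "SELECT * FROM inference_results"
df = pd.read_sql_query(query, conn)
conn.close()  # release the database handle once the results are loaded
df
```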

Unnamed: 0,id,trackID,title,predicted_genre
0,1,6732,You Get What You Give,Classic Pop and Rock
1,2,5415,Greedee,Metal
2,3,7757,Wonderful World,Dance and Electronica
3,4,1854,Michoacan,Classic Pop and Rock
4,5,4942,HUSTLER,Soul and Reggae
...,...,...,...,...
423,424,186,Hablame De Frente,Folk
424,425,4758,Jody And The Kid,Classic Pop and Rock
425,426,2231,Tama,Dance and Electronica
426,427,2925,Billy Dee,Jazz and Blues


Aside from this, I have also run the inference pipeline separately and written a
`predictions.csv` to the `data` folder, containing only the `trackID` and `genre` columns.