# COGS 118A - Group 010 - Final Project

# Insert title here

## Group members

- David Soberanis
- Ernest Lin
- Felipe Lorenzi
- Shushruth Kallutla
- John (Morgan) Harrison

# Abstract 
This section should be short and clearly stated. It should be a single paragraph <200 words.  It should summarize: 
- what your goal/problem is
- what the data used represents 
- the solution/what you did
- major results you came up with (mention how results are measured) 

__NB:__ this final project form is much more report-like than the proposal and the checkpoint. Think in terms of writing a paper with bits of code in the middle to make the plots/tables

# Background

Musical genres are often umbrella terms that group songs with very distinct styles. However, some features, such as the rhythm, the construction of the drum beat, the instrumentation, and the presence of vocals, can be useful for correctly classifying the genre of a song <a name="biss"></a>[<sup>[1]</sup>](#bissnote). Some authors argue that genre classification is often socially driven rather than based on the features of the songs themselves: songs are placed into genres in order to target specific groups of listeners and generate profit <a name="tagg"></a>[<sup>[2]</sup>](#taggnote) <a name="greenberg"></a>[<sup>[3]</sup>](#greenbergnote).

More recent research, however, suggests that songs can be described along three distinct dimensions: "'Arousal' (the energy level of the music); 'Valence' (the spectrum from sad to happy emotions in the music); and 'Depth' (the amount of sophistication and emotional depth in the music)" <a name="greenberg"></a>[<sup>[3]</sup>](#greenbergnote).

It would be interesting to understand whether other musical features could be useful for classifying songs into genres. Analyzing which features are most associated with which genres could uncover new rule sets for music genre recognition. The features available through the Spotify API appear promising for this task, as they include counterparts to some of the dimensions mentioned above (for example, valence, and energy as a measure of arousal) along with many others. In addition, the Spotify API provides lower-level musical features extracted from the audio signal of a song, and many examples can be found online of people classifying songs into genres based only on such low-level features, with some success <a name="venturott"></a>[<sup>[4]</sup>](#venturottnote) <a name="elbir"></a>[<sup>[5]</sup>](#elbirnote).

Models for genre recognition could provide useful features for music recommendation systems, such as Spotify's own. Furthermore, it is well known that genre labels help listeners find songs they like, and an automatic approach that takes into account only the actual musical features of a song could make this process faster and more fruitful for users.

# Problem Statement

Our project aims to use supervised machine learning techniques to classify songs on Spotify into distinct genres. By doing so, we hope to explore the underlying features that characterize each genre and the criteria that differentiate the various genres.

# Data

We downloaded our dataset from the following source: https://www.kaggle.com/datasets/grasslover/spotify-music-genre-list?resource=download&select=songDb.tsv

This is a Spotify music genre list with 131,580 rows and 20 columns. Each row corresponds to a song, while the columns contain song features from Spotify's API. These features include Danceability, Energy, Key, Loudness, Mode, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Time Signature, and Duration. The remaining columns contain information about the location of the song in Spotify.

Our initial dataset contained 2,800 unique genres. This is an issue, as many of these genres were too specific, for example 'Minecraft music'. We considered grouping these genres into more general categories, but this led to poorly grouped data and mediocre results, which we attribute to the subjective nature of music genres.

We kept only the songs belonging to the top 5 genres (by number of songs).

After this, our working dataset is composed of:

- 5,750 songs<br><br>
- 13 features: Danceability, Energy, Key, Loudness, Mode, Speechness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Duration_ms, time_signature<br><br>
- 5 target groups: Alternative, Electrolatino, Doo-Wop, Reading, and Nuelectro
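
The genre filtering described above can be sketched with pandas as follows. This is a minimal sketch rather than our exact code: it assumes the genre label lives in a column named `Genre` (a hypothetical name; check the actual header of `songDb.tsv`), and a small synthetic DataFrame stands in for the real file.

```python
import pandas as pd


def keep_top_genres(df: pd.DataFrame, genre_col: str = "Genre", n: int = 5) -> pd.DataFrame:
    """Keep only the rows whose genre is among the n most frequent genres."""
    top_genres = df[genre_col].value_counts().nlargest(n).index
    return df[df[genre_col].isin(top_genres)].reset_index(drop=True)


# With the real data this would be something like:
# songs = pd.read_csv("songDb.tsv", sep="\t")
# Synthetic stand-in; "Genre" is an assumed column name.
toy = pd.DataFrame({
    "Genre": ["alternative"] * 4 + ["doo-wop"] * 3 + ["reading"] * 2 + ["minecraft music"],
    "Energy": [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.4, 0.1, 0.2, 0.5],
})
subset = keep_top_genres(toy, n=3)
print(subset["Genre"].value_counts())
```

Here `n=3` only keeps the toy example small; the report keeps the top 5 genres.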

# Proposed Solution

We will investigate different models for classifying songs into the appropriate genre based on their features.

Since we assume that songs within the same genre have similar features, we determined that k-nearest neighbors (KNN) could be a suitable solution to this problem. KNN uses a song's features (valence, danceability, etc.) to compute its distance to the labeled songs in the training data, and assigns the song the genre that is most common among its k closest labeled neighbors.
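
A minimal sketch of this idea with scikit-learn, using synthetic data in place of our songs. Standardizing the features first is our assumption of good practice rather than something stated above: features such as Tempo, Loudness and Duration_ms live on much larger scales than the 0 to 1 features and would otherwise dominate the Euclidean distance. `n_neighbors=5` is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: two "genres" whose three features cluster around different means
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.2, 0.05, size=(50, 3)),
    rng.normal(0.8, 0.05, size=(50, 3)),
])
y = np.array(["genre_a"] * 50 + ["genre_b"] * 50)

# Scale features, then label each new song by majority vote of its 5 nearest neighbors
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X, y)

new_songs = np.array([[0.25, 0.20, 0.15], [0.75, 0.80, 0.85]])
predictions = knn.predict(new_songs)
print(predictions)  # → ['genre_a' 'genre_b']
```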

We also intend to try logistic regression, support vector machines, and neural networks, and to evaluate which one scores highest on our metrics.

We will use linear regression as a benchmark model to compare to our solution.
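
Linear regression predicts a continuous value, so it needs an extra step to act as a genre classifier. One common approach, sketched below as an assumption rather than a description of our final code, is least-squares classification: fit a linear regression on one-hot encoded genre labels (one output per genre) and predict the genre with the highest output. `LinearRegressionClassifier` is a hypothetical helper and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class LinearRegressionClassifier:
    """Hypothetical baseline: least-squares regression on one-hot genre labels."""

    def fit(self, X, y):
        # classes_ holds the sorted unique labels; y_idx maps each sample to its class index
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        Y = np.eye(len(self.classes_))[y_idx]  # one-hot targets, shape (n_samples, n_classes)
        self.reg_ = LinearRegression().fit(X, Y)
        return self

    def predict(self, X):
        scores = self.reg_.predict(X)  # one regression output per class
        return self.classes_[np.argmax(scores, axis=1)]


# Synthetic stand-in: three well-separated clusters in two features
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0, 0.1, size=(30, 2)) for c in centers])
y = np.repeat(["a", "b", "c"], 30)

baseline = LinearRegressionClassifier().fit(X, y)
preds = baseline.predict(centers)
print(preds)
```

scikit-learn's `RidgeClassifier` is a built-in relative of this idea (it regresses on {-1, 1} targets with an L2 penalty) and may be a more convenient baseline inside pipelines and cross-validation.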

# Evaluation Metrics

- We will use cross-validation to select the most appropriate kind of model, as well as to tune hyperparameters within each kind of model. To compare models and hyperparameters, we intend to use the F1 score. This metric combines the precision and recall of a classifier into a single number by taking their harmonic mean. Because we have five genres, precision, recall and F1 are computed for each genre separately and then averaged (for example, with an unweighted macro-average). If two classifiers have similar F1 scores, we can then inspect precision and recall separately to see whether classifier A trades one for the other relative to classifier B, and decide whether precision or recall matters more for our task.

- F1 score formula

\begin{align}
\text{F1} = \frac{2}{\frac{1}{\text{Recall}} + \frac{1}{\text{Precision}}}
\end{align}

- Precision: Number of correct positives divided by the number of total positive results predicted by the classifier

\begin{align}
\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
\end{align}

- Recall: Number of correct positives divided by the number of all samples that should have been identified as positive 

\begin{align}
\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
\end{align}
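
To make the formulas concrete, the sketch below computes precision, recall and F1 for each genre from raw counts on a handful of made-up labels (the genre names are placeholders), averages the per-genre F1 scores (a macro-average, which we assume here), and checks the result against scikit-learn's `f1_score`.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels for six songs; genre names are placeholders
y_true = np.array(["alt", "alt", "doo", "doo", "doo", "read"])
y_pred = np.array(["alt", "doo", "doo", "doo", "read", "read"])


def precision_recall_f1(y_true, y_pred, label):
    """Treat `label` as the positive class and compute the three metrics from counts."""
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    # Harmonic mean of precision and recall, as in the F1 formula above
    f1 = 2 / (1 / recall + 1 / precision) if precision > 0 and recall > 0 else 0.0
    return precision, recall, f1


per_genre = {g: precision_recall_f1(y_true, y_pred, g) for g in np.unique(y_true)}
manual_macro_f1 = np.mean([f1 for _, _, f1 in per_genre.values()])
sklearn_macro_f1 = f1_score(y_true, y_pred, average="macro")
print(per_genre)
print(manual_macro_f1, sklearn_macro_f1)
```

In this toy example every genre happens to have F1 = 2/3, so both macro-averages equal 2/3.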

- To measure classification accuracy, we will use a confusion matrix, a standard tool for evaluating multiclass classifiers<a name="confusion matrix"></a>[<sup>[10]</sup>](#confusionmatrix). Since we are classifying songs by genre, a confusion matrix shows not only how often our model is right but also which genres it confuses with one another. Actual classes are placed on the rows and predicted classes on the columns, so cell $(i, j)$ counts the songs of actual genre $i$ that were predicted as genre $j$. For example, cell (1, 1) counts the true positives for the genre 'Rock' if the first row is the actual class 'Rock' and the first column is the predicted class 'Rock'. For any single genre, its diagonal cell holds its True Positives, the other cells in its column are False Positives, the other cells in its row are False Negatives, and all remaining cells are True Negatives. The genre labels in our data provide the actual classes. After the confusion matrix $C$ is built, we will calculate the accuracy of our model as the fraction of correctly classified songs, i.e. the sum of the diagonal divided by the total number of samples (in the binary case this reduces to (True Positives + True Negatives) / Total Samples):

\begin{align}
\text{Accuracy} = \frac{\sum_{i} C_{ii}}{\sum_{i}\sum_{j} C_{ij}}
\end{align}

- We can also measure the sensitivity of our model for each genre (sensitivity is another name for recall):

\begin{align}
\text{Sensitivity} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
\end{align}
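
A small sketch of these confusion-matrix calculations with scikit-learn, using placeholder labels rather than our results. `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, so accuracy is the trace divided by the total and per-genre sensitivity is each diagonal entry divided by its row sum.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

labels = ["alt", "doo", "read"]  # placeholder genre names
y_true = ["alt", "alt", "doo", "doo", "doo", "read"]
y_pred = ["alt", "doo", "doo", "doo", "read", "read"]

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = actual, columns = predicted
print(cm)  # cm == [[1, 1, 0], [0, 2, 1], [0, 0, 1]]

accuracy = np.trace(cm) / cm.sum()          # correct predictions / total samples
sensitivity = np.diag(cm) / cm.sum(axis=1)  # per-genre recall: TP / (TP + FN)
print(accuracy, accuracy_score(y_true, y_pred))
print(sensitivity, recall_score(y_true, y_pred, labels=labels, average=None))
```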

# Results

You may have done tons of work on this. Not all of it belongs here. 

Reports should have a __narrative__. Once you've looked through all your results over the quarter, decide on one main point and 2-4 secondary points you want us to understand. Include the detailed code and analysis results of those points only; you should spend more time/code/plots on your main point than the others.

If you went down any blind alleys that you later decided not to pursue, please don't abuse the TAs' time by throwing in 81 lines of code and 4 plots related to something you actually abandoned. Consider deleting things that are not important to your narrative. If it's slightly relevant to the narrative or you just want us to know you tried something, you could keep it in by summarizing the result in this report in a sentence or two, moving the actual analysis to another file in your repo, and providing us a link to that file.

### EDA

We found no strong correlations between the features, except between Loudness and Energy.
None of our features showed a clear association with the target variable (genre).
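
A minimal sketch of the kind of check behind this observation: compute pairwise Pearson correlations and list the feature pairs above a cutoff. The 0.7 threshold is an arbitrary choice for illustration, and the synthetic data only mimic a Loudness/Energy relationship; they are not our dataset.

```python
import numpy as np
import pandas as pd


def strongly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7):
    """Return (feature_a, feature_b, r) for every pair with |Pearson r| >= threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 2)))
    return pairs


# Synthetic stand-in: Loudness (dB) rises with Energy, Valence is independent
rng = np.random.default_rng(0)
energy = rng.uniform(0, 1, 500)
toy = pd.DataFrame({
    "Energy": energy,
    "Loudness": -30 + 25 * energy + rng.normal(0, 2, 500),
    "Valence": rng.uniform(0, 1, 500),
})
pairs = strongly_correlated_pairs(toy)
print(pairs)
```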

### Feature selection?

Another likely section is if you are doing any feature selection through cross-validation or hand-design/validation of features/transformations of the data

### Linear regression (base model)

Probably you need to describe the base model and demonstrate its performance.  Maybe you include a learning curve to show whether you have enough data to do train/validate/test split or have to go to k-folds or LOOCV or ???

---

Include:
- Accuracy & F1 score with CIs from cross validation
- Confusion matrix
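
For the "Accuracy & F1 score with CIs from cross validation" item, one possible sketch is below. It uses scikit-learn's `RidgeClassifier` as a stand-in for the linear regression baseline and synthetic five-class data with 13 features. The 95% interval is a simple normal approximation over fold scores (mean ± 1.96 standard errors), which is only approximate because the folds share training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for our 13 features and 5 genres
X, y = make_classification(n_samples=500, n_features=13, n_informative=6,
                           n_classes=5, random_state=0)

model = make_pipeline(StandardScaler(), RidgeClassifier())
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation confidence interval of per-fold scores."""
    values = np.asarray(values)
    mean = values.mean()
    half_width = z * values.std(ddof=1) / np.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


summary = {}
for metric in ["accuracy", "f1_macro"]:
    summary[metric] = mean_with_ci(scores[f"test_{metric}"])
    mean, low, high = summary[metric]
    print(f"{metric}: {mean:.3f} (95% CI {low:.3f} to {high:.3f})")
```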

### KNN

Perhaps some exploration of the model selection (hyper-parameters) or algorithm selection task. Validation curves, plots showing the variability of performance across folds of the cross-validation, etc. If you're doing one, the outcome of the null hypothesis test or parsimony principle check to show how you are selecting the best model.

---

Include:
- NCA explanation
- Grid search results (ideally using plot_results function)
- Accuracy & F1 score with CIs from cross validation
- Confusion matrix
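
Neighborhood Components Analysis (NCA) learns a linear transformation of the features that maximizes a stochastic (soft) version of the leave-one-out nearest-neighbor accuracy, so distances in the transformed space are better suited to KNN. A sketch of tuning NCA and KNN together with a grid search is below; the grid values are illustrative, macro-F1 as the scoring metric is an assumption, and the data are synthetic. `search.cv_results_` holds the per-combination scores that a plotting helper could consume.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for our 13 features and 5 genres
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           n_classes=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),  # learned linear projection
    ("knn", KNeighborsClassifier()),
])
param_grid = {
    "nca__n_components": [2, 5, 13],
    "knn__n_neighbors": [5, 15, 31],
}
search = GridSearchCV(
    pipe,
    param_grid,
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```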

### Logit, SVM, and MLP (should we include all of these?)

Maybe you do model selection again, but using a different kind of metric than before?

---

Include (for each model):
- Grid search results (ideally using plot_results function)
- Accuracy & F1 score with CIs from cross validation
- Confusion matrix


- Explanation of how MLP was built
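
A sketch of how the three models could be built and compared under the same cross-validation scheme. The MLP architecture (two hidden layers of 64 and 32 units), the SVM settings and the use of macro-F1 are illustrative assumptions rather than the configuration we report, and the data are synthetic. In practice each model's hyperparameters would also be tuned with a grid search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for our 13 features and 5 genres
X, y = make_classification(n_samples=500, n_features=13, n_informative=6,
                           n_classes=5, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf", C=1.0),
    # Fully connected network: 13 inputs -> 64 -> 32 -> 5 genres (assumed sizes)
    "mlp": MLPClassifier(hidden_layer_sizes=(64, 32), alpha=1e-3,
                         max_iter=1000, random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # all three models benefit from scaling
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro")
    results[name] = scores.mean()
    print(f"{name}: macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```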

# Discussion

### Interpreting the result

OK, you've given us quite a bit of tech information above, now it's time to tell us what to pay attention to in all that. Think clearly about your results, decide on one main point and 2-4 secondary points you want us to understand. Highlight HOW your results support those points. You probably want 2-5 sentences per point.

---

- Explanation of why KNN might be performing the worst
    - We believe that the KNN algorithm suffered from the relatively high dimensionality of our data, since distances between points become less informative as the number of features grows. In an attempt to alleviate this issue, we used PCA to reduce the number of features. However, retaining 95% of the explained variance required 13 components, the same as the number of features in our dataset, and using fewer components led to worse results.
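
The sketch below illustrates why PCA may not have helped: `PCA(n_components=0.95)` keeps the smallest number of components that explain at least 95% of the variance, and when standardized features are only weakly correlated that number stays close to the original dimensionality. The two synthetic datasets (independent features versus features driven by two hidden factors) are assumptions chosen to show the contrast, not our data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_songs, n_features = 1000, 13

# Case 1: independent features, so variance is spread evenly across directions
X_indep = StandardScaler().fit_transform(rng.normal(size=(n_songs, n_features)))

# Case 2: every feature is a positive mix of two hidden factors plus a little noise
factors = rng.normal(size=(n_songs, 2))
loadings = rng.uniform(0.5, 1.5, size=(2, n_features))
X_corr = StandardScaler().fit_transform(
    factors @ loadings + 0.05 * rng.normal(size=(n_songs, n_features))
)

n_components = {}
for name, X in [("independent", X_indep), ("correlated", X_corr)]:
    pca = PCA(n_components=0.95, svd_solver="full").fit(X)
    n_components[name] = pca.n_components_
    print(f"{name}: {pca.n_components_} components for 95% of the variance")
```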

### Limitations

Are there any problems with the work?  For instance would more data change the nature of the problem? Would it be good to explore more hyperparams than you had time for?   

### Ethics & Privacy

If your project has obvious potential concerns with ethics or data privacy discuss that here.  Almost every ML project put into production can have ethical implications if you use your imagination. Use your imagination.

Even if you can't come up with an obvious ethical concern that should be addressed, you should know that a large number of ML projects that go into production have unintended consequences and ethical problems once in production. How will your team address these issues?

Consider a tool to help you address the potential issues such as https://deon.drivendata.org

### Conclusion

Reiterate your main point and in just a few sentences tell us how your results support it. Mention how this work would fit in the background/context of other work in this field if you can. Suggest directions for future work if you want to.

# Footnotes
<a name="bissnote"></a>1.[^](#biss): Biss, Madars. (2021) Rhythm Tips for Identifying Music Genres by Ear. *Musical U*. https://www.musical-u.com/learn/rhythm-tips-for-identifying-music-genres-by-ear/<br> 
<a name="taggnote"></a>2.[^](#tagg): Fabbri, Franco. (1980) A Theory of Musical Genres: Two Applications. *Popular Music Perspectives*. https://www.tagg.org/xpdfs/ffabbri81a.pdf<br>
<a name="greenbergnote"></a>3.[^](#greenberg): Greenberg, David M. (2016, August 6) Musical genres are out of date – but this new system explains why you might like both jazz and hip hop. *EconoTimes*. http://www.econotimes.com/Musical-genres-are-out-of-date-%E2%80%93-but-this-new-system-explains-why-you-might-like-both-jazz-and-hip-hop-244941<br>
<a name="venturottnote"></a>4.[^](#venturott): Venturott, Pedro H. G. (2021, January 31) Predicting Music Genres Using Waveform Features. *Towards Data Science*. https://towardsdatascience.com/predicting-music-genres-using-waveform-features-5080e788eb64<br>
<a name="elbirnote"></a>5.[^](#elbir): Elbir, Ahmet, et al. (2018) Music Genre Classification and Recommendation by Using Machine Learning Techniques. *IEEE*. https://ieeexplore.ieee.org/document/8554016<br>