Song genre predictor based on Spotify data

The goal of this project is to build a classifier and measure how accurately it can predict song genres. The project uses Spotify data [Pandya, 2022], and Spotify already applies machine learning to this kind of task. This makes it possible to assess whether the resulting model would be suitable for a large-scale business or is better suited to a smaller player in the audio streaming market.

Machine Learning Methods Used

SVM
Decision Tree
Gaussian Naive Bayes
k-NN
MLP
Multinomial Naive Bayes
Nearest Centroid
Random Forest
XGBoost
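To illustrate how one of the simpler methods above works, here is a minimal pure-Python sketch of a Nearest Centroid classifier. It computes the mean feature vector of each genre and assigns a new track to the genre with the closest centroid. The feature names (danceability, energy, acousticness), the two genres and all values are hypothetical toy data, not taken from the dataset. scikit-learn's `NearestCentroid` implements the same idea, but this sketch does not claim to reproduce the project's notebook.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (danceability, energy, acousticness) per track.
# Feature choice and values are illustrative only, not from the dataset.
TRAIN = [
    ((0.80, 0.90, 0.05), "dance"),
    ((0.75, 0.85, 0.10), "dance"),
    ((0.30, 0.20, 0.90), "acoustic"),
    ((0.35, 0.25, 0.85), "acoustic"),
]


def fit_centroids(samples):
    """Return the mean feature vector (centroid) of each genre."""
    sums = {}
    counts = defaultdict(int)
    for features, genre in samples:
        if genre not in sums:
            sums[genre] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[genre][i] += value
        counts[genre] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


def predict(centroids, features):
    """Assign the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(centroids[g], features))


centroids = fit_centroids(TRAIN)
print(predict(centroids, (0.70, 0.80, 0.15)))  # dance
print(predict(centroids, (0.40, 0.30, 0.80)))  # acoustic
```

Because each genre is reduced to a single point, the method is fast and easy to interpret. However, it cannot model genres whose tracks form several separate clusters in feature space.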

Structure

The contents of the repository are the following:

Folders

data/ → datasets used for this project
- spotify_data: the original Spotify Tracks Dataset
- spotify_clean: dataset with exactly one genre assigned to each song (generated with the data-cleaning notebook)
- spotify_simplified: dataset reduced to 18 unique genres in total (generated with the clustering notebook)
- data_report: exploratory data analysis for the original dataset
figures/ → figures for the presentation and report (generated with the plots notebook)
ml_methods/ → notebooks with different machine learning algorithms explored for the project
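The README does not state which rule the data-cleaning notebook uses to pick a single genre when a song appears with several. The following is a minimal sketch of one plausible rule, offered as an assumption: keep the genre that is most frequent across the whole dataset, and break ties alphabetically. The `(track_id, genre)` rows and the helper name `assign_single_genre` are hypothetical.

```python
from collections import Counter

# Hypothetical (track_id, genre) rows; the real dataset has many more columns.
rows = [
    ("t1", "pop"), ("t1", "dance"),
    ("t2", "pop"),
    ("t3", "rock"), ("t3", "pop"),
    ("t4", "dance"),
]


def assign_single_genre(rows):
    """Keep one genre per track: the genre that is most frequent across the
    whole dataset, with ties broken alphabetically. This rule is an assumption."""
    genre_counts = Counter(genre for _, genre in rows)
    candidates = {}
    for track_id, genre in rows:
        candidates.setdefault(track_id, []).append(genre)
    return {
        track_id: min(genres, key=lambda g: (-genre_counts[g], g))
        for track_id, genres in candidates.items()
    }


print(assign_single_genre(rows))
# {'t1': 'pop', 't2': 'pop', 't3': 'pop', 't4': 'dance'}
```

This rule is deterministic, but it shifts ambiguous songs toward already-common genres. A random choice per song would avoid that bias, at the cost of reproducibility unless a seed is fixed.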

Notebooks

baseline → implement the majority and rule-based baselines
clustering → reduce the number of genres in the dataset to only 18 via a combination of agglomerative clustering and manual input
data-cleaning → assign a single genre to every song that appears with multiple genres in the dataset
data-exploration → visualize the features of the dataset and propose preprocessing steps
hyperparam-optim → optimize hyperparameters using GridSearchCV
plots → generate plots for the report and presentation
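The baseline notebook is described as implementing majority and rule-based baselines. The following minimal sketch shows what such baselines compute. The majority baseline always predicts the most frequent training genre. The rule-based baseline here uses a single hypothetical acousticness threshold and falls back to the majority genre otherwise. The data, the threshold of 0.5 and the rule itself are illustrative assumptions, not the project's actual rules.

```python
from collections import Counter

# Hypothetical labelled tracks: (acousticness, genre). Values are illustrative only.
train = [(0.1, "pop"), (0.2, "pop"), (0.9, "acoustic"), (0.15, "rock")]
test = [(0.05, "pop"), (0.85, "acoustic"), (0.3, "rock"), (0.2, "pop")]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Majority baseline: always predict the most frequent genre in the training set.
majority_genre = Counter(g for _, g in train).most_common(1)[0][0]


def rule_based(acousticness, threshold=0.5):
    """Hypothetical rule: very acoustic tracks are 'acoustic', else majority genre."""
    return "acoustic" if acousticness > threshold else majority_genre


labels = [g for _, g in test]
majority_acc = accuracy([majority_genre] * len(test), labels)
rule_acc = accuracy([rule_based(a) for a, _ in test], labels)
print(f"majority: {majority_acc:.2f}, rule-based: {rule_acc:.2f}")
# majority: 0.50, rule-based: 0.75
```

Any trained classifier should clearly beat both numbers on the same test split before it can be considered useful.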

Setup

Activate your virtual environment
Run the following command to install all the dependencies needed for this project:

pip install -r requirements.txt

Inspect the code for the different algorithms that were explored (stored under ml_methods/)

Submission details

Team 1

Elizaveta Nosova (1983805)
Miguel Samaniego (1980439)
Nico Sharei (1986818)
Julian Ament (1981511)
Artem Bisliouk (1978986)
Jannik Kranz (1981766)
