Enhance your Playlists with Machine Learning: Spotify Automatic Playlist Continuation

This is the repository for the group project Enhance your Playlists with Machine Learning: Spotify Automatic Playlist Continuation. The four articles in the series are listed below:

Part I: Extracting song data from Spotify’s API in Python

Part II: EDA and Clustering

Part III: Building a Song Recommendation System with Spotify

Part IV: Deploying a Spotify Recommendation Model with Flask

The code for all four articles is in this repository.

Introduction

The goal of this project is to recommend songs for a given playlist. The project covers the full pipeline, from data collection to model deployment, so that you end up with a working model to showcase.

How to use

To clone the repository:

git clone https://github.com/enjuichang/PracticalDataScience-ENCA.git

Process

The following image is the flow chart of the project:

Data extraction

Here are a couple of things you should know before starting the project.

Spotify API Acquisition

If you haven’t used an API before, authenticating with various keys and sending requests can seem a bit daunting. The first thing we’ll look at is getting keys. For this, we need a Spotify for Developers account (https://developer.spotify.com/). This is the same as a regular Spotify account and doesn’t require Spotify Premium. From there, go to the dashboard and click “Create an app”. We now have access to a public and a private key, which are needed to use the API.

Spotify Credentials Storage and Access

Now that we have an app, we can get a client ID and a client secret for it. Both are required to authenticate our application with the Spotify Web API and can be thought of as a username and password for the application. It is best practice not to share either of them, and the client secret in particular should never be shared. To avoid exposing it, we can keep the credentials in a separate file which, if you’re using Git for version control, should be listed in .gitignore.

Spotify credentials should be stored in a secret.txt file containing two lines: the client ID on the first line and the client secret on the second.
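A minimal sketch of setting this up, assuming it is run from the repository root: it writes a secret.txt template with placeholder values (replace them with the real client ID and client secret from your dashboard) and adds the file to .gitignore if it is not already listed. The helper name create_secret_template is hypothetical and is not part of the repository.

```python
from pathlib import Path


def create_secret_template(root="."):
    """Hypothetical helper: create secret.txt and make sure Git ignores it."""
    root = Path(root)
    secret_file = root / "secret.txt"
    gitignore = root / ".gitignore"

    # Line 1: client ID, line 2: client secret (placeholders, never overwrite real values)
    if not secret_file.exists():
        secret_file.write_text("YOUR_CLIENT_ID\nYOUR_CLIENT_SECRET\n")

    # Append "secret.txt" to .gitignore unless an identical entry already exists
    content = gitignore.read_text() if gitignore.exists() else ""
    if "secret.txt" not in (line.strip() for line in content.splitlines()):
        separator = "" if not content or content.endswith("\n") else "\n"
        gitignore.write_text(content + separator + "secret.txt\n")
    return secret_file
```

Call create_secret_template() once, then edit secret.txt with your own values. Running it again leaves both files unchanged.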

To read these credentials, use the following code:

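with open("secret.txt") as f:
    # Line 1: client ID, line 2: client secret
    secret_ls = f.read().splitlines()  # drops "\n" or "\r\n" line endings
cid = secret_ls[0].strip()     # strip() removes any stray whitespace
secret = secret_ls[1].strip()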
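For app-only access, Spotify’s Web API uses the OAuth 2.0 client credentials flow: the client ID and secret are sent, Base64-encoded as HTTP Basic auth, to https://accounts.spotify.com/api/token, and the response contains a short-lived bearer token for subsequent API calls. The sketch below shows that exchange with the standard library only; the function names are illustrative, and it is not necessarily how the project’s extraction notebooks authenticate. Wrappers such as spotipy (its SpotifyClientCredentials class) perform this exchange for you.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(cid, secret):
    """Build (but do not send) a client-credentials token request."""
    # Spotify expects "client_id:client_secret", Base64-encoded, as Basic auth
    auth = base64.b64encode(f"{cid}:{secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(cid, secret):
    """Send the request and return the bearer token from the JSON response."""
    with urllib.request.urlopen(build_token_request(cid, secret)) as resp:
        return json.load(resp)["access_token"]
```

With cid and secret loaded as above, token = fetch_access_token(cid, secret) returns a token to send as "Authorization: Bearer <token>" on Web API requests.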

EDA and clustering

Recommendation Model

The recommendation model is summarized in the content_based_recsys.ipynb notebook. In this section, we walk through building a content-based filtering recommendation system. The following parts are covered:

Package Setup
Preprocessing
Feature Generation
Content-based Filtering Recommendation

Please follow the instructions in the notebook to reproduce the results.
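The notebook’s exact features and scoring are not reproduced here. As a hedged illustration of the general content-based filtering idea, the sketch below min-max scales a few numeric audio features, averages the playlist’s songs into a single profile vector, and ranks songs outside the playlist by cosine similarity to that profile. The feature names, song IDs and function names are hypothetical placeholders.

```python
import math


def min_max_scale(rows):
    """Scale each feature column to [0, 1] so large-valued features (e.g. tempo) don't dominate."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(features, playlist_ids, k=2):
    """Rank songs not in the playlist by similarity to the playlist's mean feature vector."""
    if not playlist_ids:
        return []
    ids = list(features)
    scaled = dict(zip(ids, min_max_scale([features[i] for i in ids])))
    playlist_vecs = [scaled[i] for i in playlist_ids]
    profile = [sum(col) / len(playlist_vecs) for col in zip(*playlist_vecs)]
    in_playlist = set(playlist_ids)
    candidates = [i for i in ids if i not in in_playlist]
    candidates.sort(key=lambda i: cosine(scaled[i], profile), reverse=True)
    return candidates[:k]


# Hypothetical features per song: [danceability, energy, tempo]
songs = {
    "song_a": [0.80, 0.90, 128.0],
    "song_b": [0.75, 0.85, 124.0],
    "song_c": [0.20, 0.30, 70.0],
    "song_d": [0.78, 0.88, 126.0],
    "song_e": [0.25, 0.20, 72.0],
}
print(recommend(songs, ["song_a", "song_b"], k=1))
```

Scaling before comparing matters here: without it, tempo (tens to hundreds) would swamp features that live in [0, 1].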

Deployment

To access the final version of the app, visit nazaryaremko1.pythonanywhere.com, where a demo version of the website can be tested. Because of the file-size limits on pythonanywhere, the model there is trained on only a subset of the data. To test the full functionality of the model, clone the repository, change into its root folder and run the following commands:

cd recommendation_app
python wsgi.py

Then open the localhost address printed in the terminal and try out the model with any playlist!
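The repository’s Flask code in recommendation_app/application is not reproduced here. As a rough sketch of how a recommendation model can be served with Flask, the minimal app below exposes a hypothetical /recommend endpoint that accepts a JSON list of track IDs; the route, the payload shape and the stub recommend_for function are assumptions standing in for the trained model. It requires Flask to be installed.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def recommend_for(track_ids, k=3):
    """Stub standing in for the trained model (hypothetical catalogue)."""
    catalogue = ["track_1", "track_2", "track_3", "track_4"]
    return [t for t in catalogue if t not in track_ids][:k]


@app.route("/recommend", methods=["POST"])
def recommend():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    payload = request.get_json(silent=True) or {}
    track_ids = payload.get("track_ids", [])
    if not isinstance(track_ids, list) or not track_ids:
        return jsonify(error="track_ids must be a non-empty list"), 400
    return jsonify(recommendations=recommend_for(track_ids))
```

For local testing, call app.run(); a WSGI host such as pythonanywhere imports the app object instead of running the development server.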

To create and activate a virtual environment (recommended before installing dependencies and running the app), run the following commands:

python3 -m venv venv
source venv/bin/activate (or venv\Scripts\activate if you are using Windows)

To install the dependencies inside the virtual environment:

pip3 install -r requirements.txt
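Before running pip3 install, it can help to confirm that the virtual environment is actually active, since installing into the system interpreter is an easy slip. A small standard-library check, based on how the venv module sets sys.prefix (the function name is illustrative):

```python
import sys


def in_virtualenv():
    """True when running inside an environment created with `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


print("Inside virtual environment:", in_virtualenv())
```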

Repo Structure

│
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── raw            <- The original, immutable data dump.
│   ├── processed      <- The preprocessed data sets for training.
│   ├── test           <- The test data sets for testing.
│   └── final          <- The final data sets for modeling.
│
├── models             <- Trained models, model predictions, or model summaries.
│
├── notebooks          <- Serialized Jupyter notebooks created in the project.
│   ├── script         <- Script for data extraction and loading data
│   ├── Extraction     <- Data extraction using Spotify API
│   ├── EDA            <- Exploratory data analysis process.
│   └── Recsys         <- The training of traditional statistical models.
│
├── recommendation_app <- Model deployment folder
│   ├── application    <- Code for model deployment and website design
│   ├── data1          <- Pretrained data for model
│   └── venv           <- Environment
│
└── requirements.txt   <- The requirements file for reproducing the analysis environment.
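Several commands above assume you are in the repository root. A small sanity check built from the top-level folders in the tree can catch running from the wrong directory; the helper name missing_entries is hypothetical.

```python
from pathlib import Path

# Top-level folders taken from the repository tree
EXPECTED = ["data", "models", "notebooks", "recommendation_app"]


def missing_entries(root="."):
    """Return the expected top-level folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


missing = missing_entries()
if missing:
    print("Run this from the repository root; missing:", ", ".join(missing))
```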
