---
title: "Custom sklearn transformer"
date: 2019-12-28  
disqus: false
draft: true
---

Scikit-learn comes with a handful of useful built-in transformers that let you fill in missing values, scale numerical data, and so on. Eventually, however, you will want to manipulate your data in a way that the built-in offerings don't support. Fortunately, it's not too hard to construct your own transformer that integrates easily with the rest of the sklearn workflow.
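
To give a sense of how little code that takes, here is a minimal sketch of the general pattern: a class that inherits from `TransformerMixin` and `BaseEstimator` and implements `fit` and `transform`. The `ClipOutliers` name, its quantile-clipping behavior, and the toy array are all assumptions made up for illustration; this is not the transformer built from the Discover Weekly data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ClipOutliers(TransformerMixin, BaseEstimator):
    """Hypothetical example: clip each column to quantile bounds learned in fit."""

    def __init__(self, lower=0.05, upper=0.95):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn per-column bounds; the trailing underscore marks fitted attributes.
        self.lower_bounds_ = np.quantile(X, self.lower, axis=0)
        self.upper_bounds_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Values outside the learned bounds are pulled back to the bounds.
        return np.clip(X, self.lower_bounds_, self.upper_bounds_)


# Toy data (made up): the 100.0 in the first column is an obvious outlier.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, 40.0]])

# Because the class follows the fit/transform contract, it can be chained
# with built-in transformers in a pipeline.
pipeline = make_pipeline(ClipOutliers(lower=0.0, upper=0.75), StandardScaler())
X_out = pipeline.fit_transform(X)
```

`TransformerMixin` supplies `fit_transform` for free, and `BaseEstimator` supplies `get_params` and `set_params`, which is what lets the class work inside pipelines and grid searches.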

For this exercise, I'm going to be using data about my Spotify Discover Weekly playlists. If you don't know about [Discover Weekly](https://hackernoon.com/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe) (who are you?!), you'll probably want to acquaint yourself. I've been collecting data on my weekly playlists since fall of 2019 with vague plans to turn it into some sort of project. The [spotipy](https://spotipy.readthedocs.io/en/latest/) library makes it easy to access all of the amazing data that Spotify makes available through its API.

```python
import pathlib

import pandas as pd

# The project root is the parent of the current working directory.
project_dir = pathlib.Path.cwd().parent
df = pd.read_pickle(project_dir / 'data/raw/dw_combined.pkl')

# Audio features and metadata to keep for the transformer.
cols = ['song_length_ms', 'key', 'mode', 'time_signature', 'instrumentalness', 'liveness', 'loudness',
        'speechiness', 'valence', 'acousticness', 'tempo', 'danceability', 'energy', 'popularity']

df = df[cols]
df.head()
```

|   | song_length_ms | instrumentalness | liveness | loudness | speechiness | valence | acousticness | tempo | danceability | energy | popularity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 217131 | 0.0205 | 0.323 | -13.417 | 0.0555 | 0.501 | 0.976 | 151.858 | 0.553 | 0.281 | 35 |
| 1 | 255800 | 0.0309 | 0.142 | -12.015 | 0.0306 | 0.218 | 0.938 | 125.58 | 0.421 | 0.374 | 38 |
| 2 | 188000 | 0.0158 | 0.258 | -14.418 | 0.05 | 0.296 | 0.791 | 127.38 | 0.512 | 0.205 | 28 |
| 3 | 448349 | 0.121 | 0.946 | -11.329 | 0.0637 | 0.268 | 0.902 | 146.544 | 0.286 | 0.506 | 36 |
| 4 | 291080 | 0.0062 | 0.0878 | -12.572 | 0.0316 | 0.334 | 0.849 | 80.059 | 0.491 | 0.262 | 37 |


The above code loads the Discover Weekly data and selects the columns that we will use to build our custom transformer.

Let's say we think that it