# **Hands-On Workshop**
#### **Description**: For this workshop, we'll work together to explore a dataset being used in a Linear Regression Model to predict a given song's popularity.
##### **Quick Note:** Linear Regression models only work with continuous or ordinal variables. That is, each variable must be either numeric and within a specific range OR categorical with a defined hierarchy (e.g., rating systems that use 1-5 stars, or temperature categories such as "Hot", "Warm", and "Cold"). Thus, you'll need to EXCLUDE any discrete variables that aren't ordinal.
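
To make the "defined hierarchy" idea concrete, here is a minimal sketch of how an ordinal category could be turned into numbers that a regression can use. The `temps` frame and its `temperature` column are hypothetical and are not part of the workshop dataset; the only thing borrowed from the note is the "Hot" / "Warm" / "Cold" example.

```python
import pandas as pd

# Hypothetical example column, NOT part of the songs dataset.
temps = pd.DataFrame({"temperature": ["Hot", "Cold", "Warm", "Cold"]})

# Declare the hierarchy explicitly, from lowest to highest.
order = ["Cold", "Warm", "Hot"]
temps["temperature"] = pd.Categorical(
    temps["temperature"], categories=order, ordered=True
)

# .cat.codes returns each value's position in the declared order.
temps["temperature_code"] = temps["temperature"].cat.codes
print(temps)
```

A column such as `track_name` has no such ordering, so any integer codes assigned to it would be arbitrary, which is why the note says to leave those columns out.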

In [2]:
import pandas as pd
from pandasql import sqldf
import plotly.express as px

In [6]:
training_data = pd.read_parquet("clean_songs_dataset.parquet")
training_data

Unnamed: 0,track_name,artist1,artist2,artist3,artist4,artist5,album_name,release_date,danceability,energy,track_popularity,acousticness,valence,tempo
0,Sk8er Boi,Avril Lavigne,,,,,Let Go,2002-06-04,0.487,0.900,73.0,0.000068,0.484,149.937
1,Paparazzi,Lady Gaga,,,,,The Fame,2008-01-01,0.762,0.692,70.0,0.113000,0.397,114.906
2,Sorry,Justin Bieber,,,,,Purpose (Deluxe),2015-11-13,0.654,0.760,78.0,0.079700,0.410,99.945
3,S&M,Rihanna,,,,,Loud,2010-11-16,0.767,0.682,70.0,0.011300,0.833,127.975
4,Shake It Off,Taylor Swift,,,,,1989 (Deluxe),2014-01-01,0.647,0.800,78.0,0.064700,0.942,160.078
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1233814,Wrong Move,Olivia Holt,R3HAB,THRDL!FE,,,The Wave,2018-08-25,0.722,0.706,0.0,0.161000,0.517,124.013
1233845,Howl At The Moon - Radio Edit,Stadiumx,Taylr Renee,,,,Nicky Romero presents Miami 2014,2014-03-17,0.514,0.934,28.0,0.052500,0.182,127.953
1233856,Let The Bass Kick In Miami Girl - Radio Edit,Chuckie,LMFAO,,,,Let The Bass Kick In Miami Girl,2009-12-06,0.762,0.937,0.0,0.050400,0.546,128.021
1233903,You Make Me,Avicii,,,,,True (Bonus Edition),2013-09-16,0.586,0.727,51.0,0.002470,0.496,124.989
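
Following the Quick Note, one possible way to narrow the frame down to regression-friendly columns is to keep only the numeric ones and set the target aside. The sketch below rebuilds two rows from the output above as a stand-in for `training_data`. It assumes `track_popularity` is the target, and that `Unnamed: 0` (if present in the real frame) is a leftover row index rather than a feature. The names `sample`, `target`, and `features` are illustrative only.

```python
import pandas as pd

# Two rows copied from the output above, standing in for the full frame.
sample = pd.DataFrame({
    "track_name": ["Sk8er Boi", "Paparazzi"],
    "artist1": ["Avril Lavigne", "Lady Gaga"],
    "album_name": ["Let Go", "The Fame"],
    "release_date": ["2002-06-04", "2008-01-01"],
    "danceability": [0.487, 0.762],
    "energy": [0.900, 0.692],
    "track_popularity": [73.0, 70.0],
    "acousticness": [0.000068, 0.113],
    "valence": [0.484, 0.397],
    "tempo": [149.937, 114.906],
})

target = "track_popularity"  # assumption: this is what the model predicts

# Keep numeric columns only; names, albums, and dates drop out here.
numeric = sample.select_dtypes(include="number")

# Remove the target and, if it exists, the assumed leftover index column.
features = numeric.drop(columns=[target, "Unnamed: 0"], errors="ignore")
print(list(features.columns))
```

On the real data the same two calls could be applied to `training_data` directly. Whether `release_date` should be converted into a numeric feature (for example, a release year) is a separate modeling decision that this sketch does not make.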


In [7]:
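# Print the schema file that documents each column of the training data.
with open("clean_songs_dataset_schema.txt", "r") as file:
    for line in file:
        # Each line keeps its trailing newline, so print() leaves a blank line between entries.
        print(line)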

TRAINING DATA SCHEMA



track_name - name of the song.

artist1 - first artist featured on the song.

artist2 - second artist featured on the song.

artist3 - third artist featured on the song.

artist4 - fourth artist featured on the song.

artist5 - fifth artist featured on the song.

album_name - name of the album the song is on.

release_date - date the song was released on Spotify.

danceability - metric measuring how groovy a song is (1.0 means 70s Disco-level groovy, 0.0 means Beethoven-level groovy).

energy - metric measuring how epic the song is (1.0 means 2011 Skrillex, 0.0 means Frank Ocean).

track_popularity - metric measuring how popular the song is based on the number of streams, song downloads, and other factors (100.0 means Shape of You by Ed Sheeran, 0.0 means Late 1960s Queen).

acousticness - probability of a given song being acoustic (1.0 means it is acoustic, 0.0 means it isn't).

valence - metric measuring how positive or negative the vibes are (1.0 means Pharrell