# Overarching Question

-The overarching question will be: "Do popular tracks have audio features which are significantly different from those of unpopular tracks? And do these audio features differ from one decade to another?"

-The threshold for what makes a track popular will be defined later once the data is better understood

# Data Selection

-Initial investigation of the raw data with DB Browser reveals the following structure:
![Spotify_Raw_Data_Overview.PNG](attachment:Spotify_Raw_Data_Overview.PNG)

-Obviously the audio_features table is required 

-I will also want to investigate whether a song being explicit has an effect on its popularity

-Taking the release date into account will also be interesting so I need the albums and r_albums_tracks tables

-So I will additionally select these columns from the tracks table together with its id and audio_feature_id columns

# Data Wrangling 



-Importing the required packages

In [None]:
import gc
import pandas as pd
import sqlite3 as sl

-Establishing a connection to the database and creating a cursor for iteration

In [None]:
con = sl.connect("C:\\Users\\Mark\\Desktop\\Spotify_Projekt\\Rohdaten\\spotify.sqlite")
c = con.cursor()


-I decided to load each of these tables into a pandas dataframe, since pandas is widely used for this kind of work and is convenient for further wrangling and analysis

-Previous attempts revealed that the read_sql function of pandas often raises an OperationalError due to unsuitable encodings in the underlying data

-To help alleviate this issue, the text_factory attribute of the sqlite3 connection is set so that text values are returned as bytes rather than str

In [None]:
con.text_factory = bytes
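
-As a minimal sketch of what this setting changes, the following uses a throwaway in-memory database (the table demo and the helper fetch_first_text are made up for illustration) and reads the same value once as str and once as bytes:

```python
import sqlite3

def fetch_first_text(text_factory):
    """Store one TEXT value and read it back with the given text_factory."""
    con = sqlite3.connect(":memory:")
    con.text_factory = text_factory
    con.execute("CREATE TABLE demo (name TEXT)")
    con.execute("INSERT INTO demo VALUES (?)", ("Björk",))
    value = con.execute("SELECT name FROM demo").fetchone()[0]
    con.close()
    return value

as_str = fetch_first_text(str)      # default behaviour: decoded text
as_bytes = fetch_first_text(bytes)  # raw bytes, decoding is left to the caller

print(type(as_str).__name__, type(as_bytes).__name__)
print(as_bytes.decode("utf-8") == as_str)
```

-With bytes, text columns such as the ids should arrive in pandas as byte strings unless they are decoded explicitly.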

-The following blocks are to ensure that the rows in each table are all encoded correctly

-These blocks can be omitted if you are sure that the encodings are all suitable. They may also take some time to run. In the interest of defensive programming, however, I want to keep them

-The first block includes the helper function try_encodings 

In [None]:
from encodings.aliases import aliases
_encodings = set(aliases.values())

def try_encodings(byte_text: bytes):
    for encoding in _encodings:
        try:
            print(f'Encoding {encoding}: {byte_text.decode(encoding)}')
        except (UnicodeDecodeError, LookupError):
            pass

sql = 'Select * from audio_features'
c.execute(sql)

col_ix = 0  
while row := c.fetchone():
    try:
        text = row[col_ix].decode()
    except UnicodeDecodeError as e:
        print('Cannot decode byte string: ', row[col_ix])
        try_encodings(row[col_ix])
        break

In [None]:
sql = 'Select * from tracks'
c.execute(sql)

col_ix = 0  
while row := c.fetchone():
    try:
        text = row[col_ix].decode()
    except UnicodeDecodeError as e:
        print('Cannot decode byte string: ', row[col_ix])
        try_encodings(row[col_ix])
        break

In [None]:
sql = 'Select * from albums'
c.execute(sql)

col_ix = 0  
while row := c.fetchone():
    try:
        text = row[col_ix].decode()
    except UnicodeDecodeError as e:
        print('Cannot decode byte string: ', row[col_ix])
        try_encodings(row[col_ix])
        break

In [None]:
sql = 'Select * from r_albums_tracks'
c.execute(sql)

col_ix = 0  
while row := c.fetchone():
    try:
        text = row[col_ix].decode()
    except UnicodeDecodeError as e:
        print('Cannot decode byte string: ', row[col_ix])
        try_encodings(row[col_ix])
        break
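
-The four blocks above only look at the first column of each table. As a hedged sketch of how the same check could cover every column with one helper, the following uses a hypothetical function find_undecodable and a tiny in-memory table t in place of the real database:

```python
import sqlite3

def find_undecodable(con, table, encoding="utf-8"):
    """Return (row_number, column_index, raw_value) of the first value that
    cannot be decoded, or None if every bytes value decodes cleanly."""
    cur = con.cursor()
    # The table name is interpolated directly, so it must come from trusted code
    cur.execute(f"SELECT * FROM {table}")
    for row_no, row in enumerate(cur):
        for col_ix, value in enumerate(row):
            if not isinstance(value, bytes):
                continue  # numbers and NULLs need no decoding
            try:
                value.decode(encoding)
            except UnicodeDecodeError:
                return row_no, col_ix, value
    return None

# Tiny in-memory stand-in for the real database
demo = sqlite3.connect(":memory:")
demo.text_factory = bytes
demo.execute("CREATE TABLE t (id TEXT, name BLOB)")
demo.execute("INSERT INTO t VALUES (?, ?)", ("a1", b"ok"))
demo.execute("INSERT INTO t VALUES (?, ?)", ("a2", b"caf\xe9"))  # Latin-1 bytes, invalid UTF-8

first_problem = find_undecodable(demo, "t")
print(first_problem)
```

-The returned raw value could then be passed to try_encodings to see which encodings can decode it.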

-Now the dataframes for the tables can be created (takes some time to run)

-To avoid confusion I will rename the id in albums to album_id and the id of tracks to track_id

-The info method allows me to understand the contents of the tables

-I will also investigate if the tables have any missing values or outliers in order to assess the data quality

-Outliers will only be investigated for columns which contain integer or float values

In [None]:
albums = pd.read_sql("SELECT id, release_date FROM albums", con)
albums.rename(columns={"id":"album_id"}, inplace=True)

In [None]:
albums.info()
print(albums.isnull().sum())

In [None]:
albums.describe()

In [None]:
r_albums_tracks = pd.read_sql("SELECT * FROM r_albums_tracks", con)

In [None]:
r_albums_tracks.info()
print(r_albums_tracks.isnull().sum())

In [None]:
r_albums_tracks.describe()

In [None]:
tracks = pd.read_sql("SELECT id, explicit, popularity FROM tracks", con)
tracks.rename(columns={"id":"track_id"}, inplace=True)

In [None]:
tracks.info()
print(tracks.isnull().sum())

In [None]:
tracks.describe()

-I will now add the release_date to the tracks table

-The release date is given as a Unix timestamp. I am only interested in the year the track was released so I will transform the data

In [None]:
r_albums_tracks = r_albums_tracks.merge(albums, on="album_id")
r_albums_tracks.describe()

In [None]:
del albums
gc.collect()

In [None]:
tracks = tracks.merge(r_albums_tracks, on="track_id")
tracks.describe()

In [None]:
del r_albums_tracks
gc.collect()

In [None]:
tracks.drop_duplicates(subset="track_id", inplace=True)
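
-A small sketch with made-up data of why the merge produces duplicates: a track that appears on several albums gets one row per album, and drop_duplicates then keeps only the first of them. The names tracks_demo and links_demo are illustrative only:

```python
import pandas as pd

tracks_demo = pd.DataFrame({"track_id": ["t1", "t2"], "popularity": [40, 5]})
# t1 appears on two albums, e.g. the original release and a later compilation
links_demo = pd.DataFrame({
    "track_id": ["t1", "t1", "t2"],
    "release_date": [1965, 1998, 2010],
})

merged = tracks_demo.merge(links_demo, on="track_id")
deduped = merged.drop_duplicates(subset="track_id")

print(len(merged), len(deduped))
print(deduped["release_date"].tolist())
```

-Which release date survives depends on the row order after the merge. If the earliest release is wanted, sorting by release_date before dropping duplicates would be one option.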

-I will now convert the Unix timestamp into the release year by counting whole years relative to 1970 (-1 = 1969, 0 = 1970, 1 = 1971 etc.)

In [None]:
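def unix_conv(x):
    # Timestamps are in milliseconds, so convert to seconds first
    seconds = x / 1000
    # Floor division keeps pre-1970 (negative) timestamps in the correct year;
    # 31556926 is the approximate number of seconds in one year
    return int(seconds // 31556926) + 1970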
    
tracks["release_date"] = tracks["release_date"].apply(unix_conv)
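
-Dividing by a fixed year length is an approximation, so dates close to New Year can land in the neighbouring year. As an alternative sketch, the calendar year can be read from a proper datetime. The helper ms_to_year is hypothetical and assumes the timestamps are in UTC:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ms_to_year(ms):
    """Calendar year (UTC) of a Unix timestamp given in milliseconds."""
    # int() guards against NumPy integer types, which timedelta may reject
    return (EPOCH + timedelta(milliseconds=int(ms))).year

for ms in (-1000, 0, 946684800000):
    print(ms, ms_to_year(ms))
```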

-To reduce memory usage, I will convert the explicit column from integer to boolean values

In [None]:
def int_to_bool(x):
    # 1 maps to True, every other value to False
    return x == 1
    
tracks["explicit"] = tracks["explicit"].apply(int_to_bool)
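
-To illustrate the memory effect on made-up data: a vectorised comparison gives the same boolean result as applying a function element by element, and a boolean column needs one byte per value instead of eight for int64:

```python
import pandas as pd

flags_int = pd.Series([0, 1] * 1000, dtype="int64")
flags_bool = flags_int == 1  # vectorised, no Python-level loop

print(flags_int.memory_usage(index=False), flags_bool.memory_usage(index=False))
```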

In [None]:
tracks.describe()

In [None]:
tracks["release_date"].hist()

-There are obviously outliers here. I will only consider tracks from 1950 onward, so the rest will be filtered out

In [None]:
tracks.drop(tracks[tracks["release_date"] < 1950].index, inplace = True)

-I am also only interested in the decade in which a track was released, not the exact year, so I will further transform this column

In [None]:
def year_to_decade(x):
    return str(x)[2] +"0"
tracks["release_date"] = tracks["release_date"].apply(year_to_decade)
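
-The same mapping can be written arithmetically instead of slicing the string. This is only a sketch, and year_to_decade_arith is a made-up name:

```python
def year_to_decade_arith(year):
    """Two-digit decade label, e.g. 1987 -> "80" and 2005 -> "00"."""
    return f"{(year // 10 * 10) % 100:02d}"

for year in (1950, 1987, 2005, 2019):
    print(year, year_to_decade_arith(year))
```

-Either way the two-digit label is ambiguous across centuries (1920 and 2020 would both give "20"). This is harmless here only because tracks before 1950 were filtered out first.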

In [None]:
tracks.drop("album_id", axis = 1, inplace = True)

-The audio_features table has 11840402 rows and 15 columns, so to keep memory usage manageable I will split it into three dataframes

-The duration of a song will not be included since it doesn't speak to the nature of the actual sound of a track

-The release decade and the popularity of each track will also be merged into these dataframes

In [None]:
audio_features_1 = pd.read_sql("SELECT id, acousticness, danceability, energy, instrumentalness FROM audio_features", con)
audio_features_1.rename(columns={"id":"track_id"}, inplace=True)

In [None]:
tracks["popularity"].quantile(0.95)

In [None]:
audio_features_1 = audio_features_1.merge(tracks[["track_id", "release_date", "popularity"]], on="track_id")

In [None]:
audio_features_2 = pd.read_sql("SELECT id, key, liveness, loudness, mode, speechiness FROM audio_features", con)
audio_features_2.rename(columns={"id":"track_id"}, inplace=True)

In [None]:
audio_features_2 = audio_features_2.merge(tracks[["track_id", "release_date", "popularity"]], on="track_id")

In [None]:
audio_features_3 = pd.read_sql("SELECT id, tempo, time_signature, valence FROM audio_features", con)
audio_features_3.rename(columns={"id":"track_id"}, inplace=True)

In [None]:
audio_features_3 = audio_features_3.merge(tracks, on="track_id")

In [None]:
audio_features_1.info()
print(audio_features_1.isnull().sum())

In [None]:
audio_features_1.describe()

In [None]:
audio_features_2.info()
print(audio_features_2.isnull().sum())

In [None]:
audio_features_2.describe()

-As with the explicit column from tracks, I will convert the mode column to boolean values. True will be major, False will be minor

In [None]:
audio_features_2["mode"] = audio_features_2["mode"].apply(int_to_bool)

In [None]:
tracks.drop("explicit", axis = 1, inplace = True)

In [None]:
audio_features_3.info()
print(audio_features_3.isnull().sum())

In [None]:
audio_features_3.describe()

In [None]:
tracks.drop("release_date", axis = 1, inplace = True)

-There are no missing values in the selected data

-There are also no obvious outliers or values outside the ranges given in the documentation

-I will now define when a track is popular

In [None]:
tracks["popularity"].describe()

In [None]:
tracks.hist("popularity")

-As we can see most tracks are not popular at all or have a relatively low popularity 

-Spotify measures the popularity of a track from 0 to 100. The mean is only 6.12, and 75% of all tracks have a popularity of less than 8

-I will treat the top 5% of tracks as popular, mirroring the commonly used significance level of 0.05

-The 95th percentile lies at a popularity of 30, so this will be the threshold

-This definition is not perfect but popularity won't be the focus of this analysis
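
-A worked example of such a percentile threshold on made-up scores, using only the standard library. The "inclusive" method interpolates linearly between data points, which should correspond to the default behaviour of the pandas quantile method used above:

```python
from statistics import quantiles

# Made-up popularity scores from 0 to 100, one track per score
popularity_demo = list(range(101))

# n=20 yields 19 cut points in 5% steps; the last one is the 95th percentile
threshold = quantiles(popularity_demo, n=20, method="inclusive")[-1]
popular = [score for score in popularity_demo if score >= threshold]

print(threshold, len(popular))
```

-Because popularity is an integer score, many tracks can share the threshold value, so the group selected with >= may be somewhat larger than 5% of all tracks.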

In [None]:
del tracks
gc.collect()

# Data Analysis

-First I will check whether popular tracks in a given decade have a significantly different mean from unpopular tracks for the numeric (integer or float) columns

-For the categorical columns (key, mode, time_signature and explicit) I will compare the relative occurrences of their categories

-To get an overview over all decades of how the audio features change, I will create four new dataframes

-The means of the integer/float columns will be loaded into feats_num, with one _pop and one _unpop column per feature

-The relative occurrences of the categorical columns will be loaded into the dataframes mode, explicit and time_sig

In [None]:
def get_means(feat, pop):
    """Return the mean of feat per decade for popular ("pop") or unpopular ("unpop") tracks."""
    decades = ["50", "60", "70", "80", "90", "00", "10", "20"]
    # Find the dataframe that holds the requested feature
    for df in (audio_features_1, audio_features_2, audio_features_3):
        if feat in df:
            break
    else:
        return None
    # Popularity threshold of 30 as defined above
    if pop == "pop":
        mask = df["popularity"] >= 30
    elif pop == "unpop":
        mask = df["popularity"] < 30
    else:
        return None
    return [df[(df["release_date"] == dec) & mask][feat].mean() for dec in decades]
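
-As a hedged alternative to filtering once per decade and group, the same means can be obtained with a single groupby. The sketch below uses a made-up dataframe demo with only one feature:

```python
import pandas as pd

demo = pd.DataFrame({
    "release_date": ["50", "50", "50", "60", "60", "60"],
    "popularity":   [45, 10, 20, 30, 5, 15],
    "energy":       [0.8, 0.4, 0.6, 0.9, 0.3, 0.5],
})

# Label each track once instead of filtering per decade and per group
demo["group"] = demo["popularity"].ge(30).map({True: "pop", False: "unpop"})

# Rows: decades, columns: pop / unpop, values: mean energy
mean_energy = demo.groupby(["release_date", "group"])["energy"].mean().unstack("group")
print(mean_energy)
```

-groupby sorts the decade labels as strings, so "00" would come before "50". Reindexing with the list of decades restores the chronological order.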
        

In [None]:
# Build one _pop and one _unpop column of decade means per numeric feature
numeric_feats = ["acousticness", "danceability", "energy", "instrumentalness", "liveness",
                 "loudness", "speechiness", "tempo", "valence"]
feats_num = pd.DataFrame.from_dict({
    "decade": ['50s', "60s", "70s", "80s", "90s", "00s", "10s", "20s"],
    **{f"{feat}_{pop}": get_means(feat, pop) for feat in numeric_feats for pop in ("pop", "unpop")}
})
feats_num.set_index("decade", inplace=True)

In [None]:
feats_num[["acousticness_pop", "acousticness_unpop"]].plot(ylim = (0, 1))

In [None]:
feats_num[["danceability_pop", "danceability_unpop"]].plot(ylim = (0, 1))

In [None]:
feats_num[["energy_pop", "energy_unpop"]].plot(ylim = (0, 1))

In [None]:
feats_num[["instrumentalness_pop", "instrumentalness_unpop"]].plot(ylim = (0, 1))

In [None]:
feats_num[["liveness_pop", "liveness_unpop"]].plot(ylim = (0, 1))

In [None]:
feats_num[["loudness_pop", "loudness_unpop"]].plot(ylim = (-60, 0))

In [None]:
feats_num[["speechiness_pop", "speechiness_unpop"]].plot(ylim = (0, 1))

In [None]:
feats_num[["tempo_pop", "tempo_unpop"]].plot(ylim = (60, 250))

In [None]:
feats_num[["valence_pop", "valence_unpop"]].plot(ylim = (0, 1))
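
-The plots compare the means only visually. Whether a gap is statistically significant at the 0.05 level could be checked with a two-sample test. The following is a sketch under stated assumptions: it is a large-sample normal approximation of Welch's t-test written with the standard library, the function name welch_z_test is made up, and the numbers are invented rather than taken from the Spotify data:

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_z_test(sample_a, sample_b):
    """Two-sided test for a difference in means with unequal variances.

    Uses the normal distribution in place of Student's t, which is only
    reasonable for large samples.
    """
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    statistic = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value

# Made-up energy values, not taken from the Spotify data
popular_energy = [0.80, 0.82, 0.79, 0.81] * 25
unpopular_energy = [0.50, 0.52, 0.49, 0.51] * 25

statistic, p_value = welch_z_test(popular_energy, unpopular_energy)
print(statistic > 0, p_value < 0.05)
```

-With millions of tracks even tiny differences are likely to come out as significant, so the size of the difference should be looked at alongside the p-value.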

In [None]:
mode = pd.DataFrame.from_dict({
    "decade": ['50s', "60s", "70s", "80s", "90s", "00s", "10s", "20s"],
    "major_pop": [70.39239, 73.844937, 72.1473, 69.006693, 69.33926, 65.678333, 62.119926, 55.996197],
    "major_unpop": [71.914794, 74.942017, 72.544489, 69.728469, 70.690694, 67.871549, 64.419176, 61.652418]
})
mode.set_index("decade", inplace=True)

In [None]:
mode.plot(ylim = (0, 100))

In [None]:
explicit = pd.DataFrame.from_dict({
    "decade": ['50s', "60s", "70s", "80s", "90s", "00s", "10s", "20s"],
    "explicit_pop": [0.059453, 0.015823, 0.231188, 1.959651, 7.95962, 8.604616, 14.311508, 28.211224],
    "explicit_unpop": [0.703266, 0.082412, 0.323912, 1.728623, 2.960856, 4.677925, 7.712943, 13.393363]
})
explicit.set_index("decade", inplace=True)

In [None]:
explicit.plot(ylim= (0,100))

- Across nearly all decades, regardless of popularity, the time signature is almost always either 4/4 or 3/4, and predominantly 4/4
- So this will only include the percentage of tracks in 4/4 time; the rest is almost entirely 3/4

In [None]:
time_sig = pd.DataFrame.from_dict({
    "decade": ['50s', "60s", "70s", "80s", "90s", "00s", "10s", "20s"],
    "time_sig_pop": [80.380499, 82.120253, 89.580008, 91.564351, 89.155225, 89.579194, 87.794941, 89.855239],
    "time_sig_unpop": [71.866239, 77.499146, 82.399452, 83.575605, 81.027481, 83.234607, 83.589752, 81.795449]
})
time_sig.set_index("decade", inplace=True)

In [None]:
time_sig.plot(ylim= (0,100))

# 50s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "50") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "50") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_50th_group = audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

In [None]:
def get_perc(x, y):
    return (x/y)*100

key_pop_50th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] >= 30)].shape[0])))
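
-The group sizes divided by the total can also be obtained directly from value_counts. A minimal sketch on made-up key values:

```python
import pandas as pd

keys_demo = pd.Series([0, 0, 7, 7, 7, 9])

# normalize=True returns fractions that sum to 1; multiply for percentages
key_share = keys_demo.value_counts(normalize=True).sort_index() * 100
print(key_share)
```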

In [None]:
del key_pop_50th_group
gc.collect()

In [None]:
key_unpop_50th_group = audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_50th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_unpop_50th_group = audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_50th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_50th_group = audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_50th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "50") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del mode_unpop_50th_group
del mode_pop_50th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
time_sig_pop_50th_group = audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_50th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_50th_group
gc.collect()

In [None]:
time_sig_unpop_50th_group = audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_50th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_50th_group
gc.collect()

In [None]:
explicit_pop_50th_group = audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_50th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_50th_group = audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_50th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "50") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_50th_group
del explicit_pop_50th_group
gc.collect()
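
-The cells for the remaining decades repeat this pattern with a different decade string. As a sketch of how one helper could produce the popular and unpopular shares side by side, the following uses a hypothetical function category_shares on made-up data:

```python
import pandas as pd

demo = pd.DataFrame({
    "release_date":   ["50", "50", "50", "50", "60", "60"],
    "popularity":     [40, 35, 10, 5, 50, 2],
    "time_signature": [4, 4, 4, 3, 4, 3],
})

def category_shares(df, decade, column, threshold=30):
    """Percentage of each category in column, split by popularity, for one decade."""
    subset = df[df["release_date"] == decade]
    popular = (subset["popularity"] >= threshold).rename("popular")
    # normalize="index" makes every row (unpopular / popular) sum to 1
    return pd.crosstab(popular, subset[column], normalize="index") * 100

shares_50 = category_shares(demo, "50", "time_signature")
print(shares_50)
```

-Calling the helper in a loop over the decade labels and the categorical columns would replace most of the repeated cells.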

# 60s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "60") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "60") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_60th_group = audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_60th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_60th_group
gc.collect()

In [None]:
key_unpop_60th_group = audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_60th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_60th_group = audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_60th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_60th_group = audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_60th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "60") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_60th_group
del mode_pop_60th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_60th_group = audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_60th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_60th_group
gc.collect()

In [None]:
time_sig_unpop_60th_group = audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_60th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_60th_group
gc.collect()

In [None]:
explicit_pop_60th_group = audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_60th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_60th_group = audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_60th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "60") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_60th_group
del explicit_pop_60th_group
gc.collect()

# 70s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "70") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "70") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_70th_group = audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_70th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_70th_group
gc.collect()

In [None]:
key_unpop_70th_group = audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_70th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_70th_group = audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_70th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_70th_group = audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_70th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "70") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_70th_group
del mode_pop_70th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_70th_group = audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_70th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_70th_group
gc.collect()

In [None]:
time_sig_unpop_70th_group = audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_70th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_70th_group
gc.collect()

In [None]:
explicit_pop_70th_group = audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_70th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_70th_group = audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_70th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "70") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_70th_group
del explicit_pop_70th_group
gc.collect()

# 80s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "80") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "80") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_80th_group = audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_80th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_80th_group
gc.collect()

In [None]:
key_unpop_80th_group = audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_80th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_80th_group = audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_80th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_80th_group = audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_80th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "80") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_80th_group
del mode_pop_80th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_80th_group = audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_80th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_80th_group
gc.collect()

In [None]:
time_sig_unpop_80th_group = audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_80th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_80th_group
gc.collect()

In [None]:
explicit_pop_80th_group = audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_80th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_80th_group = audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_80th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "80") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_80th_group
del explicit_pop_80th_group
gc.collect()

# 90s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "90") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "90") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_90th_group = audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_90th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_90th_group
gc.collect()

In [None]:
key_unpop_90th_group = audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_90th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_90th_group = audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_90th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_90th_group = audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_90th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "90") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_90th_group
del mode_pop_90th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_90th_group = audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_90th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_90th_group
gc.collect()

In [None]:
time_sig_unpop_90th_group = audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_90th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_90th_group
gc.collect()

In [None]:
explicit_pop_90th_group = audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_90th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_90th_group = audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_90th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "90") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_90th_group
del explicit_pop_90th_group
gc.collect()
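
-The percentage breakdowns above repeat the same filter twice per cell (once for the counts, once for the total). The following is a minimal sketch of a helper that does this in a single step. The name category_shares and the toy dataframe are hypothetical, and it assumes that get_perc returns a count as a percentage of the subset total

```python
import pandas as pd


def category_shares(df, decade, popular, column, threshold=30):
    """Percentage of tracks per category within one decade/popularity subset.

    Hypothetical helper: assumes df has "release_date" (decade label such as "90")
    and "popularity" columns, as the dataframes in this notebook do.
    """
    in_decade = df["release_date"] == decade
    if popular:
        in_group = df["popularity"] >= threshold
    else:
        in_group = df["popularity"] < threshold
    subset = df.loc[in_decade & in_group, column]
    # normalize=True divides each count by the subset size
    return subset.value_counts(normalize=True).sort_index() * 100


# Toy data for illustration only, not taken from the Spotify database
toy = pd.DataFrame({
    "release_date": ["90", "90", "90", "90", "00"],
    "popularity": [45, 30, 10, 50, 60],
    "key": [0, 0, 7, 7, 2],
})

shares = category_shares(toy, "90", True, "key")
print(shares)
```

-Because the mask is built from the same dataframe that is being filtered, this also avoids accidentally mixing masks from two different dataframes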

# 00s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "00") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "00") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_00th_group = audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_00th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_00th_group
gc.collect()

In [None]:
key_unpop_00th_group = audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_00th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_00th_group = audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_00th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_00th_group = audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_00th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "00") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_00th_group
del mode_pop_00th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_00th_group = audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_00th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_00th_group
gc.collect()

In [None]:
time_sig_unpop_00th_group = audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_00th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_00th_group
gc.collect()

In [None]:
explicit_pop_00th_group = audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_00th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_00th_group = audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_00th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "00") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_00th_group
del explicit_pop_00th_group
gc.collect()

# 10s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "10") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "10") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_10th_group = audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_10th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_10th_group
gc.collect()

In [None]:
key_unpop_10th_group = audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_10th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_10th_group = audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_10th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_10th_group = audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_10th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "10") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_10th_group
del mode_pop_10th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_10th_group = audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_10th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_10th_group
gc.collect()

In [None]:
time_sig_unpop_10th_group = audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_10th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_10th_group
gc.collect()

In [None]:
explicit_pop_10th_group = audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_10th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_10th_group = audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_10th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "10") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_10th_group
del explicit_pop_10th_group
gc.collect()

# 20s

In [None]:
audio_features_1[(audio_features_1["release_date"] == "20") & (audio_features_1["popularity"] >= 30)].describe()

In [None]:
audio_features_1[(audio_features_1["release_date"] == "20") & (audio_features_1["popularity"] < 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] >= 30)].describe()

In [None]:
audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] < 30)].describe()

In [None]:
key_pop_20th_group = audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] >= 30)].groupby("key").size()

key_pop_20th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
del key_pop_20th_group
gc.collect()

In [None]:
key_unpop_20th_group = audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] < 30)].groupby("key").size()

key_unpop_20th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
mode_pop_20th_group = audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] >= 30)].groupby("mode").size()

mode_pop_20th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] >= 30)].shape[0])))

In [None]:
mode_unpop_20th_group = audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] < 30)].groupby("mode").size()

mode_unpop_20th_group.apply((lambda x: get_perc(x, audio_features_2[(audio_features_2["release_date"] == "20") & (audio_features_2["popularity"] < 30)].shape[0])))

In [None]:
del mode_unpop_20th_group
del mode_pop_20th_group
gc.collect()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] >= 30)].describe()

In [None]:
audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] < 30)].describe()

In [None]:
time_sig_pop_20th_group = audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] >= 30)].groupby("time_signature").size()

time_sig_pop_20th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
del time_sig_pop_20th_group
gc.collect()

In [None]:
time_sig_unpop_20th_group = audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] < 30)].groupby("time_signature").size()

time_sig_unpop_20th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del time_sig_unpop_20th_group
gc.collect()

In [None]:
explicit_pop_20th_group = audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] >= 30)].groupby("explicit").size()

explicit_pop_20th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] >= 30)].shape[0])))

In [None]:
explicit_unpop_20th_group = audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] < 30)].groupby("explicit").size()

explicit_unpop_20th_group.apply((lambda x: get_perc(x, audio_features_3[(audio_features_3["release_date"] == "20") & (audio_features_3["popularity"] < 30)].shape[0])))

In [None]:
del explicit_unpop_20th_group
del explicit_pop_20th_group
gc.collect()
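
-The per-decade describe() cells above could also be produced in one pass by grouping on the decade and a popularity label. This is only a sketch: the function name summarise_by_decade, the "danceability" column and the toy values are assumptions for illustration, and the threshold of 30 is the one used throughout this section

```python
import pandas as pd


def summarise_by_decade(df, columns, threshold=30):
    """Count and mean of the given columns per decade and popularity group.

    Hypothetical helper: assumes df has "release_date" and "popularity" columns.
    """
    # Label each track instead of filtering the dataframe once per group
    label = df["popularity"].ge(threshold).map({True: "popular", False: "unpopular"})
    labelled = df.assign(group=label)
    return labelled.groupby(["release_date", "group"])[columns].agg(["count", "mean"])


# Toy data for illustration only, not taken from the Spotify database
toy = pd.DataFrame({
    "release_date": ["90", "90", "90", "00", "00"],
    "popularity": [45, 10, 50, 60, 5],
    "danceability": [0.5, 0.2, 0.7, 0.8, 0.4],
})

summary = summarise_by_decade(toy, ["danceability"])
print(summary)
```

-A single grouped table makes it easier to compare popular and unpopular tracks side by side across decades than scrolling between separate cell outputs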