# **M255 Lab: Communities with Spotify**
## A Trip Down Memory Lane - Destination: Childhood Shows

The playlists analyzed in this notebook are used to identify the trends and attributes associated with the characters in childhood shows or their related theme tracks.

Additional musical questions include:
* What are the common musical features that appeal to children?
* What are the correlations between these musical features?
    * Is there a relationship?

The playlists used for this purpose are:
* Tom Cat's Chase Music Collection
* Stitch's
* Anime Recollections

Some anticipated attributes are high levels of energy, tempo, and valence, because these are the elements expected to appeal to children.

### Part 1: Setting Up
#### Importing Python Libraries
Transferred from Notebook C

In [1]:
import pandas as pd
import numpy as np
import random
import altair as alt
import requests
import inspect
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import networkx as nx
import networkx.algorithms.community as nx_comm
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pyvis
from pyvis import network as net
from itertools import combinations
from community import community_louvain
from copy import deepcopy
import plotly.graph_objects as go
import plotly.offline as pyo

#### Providing User Credentials
Transferred from Notebook C

In [2]:
# storing the credentials:
CLIENT_ID = "116bae2a86fd4737862816c5f45d4c36"
CLIENT_SECRET = "4f4a732d83d04cfa94acc26d2b77169f"
my_username = "sx47r9lq4dwrjx1r0ct9f9m09"

# instantiating the client
# source: Max Hilsdorf (https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6)
client_credentials_manager = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)


A web client is established here to connect to the Spotify API. With it, we can retrieve and analyze the music metadata.
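
Hard-coding the client ID and secret in a shared notebook exposes them to every reader. A minimal sketch of an alternative, assuming the two values have been exported as the environment variables SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET (the helper name load_spotify_credentials is my own and is not part of spotipy):

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    `env` defaults to os.environ; any dict-like object can be passed instead.
    """
    env = os.environ if env is None else env
    required = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")  # assumed variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
```

The returned pair could then be passed to SpotifyClientCredentials(client_id=..., client_secret=...) exactly as in the cell above.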

### Part 2: Analyzing Playlists
#### Obtaining Data

##### Goal: obtain the tracks in three playlists we designated

From the Google Doc "M255 Playlist Roundup", we can locate the two parameters, user ID and playlist ID, for each of the three playlists:

* Tom Cat's Chase Music Collection: "31zhcjb3h2e5miwmhyzeuj4cibmq","16rZpzLucvgWLOEULEyKBM"
* Stitch's: "31hwpxwmzz22hd46eqoagqznyfty","5taHWErP8pC1xhtSWWPYLE"
* Anime Recollections: "g495g22jg4nfsbt31gxyiao1s","5szvg6CvBqEmY2axuwweNT"

Then pass the above information to sp.user_playlist_tracks:

In [3]:
# user_playlist_tracks(user_id: String, playlist_id: String): json_dict
TomJerry_tracks = pd.DataFrame(sp.user_playlist_tracks("31zhcjb3h2e5miwmhyzeuj4cibmq", "16rZpzLucvgWLOEULEyKBM"))
stitch_tracks = pd.DataFrame(sp.user_playlist_tracks("31hwpxwmzz22hd46eqoagqznyfty", "5taHWErP8pC1xhtSWWPYLE"))
anime_tracks = pd.DataFrame(sp.user_playlist_tracks("g495g22jg4nfsbt31gxyiao1s", "5szvg6CvBqEmY2axuwweNT"))

After retrieving the tracks in each playlist, we can look up the music features we want with the function below, based on Max Hilsdorf's article. It loops through the items (tracks) of a playlist and gets the audio features of every track.

In [4]:
# This function is created based on Max Hilsdorf's article
# Source: https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6
def get_audio_features_df(playlist):
    
    # Create an empty dataframe
    playlist_features_list = ["artist", "album", "track_name", "track_id","danceability","energy","key","loudness","mode", "speechiness","instrumentalness","liveness","valence","tempo", "duration_ms","time_signature"]
    playlist_df = pd.DataFrame(columns = playlist_features_list)
    
    # Loop through every track in the playlist, extract features and append the features to the playlist df
    for track in playlist["items"]:
        # Create empty dict
        playlist_features = {}
        # Get metadata
        playlist_features["artist"] = track["track"]["album"]["artists"][0]["name"]
        playlist_features["album"] = track["track"]["album"]["name"]
        playlist_features["track_name"] = track["track"]["name"]
        playlist_features["track_id"] = track["track"]["id"]
        
        # Get audio features
        audio_features = sp.audio_features(playlist_features["track_id"])[0]
        for feature in playlist_features_list[4:]:
            playlist_features[feature] = audio_features[feature]
        
        # Concat the DataFrames
        track_df = pd.DataFrame(playlist_features, index = [0])
        playlist_df = pd.concat([playlist_df, track_df], ignore_index = True)
        
    return playlist_df
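
The function above makes one sp.audio_features request per track. As far as I know, that spotipy call also accepts a list of track IDs (up to 100 per request), so the lookups could be batched. A minimal sketch under that assumption; chunked and fetch_features_in_batches are hypothetical helper names, and the fetch argument stands in for sp.audio_features:

```python
def chunked(ids, size=100):
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_features_in_batches(track_ids, fetch, size=100):
    """Call `fetch` once per batch of IDs and flatten the results into one list.

    `fetch` is any callable that takes a list of track IDs and returns a list
    with one feature dictionary per ID.
    """
    features = []
    for batch in chunked(track_ids, size):
        features.extend(fetch(batch))
    return features
```

With the client from Part 1, the call would look like fetch_features_in_batches(track_ids, sp.audio_features), which needs one request per 100 tracks instead of one per track.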

##### With the function defined, now apply it to each of the three playlists:

While applying the function, a new column, "playlist", is added to record which playlist each track belongs to. This is helpful later (spoiler alert): when we concatenate the three playlists, we know right away which playlist a track came from, and we can look at the tracks from all three playlists at once in a single graph.

###### Tom & Jerry Audio Features Dataframe:

In [5]:
tj_audio_features_df = get_audio_features_df(TomJerry_tracks)
tj_audio_features_df["playlist"] = "Tom&Jerry"
tj_audio_features_df

index,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist
0,Various Artists,It's Always Sunny In Philadelphia (Music from ...,Off Broadway,7uSWYgt3MmFjLshyRdIgPL,0.602,0.668,2,-8.71,1,0.0352,0.876,0.462,0.849,150.073,150635,4,Tom&Jerry
1,Vulfpeck,Mr Finish Line,Tee Time,3XxpR8oXqq5Km5X4LlS0pi,0.578,0.889,6,-9.708,1,0.0348,0.73,0.0984,0.868,140.986,169004,4,Tom&Jerry
2,Various Artists,SUPER EUROBEAT presents INITIAL D 〜D SELECTION 2〜,LOVE & MONEY,16211VKpjM4Hh3MfIlcu4s,0.564,0.971,9,-5.338,0,0.0362,0.132,0.223,0.961,155.437,299933,4,Tom&Jerry
3,Various Artists,SUPER EUROBEAT presents INITIAL D〜D SELECTION 3〜,NIGHT & DAY,1tYWuxQgXJqA4L4jACzjcK,0.38,0.985,1,-5.074,0,0.0548,5.3e-05,0.226,0.871,75.659,311300,4,Tom&Jerry
4,Various Artists,The Fast And The Furious: Tokyo Drift (Origina...,Speed,5EbTHYIQQtNNMloPNirkKi,0.413,0.912,4,-5.092,0,0.0878,0.0203,0.14,0.497,165.156,168587,4,Tom&Jerry
5,Fatboy Slim,The Greatest Hits: Why Try Harder,Weapon Of Choice,08kB9HSfrcIi83rymwgjMz,0.626,0.947,8,-4.608,0,0.057,0.78,0.209,0.959,195.972,219813,4,Tom&Jerry
6,Cory Wong,The Optimist,Jax,39PRMLzu72Nz6lSOCF3bWN,0.781,0.84,5,-7.689,1,0.0578,0.834,0.0505,0.698,111.995,252253,4,Tom&Jerry
7,Ricky Martin,Ricky Martin,Livin' la Vida Loca,0Ph6L4l8dYUuXFmb71Ajnd,0.425,0.954,1,-3.756,0,0.0476,0.0,0.0555,0.933,178.043,243160,4,Tom&Jerry
8,A Certain Ratio,Sextet,Lucinda,7km2akG0G3Z5STqoKrHZPf,0.69,0.902,2,-6.632,1,0.0686,0.833,0.0923,0.783,124.855,234813,4,Tom&Jerry
9,Kristofer Maddigan,Cuphead (Original Soundtrack),Fiery Frolic,02h2lqTc872LnYQ7TOFMmF,0.607,0.679,7,-6.221,0,0.114,0.659,0.336,0.715,150.229,217627,4,Tom&Jerry


###### Stitch's Audio Features Dataframe:

In [6]:
stitch_audio_features_df = get_audio_features_df(stitch_tracks)
stitch_audio_features_df["playlist"] = "Stitch"
stitch_audio_features_df

index,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist
0,Elvis Presley,From Elvis in Memphis,Suspicious Minds,1H5IfYyIIAlgDX8zguUzns,0.487,0.382,7,-10.889,1,0.0309,5e-06,0.411,0.714,116.557,261280,4,Stitch
1,Elvis Presley,"Elvis' Golden Records, Vol. 3",Stuck on You,39zODpVtRvghMyfNjZ3BVK,0.647,0.513,7,-12.372,1,0.0421,9e-06,0.108,0.955,131.641,139640,4,Stitch
2,Elvis Presley,Elvis' Golden Records,Hound Dog,64Ny7djQ6rNJspquof2KoX,0.494,0.756,0,-8.492,1,0.0499,0.00505,0.76,0.949,86.895,136027,4,Stitch
3,Elvis Presley,Elvis 30 #1 Hits,(You're The) Devil in Disguise,0D1pEisM3QkiacGXJe5dmd,0.481,0.733,5,-7.633,1,0.165,1.2e-05,0.108,0.874,122.909,140427,4,Stitch
4,Various Artists,Lilo & Stitch,He Mele No Lilo,3G9ZnSjGYyHx7e221v0qse,0.73,0.331,2,-12.427,1,0.0429,0.000286,0.104,0.394,126.592,148133,4,Stitch
5,Elvis Presley,Elvis (Fool),Burning Love,7zMUCLm1TN9o9JlLISztxO,0.66,0.748,2,-11.206,1,0.0284,0.00585,0.283,0.972,143.549,170293,4,Stitch
6,Hank Williams,"The Legend Lives Anew: Hank Williams, Sr. With...",I'm So Lonesome I Could Cry,4tj7IsJrn4MvesuhoY0JBy,0.524,0.258,4,-14.827,1,0.0287,0.00164,0.328,0.349,111.937,169067,3,Stitch
7,Elvis Presley,Blue Hawaii,Aloha Oe,0a1OAjVB17jFlGegbhpQig,0.287,0.278,7,-15.653,1,0.0337,5.3e-05,0.194,0.254,98.772,114227,3,Stitch
8,Various Artists,Lilo & Stitch,Stitch to the Rescue,1616uIpy5KwVqnKtIuyhcu,0.182,0.105,2,-18.314,1,0.0377,0.518,0.0713,0.0351,78.757,354827,4,Stitch
9,Various Artists,Lilo & Stitch,Hawaiian Roller Coaster Ride,7GmiJVBAzWNikX5VkNQg85,0.702,0.624,5,-9.927,1,0.0285,0.0152,0.0669,0.947,114.542,208227,4,Stitch


###### Anime Audio Features Dataframe:

In [7]:
anime_audio_features_df = get_audio_features_df(anime_tracks)
anime_audio_features_df["playlist"] = "Anime"
anime_audio_features_df

index,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist
0,Various Artists,熱烈!アニソン魂 THE BEST カバー楽曲集 TVアニメシリーズ「デジモンシリー...,Butter-Fly (デジモンアドベンチャー全話OP),4A3G4Chh8DTJftwJEf5tOp,0.472,0.907,2,-4.545,1,0.0546,0.0,0.0645,0.441,164.994,257950,4,Anime
1,毛毛,迪迦奥特曼,奇迹再现 - 《超人迪加》电视剧中文版片头曲,3zAieIyKyNODQn6y2sxbFq,0.457,0.864,9,-5.9,1,0.0838,0.0,0.374,0.441,159.784,236042,4,Anime
2,Akano,"Blue Bird (From ""Naruto Shippuden"")","Blue Bird (From ""Naruto Shippuden"")",1bEnIDpwKsyhDauHVoMz6t,0.594,0.868,6,-3.286,0,0.0606,0.0,0.209,0.691,151.976,100201,4,Anime
3,Katsuo Ohno,「名探偵コナン から紅の恋歌」オリジナル・サウンドトラック,名探偵コナン メイン・テーマ(から紅ヴァージョン),0RyQvri8BbQYChVVXSJnnL,0.269,0.857,5,-3.963,0,0.0495,0.654,0.218,0.689,131.912,186907,4,Anime
4,Geek Music,"Tom And Jerry Main Theme (From ""Tom And Jerry"")","Tom And Jerry Main Theme (From ""Tom And Jerry"")",5sFKd7dVlGoOuXl2usSZuW,0.544,0.536,5,-8.701,1,0.0357,0.768,0.493,0.514,155.986,65736,3,Anime
5,Various Artists,喜羊羊与灰太狼之虎虎生威电影原声大碟,别看我只是一只羊 - 国,1gn9wcyLBChQcC3q7tKK8F,0.827,0.507,0,-6.392,1,0.0433,0.0,0.068,0.696,125.996,186467,4,Anime
6,毛毛,迪迦奥特曼,永远的奥特曼 - 《迪迦奥特曼》中文片尾曲,4MskqNyksAQvAqnHEJmhTg,0.662,0.872,4,-6.223,1,0.0402,2.1e-05,0.217,0.579,130.015,197120,4,Anime
7,Various Artists,アニソンLive大全集 熱烈!アニソン魂「アニたまLive」vol.1 in AJF 2004,ウィーアー! (ワンピース),42esJ6BgSoV1DS3Onns03h,0.332,0.975,8,-6.182,1,0.106,0.0,0.981,0.268,167.874,262013,4,Anime
8,牛奶@咖啡,越长大越孤单,快乐星猫,4Qzuh2Jp6MlIGjQPtYbZqi,0.543,0.866,2,-4.925,1,0.0689,0.0,0.0236,0.453,159.976,194160,4,Anime
9,Toshio Masuda,NARUTO -ナルト- オリジナルサウンドトラック,NARUTO Main Theme,5kKloaKFvAuDNFi8m52hxy,0.525,0.766,4,-8.448,0,0.0438,0.00235,0.125,0.383,119.974,266920,4,Anime


As previously hinted, I combine the three playlists into one new DataFrame to analyze them as a whole:

In [8]:
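all_tracks_features = pd.concat([tj_audio_features_df, stitch_audio_features_df, anime_audio_features_df], ignore_index=True)
# drop_duplicates returns a new DataFrame, so the result has to be assigned back
all_tracks_features = all_tracks_features.drop_duplicates(ignore_index=True)
all_tracks_features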

index,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist
0,Various Artists,It's Always Sunny In Philadelphia (Music from ...,Off Broadway,7uSWYgt3MmFjLshyRdIgPL,0.602,0.668,2,-8.71,1,0.0352,0.876,0.462,0.849,150.073,150635,4,Tom&Jerry
1,Vulfpeck,Mr Finish Line,Tee Time,3XxpR8oXqq5Km5X4LlS0pi,0.578,0.889,6,-9.708,1,0.0348,0.73,0.0984,0.868,140.986,169004,4,Tom&Jerry
2,Various Artists,SUPER EUROBEAT presents INITIAL D 〜D SELECTION 2〜,LOVE & MONEY,16211VKpjM4Hh3MfIlcu4s,0.564,0.971,9,-5.338,0,0.0362,0.132,0.223,0.961,155.437,299933,4,Tom&Jerry
3,Various Artists,SUPER EUROBEAT presents INITIAL D〜D SELECTION 3〜,NIGHT & DAY,1tYWuxQgXJqA4L4jACzjcK,0.38,0.985,1,-5.074,0,0.0548,5.3e-05,0.226,0.871,75.659,311300,4,Tom&Jerry
4,Various Artists,The Fast And The Furious: Tokyo Drift (Origina...,Speed,5EbTHYIQQtNNMloPNirkKi,0.413,0.912,4,-5.092,0,0.0878,0.0203,0.14,0.497,165.156,168587,4,Tom&Jerry
5,Fatboy Slim,The Greatest Hits: Why Try Harder,Weapon Of Choice,08kB9HSfrcIi83rymwgjMz,0.626,0.947,8,-4.608,0,0.057,0.78,0.209,0.959,195.972,219813,4,Tom&Jerry
6,Cory Wong,The Optimist,Jax,39PRMLzu72Nz6lSOCF3bWN,0.781,0.84,5,-7.689,1,0.0578,0.834,0.0505,0.698,111.995,252253,4,Tom&Jerry
7,Ricky Martin,Ricky Martin,Livin' la Vida Loca,0Ph6L4l8dYUuXFmb71Ajnd,0.425,0.954,1,-3.756,0,0.0476,0.0,0.0555,0.933,178.043,243160,4,Tom&Jerry
8,A Certain Ratio,Sextet,Lucinda,7km2akG0G3Z5STqoKrHZPf,0.69,0.902,2,-6.632,1,0.0686,0.833,0.0923,0.783,124.855,234813,4,Tom&Jerry
9,Kristofer Maddigan,Cuphead (Original Soundtrack),Fiery Frolic,02h2lqTc872LnYQ7TOFMmF,0.607,0.679,7,-6.221,0,0.114,0.659,0.336,0.715,150.229,217627,4,Tom&Jerry


The combined DataFrame also recalculates the index after concatenation thanks to ignore_index=True, which solved my problem from the last lab homework, where the original indices were kept. After merging the three playlists, we need to make sure there aren't any duplicate tracks that would affect the comparison; thankfully, there weren't any. I paid special attention to this because I was surprised to find "Tom And Jerry Main Theme" in the Anime Recollections playlist and had to make sure it did not duplicate any of the tracks in Tom Cat's Chase Music Collection.
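
One caveat with this check: once the "playlist" column is added, the same track in two playlists no longer forms an identical row, so a plain drop_duplicates() would not catch it. A minimal sketch of a check keyed on track_id instead; the third row is a made-up duplicate added only for illustration and does not exist in the real data:

```python
import pandas as pd

tracks = pd.DataFrame({
    "track_name": ["Fiery Frolic", "Tom And Jerry Main Theme", "Fiery Frolic"],
    "track_id": ["02h2lqTc872LnYQ7TOFMmF", "5sFKd7dVlGoOuXl2usSZuW", "02h2lqTc872LnYQ7TOFMmF"],
    "playlist": ["Tom&Jerry", "Anime", "Anime"],  # last row is hypothetical
})

# Every row whose track_id appears more than once, whatever playlist it is in
repeated = tracks[tracks.duplicated(subset="track_id", keep=False)]

# Keep the first occurrence of each track_id and renumber the index
deduplicated = tracks.drop_duplicates(subset="track_id", ignore_index=True)
```

Here repeated holds the two "Fiery Frolic" rows, while tracks.drop_duplicates() without subset would keep all three rows because their playlist labels differ.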


### Part 3: Comparing Playlists

##### Description of information about the combined playlist:

In [9]:
all_tracks_features.describe()

statistic,danceability,energy,loudness,speechiness,liveness,valence,tempo
count,33.0,33.0,33.0,33.0,33.0,33.0,33.0
mean,0.532273,0.718727,-7.856303,0.060782,0.22793,0.657155,134.495576
std,0.153014,0.235757,3.791357,0.042194,0.210112,0.273855,29.564219
min,0.182,0.105,-18.314,0.0284,0.0236,0.0351,75.659
25%,0.425,0.612,-9.85,0.0362,0.0923,0.441,116.557
50%,0.543,0.783,-6.632,0.0495,0.194,0.698,131.641
75%,0.647,0.889,-5.074,0.0606,0.283,0.902,155.986
max,0.827,0.985,-2.533,0.233,0.981,0.972,195.972


Going through the average value of each feature in the combined playlist:
* Danceability: at least half of the tracks are above 0.5, the midpoint of the danceability range, as shown by the mean (0.53) and the 50th percentile (0.54).
* The averages for energy (0.72) and valence (0.66) are both above 0.65, which supports our hypothesis that high energy and valence are common in children's shows.
* The average tempo is approximately 134 bpm, which fits the musical definition of Allegro (fast, quick, and bright; 120–156 bpm). Hence, tracks for children do tend to have a fast tempo. Tempo is analyzed more closely against these standards later.
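
The tempo labels used in this notebook can be written as a small lookup. This is only a sketch covering the two ranges quoted here (Allegretto 112–120 bpm, Allegro 120–156 bpm); tempo_marking is a hypothetical helper name, and assigning exactly 120 bpm to Allegro is my own choice because the two quoted ranges share that boundary:

```python
def tempo_marking(bpm):
    """Label a tempo using only the two ranges quoted in this notebook."""
    if 112 <= bpm < 120:
        return "Allegretto"
    if 120 <= bpm <= 156:
        return "Allegro"
    return "other"  # ranges outside 112-156 bpm are not defined in this notebook


combined_mean_label = tempo_marking(134.495576)  # mean tempo of the combined playlist
stitch_mean_label = tempo_marking(113.2151)      # mean tempo of Stitch's playlist
```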

##### Description of information about each individual playlist:
Here, we can look at the three playlists separately to see how each one shapes the result of the combined playlist.

Describing each playlist separately can actually tell us more than the combined playlist. By analyzing each individual playlist, we learn what each one contributed to the overall data and why. Additionally, we can compare each individual playlist to the overall one.
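
Because every track carries a "playlist" label, the per-playlist averages could also be read side by side from a single groupby call. A minimal sketch on four rows copied from the tables above (two tracks per playlist, so the numbers are not the full-playlist means):

```python
import pandas as pd

sample = pd.DataFrame({
    "playlist": ["Tom&Jerry", "Tom&Jerry", "Stitch", "Stitch"],
    "track_name": ["Off Broadway", "Tee Time", "Suspicious Minds", "Stuck on You"],
    "energy": [0.668, 0.889, 0.382, 0.513],
    "valence": [0.849, 0.868, 0.714, 0.955],
})

# One row per playlist, one column per feature
per_playlist_means = sample.groupby("playlist")[["energy", "valence"]].mean()
```

Applied to all_tracks_features, the same call would put the three playlists in one table instead of three separate describe() outputs.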

###### Tom & Jerry:

In [10]:
tj_audio_features_df.describe()

statistic,danceability,energy,loudness,speechiness,instrumentalness,liveness,valence,tempo
count,13.0,13.0,13.0,13.0,13.0,13.0,13.0,13.0
mean,0.549692,0.844,-6.073308,0.071662,0.548643,0.178031,0.776,141.362769
std,0.139138,0.12406,2.333792,0.05327,0.373818,0.126165,0.242092,35.096989
min,0.38,0.612,-9.85,0.0348,0.0,0.0387,0.102,75.659
25%,0.413,0.783,-7.689,0.0476,0.132,0.0923,0.715,120.022
50%,0.578,0.889,-5.338,0.0548,0.73,0.14,0.868,150.073
75%,0.626,0.947,-4.608,0.0686,0.834,0.226,0.933,165.156
max,0.781,0.985,-2.533,0.233,0.934,0.462,0.961,195.972


The averages for energy (0.84) and valence (0.78) are the highest among the three playlists, as expected, because we associate high energy and happiness with the fun game of chase between Tom and Jerry. The same can be said about tempo: a chase is supposed to be a game of speed, so it should have a fast tempo, which it does (about 141 bpm on average).

###### Stitch:

In [11]:
stitch_audio_features_df.describe()

statistic,danceability,energy,loudness,speechiness,instrumentalness,liveness,valence,tempo
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,0.5194,0.4728,-12.174,0.04878,0.05461,0.24342,0.64431,113.2151
std,0.177884,0.234718,3.312598,0.041483,0.162889,0.216795,0.35266,20.099327
min,0.182,0.105,-18.314,0.0284,5e-06,0.0669,0.0351,78.757
25%,0.4825,0.29125,-14.227,0.02925,2.2e-05,0.105,0.36025,102.06325
50%,0.509,0.4475,-11.789,0.0357,0.000963,0.151,0.794,115.5495
75%,0.65675,0.70575,-10.1675,0.0427,0.00565,0.31675,0.9485,125.67125
max,0.73,0.756,-7.633,0.165,0.518,0.76,0.972,143.549


Personally, I was expecting a higher danceability average because in Lilo and Stitch, Elvis's songs are used for dance performances or to spur dance movement in the movie. However, that is only an impression given by the movie, and the data show otherwise. This playlist actually has the lowest danceability of all three.

In fact, Stitch's playlist has the lowest averages of the three for danceability, energy, and tempo, though not for valence, where it (0.64) sits above the Anime playlist (0.52). That does not mean it contradicts the hypothesis: the playlist still has a fairly high valence of 0.64 and a moderately fast tempo of 113.2 bpm (Allegretto, 112–120 bpm).

###### Anime:

In [12]:
anime_audio_features_df.describe()

statistic,danceability,energy,loudness,speechiness,liveness,valence,tempo
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,0.5225,0.8018,-5.8565,0.05864,0.27731,0.5155,146.8487
std,0.158316,0.156437,1.7675,0.022128,0.286655,0.145807,17.901911
min,0.269,0.507,-8.701,0.0357,0.0236,0.268,119.974
25%,0.46075,0.78875,-6.34975,0.043425,0.08225,0.441,130.48925
50%,0.534,0.865,-6.041,0.05205,0.213,0.4835,153.981
75%,0.5815,0.871,-4.64,0.066825,0.335,0.6615,159.928
max,0.827,0.975,-3.286,0.106,0.981,0.696,167.874


The Anime Recollections playlist has the highest mean tempo of the three (about 147 bpm), so we can expect most of its songs to be fast and upbeat. It also has a high mean energy of 0.80, so we might expect a correlation between high energy and high tempo. On the other hand, its valence is a lot lower, with a mean of 0.52 that barely passes the midpoint of 0.5. So we would not expect as strong a correlation between valence and either tempo or energy.

#### Charts & Graph Comparison

The cell below labels each track as high or low energy and categorizes its tempo as slow or fast using set thresholds.

This lets us tell right away whether a track is high energy or fast, instead of reading the number and converting it in our head.

The tempo threshold partly follows the musical definitions of tempo in bpm, except that it treats everything above 108 bpm as fast, so medium-paced tracks are counted as fast.

In [13]:
feature_based_tracks = all_tracks_features.copy() # make a copy of the DataFrame
# True when the energy score is at least 0.5
feature_based_tracks["energy_tune"] = feature_based_tracks["energy"] >= 0.5
# tempos of 108 bpm or less are labeled "Slow", everything above is "Fast"
feature_based_tracks["pace"] = np.where(feature_based_tracks["tempo"] <= 108, "Slow", "Fast")
feature_based_tracks

index,artist,album,track_name,track_id,danceability,energy,key,loudness,mode,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,playlist,energy_tune,pace
0,Various Artists,It's Always Sunny In Philadelphia (Music from ...,Off Broadway,7uSWYgt3MmFjLshyRdIgPL,0.602,0.668,2,-8.71,1,0.0352,0.876,0.462,0.849,150.073,150635,4,Tom&Jerry,True,Fast
1,Vulfpeck,Mr Finish Line,Tee Time,3XxpR8oXqq5Km5X4LlS0pi,0.578,0.889,6,-9.708,1,0.0348,0.73,0.0984,0.868,140.986,169004,4,Tom&Jerry,True,Fast
2,Various Artists,SUPER EUROBEAT presents INITIAL D 〜D SELECTION 2〜,LOVE & MONEY,16211VKpjM4Hh3MfIlcu4s,0.564,0.971,9,-5.338,0,0.0362,0.132,0.223,0.961,155.437,299933,4,Tom&Jerry,True,Fast
3,Various Artists,SUPER EUROBEAT presents INITIAL D〜D SELECTION 3〜,NIGHT & DAY,1tYWuxQgXJqA4L4jACzjcK,0.38,0.985,1,-5.074,0,0.0548,5.3e-05,0.226,0.871,75.659,311300,4,Tom&Jerry,True,Slow
4,Various Artists,The Fast And The Furious: Tokyo Drift (Origina...,Speed,5EbTHYIQQtNNMloPNirkKi,0.413,0.912,4,-5.092,0,0.0878,0.0203,0.14,0.497,165.156,168587,4,Tom&Jerry,True,Fast
5,Fatboy Slim,The Greatest Hits: Why Try Harder,Weapon Of Choice,08kB9HSfrcIi83rymwgjMz,0.626,0.947,8,-4.608,0,0.057,0.78,0.209,0.959,195.972,219813,4,Tom&Jerry,True,Fast
6,Cory Wong,The Optimist,Jax,39PRMLzu72Nz6lSOCF3bWN,0.781,0.84,5,-7.689,1,0.0578,0.834,0.0505,0.698,111.995,252253,4,Tom&Jerry,True,Fast
7,Ricky Martin,Ricky Martin,Livin' la Vida Loca,0Ph6L4l8dYUuXFmb71Ajnd,0.425,0.954,1,-3.756,0,0.0476,0.0,0.0555,0.933,178.043,243160,4,Tom&Jerry,True,Fast
8,A Certain Ratio,Sextet,Lucinda,7km2akG0G3Z5STqoKrHZPf,0.69,0.902,2,-6.632,1,0.0686,0.833,0.0923,0.783,124.855,234813,4,Tom&Jerry,True,Fast
9,Kristofer Maddigan,Cuphead (Original Soundtrack),Fiery Frolic,02h2lqTc872LnYQ7TOFMmF,0.607,0.679,7,-6.221,0,0.114,0.659,0.336,0.715,150.229,217627,4,Tom&Jerry,True,Fast


###### Bar Chart 1: The Numbers of Energy Tunes

In [14]:
alt.Chart(feature_based_tracks).mark_bar().encode(
    x='energy_tune',
    y='count()'
)

From this chart, 28 songs are energy tunes (have an "energy" score of at least 0.5) and 5 are not. We can clearly see that most of the songs from childhood shows do have high energy.

###### Bar Chart 2: Slow and Fast Pace

In [15]:
alt.Chart(feature_based_tracks).mark_bar().encode(
    x='pace',
    y='count()'
)

The split is the same here: 28 songs are fast paced and 5 songs are slow paced. Again, the majority of the songs are fast.

###### Bar Chart 3: Relation Between Energy Tune and Pace

In [90]:
bars = alt.Chart().mark_bar().encode(
    x=alt.X('energy_tune', title=""),
    y=alt.Y('count()', title='Count'),
    color=alt.Color('energy_tune', title="High energy")
)

alt.layer(bars, data=feature_based_tracks).facet(
    column=alt.Column('pace', title = "Pace")
)

Most fast-paced songs have high energy: out of 28 fast-paced songs, 25 are high energy and 3 are low energy. Slow-paced songs, by contrast, are split almost evenly: out of 5 songs, 3 are high energy and 2 are low energy.

There is no clear relationship between energy and pace for slow songs, but if a song is fast paced, there is a high chance that it is also high energy.
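
The counts behind a faceted chart like this can also be read as a table. A minimal sketch using pd.crosstab on six tracks copied from the tables above, with the same thresholds as the notebook (energy of at least 0.5, tempo of 108 bpm or less counted as slow); the six-row sample is only for illustration and does not reproduce the 25/3/3/2 split:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "track_name": ["Off Broadway", "NIGHT & DAY", "Hound Dog",
                   "Aloha Oe", "Suspicious Minds", "Butter-Fly"],
    "energy": [0.668, 0.985, 0.756, 0.278, 0.382, 0.907],
    "tempo": [150.073, 75.659, 86.895, 98.772, 116.557, 164.994],
})

sample["energy_tune"] = sample["energy"] >= 0.5
sample["pace"] = np.where(sample["tempo"] <= 108, "Slow", "Fast")

# Rows: pace, columns: energy_tune, cells: number of tracks
pace_by_energy = pd.crosstab(sample["pace"], sample["energy_tune"])
```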

###### Bar Chart 4: Count of Records by Valence

In [91]:
alt.Chart(feature_based_tracks).mark_bar().encode(
    alt.X("valence", bin=True),
    y='count()',
)

As shown, the tracks aimed at children tend to have high valence: roughly a quarter of the tracks have a valence between 0.9 and 1.0 (the 75th percentile is 0.902), and about two thirds have a valence over 0.5. This is consistent with the idea that songs that sound more positive (happy, cheerful, euphoric) are more appealing to children.

##### Correlation Using Scatter Plots
By now we have identified the trends and attributes found appealing to children in cartoon shows, namely high energy, tempo, and valence, as expected. Now we move on to the next musical question: what are the correlations between these music features? Is there a relationship between them?

###### Scatter Plot 1: Energy
This visualizes the attribute "energy" for both the individual and combined playlists.

In [16]:
alt.Chart(all_tracks_features).mark_point().encode(
    x=alt.X("track_name", sort=None),
    y='energy',
    color="playlist",
    tooltip=["artist", "track_name"]
).properties(
    width=1000
)

We can see that Tom & Jerry trends toward higher energy, while Stitch has the lowest, which matches the energy averages of the individual playlists. As a whole, most of the tracks are indeed above 0.6 (the 25th percentile from the describe function is 0.612), so we can say that songs for children's shows do generally have high energy.

###### Scatter Plot 2: Tempo
This illustrates the attribute "tempo" for both the individual and combined playlists.

In [17]:
alt.Chart(all_tracks_features).mark_point().encode(
    x=alt.X("track_name", sort=None),
    y='tempo',
    color="playlist",
    tooltip=["artist", "track_name"]
).properties(
    width=1000
)

Tom & Jerry has a diverse tempo range, from ~76 bpm (slow) to ~200 bpm (the fastest), while Anime has the most stable tempo, staying in the fast range (from 120 to 170 bpm). Stitch's tempos are spread across both slow and fast.

###### Scatter Plot 3: Correlation Between Energy and Tempo

In [18]:
alt.Chart(all_tracks_features).mark_point().encode(
    x="energy",
    y="tempo",
    color="playlist"
)

From the scatter plot, we can roughly see a trend where higher-energy tracks have faster tempos. We can check this by calculating the Pearson correlation coefficient:

In [95]:
all_tracks_features['energy'].corr(all_tracks_features['tempo'])

0.47228186376818315

The correlation coefficient of 0.47 between these two attributes indicates a moderate positive correlation. Positive means that as energy increases, tempo also tends to increase, which supports our earlier finding that high-energy tunes are faster paced. Since the value is only 0.47, the correlation is only moderate.
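
For reference, the Pearson coefficient is the covariance of the two features divided by the product of their spreads. A minimal plain-Python sketch of that definition; pearson_r is a hypothetical helper written for illustration, not the pandas implementation, and the input lists are toy numbers rather than playlist data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


r_toy = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # mostly increasing together
```

Values near 1 or -1 mean the points fall close to a straight line, and values near 0 mean there is little linear relationship.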

Then, we apply the same analysis to the relationship between valence and these two features.

###### Scatter Plot 4: Valence
First, we take a look at valence overall:

In [19]:
alt.Chart(all_tracks_features).mark_point().encode(
    x=alt.X("track_name", sort=None),
    y='valence',
    color="playlist",
    tooltip=["artist", "track_name"]
).properties(
    width=1000
)

We can see that the majority of the tracks in Tom & Jerry and Stitch have high valence, while Anime Recollections has mid-range valence.

###### Scatter Plot 5: Correlation Between Energy and Valence

In [20]:
alt.Chart(all_tracks_features).mark_point().encode(
    x="energy",
    y="valence",
    color="playlist"
)

Five tracks form a nearly straight line, suggesting a positive correlation, but the rest of the data show a less clear relationship: some high-energy songs have high valence and some have low valence.

In [21]:
all_tracks_features['energy'].corr(all_tracks_features['valence'])

0.353516772121192

The correlation coefficient of 0.35 supports this: there is a positive correlation, so as energy increases, valence also tends to increase. But the correlation is even weaker than the previous one.

###### Scatter Plot 6: Correlation Between Valence and Tempo

In [22]:
alt.Chart(all_tracks_features).mark_point().encode(
    x="valence",
    y="tempo",
    color="playlist"
)

Looking at the graph, we can expect a very low correlation, as most of the data fall within the midrange with only a small upward trend.

In [100]:
all_tracks_features['tempo'].corr(all_tracks_features['valence'])

0.23165363848124895

The correlation coefficient confirms this: at 0.23, it is the lowest seen so far. There is a weak positive correlation between valence and tempo.

##### Conclusion on the relationship between common features (energy, tempo, valence) found in tracks for childhood shows:
There is a correlation between each pair of these three variables, and all of them are positive, meaning that as one variable increases, so does the other. The correlation between energy and tempo (0.47) is stronger than the one between energy and valence (0.35), while tempo and valence have the weakest correlation (0.23).
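
The three pairwise coefficients could also be obtained in one call with DataFrame.corr(), which uses Pearson correlation by default. A minimal sketch on five rows copied from the tables above; with so few rows the resulting numbers will differ from the 0.47, 0.35, and 0.23 computed on all 33 tracks:

```python
import pandas as pd

sample = pd.DataFrame({
    "track_name": ["Off Broadway", "Tee Time", "Suspicious Minds",
                   "Stitch to the Rescue", "Butter-Fly"],
    "energy": [0.668, 0.889, 0.382, 0.105, 0.907],
    "tempo": [150.073, 140.986, 116.557, 78.757, 164.994],
    "valence": [0.849, 0.868, 0.714, 0.0351, 0.441],
})

# 3 x 3 table: each cell is the Pearson coefficient of the row and column feature
correlation_matrix = sample[["energy", "tempo", "valence"]].corr()
```

Each pair appears twice in the matrix (above and below the diagonal), and the diagonal is always 1 because every feature correlates perfectly with itself.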

#### Visualization: Radar Plots

With the radar plots, we can visualize both the notable and the insignificant features found in childhood shows as a whole.

##### Import Libraries and Define Functions

In [23]:
feature_columns = ["danceability", "energy", "speechiness", "liveness", "instrumentalness", "valence", "danceability"]

def createRadarElement(row, feature_cols):
    return go.Scatterpolar(
        r = row[feature_cols].values.tolist(), 
        theta = feature_cols, 
        mode = 'lines', 
        name = row['track_name'])

def get_radar_plot(playlist_id, features_list):
    current_playlist_audio_df = get_audio_features_df(pd.DataFrame(sp.playlist_items(playlist_id)))
    current_data = list(current_playlist_audio_df.apply(createRadarElement, axis=1, args=(features_list, )))  
    fig = go.Figure(current_data, )
    fig.show(renderer='iframe', width=1200, height=800)
    fig.write_image(playlist_id + '.png', width=1200, height=800)
    
def get_radar_plots(playlist_id_list, features_list):
    for item in playlist_id_list:
        get_radar_plot(item, features_list)

##### Create Radar Plots for each individual playlist

In [24]:
playlist_ids = ["16rZpzLucvgWLOEULEyKBM",
                "5taHWErP8pC1xhtSWWPYLE",
                "5szvg6CvBqEmY2axuwweNT"]
get_radar_plots(playlist_ids, feature_columns)

With the radar plots, we can easily see the standout features of each playlist.
* The Tom & Jerry playlist shows high energy, valence, and instrumentalness. I was not expecting high instrumentalness to be a common feature of children's tracks, but it makes sense for Tom & Jerry, where the characters almost never speak; high instrumentalness is common only in this playlist.
* For Stitch's playlist, energy and danceability reach around 0.7, which is above the mid-range, and the playlist displays the high valence we were looking for, like the Tom & Jerry playlist.
* The Anime Recollections playlist displays high energy and medium valence and danceability.

The Stitch and Anime playlists each seem to be a variation on the Tom & Jerry playlist: each has one very high value, either valence (Stitch) or energy (Anime), and both qualities are high in the Tom & Jerry playlist. The radar plots show that the typical attributes are high energy and valence, which supports our original hypothesis.

##### Import Libraries and Define Functions with Small Adjustment

A small adjustment was made to the get_radar_plot function by removing sp.playlist_items. This way, instead of passing the playlist ID, I can pass the variable that already holds the playlist data I extracted at the beginning. Doing so makes it more efficient for me to produce the radar plot, since I don't have to look up the ID of each playlist and I can directly identify which playlist I'm analyzing.


In [25]:
feature_columns = ["danceability", "energy", "speechiness", "liveness", "instrumentalness", "valence", "danceability"]

def createRadarElement(row, feature_cols):
    return go.Scatterpolar(
        r = row[feature_cols].values.tolist(), 
        theta = feature_cols, 
        mode = 'lines', 
        name = row['track_name'])

def get_radar_plot(playlist_id, features_list):
    current_playlist_audio_df = get_audio_features_df(pd.DataFrame(playlist_id))
    # current_playlist_audio_df = get_audio_features_df(pd.DataFrame(sp.playlist_items(playlist_id)))
    # slight tweaks by removing sp.playlist_items
    current_data = list(current_playlist_audio_df.apply(createRadarElement, axis=1, args=(features_list, )))  
    fig = go.Figure(current_data, )
    fig.show(renderer='iframe', width=1200, height=800)
    fig.write_image(playlist_id + '.png', width=1200, height=800)
    
def get_radar_plots(playlist_id_list, features_list):
    for item in playlist_id_list:
        get_radar_plot(item, features_list)

The code above is an attempt at that. However, it cannot replace the previous radar plot cell, because a TypeError is raised after the first plot, so the loop cannot continue to the next playlist. The error most likely comes from the last line of get_radar_plot: playlist_id is now a DataFrame rather than a string, so playlist_id + '.png' tries to add a string to the track dictionaries stored in it. We can still use the function for a single playlist, namely the playlist that combines all three. The TypeError is still displayed, but we can ignore it here because fig.show runs before fig.write_image, so the desired radar plot is produced with all tracks and features.
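
One way around this kind of error is to keep the data and the file label as two separate arguments, so the file name is always built from a string. A minimal sketch in plain Python under that assumption; build_radar_traces and radar_image_path are hypothetical helpers, and the single row below is copied from the Tom & Jerry table:

```python
def build_radar_traces(rows, feature_cols):
    """Return one dict per track with the keyword arguments for a radar trace."""
    traces = []
    for row in rows:
        traces.append({
            "r": [row[col] for col in feature_cols],  # feature values
            "theta": list(feature_cols),              # axis labels
            "mode": "lines",
            "name": row["track_name"],
        })
    return traces


def radar_image_path(label, extension="png"):
    """Build the output file name from a text label, never from the data."""
    if not isinstance(label, str):
        raise TypeError("label must be a string such as 'combined'")
    return f"{label}.{extension}"


# The first feature is repeated at the end so the outline closes
cols = ["danceability", "energy", "valence", "danceability"]
rows = [{"track_name": "Fiery Frolic", "danceability": 0.607,
         "energy": 0.679, "valence": 0.715}]
traces = build_radar_traces(rows, cols)
path = radar_image_path("combined")
```

Each dict uses the same arguments as createRadarElement, so it could be passed on as go.Scatterpolar(**trace), and path could be given to fig.write_image in place of playlist_id + '.png'.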

##### Create Radar Plot for one combined playlist

Let's compare the results when we combine the playlists:

In [26]:
playlist_id = pd.concat([TomJerry_tracks,stitch_tracks,anime_tracks])
get_radar_plot(playlist_id, feature_columns)

TypeError: unsupported operand type(s) for +: 'dict' and 'str'

We can see that the attributes we have been discussing are still there, even more clearly this time. Although it was hinted at in Tom & Jerry's radar plot, an unexpected result is that high instrumentalness also appears as an attribute of children's tracks. Hence, our analysis suggests that the attributes associated with the characters in childhood shows or their related theme tracks are high energy, valence, instrumentalness, and fast tempo. There are also positive correlations between these qualities.

### Reflection

Since we have discovered that tracks for children's shows share some similar qualities, we can propose these questions:
* Can all tracks with the same attributes be used in children's shows?
* Can creators of tracks for children follow the same pattern and achieve the same effect of appealing to children?
* Are there other qualities, not analyzed by Spotify, that these tracks share?
