# GWC Pandas Challenge

The best way to learn how to program is by working on projects. In this challenge we'll be accessing the Spotify API. I will supply the code to access a playlist and extract metadata from it (all of which can be found by Googling). Exploring Spotify data is a fun project. You can take it a step further and build recommendation engines into your apps, or regression/classification models (it has been done before, Google it). Just make sure you collect enough training data. This code accesses playlists; accessing your own liked songs is a little trickier, but it's possible. Also, you can only access your own liked songs (in case you plan on snooping on your friends' music). So to get a large training data set for a machine learning model, simply get the Playlist ID for a massive playlist.


## Extracting & Loading Your Spotify Playlists
First we'll connect to the Spotify API and collect data from a playlist of your choice. This is the same data Spotify uses to suggest new music to you. Go to the [Spotify Dashboard](https://developer.spotify.com/dashboard/login), log in with your Spotify credentials, create a new app, and then copy your `Client ID` and `Client Secret` into the appropriate variables below.

In [None]:
# Importing dependencies
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import os
import numpy as np
import pandas as pd

client_id = '' # Fill this in
client_secret = '' # Fill this in

client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)

# Enter your Spotify username
username = "" # Fill this in

In [None]:
# Get your playlists
playlists = sp.user_playlists(username)
while playlists:
    for i, playlist in enumerate(playlists['items']):
        print("%4d %s %s" % (i + 1 + playlists['offset'], playlist['uri'],  playlist['name']))
    if playlists['next']:
        playlists = sp.next(playlists)
    else:
        playlists = None

https://open.spotify.com/playlist/7jyUrsgKKFp6YP2Y2FpaH?si=a98d2f726ab4b0e

Go to Spotify and pick a playlist that you'd like to explore! Get the link to your playlist by pressing the three dots, going to `share` in the drop-down, and clicking `Copy link to playlist`. You can choose one of yours if you'd like, or any playlist out there. Above is a link to one of my playlists (I edited the URL for privacy). The playlist ID starts at the `7` after the last `/` and ends at the `H` before the `?`. Extract the playlist ID and paste that string of characters into the code below to access the playlist.
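
If you'd rather not pick the ID out by eye, the same rule can be written as a short helper. This is only a minimal sketch: `extract_playlist_id` is a made-up name (it is not part of spotipy), and it assumes the link has the `https://open.spotify.com/playlist/<id>?si=...` shape of the link above.

```python
from urllib.parse import urlparse


def extract_playlist_id(playlist_url):
    """Return the playlist ID from a Spotify 'Copy link to playlist' URL."""
    # urlparse separates the path from the query string (everything after '?')
    path = urlparse(playlist_url).path  # e.g. '/playlist/7jyUrsgKKFp6YP2Y2FpaH'
    parts = [part for part in path.split('/') if part]
    if len(parts) < 2 or parts[-2] != 'playlist':
        raise ValueError(f"Not a playlist link: {playlist_url}")
    return parts[-1]


example_url = "https://open.spotify.com/playlist/7jyUrsgKKFp6YP2Y2FpaH?si=a98d2f726ab4b0e"
playlist_id_from_url = extract_playlist_id(example_url)
print(playlist_id_from_url)
```

The returned string is what you would paste into `playlist_id`.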

The output below looks like a bunch of gibberish, but that is how raw JSON looks! Notice how the data is packaged as a Python dictionary. I will supply the code to organize the data.
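
To get a feel for that structure, here is a tiny hand-written dictionary with the same nesting as the fields this notebook reads later (`tracks`, then `items`, then `track`). The song and artist names are invented for illustration, and a real response carries many more keys.

```python
# A made-up, heavily trimmed stand-in for the dictionary Spotify returns
sample_results = {
    'tracks': {
        'items': [
            {'track': {'id': 'abc123', 'name': 'Example Song', 'popularity': 50,
                       'album': {'release_date': '2020-01-01'},
                       'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}},
        ]
    }
}

# Walk down the nesting one key (or list index) at a time
first_track = sample_results['tracks']['items'][0]['track']
artist_names = [artist['name'] for artist in first_track['artists']]

print(first_track['name'])
print(artist_names)
```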

In [None]:
playlist_id = ''

results = sp.user_playlist(username, playlist_id, 'tracks')
results

In [None]:
# Extracting Metadata from the Playlist
def extracting_metadata(results):
    playlist_tracks_data = results['tracks']
    playlist_tracks_id = []
    playlist_tracks_titles = []
    playlist_tracks_artists = []
    playlist_tracks_first_artists = []
    playlist_tracks_first_release_date = []
    playlist_tracks_popularity = []

    
    for track in playlist_tracks_data['items']:
        playlist_tracks_id.append(track['track']['id'])
        playlist_tracks_titles.append(track['track']['name'])
        playlist_tracks_first_release_date.append(track['track']['album']['release_date'])
        playlist_tracks_popularity.append(track['track']['popularity'])
        # adds a list of all artists involved in the song to the list of artists for the playlist
        artist_list = []
        for artist in track['track']['artists']:
            artist_list.append(artist['name'])
        playlist_tracks_artists.append(artist_list)
        playlist_tracks_first_artists.append(artist_list[0])

    
    features = sp.audio_features(playlist_tracks_id)
    features_df = pd.DataFrame(data=features, columns=features[0].keys())
    features_df['title'] = playlist_tracks_titles
    features_df['first_artist'] = playlist_tracks_first_artists
    features_df['all_artists'] = playlist_tracks_artists
    features_df['popularity'] = playlist_tracks_popularity
    features_df['release_date'] = playlist_tracks_first_release_date
    #features_df = features_df.set_index('id')
    features_df = features_df[['id', 'title', 'first_artist', 'all_artists', 'popularity', 'release_date',
                               'danceability', 'energy', 'key', 'loudness',
                               'mode', 'acousticness', 'instrumentalness',
                               'liveness', 'valence', 'tempo',
                               'duration_ms', 'time_signature']]
    
    return features_df

features_df = extracting_metadata(results)
features_df.head()

In [None]:
# [easy] Use the `.describe()` method on the dataframe
# ...

## Data Exploration

How exciting! We have your playlist loaded into memory, and we have a lot of metadata to work with. Thankfully, Spotify calculates these features automatically, so there should be no missing data. Still, it doesn't hurt to double-check.
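
As a rough sketch of what a missing-data check can look like, the snippet below builds a small made-up DataFrame with one gap in it and counts the missing values per column. The values are invented; on your own data you would call the same methods on `features_df`.

```python
import numpy as np
import pandas as pd

# Toy data with one deliberately missing value (not real Spotify data)
toy_df = pd.DataFrame({
    'title': ['Song A', 'Song B', 'Song C'],
    'energy': [0.8, np.nan, 0.5],
    'tempo': [120.0, 98.5, 140.2],
})

# isna() marks each missing cell as True; sum() counts them per column
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# A single True/False answer for the whole DataFrame
has_missing = bool(toy_df.isna().any().any())
print(has_missing)
```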

In [None]:
# [easy] Check to see if there is any missing data in the dataframe
# ...


In [None]:
# [easy] Run the `describe()` method on your dataframe
# ...


In [None]:
# [intermediate] Run a for loop to find the average of each numeric feature and save it to the appropriate variable
# Save each variable to a dictionary in the form of a key:value pair
# ...


In [None]:
# [intermediate] Filter the dataframe to an artist of your choice
# ...


# Filter the dataframe for another artist of your choice
# ...


# Use `.describe()` on both. Do you think the metrics agree with their style?
# ...


In [None]:
# [hard] Create a dataframe where every row is a unique artist and their 
# values in the corresponding columns are their averages.
# ...

In [None]:
# [intermediate] Find the top 10 most danceable songs on your playlist
# ...

# [Extra Credit] Share your top 10 most danceable songs with GWC

In [None]:
# [intermediate] Find the correlation between `popularity` and `danceability`
# ...

# Find the correlation between `popularity` and `energy`
# ...

# Find the correlation between `energy` and `tempo`
# ...

In [None]:
# [intermediate]
# The `duration_ms` column is in units of milliseconds. Convert it to seconds.
# ...

# Convert it to minutes:seconds format.
# ...

## Data Visualization

Numbers and statistics are nice, but who doesn't love a beautiful visualization? Try your hand at some visualizations! You might have to pip install and import some Python visualization libraries. Get creative!

Here is a link to the [EVERY SONG ever basically](https://open.spotify.com/playlist/0LEbhcWqOsiIlQn9HHVN4S?si=f757a27cd3bc4e43) playlist. We'll use it to create some cool visualizations, but of course, you can use any playlist you want.

The `get_playlist_tracks_more_than_100_songs` function exists because the regular function only returns metadata for up to 100 songs, not for every song in a playlist. I built the function below so that we can extract the metadata for all of the songs in a playlist. It takes a few minutes to finish running. If you think you can build a more graceful function, by all means, try your hand.

If you want to extract metadata from a playlist that has more than 100 songs, use the function below and pass in your username and playlist ID.
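
If you do want to try for a more graceful version, one possible direction is to request data in batches instead of one song at a time. The helper below is only a sketch of the batching step: `chunked` is a hypothetical name, and the batch size of 100 assumes the limit mentioned above also applies to whichever call you are batching, so check the spotipy documentation before relying on it.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in track IDs; in the notebook these would come from the playlist
fake_track_ids = [f"track_{n}" for n in range(250)]

batches = list(chunked(fake_track_ids))
print([len(batch) for batch in batches])
```

Each batch could then be sent in a single request, so a 250-song playlist would need three requests rather than 250.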

In [None]:
# Function to extract metadata from a playlist that's longer than 100 songs
def get_playlist_tracks_more_than_100_songs(username, playlist_id):
    # Each request returns at most 100 tracks, so keep following the `next` page
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])

    columns = ['id', 'title', 'all_artists', 'popularity', 'release_date',
               'danceability', 'energy', 'key', 'loudness',
               'mode', 'acousticness', 'instrumentalness',
               'liveness', 'valence', 'tempo',
               'duration_ms', 'time_signature']
    rows = []

    for i, item in enumerate(tracks):
        print(i) # Counter
        try:
            track = item['track']
            # One API call per track, which is why this takes a while
            features = sp.audio_features(track['id'])[0]
            row = dict(features)
            row['title'] = track['name']
            # Collect every artist on the track, not just the last one
            row['all_artists'] = [artist['name'] for artist in track['artists']]
            row['popularity'] = track['popularity']
            row['release_date'] = track['album']['release_date']
            rows.append(row)
        except Exception:
            # Skip tracks with missing metadata or audio features
            continue

    # Build the DataFrame once, keeping only the columns we care about
    return pd.DataFrame(rows, columns=columns)

In [None]:
# Extracting metadata for the `EVERY SONG ever basically` playlist
playlist_id_for_every_song_ever = '0LEbhcWqOsiIlQn9HHVN4S'

every_song_basically_ever_df = get_playlist_tracks_more_than_100_songs(username, playlist_id_for_every_song_ever)

**Super Hard** *Good luck*

You will have to read up on documentation and experiment with different approaches to complete this challenge.

- Create a new column called `first_artist` that stores the first artist from the `all_artists` column
- Create a new DataFrame where each index is a unique artist and the column values are that artist's averages across all of their songs in the `every_song_basically_ever_df` DataFrame.
- Sort this new DataFrame based on Popularity
- Filter for only the top 10 artists based on Popularity
- Create a Bar Chart with the artist's name on the x-axis and their popularity on the y-axis.
- Create the same Bar Chart, but now each artist has two corresponding bars: popularity and another metric of your choice.
- If you've gotten this far, please hydrate and take a break! You deserve it.
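
To give a sense of the pandas tools involved without giving the whole thing away, here is a sketch of the first few steps on a small invented DataFrame. It assumes `all_artists` holds a list of names for each song; the data and variable names are made up, and the charts are left to you.

```python
import pandas as pd

# Invented data standing in for `every_song_basically_ever_df`
toy_df = pd.DataFrame({
    'title': ['Song A', 'Song B', 'Song C', 'Song D'],
    'all_artists': [['Artist X', 'Artist Y'], ['Artist Y'], ['Artist X'], ['Artist Z']],
    'popularity': [80, 60, 70, 90],
    'energy': [0.5, 0.7, 0.9, 0.4],
})

# First artist of each song: take element 0 of each list in the column
toy_df['first_artist'] = toy_df['all_artists'].apply(lambda artists: artists[0])

# One row per artist, holding the mean of the chosen numeric columns
artist_means = toy_df.groupby('first_artist')[['popularity', 'energy']].mean()

# Most popular artists first, then keep the top 2 (use 10 on real data)
top_artists = artist_means.sort_values('popularity', ascending=False).head(2)
print(top_artists)
```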


In [None]:
# ...

In [None]:
# [hard] Create a Radar Chart with one of the artists' metrics
# ...

In [None]:
# [hard] Create a Radar Chart with another artist's metrics; make sure they're the same metrics
# ...

In [None]:
# [hard] Overlay the Radar Charts
# ...

In [None]:
# [hard] Create a Pearson's r correlation chart with the numeric categories
# ...

## GWC Visualization Competition

Pick any playlist, collect your data, and make the coolest visualization!
Once you do, post it to the **GWC Discord Server** under the `workshop-chat` channel!

You don't have to stick to Python. Check out Tableau or Power BI!

In [None]:
# ...

## Pandas Profiling

Need to make a dashboard on the fly? Pandas Profiling is certainly the way to go. It's really easy to use, and it generates profile reports that can be saved as .html files!

Here's some documentation: https://pypi.org/project/pandas-profiling/

Try it yourself!

In [None]:
# Importing dependencies
import sys

!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension

from pathlib import Path

import numpy as np
import pandas as pd
import requests

import pandas_profiling
from pandas_profiling.utils.cache import cache_file

In [None]:
profile_report = every_song_basically_ever_df.profile_report(html={"style": {"full_width": True}}, explorative=True)
profile_report.to_file("Every_Song_Basically_Ever_Playlist.html")