## My Spotify Data Analysis

The objective of this analysis is to start with an EDA that gives me some clues about what kind of Machine Learning project I could build from this data.

For instance, I have a couple of ideas:

- Clustering the songs and finding potential new "like" songs of every kind
- Predicting whether I would like a new song
- Recommending new songs

How to get your personal data from Spotify is explained in [this article](https://towardsdatascience.com/get-your-spotify-streaming-history-with-python-d5a208bbcbd3).

### 1. Importing the data

In [1]:
import ast
from typing import List
from os import listdir

What we receive from Spotify is a group of `.json` files. Let's explore the __streaming history__ by adding it to a list: 

In [2]:
import json

def get_streamings(path: str) -> List[dict]:
    # Only the first history file is read here
    file = path + '/my_spotify_data/StreamingHistory0.json'

    with open(file, 'r', encoding='UTF-8') as f:
        # json.load also handles true/false/null, which ast.literal_eval rejects
        all_streamings = json.load(f)

    return all_streamings

Putting `get_streamings` into action:

In [3]:
streamings = get_streamings('../data')

In [4]:
print(len(streamings))
print(streamings[333])

4995
{'endTime': '2021-01-25 20:53', 'artistName': 'Depeche Mode', 'trackName': 'Strangelove', 'msPlayed': 294569}
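
Since Spotify delivers a group of `.json` files, a longer history may be split across several of them. The following is a minimal sketch of how they could all be read at once. It assumes the files follow the `StreamingHistory*.json` naming pattern seen above; the function name `get_all_streamings` is hypothetical and not part of the original notebook.

```python
import json
from pathlib import Path
from typing import List


def get_all_streamings(folder: str) -> List[dict]:
    """Concatenate every StreamingHistory*.json file found in `folder`."""
    all_streamings: List[dict] = []
    # sorted() gives a stable, lexicographic order by file name
    for file in sorted(Path(folder).glob('StreamingHistory*.json')):
        with open(file, 'r', encoding='UTF-8') as f:
            all_streamings.extend(json.load(f))
    return all_streamings
```

With the folder layout used in this notebook, the call would look like `get_all_streamings('../data/my_spotify_data')`.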


### 2. Using spotipy to connect to the Spotify API

In [2]:
import spotipy.util as util



Let's define our credentials. We use __environment variables__ so that our API keys are not pushed to the GitHub repository:

In [1]:
from dotenv import load_dotenv
import os

Loading the environment variables from `.env`:

In [3]:
load_dotenv()

True

In [4]:
username = os.getenv('username')
client_id = os.getenv('client_id')
client_secret = os.getenv('client_secret')
redirect_uri = os.getenv('redirect_uri')
scope = os.getenv('scope')
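
`os.getenv` silently returns `None` when a variable is missing, so a typo in `.env` only shows up later as a confusing authentication error. A small check like the one below can fail early instead. This is only a sketch: `read_settings` and `REQUIRED_KEYS` are hypothetical names, and the key list simply mirrors the five variables read above.

```python
import os
from typing import Dict, Mapping

# The five variables this notebook reads from `.env`
REQUIRED_KEYS = ('username', 'client_id', 'client_secret', 'redirect_uri', 'scope')


def read_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

After `load_dotenv()`, calling `settings = read_settings()` would give a dictionary with the same values as the individual variables above.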

The next function opens a browser window and asks for permission to use my data:

In [7]:
token = util.prompt_for_user_token(username=username, scope=scope, client_id=client_id,
                                   client_secret=client_secret, redirect_uri=redirect_uri)

After I agree to the use of my data, we get a token and a hidden file `.cache-anferben`. As the author of the article, [Vlad Gheorghe](https://medium.com/@contact_84057), says:

"_it's important that you recall the_ `prompt_for_user_token` _function to load the token every time you run your script_" 

In [8]:
token

'BQDVmRt2_5kxNeLmwEK0iYjkLHJyTTVPlOsSxC-sxVzOX8lzT2Ai-Igu4QEIG_IpfCIL8UM47QcMwEzmUKSYs6c1MkrJJEXcmXDVy08BuUMZo4Bu-J4FFzsRJIZybXaPnI2qNs2aRC0dXT6RQkB2'
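
Access tokens are short-lived, so a long-running loop cannot rely on a token fetched once at the top of the notebook. The sketch below wraps any token-fetching function and calls it again once the cached token is older than `max_age` seconds. The class name `RefreshingToken` is hypothetical, and the default of 3000 seconds is an assumption based on a token lifetime of roughly one hour.

```python
import time
from typing import Callable, Optional


class RefreshingToken:
    """Cache a token and fetch a new one once it is older than `max_age` seconds."""

    def __init__(self, fetch: Callable[[], str], max_age: float = 3000.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a call to prompt_for_user_token
        self._max_age = max_age      # assumed safety margin below the real lifetime
        self._clock = clock          # injectable for testing
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

In this notebook, `fetch` could be a `lambda` around the `util.prompt_for_user_token(...)` call shown above, and every request would then use `token_source.get()` instead of the fixed `token` variable.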

### 3. Getting the tracks' IDs

Since we need the IDs to __request__ the features, we have to retrieve them from the API first:

In [9]:
import requests

The details of the headers and params can be found [here](https://developer.spotify.com/documentation/web-api/reference/#/operations/search).

In [12]:
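from typing import Optional

def get_id(track_name: str, token: str) -> Optional[str]:
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + token
    }

    params = [
        ('q', track_name),
        ('type', 'track')
    ]

    try:
        r = requests.get('https://api.spotify.com/v1/search', headers=headers,
                         params=params, timeout=5)

        # Take the first search hit; it is not guaranteed to be the exact track
        data = r.json()
        first_result = data['tracks']['items'][0]
        track_id = first_result['id']

        return track_id

    # Network failure, non-JSON body, error payload, or empty result list
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return None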

Testing our `get_id` function:

In [13]:
track_id = get_id('Strangelove', token)
track_id

'6ZCyDN2ArlWB4GKAj644Cd'
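
Searching by track name alone returns the first hit for that title, which may belong to a different artist. Since every streaming record also carries an `artistName`, the query can be narrowed with the `track:` and `artist:` field filters of the search endpoint. The helper below is a hypothetical sketch that only builds the request parameters; it is not part of the original `get_id`.

```python
from typing import List, Tuple


def build_search_params(track_name: str, artist_name: str) -> List[Tuple[str, str]]:
    """Build search params that filter by both track title and artist."""
    query = f'track:{track_name} artist:{artist_name}'
    return [
        ('q', query),
        ('type', 'track'),
        ('limit', '1'),  # only the first hit is used anyway
    ]
```

The returned list has the same shape as the `params` in `get_id`, so it could be passed to `requests.get` unchanged, for example with `build_search_params('Strangelove', 'Depeche Mode')`.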

### 4. Getting the tracks' features

In [12]:
import spotipy

In [13]:
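def get_features(track_id: str, token: str) -> dict:
    # get_id returns None when the search fails, so there is nothing to request
    if track_id is None:
        return None

    sp = spotipy.Spotify(auth=token)

    try:
        features = sp.audio_features([track_id])
        return features[0]

    # Covers API errors such as an expired token, but not KeyboardInterrupt
    except Exception:
        return None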

Testing our `get_features` function:

In [14]:
features = get_features('6ZCyDN2ArlWB4GKAj644Cd', token)
features

{'danceability': 0.566,
 'energy': 0.961,
 'key': 4,
 'loudness': -3.46,
 'mode': 0,
 'speechiness': 0.0354,
 'acousticness': 0.044,
 'instrumentalness': 0.00106,
 'liveness': 0.352,
 'valence': 0.866,
 'tempo': 118.988,
 'type': 'audio_features',
 'id': '6ZCyDN2ArlWB4GKAj644Cd',
 'uri': 'spotify:track:6ZCyDN2ArlWB4GKAj644Cd',
 'track_href': 'https://api.spotify.com/v1/tracks/6ZCyDN2ArlWB4GKAj644Cd',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6ZCyDN2ArlWB4GKAj644Cd',
 'duration_ms': 225880,
 'time_signature': 4}
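
For the clustering and prediction ideas mentioned at the start, only the numeric part of this dictionary is useful; fields such as `uri` or `track_href` are identifiers. The sketch below turns a features dictionary into a fixed-order list of floats. The selection in `NUMERIC_FEATURES` is an assumption made for illustration, not a choice taken from the original analysis.

```python
from typing import Any, Dict, List

# Assumed selection of continuous features; adjust to the analysis at hand
NUMERIC_FEATURES = [
    'danceability', 'energy', 'loudness', 'speechiness', 'acousticness',
    'instrumentalness', 'liveness', 'valence', 'tempo',
]


def to_vector(features: Dict[str, Any]) -> List[float]:
    """Keep only the selected audio features, always in the same order."""
    return [float(features[name]) for name in NUMERIC_FEATURES]


# Values copied from the 'Strangelove' output above
sample = {
    'danceability': 0.566, 'energy': 0.961, 'key': 4, 'loudness': -3.46,
    'mode': 0, 'speechiness': 0.0354, 'acousticness': 0.044,
    'instrumentalness': 0.00106, 'liveness': 0.352, 'valence': 0.866,
    'tempo': 118.988, 'type': 'audio_features', 'id': '6ZCyDN2ArlWB4GKAj644Cd',
}
vector = to_vector(sample)
```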

### 5. Building a streaming history dataframe

First, we'll get the unique tracks from our streaming history:

In [16]:
streamings = get_streamings('../data')
# dict.fromkeys drops duplicate names while keeping first-seen order
unique_tracks = list(dict.fromkeys(streaming['trackName'] for streaming in streamings))

Next, we add the `track_id` to the original streaming dictionaries we received from Spotify:

In [15]:
for stream in streamings:
    track = stream['trackName']
    track_id = get_id(track, token)
    if track_id:
        stream['track_id'] = track_id 
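
The loop above sends one search request per streaming record, even when the same track was played many times. A small cache keyed by track name avoids the repeated requests. The function below is a hypothetical sketch: `lookup` stands for any function that maps a track name to an ID or `None`, such as a wrapper around `get_id`.

```python
from typing import Callable, Dict, List, Optional


def add_track_ids(streams: List[dict],
                  lookup: Callable[[str], Optional[str]]) -> int:
    """Attach 'track_id' to each stream, calling `lookup` once per distinct name.

    Returns the number of lookups actually performed.
    """
    cache: Dict[str, Optional[str]] = {}
    for stream in streams:
        name = stream['trackName']
        if name not in cache:
            cache[name] = lookup(name)
        if cache[name]:
            stream['track_id'] = cache[name]
    return len(cache)
```

In this notebook the call could be `add_track_ids(streamings, lambda name: get_id(name, token))`.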

Saving that to a `.csv` file:

In [20]:
df = pd.DataFrame(streamings)
df.to_csv('../data/streams.csv', index=False)

Extracting every track's features:

In [25]:
all_features = {}

for track in unique_tracks:
    track_id = get_id(track, token)
    features = get_features(track_id, token)
    if features:
        all_features[track] = features

with_features = []

for track_name, features in all_features.items():
    with_features.append({'name': track_name, **features})    

HTTP Error for GET to https://api.spotify.com/v1/audio-features/?ids=3mYCd23hPxJW5okSvMoy3x with Params: {} returned 401 due to The access token expired
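
The 401 above appears because the loop makes two requests per track and outlives the token. One way to shorten the run is to request features in batches: the audio-features endpoint accepts several IDs per call (up to 100, to the best of my knowledge). The sketch below assumes `sp` is any object with an `audio_features(list_of_ids)` method that returns one entry per ID, as a `spotipy.Spotify` client does; the function names are hypothetical.

```python
from typing import Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_features_batched(sp, track_ids: List[str],
                         batch_size: int = 100) -> Dict[str, dict]:
    """Fetch audio features batch by batch and index them by track ID."""
    result: Dict[str, dict] = {}
    for batch in chunked(track_ids, batch_size):
        # One request per batch instead of one request per track
        for track_id, features in zip(batch, sp.audio_features(batch)):
            if features:  # IDs without features are assumed to come back as None
                result[track_id] = features
    return result
```

With roughly a thousand unique tracks, this would reduce the feature requests from one per track to about a dozen.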


In [26]:
with_features[0]

{'name': 'Somos Coyotes',
 'danceability': 0.62,
 'energy': 0.832,
 'key': 5,
 'loudness': -5.526,
 'mode': 0,
 'speechiness': 0.0361,
 'acousticness': 0.00106,
 'instrumentalness': 3.24e-06,
 'liveness': 0.096,
 'valence': 0.548,
 'tempo': 105.998,
 'type': 'audio_features',
 'id': '0RGuPZmtJxMblQwZIvcNsQ',
 'uri': 'spotify:track:0RGuPZmtJxMblQwZIvcNsQ',
 'track_href': 'https://api.spotify.com/v1/tracks/0RGuPZmtJxMblQwZIvcNsQ',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/0RGuPZmtJxMblQwZIvcNsQ',
 'duration_ms': 222185,
 'time_signature': 4}

Exporting the data to a `.csv` file:

In [18]:
import pandas as pd

In [27]:
df = pd.DataFrame(with_features)
df.tail()

| | name | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1111 | Sure Shot | 0.692 | 0.799 | 1 | -7.924 | 1 | 0.164 | 0.388 | 0.0 | 0.301 | 0.549 | 97.978 | audio_features | 21REQ1bCUWphT2QK3bLWYQ | spotify:track:21REQ1bCUWphT2QK3bLWYQ | https://api.spotify.com/v1/tracks/21REQ1bCUWph... | https://api.spotify.com/v1/audio-analysis/21RE... | 199667 | 4 |
| 1112 | Como Te Voy a Olvidar - Edit | 0.712 | 0.735 | 11 | -3.903 | 0 | 0.0329 | 0.115 | 0.000383 | 0.12 | 0.963 | 150.152 | audio_features | 1vNuHEbQONnN0DJUBKeUXJ | spotify:track:1vNuHEbQONnN0DJUBKeUXJ | https://api.spotify.com/v1/tracks/1vNuHEbQONnN... | https://api.spotify.com/v1/audio-analysis/1vNu... | 254145 | 4 |
| 1113 | Oh Yeah! | 0.711 | 0.696 | 5 | -6.927 | 0 | 0.0515 | 0.00885 | 2.2e-05 | 0.0755 | 0.734 | 110.904 | audio_features | 1jas4QjqGPF9jIYukRMKke | spotify:track:1jas4QjqGPF9jIYukRMKke | https://api.spotify.com/v1/tracks/1jas4QjqGPF9... | https://api.spotify.com/v1/audio-analysis/1jas... | 169643 | 4 |
| 1114 | Miami - Remasterizado 2008 | 0.685 | 0.783 | 7 | -6.115 | 0 | 0.0501 | 0.0507 | 0.00612 | 0.107 | 0.72 | 88.964 | audio_features | 5BarjJ1fLPy5cV7ydaijiD | spotify:track:5BarjJ1fLPy5cV7ydaijiD | https://api.spotify.com/v1/tracks/5BarjJ1fLPy5... | https://api.spotify.com/v1/audio-analysis/5Bar... | 181027 | 4 |
| 1115 | Among the Clouds | 0.174 | 0.0404 | 2 | -29.213 | 0 | 0.0393 | 0.947 | 0.977 | 0.106 | 0.0397 | 66.343 | audio_features | 7KWzzOj2W7ZwkGYJAgX17D | spotify:track:7KWzzOj2W7ZwkGYJAgX17D | https://api.spotify.com/v1/tracks/7KWzzOj2W7Zw... | https://api.spotify.com/v1/audio-analysis/7KWz... | 173333 | 4 |


In [28]:
df.to_csv('../data/track_features.csv', index=False)