# Data gathering

In this notebook we connect to the Spotify API (through the spotipy library) in order to retrieve the dataset for our work.
Let's go through this process step by step.

In [104]:
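import os
import subprocess
import sys

# spotipy is the only dependency we install on the fly if it is missing
try:
    import spotipy
except ImportError:
    # pip >= 10 no longer exposes pip.main(), so call pip as a module of the current interpreter
    subprocess.check_call([sys.executable, "-m", "pip", "install", "spotipy"])
    import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

import numpy as np
import pandas as pd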

## Genres selection

We decided to focus on classical, jazz, metal and rap, as we believe these genres differ greatly from one another while still sharing some common gray areas.

In [105]:
genres = ['classic', 'jazz', 'metal', 'rap']
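Each genre name is also used to locate its list of playlist URLs, `urls/<genre>_url.txt`, in the extraction loop further down. As a minimal sketch (the helper `read_playlist_urls` and its `base_dir` parameter are hypothetical, not part of the original notebook), reading such a file while skipping blank lines could look like this. The usage example writes a throw-away file containing a placeholder URL rather than a real playlist:

```python
import os
import tempfile


def read_playlist_urls(genre, base_dir="urls"):
    """Return the stripped, non-empty lines of <base_dir>/<genre>_url.txt (hypothetical helper)."""
    path = os.path.join(base_dir, f"{genre}_url.txt")
    with open(path, "r", encoding="utf-8") as f:
        # Skip blank lines so that a trailing newline does not produce an empty URL
        return [line.strip() for line in f if line.strip()]


# Usage example: a temporary directory stands in for ./urls, with a placeholder playlist URL
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "jazz_url.txt"), "w", encoding="utf-8") as f:
        f.write("https://open.spotify.com/playlist/abc123?si=xyz\n\n")
    print(read_playlist_urls("jazz", base_dir=tmp))
```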


Load the Spotify client ID (public key) and client secret (private key), then authenticate.

In [106]:
# Read the credentials from environment variables instead of hard-coding secrets in the notebook
credentials = {
    'public' : os.environ['SPOTIPY_CLIENT_ID'],
    'private': os.environ['SPOTIPY_CLIENT_SECRET']
    }

client_credentials_manager = SpotifyClientCredentials(client_id=credentials['public'], client_secret=credentials['private'])
spotyCarlo = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

Interacting with the API: for a given playlist URL, fetch the audio features of all its tracks.
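`createFrameFromUrl` in the next cell first reduces a playlist URL to its ID by splitting on `/` and `?`. A slightly more defensive sketch, assuming the inputs are either `https://open.spotify.com/playlist/<id>` URLs or `spotify:playlist:<id>` URIs, could rely on the standard library's `urllib.parse`. The helper name `playlist_id_from_url` and the ID `abc123` are hypothetical placeholders:

```python
from urllib.parse import urlparse


def playlist_id_from_url(url):
    """Extract the playlist ID from an open.spotify.com URL or a spotify: URI (hypothetical helper)."""
    url = url.strip()
    if url.startswith("spotify:"):
        # URI form, e.g. spotify:playlist:<id>; the ID is the last colon-separated field
        return url.split(":")[-1]
    # URL form: urlparse separates the query string (?si=...) from the path for us
    path = urlparse(url).path.rstrip("/")
    # The ID is the last path segment, e.g. /playlist/<id>
    return path.split("/")[-1]


print(playlist_id_from_url("https://open.spotify.com/playlist/abc123?si=xyz"))  # abc123
print(playlist_id_from_url("spotify:playlist:abc123"))  # abc123
```

Unlike plain string splitting, this also tolerates a trailing slash at the end of the URL.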

In [107]:
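# TODO: Clean this function
def createFrameFromUrl(url, carlo=spotyCarlo):
    # Extract the playlist ID from a URL such as https://open.spotify.com/playlist/<id>?si=...
    URI = url.split("/")[-1].split("?")[0]
    offs = 0
    feats = list()

    # Spotify returns at most 100 tracks per request, so we keep increasing the offset
    # until a response comes back with no items
    while True:
        # NOTE: playlist_tracks works only for playlists made by users, not for genres
        items = carlo.playlist_tracks(URI, offset=offs)["items"]
        # Empty page: we have reached the end of the playlist
        if not items:
            break
        # Skip removed tracks (track is None) and local files, which have no Spotify ID
        track_uris = [x["track"]["uri"] for x in items
                      if x["track"] is not None and not x.get("is_local", False)]
        if track_uris:
            # audio_features returns None for tracks it has no features for, so drop those
            feats += [f for f in carlo.audio_features(track_uris) if f is not None]
        offs += 100
    return pd.DataFrame(feats)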
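The paging logic of `createFrameFromUrl` can be exercised without network access by separating it from the API calls. In this minimal sketch, `iter_playlist_pages`, `chunks`, `fake_fetch` and the 250-track `fake_playlist` are all hypothetical stand-ins: `fake_fetch` plays the role of `carlo.playlist_tracks(URI, offset=offs)["items"]` by returning at most 100 items per call, and `chunks` mirrors the limit of 100 IDs per `audio_features` request.

```python
def iter_playlist_pages(fetch_page, page_size=100):
    """Yield pages from fetch_page(offset) until an empty page comes back."""
    offset = 0
    while True:
        items = fetch_page(offset)
        if not items:
            # Empty page: end of the playlist
            break
        yield items
        offset += page_size


def chunks(seq, size=100):
    """Split seq into consecutive lists of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# Hypothetical playlist of 250 track URIs standing in for a real API response
fake_playlist = [f"spotify:track:{i}" for i in range(250)]


def fake_fetch(offset, limit=100):
    # Behaves like one page of playlist items: at most `limit` entries starting at `offset`
    return fake_playlist[offset:offset + limit]


pages = list(iter_playlist_pages(fake_fetch))
print([len(p) for p in pages])     # [100, 100, 50]
print(len(chunks(fake_playlist)))  # 3
```

The loop stops on the first empty page, which costs one extra request per playlist. Spotify's paging objects also carry a `next` field that is null on the last page, which could serve as the stop condition instead.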


In [114]:
dF_dict = {x : [] for x in genres}

# For each genre, extract one DataFrame per playlist and store them in a dictionary, so they are easy to concatenate
for genre in genres:
    with open(os.path.join('urls', f'{genre}_url.txt'), 'r') as f:
        for uri in f:
            uri = uri.strip()
            if not uri:
                # Skip blank lines (e.g. a trailing newline at the end of the file)
                continue
            try:
                dF_dict[genre].append(createFrameFromUrl(uri))
            except Exception as e:
                print(f"failed on {genre}: {uri} ({e})")

    # pd.concat raises a ValueError if no playlist was retrieved for this genre
    dF_dict[genre] = pd.concat(dF_dict[genre])
    dF_dict[genre].to_csv(genre + '.csv')

# Display the per-genre DataFrames
dF_dict

{'classic':     danceability   energy  key  loudness  mode  speechiness  acousticness  \
0         0.2750  0.15700    7   -18.752     1       0.0636         0.890   
1         0.2210  0.12600    0   -25.427     1       0.0447         0.989   
2         0.2890  0.03060    9   -30.790     0       0.0446         0.987   
3         0.0753  0.07000    2   -27.272     1       0.0440         0.918   
4         0.1300  0.15800    2   -16.132     1       0.0350         0.748   
..           ...      ...  ...       ...   ...          ...           ...   
45        0.3340  0.00732    4   -30.908     0       0.0355         0.991   
46        0.3510  0.00622    0   -33.286     0       0.0388         0.994   
47        0.3780  0.00553    4   -30.861     0       0.0759         0.995   
48        0.3780  0.01540    5   -29.740     1       0.0445         0.993   
49        0.3210  0.02720    3   -25.066     1       0.0370         0.993   

    instrumentalness  liveness  valence    tempo            typ