### Spotipy Workshop

### Step-by-step guide for Part 1
    * Get your Spotify credentials and create an app
    * Learn about scopes
    * Log in and get some basic user information
    * Learn about time horizons ['short_term', 'medium_term', 'long_term']
    * Get top tracks and artists for different horizons
    * Get genre distributions for different time horizons and compare them
    * Try to find out how much popular music you listen to
    * Try to find out how much local music you listen to

First, you will need to create a new application and obtain the following credentials:
~~~
SPOTIPY_CLIENT_ID 
SPOTIPY_CLIENT_SECRET
SPOTIPY_REDIRECT_URI
SPOTIFY_USERNAME
~~~

[Here you can start creating an app](https://developer.spotify.com/dashboard/).
You will need to edit the settings and set up a redirect URI (*http://localhost/* is recommended):

<img src="img/spotify_app.png" width=600 alt="spotify-app-settings">


For the best experience, set them as *environment variables* (you might need to do this from the terminal before starting the notebook).

In [53]:
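import os

# Fill in your own credentials in place of None.
SPOTIPY_CLIENT_ID = None
SPOTIPY_CLIENT_SECRET = None
SPOTIPY_REDIRECT_URI = 'http://localhost/'
SPOTIFY_USERNAME = None

# `!export` runs in a separate subshell and never reaches the notebook kernel,
# so set the variables that spotipy reads through os.environ instead.
for name, value in [('SPOTIPY_CLIENT_ID', SPOTIPY_CLIENT_ID),
                    ('SPOTIPY_CLIENT_SECRET', SPOTIPY_CLIENT_SECRET),
                    ('SPOTIPY_REDIRECT_URI', SPOTIPY_REDIRECT_URI)]:
    if value is not None:
        os.environ[name] = value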

Now let's import the necessary libraries:

In [2]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.util as util

In [101]:
from IPython.display import Image, Audio, display, HTML

from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Before connecting to the app, let's learn about **authorization scopes**. [Scopes](https://developer.spotify.com/documentation/general/guides/scopes/) define which information the user agrees to share with the application.

The common practice is to ask for as little information from a user as possible. Otherwise, something like the Cambridge Analytica scandal could happen again.

Here we only need to read the user's library, playlists, and top tracks and artists. In Part 2 you will need to create a playlist, so you will need to define another set of scopes.

In [None]:
scope = 'user-library-read playlist-read-private user-read-recently-played user-top-read'
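The scope string above is a space-separated list of scope names. As a minimal sketch, you could keep the names in a Python list and join them, which makes it easier to add or remove scopes later. The helper `build_scope` and the variable names below are hypothetical and not part of spotipy.

```python
# Scopes used in Part 1 (copied from the cell above).
PART1_SCOPES = [
    'user-library-read',
    'playlist-read-private',
    'user-read-recently-played',
    'user-top-read',
]


def build_scope(scopes):
    """Join scope names into one space-separated string, dropping duplicates."""
    unique = []
    for name in scopes:
        if name not in unique:
            unique.append(name)
    return ' '.join(unique)


part1_scope = build_scope(PART1_SCOPES)
print(part1_scope)
```

For Part 2 you would extend the list with a playlist-writing scope (for example `playlist-modify-private`, assuming you create private playlists) and request a new token.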

Finally, we can connect to Spotify from the code and get an *authorization token*.

An authorization token is an opaque string issued by Spotify. The *spotipy* library sends this token with every request so that the server knows which user and which scopes the request is authorized for.

The token is unique for every set of [user, scope, app] and has a lifetime of one hour. After one hour it expires and you need to get a new token in order to continue working.
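Because the token expires after one hour, a long notebook session may start failing partway through. The sketch below shows one simple way to keep track of the token's age yourself. The function name, the 60-second safety margin, and the idea of recording the time manually are assumptions made for illustration; spotipy's own token handling may work differently.

```python
import time

TOKEN_LIFETIME_SECONDS = 3600  # one hour, as described above


def token_is_expired(obtained_at, now=None,
                     lifetime=TOKEN_LIFETIME_SECONDS, margin=60):
    """Return True once the token is older than its lifetime minus a safety margin.

    `obtained_at` and `now` are UNIX timestamps in seconds.
    """
    if now is None:
        now = time.time()
    return now - obtained_at >= lifetime - margin


# Record the time right after you receive a token ...
token_obtained_at = time.time()
# ... and check it before a long series of requests.
print(token_is_expired(token_obtained_at))
```

If the check returns `True`, request a new token in the same way as the first one.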

In [62]:
client_credentials_manager = SpotifyClientCredentials(client_id=SPOTIPY_CLIENT_ID, client_secret=SPOTIPY_CLIENT_SECRET) 
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
token = util.prompt_for_user_token(SPOTIFY_USERNAME, scope)
if token:
    sp = spotipy.Spotify(auth=token)
else:
    print('Check the environment variables, the redirect URI and try again')

Let's see if we can get the information about the chosen Spotify user:

In [63]:
user = sp.user(SPOTIFY_USERNAME)
print(user)
Image(url= user['images'][0]['url'])

{'display_name': 'Olga Slizovskaya', 'external_urls': {'spotify': 'https://open.spotify.com/user/veleslavia'}, 'followers': {'href': None, 'total': 11}, 'href': 'https://api.spotify.com/v1/users/veleslavia', 'id': 'veleslavia', 'images': [{'height': None, 'url': 'https://scontent.xx.fbcdn.net/v/t1.0-1/p200x200/11667329_1009521095748148_1708056228268924624_n.jpg?_nc_cat=0&oh=23ae3caecfe74fd94b861eb7a3ca3e08&oe=5BA29547', 'width': None}], 'type': 'user', 'uri': 'spotify:user:veleslavia'}


Now it's time to get the top songs and artists from your profile and analyse them!

Note that you can analyse the songs within 3 different time ranges: 'short_term', 'medium_term' and 'long_term'. Long term is calculated from several years of data and includes new data as it becomes available, medium term covers approximately the last 6 months, and short term covers approximately the last 4 weeks.

Each request returns at most 50 items, but we can pass `offset=XX` as a parameter to fetch the next portion of data.
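The `limit`/`offset` paging described above can be sketched as a small loop. The helper `fetch_all_items` is hypothetical: it takes any function that accepts `limit` and `offset` and returns a dictionary with an `'items'` list, which is the shape of the responses used in this notebook. How many items Spotify actually returns beyond the first page is not guaranteed, so the loop stops as soon as a page comes back short.

```python
def fetch_all_items(fetch_page, page_size=50, max_items=200):
    """Collect items page by page until a short page or max_items is reached."""
    items = []
    offset = 0
    while len(items) < max_items:
        page = fetch_page(limit=page_size, offset=offset)['items']
        items.extend(page)
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size
    return items[:max_items]


# Hypothetical usage with the `sp` client from this notebook:
# all_tracks = fetch_all_items(
#     lambda limit, offset: sp.current_user_top_tracks(
#         time_range='long_term', limit=limit, offset=offset))
```

Passing the request as a function keeps the paging logic independent of the endpoint, so the same helper could be used for top artists as well.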

Let's begin with just storing everything we can get.

In [106]:
ranges = ['short_term', 'medium_term', 'long_term']
limit = 50 # set to max limit 50
top_artists = dict()
top_tracks = dict()
for time_range in ranges:
    top_artists[time_range] = sp.current_user_top_artists(time_range=time_range, limit=limit)['items']
    top_tracks[time_range] = sp.current_user_top_tracks(time_range=time_range, limit=limit)['items']
    print('range:', time_range)
    print('artists:', [artist['name'] for artist in top_artists[time_range]])
    print('tracks:')
    # show some tracks
    for track in top_tracks[time_range][:5]:
        print(track['artists'][0]['name'] + ' - ' + track['name'])
        if track['preview_url']:
            # we append .mp3 here because the Audio module can't get mimetype='audio/mpeg' properly from the url
            audio = Audio(track['preview_url']+'.mp3')
            display(audio)


range: short_term
artists: ['Kovacs', "Blackmore's Night", 'Chico Trujillo', 'OMNIA', 'ADHD', 'Zaz', 'Twisted Jukebox', 'Jenia Lubich', 'John Williams', 'АЛЁNA', 'Bob Dylan', 'FolkBeat', 'Martin Stegner', 'David Garrett', 'Avicii', 'The Hipster Orchestra', 'Lindsey Stirling', 'Caro Emerald', 'Simply Three', 'HAEVN', 'Океан Ельзи', 'Josh Vietti', 'Edvin Marton', 'ERA', 'Armin van Buuren', 'David Guetta', 'Loreena McKennitt', 'Anton & Sully', 'Waldeck', 'Imany', 'Garmarna', 'GoGo Penguin', 'Sunrise Avenue', 'Мельница', 'Би-2', 'Jeangu Macrooy', 'Ronan Hardiman', 'No Blues', 'EDX', 'Beltaine', 'Alex Christensen', 'Axwell /\\ Ingrosso', 'Philip Koutev Ensemble']
tracks:
Blackmore's Night - Home Again


Kovacs - My Love


Jax Jones - Breathe
Kovacs - He Talks That Shit


Blackmore's Night - Crowning of the King


range: medium_term
artists: ['Anton & Sully', 'The Hipster Orchestra', 'OMNIA', 'Kovacs', 'Мельница', "Blackmore's Night", 'Chico Trujillo', 'ADHD', 'Zaz', 'Twisted Jukebox', 'John Williams', 'Zedd', 'Jenia Lubich', 'АЛЁNA', 'Adrian Von Ziegler', 'Bob Dylan', 'FolkBeat', 'Coldplay', 'New York Gypsy All Stars', 'Martin Stegner', 'Netta', 'Ludovico Einaudi', 'Solas', 'Shakira', 'The Velvet Underground', 'Armin van Buuren', 'The Piano Guys', 'David Guetta', 'Avicii', 'David Garrett', 'Helen Jane Long', 'Stephen Walker', 'Irish Songs Music', 'Lindsey Stirling', 'Break of Reality', 'Aitana', 'Roberto Cacciapaglia', 'Daft Punk', 'La Raíz', 'David Bowie', 'Slovak Philharmonic', 'Caro Emerald', 'Simply Three', 'HAEVN', 'Океан Ельзи', 'Stephen Carolan', 'Josh Vietti', 'Martino Vergnaghi', 'Edvin Marton', 'Darren Checkley']
tracks:
Aitana - Lo Malo
Blackmore's Night - Home Again


Beltaine - Burning Pipers Hut


Anton & Sully - Reels, Pt. 1 (113)


Calambres - María del Serrat


range: long_term
artists: ['Adrian Von Ziegler', 'Yuriy Vizbor', 'АЛЁNA', 'Armin van Buuren', 'Ramin Djawadi', 'Мельница', '77 Bombay Street', 'Atli Örvarsson', 'Anton & Sully', 'Океан Ельзи', 'The Beatles', 'Justin Hurwitz', 'Paraskevas Grekis', 'Bruno Coulais', 'Daft Punk', 'Emancipator', 'The Velvet Underground', 'David Garrett', 'Helen Jane Long', 'The Hipster Orchestra', 'Рекорд Оркестр', 'Imagine Dragons', 'Patrick Fiori', 'FolkBeat', 'David Guetta', 'OMNIA', 'Garou', 'Kovacs', "Blackmore's Night", 'Adele', 'Zedd', 'Coldplay', 'Bruno Pelletier', 'Keiko Matsui', 'Daniel Lavoie', 'Sting', 'Within Temptation', 'Chico Trujillo', 'Виктор Цой', 'Кино', 'Celtic Harp Soundscapes', 'Warsaw National Philharmonic Orchestra', 'ADHD', 'Data Romance', 'LIlia Vera', "The O'Neill Brothers Group", 'The Young Wolfe Tones', 'Ryan Gosling', 'Hélène Ségara', 'Armando Manzanero']
tracks:
Муся Тотибадзе - Баллада о детях Большой Медведицы (Из к/ф "Территория")


Julie Fowlis - Touch the Sky
Carlos Vives - La Bicicleta


LIlia Vera - Pueblos Tristes


Beyoncé - I Was Here


**Important:** Before going to the next step, **try to explore the obtained data by yourself and see what info you can get from there.**

For example, look at the first track and the first artist of the last time range:

In [108]:
top_tracks[time_range][0]

{'album': {'album_type': 'SINGLE',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1BwdRE3agFM87VCeaOFPYY'},
    'href': 'https://api.spotify.com/v1/artists/1BwdRE3agFM87VCeaOFPYY',
    'id': '1BwdRE3agFM87VCeaOFPYY',
    'name': 'Муся Тотибадзе',
    'type': 'artist',
    'uri': 'spotify:artist:1BwdRE3agFM87VCeaOFPYY'}],
  'external_urls': {'spotify': 'https://open.spotify.com/album/7HFWGPmwdaJ8R8YyVzTtkF'},
  'href': 'https://api.spotify.com/v1/albums/7HFWGPmwdaJ8R8YyVzTtkF',
  'id': '7HFWGPmwdaJ8R8YyVzTtkF',
  'images': [{'height': 640,
    'url': 'https://i.scdn.co/image/27b2ca6e49f99ae8106068bd5d5ab9f7f16ae34f',
    'width': 640},
   {'height': 300,
    'url': 'https://i.scdn.co/image/fc06cafba804f896e36ec1799404d16ad7037677',
    'width': 300},
   {'height': 64,
    'url': 'https://i.scdn.co/image/1f130b1860fb1d5527714bd166207642ded855f2',
    'width': 64}],
  'name': 'Баллада о детях Большой Медведицы',
  'release_date': '2015-12-18',
  'release_date

In [109]:
top_artists[time_range][0]

{'external_urls': {'spotify': 'https://open.spotify.com/artist/6qDQ6rJG5TOT7PppxBKhqI'},
 'followers': {'href': None, 'total': 54075},
 'genres': ['medieval folk'],
 'href': 'https://api.spotify.com/v1/artists/6qDQ6rJG5TOT7PppxBKhqI',
 'id': '6qDQ6rJG5TOT7PppxBKhqI',
 'images': [{'height': 640,
   'url': 'https://i.scdn.co/image/ced943c3896d876067300887b58c51352b530618',
   'width': 640},
  {'height': 300,
   'url': 'https://i.scdn.co/image/65244e5c95896ae21b0d4c125066227a52c13c5d',
   'width': 300},
  {'height': 64,
   'url': 'https://i.scdn.co/image/dbab4e63dc4e1553e79ddf8ba589c120c41902eb',
   'width': 64}],
 'name': 'Adrian Von Ziegler',
 'popularity': 50,
 'type': 'artist',
 'uri': 'spotify:artist:6qDQ6rJG5TOT7PppxBKhqI'}

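Finally, let's do some analysis and find out which genres you listen to, and how many of your top tracks are local or popular.

The block below counts the genres of your top artists, collects the artist and track popularity scores, and counts the local tracks for each time range.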

In [112]:
genres = dict()
artist_popularities = dict()
track_popularities = dict()
local_tracks = Counter()
for time_range in ranges:
    genres[time_range] = Counter()
    artist_popularities[time_range] = list()
    for artist in top_artists[time_range]:
        for genre in artist['genres']:
            genres[time_range][genre] += 1
        artist_popularities[time_range].append(artist['popularity'])
    
    track_popularities[time_range] = list()
    for track in top_tracks[time_range]:
        track_popularities[time_range].append(track['popularity'])
        if track['is_local']:
            local_tracks[time_range] += 1
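Raw genre counts are hard to compare when the time ranges contain different numbers of artists. One option, sketched below, is to convert the counts into proportions per time range. The function `genre_shares` and the sample data are made up for illustration; with your own data you would pass the `genres` dictionary built above.

```python
from collections import Counter


def genre_shares(genre_counts):
    """Turn {time_range: Counter} into {time_range: {genre: proportion}}."""
    shares = {}
    for time_range, counts in genre_counts.items():
        total = sum(counts.values())
        if total == 0:
            shares[time_range] = {}  # no genre tags for this range
        else:
            shares[time_range] = {genre: n / total for genre, n in counts.items()}
    return shares


# Made-up sample data with the same shape as `genres`.
sample = {
    'short_term': Counter({'folk': 3, 'pop': 1}),
    'long_term': Counter({'folk': 1, 'rock': 1}),
}
for time_range, share in genre_shares(sample).items():
    print(time_range, share)
```

Note that one artist can carry several genre tags, so the proportions describe genre tags rather than artists.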

We can now visualise the top genres for each time range

In [132]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)

# set max number of genres to display
top_genres_limit = 15
data = list()
for time_range in ranges:
    data.append(go.Bar(x=[item[1] for item in genres[time_range].most_common(top_genres_limit)],
                   y=[item[0] for item in genres[time_range].most_common(top_genres_limit)],
                   orientation = 'h',
                   name=time_range))

layout = go.Layout(
    barmode='group',
    yaxis=dict(
        title='Genre Name',
        showticklabels=True
    ),
    margin = dict(
        r = 10,
        t = 25,
        b = 40,
        l = 150
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='genres-grouped-bar')

The popularity distribution for tracks:

In [196]:
import plotly.figure_factory as ff

# Add histogram data
hist_data = [value for value in track_popularities.values()]
group_labels = [key for key in track_popularities.keys()]

fig = ff.create_distplot(hist_data, group_labels, bin_size=1)
fig['layout']['xaxis1'].update(title='Track popularity index')
fig['layout']['yaxis1'].update(title='Track proportion')
fig['layout'].update(title='Track popularities distribution')

iplot(fig, filename='track-popularities-distribution')
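To put a number on how much popular music you listen to, you could also summarise the popularity scores instead of only plotting them. The sketch below computes the mean, the median, and the share of items at or above a threshold for each time range. The function name and the threshold of 70 are arbitrary assumptions; Spotify's popularity index ranges from 0 to 100, and where "popular" starts is up to you.

```python
from statistics import mean, median


def summarise_popularity(popularities, threshold=70):
    """Summarise {time_range: [popularity, ...]} per time range."""
    summary = {}
    for time_range, values in popularities.items():
        if not values:
            summary[time_range] = None  # nothing to summarise
            continue
        popular = sum(1 for value in values if value >= threshold)
        summary[time_range] = {
            'mean': mean(values),
            'median': median(values),
            'popular_share': popular / len(values),
        }
    return summary


# Made-up sample data with the same shape as `track_popularities`.
sample_popularities = {'short_term': [80, 60, 70, 50], 'long_term': []}
print(summarise_popularity(sample_popularities))
```

With your own data you would call `summarise_popularity(track_popularities)` or `summarise_popularity(artist_popularities)`.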

Based on the previous examples, make a histogram of artist popularities. Complete the lines marked `# MISSING CODE HERE`.

In [None]:
# Add histogram data
hist_data = [] # MISSING CODE HERE
group_labels = [] # MISSING CODE HERE

fig = ff.create_distplot(hist_data, group_labels, bin_size=1)
fig['layout']['xaxis1'].update(title='Artist popularity index')
fig['layout']['yaxis1'].update(title='Artist proportion')
fig['layout'].update(title='Artist popularities distribution')

iplot(fig, filename='artist-popularities-distribution')

Based on the previous examples, make a pie chart with the proportions of local tracks for each time range:

In [137]:
local_tracks
## Your code here

Counter()

Now let's move to recommending some new songs with Spotify recommendations, audio features and a custom classifier!

### End of Part 1