# Deezer playlist dataset and song recommendation with word2vec

In this mini project we will develop a word2vec network and use it to build a playlist completion tool (song suggestion). The data is hosted on the following repository: http://github.com/comeetie/deezerplay.git. To learn more about word2vec and this dataset, you can read the following two references:

- Efficient estimation of word representations in vector space, Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. (https://arxiv.org/abs/1301.3781)
- Word2vec applied to Recommendation: Hyperparameters Matter, H. Caselles-Dupré, F. Lesaint and J. Royo-Letelier. (https://arxiv.org/pdf/1804.04212.pdf)

The tasks you have to complete are highlighted in red.

## Preparation of data

The data is a list of playlists. Each playlist is a list in which the Deezer ID of each song is followed by the ID of its artist.

In [None]:
import numpy as np
data = np.load("data/music_2.npy", allow_pickle=True)

# info visualization
print("number of playlists: " + str(len(data)))
print("average number of songs in a playlist: " + str(np.mean([len(p) for p in data])))
print("\nfirst row of dataset:\n" + str(data[0]))

number of playlists: 100000
average number of songs in a playlist: 24.21338

first row of dataset:
['track_3248376', 'artist_4660', 'track_68116150', 'artist_210', 'track_68116150', 'artist_210', 'track_3169189', 'artist_7188', 'track_6523608', 'artist_2961', 'track_407020492', 'artist_2961', 'track_6523613', 'artist_2961', 'track_348627211', 'artist_396485', 'track_348627221', 'artist_396485', 'track_348627231', 'artist_396485', 'track_348627241', 'artist_396485', 'track_348627251', 'artist_2961', 'track_348627261', 'artist_2961', 'track_348627271', 'artist_2961', 'track_348627281', 'artist_2961', 'track_348627291', 'artist_2961']


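The dataset we are going to work on contains 100000 playlists whose lists hold about 24.2 entries on average. Since each song contributes a track ID and an artist ID, this corresponds to roughly 12 songs per playlist. We will start by keeping only the song identifiers.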

In [None]:
playlist_track = [list(filter(lambda w: w.split("_")[0] == u"track", playlist)) for playlist in data]
playlist_artist = [list(filter(lambda w: w.split("_")[0] == u"artist", playlist)) for playlist in data]

# info visualization
print("first row of playlist_track dataset:\n" + str(playlist_track[0]))
print("\nfirst row of playlist_artist dataset:\n" + str(playlist_artist[0]))

first row of playlist_track dataset:
['track_3248376', 'track_68116150', 'track_68116150', 'track_3169189', 'track_6523608', 'track_407020492', 'track_6523613', 'track_348627211', 'track_348627221', 'track_348627231', 'track_348627241', 'track_348627251', 'track_348627261', 'track_348627271', 'track_348627281', 'track_348627291']

first row of playlist_artist dataset:
['artist_4660', 'artist_210', 'artist_210', 'artist_7188', 'artist_2961', 'artist_2961', 'artist_2961', 'artist_396485', 'artist_396485', 'artist_396485', 'artist_396485', 'artist_2961', 'artist_2961', 'artist_2961', 'artist_2961', 'artist_2961']
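
Because track and artist identifiers alternate, the two lists can also be recovered by slicing, which keeps each song aligned with its artist. The following is a minimal sketch on a hypothetical three-song playlist (the `toy_` names are illustrative and not part of the dataset), assuming the strict track/artist alternation always holds:

```python
# Hypothetical playlist following the track/artist alternation described above
toy_playlist = ["track_1", "artist_10", "track_2", "artist_10", "track_3", "artist_20"]

# Even positions hold track IDs, odd positions hold artist IDs
toy_tracks = toy_playlist[0::2]
toy_artists = toy_playlist[1::2]

# Pair each track with its artist
toy_pairs = list(zip(toy_tracks, toy_artists))
print(toy_pairs)
```

The prefix-based filter used above is more robust if a playlist ever breaks the alternation; slicing is simply shorter when the format is guaranteed.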


In [None]:
# songs != playlists
tracks = np.unique(np.concatenate(playlist_track))
Vt = len(tracks)

# info visualization
print("number of tracks: " + str(Vt))

number of tracks: 338509


The number of different songs in this dataset is quite high: more than 300,000.

## Creating a song dictionary
We will assign to each song an integer that will serve as a unique identifier and as input for our network. In order to save some resources, we will only keep songs that occur at least twice in the dataset.

In [None]:
# counting occurrences for each track
track_counts = dict((tracks[i], 0) for i in range(0, Vt))
for p in playlist_track:
    for track in p:
        track_counts[track] = track_counts[track] + 1

# info visualization
print("first 10 rows of track_counts dictionary:")
for key in list(track_counts.keys())[:10]:
    print(str(key) + ": " + str(track_counts[key]))

first 10 rows of track_counts dictionary:
track_100001352: 1
track_100001490: 1
track_100001878: 1
track_100001884: 12
track_100004586: 16
track_100004588: 14
track_100004590: 219
track_100004592: 13
track_100004594: 13
track_100004596: 10


In [None]:
# Filter very rare songs to save resources
playlist_track_filter = [list(filter(lambda track : track_counts[track] > 1, playlist)) for playlist in playlist_track]
# get the counts
counts = np.array(list(track_counts.values()))
# sort
order = np.argsort(-counts)
# deezer_id array
tracks_list_ordered = np.array(list(track_counts.keys()))[order]
# Vocabulary size = number of kept songs
Vt = np.where(counts[order] == 1)[0][0]         # or Vt = sum([1 for count in counts if count != 1])
# dict construction: track_id -> integer id in [0, Vt)
track_dict = dict((tracks_list_ordered[i], i) for i in range(0, Vt))
# playlist conversion to list of integers
corpus_num_track = [[track_dict[track] for track in play] for play in playlist_track_filter]


# info visualization
print("first row of playlist_track_filter dataset:\n" + str(playlist_track_filter[0]))
print("\ncounts: " + str(counts))
print("indexes which sort the counts array in descending order: " + str(order))
print("\nsorted track_list:\n" + str(tracks_list_ordered))
print("\nnumber of tracks that appear at least two times: " + str(Vt))
print("\nfirst 10 rows of track dictionary:")
for key in list(track_dict.keys())[:10]:
    print(str(key) + ": " + str(track_dict[key]))
print("\nfirst row of corpus_num_track dataset: " + str(corpus_num_track[0]))

first row of playlist_track_filter dataset:
['track_3248376', 'track_68116150', 'track_68116150', 'track_3169189', 'track_6523608', 'track_407020492', 'track_6523613']

counts: [1 1 1 ... 1 1 1]
indexes which sort the counts array in descending order: [193287 194559 106371 ... 135378 135361 338508]

sorted track_list:
['track_380684541' 'track_382428781' 'track_139470659' ...
 'track_16514878' 'track_16504678' 'track_999941']

number of tracks that appear at least two times: 123241

first 10 rows of track dictionary:
track_380684541: 0
track_382428781: 1
track_139470659: 2
track_402932972: 3
track_375689861: 4
track_398649632: 5
track_403074632: 6
track_402932922: 7
track_362795841: 8
track_375437431: 9

first row of corpus_num_track dataset: [17104, 13945, 13945, 13845, 19340, 79029, 23294]
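
The counting, filtering and numbering steps above can also be sketched with `collections.Counter`. This is an illustrative alternative on a hypothetical toy corpus (all `toy_` names and track IDs are made up), not the code used to produce the outputs shown in this notebook:

```python
from collections import Counter

# Hypothetical toy corpus of playlists (track IDs only)
toy_playlists = [["t_a", "t_b", "t_c"], ["t_b", "t_c"], ["t_c", "t_d"]]

# Count every occurrence of every track across all playlists
toy_counts = Counter(track for playlist in toy_playlists for track in playlist)

# Keep tracks seen more than once, most frequent first, and number them from 0
kept = [track for track, count in toy_counts.most_common() if count > 1]
toy_dict = {track: index for index, track in enumerate(kept)}

# Convert playlists to integer sequences, dropping the rare tracks
toy_corpus = [[toy_dict[t] for t in playlist if t in toy_dict] for playlist in toy_playlists]
print(toy_dict)
print(toy_corpus)
```

As in the cell above, the most frequent track receives the identifier 0 and tracks seen only once disappear from the converted playlists.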


### Creation of the training, validation and test sets

To learn the parameters of our method we will keep the first l-1 songs of each playlist (with l the length of the playlist) for training. To evaluate the completion performance of our method we keep the last two songs of each playlist. The objective will be to find the last song given the next-to-last one.



In [None]:
# playlist main part used for training
play_app  = [corpus_num_track[i][:(len(corpus_num_track[i])-1)] for i in range(len(corpus_num_track)) if len(corpus_num_track[i]) > 1]      # or play_app = [playlist[:len(playlist) - 1] for playlist in corpus_num_track if len(playlist) > 1]

# the last two elements are used for validation and testing
# replace=False draws 20000 distinct indices (the default samples with replacement)
index_tst = np.random.choice(100000, 20000, replace=False)
index_val = np.setdiff1d(range(100000), index_tst)

play_tst  = np.array([corpus_num_track[i][(len(corpus_num_track[i]) - 2):len(corpus_num_track[i])] 
             for i in index_tst if len(corpus_num_track[i]) > 3])
play_val  = np.array([corpus_num_track[i][(len(corpus_num_track[i]) - 2):len(corpus_num_track[i])] 
             for i in index_val if len(corpus_num_track[i]) > 3])[:10000]


# info visualization
print("first row of play_app dataset: " + str(play_app[0]))
print("20000 test indexes less than 100000: " + str(index_tst))
print("int not contained in index_tst: " + str(index_val))
print("\nfirst 3 test pairs:")
for pair in play_tst[:3]:
    print(pair)
print("\nfirst 3 validation pairs:")
for pair in play_val[:3]:
    print(pair)

first row of play_app dataset: [17104, 13945, 13945, 13845, 19340, 79029]
20000 test indexes less than 100000: [13986 34801 62511 ... 26738 31848 84564]
int not contained in index_tst: [    0     1     2 ... 99996 99997 99999]

first 3 test pairs:
[ 163 1363]
[41116 92988]
[17166 21675]

first 3 validation pairs:
[79029 23294]
[ 532 2016]
[50537 33361]
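
The split logic can be illustrated on a small scale. The sketch below uses only the standard library and hypothetical sizes (10 playlists, 3 of them for testing); it shows one way to obtain disjoint index sets and how an evaluation pair is read from the end of a playlist:

```python
import random

# Hypothetical setup: 10 playlists identified by their position
n_playlists = 10
n_test = 3

# Shuffle the indices once, then cut the shuffled list: the two parts cannot overlap
rng = random.Random(0)
indices = list(range(n_playlists))
rng.shuffle(indices)
toy_index_tst = sorted(indices[:n_test])
toy_index_val = sorted(indices[n_test:])

# Evaluation pair for a playlist: (next-to-last song, last song)
toy_playlist = [4, 8, 15, 16, 23]
seed_song, target_song = toy_playlist[-2:]
print(toy_index_tst, toy_index_val, seed_song, target_song)
```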


In [None]:
# import Keras
from keras.models import Sequential, Model
from keras.layers import Embedding, Reshape, Activation, Input, Dense, Flatten
from keras.layers import Dot
from keras.preprocessing.sequence import skipgrams

### Hyper-parameters of word2vec

The word2vec method needs some hyper-parameters. We give them initial values here and will refine them later:


In [None]:
# latent space dimension
vector_dim = 30
# window size
window_width = 3
# number of negative samples per positive sample
neg_sample = 5
# size of mini-batch (number of playlists drawn per batch)
min_batch_size = 50
# smoothing factor for the sampling table of negative pairs 
samp_coef = 0.5
# parameter to sub-sample frequent songs
sub_samp = 0.00001

### Creation of the sampling probability tables (smoothed and unsmoothed)

To draw the negative examples we need the smoothed frequencies of each song in our dataset. Likewise, to sub-sample very frequent songs we need the raw frequencies. We will compute these two vectors.

In [None]:
# get the counts
counts = np.array(list(track_counts.values()), dtype='float')[order[:Vt]]
# normalization
st = counts/np.sum(counts)
# smoothing
st_smooth = np.power(st, samp_coef)
st_smooth = st_smooth / np.sum(st_smooth)

# info visualization
print("counts of tracks which appear at least two times: " + str(counts))
print("\nfrequencies of tracks which appear at least two times:\n" + str(st))
print("\nsmoothed frequencies of tracks which appear at least two times:\n" + str(st_smooth))


counts of tracks which appear at least two times: [1898. 1805. 1673. ...    2.    2.    2.]

frequencies of tracks which appear at least two times:
[1.90676923e-03 1.81333955e-03 1.68072968e-03 ... 2.00924050e-06
 2.00924050e-06 2.00924050e-06]

smoothed frequencies of tracks which appear at least two times:
[1.56339799e-04 1.52461451e-04 1.46780856e-04 ... 5.07500463e-06
 5.07500463e-06 5.07500463e-06]
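
To see what the smoothing does, here is a small worked example on hypothetical counts (the `toy_` names and values are illustrative). It applies the same two steps as the cell above, normalization followed by a power and a renormalization, using plain Python lists:

```python
# Hypothetical occurrence counts for four songs
toy_counts = [100.0, 25.0, 4.0, 1.0]
toy_samp_coef = 0.5  # plays the same role as samp_coef above

# Raw frequencies
total = sum(toy_counts)
toy_st = [c / total for c in toy_counts]

# Smoothed frequencies: raise to the power samp_coef, then renormalize
powered = [f ** toy_samp_coef for f in toy_st]
toy_st_smooth = [p / sum(powered) for p in powered]

print([round(f, 3) for f in toy_st])
print([round(f, 3) for f in toy_st_smooth])
```

With an exponent of 0.5, the ratio between the most and the least frequent song drops from 100 to 10, so rare songs are drawn as negatives more often than their raw frequency would suggest.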


### Building the word2vec network

A word2vec network takes as input two integers corresponding to two songs. These are embedded in a latent space of dimension vector_dim by an embedding layer (you will have to use the same layer to project both songs). Once these two vectors have been extracted, the network must compute their normalized scalar product, called the cosine similarity:

$$\cos(\theta_{ij})=\frac{z_i \cdot z_j}{\|z_i\|\,\|z_j\|}$$

To carry out this computation you will use a "dot" layer (for "dot product"). The model then uses a sigmoid layer to produce the output. The target output is 0 when both songs are randomly drawn from the whole dataset and 1 when they were extracted from the same playlist. <span style="color:red">You have to create the Keras Track2Vec model corresponding to this architecture.</span>


In [None]:
# Two inputs: the target song and a real or negative context song
input_target = Input((1,), dtype='int32')
input_context = Input((1,), dtype='int32')

# The embedding layer compresses large one-hot song vectors into much smaller vectors
# that preserve some of the meaning and context of each song
embedding = Embedding(Vt, vector_dim, input_length=1, name='embedding')
target = embedding(input_target)
context = embedding(input_context)

# Dot product of the two embeddings; Dot(axes=2, normalize=True) would give the cosine similarity described above
dot_product = Dot(axes=2)([target, context])
flatten = Flatten()(dot_product)

# Feed the similarity to a sigmoid layer to obtain a value between 0 and 1 that we can match with the
# label given to the context song (1 for a true context song, 0 for a negative sample).
output = Dense(1, activation='sigmoid',name="classif")(flatten)

Track2Vec = Model(inputs=[input_target, input_context], outputs=output)
Track2Vec.compile(loss='binary_crossentropy', optimizer='adam', metrics=["accuracy"])

In [None]:
Track2Vec.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 1)]          0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 1, 30)        3697230     input_1[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
dot (Dot)                       (None, 1, 1)         0           embedding[0][0]              
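
The computation performed by this architecture on a single pair can be written out by hand. The sketch below is a plain-Python illustration with hypothetical 2-dimensional embeddings; `weight` and `bias` are placeholders standing in for the trained parameters of the final one-unit dense layer, and none of these names exist in the Keras model:

```python
import math

# Hypothetical 2-dimensional embeddings for three songs (one row per song)
toy_embedding = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0]]

def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Scalar product divided by the product of the norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_score(target, context, weight=1.0, bias=0.0):
    """Look up both songs in the shared embedding, compare them, squash to (0, 1)."""
    similarity = cosine(toy_embedding[target], toy_embedding[context])
    return sigmoid(weight * similarity + bias)

print(pair_score(0, 1), pair_score(0, 2))
```

Songs 0 and 1 point in similar directions and get a score above 0.5, while songs 0 and 2 point in opposite directions and get a score below 0.5.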

### Creation of the data generator

To learn the embedding layer at the heart of our model we will build a generator of positive and negative example pairs, made of close or random songs from our training data. The following function generates such examples from a playlist (seq) provided as input. It first builds all the pairs of songs of the playlist that are within (window) positions of each other. These pairs constitute the positive pairs. Pairs involving very frequent songs are removed with a probability that depends on their frequency. Finally, negative examples (neg_samples times the number of remaining positive pairs) are randomly drawn using the neg_sampling_table.
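
The extraction of positive pairs can be pictured with a few lines of plain Python. This is only an illustration of window-based pairing under the description above; `window_pairs` is a hypothetical helper, not the Keras `skipgrams` implementation, whose exact conventions may differ:

```python
def window_pairs(seq, window):
    """Return all ordered (target, context) pairs at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(seq):
        start = max(0, i - window)
        stop = min(len(seq), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, seq[j]))
    return pairs

# Hypothetical playlist of four song indices
toy_pairs = window_pairs([10, 11, 12, 13], window=1)
print(toy_pairs)
```

With a window of 1 each song is paired only with its immediate neighbors, so a playlist of four songs yields six ordered pairs.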

In [None]:
# function to generate word2vec positive and negative pairs 
# from an array of int that represents a text or, here, a playlist
# params 
# seq : input text or playlist (array of int)
# window : maximum distance between the two songs of a positive pair
# neg_samples : number of negative samples to generate per positive one
# neg_sampling_table : sampling table for negative samples
# sub_sampling_table : sampling table for sub-sampling common words (songs)
# sub_t : sub-sampling parameter
def word2vecSampling(seq, window, neg_samples, neg_sampling_table, sub_sampling_table, sub_t):
    # vocab size
    V = len(neg_sampling_table)
    # extract positive pairs 
    positives = skipgrams(sequence=seq, vocabulary_size=V, window_size=window, negative_samples=0)
    ppairs    = np.array(positives[0])
    # sub sampling
    if (ppairs.shape[0]>0):
        f = sub_sampling_table[ppairs[:,0]]
        subprob = ((f-sub_t)/f)-np.sqrt(sub_t/f)
        tokeep = (subprob<np.random.uniform(size=subprob.shape[0])) | (subprob<0)
        ppairs = ppairs[tokeep,:]
    nbneg     = ppairs.shape[0]*neg_samples
    # sample negative pairs
    if (nbneg > 0):
        negex     = np.random.choice(V, nbneg, p=neg_sampling_table)
        negexcontext = np.repeat(ppairs[:,0],neg_samples)
        npairs    = np.transpose(np.stack([negexcontext,negex]))
        pairs     = np.concatenate([ppairs,npairs],axis=0)
        labels    = np.concatenate([np.repeat(1,ppairs.shape[0]),np.repeat(0,nbneg)])
        perm      = np.random.permutation(len(labels))
        res = [pairs[perm,:],labels[perm]]
    else:
        res=[[],[]]
    return res
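
The sub-sampling step of `word2vecSampling` drops a positive pair with a probability that grows with the frequency of its first song. The sketch below isolates that formula in a hypothetical helper, `discard_probability`, and evaluates it for a few illustrative frequencies:

```python
import math

def discard_probability(freq, sub_t):
    """Probability of dropping a positive pair whose first song has frequency `freq`."""
    prob = (freq - sub_t) / freq - math.sqrt(sub_t / freq)
    # Negative values mean the pair is always kept
    return max(0.0, prob)

toy_sub_t = 1e-5  # same value as sub_samp above
for freq in (1e-3, 1e-4, 1e-5, 2e-6):
    print(freq, round(discard_probability(freq, toy_sub_t), 3))
```

Songs whose frequency is at or below the threshold `sub_t` are never dropped, while pairs built on the most frequent songs are removed most of the time.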

<span style="color:red">Use this function to build a "track_ns_generator" of data which will generate positive and negative examples from "nbm" playlists randomly drawn from the "corpus_num" dataset provided as input. </span> 

In [None]:
import random

def track_ns_generator(corpus_num, nbm):
    while 1:
        # Extraction of nbm playlist from corpus_num
        playlists = [corpus_num[random.randint(0, len(corpus_num) - 1)] for _ in range(nbm)]
        # Creation of x and y 
        x = np.ndarray((0, 2), dtype=np.int32)
        y = np.ndarray((0), dtype=np.int32)
        for playlist in playlists:
            # For each playlist we use the word2vecSampling function to get the pairs of songs (sx),
            # real neighbors or negative samples, and their labels (sy): 1 for real pairs, 0 for negative samples
            sx, sy = word2vecSampling(playlist, window_width, neg_sample, st_smooth, st, sub_samp)

            # Check if sx is not empty
            if len(sx) > 0:
                x = np.vstack((x, sx))
                y = np.append(y, sy)
        
        # Generate the results
        yield ((x[:, 0], x[:, 1]), y)

## Learning 
You should now be able to learn your first model with the following code. This should take between 15 and 30 min.

In [None]:
hist = Track2Vec.fit(track_ns_generator(play_app, min_batch_size), steps_per_epoch=200, epochs=60)

Epoch 1/60
...
Epoch 60/60


## Save latent space
Once the learning is done, we can save the position of the songs in the latent space with the following code:

In [None]:
# retrieve the tracks positions in the projection space
vectors_tracks = Track2Vec.get_weights()[0]
with open("result/latent_positions.npy", "wb") as f:
    np.save(f, vectors_tracks)

# info visualization
print("first track position (30 dimensions):\n" + str(vectors_tracks[0]))



first track position (30 dimensions):
[ 0.02344619  0.03580823  0.02686919 -0.01389409  0.02441807  0.01420432
 -0.01518399 -0.0377614   0.01022662 -0.04573558 -0.00276524  0.02158012
 -0.00873798 -0.00718191 -0.00675355  0.02202994  0.01106698 -0.0160887
 -0.00503298 -0.02465426 -0.01198023  0.03262806  0.00116764 -0.01458681
  0.00299946 -0.02654702 -0.02195956  0.05242296 -0.01643946 -0.01570888]


And later load it with:

In [None]:
vectors_tracks = np.load("result/latent_positions.npy")

## Use in completion and evaluation
We can now use this space to make suggestions. <span style="color:red">Build a predict_batch function that takes as input a vector of song indices (seeds), the number of suggestions to make per request (s), the vectors of the songs in the latent space (X) and a kd-tree to speed up the computation of nearest neighbors. To make its suggestions this function will return the indices of the s closest neighbors of each seed.</span> So that these predictions don't take too much time you will use a kd-tree (available in scikit-learn) to speed up the search for nearest neighbors.

In [None]:
from sklearn.neighbors import KDTree
kdt = KDTree(vectors_tracks, leaf_size=10, metric='euclidean')

In [None]:
# The predict_batch function returns the indexes of the k-nearest neighbors of a vector of songs (seeds)
# using a KD-tree to speed up the process. In particular we use the query function.
def predict_batch(seeds, k, X, kdt):
    return kdt.query(X=X[seeds], k=k, return_distance=False)
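
On a handful of points, the result of a nearest-neighbor query can be checked against an exhaustive search. The sketch below is a brute-force reference in plain Python with hypothetical 2-dimensional positions; `brute_force_neighbors` is an illustrative helper that mirrors the arguments of `predict_batch` without the kd-tree:

```python
import math

def brute_force_neighbors(seeds, k, X):
    """For each seed index, return the indices of its k nearest rows of X (Euclidean)."""
    results = []
    for seed in seeds:
        distances = [(math.dist(X[seed], row), index) for index, row in enumerate(X)]
        distances.sort()
        results.append([index for _, index in distances[:k]])
    return results

# Hypothetical 2-dimensional latent positions for four songs
toy_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
toy_neighbors = brute_force_neighbors([0, 3], 2, toy_X)
print(toy_neighbors)
```

Each seed is its own nearest neighbor because its distance to itself is zero, so only k-1 of the returned indices are genuinely new suggestions. The exhaustive search costs one distance computation per song and per seed, which is what the kd-tree avoids on the full vocabulary.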

<span style="color:red">Use this function to propose songs to complete the playlist of the validation dataset (the seeds correspond to the first column of play_val).</span>

In [None]:
indexes = predict_batch(play_val[:,0], 10, vectors_tracks, kdt)

#info visualization
print("10 closest points for " + str(play_val[0, 0]) + ": " + str(indexes[0]))

10 closest points for 79029: [ 79029  94595  59293  94617  28266  47002 106017  94594  69891  89717]


<span style="color:red">Compare these suggestions with the second column of play_val (the songs actually present). To do this you will calculate the hit@10, which is 1 if the song actually present in the playlist is one of the 10 suggestions (this score is averaged over the validation set), and the NDCG@10 (Normalized Discounted Cumulative Gain), which takes into account the order of the suggestions. This second score is worth $1/\log_2(k+1)$ if suggestion k (k between 1 and 10) is the correct one and 0 if no suggestion is correct. As before you will calculate the average score on the validation set. </span>


In [None]:
import math
n = len(play_val)
# This function computes the NDCG@K score: it measures how correct the predictions are,
# taking into account the rank of the correct prediction
def compute_NDCGatK(indexes):
    NDCGatK = 0
    for i in range(n):
        if play_val[i, 1] in indexes[i]:
            # We take the index of the prediction
            index = list(indexes[i]).index(play_val[i, 1]) + 1

            #We use this index to calculate the score
            NDCGatK += 1 / math.log(index + 1, 2)
    # Average        
    NDCGatK /= n
    return NDCGatK

        
print("NDCG@10: " + str(compute_NDCGatK(indexes)))

NDCG@10: 0.09613142578539993


In [None]:
# This function computes the HitatK score, it indicates how correct the predictions are, without giving 
# importance to the location of the prediction
def compute_HitatK(indexes):
    # We count all the times that the song in the second column of play_val is in our predictions list,
    # then we calculate the average
    HitatK = sum([1 for i in range(n) if play_val[i, 1] in indexes[i]]) / n
    return HitatK

#info visualization
print("Hit@10: " + str(compute_HitatK(indexes)))

Hit@10: 0.1583
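
Both scores are easy to verify by hand on a tiny example. The sketch below re-implements them as standalone functions on hypothetical data (three queries with three suggestions each); the function and variable names are illustrative and independent of `play_val`:

```python
import math

def hit_at_k(suggestions, targets):
    """Fraction of rows whose target appears among the suggestions."""
    hits = sum(1 for row, target in zip(suggestions, targets) if target in row)
    return hits / len(targets)

def ndcg_at_k(suggestions, targets):
    """Average of 1 / log2(rank + 1), with rank starting at 1 and 0 for a miss."""
    total = 0.0
    for row, target in zip(suggestions, targets):
        if target in row:
            rank = list(row).index(target) + 1
            total += 1.0 / math.log2(rank + 1)
    return total / len(targets)

# Hypothetical suggestions and true next songs
toy_suggestions = [[7, 3, 9], [4, 5, 6], [1, 2, 8]]
toy_targets = [7, 6, 0]
print(hit_at_k(toy_suggestions, toy_targets), ndcg_at_k(toy_suggestions, toy_targets))
```

Two of the three targets are found, giving a hit rate of 2/3. The first is found at rank 1 (score 1) and the second at rank 3 (score 1/2), so the NDCG averages to 0.5 once the miss is counted as 0.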


## Hyper-parameter tuning

<span style="color:red">You can now try to vary the hyper parameters to improve your performance. Pay attention to the computing time : prepare a grid with about ten different configurations and evaluate each of them on your validation set.
Evaluate the final performance of the best configuration found on the test set. Don't forget to save your results.</span>
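
One possible way to prepare such a grid is to enumerate the combinations with `itertools.product`. The sketch below is illustrative only: the parameter names mirror the variables defined earlier in this notebook, but the candidate values are hypothetical and not a recommendation:

```python
from itertools import product

# Hypothetical grid; the parameter names mirror the variables defined earlier
toy_grid = {
    "vector_dim": [30, 50],
    "window_width": [3, 5],
    "neg_sample": [5, 10],
}

# One dictionary per configuration
names = list(toy_grid)
configurations = [dict(zip(names, values)) for values in product(*toy_grid.values())]
print(len(configurations))
for config in configurations[:3]:
    print(config)
```

Three parameters with two values each give eight configurations, which stays close to the budget of about ten runs suggested above.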



In [None]:
###### We decided to use the NDCGatK score as the primary metric to decide which is the best model ######

# The dictionary with all the results
results = {}

# The list with the NDCGatK scores
NDCGatK_results = []

# Define the hyper parameters
hyper_parameters = {
    "epochs": range(40, 81, 20),
    "steps_per_epoch": range(200, 301, 100),
    "min_batch_size": range(50, 101, 50)
}

# Try all the combinations of the hyper parameters above
i = 0
for epochs in hyper_parameters["epochs"]:
    for steps in hyper_parameters["steps_per_epoch"]:
        for batch_size in hyper_parameters["min_batch_size"]:
            # Note: the same model is fitted again here, so each configuration starts from the weights of the previous one
            hist = Track2Vec.fit(track_ns_generator(play_app, batch_size), steps_per_epoch=steps, epochs=epochs, verbose=0)
            vectors_tracks = Track2Vec.get_weights()[0]
            # Rebuild the kd-tree so that neighbors are searched among the newly learned vectors
            kdt = KDTree(vectors_tracks, leaf_size=10, metric='euclidean')
            indexes = predict_batch(play_val[:,0], 10, vectors_tracks, kdt)
            NDCGatK = compute_NDCGatK(indexes)
            NDCGatK_results.append(NDCGatK)
            results[i] = {
                "hyper_parameters": {
                    "min_batch_size": batch_size,
                    "steps_per_epoch": steps,
                    "epochs": epochs,
                },
                "metrics": {
                    "accuracy": hist.history["accuracy"][-1],
                    "loss": hist.history["loss"][-1],
                    "NDCG@10": NDCGatK,
                    "Hit@10": compute_HitatK(indexes),
                },
                "vectors_tracks": vectors_tracks,
            }
            i += 1


In [None]:
# The index of the best configuration, the one with maximum NDCGatK
best_index = NDCGatK_results.index(max(NDCGatK_results))

# Save the results
with open('result/best_latent_positions.npy', 'wb') as f:
    np.save(f, results[best_index]["vectors_tracks"])

# Load the results
vectors_tracks = np.load("result/best_latent_positions.npy")

# Create the KDTree
kdt = KDTree(vectors_tracks, leaf_size=10, metric='euclidean')

# info visualization
print("Index of the best configuration: " + str(best_index))

In [None]:
import pandas as pd
import csv

# Write the results in a csv file
with open('result/results.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["id", "min_batch_size", "steps_per_epoch", "epochs", "accuracy", "loss", "NDCG@10", "Hit@10"])
    for id in results:
        writer.writerow([id] + [results[id][super_key][key] 
                        for super_key in results[id] if super_key != "vectors_tracks"
                            for key in results[id][super_key]])

results_df = pd.read_csv("result/results.csv")
results_df.set_index("id", inplace=True)
results_df


## Bonus, a little music

The TracksArtists file contains metadata on the tracks and the artists for a subset of the 300,000 tracks in the dataset. We can use it to look up the index of a song from its title:

In [None]:
tr_meta = pd.read_csv("data/TracksArtists.csv")
joindf = pd.DataFrame({"track_id": tracks_list_ordered[:Vt], "index": range(Vt)})
meta = tr_meta.merge(joindf, left_on="track_id", right_on="track_id")
meta.set_index("index", inplace=True)
meta[["title", "name", "preview", "track_id"]]

In [None]:
def find_track(title):
    return meta.loc[meta["title"]==title,:].index[0]

tr = find_track("Hexagone")

# info visualization
print("index of 'Hexagone' track: " + str(tr))

## Radio

The Deezer API allows you to retrieve information about the tracks of the dataset from their Deezer ID. When available, this information includes a URL to listen to a free sample.

In [None]:
import urllib.request, json 
def gettrackinfo(number):
    track_url =  "https://api.deezer.com/track/{}".format(tracks_list_ordered[number].split("_")[1])
    with urllib.request.urlopen(track_url) as url:
        data = json.loads(url.read().decode())
    return data
track_apidata = gettrackinfo(find_track("Hexagone"))

# info visualization
print("info about 'Hexagone' track:")
for key in track_apidata:
    print("\t" + str(key) + ": " + str(track_apidata[key]))

So we can use it to listen to a preview:

In [None]:
from IPython.display import display, Audio, clear_output
display(Audio(track_apidata["preview"], autoplay=True))

<span style="color:red">Create a radio function that takes as input a track number in the dataset and plays a series of nb_track tracks, randomly picking the next track to listen to in the neighborhood of the current track. The size of the neighborhood will be configurable and you will remove from the suggestions the songs already listened to. You will handle exceptions if the track does not have an available preview. You can clear the display of the current song with the clear_output function.</span>

In [None]:
import time
def start_radio(seed, nb_candidates, duration, nbsteps=20):
    print(meta.loc[seed, "title"])
    display(Audio(meta.loc[seed, "preview"], autoplay=True))
    time.sleep(duration)
    clear_output()
    already_played = [seed]
    listened = [meta.loc[seed, "title"]]
    while nbsteps > 0:
        try:
            # Recommend nb_candidates tracks close to the current seed using the predict_batch function
            recommended_tracks = predict_batch([seed], nb_candidates, vectors_tracks, kdt)[0]
            # Keep only the candidates that have not been played yet
            candidates = [track for track in recommended_tracks if track not in already_played]
            if not candidates:
                print("no unplayed track left in the neighborhood, stopping the radio")
                break
            # Randomly pick the next track among the remaining candidates
            seed = random.choice(candidates)
            # Add the new seed to the list of already played songs
            already_played.append(seed)
            print(meta.loc[seed, "title"])
            display(Audio(meta.loc[seed, "preview"], autoplay=True))
            time.sleep(duration)
            # Add the title of the song just played to the "listened" list
            listened.append(meta.loc[seed, "title"])
            nbsteps -= 1
        except Exception:
            print("track with id " + str(seed) + " not found in meta dataframe")
        clear_output()
    print("Thanks for listening to our radio!\nYou have listened to the following songs:\n\t" + "\n\t".join(listened))

In [None]:
start_radio(find_track("Hexagone"), 5, 5, 10)