Artist Model 

In the following code, we open the file, create a mapping from a song ID to the number of times this song appears, and close the file.

In [92]:
import numpy as np 
import pandas as pd

In [93]:
f = open('/Users/sinasaba/Desktop/MillionSongSubset/AdditionalFiles/kaggle/kaggle_visible_evaluation_triplets.txt', 'r')
song_to_count = dict() 
for line in f: 
    _, song, _ = line.strip().split('\t')
    if song in song_to_count: 
        song_to_count[song] += 1 
    else:
        song_to_count[song] = 1
        
f.close() 
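
The same count can be written more compactly with `collections.Counter` and a `with` block, which closes the file even if parsing fails partway. The sketch below is only an illustration: the helper name `count_songs` and the in-memory sample lines are assumptions, and it expects the same tab-separated user, song, play-count layout as the cell above.

```python
from collections import Counter
import io

def count_songs(lines):
    """Count how many triplets mention each song ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Each line is: user <TAB> song <TAB> play count
        _, song, _ = line.strip().split('\t')
        counts[song] += 1
    return counts

# Made-up sample standing in for the triplets file
sample = io.StringIO("u1\tS1\t3\nu2\tS1\t1\nu2\tS2\t5\n")
with sample as f:
    song_counts = count_songs(f)
```

With a real file, `open(path, 'r')` would take the place of the `io.StringIO` object.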

We then order the songs by decreasing popularity:

In [94]:
songs_orderd = sorted(song_to_count.keys(), 
                     key = lambda s: song_to_count[s], 
                     reverse = True)
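
To see what this sort does with ties, here is a small sketch on made-up counts (the song IDs and numbers are assumptions for illustration). Python's sort is stable, also with `reverse=True`, so songs with equal counts keep the order in which they were first inserted into the dict.

```python
# Made-up counts for illustration
song_to_count = {'S1': 2, 'S2': 5, 'S3': 2}

# Sort song IDs by count, largest first; ties keep insertion order
ranked = sorted(song_to_count, key=song_to_count.get, reverse=True)
```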

We will recommend the most popular songs to every user, but we must filter out songs already in the user’s library. Reopening the triplets file, we will create a map from user to songs they have listened to.

In [95]:
f = open('/Users/sinasaba/Desktop/MillionSongSubset/AdditionalFiles/kaggle/kaggle_visible_evaluation_triplets.txt', 'r')
user_to_songs = dict()
for line in f: 
    user, song, _ = line.strip().split('\t')
    if user in user_to_songs: 
        user_to_songs[user].add(song)
    else:
        user_to_songs[user] = set([song])
f.close()
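
The if/else bookkeeping above can be avoided with `collections.defaultdict`, which creates an empty set the first time a user is seen. This is a sketch on made-up lines; the helper name `build_user_to_songs` is hypothetical.

```python
from collections import defaultdict

def build_user_to_songs(lines):
    """Map each user ID to the set of song IDs in their history (hypothetical helper)."""
    user_to_songs = defaultdict(set)
    for line in lines:
        user, song, _ = line.strip().split('\t')
        user_to_songs[user].add(song)  # the set is created on first access
    return user_to_songs

# Made-up triplets: user <TAB> song <TAB> play count
history = build_user_to_songs(["u1\tS1\t3\n", "u1\tS2\t1\n", "u2\tS1\t2\n"])
```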

Ok, we now have the songs ordered by popularity, and listening history for each user. To produce our submission file, we’ll need to load the canonical ordering of users:

In [96]:
f = open('/Users/sinasaba/Desktop/MillionSongSubset/AdditionalFiles/kaggle/kaggle_users.txt', 'r')
canonical_users = map(lambda line: line.strip(), f.readlines()) 
f.close()

In [97]:
canonical_users = list(canonical_users)

We are almost there, but we're missing one more thing. To reduce the size of submission files, we do not submit a list of song IDs such as SOSOUKN12A8C13AB79, but rather their index in the canonical list of songs.
Let's create the map from song ID to song index. For those unfamiliar with Python: each line is split into an (ID, index) pair, and dict() builds the mapping from those pairs.

In [98]:
g = open('/Users/sinasaba/Desktop/MillionSongSubset/AdditionalFiles/kaggle/kaggle_songs.txt', 'r')
song_to_index = dict(map(lambda line: line.strip().split(' '), g.readlines()))
g.close()
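
The one-liner above relies on `dict()` accepting any iterable of two-element sequences. A minimal sketch on made-up lines (the IDs `SONG_A` and `SONG_B` are placeholders, not real song IDs) shows the same idea with a generator expression. Note that the indices stay strings, which is convenient later when they are joined into a line of text.

```python
# Made-up lines in the "<song ID> <index>" layout
sample_lines = ["SONG_A 1\n", "SONG_B 2\n"]

# Each split yields [song_id, index]; dict() turns the pairs into a mapping
song_to_index = dict(line.strip().split(' ') for line in sample_lines)
```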

Next, we define a dict that maps each song ID to its artist:

In [99]:
f = open('/Users/sinasaba/Desktop/MillionSongSubset/AdditionalFiles/kaggle/unique_tracks.txt', 'r')
sondId_to_artists = dict()
for line in f: 
    track_id, song_id, artist, title = line.strip().split('<SEP>')
    sondId_to_artists[song_id] = artist

f.close()
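
The four-way unpacking above fails with a ValueError if a line ever contains more than three separators. The sketch below limits the split to three, so any extra `<SEP>` stays inside the title. This is a defensive choice made for illustration, not something the data is known to require; the helper name `parse_track_line` and the sample lines are assumptions.

```python
def parse_track_line(line):
    """Return (song_id, artist) from one '<SEP>'-delimited line (hypothetical helper)."""
    # At most 3 splits, so a '<SEP>' inside the title does not add extra fields
    track_id, song_id, artist, title = line.rstrip('\n').split('<SEP>', 3)
    return song_id, artist

# Made-up lines in the track<SEP>song<SEP>artist<SEP>title layout
sample_lines = [
    "TR1<SEP>S1<SEP>Artist A<SEP>First Title\n",
    "TR2<SEP>S2<SEP>Artist B<SEP>Odd<SEP>Title\n",
]
song_to_artist = dict(parse_track_line(line) for line in sample_lines)
```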

Using that dict, we map each user ID to the set of artists in the user's listening history:

In [100]:
user_to_artists = dict()
for user in user_to_songs.keys():
    for song in user_to_songs[user]:
        if user in user_to_artists:
            user_to_artists[user].add(sondId_to_artists[song])
        else:
            user_to_artists[user] = set([sondId_to_artists[song]])
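
The nested loop can also be expressed as a dict comprehension over sets. The sketch below uses made-up data and, as an added assumption, skips songs that have no artist entry instead of raising a KeyError.

```python
# Made-up inputs for illustration
user_to_songs = {'u1': {'S1', 'S2'}, 'u2': {'S3', 'S9'}}
song_to_artist = {'S1': 'A', 'S2': 'A', 'S3': 'B'}  # 'S9' has no artist entry

# One set of artists per user; songs without a known artist are skipped
user_to_artists = {
    user: {song_to_artist[s] for s in songs if s in song_to_artist}
    for user, songs in user_to_songs.items()
}
```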

Finally, we are ready to create the submission file. For each user in the canonical list, we recommend songs in order of popularity, skipping those already in the user’s profile. In this model we also check which artists appear in the user’s history and rank their songs first. To reorder songs later, we need the following dict, which maps each song to its position in the popularity ranking:

In [101]:
index_orderd = dict()
for i in range(len(songs_orderd)): 
    index_orderd[songs_orderd[i]] = i
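
An equivalent way to build such a rank lookup is `enumerate`, which pairs each song with its position. A sketch on a made-up ranking (the names `songs_ordered` and `song_rank` are illustrative only):

```python
# Made-up popularity ranking, most popular first
songs_ordered = ['S2', 'S1', 'S3']

# Position 0 is the most popular song
song_rank = {song: position for position, song in enumerate(songs_ordered)}
```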

The following dict maps each artist to the set of his or her songs:

In [102]:
artist_songs = dict() 

In [103]:
for song in songs_orderd: 
    if sondId_to_artists[song] in artist_songs:
        artist_songs[sondId_to_artists[song]].add(song)
    else:
        artist_songs[sondId_to_artists[song]] = set([song])

Now everything is in place. For each user, we collect the other songs by artists the user has listened to, sort them by popularity, and append further popular songs if more are needed.

In [104]:
f = open('/Users/sinasaba/Desktop/MillionSongSubset/AdditionalFiles/kaggle/submission_artist.txt', 'w')
for user in list(canonical_users):
    songs_to_recommend = []

    new_order = []   
    artists = user_to_artists[user]
    
    for artist in artists: 
        for song in artist_songs[artist]: 
            new_order.append(song)
    
    new_order = sorted(new_order, key = lambda s: index_orderd[s], reverse = False)
            
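    # Pad with globally popular songs until 600 candidates are available
    for song in songs_orderd:
        if len(new_order) >= 600:
            break
        if song not in new_order:
            new_order.append(song)

    # Keep the first 500 candidates that the user has not already listened to
    for song in new_order:
        if len(songs_to_recommend) >= 500:
            break
        if song not in user_to_songs[user]:
            songs_to_recommend.append(song)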
    # Transform song IDs to song indexes
    indices = map(lambda s: song_to_index[s],
                  songs_to_recommend)
    # Write line for that user
    f.write(' '.join(indices) + '\n')
    
f.close()
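
The per-user logic can be isolated in a small function and checked on toy data before running it over all users. The sketch below is a variant, not the cell above: the function name `recommend`, its parameters, and the sample data are assumptions, and instead of capping the candidate list at 600 it draws lazily from the popularity list until `n` unheard songs are found.

```python
from itertools import chain, islice

def recommend(user_songs, user_artists, artist_songs, songs_by_popularity, rank, n=500):
    """Songs by the user's artists first, then popular songs, minus songs already heard."""
    # Songs by artists in the user's history, most popular first
    by_artist = sorted(
        {s for a in user_artists for s in artist_songs.get(a, ())},
        key=rank.__getitem__,
    )
    artist_set = set(by_artist)
    # Remaining songs in global popularity order, generated lazily
    fallback = (s for s in songs_by_popularity if s not in artist_set)
    # Drop songs the user already has, then take the first n
    unheard = (s for s in chain(by_artist, fallback) if s not in user_songs)
    return list(islice(unheard, n))

# Made-up data: S1 is the most popular song, artist 'A' recorded S3 and S5
songs_by_popularity = ['S1', 'S2', 'S3', 'S4', 'S5']
rank = {song: i for i, song in enumerate(songs_by_popularity)}
artist_songs = {'A': {'S3', 'S5'}, 'B': {'S1', 'S2', 'S4'}}

recs = recommend({'S5'}, {'A'}, artist_songs, songs_by_popularity, rank, n=3)
```

Here the user has heard S5 by artist A, so A's other song S3 is ranked ahead of the globally more popular S1 and S2.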