### Simple Association Rules – AR
To assess the strength of simple two-element co-occurrence patterns, we include a method named AR, which can be considered an association rule technique with a maximum rule size of two. Technically, we create a rule $r(p,q)$ for every two items $p$ and $q$ that appear together in the training sessions. The weight $w(p,q)$ of each rule is simply the number of times $p$ and $q$ appear together in past sessions. Given the current session $s$, the AR score of a target item $i$ is then computed as

$$\mathrm{score}_{AR}(i,s) = w(i,j) \times \mathbb{1}_{AR}(r(i,j))$$

where $j$ is the last item of the current session $s$, for which we want to predict the successor, and $AR$ is the set of rules and their weights as determined from the training data. The indicator function $\mathbb{1}_{AR}(r(i,j))$ equals 1 when $AR$ contains $r(i,j)$ and 0 otherwise.
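The AR rule table and score can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notebook's own implementation: sessions are lists of hashable item ids, each unordered pair is counted at most once per session, and `build_ar_rules` and `score_ar` are hypothetical names. A missing rule returns 0, which plays the role of the indicator function.

```python
from collections import Counter
from itertools import combinations


def build_ar_rules(sessions):
    """Count, for every pair of distinct items, the sessions containing both.

    Assumption: one count per session. Rules are stored in both directions
    because AR ignores the order of the two items.
    """
    weights = Counter()
    for session in sessions:
        for p, q in combinations(sorted(set(session)), 2):
            weights[(p, q)] += 1
            weights[(q, p)] += 1
    return weights


def score_ar(item, session, weights):
    """score_AR(i, s) = w(i, j) * 1_AR(r(i, j)), with j the last item of s."""
    last = session[-1]
    # A missing rule yields 0, i.e. the indicator function is 0
    return weights.get((item, last), 0)


train = [["a", "b", "c"], ["a", "b"], ["b", "c", "b"]]
rules = build_ar_rules(train)
print(score_ar("a", ["x", "b"], rules))  # 2: a and b co-occur in two sessions
print(score_ar("x", ["x", "b"], rules))  # 0: there is no rule r(x, b)
```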

### Simple Sequential Rules – SR
The SR method is a variant of AR that aims to take the order of events into account. As in AR, we create a sequential rule $r(p,q)$ for every two items $p$ and $q$ that co-occur in the training data, here with $p$ appearing before $q$. This time, however, we consider the distance between $p$ and $q$ in the session when computing the weight of the rule. In our implementation, we use the multiplicative inverse as the weight function and set $w(p,q) = 1/x$, where $x$ is the distance between the positions of $p$ and $q$ in the session (so $x = 1$ when $q$ directly follows $p$). Other heuristics, such as a linear or a logarithmic function, can also be used. If the two items appear together in another training session, the weight of the rule in that session is added to the current weight. Finally, we normalize the accumulated weight by dividing it by the number of sessions that contributed to it. Given the current session $s$, the SR score of a target item $i$ is then computed as

$$\mathrm{score}_{SR}(i,s) = w(j,i) \times \mathbb{1}_{SR}(r(j,i))$$

where $j$ is the last item of session $s$ and $SR$ is the set of sequential rules. The indicator function $\mathbb{1}_{SR}(r(j,i))$ equals 1 when $SR$ contains $r(j,i)$ and 0 otherwise.
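A minimal sketch of SR rule construction and scoring, under assumptions that go beyond the text: $x$ is taken as the positional distance between $p$ and $q$ (1 for adjacent items, which avoids a division by zero), every pair of positions within a session contributes $1/x$, and the per-rule sum is divided by the number of sessions in which the rule occurs. `build_sr_rules` and `score_sr` are hypothetical names.

```python
from collections import defaultdict


def build_sr_rules(sessions):
    """Sequential rules r(p, q) with p occurring before q in a session."""
    totals = defaultdict(float)
    n_sessions = defaultdict(int)
    for session in sessions:
        # Accumulate 1/x within this session first, so each session
        # is counted once when normalizing
        local = defaultdict(float)
        for a in range(len(session)):
            for b in range(a + 1, len(session)):
                p, q = session[a], session[b]
                if p != q:
                    local[(p, q)] += 1.0 / (b - a)
        for pair, w in local.items():
            totals[pair] += w
            n_sessions[pair] += 1
    # Normalize by the number of contributing sessions
    return {pair: totals[pair] / n_sessions[pair] for pair in totals}


def score_sr(item, session, weights):
    """score_SR(i, s) = w(j, i) * 1_SR(r(j, i)), with j the last item of s."""
    return weights.get((session[-1], item), 0.0)


train = [["a", "b", "c"], ["a", "c"]]
rules = build_sr_rules(train)
print(rules[("a", "c")])                  # 0.75 = (1/2 + 1/1) / 2 sessions
print(score_sr("a", ["x", "c"], rules))   # 0.0: there is no rule r(c, a)
```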

In [11]:
import pandas as pd
import numpy as np
import gc
import dateutil.parser
from scipy.sparse import csr_matrix
import operator
from sklearn.neighbors import NearestNeighbors
from scipy.spatial.distance import correlation, cosine
from sklearn.metrics.pairwise import cosine_similarity

In [12]:
# Malformed rows (wrong number of fields) are skipped with a warning.
# on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0.
df_hist = pd.read_table('userid-timestamp-artid-artname-traid-traname.tsv', on_bad_lines='warn')

b'Skipping line 2120260: expected 6 fields, saw 8\n'
b'Skipping line 2446318: expected 6 fields, saw 8\n'
b'Skipping line 11141081: expected 6 fields, saw 8\n'
b'Skipping line 11152099: expected 6 fields, saw 12\nSkipping line 11152402: expected 6 fields, saw 8\n'
b'Skipping line 11882087: expected 6 fields, saw 8\n'
b'Skipping line 12902539: expected 6 fields, saw 8\nSkipping line 12935044: expected 6 fields, saw 8\n'
b'Skipping line 17589539: expected 6 fields, saw 8\n'


In [13]:
df_hist.columns = ['userid', 'timestamp', 'artistid','artistname','trackid','trackname']

In [14]:
df_profile = pd.read_table('userid-profile.tsv')


In [15]:
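# Restrict the listening history to the first 100 users of the profile file
user_id = df_profile['#id'].tolist()[:100]

# .copy() gives an independent frame, so later column assignments do not
# trigger pandas' SettingWithCopyWarning
df_hist = df_hist[df_hist['userid'].isin(user_id)].copy()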

In [16]:
df_hist['timestamp'] = pd.to_datetime(df_hist['timestamp'])

In [17]:
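# Order each user's plays chronologically, then start a new session whenever
# more than 5 minutes pass between consecutive plays or the user changes
df_hist.sort_values(by=['userid', 'timestamp'], inplace=True)
cond1 = df_hist.timestamp - df_hist.timestamp.shift(1) > pd.Timedelta(5, 'm')
cond2 = df_hist.userid != df_hist.userid.shift(1)
df_hist['sessionid'] = (cond1 | cond2).cumsum()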
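The session split in the cell above can be illustrated on a tiny, made-up log: a row opens a new session when more than 5 minutes have passed since the previous play or when the user changes, and the cumulative sum of those boolean flags numbers the sessions. The data below is hypothetical; only the column names match `df_hist`.

```python
import pandas as pd

# Hypothetical listening log
log = pd.DataFrame({
    "userid": ["u1", "u1", "u1", "u2", "u2"],
    "timestamp": pd.to_datetime([
        "2009-01-01 10:00", "2009-01-01 10:03",  # 3-minute gap: same session
        "2009-01-01 10:20",                      # 17-minute gap: new session
        "2009-01-01 10:21", "2009-01-01 10:24",  # new user: new session
    ]),
})
log = log.sort_values(["userid", "timestamp"])
gap = log["timestamp"] - log["timestamp"].shift(1) > pd.Timedelta(minutes=5)
new_user = log["userid"] != log["userid"].shift(1)
# Each True flag starts a session; subtracting 1 makes the ids 0-based
log["sessionid"] = (gap | new_user).cumsum() - 1
print(log["sessionid"].tolist())  # [0, 0, 1, 2, 2]
```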

In [27]:
df_hist.head()

|   | sessionid | track_id |
|---|-----------|----------|
| 0 | 0 | 191986 |
| 1 | 0 | 226631 |
| 2 | 1 | 194672 |
| 3 | 2 | 124560 |
| 4 | 2 | 216761 |


In [21]:
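# Encode each distinct track name as an integer id (note: tracks by different
# artists that share a title receive the same id)
df_hist['track_id'] = df_hist['trackname'].astype("category").cat.codes
# Lookup table from track id back to track and artist names
track_lookup = df_hist[['track_id', 'trackname', 'artistname']].drop_duplicates()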
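Because the codes above are derived from `trackname` alone, two different songs that share a title end up with the same `track_id`. One possible alternative, shown here on hypothetical data as a sketch rather than what the notebook does, is to number the distinct (artist, track name) combinations with pandas' `groupby(...).ngroup()`.

```python
import pandas as pd

# Hypothetical rows: two different songs that share the title "Intro"
plays = pd.DataFrame({
    "artistname": ["Artist A", "Artist B", "Artist A"],
    "trackname": ["Intro", "Intro", "Intro"],
})
# Title-only codes merge both songs into a single id
by_title = plays["trackname"].astype("category").cat.codes
# Numbering (artist, title) groups keeps them apart
by_pair = plays.groupby(["artistname", "trackname"], sort=True).ngroup()
print(by_title.tolist())  # [0, 0, 0]
print(by_pair.tolist())   # [0, 1, 0]
```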

In [23]:
df_hist.drop(['userid','timestamp','artistid','artistname','trackid','trackname'],axis=1,inplace=True)

In [24]:
df_hist.reset_index(drop=True, inplace=True)

In [26]:
df_hist['sessionid'] -= 1  # make session ids start at 0

In [58]:
# Map each session id to the list of its track ids, in play order
session = df_hist.sessionid.unique().tolist()
sessn_past = df_hist.groupby('sessionid')['track_id'].apply(list).to_dict()

In [66]:
# Inverted index: map each track id to the sessions in which it was played
all_tracks = df_hist.track_id.unique().tolist()
track_past = {track: [] for track in all_tracks}
for key, value in sessn_past.items():
    for track in set(value):  # record each session at most once per track
        track_past[track].append(key)

In [76]:
# Inspect one pair of tracks: for each session containing both, print their
# (first-occurrence) positions and distance, then print the mean distance
song_id = 191987
t_id = 226632
past = track_past[song_id]
t_past = track_past[t_id]
common = list(set(past) & set(t_past))
if len(common) > 0:
    wt = 0
    for ssn in common:
        hist = sessn_past[ssn]
        idx_song = hist.index(song_id)
        idx_t = hist.index(t_id)
        diff = abs(idx_song - idx_t)
        wt += diff
        print("Session:{0}  Song_index:{1}  Track_index:{2}  weight:{3}".format(ssn, idx_song, idx_t, diff))
    normalized_wt = wt / len(common)
    print(normalized_wt)

Session:0  Song_index:0  Track_index:1  weight:1
Session:390  Song_index:0  Track_index:1  weight:1
Session:427  Song_index:0  Track_index:1  weight:1
Session:6059  Song_index:3  Track_index:4  weight:1
Session:495  Song_index:3  Track_index:4  weight:1
Session:241  Song_index:0  Track_index:1  weight:1
Session:1078  Song_index:0  Track_index:1  weight:1
Session:1273  Song_index:1  Track_index:2  weight:1
Session:766  Song_index:1  Track_index:2  weight:1
1.0


In [78]:
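# Brute force over every ordered pair of distinct tracks (quadratic in the
# number of tracks); the weight is the mean absolute distance over common sessions
track_a = []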
track_b = []
weight = []
for track in all_tracks:
    t_past = track_past[track]
    for song in all_tracks:
        if track!=song:
            s_past = track_past[song]
            common = list(set(t_past) & set(s_past))
            if len(common) > 0:
                wt = 0
                for ssn in common:
                    hist = sessn_past[ssn]
                    idx_t = hist.index(track)
                    idx_s = hist.index(song)
                    diff = abs(idx_t - idx_s)
                    wt+=diff
                normalized_wt = wt/len(common)
                track_a.append(track)
                track_b.append(song)
                weight.append(normalized_wt)

KeyboardInterrupt: 
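The all-pairs loop above compares every track with every other track, so its cost grows quadratically with the number of distinct tracks. One alternative, sketched here with pandas, is to generate only the pairs that actually co-occur by joining the session log with itself. It follows the SR weighting under the same assumptions as before ($1/x$ with $x$ the positional distance, summed per session and averaged over contributing sessions); `sr_rules_vectorized` is a hypothetical name, and the self-join is quadratic in session length, so very long sessions could make it memory-hungry.

```python
import pandas as pd


def sr_rules_vectorized(df):
    """SR weights from a (sessionid, track_id) log whose rows are in play order."""
    pos = df.assign(pos=df.groupby("sessionid").cumcount())
    pairs = pos.merge(pos, on="sessionid", suffixes=("_p", "_q"))
    # Keep ordered pairs (p before q) of distinct tracks
    pairs = pairs[(pairs["pos_q"] > pairs["pos_p"])
                  & (pairs["track_id_p"] != pairs["track_id_q"])]
    pairs = pairs.assign(w=1.0 / (pairs["pos_q"] - pairs["pos_p"]))
    # Sum 1/x inside each session, then average over contributing sessions
    per_session = pairs.groupby(["sessionid", "track_id_p", "track_id_q"])["w"].sum()
    return per_session.groupby(level=["track_id_p", "track_id_q"]).mean()


log = pd.DataFrame({"sessionid": [0, 0, 0, 1, 1],
                    "track_id": [10, 20, 30, 10, 30]})
rules = sr_rules_vectorized(log)
print(rules.loc[(10, 30)])  # 0.75 = (1/2 + 1/1) / 2 sessions
```

Assuming `df_hist` still holds only `sessionid` and `track_id` in play order, `sr_rules_vectorized(df_hist)` could stand in for the loop.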

In [86]:
track_lookup[track_lookup.track_id==191987]['trackname'].tolist()[0]

'The Launching Of Big Face'

In [105]:
def getrecommendation(sessionid):
    # Candidates are tracks that share at least one session with the last track
    last_track_id = sessn_past[sessionid][-1]
    track = []
    weight = []
    t_past = track_past[last_track_id]
    for song in all_tracks:
        if last_track_id != song:
            s_past = track_past[song]
            common = list(set(t_past) & set(s_past))
            if len(common) > 0:
                # Mean absolute distance between the two tracks over common sessions
                wt = 0
                for ssn in common:
                    hist = sessn_past[ssn]
                    idx_t = hist.index(last_track_id)
                    idx_s = hist.index(song)
                    diff = abs(idx_t - idx_s)
                    wt += diff
                normalized_wt = wt / len(common)
                track.append(song)
                weight.append(normalized_wt)
    if not track:
        # No co-occurring tracks, so there is nothing to recommend
        return ()
    # sorted() compares (track, weight) tuples, so this orders the candidates by
    # track id (descending after the reversal below), not by weight
    track, weight = zip(*sorted(zip(track, weight)))
    track = track[::-1]
    weight = weight[::-1]
    print("Last track listened to was: ")
    print("Track id:{0}  Track Name:{1}".format(last_track_id, track_lookup[track_lookup.track_id == last_track_id]['trackname'].tolist()[0]))
    print("Recommended songs are:")
    for i in range(len(track)):
        print("Track id:{0}  Track Name:{1}".format(track[i], track_lookup[track_lookup.track_id == track[i]]['trackname'].tolist()[0]))
    return track
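To rank candidates by rule strength rather than by id, the rules can be grouped by their antecedent and the top-k picked with `heapq.nlargest`. This is a sketch assuming rules are stored as a `{(p, q): weight}` dictionary where a higher weight means a stronger rule (as with the $1/x$ weighting; with the mean distance computed in `getrecommendation`, smaller would mean closer). `index_by_antecedent` and `recommend_sr` are hypothetical helpers.

```python
import heapq
from collections import defaultdict


def index_by_antecedent(rules):
    """Regroup {(p, q): w} into {p: {q: w}} for direct lookup by the last item."""
    by_item = defaultdict(dict)
    for (p, q), w in rules.items():
        by_item[p][q] = w
    return dict(by_item)


def recommend_sr(session, by_item, k=10):
    """Return the k candidates i with the largest w(j, i), j = last item of session."""
    candidates = by_item.get(session[-1], {})
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])


# Hypothetical rule weights (higher = stronger rule)
rules = {("b", "c"): 0.75, ("b", "d"): 1.0, ("b", "e"): 0.2, ("a", "b"): 1.0}
by_item = index_by_antecedent(rules)
print(recommend_sr(["a", "b"], by_item, k=2))  # [('d', 1.0), ('c', 0.75)]
```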

In [106]:
rec_tracks = getrecommendation(0)

Last track listened to was: 
Track id:226632  Track Name:Zn Zero
Recommended songs are:
Track id:226202  Track Name:Zala
Track id:206385  Track Name:Twin Home
Track id:191987  Track Name:The Launching Of Big Face
Track id:169730  Track Name:Sleep Warm
Track id:162624  Track Name:Scum
Track id:138752  Track Name:Omstart
Track id:128127  Track Name:Music
Track id:111880  Track Name:Like A Rolling Stone
Track id:87062  Track Name:I Citizen The Loathsome
Track id:77155  Track Name:Gum
Track id:36676  Track Name:Chin Hippy
Track id:29692  Track Name:Breezin'
Track id:22371  Track Name:Beep It
Track id:14606  Track Name:Anirog D9


Many of these recommended songs are by the same artists, which is intuitive, since people tend to listen to several songs by the same artist within a session.
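The same-artist observation can be quantified rather than eyeballed. A minimal sketch, assuming a lookup table with `track_id` and `artistname` columns like `track_lookup`, where one id may map to several artists because ids were assigned by title; `same_artist_share` is a hypothetical helper and the data is made up.

```python
import pandas as pd


def same_artist_share(last_track_id, recommended_ids, lookup):
    """Fraction of recommended ids sharing at least one artist with the last track."""
    if len(recommended_ids) == 0:
        return 0.0
    last_artists = set(lookup.loc[lookup["track_id"] == last_track_id, "artistname"])
    rec = lookup[lookup["track_id"].isin(list(recommended_ids))]
    # A track id may map to several artists; any overlap counts as a match
    matches = rec.groupby("track_id")["artistname"].apply(
        lambda names: bool(last_artists & set(names)))
    return int(matches.sum()) / len(recommended_ids)


lookup = pd.DataFrame({"track_id": [1, 2, 3, 4],
                       "artistname": ["X", "X", "Y", "X"]})
print(same_artist_share(1, [2, 3, 4], lookup))  # about 0.667 (2 of 3)
```

On the notebook's data, `same_artist_share(sessn_past[0][-1], rec_tracks, track_lookup)` could be used to check the claim for session 0.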