# Item-based kNN (iknn)

The iknn method as used in [Hidasi et al. 2016a] considers only
the last item of a given session and recommends the items that are most
similar to it in terms of co-occurrence in other sessions. Technically, each item is encoded
as a binary vector in which each element corresponds to a session and is set to “1” if the
item appeared in that session. The similarity of two items can then be computed, e.g., with the
cosine similarity measure, and the number of neighbours k is implicitly defined by the desired
recommendation list length. Conceptually, the method implements a form of the “Customers who bought ... also bought” scheme.
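
As a small illustration of the encoding described above, the following sketch computes the cosine similarity of binary item vectors on hypothetical toy data. The session ids, item names, and the helper functions `cosine` and `recommend` are assumptions made for this example only. For binary vectors, the cosine reduces to the number of shared sessions divided by the square root of the product of the two session counts, so plain Python sets are sufficient here.

```python
from math import sqrt

# Hypothetical toy data: session id -> items seen in that session.
sessions = {
    "s1": ["A", "B"],
    "s2": ["A", "B", "C"],
    "s3": ["A"],
    "s4": ["C"],
}

# Encode each item by the set of sessions it appears in
# (equivalent to a binary vector with one element per session).
item_sessions = {}
for sid, items in sessions.items():
    for it in items:
        item_sessions.setdefault(it, set()).add(sid)

def cosine(a, b):
    """Cosine similarity of two binary vectors given as sets of session ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(last_item, k=2):
    """Rank the other items by similarity to the last item of a session."""
    scores = {
        other: cosine(item_sessions[last_item], s)
        for other, s in item_sessions.items()
        if other != last_item
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Keep at most k items and drop items with no co-occurrence at all.
    return [(it, sc) for it, sc in ranked[:k] if sc > 0]

# "A" shares two sessions with "B", "C" shares one, so "A" ranks first.
print(recommend("B"))
```

Here the list length `k` plays the role of the number of neighbours.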

In [0]:
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import operator

In [0]:
events_df = pd.read_csv('events.csv')

In [3]:
events_df.head()

Unnamed: 0,timestamp,visitorid,event,itemid,transactionid
0,1433221332117,257597,view,355908,
1,1433224214164,992329,view,248676,
2,1433221999827,111016,view,318965,
3,1433221955914,483717,view,253185,
4,1433221337106,951259,view,367447,


In [0]:
events_df['item_id'] = events_df['itemid'].astype('category').cat.codes
item_lookup = events_df[['item_id','itemid']].drop_duplicates()

In [0]:
events_df.drop(['itemid'],axis=1,inplace=True)

In [0]:
events_df['visitor_id'] = events_df['visitorid'].astype('category').cat.codes
visitor_lookup = events_df[['visitor_id','visitorid']].drop_duplicates()

In [0]:
events_df.drop(['visitorid'],axis=1,inplace=True)
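
The integer codes created above are internal identifiers, so anything reported in terms of `item_id` has to be translated back through `item_lookup` to recover the original `itemid`. A minimal sketch of that round trip on a hypothetical toy frame (the names `toy`, `lookup`, and `code_to_itemid` are illustrative and not part of the notebook):

```python
import pandas as pd

# Hypothetical toy frame with raw item identifiers.
toy = pd.DataFrame({"itemid": [355908, 248676, 355908, 318965]})

# Same encoding step as above: category codes are assigned
# in sorted order of the distinct raw identifiers.
toy["item_id"] = toy["itemid"].astype("category").cat.codes
lookup = toy[["item_id", "itemid"]].drop_duplicates()

# Dictionary for translating an internal code back to the raw identifier.
code_to_itemid = {
    int(code): int(raw) for code, raw in zip(lookup["item_id"], lookup["itemid"])
}

print(code_to_itemid)
```

The same dictionary could be built from `item_lookup`, assuming it still holds both columns as in the cell above.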

Here, we treat each unique visitorid as a session id, i.e., the entire journey of a user is considered a single session.

In [0]:
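# Row indices (items), column indices (sessions) and values for the sparse matrix,
# taken directly from the DataFrame columns instead of looping over the rows.
item = events_df['item_id'].to_numpy()
session = events_df['visitor_id'].to_numpy()
data = np.ones(len(events_df), dtype=np.int64)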

In [0]:
# Matrix dimensions: one row per item, one column per session.
row = events_df['item_id'].nunique()
col = events_df['visitor_id'].nunique()

In [0]:
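# Note: csr_matrix sums repeated (item, session) pairs, so the entries are
# interaction counts rather than strict 0/1 flags.
data_csr = csr_matrix((data,(item,session)), shape = (row,col))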
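
If a strictly binary encoding is wanted, one option is to overwrite the stored values after construction. The sketch below shows this on hypothetical toy indices; the names `counts` and `binary` are illustrative. This step is not applied in this notebook, and applying it would change the similarity scores reported further down.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy events: item 0 is seen twice in session 0.
items = np.array([0, 0, 1, 1])
sessions = np.array([0, 0, 0, 1])
values = np.ones(len(items))

# Duplicate (item, session) pairs are summed during construction.
counts = csr_matrix((values, (items, sessions)), shape=(2, 2))
print(counts.toarray())

# Binarise: every stored entry becomes 1, whatever its count was.
binary = counts.copy()
binary.data[:] = 1
print(binary.toarray())
```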

In [0]:
sim = cosine_similarity(data_csr,dense_output=False)

In [0]:
def getrecommendation(sessionid, top_n=9):
  # Items of the session in chronological order; only the last one is used.
  item = events_df[events_df.visitor_id == sessionid].sort_values('timestamp').item_id.tolist()
  last_item_interacted = item[-1]
  # Row of the item-item similarity matrix for the last item (1 x n_items).
  sim_array = sim[last_item_interacted].toarray()
  r,c = np.nonzero(sim_array)
  # Collect the non-zero similarities, skipping the query item itself.
  sim_score = {}
  for col in c:
    if col != last_item_interacted:
      sim_score[col] = sim_array[0,col]
  sorted_sim = sorted(sim_score.items(), key=operator.itemgetter(1),reverse=True)
  # Print the top_n most similar items (internal item_id codes, not raw itemid values).
  for item_id, score in sorted_sim[:top_n]:
    print("Item : {0} with similarity {1}".format(item_id, score))

In [44]:
getrecommendation(2)

Item : 172691 with similarity 0.1357781029751052
Item : 108948 with similarity 0.11080184472247168
Item : 39581 with similarity 0.09046995130062717
Item : 141775 with similarity 0.082107082725596
Item : 13405 with similarity 0.07495316889958614
Item : 25256 with similarity 0.07495316889958614
Item : 26368 with similarity 0.07495316889958614
Item : 30909 with similarity 0.07495316889958614
Item : 47341 with similarity 0.07495316889958614
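
Converting a full similarity row to a dense array can be avoided by reading the non-zero entries directly from the CSR storage. The following is a minimal sketch of that idea, assuming the similarity matrix is in (or can be converted to) CSR format. The function name `top_k_similar` and the toy matrix are illustrative and not part of the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix

def top_k_similar(sim_matrix, item_id, k=10):
    """Return up to k (item, score) pairs most similar to item_id, excluding item_id."""
    m = sim_matrix.tocsr()
    # Slice the stored (non-zero) entries of the requested row.
    start, end = m.indptr[item_id], m.indptr[item_id + 1]
    cols, scores = m.indices[start:end], m.data[start:end]
    # Drop the item itself and any explicitly stored zeros.
    keep = (cols != item_id) & (scores > 0)
    cols, scores = cols[keep], scores[keep]
    # Sort by descending similarity and keep the first k.
    order = np.argsort(-scores)[:k]
    return [(int(cols[i]), float(scores[i])) for i in order]

# Hypothetical 3 x 3 similarity matrix for a quick check.
toy_sim = csr_matrix(np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]))
print(top_k_similar(toy_sim, 1, k=2))
```

Returning the pairs instead of printing them makes it easier to translate the internal codes back to the original identifiers afterwards.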
