Simple nearest neighbor based recommender system :

Given a rating matrix $(r_{(i,j)})_{i,j=1}^{i=nb\_users, j = nb\_items}$, and a user of index $i\_user$ we sample n users from the rating matrix.

Among these n users, we keep the k closest neighbors in terms of rating to our user of index $i\_user$.

We average the ratings of the k picked neighbors, and the recommended items are the items (not yet rated by user $i\_user$) with highest value.

In [1]:
import numpy as np

Below a toy function to generate rating matrixes

In [2]:
def generate_ratings(nb_rows, nb_cols, min_nb_recomm=1, max_nb_recomm=2, min_rating =1, max_rating=5, seed = 0):
  '''Function to generate test rating matrices 
  nb_rows : nb of users
  nb_cols : nb of items
  min_nb_recomm : minimum number of recommendation for each user
  max_nb_recomm : max number of recommendation for each user
  min_rating : minimum rating possible 
  max_rating : maximum rating possible
  seed : np random seed for reproducibility '''
  np.random.seed(seed)
  ratings = np.zeros((nb_rows,nb_cols))

  for i in range(nb_rows):

    nb_recomm = np.random.randint(min_nb_recomm, max_nb_recomm+1)

    indexes = np.random.choice(np.arange(nb_cols), nb_recomm, replace = False)

    for index in indexes :
      ratings[i,index] = np.random.randint(min_rating, max_rating)
  return ratings.astype(int)

ratings = generate_ratings(500,1000,2,5)

Below the nearest neighbors based recommender system

In [3]:
class nn_recommender :

  '''Nearest neighbors based recommender system'''

  def __init__(self, nb_neighbors, nb_sample):
    '''Instantiates class
    nb_neighbors : number of closest neighbors to consider to make predictions
    nb_sample : To lower computationnal cost, we sample a number nb_sample of rows from initial matrix and look from closest neighbors among these sampled rows'''
    self.nb_neighbors = nb_neighbors
    self.nb_sample = nb_sample
  
  def get_full_matrix_from_sparse(self, sparse, max_col = None):

    '''Function to get full matrix from sparse matrix with rows [user_id, item_id, rating]
    sparse : sparse matrix
    max_col : max item id'''

    max_row = np.max(sparse[:,0])+1
    max_col = max_col if max_col else (np.max(sparse[:,1])+1)

    result = np.zeros((max_row, max_col))
    
    for user_id, item_id, rating in sparse :

      result[user_id, item_id] = rating
  
    return result
  
  def get_ratings_of_user_id(self, sparse, user_id, max_item_id = None):
    '''retrieve user rating vector from sparse format matrix (rows in format [user_id, item_id, rating])
    sparse : sparse user matrix
    user_id : user_id
    max_item_id : len of output rating vector (None to set to the maximum item id in rating matrix )'''

    max_mat_item_id = max_item_id if max_item_id else (np.max(sparse[:,1]) +1) #Keep max id item ..
    ratings = sparse[sparse[:,0] == user_id]

    rating_vect = np.zeros(max_mat_item_id)

    for _, item_id, rating in ratings:
      if item_id <max_mat_item_id:
        rating_vect[item_id] = rating
    
    return rating_vect
  
  def get_recommendation(self, ratings, user_vect = None, user = None, nb_recommendations=2, sparse_shape = False, max_col = None):
    '''Get recommended items id for given user
    ratings : rating matrix
    user_vect : user ratings vector, if user is in rating matrix can be left as None
    user : user id if id is in ratings matrix, else None
    nb_recommendations : number of items to recommend
    sparse_shape : set to True if matrix is with rows of format [user_id, item_id, rating]'''
    
    if not sparse_shape :

      indexes = np.random.choice(len(ratings), self.nb_sample, replace=False)

      if user :

        indexes =[index for index in indexes if index != user]#Do not count user as one of their neighbors

        user_vect = ratings[user]

      sampled_mat = ratings[indexes]

      squared_dist = (sampled_mat-user_vect)**2
      
      distances = np.sum(squared_dist, axis = 1)

      sorted_distances_idx = np.argsort(distances)

      neighbors_indexes = sorted_distances_idx[-self.nb_neighbors:]

      for i in range(self.nb_neighbors) :

        if i == 0:
          res = ratings[neighbors_indexes[i]]
        else :
          res += ratings[neighbors_indexes[i]]
      
      res = res / self.nb_neighbors

      ordered_item_indexes = np.argsort(res)

      ordered_item_indexes = [index for index in ordered_item_indexes if user_vect[index]==0]

      recommendations = ordered_item_indexes[-nb_recommendations:]
    
    else:

      user_indexes = np.unique(ratings[:,0])

      indexes = np.random.choice(user_indexes, self.nb_sample, replace=False)
      
      if user:
        
        indexes =[index for index in indexes if index != user]

        user_vect = self.get_ratings_of_user_id(ratings, user, max_col)

      mask = np.isin(ratings[:,0], indexes)

      filtered_sparse_ratings = ratings[mask]

      full_matrix = self.get_full_matrix_from_sparse(filtered_sparse_ratings, max_col)

      squared_dist = (full_matrix-user_vect)**2
      
      distances = np.sum(squared_dist, axis = 1)

      sorted_distances_idx = np.argsort(distances)

      neighbors_indexes = sorted_distances_idx[-self.nb_neighbors:]

      for i in range(self.nb_neighbors) :

        if i == 0:
          res = full_matrix[neighbors_indexes[i]]
        else :
          res += full_matrix[neighbors_indexes[i]]
      
      res = res / self.nb_neighbors

      ordered_item_indexes = np.argsort(res)

      ordered_item_indexes = [index for index in ordered_item_indexes if user_vect[index]==0]
      recommendations = ordered_item_indexes[-nb_recommendations:]
      

    return(recommendations)


Below we define a recommender system which samples 200 users from our matrix, and keeps the 10 closest neighbors

In [4]:
rs = nn_recommender(10,200)

First example using toy generated rating matrix to get recommendations for user 300

In [5]:
rs.get_recommendation(ratings, user = 300)

[625, 868]

Generate recommendations for a user not in our dataset

In [7]:
toy_user_rating = generate_ratings(1,1000,1,5)[0]#Generate toy rating of a single user not in our dataset
rs.get_recommendation(ratings, user_vect = toy_user_rating)

[909, 941]

Use our model on Netflix movies rating test dataset

In [5]:
!gdown https://drive.google.com/uc?id=1aTq-QEQIqOOEn7MrDylAovQE0Ncms6lN

Downloading...
From: https://drive.google.com/uc?id=1aTq-QEQIqOOEn7MrDylAovQE0Ncms6lN
To: /content/netflix_test.txt
100% 21.1M/21.1M [00:00<00:00, 60.7MB/s]


In [6]:
test = np.genfromtxt("/content/netflix_test.txt", delimiter = " ", dtype = "int32")[:,[0,1,3]]

This dataset is in a sparse format : rows of this dataset are of the form [user\_id, item\_id, rating].

Below an Example of a row of this matrix

In [50]:
test[0]

array([4, 1, 4], dtype=int32)

Keep only 5000 first rows of matrix, avoid using large matrices

In [7]:
small_test = test[:5000]

Get recommendations for user of id 4

In [45]:

rs.get_recommendation(test, user = 4, sparse_shape = True)

[28, 30]

Get recommendations for a user not in dataset

In [8]:
max_item_id = np.max(small_test[:,1]) + 1
toy_user_rating = generate_ratings(1,max_item_id,5,10)[0]
rs.get_recommendation(small_test,user_vect = toy_user_rating, sparse_shape = True)

[8, 30]