Recommender systems are algorithms that aim to surface the most relevant items for a user by filtering them out of a large pool of information. A recommendation engine discovers patterns in a data set by learning consumers' choices and produces results that correlate with their needs and interests. Companies now build recommendation engines by studying the past behavior of their users, and use them to offer choices such as “Relevant Job postings”, “Movies of Interest”, “Suggested Videos”, “Facebook friends that you may know” and “People who bought this also bought this”. In this assignment, we are building item-based and user-based recommender systems.

Item-Based Recommender System: This type of recommender system identifies similar items based on users’ previous ratings, and recommends items similar to those a user has already rated highly.

User-Based Recommender System: This type of recommender system recommends products to a user because they were liked by other users who are similar to that user.
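
To make the difference between the two definitions concrete, here is a minimal sketch on a made-up ratings matrix (the values and the helper name `cosine_similarity_rows` are assumptions for illustration, not part of the assignment code). The same cosine similarity is computed twice: over rows to compare users, and over columns to compare items.

```python
import numpy as np

# Toy ratings matrix (made-up values): rows are users, columns are items, 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)


def cosine_similarity_rows(matrix):
    """Cosine similarity between every pair of rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    normalized = matrix / norms
    return normalized @ normalized.T


# User-based: compare users by the ratings they gave (rows of the matrix)
user_similarity = cosine_similarity_rows(ratings)

# Item-based: compare items by the ratings they received (columns of the matrix)
item_similarity = cosine_similarity_rows(ratings.T)

print(user_similarity.shape)  # one row/column per user
print(item_similarity.shape)  # one row/column per item
```

In this toy data, users 0 and 1 rate alike, so a user-based system would recommend to user 0 what user 1 liked; items 0 and 1 receive similar ratings, so an item-based system would recommend item 1 to someone who liked item 0.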

In this assignment I am using the MovieLens dataset. It can be downloaded from [MovieLens dataset of 100k records](http://files.grouplens.org/datasets/movielens/ml-100k/).

The following code is my first approach to building the recommender system:

In [None]:
import numpy as np
import pandas as pd

from sklearn.metrics import pairwise_distances

# Loading movielens data
# User's data
users_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('C:/Users/Deepika Saxena/PycharmProjects/ml-100k/u.user', sep='|', names=users_cols,
                    parse_dates=True)
# Ratings
rating_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('C:/Users/Deepika Saxena/PycharmProjects/ml-100k/u.data', sep='\t', names=rating_cols)
# Movies
movie_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url']
movies = pd.read_csv('C:/Users/Deepika Saxena/PycharmProjects/ml-100k/u.item', sep='|', names=movie_cols,
                     usecols=range(5), encoding='latin-1')

# Merging movie data with their ratings
movie_ratings = pd.merge(movies, ratings)
# merging movie_ratings data with the User's dataframe
df = pd.merge(movie_ratings, users)

# Pre-processing
# Dropping columns that aren't needed (selected by name rather than by position)
df = df.drop(columns=['video_release_date', 'imdb_url', 'unix_timestamp'])
ratings = ratings.drop(columns='unix_timestamp')
movies = movies.drop(columns=['video_release_date', 'imdb_url'])

# Pivot table: one row per movie, one column per user, cells hold the ratings.
# reset_index(drop=True) makes the row labels positional (0, 1, 2, ...)
ratings_matrix = ratings.pivot_table(index=['movie_id'], columns=['user_id'], values='rating').reset_index(drop=True)
# Unrated movie/user pairs become 0
ratings_matrix.fillna(0, inplace=True)

# Report the range of the ratings
lower_rating = ratings['rating'].min()
upper_rating = ratings['rating'].max()
print('Review Range : {0} to {1}'.format(lower_rating, upper_rating))


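# Cosine similarity between every pair of movies.
# pairwise_distances returns cosine distances, so similarity = 1 - distance.
movie_similarity = 1 - pairwise_distances(ratings_matrix.values, metric="cosine")
# Zero the diagonal so a movie is never reported as similar to itself
np.fill_diagonal(movie_similarity, 0)
print(movie_similarity)
# Note: from here on ratings_matrix holds movie-to-movie similarities, not ratings
ratings_matrix = pd.DataFrame(movie_similarity)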

# Recommender
# user_inp = input('Enter the reference movie title based on which recommendations are to be made: ')
user_inp = "Speed (1994)"
# Row positions of the movies whose title matches the input exactly
inp = movies[movies['title'] == user_inp].index.tolist()
print(inp)

if inp:
    # Similarity of every movie to the reference movie
    movies['similarity'] = ratings_matrix.iloc[inp[0]]
    # The reference movie's own similarity was zeroed above, so it does not top
    # the list and the first rows are already the closest other movies
    print("Recommended movies based on your choice of ", user_inp, ": \n",
          movies.sort_values(["similarity"], ascending=False).head(10))
else:
    print("Sorry, the movie is not in the database!")
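
The script above ranks movies by similarity but never predicts a rating, which is why it cannot be scored. One possible way to evaluate it, sketched here on made-up data, is to hide a few known ratings, predict each one as a similarity-weighted average of the ratings the same user gave to other movies, and compute the RMSE. The matrix values, the held-out cells, and the helper names `cosine_similarity_rows` and `predict_rating` are all assumptions for illustration.

```python
import numpy as np


def cosine_similarity_rows(matrix):
    """Cosine similarity between rows; all-zero rows get similarity 0."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero
    normalized = matrix / norms
    return normalized @ normalized.T


def predict_rating(train, item_sim, item, user):
    """Similarity-weighted average of the user's ratings on other items."""
    rated = train[:, user] > 0           # items this user has rated
    weights = item_sim[item, rated]
    if weights.sum() == 0:
        return np.nan                    # no similar rated item to learn from
    return weights @ train[rated, user] / weights.sum()


# Toy movie x user matrix (same orientation as the pivot table; values are made up)
train = np.array([
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Held-out ratings as (movie row, user column, true rating); these cells are 0 in `train`
held_out = [(0, 3, 1.0), (3, 0, 1.0)]

item_sim = cosine_similarity_rows(train)
np.fill_diagonal(item_sim, 0)            # a movie should not vote for itself

predictions = np.array([predict_rating(train, item_sim, m, u) for m, u, _ in held_out])
truth = np.array([r for _, _, r in held_out])
rmse = np.sqrt(np.mean((predictions - truth) ** 2))
print('RMSE on held-out ratings:', round(rmse, 3))
```

On the real data the held-out cells would have to be removed from the ratings before building the pivot table, otherwise the model is scored on ratings it has already seen.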


*I wasn't able to evaluate predictions with the above code, so I tried a different approach, as follows:*
In this approach I used the surprise library (distributed on PyPI as `scikit-surprise`) to create the dataset, calculate predictions and measure their accuracy.

In [2]:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

# MovieLens 100k dataset
# Each record holds: user id, item id, rating, timestamp (tab-separated)
data = Dataset.load_builtin('ml-100k')
training_set, test_set = train_test_split(data, test_size=.15)

# The user_based flag toggles between user-based (True) and item-based (False)
# collaborative filtering; this model is the user-based recommender
algo = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo.fit(training_set)

# We can now query for specific predictions
uid = str(196)  # raw user id
iid = str(302)  # raw item id

# Predict the rating for one user/item pair; r_ui is the known true rating,
# passed only so it is shown next to the estimate
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

# run the trained model against the test_set
test_pred = algo.test(test_set)

# get root mean squared error (RMSE)
print("User-based Model : Test Set")
accuracy.rmse(test_pred, verbose=True)

# Also evaluate on the training set, to compare against the test error
print("User-based Model : Training Set")
train_pred = algo.test(training_set.build_test_set())
accuracy.rmse(train_pred)

# Calculating for the item based recommender system
algo = KNNWithMeans(k=50, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo.fit(training_set)

# run the trained model against the testset
test_pred = algo.test(test_set)

# get RMSE
print("Item-based Model : Test Set")
accuracy.rmse(test_pred, verbose=True)

# Also evaluate on the training set, to compare against the test error
print("Item-based Model : Training Set")
train_pred = algo.test(training_set.build_test_set())
accuracy.rmse(train_pred)


ModuleNotFoundError: No module named 'surprise'
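
For reference, the idea behind a mean-centred kNN predictor such as `KNNWithMeans` can be sketched in plain NumPy: start from the target user's average rating and add the similarity-weighted average of how far each neighbour's rating of the item deviates from that neighbour's own average. This is a simplified illustration under my own assumptions, not the library's implementation; the function name `predict_with_means`, the toy ratings and the hand-written similarity matrix are all hypothetical.

```python
import numpy as np


def predict_with_means(ratings, sim, user, item, k=2):
    """Mean-centred user-based kNN prediction; `ratings` uses np.nan for missing."""
    user_means = np.nanmean(ratings, axis=1)
    # Candidate neighbours: other users who rated the item and have positive similarity
    candidates = [v for v in range(ratings.shape[0])
                  if v != user and not np.isnan(ratings[v, item]) and sim[user, v] > 0]
    # Keep the k most similar candidates
    neighbours = sorted(candidates, key=lambda v: sim[user, v], reverse=True)[:k]
    if not neighbours:
        return user_means[user]          # fall back to the user's own mean
    weights = np.array([sim[user, v] for v in neighbours])
    deviations = np.array([ratings[v, item] - user_means[v] for v in neighbours])
    return user_means[user] + weights @ deviations / weights.sum()


# Toy user x item ratings (made up); user 0 has not rated item 2
toy_ratings = np.array([
    [5.0, 4.0, np.nan],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
])

# Hypothetical user-user similarities, written by hand for the example
toy_sim = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

estimate = predict_with_means(toy_ratings, toy_sim, user=0, item=2, k=2)
print('Estimated rating of item 2 by user 0:', round(estimate, 3))
```

Setting `user_based` to `False` in the surprise code corresponds to running the same computation with the roles of users and items swapped.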