## Also-View Recommendation System with Graph Theory

In practice, explicit user-to-item interest scores are often hard to collect, and sometimes there is no other information available to help build a recommendation system. For example, when building a recommendation model for web pages, we may only have each user's cookie ID and the pages they visited. In that case, we can use clustering to build a basic item-item recommendation system.

An also-view recommender is an __item-to-item recommendation system__, and one way to implement it is with clustering. There are various ways to construct a clustering-based recommender. In this example, we use network analysis to partition an item-item graph built from the MovieLens dataset.



### Dataset
In this example, we use the MovieLens dataset (ml-100k).

source: https://grouplens.org/datasets/movielens/


### Package requirements

This example needs two extra packages:
- networkx
- python-louvain (imported as `community`, which provides the `community_louvain` module)

source: https://python-louvain.readthedocs.io/en/latest/api.html

In [1]:
import numpy as np
import pandas as pd
import itertools 
import math
import time
from collections import defaultdict
import networkx as nx
import matplotlib.pyplot as plt
import community.community_louvain as community_louvain


In [2]:
data_path = '../data/ml-100k/u.data'

# load the ratings data (user id, item id, rating, timestamp)
df = pd.read_csv(data_path, delimiter = '\t', names = ['userid', 'itemid', 'rating', 'timestamp'])
df = df[['userid', 'itemid', 'rating']]



To keep only strong co-preference signals in the graph, we set the minimum rating to 5. Ratings in ml-100k range from 1 to 5, so only 5-star ratings remain.

In [3]:
min_rating = 5
rated_movie = df[df['rating'] >= min_rating]
rated_movie.shape

(21201, 3)

In [4]:
rated_movie.head()

| | userid | itemid | rating |
|---|---|---|---|
| 7 | 253 | 465 | 5 |
| 11 | 286 | 1014 | 5 |
| 12 | 200 | 222 | 5 |
| 16 | 122 | 387 | 5 |
| 26 | 38 | 95 | 5 |


### Build the graph

To build the item-item graph, we first group the data by __userid__ and collect the list of items for each user. Next, we create the edges by generating every pair (2-combination) of items in each user's list. Here, the edge weight is the number of users who rated both items in the pair, counted after the rating filter above.

In [5]:
user_itemlist = rated_movie.groupby('userid')['itemid'].apply(list)

# edge_dict maps a sorted (item1, item2) pair to the number of users who rated both
edge_dict = defaultdict(int)
for item_list in user_itemlist:
    # sort first so each undirected pair always produces the same key
    for pair in itertools.combinations(sorted(item_list), 2):
        edge_dict[pair] += 1

len(edge_dict)

174005

In [6]:
edges = [(item1, item2, weight) for (item1, item2), weight in edge_dict.items()]

# edges contains the edge of (item1, item2, weight)
edges[:5]

[(1, 6, 2), (1, 9, 20), (1, 12, 24), (1, 13, 5), (1, 14, 11)]
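To make the edge-weight rule concrete, here is a minimal sketch of the same pair counting on a tiny made-up dataset. The user ids and item ids in `toy_user_items` are hypothetical and exist only for illustration; they are not taken from MovieLens.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: user id -> items that user rated highly
toy_user_items = {
    "u1": [3, 1, 2],
    "u2": [2, 3],
    "u3": [1, 3],
}

toy_edges = defaultdict(int)
for items in toy_user_items.values():
    # Sort so that (1, 3) and (3, 1) map to the same undirected edge key
    for pair in combinations(sorted(items), 2):
        toy_edges[pair] += 1

print(dict(toy_edges))
```

Items 1 and 3 are rated together by `u1` and `u3`, so the edge `(1, 3)` gets weight 2, while `(1, 2)` only appears for `u1` and gets weight 1.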

In [7]:
g = nx.Graph()
g.add_weighted_edges_from(edges)
print("Total number of graph nodes:", g.number_of_nodes())
print("Total number of graph edges:", g.number_of_edges())

# unweighted degree: the number of distinct neighbours of each node
degrees = [degree for _, degree in g.degree()]

print("Average node degree:", round(sum(degrees) / len(degrees), 2))

Total number of graph nodes: 1172
Total number of graph edges: 174005
Average node degree: 296.94
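As a quick sanity check on these numbers, the average degree of an undirected graph follows directly from the node and edge counts, because every edge contributes to the degree of exactly two nodes:

```latex
\bar{d} = \frac{2\,|E|}{|V|} = \frac{2 \times 174005}{1172} \approx 296.94
```

This matches the printed average, so on average each movie shares at least one 5-star rater with roughly a quarter of the 1172 movies in the graph.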


In [8]:
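# Louvain community detection; best_partition uses the 'weight' edge attribute by default
# and returns a dict mapping each node (item id) to a community id.
# The result can vary between runs unless random_state is set.
partitions = community_louvain.best_partition(g)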
values = list(partitions.values())
print('Number of communities:', len(np.unique(values)))

Number of communities: 4
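One possible way to turn a partition into also-view recommendations is sketched below: for a given item, rank the other items in the same community by their co-rating edge weight. This is only an illustrative sketch, not the notebook's own method. The function `also_view` and the toy inputs are hypothetical; the inputs are assumed to have the same shape as `edge_dict` (pair to weight) and `partitions` (item to community id).

```python
from collections import defaultdict

def also_view(item, edge_weights, partition, top_n=3):
    """Return up to top_n items from the same community as `item`,
    ranked by co-rating weight (hypothetical helper for illustration)."""
    if item not in partition:
        return []
    community = partition[item]
    scores = defaultdict(int)
    for (a, b), weight in edge_weights.items():
        if a == item:
            other = b
        elif b == item:
            other = a
        else:
            continue
        # Keep only neighbours assigned to the same community
        if partition.get(other) == community:
            scores[other] += weight
    # Highest weight first; break ties by item id for a stable order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

# Hypothetical toy inputs shaped like edge_dict and partitions
toy_edge_weights = {(1, 2): 5, (1, 3): 2, (1, 4): 9, (2, 3): 4}
toy_partition = {1: 0, 2: 0, 3: 0, 4: 1}

print(also_view(1, toy_edge_weights, toy_partition))
```

In the toy data, item 4 has the strongest link to item 1 but sits in a different community, so it is filtered out and the result is `[2, 3]`. Whether to restrict recommendations to a single community like this, or to use the community only as a tie-breaker, is a design choice.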


In [9]:
category_col = ["unknown","Action","Adventure","Animation","Children's","Comedy","Crime","Documentary","Drama","Fantasy","Film-Noir","Horror","Musical","Mystery","Romance","Sci-Fi","Thriller","War","Western"]

column_arr = ["movie id","movie title","release date","video release date","IMDb URL"] + category_col
item_data = pd.read_csv('../data/ml-100k/u.item', delimiter = '|', names =column_arr, encoding='latin1')

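# item_dict maps each movie id to the list of genres flagged with 1 in u.item
item_dict = defaultdict(list)
genre_start = len(column_arr) - len(category_col)  # index of the first genre column
item_data = item_data.to_numpy()

for d in item_data:
    res = []
    # only scan the genre flag columns, not the title, date, or URL columns
    for indx in range(genre_start, len(d)):
        if d[indx] == 1:
            res.append(column_arr[indx])

    item_dict[d[0]] = res
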

In [10]:
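# res_dict maps each community id to the genre lists of the movies assigned to it
res_dict = defaultdict(list)
for item_id, community_id in partitions.items():
    res_dict[community_id].append(item_dict[item_id])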


In [13]:
res_dict[0][:10]

[['Animation', "Children's", 'Comedy'],
 ['Drama'],
 ['Drama', 'Thriller'],
 ['Action', 'Adventure', 'Romance', 'Sci-Fi', 'War'],
 ['Crime', 'Drama', 'Romance', 'Thriller'],
 ['Drama'],
 ['Action', 'Adventure', 'Sci-Fi'],
 ['Drama'],
 ['Action', 'Sci-Fi', 'Thriller'],
 ['Comedy', 'Sci-Fi']]
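Raw genre lists are hard to compare across communities, so one option is to count how often each genre occurs within a community. The sketch below does this for the ten lists printed above. The helper name `genre_counts` is hypothetical, and in the notebook the same function could be applied to each `res_dict[community_id]`.

```python
from collections import Counter

def genre_counts(genre_lists):
    """Count how many movies in a community carry each genre
    (hypothetical helper for illustration)."""
    return Counter(genre for genres in genre_lists for genre in genres)

# The first ten genre lists of community 0, copied from the output above
sample = [
    ['Animation', "Children's", 'Comedy'],
    ['Drama'],
    ['Drama', 'Thriller'],
    ['Action', 'Adventure', 'Romance', 'Sci-Fi', 'War'],
    ['Crime', 'Drama', 'Romance', 'Thriller'],
    ['Drama'],
    ['Action', 'Adventure', 'Sci-Fi'],
    ['Drama'],
    ['Action', 'Sci-Fi', 'Thriller'],
    ['Comedy', 'Sci-Fi'],
]

counts = genre_counts(sample)
print(counts.most_common(2))  # [('Drama', 5), ('Sci-Fi', 4)]
```

Ten movies are too few to characterize a community, so these counts only illustrate the technique. Comparing the full distributions across all communities would show whether the partition separates genres at all.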