# Content-based Course Recommender System Using User Profile and Course Genres

One of the most common approaches in recommender systems is to recommend items to users based on their profiles. A user's profile reflects their tastes and preferences.

For my data, a user's profile is built from user ratings and from the number of times a user has clicked on or liked different items.

The recommendation process is based on the similarity of the items. To find the similarity between items, I must find the similarity between their content. Content is, for example, the genre, category, tag, or description.

For content-based systems, the basic ingredient is an algorithm that creates the user's profile.

# Prepare the lab

In [1]:
import pandas as pd
import numpy as np
from sklearn import preprocessing

In [2]:
# create the random state
rs = 123

# Example of how to create a user's profile

In [None]:
# Create the courses dataframe
course_genres = ['Python', 'Database', 'MachineLearning']
courses = [['Machine Learning with Python', 1, 0, 1], ["SQL with Python", 1, 1, 0]]
courses_df = pd.DataFrame(courses, columns = ['Title'] + course_genres)
courses_df.head(2)

,Title,Python,Database,MachineLearning
0,Machine Learning with Python,1,0,1
1,SQL with Python,1,1,0


In [4]:
# Create the rating dataframe
users = [['user0', 'Machine Learning with Python', 3], ['user1', 'SQL with Python', 2]]
users_df = pd.DataFrame(users, columns = ['User', 'Title', 'Rating'])
users_df

,User,Title,Rating
0,user0,Machine Learning with Python,3
1,user1,SQL with Python,2


In [5]:
# User 0 rated course 0 as 3 and course 1 as 0/NA (unknown or not interested)
u0 = np.array([[3, 0]])

In [6]:
# The course genre matrix (one row per course, one column per genre)
C = courses_df[['Python', 'Database', 'MachineLearning']].to_numpy()

In [7]:
print(f"User profile vector shape {u0.shape} and course genre matrix shape {C.shape}")

User profile vector shape (1, 2) and course genre matrix shape (2, 3)


If we multiply a $1 \times 2$ vector by a $2 \times 3$ matrix, we get a $1 \times 3$ vector, which is the user profile vector.

In [8]:
u0_weights = np.matmul(u0, C)
u0_weights

array([[3, 0, 3]], dtype=int64)
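
To make the multiplication concrete, the product for user0 can be written out term by term, using the values of `u0` and `C` defined above:

$$
u_0 C =
\begin{bmatrix} 3 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
=
\begin{bmatrix} 3 \cdot 1 + 0 \cdot 1 & 3 \cdot 0 + 0 \cdot 1 & 3 \cdot 1 + 0 \cdot 0 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 3 \end{bmatrix}
$$

Each entry sums the user's ratings over the courses that carry the corresponding genre, so a course the user has not rated (rating 0) contributes nothing.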

In [9]:
course_genres

['Python', 'Database', 'MachineLearning']

This `u0_weights` vector is also called the weighted genre vector and represents the user's interest in each genre based on the courses they have rated. As the result shows, user0 seems interested in `Python` and `MachineLearning`, each with a weight of 3.

In [10]:
# User 1 rated course 0 as 0 (unknown or not interested) and course 1 as 2
u1 = np.array([[0, 2]])

In [11]:
u1_weights = np.matmul(u1, C)
u1_weights

array([[2, 2, 0]], dtype=int64)

As we can see from the `u1_weights` vector, user1 seems interested in `Python` and `Database`, each with a weight of 2.

In [12]:
weights = np.concatenate((u0_weights.reshape(1, 3), u1_weights.reshape(1, 3)), axis=0)
profiles_df = pd.DataFrame(weights, columns=['Python', 'Database', 'MachineLearning'])
profiles_df.insert(0, 'user', ['user0', 'user1'])

In [13]:
profiles_df

,user,Python,Database,MachineLearning
0,user0,3,0,3
1,user1,2,2,0
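
The two profiles above were built by hand, one rating vector at a time. As a rough sketch, the same result could be obtained for all users at once by pivoting the ratings into a user-by-course matrix and multiplying it by the genre matrix. The names `ratings`, `genre_matrix`, and `all_profiles_df`, and the choices to treat unrated courses as 0 and to average duplicate ratings, are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

course_genres = ['Python', 'Database', 'MachineLearning']
courses_df = pd.DataFrame(
    [['Machine Learning with Python', 1, 0, 1], ['SQL with Python', 1, 1, 0]],
    columns=['Title'] + course_genres,
)
users_df = pd.DataFrame(
    [['user0', 'Machine Learning with Python', 3], ['user1', 'SQL with Python', 2]],
    columns=['User', 'Title', 'Rating'],
)

# Users x courses rating matrix. Assumption: an unrated course counts as 0,
# and repeated ratings of the same course by one user are averaged.
ratings = users_df.pivot_table(
    index='User', columns='Title', values='Rating', aggfunc='mean', fill_value=0
)
# Put the course columns in the same order as the rows of courses_df
ratings = ratings.reindex(columns=courses_df['Title'].tolist(), fill_value=0)

# Courses x genres matrix, indexed by title so the labels line up
genre_matrix = courses_df.set_index('Title')[course_genres]

# Users x genres: one weighted genre vector (profile) per user
all_profiles_df = ratings.dot(genre_matrix)
print(all_profiles_df)
```

The rows for `user0` and `user1` should match the `u0_weights` and `u1_weights` vectors computed earlier.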


# Generate recommendations

With the user profiles generated, we can see that `user0` is very interested in Python and machine learning, and `user1` is very interested in Python and database.

Now, suppose we published some new courses titled `Python 101`, `Database 101`, and `Machine Learning with R`:

In [14]:
new_courses = [['Python 101', 1, 0, 0], ["Database 101", 0, 1, 0], ["Machine Learning with R", 0, 0, 1]]
new_courses_df = pd.DataFrame(new_courses, columns = ['Title', 'Python', 'Database', 'MachineLearning'])
new_courses_df

,Title,Python,Database,MachineLearning
0,Python 101,1,0,0
1,Database 101,0,1,0
2,Machine Learning with R,0,0,1


In [15]:
profiles_df

,user,Python,Database,MachineLearning
0,user0,3,0,3
1,user1,2,2,0


First, convert the course genre dataframe into a 2-D numpy array:

In [16]:
# Drop the title column
new_courses_df = new_courses_df.loc[:, new_courses_df.columns != 'Title']
course_matrix = new_courses_df.values
course_matrix

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int64)

In [17]:
# course matrix shape
course_matrix.shape

(3, 3)

Convert the user profile dataframe into another 2-D numpy array:


In [18]:
# Drop the user column
profiles_df = profiles_df.loc[:, profiles_df.columns != 'user']
profile_matrix = profiles_df.values
profile_matrix

array([[3, 0, 3],
       [2, 2, 0]], dtype=int64)

In [19]:
profile_matrix.shape

(2, 3)

The profile matrix is a $2 \times 3$ matrix, and each row is a user profile vector.

If we multiply the $3 \times 3$ course matrix by the transpose of the user profile matrix, we get a $3 \times 2$ course recommendation matrix in which each element `(i, j)` is the recommendation score of course `i` for user `j`. Intuitively, if user `j` is interested in some topics (genres) and course `i` covers the same topics (genres), the user profile vector and the course genre vector share many common dimensions, so their dot product is likely to be large.


In [20]:
scores = np.matmul(course_matrix, profile_matrix.T)
scores

array([[3, 2],
       [0, 2],
       [3, 0]], dtype=int64)
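
Each score is a plain dot product between one course genre vector and one user profile vector. Writing $c_i$ for row $i$ of the course matrix and $p_j$ for row $j$ of the profile matrix (symbols introduced here only for illustration), the entries work out as:

$$
\text{score}(i, j) = c_i \cdot p_j = \sum_{k} c_{ik} \, p_{jk}
$$

$$
\text{score}(\text{Python 101}, \text{user1}) = 1 \cdot 2 + 0 \cdot 2 + 0 \cdot 0 = 2
$$

$$
\text{score}(\text{Machine Learning with R}, \text{user0}) = 0 \cdot 3 + 0 \cdot 0 + 1 \cdot 3 = 3
$$

Because each new course here has exactly one genre, its score for a user is simply that user's weight for that genre.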

Add the course titles and user IDs back to make the results clearer:

In [21]:
scores_df = pd.DataFrame(scores, columns=['User0', 'User1'])
scores_df.index = ['Python 101', 'Database 101', 'Machine Learning with R']

In [22]:
# recommendation score dataframe
scores_df

,User0,User1
Python 101,3,2
Database 101,0,2
Machine Learning with R,3,0


From the score results, we can see that:
- For user0, the recommended courses are `Python 101` and `Machine Learning with R`, because user0 is very interested in Python and machine learning.
- For user1, the recommended courses are `Python 101` and `Database 101`, because user1 seems very interested in topics like Python and databases.
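
The reading of the score table above can also be done programmatically. The following is a minimal sketch that keeps, for each user, the courses whose score reaches a cutoff. The cutoff `score_threshold` and the `recommendations` dictionary are hypothetical names chosen for this example; the notebook itself does not define a threshold.

```python
import pandas as pd

# Recommendation scores from the notebook (rows: courses, columns: users)
scores_df = pd.DataFrame(
    [[3, 2], [0, 2], [3, 0]],
    columns=['User0', 'User1'],
    index=['Python 101', 'Database 101', 'Machine Learning with R'],
)

# Hypothetical cutoff: any course with a score of at least 1 is recommended
score_threshold = 1

# Map each user to the list of course titles that pass the cutoff
recommendations = {}
for user in scores_df.columns:
    user_scores = scores_df[user]
    recommendations[user] = user_scores[user_scores >= score_threshold].index.tolist()

print(recommendations)
```

With real data, a higher threshold or a top-N cut per user would likely be more appropriate than keeping every nonzero score.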