# Recommender Systems

Recommender systems, or recommendation systems, are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item. They are widely used today: whenever you shop online, read books or watch movies, you are shown recommendations based on your past behaviour.

Let's introduce the idea with a concrete situation. Suppose there is a movie website where anybody can watch movies and rate them afterwards. Based on the viewers' ratings, we can guess which movies they would like and recommend those to them. This note shows how to solve that problem.

1, Dataset:
For this problem, the input dataset looks like this:
<img src="https://github.com/GongliDuan/exdata-data-household_power_consumption/blob/master/5.0.png?raw=true" alt="5.0.png">

Each column is a movie and each row is a viewer. The numbers range from 1 to 10 and indicate how much the viewer liked the movie. NaN means the viewer has not watched or rated that movie. The task is to predict these missing values and use the predictions to decide whether to recommend the movie to the viewer.
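
As a small illustration of this layout, the sketch below builds a toy ratings table and marks which entries are observed. The numbers and the names `ratings`, `R` and `Y` are made up for the example and are not taken from the dataset in the figure.

```python
import numpy as np

# Hypothetical toy data: 3 viewers (rows) x 4 movies (columns), NaN = not rated
ratings = np.array([
    [8.0, np.nan, 3.0, np.nan],
    [np.nan, 6.0, np.nan, 9.0],
    [7.0, 2.0, np.nan, np.nan],
])

# 1 where a rating exists, 0 where it is missing
R = (~np.isnan(ratings)).astype(int)

# Copy with NaN replaced by 0 so that later arithmetic does not propagate NaN;
# the original table is left untouched
Y = np.where(np.isnan(ratings), 0.0, ratings)

print(R)
print("observed ratings:", R.sum())
```

Building the mask from `np.isnan` before filling in zeros keeps "not rated" separate from any real rating value.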


2, Define the notation:

Let's first assume there are $\textbf{n}$ parameters (features) for each movie and each viewer.

m: number of movies

u: number of viewers

x: movie-feature matrix. It is m*n dimensional.

$\theta$: viewer-feature matrix. It is u*n dimensional.

To help visualize this: <img src="https://github.com/GongliDuan/exdata-data-household_power_consumption/blob/master/5.1.png?raw=true" alt="5.1.png">

y: the rating score matrix (the input matrix). $y^{(i,j)}$ is the rating for movie i by user j. Note that the dataset table above is laid out the other way round (rows are viewers, columns are movies), so it is the transpose of y.

R: a binary-valued indicator matrix, where R(i,j) = 1 if user j gave a rating to movie i, and R(i,j) = 0 otherwise.

3, Collaborative filtering learning algorithm

We already know that the predicted rating is $y^{(i,j)} = (\theta^{(j)})^T x^{(i)}$, which in matrix form is $x\theta^T$. Neither x nor $\theta$ is known in advance, however, so we have to estimate both of them at the same time.
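
To make the shapes concrete, here is a minimal sketch of the prediction step with random factor matrices. The sizes and the variable names `pred` and `single` are assumptions for illustration only.

```python
import numpy as np

m, u, n = 4, 3, 2  # movies, viewers, features (hypothetical sizes)
rng = np.random.default_rng(0)

x = rng.random((m, n))      # movie-feature matrix, m*n
theta = rng.random((u, n))  # viewer-feature matrix, u*n

# All predicted ratings at once: entry (i, j) is movie i as rated by viewer j
pred = x @ theta.T

# The same prediction for a single pair: movie i=2, viewer j=1
single = theta[1] @ x[2]

print(pred.shape)
print(np.isclose(pred[2, 1], single))
```

Each entry of the matrix product is the dot product of one movie's feature row with one viewer's feature row.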

Here is the cost function, with a regularization term, that we need to minimize:
<img src="https://github.com/GongliDuan/exdata-data-household_power_consumption/blob/master/5.2.png?raw=true" alt="5.2.png">
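
In case the figure does not render, the cost can also be written out in the notation defined above. This is the commonly used form for regularized collaborative filtering and is assumed, not verified, to match the figure; $\lambda$ is the regularization strength (`lam` in the code).

$$J(x,\theta) = \frac{1}{2}\sum_{(i,j):R(i,j)=1}\left((\theta^{(j)})^{T}x^{(i)} - y^{(i,j)}\right)^{2} + \frac{\lambda}{2}\sum_{i=1}^{m}\sum_{k=1}^{n}\left(x^{(i)}_{k}\right)^{2} + \frac{\lambda}{2}\sum_{j=1}^{u}\sum_{k=1}^{n}\left(\theta^{(j)}_{k}\right)^{2}$$

The first sum runs only over the pairs that actually have a rating, so missing entries do not contribute to the error.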

Here is the regularized gradient that we use to carry out the minimization:
<img src="https://github.com/GongliDuan/exdata-data-household_power_consumption/blob/master/5.3.png?raw=true" alt="5.3.png">
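
A common way to check a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does this under some assumptions: the cost is taken to be the usual regularized squared error over observed ratings, y is laid out as movies by viewers, and the helper names `cost`, `gradients` and `numeric_grad` are hypothetical.

```python
import numpy as np

def cost(x, theta, Y, R, lam):
    # Squared error over observed entries plus L2 penalties on both matrices
    err = (x @ theta.T - Y) * R
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(x ** 2) + np.sum(theta ** 2))

def gradients(x, theta, Y, R, lam):
    # Analytic gradients with respect to x (m*n) and theta (u*n)
    err = (x @ theta.T - Y) * R
    return err @ theta + lam * x, err.T @ x + lam * theta

def numeric_grad(f, arr, eps=1e-6):
    # Central finite differences, perturbing one entry of arr at a time in place
    g = np.zeros_like(arr)
    for idx in np.ndindex(*arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        up = f()
        arr[idx] = old - eps
        down = f()
        arr[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(0)
m, u, n, lam = 4, 3, 2, 1.0
x = rng.random((m, n))
theta = rng.random((u, n))
Y = rng.integers(1, 11, size=(m, u)).astype(float)  # made-up ratings in 1..10
R = (rng.random((m, u)) < 0.6).astype(float)        # made-up observation mask

grad_x, grad_theta = gradients(x, theta, Y, R, lam)
f = lambda: cost(x, theta, Y, R, lam)
num_x = numeric_grad(f, x)
num_theta = numeric_grad(f, theta)

print(np.max(np.abs(num_x - grad_x)), np.max(np.abs(num_theta - grad_theta)))
```

If the two printed differences are not close to zero, the analytic gradient and the cost function disagree, which is worth fixing before handing them to an optimizer.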

###### The following is the module, built with Python:

In [None]:

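import numpy as np
import scipy.optimize

class RS():
    
    def __init__(self,num_users,num_movies,num_features=10,lam=1):
        
        self.num_users=num_users
        self.num_movies=num_movies
        self.num_features=num_features
        # Both factor matrices start from random values in [0, 1)
        self.theta_matrix = np.random.rand(self.num_users,self.num_features)
        self.feature_matrix = np.random.rand(self.num_movies,self.num_features)
        self.R=None       # indicator matrix, set when data is passed in
        self.lam=lam      # regularization strength
        self.coef=None    # flat vector of fitted parameters, set by train()
                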
         
    # X is laid out like the dataset table: one row per viewer, one column per movie.
    # The prediction is therefore theta_matrix . feature_matrix^T (num_users x num_movies).

    def _prepare(self,X):
        # Build R (1 = rated, 0 = missing) and return a copy of X with NaN set to 0.
        # X itself is not modified.
        X=np.asarray(X,dtype=float)
        self.R=(~np.isnan(X)).astype(float)
        return np.where(np.isnan(X),0.0,X)

    def _unpack(self,theta):
        # theta is the flat vector the optimizer works on: the first num_users rows
        # are theta_matrix and the remaining rows are feature_matrix
        theta=np.reshape(theta,(self.num_users+self.num_movies,self.num_features))
        return theta[0:self.num_users], theta[self.num_users:]

    def cal_cost(self,theta,X):
        # Regularized squared error over the observed ratings only
        Y=self._prepare(X)
        users,movies=self._unpack(theta)
        err=np.multiply(np.dot(users,np.transpose(movies))-Y,self.R)
        J= 0.5*np.sum(np.square(err)) + \
        (self.lam/2)*np.sum(np.square(users)) + (self.lam/2)*np.sum(np.square(movies))

        return J

    def cal_grad(self,theta,X):
        # Gradient of cal_cost, returned as a flat vector in the same order as theta
        Y=self._prepare(X)
        users,movies=self._unpack(theta)
        err=np.multiply(np.dot(users,np.transpose(movies))-Y,self.R)
        grad_users=np.dot(err,movies) + self.lam*users
        grad_movies=np.dot(np.transpose(err),users) + self.lam*movies
        grad=np.concatenate((grad_users, grad_movies), axis=0)

        return grad.ravel()
        
    def train(self,X):
        # Fit the model. scipy.optimize.minimize expects a 1-D x0 and calls fun(x, *args)
        theta=np.concatenate((self.theta_matrix, self.feature_matrix), axis=0).ravel()
        result = scipy.optimize.minimize(fun=self.cal_cost, x0=theta, args=(X,),method='BFGS', jac=self.cal_grad,options={"maxiter": 100, "disp": False})
        self.coef = result.x
        self.theta_matrix,self.feature_matrix=self._unpack(result.x)
        return self
        
    def predict(self, X):
        # Predict.
        # The output is a viewer-movie rating matrix shaped like the input, with the
        # observed scores kept and the missing scores filled in by the prediction
        Y=self._prepare(X)
        users,movies=self._unpack(self.coef)
        y=np.dot(users,np.transpose(movies))
        y=np.multiply(y,1-self.R)+Y
        return y
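
For comparison, the same fit can be sketched without a class. This is an illustrative variant, not the module above: it assumes the usual regularized squared-error cost, lays y out as movies by viewers, uses made-up ratings, and returns cost and gradient from one hypothetical function `cost_and_grad` so that `jac=True` can be passed to `scipy.optimize.minimize`.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(params, Y, R, m, u, n, lam):
    # Unpack the flat parameter vector into x (m*n) and theta (u*n)
    x = params[:m * n].reshape(m, n)
    theta = params[m * n:].reshape(u, n)
    err = (x @ theta.T - Y) * R
    J = 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(x ** 2) + np.sum(theta ** 2))
    grad = np.concatenate([(err @ theta + lam * x).ravel(),
                           (err.T @ x + lam * theta).ravel()])
    return J, grad

# Hypothetical toy ratings: 4 movies (rows) x 3 viewers (columns), NaN = not rated
Y_raw = np.array([
    [8.0, np.nan, 7.0],
    [np.nan, 6.0, 2.0],
    [3.0, np.nan, np.nan],
    [np.nan, 9.0, np.nan],
])
R = (~np.isnan(Y_raw)).astype(float)
Y = np.where(np.isnan(Y_raw), 0.0, Y_raw)

m, u = Y.shape
n, lam = 2, 0.1
rng = np.random.default_rng(0)
x0 = rng.random(m * n + u * n)  # the optimizer needs a 1-D starting point

result = minimize(cost_and_grad, x0, args=(Y, R, m, u, n, lam),
                  method="BFGS", jac=True, options={"maxiter": 200})

x = result.x[:m * n].reshape(m, n)
theta = result.x[m * n:].reshape(u, n)
pred = x @ theta.T

# Keep the observed ratings and fill only the missing ones with predictions
filled = np.where(R == 1, Y, pred)
print(np.round(filled, 2))
```

Returning the cost and gradient together avoids computing the error matrix twice per iteration. With so few ratings the filled-in values are not meaningful; the point is only the mechanics of packing, optimizing and unpacking.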
        

