# Movie Recommender Modeling

author: Ben Sturm <br />
contact: bwsturm@gmail.com <br />
date: 6/16/2018

In this notebook, I plan to implement Andrew Ng's Collaborative Filtering method.  This is basically doing gradient descent using a method called Low rank matrix factorization.

In [1]:
import pandas as pd
import numpy as np

In [2]:
npzfile = np.load('Movie_data.npz')
print('npzfile.files: {}'.format(npzfile.files))

npzfile.files: ['Y', 'R']


In [3]:
Y = npzfile['Y']
R = npzfile['R']

In [4]:
# The "parameters" we are minimizing are both the elements of the
# X matrix (nm*nf) and of the Theta matrix (nu*nf)
# To use off-the-shelf minimizers we need to flatten these matrices
# into one long array
def flattenParams(myX, myTheta):
    """
    Hand this function an X matrix and a Theta matrix and it will flatten
    it into into one long (nm*nf + nu*nf,1) shaped numpy array
    """
    return np.concatenate((myX.flatten(),myTheta.flatten()))

# A utility function to re-shape the X and Theta will probably come in handy
def reshapeParams(flattened_XandTheta, mynm, mynu, mynf):
    assert flattened_XandTheta.shape[0] == int(nm*nf+nu*nf)
    
    reX = flattened_XandTheta[:int(mynm*mynf)].reshape((mynm,mynf))
    reTheta = flattened_XandTheta[int(mynm*mynf):].reshape((mynu,mynf))
    
    return reX, reTheta

Regularized cost function for collaborative filtering

In [5]:
def cofiCostFunc(myparams, myY, myR, mynu, mynm, mynf, mylambda = 0.):
    
    # Unfold the X and Theta matrices from the flattened params
    myX, myTheta = reshapeParams(myparams, mynm, mynu, mynf)
  
    # Note: 
    # X Shape is (nm x nf), Theta shape is (nu x nf), Y and R shape is (nm x nu)
    # Behold! Complete vectorization
    
    # First dot theta and X together such that you get a matrix the same shape as Y
    term1 = myX.dot(myTheta.T)
    
    # Then element-wise multiply that matrix by the R matrix
    # so only terms from movies which that user rated are counted in the cost
    term1 = np.multiply(term1,myR)
    
    # Then subtract the Y- matrix (which has 0 entries for non-rated
    # movies by each user, so no need to multiply that by myR... though, if
    # a user could rate a movie "0 stars" then myY would have to be element-
    # wise multiplied by myR as well) 
    # also square that whole term, sum all elements in the resulting matrix,
    # and multiply by 0.5 to get the cost
    cost = 0.5 * np.sum( np.square(term1-myY) )
    
    # Regularization stuff
    cost += (mylambda/2.) * np.sum(np.square(myTheta))
    cost += (mylambda/2.) * np.sum(np.square(myX))
    
    return cost

Regularized gradient function for collaborative filtering

In [6]:
# Remember: use the exact same input arguments for gradient function
# as for the cost function (the off-the-shelf minimizer requires this)
def cofiGrad(myparams, myY, myR, mynu, mynm, mynf, mylambda = 0.):
    
    # Unfold the X and Theta matrices from the flattened params
    myX, myTheta = reshapeParams(myparams, mynm, mynu, mynf)

    # First the X gradient term 
    # First dot theta and X together such that you get a matrix the same shape as Y
    term1 = myX.dot(myTheta.T)
    # Then multiply this term by myR to remove any components from movies that
    # weren't rated by that user
    term1 = np.multiply(term1,myR)
    # Now subtract the y matrix (which already has 0 for nonrated movies)
    term1 -= myY
    # Lastly dot this with Theta such that the resulting matrix has the
    # same shape as the X matrix
    Xgrad = term1.dot(myTheta)
    
    # Now the Theta gradient term (reusing the "term1" variable)
    Thetagrad = term1.T.dot(myX)

    # Regularization stuff
    Xgrad += mylambda * myX
    Thetagrad += mylambda * myTheta
    
    return flattenParams(Xgrad, Thetagrad)

Now reading in the movies csv file using pandas.

In [7]:
movies = pd.read_csv('movie_ids.csv')

In [8]:
movies.tail()

Unnamed: 0,index,movieId,title,genres
3491,3491,142488,Spotlight (2015),Thriller
3492,3492,146656,Creed (2015),Drama
3493,3493,148626,"Big Short, The (2015)",Drama
3494,3494,152077,10 Cloverfield Lane (2016),Thriller
3495,3495,152081,Zootopia (2016),Action|Adventure|Animation|Children|Comedy


In [9]:
# I'm now going to rate some movies

nm, nu = np.shape(Y)

In [11]:
my_ratings = np.zeros((nm,1))

# My first set of ratings
'''
my_ratings[16] = 4.5
my_ratings[58] = 4
my_ratings[7182] = 4
my_ratings[203] = 1
my_ratings[302] = 4
my_ratings[3854] = 5
my_ratings[3871] = 4
my_ratings[4601] = 5
my_ratings[1001] = 4.5
my_ratings[535] = 2
my_ratings[866] = 5
my_ratings[1000] = 4
my_ratings[237] = 4.5
my_ratings[5604] = 3.5
my_ratings[7594] = 3
my_ratings[4098] = 4
my_ratings[1484] = 4
my_ratings[1479] = 4.5
my_ratings[2492] = 4.5
my_ratings[4512] = 4.5
my_ratings[3228] = 4
my_ratings[5523] = 4.5
my_ratings[6684] = 5
my_ratings[5913] = 1
'''

# Second set of ratings to check if I get similar results to the tutorial notebook
my_ratings[52] = 4
my_ratings[104] = 4
my_ratings[211] = 3
my_ratings[232] = 4
my_ratings[248] = 3
my_ratings[613] = 5
my_ratings[724] = 4
my_ratings[725] = 5
my_ratings[160] = 1

In [12]:
# Add my ratings to the Y matrix, and the relevant row to the R matrix
myR_row = my_ratings > 0
Y = np.hstack((Y,my_ratings))
R = np.hstack((R,myR_row))

In [13]:
nm, nu = np.shape(Y)

In [14]:
def normalizeRatings(myY, myR):
    """
    Preprocess data by subtracting mean rating for every movie (every row)
    This is important because without this, a user who hasn't rated any movies
    will have a predicted score of 0 for every movie, when in reality
    they should have a predicted score of [average score of that movie].
    """

    # The mean is only counting movies that were rated
    Ymean = np.sum(myY,axis=1)/np.sum(myR,axis=1)
    Ymean = Ymean.reshape((Ymean.shape[0],1))
    
    return myY-Ymean, Ymean    

In [15]:
Ynorm, Ymean = normalizeRatings(Y,R)

In [16]:
import scipy.optimize

In [17]:
nf = 10
X = np.random.rand(nm,nf)
Theta = np.random.rand(nu,nf)

In [29]:
myflat = flattenParams(X, Theta)

# Regularization parameter of 10 is used (as used in the homework assignment)
mylambda = 10.

# Training the actual model with fmin_cg
result = scipy.optimize.fmin_ncg(cofiCostFunc, x0=myflat, fprime=cofiGrad, \
                               args=(Ynorm,R,nu,nm,nf,mylambda), \
                                maxiter=10,disp=True,full_output=True)

         Current function value: 13775482.461996
         Iterations: 1
         Function evaluations: 24
         Gradient evaluations: 19
         Hessian evaluations: 0


In [30]:
# Reshape the trained output into sensible "X" and "Theta" matrices
resX, resTheta = reshapeParams(result[0], nm, nu, nf)

In [31]:
# After training the model, now make recommendations by computing
# the predictions matrix
prediction_matrix = resX.dot(resTheta.T)

In [32]:
#prediction_matrix[0:5,0:5]
#Ymean[0:100]

In [33]:
# Grab the last user's predictions (since I put my predictions at the
# end of the Y matrix, not the front)
# Add back in the mean movie ratings
my_predictions = prediction_matrix[:,-1] + Ymean.flatten()

In [34]:
# Sort my predictions from highest to lowest
pred_idxs_sorted = np.argsort(my_predictions)
pred_idxs_sorted[:] = pred_idxs_sorted[::-1]

print("Top recommendations for you:")
for i in range(30):
    print('Predicting rating {0:.1f} for movie {1}.'.format(\
    my_predictions[pred_idxs_sorted[i]],movies.loc[pred_idxs_sorted[i],'title']))
    
print("\nOriginal ratings provided:")
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print('Rated {0:.1f} for movie {1}.'.format(my_ratings[i][0],movies.loc[i,'title']))
        

Top recommendations for you:
Predicting rating 5.3 for movie Inherit the Wind (1960).
Predicting rating 5.0 for movie Anne Frank Remembered (1995).
Predicting rating 4.7 for movie The Martian (2015).
Predicting rating 4.6 for movie Mister Roberts (1955).
Predicting rating 4.5 for movie Missing (1982).
Predicting rating 4.5 for movie Carnal Knowledge (1971).
Predicting rating 4.5 for movie Body of Lies (2008).
Predicting rating 4.5 for movie Her (2013).
Predicting rating 4.5 for movie Rush (2013).
Predicting rating 4.5 for movie Clear and Present Danger (1994).
Predicting rating 4.4 for movie Roman Holiday (1953).
Predicting rating 4.3 for movie To Have and Have Not (1944).
Predicting rating 4.3 for movie Trip to the Moon, A (Voyage dans la lune, Le) (1902).
Predicting rating 4.3 for movie Autumn Sonata (Höstsonaten) (1978).
Predicting rating 4.3 for movie Discreet Charm of the Bourgeoisie, The (Charme discret de la bourgeoisie, Le) (1972).
Predicting rating 4.3 for movie Coming Home (1

In [41]:
# Grab a user's predictions 
# Add back in the mean movie ratings
user1_predictions = prediction_matrix[:,1] + Ymean.flatten()

In [44]:
# Sort my predictions from highest to lowest
pred_idxs_sorted = np.argsort(user1_predictions)
pred_idxs_sorted[:] = pred_idxs_sorted[::-1]

print("Top recommendations for User1:")
for i in range(10):
    print('Predicting rating {0:.1f} for movie {1}.'.format(\
    my_predictions[pred_idxs_sorted[i]],movies.loc[pred_idxs_sorted[i],'title']))
    
print("\nOriginal ratings provided:")
for i in range(len(Y[:,1])):
    if Y[i,1] > 0:
        print('Rated {0:.1f} for movie {1}.'.format(Y[i][1],movies.loc[i,'title']))

Top recommendations for User1:
Predicting rating 5.3 for movie Inherit the Wind (1960).
Predicting rating 4.6 for movie Mister Roberts (1955).
Predicting rating 4.3 for movie Autumn Sonata (Höstsonaten) (1978).
Predicting rating 4.3 for movie Trip to the Moon, A (Voyage dans la lune, Le) (1902).
Predicting rating 4.4 for movie Roman Holiday (1953).
Predicting rating 3.9 for movie Jeffrey (1995).
Predicting rating 4.5 for movie Carnal Knowledge (1971).
Predicting rating 4.7 for movie The Martian (2015).
Predicting rating 4.1 for movie Fame (1980).
Predicting rating 3.9 for movie Paperman (2012).

Original ratings provided:
Rated 4.0 for movie GoldenEye (1995).
Rated 5.0 for movie Sense and Sensibility (1995).
Rated 5.0 for movie Clueless (1995).
Rated 4.0 for movie Seven (a.k.a. Se7en) (1995).
Rated 4.0 for movie Usual Suspects, The (1995).
Rated 3.0 for movie Mighty Aphrodite (1995).
Rated 3.0 for movie Mr. Holland's Opus (1995).
Rated 4.0 for movie Braveheart (1995).
Rated 3.0 for mov

Observations about my recommender system:

1. I recommended the move "Bandit Queen" even though only one user had given it a rating.  I think I should throw out any movie that hasn't been rated more than X times.  Perhaps X=5?  The only problem is that once I filter out these movies, then my Y and R matrices will have a new index to movie mapping.  Try to find a solution to this problem.
1. To be a decent recommender system, I need to be able to take into account the actual user preferences.  For instance, since I don't like violence, I really don't want my recommender system to recommend any violent films.  How to implement this?
1. I'm getting different results every time I run my recommender system.  I think the reason for this is (a) I'm using a different seed for theta and X every time and (b) it doesn't appear that my gradient decent algorithm is converging.  Ask Robert his thoughts on this.

In [39]:
result[0].shape

(41680,)

In [40]:
nu*nf+nm*nf

41680