# Matrix Factorization for Recommendations

## Topics

In this lesson, you will learn about three main topics:
1. We will look from a high level at how you might go about validating your recommendations.
2. We will look at matrix factorization as a method to use machine learning to make recommendations.
3. We will look at combining recommendation techniques to make predictions to existing and new users and for existing and new items.

As we go through this lesson, you will come to realize that there are a lot of difficulties in working with recommendation engines, which is part of what keeps them an exciting field to study! This is especially true when you tailor your recommendations to a specific product type.

Recommending movies, recommending restaurants, or recommending clothing might happen in a number of different ways. However, the techniques you will learn in this lesson are often extendable to any of these cases.

## Training and Testing Data For Recommendations

In the last lesson, you were making recommendations by providing a list of popular items, or a list of items that the user hadn't observed but that someone with similar tastes had observed. However, understanding whether these recommendations are good in practice means that you have to deploy them to users and see how they impact your metrics (sales, engagement, clicks, conversions, etc.).

You may not want your recommendations to go live just to understand how well they work. In these cases, you will want to split your data into training and testing portions: you train your recommendation engine on a subset of the data, then test how well it performs on a held-out test set before deploying your model to the world.
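One common way to make this split is by time: earlier ratings go into the training set and later ratings into the test set, which mimics predicting the future from the past. The sketch below assumes a pandas DataFrame with one row per rating and a sortable `date` column; the function name `time_based_split` and the toy data are hypothetical.

```python
import pandas as pd

def time_based_split(reviews, date_col="date", train_frac=0.8):
    """Hypothetical helper: earlier ratings train the model, later ratings test it.

    Assumes one row per rating and a sortable timestamp column named `date_col`.
    """
    ordered = reviews.sort_values(date_col).reset_index(drop=True)
    cutoff = int(len(ordered) * train_frac)
    train = ordered.iloc[:cutoff]
    test = ordered.iloc[cutoff:]
    return train, test

# Tiny made-up ratings table
reviews = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "movie_id": [10, 20, 10, 30, 20],
    "rating": [4, 5, 3, 2, 4],
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-15",
                            "2020-03-01", "2020-02-15"]),
})
train, test = time_based_split(reviews, train_frac=0.6)
print(len(train), len(test))
```

Splitting on time (rather than at random) avoids training on ratings that were made after the ones you are trying to predict.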

However, the cases you saw in the last lesson, where just a list of recommendations was provided, don't lend themselves very well to training and testing methods of evaluation. In the upcoming pages, you will be introduced to matrix factorization, which works quite well in these situations.

If you have user data from before and after the recommendations, you can evaluate a metric on the pre- and post-recommendation data.
<center><img src='mat_fact_01.png' width=500></center>

If you don't have user data, or if you have new users, you may need another approach, such as predicting how they would rate other movies. Some assumptions about this approach follow.
<center><img src='mat_fact_02.png' width=500></center>

## Common Method - Singular Value Decomposition (SVD)

* Provide a predicted rating value, not a list of recommendations
* Can use regression based metrics like MSE or MAE to assess performance
* Metrics can be computed before deploying our recommendations to our customers
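Because the output is a predicted rating, scoring it looks just like scoring a regression model. A minimal sketch, assuming missing ratings are stored as `np.nan` so that only observed ratings are scored (the helper name `rating_errors` and the numbers are made up for illustration):

```python
import numpy as np

def rating_errors(actual, predicted):
    """Return (MSE, MAE) over the entries where an actual rating exists."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(actual)          # ignore user-item pairs with no rating
    diff = actual[mask] - predicted[mask]
    return np.mean(diff ** 2), np.mean(np.abs(diff))

actual = np.array([[4, np.nan, 2],
                   [np.nan, 5, 3]])
predicted = np.array([[3.5, 4.0, 2.5],
                      [1.0, 4.0, 3.0]])
mse, mae = rating_errors(actual, predicted)
print(mse, mae)
```

The predictions for the two missing cells are simply not scored, since there is no true rating to compare them with.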

## Validating Your Recommendations

### Online Testing

For online methods of testing a recommender's performance, many of the methods you saw in the previous lesson work very well: you can deploy your recommendations and watch your metrics carefully. In practice, it is common to keep an "old" page that uses the existing recommendation strategy and compare it to a new page that uses a new recommendation strategy.

All the ideas associated with A/B testing that you learned in earlier lessons are critical for watching your metrics in online testing and, ultimately, for choosing the recommendation strategy that works best for your products and customers.

### Offline Testing

In many cases, a company might not let you simply deploy your recommendations out into the real world any time you feel like it. Testing out your recommendations in a training-testing environment prior to deploying them is called **offline** testing.

The recommendation methods you built in the previous lesson don't work very well for offline testing. In offline testing, a plain list of recommendations for each individual is not ideal, because we ultimately don't know whether a user doesn't use an item because they don't like it, or because they just haven't used it yet (but would like it). Rather, it would be great if we had an idea of how much each user would like each item, in the form of a predicted rating. Then we can compare this predicted rating to the actual rating an individual gives to an item in the future.

In the previous video, you saw an example of a user to whom we gave a list of movies that they still hadn't seen. Therefore, we couldn't tell how well we were doing with our recommendations. Techniques related to matrix factorization lend themselves nicely to solving this problem.

### User Groups

The final (possible) method of validating your recommendations is by having user groups give feedback on items you would recommend for them. Obtaining good user groups that are representative of your customers can be a challenge on its own. This is especially true when you have a lot of products and a very large consumer base.

### Quiz Question

For each metric below, identify whether the metric is generally used for **regression** or **classification**. In offline evaluation of our recommendation systems, we can often use the metrics associated with regression or classification algorithms.

<center><img src='mat_fact_03.png' width=500></center>
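To see how classification metrics can apply, one option is to turn ratings into "liked / not liked" labels with a cutoff and score the predictions as a classifier. This is a sketch only: the cutoff of 4 on a 1-5 scale and the helper `precision_recall_at_threshold` are hypothetical choices, not part of the lesson's method.

```python
import numpy as np

def precision_recall_at_threshold(actual, predicted, threshold):
    """Treat a rating >= threshold as 'liked' and score predictions as a classifier."""
    actual_liked = np.asarray(actual) >= threshold
    predicted_liked = np.asarray(predicted) >= threshold
    true_pos = np.sum(actual_liked & predicted_liked)
    # max(..., 1) avoids dividing by zero when nothing is predicted/actually liked
    precision = true_pos / max(np.sum(predicted_liked), 1)
    recall = true_pos / max(np.sum(actual_liked), 1)
    return precision, recall

actual = [5, 4, 2, 1, 4]
predicted = [4.6, 3.2, 3.9, 1.5, 4.1]
print(precision_recall_at_threshold(actual, predicted, threshold=4))  # hypothetical cutoff
```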

## Singular Value Decomposition - SVD

In the next part of this lesson, you will first get exposure to Singular Value Decomposition, or SVD. We will soon see why this technique falls short for many recommendation problems. However, understanding traditional SVD approaches to matrix factorization is useful as a start to a number of matrix factorization techniques that are possible in practice.

In order to implement SVD for many recommendation engines, we will need to use a slightly modified approach known as FunkSVD. This approach proved to work incredibly well during the [Netflix competition](https://en.wikipedia.org/wiki/Netflix_Prize), and therefore, it is one of the most popular recommendation approaches in use today.

Let's first take a closer look at traditional SVD.

## Latent Factors

* **Latent Factor:** Feature that isn't actually observed in the data, but can be inferred based on the relationships that occur
<center><img src="mat_fact_04.png" width=500></center>

When performing SVD, we create a matrix of users by items (or customers by movies in our specific example), with user ratings for each item scattered throughout the matrix. An example is shown in the image below.
<center><img src="mat_fact_05.png" width=500></center>

You can see that this matrix doesn't have any specific information about the users or items. Rather, it just holds the ratings that each user gave to each item. Using SVD on this matrix, we can find **latent features** related to the movies and customers. This is amazing because the dataset doesn't contain any information about the customers or movies!
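A user-item matrix like this can be built from a long table of ratings. The sketch below uses the same `groupby(...).max().unstack()` pattern that the `Recommender.fit` method later in this lesson uses; the tiny ratings table is made up. Pairs with no rating become `NaN`, which is exactly the "scattered" structure described above.

```python
import numpy as np
import pandas as pd

reviews = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "movie_id": [10, 20, 10, 10, 20, 30],
    "rating": [5, 3, 4, 2, 4, 5],
})

# Rows are users, columns are movies; unrated user-movie pairs become NaN.
user_item = reviews.groupby(["user_id", "movie_id"])["rating"].max().unstack()
print(user_item)
```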

### Latent Factor Question

Imagine a situation in which you collect 1-5 ratings data from a bunch of people related to how they feel about lots of different animals (birds, horses, cats, dogs, etc.). What are some examples of possible latent factors you might observe in this ratings data?
* The observed ratings values
* **The size of the animal**
* **How many legs the animal has**
* **Whether the animal can fly or not**
> The ratings are observed, so they are not latent! However, the other three variables are latent in that we wouldn't have raw data on these values by just collecting ratings of how a user feels about each animal.

### Singular Value Decomposition

Let's do a quick check of understanding. If we let $A$ be our user-item matrix, we can write the decomposition of that matrix in the following way.

$$A = U \Sigma V^{T}$$ 

Use the quizzes below to test your understanding of what these matrices represent, as well as the dimensions of these matrices.<br>
<left><img src="mat_fact_06.png" width=500></left> $ \rightarrow$ <right><img src="mat_fact_07.png" width=500></right>

#### $U$: index (users) relationship to latent features
<center><img src="mat_fact_08.png" width=500></center>

#### $V$: item (movies) relationship to latent features
<center><img src="mat_fact_09.png" width=500></center>

#### $\Sigma$: how important each latent feature is to reconstructing the original matrix

* Is square, with the same number of rows and columns ($k$) as the number of latent factors you decide to keep (e.g., 3: AI, dogs, sad)
* Diagonal values are non-negative and arranged in descending order
* Off-diagonal values are zero

<left><img src="mat_fact_10.png" width=500></left> $\rightarrow$ <right><img src="mat_fact_11.png" width=500></right>

#### Combining:
<left><img src="mat_fact_12.png" width=500></left> $\rightarrow$ <right><img src="mat_fact_13.png" width=500></right>
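NumPy can compute this decomposition for a fully observed matrix. A minimal sketch with made-up ratings: note that `np.linalg.svd` returns the singular values as a 1-D array (so $\Sigma$ is rebuilt with `np.diag`) and returns $V^{T}$ directly. With `full_matrices=False`, it keeps $k = \min(\text{users}, \text{movies})$ latent features.

```python
import numpy as np

# A small, fully observed user-item matrix (4 users x 3 movies); values are made up.
A = np.array([[5., 5., 1.],
              [4., 5., 2.],
              [1., 2., 5.],
              [2., 1., 4.]])

# full_matrices=False returns the "thin" SVD: U is 4x3, s has 3 values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)  # NumPy returns the singular values as a 1-D array

print(U.shape, Sigma.shape, Vt.shape)   # (4, 3) (3, 3) (3, 3)
print(s)                                # non-negative, in descending order
print(np.allclose(A, U @ Sigma @ Vt))   # True: the product recovers A
```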

#### Quiz:
<center><img src="mat_fact_14.png" width=500></center>

### Singular Value Decomposition Takeaways
Three main takeaways [from the previous notebook](07_matrix_factorization/1_Intro_to_SVD.ipynb):

#### 1. **The latent factors retrieved from SVD aren't actually labeled.**
There will not be a clear labeling of the latent features; understanding what each one represents requires investigation.

<center><img src="mat_fact_16.png" width=500></center>


#### 2. We can get an idea of how many latent factors we might want to keep by using the Sigma matrix.

The sum of squared diagonal elements tells us the total variability to be explained in the matrix.

By looking at the elements individually, we can see which latent features explain the most variability; we may not need all of them to reconstruct the original matrix well.

<center><img src="mat_fact_15.png" width=500></center>
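The sketch below applies this idea to a small made-up matrix: it computes the share of total variability explained by the first $k$ latent features (from the squared singular values) and rebuilds a rank-$k$ approximation by keeping only those features. The choice `k = 2` is arbitrary for illustration.

```python
import numpy as np

A = np.array([[5., 5., 1.],
              [4., 5., 2.],
              [1., 2., 5.],
              [2., 1., 4.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Total variability = sum of squared singular values (equals the sum of squared entries of A).
total = np.sum(s ** 2)
explained = np.cumsum(s ** 2) / total
print(explained)  # share of variability captured by the first 1, 2, 3 latent features

# Keep only the first k latent features and rebuild an approximation of A.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(A_k, 2))
```

The squared reconstruction error of `A_k` is exactly the squared singular value that was dropped, which is why the dropped features' share of variability tells you how much you lose.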

#### 3. SVD in NumPy will not work when our matrix has missing values. This makes this technique less than useful for our current user-movie matrix.

Most real user-item matrices will not be completely filled, and without a complete matrix we cannot compute the eigenvalues and eigenvectors the decomposition relies on. We'll see that Funk SVD works even when values are missing.

<center><img src="mat_fact_17.png" width=500></center>

### SVD Closed Form Solution

**What Is A Closed Form Solution?**<br>
A closed form solution is one where you can directly compute the solution values from an equation, with no iterative procedure needed (iterative solutions are what is commonly used in practice). One of the most popular examples of a closed form solution is the solution for multiple linear regression. That is, if we want to find an estimate for $\beta$ in the following situation: $y=X\beta$

We can find it by computing the least squares estimate, which under the standard linear regression assumptions is the **Best Linear Unbiased Estimate (BLUE)**. It can be found **in closed form** using the equation: $\hat{\beta}=\left(X^{T}X \right)^{-1}X^{T}y$

Where **X** is a matrix of explanatory inputs and **y** is a response vector.
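As a quick check of the closed form, the sketch below generates synthetic data with known coefficients and computes $\hat{\beta}$ directly. It solves the linear system $X^{T}X\beta = X^{T}y$ with `np.linalg.solve` rather than forming the inverse explicitly (same answer, numerically safer), then compares against NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
# Design matrix with an intercept column and two explanatory inputs (synthetic data).
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + rng.normal(scale=0.1, size=50)

# Closed form: beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1, 2, -3]

# Same answer from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```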

Another common example of a closed form solution comes from quadratic equations. If we want to find **x** that solves: $ax^{2}+bx+c=0$

We can find these values using the quadratic formula: $x= \frac{-b \pm \sqrt{ b^{2} - 4ac}}{2a}$

**Each of these is an example of a closed form solution, because in each case we have an equation that allows us to solve directly for our values of interest.**

**Closed Form Solutions for SVD**
It turns out there is a closed form solution for Singular Value Decomposition that can be used to identify each of the matrices of interest ($U,\Sigma,V$). The most straightforward explanation of this closed form solution can be found at [this MIT link](http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm).

As the MIT page puts it:

"Calculating the SVD consists of finding the eigenvalues and eigenvectors of $AA^{T}$ and $A^{T}A$. The eigenvectors of $A^{T}A$ make up the columns of $V$, the eigenvectors of $AA^{T}$ make up the columns of $U$. Also, the singular values in $\Sigma$ are square roots of eigenvalues from $AA^{T}$ or $A^{T}A$. The singular values are the diagonal entries of the $\Sigma$ matrix and are arranged in descending order. The singular values are always real numbers. If the matrix $A$ is a real matrix, then $U$ and $V$ are also real"

Again, you can see a fully worked example of the closed form solution at the [MIT link here](http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm).
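The quoted relationship is easy to verify numerically. This sketch (made-up matrix) compares the singular values from `np.linalg.svd` with the square roots of the eigenvalues of $A^{T}A$. Because `np.linalg.eigh` returns eigenvalues in ascending order, they are reversed to match the descending order of $\Sigma$.

```python
import numpy as np

A = np.array([[5., 5., 1.],
              [4., 5., 2.],
              [1., 2., 5.],
              [2., 1., 4.]])

# Singular values straight from NumPy's SVD routine (descending order).
s = np.linalg.svd(A, compute_uv=False)

# Eigen-decomposition of the symmetric matrix A^T A; eigh returns ascending eigenvalues.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
from_eig = np.sqrt(eigvals[::-1])  # reverse to descending, then take square roots

print(s)
print(from_eig)
print(np.allclose(s, from_eig))  # True
```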

**A More Common Approach**
The main issue with the closed form solution (especially for us) is that it doesn't actually work when we have missing data. Instead, Simon Funk (and then many followers) came up with other solutions for finding our matrices of interest in these cases using **gradient descent**.

So all of this is to say, people rarely use the closed form solution for SVD in practice, and therefore, we aren't going to spend a lot of time on it either; the link above covers what you need. Now, we are going to look at how the matrices are estimated in practice, since this is the approach used for estimating values in FunkSVD.

**Additional Resources**
Below are some additional resources in case you are looking for others that go beyond what was shown in the simplified MIT paper.
* [Stanford Discussion on SVD](http://infolab.stanford.edu/~ullman/mmds/ch11.pdf)
* [Why are Singular Values Always Non-Negative on StackExchange](https://math.stackexchange.com/questions/2060572/why-are-singular-values-always-non-negative)
* [An additional resource for SVD in Python](https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/)
* [Using Missing Values to Improve Recommendations in SVD](https://www.hindawi.com/journals/mpe/2015/380472/)

### Funk SVD Practice

The issue Funk SVD solves is that the user-item matrix we want to decompose has missing values.

<center><img src="mat_fact_21.png" width=500></center>

Previously, replacing all missing values with zero was common.

<center><img src="mat_fact_22.png" width=500></center>

Instead, Funk SVD ignores the missing values.

<center><img src="mat_fact_23.png" width=500></center>

For this approach, we work with two matrices like the ones we saw earlier. To begin, we fill each of these matrices with random values:

<center><img src="mat_fact_24.png" width=500></center>

Then we search our user/item matrix for a rating that already exists:

<center><img src="mat_fact_25.png" width=500></center>

When we find the rating, we take the dot product of the matching user row and item column in the randomly filled matrices:

<center><img src="mat_fact_26.png" width=500></center>

That will give us a prediction for the rating. You'll notice there is a difference between the predicted rating and the true rating. This is the error and we will use it to update the values of the $U$ and $V$ matrices with gradient descent.

<center><img src="mat_fact_27.png" width=500></center>

<center><img src="mat_fact_28.png" width=500></center>

* $\left( y - uv \right)$ is the error, the difference between the `actual` and `predicted` values.

* $u_{i}$ and $v_{i}$ are the $i$-th values of the user's row in $U$ and the item's column in $V$, respectively.

#### An example of applied Gradient Descent

<center><img src="mat_fact_29.png" width=500></center>

<center><img src="mat_fact_30.png" width=500></center>

We then perform this same update for every value in the user's row of $U$ and the item's column of $V$. Note that the values produced by gradient descent on one matrix (in this case $U$) influence the values updated in the second matrix ($V$ in this case).

<center><img src="mat_fact_31.png" width=500></center>

Yielding this at the end of the first iteration:

<center><img src="mat_fact_32.png" width=500></center>

We then start over, finding the next non-missing value and updating our $U$ and $V$ matrices again.

<center><img src="mat_fact_33.png" width=500></center>
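The steps above, for a single known rating, can be sketched as one function. This is a minimal illustration, assuming $U$ is stored as users × latent features and $V$ as latent features × items (as in the FunkSVD code later in this lesson); the name `funk_svd_step` and the random starting values are hypothetical.

```python
import numpy as np

def funk_svd_step(U, V, i, j, rating, learning_rate=0.01):
    """One gradient-descent update for the observed rating of user i on item j.

    U is users x k and V is k x items, so the prediction is U[i, :] @ V[:, j].
    U[i, :] and V[:, j] are updated in place; the pre-update error is returned.
    """
    error = rating - U[i, :] @ V[:, j]
    for k in range(U.shape[1]):
        # The freshly updated U[i, k] is used when updating V[k, j],
        # matching the update order in the FunkSVD code later in this lesson.
        U[i, k] += learning_rate * 2 * error * V[k, j]
        V[k, j] += learning_rate * 2 * error * U[i, k]
    return error

rng = np.random.default_rng(42)
U = rng.random((3, 2))   # 3 users, 2 latent features (random start)
V = rng.random((2, 4))   # 2 latent features, 4 items (random start)

before = 4 - U[1, :] @ V[:, 2]          # error on a known rating of 4
funk_svd_step(U, V, i=1, j=2, rating=4)
after = 4 - U[1, :] @ V[:, 2]
print(abs(before), abs(after))          # the error shrinks after one step
```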

In the notebook on the next page, you will be writing the code to implement Funk SVD. Before you dive in, let's do a practice run here.

First, consider a user-item matrix like the one below, where we want to update the $U$ and $V$ matrices based on the highlighted rating of 4.

<center><img src="mat_fact_18.png" width=500></center>

Also consider the following $U$ and $V$ matrices:
<center><img src="mat_fact_19.png" width=500></center>

Use the quiz below to identify what your current prediction would be for the 4 rating (highlighted in the first image above) based on the current values in the $U$ and $V$ matrices. Notice, this should be the prediction for the third user on the movie AI.

Q: What is the predicted value for the `third user` on the movie `AI`, based on the current $U$ and $V$ matrices? -1.8
> * By taking the dot product of the third user's row in $U$ and the movie AI's column in $V$, we can find the predicted rating. We aren't doing very well with the random values!
> * $(0.4 \cdot 1.2) + (1.1 \cdot -0.9) + (-1.2 \cdot 1.1) = -1.83$

Next, let's look at updating the value 0.4 as shown in the U matrix below.
<center><img src="mat_fact_20.png" width=500></center>

Q: Select the formula that would give the new, updated value to replace the 0.4 value, using a learning rate of 0.01: $u_{\text{new}}=0.4+0.01 \cdot 2 \cdot (4-(-1.8)) \cdot 1.2$
> * This is the same process you will use to update each value in the matrix: `u_new = u_old + {learn_rate * 2 * (actual - pred) * v_old}`
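The two quiz answers can be checked with a few lines of NumPy, using the values from the quiz (the third user's row of $U$ and the movie AI's column of $V$). Note that the prediction is negative, so the error is $4-(-1.83)$, not $4-1.83$.

```python
import numpy as np

user_row = np.array([0.4, 1.1, -1.2])   # third user's row of U (from the quiz)
movie_col = np.array([1.2, -0.9, 1.1])  # movie AI's column of V (from the quiz)
actual = 4
learning_rate = 0.01

pred = user_row @ movie_col
print(round(pred, 2))  # -1.83

# Update the first latent value of the user (the 0.4 entry).
new_val = user_row[0] + learning_rate * 2 * (actual - pred) * movie_col[0]
print(round(new_val, 4))  # 0.5399
```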

[You can now see this applied in this notebook.](07_matrix_factorization/2_Implementing_FunkSVD_Solution.ipynb)

#### Funk SVD Algorithm by hand

```python
import numpy as np

def FunkSVD(ratings_mat, latent_features=4, learning_rate=0.0001, iters=100):
    '''
    This function performs matrix factorization using a basic form of FunkSVD with no regularization
    
    INPUT:
    ratings_mat - (numpy array) a matrix with users as rows, movies as columns, and ratings as values
    latent_features - (int) the number of latent features used
    learning_rate - (float) the learning rate 
    iters - (int) the number of iterations
    
    OUTPUT:
    user_mat - (numpy array) a user by latent feature matrix
    movie_mat - (numpy array) a latent feature by movie matrix
    '''
    
    # Set up useful values to be used through the rest of the function
    n_users = ratings_mat.shape[0]
    n_movies = ratings_mat.shape[1]
    num_ratings = np.count_nonzero(~np.isnan(ratings_mat))
    
    # initialize the user and movie matrices with random values
    user_mat = np.random.rand(n_users, latent_features)
    movie_mat = np.random.rand(latent_features, n_movies)
    
    # initialize sse at 0 for first iteration
    sse_accum = 0
    
    # header for running results
    print("Optimization Statistics")
    print("Iterations | Mean Squared Error ")
    
    # for each iteration
    for iteration in range(iters):

        # reset the sum of squared errors for this iteration
        sse_accum = 0
        
        # For each user-movie pair
        for i in range(n_users):
            for j in range(n_movies):
                
                # if the rating exists (missing ratings are stored as NaN)
                if not np.isnan(ratings_mat[i, j]):
                    
                    # compute the error as the actual minus the dot product of the user and movie latent features
                    diff = ratings_mat[i, j] - np.dot(user_mat[i, :], movie_mat[:, j])
                    
                    # Keep track of the sum of squared errors for the matrix
                    sse_accum += diff**2
                    
                    # update the values in each matrix in the direction of the gradient
                    for k in range(latent_features):
                        user_mat[i, k] += learning_rate * (2*diff*movie_mat[k, j])
                        movie_mat[k, j] += learning_rate * (2*diff*user_mat[i, k])

        # print results for iteration
        print("%d \t\t %f" % (iteration+1, sse_accum / num_ratings))
        
    return user_mat, movie_mat 
```
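To evaluate Funk SVD offline, you can hide some known ratings, fit on the rest, and measure the error only on non-missing entries. The sketch below is a silent, seeded variant of the function above (same update rule, no printing); `funk_svd`, `masked_mse`, and the tiny ratings matrix are hypothetical names and data for illustration, and `np.argwhere` is used to visit only the observed ratings.

```python
import numpy as np

def funk_svd(ratings_mat, latent_features=2, learning_rate=0.01, iters=200, seed=0):
    """Silent variant of the FunkSVD function above: same updates, seeded, no printing."""
    rng = np.random.default_rng(seed)
    n_users, n_movies = ratings_mat.shape
    user_mat = rng.random((n_users, latent_features))
    movie_mat = rng.random((latent_features, n_movies))
    observed = np.argwhere(~np.isnan(ratings_mat))  # (row, col) pairs of known ratings
    for _ in range(iters):
        for i, j in observed:
            diff = ratings_mat[i, j] - user_mat[i, :] @ movie_mat[:, j]
            for k in range(latent_features):
                user_mat[i, k] += learning_rate * 2 * diff * movie_mat[k, j]
                movie_mat[k, j] += learning_rate * 2 * diff * user_mat[i, k]
    return user_mat, movie_mat

def masked_mse(ratings_mat, user_mat, movie_mat):
    """Mean squared error over the non-missing entries only."""
    mask = ~np.isnan(ratings_mat)
    preds = user_mat @ movie_mat
    return np.mean((ratings_mat[mask] - preds[mask]) ** 2)

ratings = np.array([[5, 4, np.nan, 1],
                    [4, np.nan, 1, 1],
                    [1, 1, np.nan, 5],
                    [np.nan, 1, 5, 4]], dtype=float)

# Hold out one known rating (user 0, movie 1) to act as a tiny test set.
train = ratings.copy()
train[0, 1] = np.nan

# Same seed and draw order as funk_svd, so this is the error of the random start.
rng = np.random.default_rng(0)
before = masked_mse(train, rng.random((4, 2)), rng.random((2, 4)))
user_mat, movie_mat = funk_svd(train)
print("train MSE before:", before, "after:", masked_mse(train, user_mat, movie_mat))
print("held-out prediction:", user_mat[0, :] @ movie_mat[:, 1], "actual:", ratings[0, 1])
```

On real data you would hold out many ratings (for example, the later ones in time) and report the MSE over all of them.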

### Cold Start Problem

The **cold start problem** is the problem that *new users* and *new items* to a platform don't have any ratings. Because these users and items don't have any ratings, it is impossible to use collaborative filtering methods to make recommendations.

Therefore, methods you used in the previous lesson (like rank-based and content-based recommenders) are the only way to get started with making recommendations for these individuals.
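In an offline test, it helps to know up front which test ratings are cold-start cases, since Funk SVD has no row or column for them. A minimal sketch, assuming train and test DataFrames with `user_id` and `movie_id` columns; `split_by_cold_start` is a hypothetical helper name.

```python
import pandas as pd

def split_by_cold_start(train, test):
    """Flag test ratings whose user or movie never appears in the training data.

    Rows flagged True cannot be scored by FunkSVD and need a fallback
    (for example, a rank-based or content-based recommendation).
    """
    known_users = set(train["user_id"])
    known_movies = set(train["movie_id"])
    cold = ~(test["user_id"].isin(known_users) & test["movie_id"].isin(known_movies))
    return test.assign(cold_start=cold)

train = pd.DataFrame({"user_id": [1, 1, 2], "movie_id": [10, 20, 10]})
test = pd.DataFrame({"user_id": [1, 3, 2], "movie_id": [10, 10, 30]})
print(split_by_cold_start(train, test))
```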

**Pros of SVD**
* Predicted ratings for all user-movie pairs
> * Even when the total number of ratings is very sparse
* Regression metrics to measure how well our predictions match actual values

However, if there is no data for a user or an item, perhaps because they are brand new to the system, then we must do something else. This is known as the *cold start problem*.
<center><img src="mat_fact_34.png" width=500></center>

[In this notebook, you can see how to blend collaborative filtering and Funk SVD to deal with recommending items with a cold start problem](07_matrix_factorization/4_Cold_Start_Problem_Solution.ipynb)

### Making a module for recommending movies

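First, set up the helper functions (the module below imports them as `recommender_functions`, so save them in `recommender_functions.py`):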
```python
import numpy as np
import pandas as pd

def get_movie_names(movie_ids, movies_df):
    '''
    INPUT
    movie_ids - a list of movie_ids
    movies_df - original movies dataframe
    OUTPUT
    movies - a list of movie names associated with the movie_ids

    '''
    # Look up the titles of the requested movie ids
    movie_lst = list(movies_df[movies_df['movie_id'].isin(movie_ids)]['movie'])

    return movie_lst


def create_ranked_df(movies, reviews):
    '''
    INPUT
    movies - the movies dataframe
    reviews - the reviews dataframe

    OUTPUT
    ranked_movies - a dataframe with movies that are sorted by highest avg rating, more reviews, then time, and must have more than 4 ratings
    '''

    # Pull the average ratings and number of ratings for each movie
    movie_ratings = reviews.groupby('movie_id')['rating']
    avg_ratings = movie_ratings.mean()
    num_ratings = movie_ratings.count()
    last_rating = pd.DataFrame(reviews.groupby('movie_id')['date'].max())
    last_rating.columns = ['last_rating']

    # Add Dates
    rating_count_df = pd.DataFrame({'avg_rating': avg_ratings, 'num_ratings': num_ratings})
    rating_count_df = rating_count_df.join(last_rating)

    # merge with the movies dataset
    movie_recs = movies.set_index('movie_id').join(rating_count_df)

    # sort by top avg rating and number of ratings
    ranked_movies = movie_recs.sort_values(['avg_rating', 'num_ratings', 'last_rating'], ascending=False)

    # for edge cases - subset the movie list to those with only 5 or more reviews
    ranked_movies = ranked_movies[ranked_movies['num_ratings'] > 4]

    return ranked_movies


def find_similar_movies(movie_id, movies_df):
    '''
    INPUT
    movie_id - a movie_id
    movies_df - original movies dataframe
    OUTPUT
    similar_movies - an array of the most similar movies by title
    '''
    # dot product to get similar movies
    movie_content = np.array(movies_df.iloc[:,4:])
    dot_prod_movies = movie_content.dot(np.transpose(movie_content))

    # find the row of each movie id
    movie_idx = np.where(movies_df['movie_id'] == movie_id)[0][0]

    # find the most similar movie indices - to start I said they need to be the same for all content
    similar_idxs = np.where(dot_prod_movies[movie_idx] == np.max(dot_prod_movies[movie_idx]))[0]

    # pull the movie titles based on the indices
    similar_movies = np.array(movies_df.iloc[similar_idxs, ]['movie'])

    return similar_movies


def popular_recommendations(user_id, n_top, ranked_movies):
    '''
    INPUT:
    user_id - the user_id (str) of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    ranked_movies - a pandas dataframe of the already ranked movies based on avg rating, count, and time

    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''

    top_movies = list(ranked_movies['movie'][:n_top])

    return top_movies
```

Then, create a module that allows for fitting and predicting.

```python
import numpy as np
import pandas as pd
import recommender_functions as rf
import sys # can use sys to take command line arguments

class Recommender():
    '''
    This Recommender uses FunkSVD to make predictions of exact ratings.  And uses either FunkSVD or a Knowledge Based recommendation (highest ranked) to make recommendations for users.  Finally, if given a movie, the recommender will provide movies that are most similar as a Content Based Recommender.
    '''
    def __init__(self):
        '''
        I didn't have any required attributes needed when creating my class.
        '''


    def fit(self, reviews_pth, movies_pth, latent_features=12, learning_rate=0.0001, iters=100):
        '''
        This function performs matrix factorization using a basic form of FunkSVD with no regularization

        INPUT:
        reviews_pth - path to csv with at least the four columns: 'user_id', 'movie_id', 'rating', 'timestamp'
        movies_pth - path to csv with each movie and movie information in each row
        latent_features - (int) the number of latent features used
        learning_rate - (float) the learning rate
        iters - (int) the number of iterations

        OUTPUT:
        None - stores the following as attributes:
        n_users - the number of users (int)
        n_movies - the number of movies (int)
        num_ratings - the number of ratings made (int)
        reviews - dataframe with four columns: 'user_id', 'movie_id', 'rating', 'timestamp'
        movies - dataframe of movie information
        user_item_mat - (np array) a user by item numpy array with ratings and nans for values
        latent_features - (int) the number of latent features used
        learning_rate - (float) the learning rate
        iters - (int) the number of iterations
        '''
        # Store inputs as attributes
        self.reviews = pd.read_csv(reviews_pth)
        self.movies = pd.read_csv(movies_pth)

        # Create user-item matrix
        usr_itm = self.reviews[['user_id', 'movie_id', 'rating', 'timestamp']]
        self.user_item_df = usr_itm.groupby(['user_id','movie_id'])['rating'].max().unstack()
        self.user_item_mat= np.array(self.user_item_df)

        # Store more inputs
        self.latent_features = latent_features
        self.learning_rate = learning_rate
        self.iters = iters

        # Set up useful values to be used through the rest of the function
        self.n_users = self.user_item_mat.shape[0]
        self.n_movies = self.user_item_mat.shape[1]
        self.num_ratings = np.count_nonzero(~np.isnan(self.user_item_mat))
        self.user_ids_series = np.array(self.user_item_df.index)
        self.movie_ids_series = np.array(self.user_item_df.columns)

        # initialize the user and movie matrices with random values
        user_mat = np.random.rand(self.n_users, self.latent_features)
        movie_mat = np.random.rand(self.latent_features, self.n_movies)

        # initialize sse at 0 for first iteration
        sse_accum = 0

        # keep track of iteration and MSE
        print("Optimization Statistics")
        print("Iterations | Mean Squared Error ")

        # for each iteration
        for iteration in range(self.iters):

            # reset the sum of squared errors for this iteration
            sse_accum = 0

            # For each user-movie pair
            for i in range(self.n_users):
                for j in range(self.n_movies):

                    # if the rating exists (missing ratings are stored as NaN)
                    if not np.isnan(self.user_item_mat[i, j]):

                        # compute the error as the actual minus the dot product of the user and movie latent features
                        diff = self.user_item_mat[i, j] - np.dot(user_mat[i, :], movie_mat[:, j])

                        # Keep track of the sum of squared errors for the matrix
                        sse_accum += diff**2

                        # update the values in each matrix in the direction of the gradient
                        for k in range(self.latent_features):
                            user_mat[i, k] += self.learning_rate * (2*diff*movie_mat[k, j])
                            movie_mat[k, j] += self.learning_rate * (2*diff*user_mat[i, k])

            # print results
            print("%d \t\t %f" % (iteration+1, sse_accum / self.num_ratings))

        # SVD based fit
        # Keep user_mat and movie_mat for safe keeping
        self.user_mat = user_mat
        self.movie_mat = movie_mat

        # Knowledge based fit
        self.ranked_movies = rf.create_ranked_df(self.movies, self.reviews)


    def predict_rating(self, user_id, movie_id):
        '''
        INPUT:
        user_id - the user_id from the reviews df
        movie_id - the movie_id according to the movies df

        OUTPUT:
        pred - the predicted rating for user_id-movie_id according to FunkSVD
        '''
        try:
            # Find the user's row and the movie's column in the fitted matrices
            user_row = np.where(self.user_ids_series == user_id)[0][0]
            movie_col = np.where(self.movie_ids_series == movie_id)[0][0]

            # Take dot product of that row and column in U and V to make prediction
            pred = np.dot(self.user_mat[user_row, :], self.movie_mat[:, movie_col])

            movie_name = self.movies.loc[self.movies['movie_id'] == movie_id, 'movie'].iloc[0]
            print("For user {} we predict a {} rating for the movie {}.".format(user_id, round(pred, 2), movie_name))

            return pred

        except IndexError:
            # np.where(...)[0][0] or .iloc[0] raises IndexError when an id is not found
            print("I'm sorry, but a prediction cannot be made for this user-movie pair.  It looks like one of these items does not exist in our current database.")

            return None


    def make_recommendations(self, _id, _id_type='movie', rec_num=5):
        '''
        INPUT:
        _id - either a user or movie id (int)
        _id_type - "movie" or "user" (str)
        rec_num - number of recommendations to return (int)

        OUTPUT:
        recs - (array) a list or numpy array of recommended movies like the
                       given movie, or recs for a user_id given
        '''
        # if the user is available from the matrix factorization data,
        # I will use this and rank movies based on the predicted values
        # For use with user indexing
        rec_ids, rec_names = None, None
        if _id_type == 'user':
            if _id in self.user_ids_series:
                # Get the index of which row the user is in for use in U matrix
                idx = np.where(self.user_ids_series == _id)[0][0]

                # take the dot product of that row and the V matrix
                preds = np.dot(self.user_mat[idx,:],self.movie_mat)

                # pull the top movies according to the prediction
                indices = preds.argsort()[-rec_num:][::-1] #indices
                rec_ids = self.movie_ids_series[indices]
                rec_names = rf.get_movie_names(rec_ids, self.movies)

            else:
                # if we don't have this user, give just top ratings back
                rec_names = rf.popular_recommendations(_id, rec_num, self.ranked_movies)
                print("Because this user wasn't in our database, we are giving back the top movie recommendations for all users.")

        # Find similar movies if it is a movie that is passed
        else:
            if _id in self.movie_ids_series:
                rec_names = list(rf.find_similar_movies(_id, self.movies))[:rec_num]
            else:
                print("That movie doesn't exist in our database.  Sorry, we don't have any recommendations for you.")

        return rec_ids, rec_names

if __name__ == '__main__':
    # instantiate the recommender defined above in this file
    rec = Recommender()

    # fit recommender
    rec.fit(reviews_pth='data/train_data.csv', movies_pth= 'data/movies_clean.csv', learning_rate=.01, iters=1)

    # predict
    rec.predict_rating(user_id=8, movie_id=2844)

    # make recommendations
    print(rec.make_recommendations(8,'user')) # user in the dataset
    print(rec.make_recommendations(1,'user')) # user not in dataset
    print(rec.make_recommendations(1853728)) # movie in the dataset
    print(rec.make_recommendations(1)) # movie not in dataset
    print(rec.n_users)
    print(rec.n_movies)
    print(rec.num_ratings)
```

### Conclusion
In this lesson, you got your hands on some of the most important ideas associated with recommendation systems:

**Recommender Validation**
You looked at methods for validating your recommendations (when possible) using offline methods. In these cases, you could split your data into training and testing data. Frequently this split is based on time, where events earlier in time are in the training data, and events later in time are in a testing dataset.

We also quickly introduced the idea of being able to see how well your recommendation engine works by simply throwing it out into the world to directly see the impact.

**Matrix Factorization with SVD**
Next, we looked at matrix factorization as a technique for making recommendations. Traditional singular value decomposition is a technique that can be used when your matrices have no missing values. With this technique, a user-item matrix (A) can be decomposed as follows:
$$A=U\Sigma V^{T}$$

Where
* $U$ gives information about how users are related to latent features.
* $\Sigma$ gives information about how much latent features matter towards recreating the user-item matrix.
* $V^{T}$ gives information about how much each movie is related to latent features.

Since this traditional decomposition doesn't actually work when our matrices have missing values, we looked at another method for decomposing matrices.

**FunkSVD**
FunkSVD is a method that you found to be useful for matrices with missing values. With this matrix factorization, you decomposed a user-item matrix (**A**) as follows:
$$A=UV^{T}$$

Where
* $U$ gives information about how users are related to latent features
* $V^{T}$ gives information about how much each movie is related to latent features

You found that you could iterate to find the latent features in each of these matrices using gradient descent. You wrote a function to implement gradient descent to find the values within these two matrices.

Using this method, you were able to make a prediction for any user-movie pair in your dataset. You also could use it to test how well your predictions worked on a train-test split of the data. However, this method fell short with new users or movies.

**The Cold Start Problem**
Collaborative filtering using FunkSVD still wasn't helpful for new users and new movies. To make recommendations for these users and items, you implemented content-based and rank-based recommendations alongside your FunkSVD implementation.

**Author's Note**
There are many ways to make recommendations, and this course provides you with a strong mindset and skill set to tackle building your own recommendation systems in practice.