# SVD Exercises

Congrats, you just got hired as the lead data scientist for Minipreço! Your boss has invoice data for customer purchases, and she thinks a recommender system would lead to higher sales. Using `retail.csv` and the **SVD** from class, let's make this happen!

*Hint: The SVD notebook from class will help a lot!*

### Load in the data

In [1]:
from IPython.display import Image
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
import warnings
warnings.simplefilter("ignore")
%matplotlib inline

In [2]:
matplotlib.rcParams['figure.figsize'] = [10, 10]

In [4]:
data = pd.read_csv("data/retail.csv")

In [5]:
data.head()

| Unnamed: 0 | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 573744 | 21314 | SMALL GLASS HEART TRINKET POT | 8 | 2011-11-01 08:16:00 | 2.1 | 17733.0 | United Kingdom |
| 1 | 573744 | 21704 | BAG 250g SWIRLY MARBLES | 12 | 2011-11-01 08:16:00 | 0.85 | 17733.0 | United Kingdom |
| 2 | 573744 | 21791 | VINTAGE HEADS AND TAILS CARD GAME | 12 | 2011-11-01 08:16:00 | 1.25 | 17733.0 | United Kingdom |
| 3 | 573744 | 21892 | TRADITIONAL WOODEN CATCH CUP GAME | 12 | 2011-11-01 08:16:00 | 1.25 | 17733.0 | United Kingdom |
| 4 | 573744 | 21915 | RED HARMONICA IN BOX | 12 | 2011-11-01 08:16:00 | 1.25 | 17733.0 | United Kingdom |


### Create a sparse CustomerID-by-StockCode matrix of quantities

In [9]:
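# Rows = customers, columns = stock codes; pivot_table averages Quantity
# over repeated (CustomerID, StockCode) pairs by default (aggfunc='mean')
invoice_mtx_df = data.pivot_table(values='Quantity', index='CustomerID',
                                  columns='StockCode')
StockCode_index = invoice_mtx_df.columns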

In [10]:
invoice_mtx_df.head()

| CustomerID | 10080 | 10120 | 10124A | 10124G | 10125 | 10135 | 11001 | 15030 | 15034 | 15036 | ... | 90214M | 90214N | 90214S | BANK CHARGES | C2 | CRUK | D | DOT | M | POST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12349.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| 12352.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 |
| 12356.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 12357.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 12362.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 |


In [11]:
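from scipy.sparse import coo_matrix

# Replace missing customer/item pairs with 0 and convert to a NumPy array
# (DataFrame.as_matrix() was removed in pandas 1.0; use to_numpy() instead)
invoice_mtx = invoice_mtx_df.fillna(0).to_numpy()

# Store only the non-zero entries, in COOrdinate format
invoice_sparse_mtx = coo_matrix(invoice_mtx)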

In [13]:
invoice_sparse_mtx

<1711x2704 sparse matrix of type '<class 'numpy.float64'>'
	with 55529 stored elements in COOrdinate format>
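To see what the pivot and the COO conversion produce, here is a minimal sketch on a tiny, made-up invoice table (the customer IDs, stock codes, and quantities are hypothetical, not taken from `retail.csv`). It also computes the density of the matrix, i.e. the fraction of cells that are non-zero.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical toy invoices: three customers, three stock codes
toy = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 2.0, 3.0, 3.0],
    "StockCode": ["A", "B", "B", "A", "C"],
    "Quantity": [2, 1, 5, 3, 4],
})

# Rows = customers, columns = stock codes; missing pairs become NaN
toy_mtx_df = toy.pivot_table(values="Quantity", index="CustomerID",
                             columns="StockCode")

# Fill the gaps with 0; coo_matrix keeps only the non-zero entries
toy_sparse = coo_matrix(toy_mtx_df.fillna(0).to_numpy())

# Density = stored (non-zero) entries / total number of cells
density = toy_sparse.nnz / (toy_sparse.shape[0] * toy_sparse.shape[1])
print(toy_sparse.shape, toy_sparse.nnz, round(density, 3))
```

Applied to the real matrix above, the same formula gives 55529 / (1711 × 2704) ≈ 0.012, so only about 1.2% of the customer/item cells are non-zero.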

### Take the SVD of the matrix
*Keep the top k = 10 singular values (a rank-10 approximation)*

In [14]:
from scipy.sparse.linalg import svds

In [16]:
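# svds computes only the k largest singular triplets and accepts sparse input;
# the third output is already transposed (shape k x n_items)
U, s, V = svds(invoice_sparse_mtx.tocsr(), k=10)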

In [17]:
U.shape, s.shape, V.shape

((1711, 10), (10,), (10, 2704))
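`svds` returns the `k` largest singular values, but it is safer not to assume they come back in descending order (with the default solver they typically arrive in ascending order). This sketch, on a small random matrix (made-up data), checks `svds` against the full dense `np.linalg.svd` and re-sorts the factors so the largest singular value comes first.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
# Hypothetical sparse-ish matrix: roughly 30% of entries non-zero
A = rng.random((20, 15)) * (rng.random((20, 15)) < 0.3)

k = 3
U, s, Vt = svds(csr_matrix(A), k=k)

# Reorder all three factors so singular values are descending
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

# Reference: np.linalg.svd returns singular values in descending order
s_full = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s, s_full[:k]))
```

The reconstruction `U @ diag(s) @ Vt` is the same whatever the order, so sorting only matters when you want to inspect or plot the singular values.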

### Multiply the decomposition back together to get the approximate matrix
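In matrix form, the rank-$k$ approximation multiplies the three truncated factors back together. Writing $A$ for `invoice_mtx`, and noting that the `V` returned by `svds` is already the transposed factor $V_k^\top$ (shape $k \times n$):

$$
\hat{A} = U_k \, \Sigma_k \, V_k^\top, \qquad \Sigma_k = \operatorname{diag}(s_1, \dots, s_k), \qquad \hat{a}_{ij} = \sum_{r=1}^{k} (U_k)_{ir} \, s_r \, (V_k^\top)_{rj}
$$

Each entry $\hat{a}_{ij}$ acts as a score for customer $i$ and item $j$; with $k = 10$, this is what `invoice_svd` holds.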

In [18]:
# Put the singular values on the diagonal of a k x k matrix
s_diag_matrix = np.diag(s)

In [19]:
invoice_svd = U @ s_diag_matrix @ V

In [20]:
invoice_svd.shape

(1711, 2704)
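Multiplying by a diagonal matrix just scales the columns of `U`, so `(U * s) @ V` (NumPy broadcasting) gives the same result as building `s_diag_matrix`. The sketch below checks this on a small random matrix (made-up data) and also illustrates the Eckart-Young theorem: the Frobenius-norm error of the rank-$k$ truncation equals the square root of the sum of the squared singular values that were dropped.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((8, 6))  # hypothetical small dense matrix

# Full (thin) SVD; singular values come back in descending order
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)

k = 2
U, s, Vt = U_full[:, :k], s_full[:k], Vt_full[:k, :]

# Two equivalent ways to rebuild the rank-k approximation
approx_diag = U @ np.diag(s) @ Vt
approx_bcast = (U * s) @ Vt  # scales column r of U by s[r]

# Eckart-Young: Frobenius error = sqrt(sum of discarded s^2)
err = np.linalg.norm(A - approx_bcast, "fro")
expected = np.sqrt(np.sum(s_full[k:] ** 2))
print(np.allclose(approx_diag, approx_bcast), np.isclose(err, expected))
```

The broadcasting form avoids allocating the k x k diagonal matrix, which does not matter for k = 10 but is the more idiomatic NumPy spelling.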

In [21]:
invoice_svd

array([[  1.23342323e-03,   1.93415018e-03,   4.88805691e-05, ...,
          9.51224696e-05,   2.51846079e-02,   2.11898642e-02],
       [  8.65528216e-05,   1.25679213e-04,   3.90324680e-06, ...,
          7.20674251e-06,   7.46346462e-03,   2.29425951e-03],
       [  1.37734275e-06,   3.54078145e-06,   7.02370917e-08, ...,
          2.64978898e-07,   6.81608765e-05,   9.20442285e-05],
       ..., 
       [  3.68341866e-19,   3.61556380e-18,   4.39974623e-20, ...,
          2.37093215e-20,   1.48248327e-14,  -1.26704425e-17],
       [ -3.12860530e-08,  -6.00005496e-08,  -1.02239974e-09, ...,
         -2.30419421e-09,  -2.39068553e-07,  -1.04964032e-07],
       [  7.86902322e-04,   1.42018072e-03,   2.82985098e-05, ...,
          6.51039841e-05,   6.05737933e-01,   1.03481134e-02]])

### Create recommendation function

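Using the approximate matrix `invoice_svd` from above, we can score every item for a given customer and recommend the highest-scoring items they have not already bought. You got this!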
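The core of the recommender is: take the customer's row of the approximate matrix, hide the items they already bought, and return the positions of the largest remaining scores. Here is a minimal NumPy sketch on a made-up 2 x 5 score matrix; the values and the `top_unbought` helper name are hypothetical.

```python
import numpy as np

def top_unbought(approx, purchased, row, n):
    """Return column indices of the n highest approximate scores in `row`,
    skipping columns the customer already purchased (non-zero in `purchased`)."""
    scores = approx[row].astype(float)             # astype returns a copy
    scores[purchased[row] != 0] = -np.inf          # never recommend bought items
    ranked = np.argsort(scores)[::-1]              # highest score first
    ranked = ranked[np.isfinite(scores[ranked])]   # drop the masked items
    return ranked[:n]

# Hypothetical toy data: 2 customers x 5 items
purchased = np.array([[3, 0, 0, 1, 0],
                      [0, 2, 0, 0, 0]])
approx = np.array([[2.9, 0.4, 0.1, 1.1, 0.7],
                   [0.2, 1.8, 0.9, 0.0, 0.3]])

print(top_unbought(approx, purchased, row=0, n=2))  # → [4 1]
```

Customer 0 already bought items 0 and 3, so even though those have the highest reconstructed scores, the sketch returns the best of the remaining items.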

In [23]:
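def recommend(customer_id, num_results=5):
    """Recommends items to a customer.
    Args:
      customer_id: the customer id (a row label of invoice_mtx_df)
      num_results: the number of recommendations to give
    Returns:
      rec_ids: list of recommended item ids (StockCodes)
      rec_names: list of recommended item descriptions/names
    """
    # Row of this customer in the pivot table, and hence in invoice_svd
    row = invoice_mtx_df.index.get_loc(customer_id)
    # Approximate score for every item, labelled by StockCode
    scores = pd.Series(invoice_svd[row], index=StockCode_index)
    # Keep only items the customer has not already bought (NaN in the pivot)
    scores = scores[invoice_mtx_df.loc[customer_id].isna()]
    # Highest scores first
    rec_ids = scores.sort_values(ascending=False).index[:num_results].tolist()
    # Map each StockCode to a description (the first one seen in the data)
    descriptions = data.drop_duplicates('StockCode').set_index('StockCode')['Description']
    rec_names = descriptions.loc[rec_ids].tolist()
    return rec_ids, rec_names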

### BONUS!
This only works for customers who are already in the data. How would you change it to recommend items for a new customer who has purchased a given list of items?
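One common answer is *folding-in*: treat the new customer's purchases as a new row $q$ of the matrix and project it onto the item factors learned by the SVD, $u_{\text{new}} = q \, V_k \Sigma_k^{-1}$ (where $V_k$ is the $n \times k$ item factor, i.e. the transpose of the `V` returned by `svds`), then score items with $u_{\text{new}} \Sigma_k V_k^\top = q \, V_k V_k^\top$. The sketch below uses a made-up matrix; the helper name `fold_in_scores` is hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import svds

def fold_in_scores(purchases, Vt):
    """Score all items for a new customer from a length-n purchase vector
    by projecting onto the k item factors and mapping back: q @ Vt.T @ Vt."""
    return (purchases @ Vt.T) @ Vt

rng = np.random.default_rng(2)
A = rng.random((30, 12))  # hypothetical customer x item matrix
U, s, Vt = svds(A, k=4)

# Sanity check: folding in an existing row reproduces its rank-k reconstruction
row = 5
approx_row = (U[row] * s) @ Vt
print(np.allclose(fold_in_scores(A[row], Vt), approx_row))

# A hypothetical new customer who bought items 0 and 3
q_new = np.zeros(12)
q_new[[0, 3]] = [2.0, 1.0]
scores = fold_in_scores(q_new, Vt)
scores[q_new != 0] = -np.inf  # do not recommend what they already own
print(np.argsort(scores)[::-1][:3])
```

Folding-in reuses the existing factors without updating them, so as new purchases accumulate the SVD would still need to be recomputed from time to time.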