# SVD Exercises

Congrats, you just got hired as the lead data scientist for Minipreço! Your boss has invoice data for purchases, and she thinks that a recommender system would lead to higher sales. Using `retail.csv` and the **SVD** from class, let's make this happen!

*Hint: The SVD notebook from class will help a lot!*

### Load in the data

In [1]:
import pandas as pd
import numpy as np

In [55]:
retail = pd.read_csv('data/retail.csv')
retail.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,573744,21314,SMALL GLASS HEART TRINKET POT,8,2011-11-01 08:16:00,2.1,17733.0,United Kingdom
1,573744,21704,BAG 250g SWIRLY MARBLES,12,2011-11-01 08:16:00,0.85,17733.0,United Kingdom
2,573744,21791,VINTAGE HEADS AND TAILS CARD GAME,12,2011-11-01 08:16:00,1.25,17733.0,United Kingdom
3,573744,21892,TRADITIONAL WOODEN CATCH CUP GAME,12,2011-11-01 08:16:00,1.25,17733.0,United Kingdom
4,573744,21915,RED HARMONICA IN BOX,12,2011-11-01 08:16:00,1.25,17733.0,United Kingdom


Sum the quantities across separate purchases of the same item by the same customer.

In [49]:
pivot = pd.pivot_table(retail, values='Quantity', index=['CustomerID','StockCode','Description'], aggfunc='sum').reset_index()

In [51]:
pivot.head()

Unnamed: 0,CustomerID,StockCode,Description,Quantity
0,12349.0,20685,DOORMAT RED RETROSPOT,6
1,12349.0,20914,SET/5 RED RETROSPOT LID GLASS BOWLS,6
2,12349.0,21086,SET/6 RED SPOTTY PAPER CUPS,12
3,12349.0,21136,PAINTED METAL PEARS ASSORTED,16
4,12349.0,21231,SWEETHEART CERAMIC TRINKET BOX,36


### Create sparse CustomerID and Quantity matrix

In [52]:
retail_mtx_df = pivot.pivot_table(values='Quantity', index='CustomerID',
                                     columns='StockCode')
stockCode_index = retail_mtx_df.columns

In [53]:
retail_mtx_df.shape

(1711, 2704)

In [56]:
retail_mtx_df.head()

StockCode,10080,10120,10124A,10124G,10125,10135,11001,15030,15034,15036,...,90214M,90214N,90214S,BANK CHARGES,C2,CRUK,D,DOT,M,POST
CustomerID,,,,,,,,,,,,,,,,,,,,,
12349.0,,,,,,,,,,,...,,,,,,,,,,1.0
12352.0,,,,,,,,,,,...,,,,,,,,,,2.0
12356.0,,,,,,,,,,,...,,,,,,,,,,
12357.0,,,,,,,,,,,...,,,,,,,,,,
12362.0,,,,,,,,,,,...,,,,,,,,,,4.0


In [57]:
from scipy.sparse import coo_matrix
# DataFrame.as_matrix() was removed from pandas; to_numpy() is its replacement
retail_mtx = retail_mtx_df.fillna(0).to_numpy().copy()

retail_sparse_mtx = coo_matrix(retail_mtx)

  


In [58]:
retail_sparse_mtx

<1711x2704 sparse matrix of type '<class 'numpy.float64'>'
	with 55529 stored elements in COOrdinate format>
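The dense pivot above is fine at this size, but the same customer-by-item matrix can also be assembled directly in sparse form from the long table. The sketch below assumes a small hypothetical frame `toy` with the same column names as `pivot`; the variable names are illustrative and not part of the exercise.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical long-format purchases, same column names as `pivot`.
toy = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 2.0, 3.0, 3.0],
    "StockCode": ["A", "B", "A", "C", "A"],
    "Quantity": [2, 1, 5, 4, 3],
})

# Categorical codes give every customer and stock code a stable integer position.
customers = toy["CustomerID"].astype("category")
items = toy["StockCode"].astype("category")

toy_sparse = coo_matrix(
    (
        toy["Quantity"].to_numpy(dtype=float),
        (customers.cat.codes.to_numpy(), items.cat.codes.to_numpy()),
    ),
    shape=(len(customers.cat.categories), len(items.cat.categories)),
)

# Row and column labels, in matrix order.
row_labels = customers.cat.categories
col_labels = items.cat.categories
```

Keeping `row_labels` and `col_labels` next to the matrix makes it straightforward to translate matrix positions back to customer ids and stock codes later.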

### Take the SVD of the matrix
*Keep the top 10 singular values (`k=10`)*

In [8]:
from scipy.sparse.linalg import svds

In [59]:
# svds returns the right singular vectors already transposed, so V has shape (k, n_items)
U, s, V = svds(retail_mtx, k=10)

In [60]:
U.shape, s.shape, V.shape

((1711, 10), (10,), (10, 2704))

In [61]:
# Place the singular values on the diagonal of a (k, k) matrix
s_diag_matrix = np.diag(s)

### Multiply the decomposition back to get the approximate matrix

In [62]:
retail_svd = U @ s_diag_matrix @ V

In [63]:
retail_svd.shape

(1711, 2704)

In [64]:
retail_svd

array([[ 5.36494264e-04,  1.29129903e-03,  2.01672568e-05, ...,
         5.93395296e-04, -1.53222680e-02,  5.37717788e-02],
       [ 4.88798183e-05,  1.22248826e-04,  1.72817625e-06, ...,
         5.32174054e-05,  1.43823670e-03,  4.46494237e-03],
       [ 6.73044271e-06,  2.06596698e-05,  1.97987422e-07, ...,
         7.68056356e-06, -5.90230711e-05,  1.90200304e-04],
       ...,
       [-1.94857279e-19, -4.95629750e-19, -7.05704244e-21, ...,
        -2.57072143e-19, -6.34674231e-15, -1.33642146e-17],
       [-6.49664659e-08, -1.88452450e-07, -1.44252403e-09, ...,
        -7.13121817e-08,  7.01330694e-07, -1.32385267e-07],
       [ 1.52340092e-03,  4.37867466e-03,  4.31003493e-05, ...,
         1.75878305e-03,  3.43253295e-01,  5.01966611e-02]])
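To see what the truncated reconstruction does, it can help to try it on a matrix whose rank is known. The sketch below uses a hypothetical random rank-2 matrix `A` (not the retail data) and checks that a `k=2` decomposition reproduces it almost exactly.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Hypothetical toy matrix of rank 2: 6 "customers" x 5 "items".
A = rng.random((6, 2)) @ rng.random((2, 5))

U_toy, s_toy, Vt_toy = svds(A, k=2)

# Scaling the columns of U by s is equivalent to U @ np.diag(s).
A_approx = (U_toy * s_toy) @ Vt_toy

# Relative Frobenius error of the rank-2 reconstruction.
rel_error = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(rel_error)
```

On the retail matrix the error will not be near zero, since ten components cannot capture every purchase; the smoothed values that remain are what the recommendations are read from.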

### Create recommendation function

Using the approximate matrix from above, we can give recommendations. You got this!

In [65]:
# Map each CustomerID to its row position in retail_mtx_df (and therefore in retail_svd)
customer_id_dict = {cid: position for position, cid in enumerate(retail_mtx_df.index)}

In [75]:
customer_id = 12541.0

customer_matrix_position = customer_id_dict[customer_id]

In [76]:
customer_matrix_position

63

In [68]:
pivot[pivot.CustomerID==customer_id].head()

Unnamed: 0,CustomerID,StockCode,Description,Quantity
0,12349.0,20685,DOORMAT RED RETROSPOT,6
1,12349.0,20914,SET/5 RED RETROSPOT LID GLASS BOWLS,6
2,12349.0,21086,SET/6 RED SPOTTY PAPER CUPS,12
3,12349.0,21136,PAINTED METAL PEARS ASSORTED,16
4,12349.0,21231,SWEETHEART CERAMIC TRINKET BOX,36


In [70]:
stock_codes = pivot.StockCode

In [69]:
customer_bought_items = pivot[pivot.CustomerID==customer_id].StockCode.tolist()
customer_recommendations = list(zip(stock_codes, np.argsort(retail_svd[customer_matrix_position])))

In [71]:
customer_recommendations[:10]

[('21314', 644),
 ('21704', 2702),
 ('21791', 228),
 ('21892', 908),
 ('21915', 904),
 ('22065', 905),
 ('22340', 612),
 ('22577', 906),
 ('22578', 907),
 ('22579', 229)]

In [72]:
customer_recommendations = list(
                    map(
                     lambda x: x[0],
                     sorted(customer_recommendations, key=lambda x: x[1])
                    )
)
customer_recommendations = [StockCode for StockCode in customer_recommendations if StockCode not in customer_bought_items]

In [73]:
customer_recommendations[:10]

['22382',
 '21108',
 '47566',
 '23352',
 '23284',
 '23298',
 '22863',
 '23284',
 '23491',
 '23145']

In [74]:
pivot[pivot.StockCode.isin(customer_recommendations[:5])]

Unnamed: 0,CustomerID,StockCode,Description,Quantity
241,12374.0,21108,FAIRY CAKE FLANNEL ASSORTED COLOUR,18
287,12380.0,22382,LUNCH BAG SPACEBOY DESIGN,10
417,12391.0,47566,PARTY BUNTING,1
605,12415.0,22382,LUNCH BAG SPACEBOY DESIGN,100
629,12421.0,21108,FAIRY CAKE FLANNEL ASSORTED COLOUR,18
986,12451.0,22382,LUNCH BAG SPACEBOY DESIGN,10
1274,12476.0,23284,DOORMAT KEEP CALM AND COME IN,16
1746,12517.0,23284,DOORMAT KEEP CALM AND COME IN,2
1962,12540.0,47566,PARTY BUNTING,8
1985,12541.0,23284,DOORMAT KEEP CALM AND COME IN,2


In [29]:
def recommend(customer_id, num_results=5):
    """ Recommends items to a customer
    Args:
      customer_id : the customer id
      num_results: the number of recs to give
    Returns:
      DataFrame with the StockCode and Description of every row in `retail`
      that matches one of the top num_results recommended stock codes
    """
    customer_matrix_position = customer_id_dict[customer_id]
    customer_bought_items = retail[retail.CustomerID==customer_id].StockCode.tolist()
    customer_recommendations = list(zip(stock_codes, np.argsort(retail_svd[customer_matrix_position])))
    customer_recommendations = list(
                        map(
                         lambda x: x[0],
                         sorted(customer_recommendations, key=lambda x: x[1])
                        )
    )
    customer_recommendations = [StockCode for StockCode in customer_recommendations if StockCode not in customer_bought_items]
    # Honor num_results instead of a hard-coded 5
    return retail[retail.StockCode.isin(customer_recommendations[:num_results])][['StockCode','Description']]

In [30]:
recommend(12349.0, 5)

Unnamed: 0,StockCode,Description
48,85099B,JUMBO BAG RED RETROSPOT
78,23012,GLASS APOTHECARY BOTTLE PERFUME
161,85099B,JUMBO BAG RED RETROSPOT
172,23343,JUMBO BAG VINTAGE CHRISTMAS
253,23343,JUMBO BAG VINTAGE CHRISTMAS
254,85099B,JUMBO BAG RED RETROSPOT
318,22113,GREY HEART HOT WATER BOTTLE
401,84596B,SMALL DOLLY MIX DESIGN ORANGE BOWL
458,22113,GREY HEART HOT WATER BOTTLE
790,85099B,JUMBO BAG RED RETROSPOT
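`np.argsort` returns the column positions that would sort a row, not a rank for each item. A variant worth comparing against `recommend` therefore indexes the matrix's column labels with those positions, highest predicted score first. The sketch below is one way to do that under stated assumptions: `scores`, `customer_ids`, `stock_codes_toy` and `bought` are hypothetical stand-ins for `retail_svd`, `retail_mtx_df.index`, `stockCode_index` and each customer's purchase history.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for retail_svd, retail_mtx_df.index and stockCode_index.
scores = np.array([[0.9, 0.1, 0.5, 0.3],
                   [0.2, 0.8, 0.4, 0.6]])
customer_ids = pd.Index([12349.0, 12352.0])
stock_codes_toy = pd.Index(["A", "B", "C", "D"])
bought = {12349.0: {"A"}, 12352.0: {"B", "D"}}

def recommend_by_score(customer_id, num_results=2):
    """Return up to num_results unbought stock codes, highest predicted score first."""
    row = customer_ids.get_loc(customer_id)
    order = np.argsort(scores[row])[::-1]   # column positions, best score first
    ranked = stock_codes_toy[order]         # labels in the same order
    unseen = [code for code in ranked if code not in bought[customer_id]]
    return unseen[:num_results]

print(recommend_by_score(12349.0))
```

Because the result is a list of unique stock codes, descriptions can be looked up afterwards from a de-duplicated `StockCode` to `Description` table rather than from the raw invoice rows.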


### BONUS!
This only works for customers already in the data. How would you change this to make it recommend for a new customer that has purchased a list of items?
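One possible approach, offered as a sketch rather than the intended answer, is to "fold in" the new customer: build a purchase vector `q` over the item columns and project it through the item factors. With `A ≈ U Σ Vᵀ`, a new row's reconstructed scores are `q V Vᵀ`, which in the notebook's notation (where `V` is already transposed) is `q @ V.T @ V`. Everything below is hypothetical toy data; `Vt_toy` stands in for `V` and `item_codes` for `stockCode_index`, and each purchased item is assumed to count as quantity 1.

```python
import numpy as np

# Hypothetical stand-ins: Vt_toy plays the role of V (shape k x n_items),
# item_codes the role of stockCode_index.
item_codes = np.array(["A", "B", "C", "D"])
Vt_toy = np.array([[0.5, 0.5, 0.5, 0.5],
                   [0.5, 0.5, -0.5, -0.5]])  # rows are orthonormal

def recommend_for_basket(basket, num_results=2):
    """Score items for an unseen customer described only by a list of stock codes."""
    # Assumption: each purchased item counts as quantity 1.
    q = np.isin(item_codes, basket).astype(float)
    # Project onto the latent space, then map back to item space.
    scores = (q @ Vt_toy.T) @ Vt_toy
    order = np.argsort(scores)[::-1]  # best score first
    unseen = [str(code) for code in item_codes[order] if code not in basket]
    return unseen[:num_results]

print(recommend_for_basket(["A"], num_results=1))
```

No refit is needed for each new customer, though the factors only reflect customers who were in the data when the SVD was computed.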