# Recommender Systems Deep-Dive Lab

There are many different approaches that we can take when creating recommender systems. In the Intro to Recommender Systems lesson and lab, we put together a user similarity based recommender that first calculated the similarities between users and then leveraged a rank-based item recommender within each group of similar customers. In other words, for a given user, our recommender found the top 5 customers who were the most similar to them, aggregated and ranked the purchases of those 5 customers, and then recommended the top 5 most popular products among that group of similar users to the customer.

In this lab, we are going to start out with the same data set, but we are going to dive deeper into the analysis of customers and products and look at an alternative way to generate recommendations.

We will begin by importing everything we will need for this lab (libraries, data set, etc.).

In [147]:
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform
from heapq import nlargest

data = pd.read_csv('../data/customer_product_sales.csv')

## Data Preparation

We will then put together the foundational transformations of the data that we will need to eventually produce recommendations. The steps in this section should be familiar to you, as you would have had to transform the data in this manner to create the user similarity based recommender in the Intro to Recommender Systems lab.

First, we will create a data frame that contains the total quantity of each product purchased by each customer.

In [36]:
#Create dataframe that shows the amount purchased per customer of each product
prods_per_consumer = pd.DataFrame(data.groupby(['CustomerID', 'ProductName'])['Quantity'].sum()).reset_index()
prods_per_consumer.head()

,CustomerID,ProductName,Quantity
0,33,Apricots - Dried,1
1,33,Assorted Desserts,1
2,33,Bandage - Flexible Neon,1
3,33,"Bar Mix - Pina Colada, 355 Ml",1
4,33,"Beans - Kidney, Canned",1


Then, we want to create a matrix that has customers on one axis, products on the other, and the quantity purchased as the values. There will be many instances where a customer has not purchased a product, which by default will be expressed with a null value. We will replace those nulls with zeros by passing `fill_value=0` to `pd.pivot_table`.

In [37]:
#Create a matrix with ProductName, CustomerID and the quantity purchased of each product.
prods_matrix = pd.pivot_table(prods_per_consumer, 
                              index='ProductName', 
                              columns='CustomerID', 
                              values='Quantity', 
                              fill_value=0, 
                              aggfunc='sum')

prods_matrix.head()

ProductName,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
Anchovy Paste - 56 G Tube,0,0,0,0,0,0,0,1,0,0,...,0,25,0,0,0,0,0,0,0,0
"Appetizer - Mini Egg Roll, Shrimp",0,0,0,0,0,0,0,0,0,0,...,25,25,0,0,0,0,0,0,0,0
Appetizer - Mushroom Tart,0,0,0,0,0,0,0,1,0,0,...,25,0,0,0,0,0,0,0,25,0
Appetizer - Sausage Rolls,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,25,25,25,0,25,0
Apricots - Dried,1,0,0,0,1,0,0,0,0,0,...,0,25,0,0,0,0,0,0,0,0
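
To make the effect of `fill_value=0` easier to see, here is a minimal sketch on a tiny made-up sales table. The names `toy_sales` and `toy_pivot` and all the values are hypothetical and only serve as an illustration; they are not part of the lab's data set.

```python
import pandas as pd

# Hypothetical sales records: customer 2 never bought bread.
toy_sales = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "ProductName": ["apples", "bread", "apples"],
    "Quantity": [2, 1, 5],
})

# Products on the rows, customers on the columns, missing pairs filled with 0.
toy_pivot = pd.pivot_table(
    toy_sales,
    index="ProductName",
    columns="CustomerID",
    values="Quantity",
    aggfunc="sum",
    fill_value=0,
)
print(toy_pivot)
```

Without `fill_value=0`, the bread cell for customer 2 would be `NaN`, which would then have to be replaced (for example with `.fillna(0)`) before computing distances.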


In the pivot table we created, the rows represent the products and the columns represent the customers. Depending on what we need to do with the matrix, we may instead need to transpose it so that the rows represent customers and the columns represent products. We can do this easily by appending `.T` to our product-customer matrix.

In [38]:
#Transpose the matrix so products are on columns
prods_matrix = prods_matrix.T
prods_matrix.head()

CustomerID,Anchovy Paste - 56 G Tube,"Appetizer - Mini Egg Roll, Shrimp",Appetizer - Mushroom Tart,Appetizer - Sausage Rolls,Apricots - Dried,Apricots - Halves,Apricots Fresh,Arizona - Green Tea,Artichokes - Jerusalem,Assorted Desserts,...,"Wine - White, Colubia Cresh","Wine - White, Mosel Gold","Wine - White, Schroder And Schyl",Wine - Wyndham Estate Bin 777,Wonton Wrappers,Yeast Dry - Fermipan,Yoghurt Tubes,"Yogurt - Blueberry, 175 Gr",Yogurt - French Vanilla,Zucchini - Yellow
33,0,0,0,0,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,1,0
200,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,1,0,0
264,0,0,0,0,0,1,1,0,0,0,...,0,0,0,1,0,0,0,0,0,0
356,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
412,0,0,0,0,1,0,0,0,0,0,...,0,1,1,1,0,0,0,0,0,0


Another thing we may want to do is normalize the values across rows or columns of the matrix so that all the values are between 0 and 1. Normalizing per customer would help us identify customers who purchased a similar mix of products even though some of them bought large quantities while others bought smaller quantities. Normalizing per product would help us better identify products that have been purchased by similar groups of customers regardless of the quantities purchased.

The cell below applies min-max scaling to each column of the matrix, so each product's quantities are rescaled to fall between 0 and 1.

In [39]:
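#min-max normalize each product column to the range [0, 1]
#(DataFrame.min()/max() keep the reduction column-wise; np.min on a DataFrame may reduce over the whole frame in newer pandas versions)
prods_matrix = (prods_matrix - prods_matrix.min()) / (prods_matrix.max() - prods_matrix.min())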
prods_matrix.head()

CustomerID,Anchovy Paste - 56 G Tube,"Appetizer - Mini Egg Roll, Shrimp",Appetizer - Mushroom Tart,Appetizer - Sausage Rolls,Apricots - Dried,Apricots - Halves,Apricots Fresh,Arizona - Green Tea,Artichokes - Jerusalem,Assorted Desserts,...,"Wine - White, Colubia Cresh","Wine - White, Mosel Gold","Wine - White, Schroder And Schyl",Wine - Wyndham Estate Bin 777,Wonton Wrappers,Yeast Dry - Fermipan,Yoghurt Tubes,"Yogurt - Blueberry, 175 Gr",Yogurt - French Vanilla,Zucchini - Yellow
33,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0
200,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.015152,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0
264,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,...,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0
356,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0
412,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0
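
The difference between normalizing per product and per customer can be sketched on a small made-up matrix. Everything here (`toy`, `by_product`, `by_customer`, and the quantities) is hypothetical and only meant to illustrate the idea, not to reproduce the lab's results.

```python
import pandas as pd

# Hypothetical customer x product quantities.
# Customer 3 buys the same mix as customer 1, only in bulk.
toy = pd.DataFrame(
    {"apples": [1, 0, 25], "bread": [2, 1, 50], "cheese": [0, 1, 0]},
    index=pd.Index([1, 2, 3], name="CustomerID"),
)

# Column-wise min-max: each product column is rescaled to [0, 1].
by_product = (toy - toy.min()) / (toy.max() - toy.min())

# Row-wise min-max: each customer row is rescaled to [0, 1].
row_min = toy.min(axis=1)
row_range = toy.max(axis=1) - row_min
by_customer = toy.sub(row_min, axis=0).div(row_range, axis=0)

print(by_product)
print(by_customer)
```

After the row-wise version, customers 1 and 3 end up with identical rows, which is the "similar mix, different volume" case described above. Note that a row or column whose values are all equal would produce a division by zero, so a real pipeline would need to handle that case.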


## User Similarity Based Recommendations

The next step in creating recommendations is calculating similarities. For our user similarity based recommender, we calculated them between customers.

In [40]:
#create similarity matrix between users, where 1 means completely equal (same user) 
#and 0 means completely different users

sim = pd.DataFrame(1/(1 + squareform(pdist(prods_matrix, 'euclidean'))), 
                   index=prods_matrix.index, 
                   columns=prods_matrix.index)
sim.head()

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.0,0.798542,0.817512,0.809747,0.800155,0.808664,0.79317,0.806254,0.804796,0.797597,...,0.181697,0.183665,0.174513,0.197927,0.175583,0.176942,0.181587,0.174803,0.162506,0.170941
200,0.798542,1.0,0.800144,0.796683,0.786241,0.794846,0.799691,0.798179,0.798062,0.795569,...,0.182266,0.184066,0.17521,0.198869,0.1754,0.177303,0.182534,0.175174,0.1623,0.170876
264,0.817512,0.800144,1.0,0.803584,0.804055,0.804834,0.796736,0.802505,0.804374,0.79304,...,0.182081,0.183862,0.174614,0.19853,0.175779,0.177864,0.181978,0.175091,0.163,0.170932
356,0.809747,0.796683,0.803584,1.0,0.790832,0.798179,0.791346,0.807941,0.79364,0.787507,...,0.182118,0.183724,0.174828,0.19854,0.175586,0.177268,0.181416,0.175185,0.162719,0.171232
412,0.800155,0.786241,0.804055,0.790832,1.0,0.797474,0.78742,0.806112,0.797314,0.795598,...,0.181754,0.184946,0.174538,0.19834,0.175662,0.177428,0.181378,0.175494,0.162794,0.171587
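
To see how the `1/(1 + distance)` transformation behaves, here is a minimal sketch on three made-up customers. The frame `toy` and its values are assumptions chosen for illustration only.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical normalized purchases: customers 1 and 2 are identical.
toy = pd.DataFrame(
    {"apples": [1.0, 1.0, 0.0], "bread": [0.0, 0.0, 1.0]},
    index=pd.Index([1, 2, 3], name="CustomerID"),
)

# pdist returns the condensed pairwise distances; squareform expands
# them into a symmetric matrix with zeros on the diagonal.
dist = squareform(pdist(toy, "euclidean"))

# Distance 0 maps to similarity 1; larger distances approach 0.
toy_sim = pd.DataFrame(1 / (1 + dist), index=toy.index, columns=toy.index)
print(toy_sim)
```

Customers 1 and 2 get a similarity of 1 because their rows are identical, while customer 3 sits at a Euclidean distance of √2 from both, giving a similarity of about 0.414.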


Once we had our similarity matrix, then we could produce recommendations for each user and package all the recommendations into a data frame.

In [41]:
#generate a dataframe with a rate of recommendation of each product to each customer
#by taking the dot product of the prods_matrix (products purchased per customer) and 
#the similarity matrix

recommendations = pd.DataFrame(np.dot(sim, prods_matrix), index=sim.index, columns=prods_matrix.columns)

recommendations.head()

CustomerID,Anchovy Paste - 56 G Tube,"Appetizer - Mini Egg Roll, Shrimp",Appetizer - Mushroom Tart,Appetizer - Sausage Rolls,Apricots - Dried,Apricots - Halves,Apricots Fresh,Arizona - Green Tea,Artichokes - Jerusalem,Assorted Desserts,...,"Wine - White, Colubia Cresh","Wine - White, Mosel Gold","Wine - White, Schroder And Schyl",Wine - Wyndham Estate Bin 777,Wonton Wrappers,Yeast Dry - Fermipan,Yoghurt Tubes,"Yogurt - Blueberry, 175 Gr",Yogurt - French Vanilla,Zucchini - Yellow
33,7.15697,8.237705,14.76207,11.325395,11.807736,11.014104,11.700721,14.848343,12.358594,11.12945,...,8.25661,11.741427,11.071251,17.997743,9.965741,5.306327,13.735412,11.656557,12.490119,10.611047
200,7.160409,8.240769,14.76676,11.327441,11.79765,11.012361,11.706214,14.849981,12.369824,11.120381,...,8.26263,11.745717,11.072532,17.988154,9.972215,5.3082,13.746228,11.677312,12.481196,10.617266
264,7.155057,8.238747,14.772586,11.330671,11.798704,11.024883,11.715522,14.856158,12.367901,11.119244,...,8.255665,11.746553,11.073045,18.050192,9.970929,5.308565,13.740987,11.659776,12.480342,10.613088
356,7.160281,8.244531,14.77198,11.325763,11.798644,11.012254,11.702543,14.855287,12.365124,11.123679,...,8.257497,11.746521,11.073925,17.99807,9.971839,5.30768,13.737535,11.667468,12.4986,10.618381
412,7.160329,8.243751,14.779857,11.333033,11.816913,11.017261,11.706033,14.854692,12.378761,11.129693,...,8.259207,11.767682,11.090487,18.056455,9.976669,5.308806,13.745529,11.669455,12.485833,10.611118
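
Each score in this data frame is a similarity-weighted sum: for a given customer and product, it adds up every customer's (normalized) quantity of that product, weighted by how similar that customer is to the target. The sketch below illustrates this on made-up numbers and adds one possible follow-up step, hiding products the customer already bought. The names `toy_sim`, `toy_purchases`, `scores`, and `unseen` are hypothetical.

```python
import pandas as pd

customers = pd.Index([1, 2, 3], name="CustomerID")

# Hypothetical customer-customer similarities.
toy_sim = pd.DataFrame(
    [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]],
    index=customers, columns=customers,
)

# Hypothetical purchases (1.0 = bought, 0.0 = not bought).
toy_purchases = pd.DataFrame(
    {"apples": [1.0, 1.0, 0.0],
     "bread": [0.0, 1.0, 0.0],
     "cheese": [0.0, 0.0, 1.0]},
    index=customers,
)

# score[u, p] = sum over customers v of sim[u, v] * purchases[v, p]
scores = toy_sim.dot(toy_purchases)

# Keep scores only for products the customer has not bought yet.
unseen = scores.where(toy_purchases == 0)

# Highest-scoring unseen product per customer.
print(unseen.idxmax(axis=1))
```

Because every customer contributes to every score, the rows of the full matrix tend to look alike when most similarities are close to each other, which is consistent with the nearly identical rows in the output above.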


## Deeper Dive Into Our User Similarity Recommendations

Let's deconstruct what we've done and take a deeper dive into how we put this together. Doing this will equip us with the knowledge to be able to put together an item-based similarity recommender in the next section. 

Recall that after creating an empty dictionary to store our recommendations and getting a unique list of customer IDs to iterate through, the first step is to identify the 5 customers most similar to the customer we are generating recommendations for. Let's plug in customer ID 33 and see what results we get.

In [55]:
#Get a Dataframe of the most similar customers to customer 33
top_5_33 = pd.DataFrame(sim[33].sort_values(ascending=False).head(6)).reset_index()
top_5_33.columns = ['CustomerID', 'Similarity_33']
top_5_33

,CustomerID,Similarity_33
0,33,1.0
1,264,0.817512
2,3535,0.816412
3,1577,0.815517
4,2503,0.81505
5,3305,0.814926


The result contains customer 33 itself (similarity 1.0) followed by the 5 customers whose purchase behavior is most similar to customer 33. We then go back to our `prods_per_consumer` data frame and select just the purchases where the customer ID is in this list. We aggregate on product name and rank the products by sorting in descending order. Note that the code below uses `.count()`, so the Quantity column holds the number of listed customers who bought each product rather than the total units purchased, and because customer 33 is still in the list, its own purchases are counted as well.

In [70]:
#Create a dataframe counting how many of the listed customers (ID 33 and its top 5 similar customers) bought each product, in descending order
prods_sim_33 = pd.DataFrame(prods_per_consumer[prods_per_consumer['CustomerID'].isin(top_5_33['CustomerID'])]
                            .groupby('ProductName')['Quantity']
                            .count()).sort_values(by='Quantity', ascending=False)
prods_sim_33.head()

ProductName,Quantity
Towels - Paper / Kraft,4
Wine - Crozes Hermitage E.,4
"Lamb - Pieces, Diced",4
Bandage - Flexible Neon,3
Potatoes - Idaho 100 Count,3


We now have a ranked list of products that similar customers have purchased, but we haven't yet taken into consideration whether our target customer has already purchased any of those items. We want to recommend items that they might like but haven't purchased before. So we will merge the list of ranked products with our target customer's purchase list and keep only the records for items that the customer has not purchased. These will be the items that we recommend to the customer.

In [100]:
#right merge of products purchased by customer 33 vs those purchased by similar customers
not_purchased = pd.merge(prods_per_consumer[prods_per_consumer['CustomerID'] == 33], 
                         prods_sim_33, 
                         how='right', 
                         on='ProductName').fillna(0)

#rename columns for clarity
not_purchased.columns = ['CustomerID', 'ProductName', 'Quantity_33', 'Quantity_similar']


#keep only records that were not purchased by customer 33 and order by most purchased
not_purchased = not_purchased[not_purchased['Quantity_33'] == 0].sort_values(by='Quantity_similar', ascending=False)
not_purchased = not_purchased.drop('CustomerID', axis=1)                                                                                        

not_purchased.head()

,ProductName,Quantity_33,Quantity_similar
59,Wine - Blue Nun Qualitatswein,0.0,3
61,Cheese Cloth No 100,0.0,3
62,Beef Wellington,0.0,3
63,Juice - Lime,0.0,3
64,Yoghurt Tubes,0.0,3
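
The right merge followed by a filter on zero quantities is one way to express "products bought by similar customers but not by the target". A possible alternative, sketched here on made-up data, is to filter the ranked products directly with `isin`. The names `purchases`, `target`, `similar`, `ranked`, and `candidates` are hypothetical, and this sketch sums quantities over the similar customers only, which differs slightly from the cell above.

```python
import pandas as pd

# Hypothetical purchase records.
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "ProductName": ["apples", "bread", "apples", "cheese", "cheese"],
    "Quantity": [1, 2, 1, 1, 3],
})
target = 1        # customer we recommend to
similar = [2, 3]  # assumed to be the most similar customers

# Products the target customer already bought.
bought_by_target = purchases.loc[purchases["CustomerID"] == target, "ProductName"]

# Total quantity per product among the similar customers, highest first.
ranked = (
    purchases[purchases["CustomerID"].isin(similar)]
    .groupby("ProductName")["Quantity"]
    .sum()
    .sort_values(ascending=False)
)

# Drop anything the target already has.
candidates = ranked[~ranked.index.isin(bought_by_target)]
print(candidates)
```

Here apples are removed because customer 1 already bought them, leaving cheese with a total quantity of 4.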


## Item Similarity Based Recommendations

In this section, you will create an item similarity based recommender system in a step-by-step fashion. Whereas our user similarity based recommender leveraged similarities between customers, this recommender will utilize similarities between products. You already have all the tools in your toolbox, so follow each of the steps below to complete this lab.

### Step 1: Create a product distance matrix.

In [113]:
#Created a product distance matrix but I am not sure this is the matrix asked,
#nor if the distance measure is adequate
sim2 = pd.DataFrame(1/(1 + squareform(pdist(prods_matrix.T, 'euclidean'))), 
                   index=prods_matrix.columns, 
                   columns=prods_matrix.columns)
sim2.head()

ProductName,Anchovy Paste - 56 G Tube,"Appetizer - Mini Egg Roll, Shrimp",Appetizer - Mushroom Tart,Appetizer - Sausage Rolls,Apricots - Dried,Apricots - Halves,Apricots Fresh,Arizona - Green Tea,Artichokes - Jerusalem,Assorted Desserts,...,"Wine - White, Colubia Cresh","Wine - White, Mosel Gold","Wine - White, Schroder And Schyl",Wine - Wyndham Estate Bin 777,Wonton Wrappers,Yeast Dry - Fermipan,Yoghurt Tubes,"Yogurt - Blueberry, 175 Gr",Yogurt - French Vanilla,Zucchini - Yellow
Anchovy Paste - 56 G Tube,1.0,0.204548,0.147628,0.172469,0.176986,0.171802,0.177537,0.146791,0.170806,0.18549,...,0.207128,0.174772,0.179959,0.123265,0.175729,0.237627,0.16735,0.180573,0.172157,0.169273
"Appetizer - Mini Egg Roll, Shrimp",0.204548,1.0,0.144811,0.161784,0.167518,0.166853,0.16975,0.151919,0.16749,0.16936,...,0.201349,0.167963,0.174971,0.121653,0.172053,0.212067,0.164238,0.178819,0.165428,0.168096
Appetizer - Mushroom Tart,0.147628,0.144811,1.0,0.134533,0.12907,0.134202,0.134829,0.123312,0.133959,0.131307,...,0.147013,0.13206,0.134859,0.112904,0.138671,0.15431,0.133518,0.138508,0.134408,0.132237
Appetizer - Sausage Rolls,0.172469,0.161784,0.134533,1.0,0.146787,0.151991,0.146343,0.132872,0.141366,0.147651,...,0.163345,0.148847,0.151577,0.120694,0.149241,0.167488,0.1439,0.149854,0.143585,0.144066
Apricots - Dried,0.176986,0.167518,0.12907,0.146787,1.0,0.150272,0.149056,0.135642,0.143909,0.15116,...,0.164177,0.147783,0.154139,0.11833,0.147415,0.169317,0.147825,0.153907,0.146914,0.151037


### Step 2: Get the products purchased for a specific customer of your choice.

In [112]:
#products purchased by customer 200
prods_200 = prods_per_consumer[prods_per_consumer['CustomerID']==200].reset_index().drop('index', axis=1)
prods_200.head()

,CustomerID,ProductName,Quantity
0,200,Bacardi Breezer - Tropical,1
1,200,Bagel - Plain,1
2,200,Bar - Granola Trail Mix Fruit Nut,1
3,200,Beef - Ground Medium,1
4,200,Beef - Montreal Smoked Brisket,1


### Step 3: For each product the customer purchased, get a list of the top 5 similar products. Package the lists into a nested list, flatten the list, and then filter out any products the customer has already purchased.

In [125]:
#2 lists comprehensions
#The first one a nested one for the top 5 similar products, ordering the product column in the similar matrix
#Getting the first 6 results but discarding the 1st one (index 0), since it is the same product
sim_lst = [list(sim2[x].sort_values(ascending=False).head(6).index)[1:] for x in prods_200['ProductName']]

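#The second one is a flattened list based on the first, keeping only products that are not in the 
#dataframe of products purchased by customer 200. `in` on a pandas Series tests the index labels,
#not the values, so the product names are collected into a set first
purchased_200 = set(prods_200['ProductName'])
sim_lst = [it for item in sim_lst for it in item if it not in purchased_200]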

### Step 4: Count the number of times each similar product occurs in your filtered list. Sort and return a list containing the top 5 items.

In [146]:
#created a dictionary with keys = product names and values = counts in the list
sim_dict = {}
for a in sim_lst:
    if a not in sim_dict.keys():
        sim_dict[a] = 1
    else:
        sim_dict[a] += 1

#passed the dictionary to a dataframe, sorted_values and got the top 5 products        
top_5_sim = pd.DataFrame.from_dict(sim_dict, 
                                   orient='index').sort_values(0, 
                                                               ascending=False).head(5).reset_index()
top_5_sim.columns = ['ProductName', 'SimilarCount']
top_5_sim

,ProductName,SimilarCount
0,Yeast Dry - Fermipan,61
1,Longos - Grilled Salmon With Bbq,52
2,Spice - Peppercorn Melange,44
3,Blueberries,25
4,Scampi Tail,24
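
The counting loop can also be written with `collections.Counter` from the standard library, which counts occurrences and returns the most frequent items in one call. The list below is a made-up stand-in for `sim_lst`, used purely as an illustration.

```python
from collections import Counter

# Hypothetical flattened list of similar products.
similar_products = ["yeast", "salmon", "yeast", "blueberries", "salmon", "yeast"]

# Count how often each product appears.
counts = Counter(similar_products)

# The two most frequent products with their counts.
top_2 = counts.most_common(2)
print(top_2)  # [('yeast', 3), ('salmon', 2)]
```

A pandas-only variant would be `pd.Series(similar_products).value_counts().head(2)`, which returns the same ranking as a Series.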


### Step 5: Now that we have generated product recommendations for a single user, put the pieces together and iterate over a list of all CustomerIDs.

- Create an empty dictionary that will hold the recommendations for all customers.
- Create a list of unique CustomerIDs to iterate over.
- Iterate over the customer list performing steps 2 through 4 for each and appending the results of each iteration to the dictionary you created.

In [153]:
#Got a list of all Customer IDs
customers = list(recommendations.index)

#defined function that will do the job
def rec_products(lst, prods, matrix):
    #creating temporary variables
    sim_dict = {}
    total_dict = {}
    tmp = pd.DataFrame()
    #will iterate over the customers list
    for client in lst:
        #getting the products DF for a specified client
        tmp = prods[prods['CustomerID']==client].reset_index().drop('index', axis=1)
        #list comprehension to get nested list of similar products
        tmp_lst = [list(matrix[x].sort_values(ascending=False).head(6).index)[1:] for x in tmp['ProductName']]
        #flattening the list and removing purchased products
        tmp_lst = [it for item in tmp_lst for it in item if it not in tmp['ProductName']]
        #counting values in a dictionary
        for a in tmp_lst:
            if a not in sim_dict.keys():
                sim_dict[a] = 1
            else:
                sim_dict[a] += 1
        #getting list of keys for the 5 largest values, adding to bigger dictionary, in which each client is a key
        total_dict[client] = nlargest(5, sim_dict, key=sim_dict.get)
    #return bigger dictionary
    return total_dict

#This code is terrible! In the previous exercise we had done it completely with pandas, why lists and loops??

### Step 6: Store the results in a Pandas data frame. The data frame should have a column for Customer ID and then a column for each of the 5 product recommendations for each customer.

In [154]:
#Something is not right... The code works but this df is just... wrong... 
#Why would we recommend almost the same products to everyone?
#Where did I get it wrong?

top_5_products = pd.DataFrame.from_dict(rec_products(customers, prods_per_consumer, sim2), orient='index')
top_5_products.head()

,0,1,2,3,4
33,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Anchovy Paste - 56 G Tube,Blueberries
200,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Anchovy Paste - 56 G Tube,Blueberries
264,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Blueberries,Anchovy Paste - 56 G Tube
356,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Anchovy Paste - 56 G Tube,Blueberries
412,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Anchovy Paste - 56 G Tube,Blueberries
464,Yeast Dry - Fermipan,Spice - Peppercorn Melange,Longos - Grilled Salmon With Bbq,Anchovy Paste - 56 G Tube,Blueberries
477,Yeast Dry - Fermipan,Spice - Peppercorn Melange,Longos - Grilled Salmon With Bbq,Anchovy Paste - 56 G Tube,Blueberries
639,Yeast Dry - Fermipan,Spice - Peppercorn Melange,Longos - Grilled Salmon With Bbq,Anchovy Paste - 56 G Tube,Blueberries
649,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Blueberries,Anchovy Paste - 56 G Tube
669,Yeast Dry - Fermipan,Longos - Grilled Salmon With Bbq,Spice - Peppercorn Melange,Blueberries,Anchovy Paste - 56 G Tube
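
A likely explanation for the near-identical rows lies in `rec_products` as written above. First, `sim_dict` is created once, before the loop, so the counts from every earlier customer carry over into the next one and the same globally frequent products keep winning. Second, `it not in tmp['ProductName']` tests the index labels of the Series (0, 1, 2, ...) rather than the product names, so purchased products are never filtered out. The sketch below shows one possible fix on made-up data. The function name `rec_products_fixed`, its parameters, and the toy frames are all hypothetical.

```python
from collections import Counter

import pandas as pd


def rec_products_fixed(customer_ids, purchases, item_sim, n_similar=5, n_recs=5):
    """Hypothetical variant of rec_products with per-customer counts."""
    recs = {}
    for customer in customer_ids:
        # Product names as a set, so membership tests look at values.
        bought = set(purchases.loc[purchases["CustomerID"] == customer, "ProductName"])
        counts = Counter()  # reset for every customer
        for product in sorted(bought):
            # Most similar products, explicitly excluding the product itself.
            neighbours = item_sim[product].drop(product).nlargest(n_similar).index
            counts.update(p for p in neighbours if p not in bought)
        recs[customer] = [p for p, _ in counts.most_common(n_recs)]
    return recs


# Toy data (hypothetical values, not the lab's data set).
products = ["apples", "bread", "cheese", "dates"]
toy_item_sim = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.1],
     [0.9, 1.0, 0.3, 0.1],
     [0.2, 0.3, 1.0, 0.8],
     [0.1, 0.1, 0.8, 1.0]],
    index=products, columns=products,
)
toy_purchases = pd.DataFrame(
    {"CustomerID": [1, 2], "ProductName": ["apples", "cheese"]}
)

toy_recs = rec_products_fixed([1, 2], toy_purchases, toy_item_sim, n_similar=1, n_recs=1)
print(toy_recs)  # {1: ['bread'], 2: ['dates']}
```

With the counter reset inside the loop, the two toy customers receive different recommendations. On the real data this would be called as `rec_products_fixed(customers, prods_per_consumer, sim2)`, though the resulting table has not been regenerated here.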


## Recommending Items to a New Customer

Suppose we get a new customer and on their first visit, they purchase the following items and quantities.

In [None]:
#? which ones?

### Step 7: Recommend 5 products to this new customer using a user similarity approach.

In [None]:
#unable to solve, need more information

### Step 8: Recommend 5 products to this new customer using an item similarity approach.

In [None]:
#unable to solve, need more information
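
The new customer's actual items are not listed above, so Steps 7 and 8 cannot be completed as stated. As a rough illustration of how both approaches could work once the basket is known, here is a minimal sketch on made-up data. The basket, the matrix `toy_matrix`, and every other name are hypothetical. The item-based part sums similarities to the basket items, which is a simplification of the counting approach used in Steps 3 and 4.

```python
import pandas as pd
from scipy.spatial.distance import cdist, pdist, squareform

# Toy customer x product matrix standing in for prods_matrix (hypothetical).
toy_matrix = pd.DataFrame(
    {"apples": [1, 1, 0], "bread": [1, 0, 0], "cheese": [0, 1, 1], "dates": [0, 0, 1]},
    index=pd.Index([1, 2, 3], name="CustomerID"),
)

# Hypothetical basket; the lab's actual item list is not reproduced here.
new_basket = pd.Series({"apples": 1})

# Align the basket with the matrix columns; products not bought become 0.
new_vec = new_basket.reindex(toy_matrix.columns, fill_value=0)

# --- User similarity approach ---
# Similarity between the new customer and every existing customer.
dist = cdist([new_vec.to_numpy()], toy_matrix.to_numpy(), "euclidean")[0]
user_sim = pd.Series(1 / (1 + dist), index=toy_matrix.index)
nearest = user_sim.nlargest(2).index
# Total purchases of the nearest customers, minus what is already in the basket.
user_scores = toy_matrix.loc[nearest].sum().drop(new_basket.index)
user_recs = user_scores.sort_values(ascending=False).head(2)

# --- Item similarity approach ---
item_dist = squareform(pdist(toy_matrix.T, "euclidean"))
item_sim = pd.DataFrame(
    1 / (1 + item_dist), index=toy_matrix.columns, columns=toy_matrix.columns
)
# Sum each product's similarity to the basket items, then drop the basket itself.
item_scores = item_sim[new_basket.index].sum(axis=1).drop(new_basket.index)
item_recs = item_scores.nlargest(2)

print(user_recs)
print(item_recs)
```

In both cases the key step is `reindex`, which puts the new customer's purchases into the same product order as the existing matrix so that distances can be computed against it.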