
In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings(action="ignore")

In [2]:
url='https://github.com/AilingLiu/Growth_Analysis/blob/master/Data/online_retail.csv?raw=true'
retail = pd.read_csv(url, encoding = 'unicode_escape')
retail['InvoiceDate']=pd.to_datetime(retail['InvoiceDate'])
retail['Payment'] = retail['Quantity'] * retail['UnitPrice']
retail.head()

,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,Payment
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom,15.3
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom,22.0
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34


In [4]:
retail.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 9 columns):
InvoiceNo      541909 non-null object
StockCode      541909 non-null object
Description    540455 non-null object
Quantity       541909 non-null int64
InvoiceDate    541909 non-null datetime64[ns]
UnitPrice      541909 non-null float64
CustomerID     406829 non-null float64
Country        541909 non-null object
Payment        541909 non-null float64
dtypes: datetime64[ns](1), float64(3), int64(1), object(4)
memory usage: 37.2+ MB


In [3]:
retail.describe()

,Quantity,UnitPrice,CustomerID,Payment
count,541909.0,541909.0,406829.0,541909.0
mean,9.55225,4.611114,15287.69057,17.987795
std,218.081158,96.759853,1713.600303,378.810824
min,-80995.0,-11062.06,12346.0,-168469.6
25%,1.0,1.25,13953.0,3.4
50%,3.0,2.08,15152.0,9.75
75%,10.0,4.13,16791.0,17.4
max,80995.0,38970.0,18287.0,168469.6


Keep only the records with a positive quantity and a non-missing customer ID.

In [5]:
df = retail.loc[(retail['Quantity'] > 0)&(retail['CustomerID'].notnull())]
df.isnull().any()

InvoiceNo      False
StockCode      False
Description    False
Quantity       False
InvoiceDate    False
UnitPrice      False
CustomerID     False
Country        False
Payment        False
dtype: bool

In [8]:
(df.Quantity<0).any()

False

## User-based Recommendation System

We first build a user-item matrix, then compute the similarity between users based on the items they bought. If user B has the highest similarity to user A, we recommend to user A the products that user B bought and user A has not yet bought.

In [11]:
# Rows are customers, columns are stock codes, values are total quantities bought.
user_item = pd.pivot_table(data=df,
                           values='Quantity',
                           index='CustomerID',
                           columns='StockCode',
                           aggfunc='sum',
                           fill_value=0)
user_item.head()

StockCode,10002,10080,10120,10123C,10124A,10124G,10125,10133,10135,11001,15030,15034,15036,15039,15044A,15044B,15044C,15044D,15056BL,15056N,15056P,15058A,15058B,15058C,15060B,16008,16010,16011,16012,16014,16015,16016,16020C,16033,16043,16045,16046,16048,16049,16052,...,90209B,90209C,90210A,90210B,90210C,90210D,90211A,90211B,90212B,90212C,90214A,90214B,90214C,90214D,90214E,90214F,90214G,90214H,90214I,90214J,90214K,90214L,90214M,90214N,90214O,90214P,90214R,90214S,90214T,90214U,90214V,90214W,90214Y,90214Z,BANK CHARGES,C2,DOT,M,PADS,POST
CustomerID
12346.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
12347.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
12348.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9
12349.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
12350.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [27]:
from sklearn.metrics.pairwise import cosine_similarity
# Pairwise cosine similarity between customers (rows of user_item).
user_item_similarity = pd.DataFrame(cosine_similarity(user_item),
                                    index=user_item.index,
                                    columns=user_item.index)
user_item_similarity.head()

CustomerID,12346.0,12347.0,12348.0,12349.0,12350.0,12352.0,12353.0,12354.0,12355.0,12356.0,12357.0,12358.0,12359.0,12360.0,12361.0,12362.0,12363.0,12364.0,12365.0,12367.0,12370.0,12371.0,12372.0,12373.0,12374.0,12375.0,12377.0,12378.0,12379.0,12380.0,12381.0,12383.0,12384.0,12386.0,12388.0,12390.0,12391.0,12393.0,12394.0,12395.0,...,18230.0,18231.0,18232.0,18233.0,18235.0,18236.0,18237.0,18239.0,18240.0,18241.0,18242.0,18245.0,18246.0,18248.0,18249.0,18250.0,18251.0,18252.0,18255.0,18257.0,18259.0,18260.0,18261.0,18262.0,18263.0,18265.0,18268.0,18269.0,18270.0,18272.0,18273.0,18274.0,18276.0,18277.0,18278.0,18280.0,18281.0,18282.0,18283.0,18287.0
CustomerID
12346.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.384018,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044569,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12347.0,0.0,1.0,0.148879,0.02075,0.014435,0.028933,0.0,0.023478,0.506252,0.186107,0.016388,0.0,0.060049,0.084068,0.0,0.087708,0.12296,0.042852,0.018727,0.0,0.000548,0.001736,0.013251,0.034163,0.069184,0.020331,0.016206,0.016194,0.001949,0.139459,0.272656,0.094807,0.061485,0.0,0.003877,0.12136,0.003606,0.047752,0.0,0.148916,...,0.0,0.054619,0.005211,0.0,0.120282,0.004745,0.029026,0.007471,0.020939,0.076004,0.030404,0.04052,0.0,0.026294,0.024591,0.249526,0.143162,0.007149,0.0,0.017483,0.0,0.054911,0.0099,0.015089,0.022633,0.063313,0.0,0.12157,0.0,0.004694,0.0,0.001212,0.406837,0.0,0.015133,0.037236,0.0,0.010873,0.07451,0.108942
12348.0,0.0,0.148879,1.0,0.000169,0.000315,0.001311,0.0,0.010634,0.286226,0.226244,0.0,0.001114,0.021792,0.001123,0.00052,0.032066,0.338083,0.084154,0.0,0.000216,0.000103,0.001637,0.000586,0.000268,0.044577,0.000541,0.000229,0.09501,0.001021,0.001673,0.054114,0.023304,0.00215,0.0,0.0,0.000589,0.23501,0.041753,0.000475,0.093563,...,0.0,0.0,0.0,0.0,0.031662,0.013253,0.0,0.0,0.0,0.311034,0.005467,0.051757,0.0,0.055083,0.0,0.106941,0.352393,0.036554,0.0,0.0,0.0,0.113554,0.0,0.108376,0.150482,0.0,0.0,0.0,0.0,0.022841,0.0,0.03251,0.168665,0.0,0.0,0.0,0.0,0.0,0.17517,0.110096
12349.0,0.0,0.02075,0.000169,1.0,0.030121,0.131151,0.0,0.004931,0.00018,0.150819,0.103707,0.107506,0.077619,0.095496,0.062054,0.083723,0.017252,0.064482,0.000256,0.000142,0.022706,0.001559,0.149744,0.038406,0.032111,0.000357,0.024659,0.026019,0.073356,0.037698,0.033608,0.028497,0.050328,0.037702,0.080719,0.050583,0.003202,0.014915,0.140134,0.015534,...,0.0,0.018624,0.006,0.0,0.052908,0.105018,0.000164,0.01284,0.0,0.016196,0.019209,0.021807,0.0,0.03396,0.0,0.01506,0.0,0.024609,0.010994,0.040383,0.028289,0.029993,0.0,0.035735,0.0,0.008715,0.0,0.0,0.047596,0.110501,0.0,0.148066,0.0,0.0,0.01568,0.0,0.0,0.013398,0.065295,0.022576
12350.0,0.0,0.014435,0.000315,0.030121,1.0,0.00161,0.0,0.0,0.0,0.001179,0.075266,0.001368,0.0,0.045494,0.000638,0.071189,0.0,0.010838,0.0,0.000265,0.000127,0.00201,0.069867,0.00033,0.000842,0.000665,0.061135,0.0,0.037348,0.002054,0.001851,0.011362,0.00264,0.017553,0.0,0.035415,0.0,0.057688,0.01585,0.024403,...,0.072409,0.010727,0.0,0.0,0.097212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020919,0.0,0.0,0.022396,0.0,0.0,0.0,0.0,0.031558,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019385,0.0


The diagonal is 1 because each user is compared with themselves. We can now pick a few customer IDs and see which other customers they match most closely. Note that the top match returned for any user is always the user itself.

In [30]:
user_item_similarity.loc[12346.0].nlargest(5)

CustomerID
12346.0    1.000000
15567.0    0.807065
18242.0    0.384018
17030.0    0.376196
17309.0    0.371391
Name: 12346.0, dtype: float64

Compare user 12346 with 15567 and 18242:

**User 12346**

In [39]:
df.loc[df['CustomerID']==12346.0, ['StockCode', 'Quantity', 'Description']]

,StockCode,Quantity,Description
61619,23166,74215,MEDIUM CERAMIC TOP STORAGE JAR


**User 15567**

In [40]:
df.loc[df['CustomerID']==15567.0, ['StockCode', 'Quantity', 'Description']]

,StockCode,Quantity,Description
485775,23166,96,MEDIUM CERAMIC TOP STORAGE JAR
485776,21401,48,BLUE PUDDING SPOON
485777,21400,48,RED PUDDING SPOON
485778,23178,6,JAM CLOCK MAGNET
485779,22762,1,CUPBOARD 3 DRAWER MA CAMPAGNE
485780,22969,12,HOMEMADE JAM SCENTED CANDLES
485781,23154,12,SET OF 4 JAM JAR MAGNETS


**User 18242**

In [43]:
df.loc[df['CustomerID']==18242.0, ['StockCode', 'Quantity', 'Description']].sort_values(by=['Quantity'], ascending=False)

,StockCode,Quantity,Description
206593,23167,96,SMALL CERAMIC TOP STORAGE JAR
364406,47566,50,PARTY BUNTING
206589,22962,48,JAM JAR WITH PINK LID
206594,23166,48,MEDIUM CERAMIC TOP STORAGE JAR
182553,84879,24,ASSORTED COLOUR BIRD ORNAMENT
...,...,...,...
206600,22283,2,6 EGG HOUSE PAINTED WOOD
206588,21524,2,DOORMAT SPOTTY HOME SWEET HOME
206587,48184,2,DOORMAT ENGLISH ROSE
206603,22606,1,WOODEN SKITTLES GARDEN SET


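User 12346 bought only one item, `23166`, but in a very large quantity (74,215 units). The other two users also bought `23166`, though in far smaller quantities (96 and 48 units). Because cosine similarity compares the direction of the purchase vectors rather than their magnitude, the match is driven by how much `23166` dominates each basket, not by the absolute quantity.

We can also compare user `12348.0` with the others to see who is most similar.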
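The 0.807 score for user 15567 can be checked by hand from the two tables above. The following is a minimal sketch, assuming the seven rows shown for user 15567 are that user's complete purchase history; the `cosine` helper and the dictionaries are illustrative names, not part of the notebook.

```python
import math

# Quantities per StockCode, copied from the two tables above.
user_12346 = {"23166": 74215}
user_15567 = {"23166": 96, "21401": 48, "21400": 48, "23178": 6,
              "22762": 1, "22969": 12, "23154": 12}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(qty * b.get(code, 0) for code, qty in a.items())
    norm_a = math.sqrt(sum(q * q for q in a.values()))
    norm_b = math.sqrt(sum(q * q for q in b.values()))
    return dot / (norm_a * norm_b)

sim = cosine(user_12346, user_15567)
print(round(sim, 6))  # 0.807065, the value reported by nlargest above
```

The quantity 74215 cancels out of the ratio, so the result reduces to 96 divided by the length of user 15567's vector. A user with a single-item basket will therefore look similar to anyone whose basket is dominated by that item, whatever the volumes involved.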

In [46]:
user_item_similarity.loc[12348.0].nlargest(5)

CustomerID
12348.0    1.000000
16174.0    0.494248
14163.0    0.448390
17940.0    0.386058
13623.0    0.372502
Name: 12348.0, dtype: float64

In [56]:
cols = ['StockCode', 'Quantity', 'Description']
compares = (
    df.loc[df['CustomerID'] == 12348.0, cols]
      .merge(df.loc[df['CustomerID'] == 16174.0, cols],
             how='outer', on='StockCode', suffixes=('_12348', '_16174'))
      .fillna(0)
      .sort_values(by='Quantity_12348', ascending=False)
)
compares.head(15)

,StockCode,Quantity_12348,Description_12348,Quantity_16174,Description_16174
21,21985,144.0,PACK OF 12 HEARTS DESIGN TISSUES,0.0,0
23,21983,144.0,PACK OF 12 BLUE PAISLEY TISSUES,0.0,0
24,21967,144.0,PACK OF 12 SKULL TISSUES,0.0,0
7,21981,144.0,PACK OF 12 WOODLAND TISSUES,0.0,0
8,21982,144.0,PACK OF 12 SUKI TISSUES,0.0,0
20,21980,144.0,PACK OF 12 RED RETROSPOT TISSUES,0.0,0
27,23077,120.0,DOUGHNUT LIP GLOSS,20.0,DOUGHNUT LIP GLOSS
3,84991,120.0,60 TEATIME FAIRY CAKE CASES,0.0,0
5,21213,120.0,PACK OF 72 SKULL CAKE CASES,0.0,0
15,21977,120.0,PACK OF 60 PINK PAISLEY CAKE CASES,0.0,0


From the comparison above, we can identify the items that user 16174 has bought but user 12348 has not (the rows where `Quantity_12348` is 0), and recommend those products to user 12348.
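The lookup described above can be wrapped in a small helper. This is a minimal sketch rather than the notebook's own code: the function name `recommend_from_nearest_user`, the `top_n` parameter, and the toy matrix are assumptions made for illustration, and only the single nearest neighbour is used. Cosine similarity is computed with NumPy here so the block runs on its own.

```python
import numpy as np
import pandas as pd

def recommend_from_nearest_user(user_item, similarity, user_id, top_n=5):
    """Items bought by the most similar other user that user_id has not bought."""
    # Drop the user's own entry (similarity 1.0) before picking the neighbour.
    neighbour = similarity.loc[user_id].drop(user_id).idxmax()
    own = user_item.loc[user_id]
    other = user_item.loc[neighbour]
    # Keep items the neighbour bought and the target user never did.
    candidates = other[(other > 0) & (own == 0)]
    return neighbour, candidates.sort_values(ascending=False).head(top_n)

# Toy data (made up) in the same shape as user_item.
toy = pd.DataFrame(
    [[5, 0, 2, 0], [4, 3, 0, 1], [0, 0, 0, 7]],
    index=pd.Index([1.0, 2.0, 3.0], name="CustomerID"),
    columns=pd.Index(["A", "B", "C", "D"], name="StockCode"),
)
# Cosine similarity as dot products of row-normalised vectors.
unit = toy.div(np.sqrt((toy ** 2).sum(axis=1)), axis=0)
toy_sim = unit @ unit.T

neighbour, recs = recommend_from_nearest_user(toy, toy_sim, 1.0)
print(neighbour)
print(recs)
```

With the real data, the equivalent call would be `recommend_from_nearest_user(user_item, user_item_similarity, 12348.0)`, which, given the `nlargest` output above, should select 16174.0 as the neighbour. Ranking candidates by the neighbour's purchased quantity is one possible choice; other orderings would be equally defensible.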

## Item-based Recommendation

In [57]:
item_user = user_item.T
# Pairwise cosine similarity between products (rows of item_user).
item_user_similarity = pd.DataFrame(cosine_similarity(item_user),
                                    index=item_user.index,
                                    columns=item_user.index)
item_user_similarity.head()

StockCode,10002,10080,10120,10123C,10124A,10124G,10125,10133,10135,11001,15030,15034,15036,15039,15044A,15044B,15044C,15044D,15056BL,15056N,15056P,15058A,15058B,15058C,15060B,16008,16010,16011,16012,16014,16015,16016,16020C,16033,16043,16045,16046,16048,16049,16052,...,90209B,90209C,90210A,90210B,90210C,90210D,90211A,90211B,90212B,90212C,90214A,90214B,90214C,90214D,90214E,90214F,90214G,90214H,90214I,90214J,90214K,90214L,90214M,90214N,90214O,90214P,90214R,90214S,90214T,90214U,90214V,90214W,90214Y,90214Z,BANK CHARGES,C2,DOT,M,PADS,POST
StockCode
10002,1.0,0.0,0.001548,0.00099,0.0,0.0,0.85389,0.052085,0.021921,0.003033,0.000244,0.00076,0.003697,0.000675,0.000228,0.005852,0.001401,0.004286,0.051966,0.024763,0.015653,0.000185,0.005532,0.000177,0.017316,0.001139,0.006565,0.016381,0.000282,9e-05,1.1e-05,0.001339,0.006565,0.0,0.0,0.008351,0.000911,0.015965,0.0,0.005699,...,0.0,0.000578,0.0,0.0,0.0,0.0,0.001094,0.0,0.0,0.0,0.0,0.0,0.0,0.000157,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038765,0.0,0.000308,0.0,0.07426
10080,0.0,1.0,0.0,0.0,0.0,0.0,0.004958,0.020646,0.011878,0.0,0.0,0.026559,0.000772,0.000246,0.0,0.0,0.0,0.00021,0.0,0.003713,0.002197,0.0,0.0,0.0,0.000232,0.010238,0.0,0.004404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047006,0.0,0.008001,0.067007,0.0,...,0.0,0.013912,0.0,0.0,0.018852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001639,0.00108,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6e-06,0.0,0.0
10120,0.001548,0.0,1.0,0.004903,0.0,0.0,0.0016,0.042543,0.01042,0.009962,0.007238,0.006858,0.002091,0.002635,0.0,0.0,0.0,0.00052,0.00563,0.002737,0.001809,0.005498,0.015514,0.005269,0.001149,0.000117,0.065051,0.008012,0.077612,2.2e-05,0.000314,0.003619,0.0,0.0,0.0,0.030765,0.016925,0.005493,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.032526,0.0,0.0,0.0,0.0,0.0,0.0,0.004668,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007664,0.0,0.000331
10123C,0.00099,0.0,0.004903,1.0,0.0,0.0,0.004417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10124A,0.0,0.0,0.0,0.0,1.0,0.491784,0.001099,0.014967,0.0,0.0,0.0,0.0,0.007339,0.004364,0.00608,0.0,0.0,0.002799,0.000846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.091099,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Taking product 10002 as an example, we look up its most similar products and recommend them to any user who has bought 10002.

In [81]:
related_items = item_user_similarity.loc['10002'].nlargest(10)
related_prods = df.loc[df['StockCode'].isin(related_items.index), ['StockCode', 'Description']].drop_duplicates().set_index('StockCode').join(related_items)
related_prods.rename(columns={'10002': 'cosSim'}, inplace=True)
related_prods.sort_values(by='cosSim',ascending=False)

StockCode,Description,cosSim
10002,INFLATABLE POLITICAL GLOBE,1.0
10125,MINI FUNKY DESIGN TAPES,0.85389
23224,CHERUB HEART DECORATION GOLD,0.713423
23222,CHRISTMAS TREE HANGING GOLD,0.699006
85014A,BLACK/BLUE POLKADOT UMBRELLA,0.626077
20682,RED RETROSPOT CHILDRENS UMBRELLA,0.597314
23007,SPACEBOY BABY GIFT SET,0.575727
23009,I LOVE LONDON BABY GIFT SET,0.557707
23010,CIRCUS PARADE BABY GIFT SET,0.553042
85014B,RED RETROSPOT UMBRELLA,0.45768


Customers who bought INFLATABLE POLITICAL GLOBE are also likely to buy MINI FUNKY DESIGN TAPES and CHERUB HEART DECORATION GOLD. Several of the related items appear to be decoration or gift products, so we can recommend them to customers who bought the globe.
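To turn the item similarities into recommendations for a specific customer, one option is to score every unbought product by its summed similarity to the products the customer already bought. The sketch below is an assumption about how this could be done, not the notebook's method: `recommend_items_for_user`, `top_n`, and the toy matrix are hypothetical, and summing similarities is only one of several reasonable aggregation rules.

```python
import numpy as np
import pandas as pd

def recommend_items_for_user(user_item, item_similarity, user_id, top_n=3):
    """Score unbought items by their summed similarity to the user's purchases."""
    bought = user_item.columns[user_item.loc[user_id] > 0]
    # Add up the similarity rows of every bought item, then hide items already owned.
    scores = item_similarity.loc[bought].sum(axis=0).drop(bought)
    return scores.sort_values(ascending=False).head(top_n)

# Toy data (made up) in the same shape as user_item.
toy = pd.DataFrame(
    [[5, 0, 2, 0], [4, 3, 0, 1], [0, 0, 0, 7]],
    index=pd.Index([1.0, 2.0, 3.0], name="CustomerID"),
    columns=pd.Index(["A", "B", "C", "D"], name="StockCode"),
)
# Item-item cosine similarity as dot products of normalised item vectors.
item_user = toy.T
unit = item_user.div(np.sqrt((item_user ** 2).sum(axis=1)), axis=0)
toy_item_sim = unit @ unit.T

recs = recommend_items_for_user(toy, toy_item_sim, 1.0)
print(recs)
```

On the real data the call would be `recommend_items_for_user(user_item, item_user_similarity, customer_id)` for any `customer_id` in `user_item.index`.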