# Product Recommendation

Training association rule models (Apriori and ECLAT) to find the most related items bought by customers of Home Depot.

This algorithm associate products preferences by most of the customers and can be used to generate products recommendation and help on displaying products strategy.

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [11]:
# Data Loading
dataset = pd.read_csv(r"C:\Users\AJINKYA KUNJIR\Desktop\trialproducts1.csv",sep= ',')

transactions = []
for i in range(0, 3992):
    transactions.append([str(dataset.values[i,j]) for j in range(1, 12)])

In [12]:
dataset.head(5)

Unnamed: 0,CustomerID,Product1,Product2,Product3,Product4,Product5,Product6,Product7,Product8,Product9,Product10,Product11,Product12
0,1,Mattress,Bedframe,Pillow,Pillow cases,Blankets,Wall lights,Ceiling lights,Curtain hooks,Bedframe,TV stand,Ottomans,Accent table
1,2,Media storage,TV stand,Blankets,,,,,,,,,
2,3,Bedframe,,,,,,,,,,,
3,4,Pillow cases,Pillow,,,,,,,,,,
4,5,Accent table,Ottomans,Shower curtain,,,,,,,,,


In [13]:
# Inspecting elements of the list
transactions[:2]

[['Mattress',
  'Bedframe',
  'Pillow',
  'Pillow cases',
  'Blankets',
  'Wall lights',
  'Ceiling lights',
  'Curtain hooks',
  'Bedframe',
  'TV stand',
  'Ottomans'],
 ['Media storage',
  'TV stand',
  'Blankets',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan']]

In [14]:
## Training Apriori on the dataset
# The hyperparameters choosen on this training are:
# min_support = items bought more than 3 times a day * 7 days (week) / 3992 customers = 0.0052
# min_confidence: at least 20%, min_lift = minimum of 3 (less than that is too low)

from apyori import apriori
rules = apriori(transactions, min_support = 0.0052, min_confidence = 0.3, min_lift = 2, min_length = 2)

In [15]:
# Visualising the results
results = list(rules)

In [16]:
results

[RelationRecord(items=frozenset({'Pillow', 'Ceiling lights'}), support=0.005761523046092184, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Ceiling lights'}), items_add=frozenset({'Pillow'}), confidence=0.3194444444444444, lift=2.301845166466105)]),
 RelationRecord(items=frozenset({'Accent table', 'TV stand', 'Pillow'}), support=0.005260521042084168, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Accent table', 'TV stand'}), items_add=frozenset({'Pillow'}), confidence=0.3, lift=2.1617328519855596)]),
 RelationRecord(items=frozenset({'Chairs', 'Mattress', 'Bedframe'}), support=0.010270541082164329, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Chairs', 'Bedframe'}), items_add=frozenset({'Mattress'}), confidence=0.5256410256410257, lift=2.0451841855350628)]),
 RelationRecord(items=frozenset({'Chairs', 'Bedframe', 'Pillow'}), support=0.006763527054108216, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Chairs', 'Bedframe'}), items_add=fr

In [17]:
lift = []
association = []
for i in range (0, len(results)):
    lift.append(results[:len(results)][i][2][0][3])
    association.append(list(results[:len(results)][i][0]))

### Lets see which products are brought together

In [63]:
rank = pd.DataFrame([association, lift]).T
rank.columns = ['Association', 'Lift']

In [64]:
# Show top 10 higher lift scores
rank.sort_values('Lift', ascending=False).head(10)

Unnamed: 0,Association,Lift
41,"[Pillow, Mattress, Pillow cases, Ottomans]",3.21686
37,"[Pillow, Bedframe, Pillow cases, Ottomans]",2.85746
39,"[Curtain rods, Mattress, Ottomans, Pillow]",2.58117
24,"[Curtain rods, Bathroom organizer, Ottomans, P...",2.54974
25,"[Blankets, Mattress, Bedframe, Pillow]",2.51364
14,"[Pillow, Mattress, Pillow cases]",2.50636
3,"[Chairs, Bedframe, Pillow]",2.49431
11,"[Pillow, Bedframe, Wall lights]",2.48068
29,"[Pillow, Mattress, Bedframe, Media storage]",2.47055
12,"[Mattress, Chairs, TV stand]",2.43177


Hence, we get to know that there is maximum probability of buying Pillow, Mattress, Pillow cases and Ottamans together. It indicates that a person is likely to buy Mattress, Pillow cases and Ottomans if he buys a pillow.   

## ECLAT Implementation


#### Getting the list of products in a list

In [32]:
# Putting all transactions in a single list
itens = []
for i in range(1, len(transactions)):
    itens.extend(transactions[i])

# Finding unique items from transactions and removing nan
uniqueItems = list(set(itens))
uniqueItems.remove('nan')

#### Creating combinations with the items - pairs

In [34]:
pair = []
for j in range(1, len(uniqueItems)):
    k = 1;
    while k <= len(uniqueItems):
        try:
            pair.append([uniqueItems[j], uniqueItems[j+k]])
        except IndexError:
            pass
        k = k + 1;       

#### Calculating score
The calculation is done looking at the number of customers that bought both items (the pair) and divided by all customers (3992). This calculation is done for all pairs possible and the score is returned on "score" list.

<center> . </center>
<center> *** score = (# lists that contain [item x and item y]) / (# all lists) ***</center>

In [35]:
score = []
for i in pair:
    cond = []
    for item in i:
        cond.append('("%s") in s' %item)
    mycode = ('[s for s in transactions if ' + ' and '.join(cond) + ']')
    #mycode = "print 'hello world'"
    score.append(len(eval(mycode))/3993.)

#### Showing results

Top 10 Most common pairs of items

In [36]:
ranking_ECLAT = pd.DataFrame([pair, score]).T
ranking_ECLAT.columns = ['Pair', 'Score']

In [37]:
ranking_ECLAT.sort_values('Score', ascending=False).head(10)

Unnamed: 0,Pair,Score
117,"[Ottomans, Bathroom organizer]",0.0984222
51,"[Mattress, Ottomans]",0.0964187
116,"[Ottomans, Bedframe]",0.0823942
122,"[Ottomans, Blankets]",0.0788881
131,"[Ottomans, Media storage]",0.0698723
54,"[Mattress, Bathroom organizer]",0.0693714
53,"[Mattress, Bedframe]",0.0688705
59,"[Mattress, Blankets]",0.0618583
130,"[Ottomans, Pillow]",0.0616078
176,"[Bathroom organizer, Blankets]",0.0581017


### Lets try it for combination of 3 products

In [38]:
# Creating trios
trio = []
for j in range(0, len(uniqueItems)):
    for k in range(j, len(uniqueItems)):
        for l in range(k, len(uniqueItems)):
            if (k != j) and (j != l) and (k != l):
                try:
                    trio.append([uniqueItems[j], uniqueItems[j+k], uniqueItems[j+l]])
                except IndexError:
                    pass 

In [39]:
trio[:5]

[['Ceiling lights', 'Accent table', 'TV stand'],
 ['Ceiling lights', 'Accent table', 'Mattress'],
 ['Ceiling lights', 'Accent table', 'Pillow cases '],
 ['Ceiling lights', 'Accent table', ' Accent table'],
 ['Ceiling lights', 'Accent table', 'Ottomans']]

In [40]:
score_trio = []
for i in trio:
    cond = []
    for item in i:
        cond.append('("%s") in s' %item)
    mycode = ('[s for s in transactions if ' + ' and '.join(cond) + ']')
    #mycode = "print 'hello world'"
    score_trio.append(len(eval(mycode))/3993.)

In [41]:
ranking_ECLAT_trio = pd.DataFrame([trio, score_trio]).T
ranking_ECLAT_trio.columns = ['Trio', 'Score']
ranking_ECLAT_trio.sort_values('Score', ascending=False).head(10)

Unnamed: 0,Trio,Score
851,"[Mattress, Bedframe, Bathroom organizer]",0.0240421
873,"[Mattress, Bathroom organizer, Blankets]",0.0230403
1316,"[Ottomans, Blankets, Media storage]",0.022289
856,"[Mattress, Bedframe, Blankets]",0.0215377
864,"[Mattress, Bedframe, Pillow]",0.0190333
327,"[Accent table, Mattress, Ottomans]",0.0175307
865,"[Mattress, Bedframe, Media storage]",0.0175307
1376,"[Ottomans, Pillow, Media storage]",0.0170298
951,"[Mattress, Blankets, Pillow]",0.0170298
881,"[Mattress, Bathroom organizer, Pillow]",0.0162785


## What about comparing the results from Apriori and ECLAT?

We got from Apriori that the combination that lead to more "attractiveness power" is "Pillow", "Mattress" and "Pillow cases" and "Ottomans". If we run the ECLAT code for this set of items, we will obtain: 0.0039.

This score of 3 items has not enough score to be placed among top 10, but they are measuring different metrics.  According to apriori these are the items that when picked one lead to another items more frequently than other combinations, i.e. when a person pick 'olive oil', the probability of picking 'whole wheat pasta' and 'mineral water' is much higher than picking another combination. ECLAT in another hand is just sorting as the most common combinations of all lists, not caring about how one item isolatedly can influence in the purchase of another.

In [111]:
i = ["Pillow", "Mattress", "Pillow cases", "Ottomans"]
cond = []
for item in i:
    cond.append('("%s") in s' %item)
mycode = ('[s for s in transactions if ' + ' and '.join(cond) + ']')
#mycode = "print 'hello world'"
tra = eval(mycode)

In [112]:
print ('Score for "Pillow", "Mattress", "Pillow cases", "Ottomans"]:', len(tra)/3992.)

Score for "Pillow", "Mattress", "Pillow cases", "Ottomans"]: 0.006262525050100201
