## Recommendation Systems

### Association Rules

**With regard to association rules, we define three metrics:**

1. Support
2. Confidence
3. Lift

**We use the following notation:**

1. **N   : number of baskets (transactions)**
2. **Nx  : number of baskets that contain X**
3. **Ny  : number of baskets that contain Y**
4. **Nxy : number of baskets that contain both X and Y**

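**Using this notation, the three metrics for a rule X → Y ("baskets that contain X also contain Y") are:**
1. **Support = Nxy / N**, the fraction of all baskets that contain both X and Y.
2. **Confidence = Nxy / Nx**, the fraction of baskets containing X that also contain Y.
3. **Lift = (N × Nxy) / (Nx × Ny)**, i.e. confidence divided by the support of Y (Ny / N). A lift above 1 means X and Y appear together more often than they would if they were independent.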
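The three formulas can be checked by hand on a toy set of baskets. The sketch below is a minimal, self-contained illustration in plain Python; the function name `rule_metrics` and the four example baskets are made up for this purpose and are not part of mlxtend.

```python
def rule_metrics(baskets, x, y):
    """Return (support, confidence, lift) for the rule x -> y.

    baskets: list of sets of item names (one set per basket).
    x, y:    single item names (hypothetical example items below).
    """
    n = len(baskets)
    n_x = sum(1 for b in baskets if x in b)
    n_y = sum(1 for b in baskets if y in b)
    n_xy = sum(1 for b in baskets if x in b and y in b)

    support = n_xy / n                # share of baskets containing both
    confidence = n_xy / n_x           # estimate of P(y | x)
    lift = (n * n_xy) / (n_x * n_y)   # confidence / P(y)
    return support, confidence, lift


# Toy data, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
support, confidence, lift = rule_metrics(baskets, "bread", "milk")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
# support=0.500 confidence=0.667 lift=0.889
```

Here N = 4, Nx = 3, Ny = 3 and Nxy = 2, so lift = (4 × 2) / (3 × 3) ≈ 0.889: a lift below 1 says bread and milk appear together slightly less often than they would if they were independent.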

In [1]:
%pip install mlxtend

Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

### Accessing the Dataset

In [3]:
gr = open('groceries.csv', 'r')

In [4]:
gr

<_io.TextIOWrapper name='groceries.csv' mode='r' encoding='cp1252'>

In [5]:
content = gr.readlines()
content

['citrus fruit,semi-finished bread,margarine,ready soups\n',
 '\n',
 'tropical fruit,yogurt,coffee\n',
 '\n',
 'whole milk\n',
 '\n',
 'pip fruit,yogurt,cream cheese,meat spreads\n',
 '\n',
 'other vegetables,whole milk,condensed milk,long life bakery product\n',
 '\n',
 'whole milk,butter,yogurt,rice,abrasive cleaner\n',
 '\n',
 'rolls/buns\n',
 '\n',
 'other vegetables,UHT-milk,rolls/buns,bottled beer,liquor (appetizer)\n',
 '\n',
 'potted plants\n',
 '\n',
 'whole milk,cereals\n',
 '\n',
 'tropical fruit,other vegetables,white bread,bottled water,chocolate\n',
 '\n',
 'citrus fruit,tropical fruit,whole milk,butter,curd,yogurt,flour,bottled water,dishes\n',
 '\n',
 'beef\n',
 '\n',
 'frankfurter,rolls/buns,soda\n',
 '\n',
 'chicken,tropical fruit\n',
 '\n',
 'butter,sugar,fruit/vegetable juice,newspapers\n',
 '\n',
 'fruit/vegetable juice\n',
 '\n',
 'packaged fruit/vegetables\n',
 '\n',
 'chocolate\n',
 '\n',
 'specialty bar\n',
 '\n',
 'other vegetables\n',
 '\n',
 'butter milk,pas

In [6]:
len(content)

19670

In [7]:
type(content)

list

In [12]:
items = []
for x in content:
    item = x.strip()               # drop the trailing newline
    items.append(item.split(','))  # one basket per line; blank lines become ['']

In [13]:
items

[['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups'],
 [''],
 ['tropical fruit', 'yogurt', 'coffee'],
 [''],
 ['whole milk'],
 [''],
 ['pip fruit', 'yogurt', 'cream cheese', 'meat spreads'],
 [''],
 ['other vegetables',
  'whole milk',
  'condensed milk',
  'long life bakery product'],
 [''],
 ['whole milk', 'butter', 'yogurt', 'rice', 'abrasive cleaner'],
 [''],
 ['rolls/buns'],
 [''],
 ['other vegetables',
  'UHT-milk',
  'rolls/buns',
  'bottled beer',
  'liquor (appetizer)'],
 [''],
 ['potted plants'],
 [''],
 ['whole milk', 'cereals'],
 [''],
 ['tropical fruit',
  'other vegetables',
  'white bread',
  'bottled water',
  'chocolate'],
 [''],
 ['citrus fruit',
  'tropical fruit',
  'whole milk',
  'butter',
  'curd',
  'yogurt',
  'flour',
  'bottled water',
  'dishes'],
 [''],
 ['beef'],
 [''],
 ['frankfurter', 'rolls/buns', 'soda'],
 [''],
 ['chicken', 'tropical fruit'],
 [''],
 ['butter', 'sugar', 'fruit/vegetable juice', 'newspapers'],
 [''],
 ['fruit/vegetable 
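The output shows that every blank line in `groceries.csv` became a basket containing a single empty string, `['']`. A minimal sketch of skipping those lines while reading is below; `io.StringIO` stands in for the real file so the snippet runs on its own (with the actual file you would iterate over `open('groceries.csv')` instead), and the three sample baskets are copied from the output above.

```python
import io

# Stand-in for open('groceries.csv'): same "items separated by commas,
# blank line between baskets" layout as the file shown above.
raw = io.StringIO(
    "citrus fruit,semi-finished bread,margarine,ready soups\n"
    "\n"
    "tropical fruit,yogurt,coffee\n"
    "\n"
    "whole milk\n"
)

items = []
for line in raw:
    line = line.strip()
    if not line:  # blank line: skip it instead of storing ['']
        continue
    items.append([item.strip() for item in line.split(",")])

print(items)
# [['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups'],
#  ['tropical fruit', 'yogurt', 'coffee'], ['whole milk']]
```

Parsing this way leaves no empty baskets in the data, so the later cleanup steps that drop the odd-numbered rows and the empty-string column would not be needed.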

In [14]:
import mlxtend

In [16]:
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()

items_encoded = te.fit_transform(items)

items_encoded

array([[False, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False],
       [False, False, False, ..., False,  True, False],
       ...,
       [ True, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False]])
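`TransactionEncoder` turns the list of baskets into a boolean matrix with one row per basket and one column per distinct item. The plain-Python sketch below illustrates that idea; it is not mlxtend's implementation, and `one_hot_encode` is a hypothetical helper name. It sorts the columns, which matches the alphabetical column order visible in `items_df` below.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): the sorted distinct items and one boolean row per basket."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        rows.append([col in present for col in columns])
    return columns, rows


columns, rows = one_hot_encode([["milk", "bread"], ["beer"], ["bread"]])
print(columns)  # ['beer', 'bread', 'milk']
print(rows)     # [[False, True, True], [True, False, False], [False, True, False]]

# An empty-string "item" (from a blank line) sorts before every real item name.
print(one_hot_encode([[""], ["Instant food products"], ["abrasive cleaner"]])[0])
# ['', 'Instant food products', 'abrasive cleaner']
```

Because the empty string sorts first, the `['']` baskets produce a first item column named `''`, which is consistent with the extra first item column of `items_df` that is later dropped via `items_df_new.columns[0]`.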

In [17]:
len(items_encoded)

19670

### Converting to a pandas DataFrame

In [19]:
items_df = pd.DataFrame(items_encoded, columns = te.columns_)
items_df

|  |  | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | baby food | bags | baking powder | bathroom cleaner | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | True | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 3 | True | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19665 | True | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 19666 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 19667 | True | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 19668 | False | False | False | False | False | False | False | False | False | False | ... | False | True | False | False | False | False | False | False | False | False |


In [20]:
items_df.shape

(19670, 170)

### Finding frequent itemsets with the Apriori algorithm

In [26]:
from mlxtend.frequent_patterns import apriori, association_rules

frequent_set = apriori(items_df, min_support = 0.01, use_colnames = True)
frequent_set

|  | support | itemsets |
|---|---|---|
| 0 | 0.500000 | () |
| 1 | 0.016726 | (UHT-milk) |
| 2 | 0.026233 | (beef) |
| 3 | 0.016624 | (berries) |
| 4 | 0.013015 | (beverages) |
| ... | ... | ... |
| 118 | 0.016116 | (whipped/sour cream, whole milk) |
| 119 | 0.010371 | (whipped/sour cream, yogurt) |
| 120 | 0.028012 | (whole milk, yogurt) |
| 121 | 0.011591 | (root vegetables, whole milk, other vegetables) |
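`apriori` returns every itemset whose support is at least `min_support` (here 1% of the rows). The classic level-wise idea behind it is sketched below in plain Python: grow candidate itemsets one item at a time and discard any candidate that has an infrequent subset, since a superset of an infrequent itemset cannot be frequent. This is a textbook sketch, not mlxtend's optimized implementation; `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: single items.
    level = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["bread", "milk", "butter"],
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
# ['bread'] 0.75
# ['butter'] 0.75
# ['milk'] 0.75
# ['bread', 'butter'] 0.5
# ['bread', 'milk'] 0.5
# ['butter', 'milk'] 0.5
```

With `min_support=0.5` the three-item itemset is dropped because it appears in only one of the four baskets (support 0.25).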


### Association Rules

In [27]:
rules = association_rules(frequent_set, metric = 'lift')
rules

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (beef) | (whole milk) | 0.026233 | 0.127758 | 0.010625 | 0.405039 | 3.170359 | 0.007274 | 1.466048 |
| 1 | (whole milk) | (beef) | 0.127758 | 0.026233 | 0.010625 | 0.083168 | 3.170359 | 0.007274 | 1.062099 |
| 2 | (whole milk) | (bottled beer) | 0.127758 | 0.040264 | 0.010219 | 0.079984 | 1.986473 | 0.005075 | 1.043173 |
| 3 | (bottled beer) | (whole milk) | 0.040264 | 0.127758 | 0.010219 | 0.253788 | 1.986473 | 0.005075 | 1.168893 |
| 4 | (other vegetables) | (bottled water) | 0.096746 | 0.055262 | 0.012405 | 0.128219 | 2.320202 | 0.007058 | 1.083687 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129 | (whole milk, yogurt) | (other vegetables) | 0.028012 | 0.096746 | 0.011134 | 0.397459 | 4.108262 | 0.008424 | 1.499075 |
| 130 | (other vegetables, yogurt) | (whole milk) | 0.021708 | 0.127758 | 0.011134 | 0.512881 | 4.014469 | 0.008360 | 1.790612 |
| 131 | (whole milk) | (other vegetables, yogurt) | 0.127758 | 0.021708 | 0.011134 | 0.087147 | 4.014469 | 0.008360 | 1.071686 |
| 132 | (other vegetables) | (whole milk, yogurt) | 0.096746 | 0.028012 | 0.011134 | 0.115081 | 4.108262 | 0.008424 | 1.098392 |
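Each row of `rules` comes from splitting a frequent itemset into an antecedent and a consequent and computing the metrics from the supports in `frequent_set`. The sketch below does that in plain Python, using the three supports for beef and whole milk from the tables above as input; `generate_rules` and its `min_lift` cut-off are hypothetical names for this illustration, not mlxtend parameters. Leverage and conviction are left out.

```python
from itertools import combinations


def generate_rules(frequent, min_lift=1.0):
    """Turn {frozenset: support} into (antecedent, consequent, confidence, lift) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                cons = itemset - ante
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                confidence = sup / frequent[ante]
                lift = confidence / frequent[cons]
                if lift >= min_lift:
                    rules.append((set(ante), set(cons), confidence, lift))
    return rules


# Supports copied from the frequent-itemset / rules output above.
frequent = {
    frozenset({"beef"}): 0.026233,
    frozenset({"whole milk"}): 0.127758,
    frozenset({"beef", "whole milk"}): 0.010625,
}
for ante, cons, conf, lift in generate_rules(frequent):
    print(ante, "->", cons, f"confidence={conf:.4f} lift={lift:.4f}")
# Roughly confidence 0.405 / lift 3.17 for beef -> whole milk, matching
# rows 0 and 1 above up to rounding of the input supports.
```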


In [28]:
# Rules sorted by confidence, highest first

rules.sort_values('confidence', ascending = False)

|  | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 130 | (other vegetables, yogurt) | (whole milk) | 0.021708 | 0.127758 | 0.011134 | 0.512881 | 4.014469 | 0.008360 | 1.790612 |
| 18 | (butter) | (whole milk) | 0.027707 | 0.127758 | 0.013777 | 0.497248 | 3.892106 | 0.010238 | 1.734934 |
| 27 | (curd) | (whole milk) | 0.026640 | 0.127758 | 0.013066 | 0.490458 | 3.838961 | 0.009662 | 1.711816 |
| 123 | (root vegetables, other vegetables) | (whole milk) | 0.023691 | 0.127758 | 0.011591 | 0.489270 | 3.829665 | 0.008565 | 1.707835 |
| 122 | (root vegetables, whole milk) | (other vegetables) | 0.024453 | 0.096746 | 0.011591 | 0.474012 | 4.899540 | 0.009225 | 1.717253 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 78 | (whole milk) | (pork) | 0.127758 | 0.028826 | 0.011083 | 0.086749 | 3.009437 | 0.007400 | 1.063425 |
| 1 | (whole milk) | (beef) | 0.127758 | 0.026233 | 0.010625 | 0.083168 | 3.170359 | 0.007274 | 1.062099 |
| 32 | (whole milk) | (frankfurter) | 0.127758 | 0.029487 | 0.010269 | 0.080382 | 2.726059 | 0.006502 | 1.055344 |
| 2 | (whole milk) | (bottled beer) | 0.127758 | 0.040264 | 0.010219 | 0.079984 | 1.986473 | 0.005075 | 1.043173 |
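One caveat when reading these numbers: `items_df` still contains the baskets created from blank lines, and the `()` itemset in `frequent_set`, which appears to be the empty-string item, has support 0.500000, suggesting that half of the 19670 rows are empty baskets. Empty baskets increase N but leave Nx, Ny and Nxy unchanged, so every support above would be half its value on the non-empty baskets, confidence is unaffected, and lift is doubled (and `min_support = 0.01` effectively means 2% of the real baskets). The notebook drops these rows later (In [50] to In [53]), but the rules above were computed before that. The toy check below, with made-up baskets and a hypothetical `metrics` helper, shows the effect.

```python
def metrics(baskets, x, y):
    """Return (support, confidence, lift) of the rule x -> y."""
    n = len(baskets)
    n_x = sum(x in b for b in baskets)
    n_y = sum(y in b for b in baskets)
    n_xy = sum(x in b and y in b for b in baskets)
    return n_xy / n, n_xy / n_x, (n * n_xy) / (n_x * n_y)


# Made-up baskets for illustration.
real = [{"beef", "whole milk"}, {"whole milk"}, {"beef"}, {"yogurt"}]

# Mimic the file layout: an empty basket after every real one.
padded = []
for basket in real:
    padded.extend([basket, set()])

print(metrics(real, "beef", "whole milk"))    # (0.25, 0.5, 1.0)
print(metrics(padded, "beef", "whole milk"))  # (0.125, 0.5, 2.0)
```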




In [50]:
remove = list(range(1, len(items_df), 2))  # odd rows hold the baskets created from blank lines
remove

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63,
 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95,
 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127,
 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159,
 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191,
 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223,
 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255,
 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287,
 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319,
 321, 323, 325, 327, 329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351,
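Dropping every odd-numbered row works here only because the file strictly alternates between a basket and a blank line. A sketch that selects rows by content instead, removing the empty-string column and then any row with no item left, is shown below in plain Python; `drop_empty_baskets` and the three-column example are hypothetical.

```python
def drop_empty_baskets(columns, rows, blank_item=""):
    """Drop the blank_item column, then drop rows that have no True value left."""
    keep = [i for i, col in enumerate(columns) if col != blank_item]
    new_columns = [columns[i] for i in keep]
    new_rows = []
    for row in rows:
        trimmed = [row[i] for i in keep]
        if any(trimmed):  # at least one real item in this basket
            new_rows.append(trimmed)
    return new_columns, new_rows


columns = ["", "beef", "whole milk"]
rows = [
    [False, True, True],
    [True, False, False],   # basket created from a blank line
    [False, False, True],
    [True, False, False],   # basket created from a blank line
]
cols, kept = drop_empty_baskets(columns, rows)
print(cols)  # ['beef', 'whole milk']
print(kept)  # [[True, True], [False, True]]
```

With pandas, the same idea would look roughly like `df = items_df.drop(columns='')` followed by `df = df[df.any(axis=1)]`, assuming the blank column is named `''` as the encoder output suggests.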

In [51]:
items_df_new = items_df.drop(remove, axis = 0)
items_df_new

|  |  | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | baby food | bags | baking powder | bathroom cleaner | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 6 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 8 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19660 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | True | False | False |
| 19662 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 19664 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 19666 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


In [52]:
items_df_new.drop(items_df_new.columns[0], axis = 1, inplace = True)  # drop the empty-string item column left by the blank lines

In [53]:
items_df_new

|  | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | baby food | bags | baking powder | bathroom cleaner | beef | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 6 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 8 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19660 | False | False | False | False | False | False | False | False | False | True | ... | False | False | False | True | False | False | False | True | False | False |
| 19662 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 19664 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 19666 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
