## Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Loading Data

In [2]:
# Load the transactions with pandas, naming the single column 'transaction'
grocery = pd.read_csv("datasets/GroceryStoreDataSet.csv", names=['transaction'])

# Show the first five rows
grocery.head()

Unnamed: 0,transaction
0,"MILK,BREAD,BISCUIT"
1,"BREAD,MILK,BISCUIT,CORNFLAKES"
2,"BREAD,TEA,BOURNVITA"
3,"JAM,MAGGI,BREAD,MILK"
4,"MAGGI,TEA,BISCUIT"


## Data Preprocessing

In [3]:
# Split each comma-separated transaction string into a list of items,
# turning the DataFrame column into a list of lists of strings
transactions = list(grocery['transaction'].apply(lambda t: t.split(',')))


## Generating Rules with itertools

In [4]:
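from itertools import permutations

# Extract the unique items across all transactions
flattened = [item for transaction in transactions for item in transaction]
# A set has no fixed order, so the order of this list can differ between runs
items = list(set(flattened))
items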

['SUGER',
 'MILK',
 'COFFEE',
 'BREAD',
 'BOURNVITA',
 'CORNFLAKES',
 'MAGGI',
 'COCK',
 'JAM',
 'TEA',
 'BISCUIT']

In [13]:
# Build every ordered pair of distinct items as a candidate rule (antecedent, consequent)
rules = list(permutations(items, 2))
rules

[('SUGER', 'MAGGI'),
 ('SUGER', 'TEA'),
 ('SUGER', 'BISCUIT'),
 ('SUGER', 'CORNFLAKES'),
 ('SUGER', 'BREAD'),
 ('SUGER', 'COCK'),
 ('SUGER', 'MILK'),
 ('SUGER', 'COFFEE'),
 ('SUGER', 'JAM'),
 ('SUGER', 'BOURNVITA'),
 ('MAGGI', 'SUGER'),
 ('MAGGI', 'TEA'),
 ('MAGGI', 'BISCUIT'),
 ('MAGGI', 'CORNFLAKES'),
 ('MAGGI', 'BREAD'),
 ('MAGGI', 'COCK'),
 ('MAGGI', 'MILK'),
 ('MAGGI', 'COFFEE'),
 ('MAGGI', 'JAM'),
 ('MAGGI', 'BOURNVITA'),
 ('TEA', 'SUGER'),
 ('TEA', 'MAGGI'),
 ('TEA', 'BISCUIT'),
 ('TEA', 'CORNFLAKES'),
 ('TEA', 'BREAD'),
 ('TEA', 'COCK'),
 ('TEA', 'MILK'),
 ('TEA', 'COFFEE'),
 ('TEA', 'JAM'),
 ('TEA', 'BOURNVITA'),
 ('BISCUIT', 'SUGER'),
 ('BISCUIT', 'MAGGI'),
 ('BISCUIT', 'TEA'),
 ('BISCUIT', 'CORNFLAKES'),
 ('BISCUIT', 'BREAD'),
 ('BISCUIT', 'COCK'),
 ('BISCUIT', 'MILK'),
 ('BISCUIT', 'COFFEE'),
 ('BISCUIT', 'JAM'),
 ('BISCUIT', 'BOURNVITA'),
 ('CORNFLAKES', 'SUGER'),
 ('CORNFLAKES', 'MAGGI'),
 ('CORNFLAKES', 'TEA'),
 ('CORNFLAKES', 'BISCUIT'),
 ('CORNFLAKES', 'BREAD'),
 ('COR

In [14]:
# Number of candidate rules: 11 items give 11 * 10 ordered pairs
len(rules)

110
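
The count of 110 follows from the number of ordered pairs that can be drawn from 11 items. The sketch below illustrates this on a hypothetical three-item list (`sample_items` is an assumption made for illustration, not part of the dataset) and then checks the count for 11 items with `math.perm`.

```python
from itertools import permutations
from math import perm

# Hypothetical three-item catalogue, used only for illustration
sample_items = ["BREAD", "MILK", "TEA"]

# Each ordered pair (antecedent, consequent) is one candidate rule,
# so ('BREAD', 'MILK') and ('MILK', 'BREAD') are counted separately
sample_rules = list(permutations(sample_items, 2))
print(sample_rules)

# n items give n * (n - 1) ordered pairs
n_items = 11
n_rules = perm(n_items, 2)
print(n_rules)
```

Because the number of candidate rules grows quadratically for pairs alone (and much faster once larger itemsets are allowed), enumerating every rule quickly becomes impractical on a real catalogue.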

## Installing mlxtend 

In [5]:
%pip install mlxtend


Note: you may need to restart the kernel to use updated packages.


## Preparing the data to create a one-hot DataFrame

In [6]:
from mlxtend.preprocessing import TransactionEncoder

In [7]:
# Instantiate transaction encoder and identify unique items in transactions
encoder = TransactionEncoder().fit(transactions)

In [9]:
# One-hot encode the transactions with the already fitted encoder
onehot = encoder.transform(transactions)


In [10]:
# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(onehot, columns = encoder.columns_)
print(onehot)

    BISCUIT  BOURNVITA  BREAD   COCK  COFFEE  CORNFLAKES    JAM  MAGGI   MILK  \
0      True      False   True  False   False       False  False  False   True   
1      True      False   True  False   False        True  False  False   True   
2     False       True   True  False   False       False  False  False  False   
3     False      False   True  False   False       False   True   True   True   
4      True      False  False  False   False       False  False   True  False   
5     False       True   True  False   False       False  False  False  False   
6     False      False  False  False   False        True  False   True  False   
7      True      False   True  False   False       False  False   True  False   
8     False      False   True  False   False       False   True   True  False   
9     False      False   True  False   False       False  False  False   True   
10     True      False  False   True    True        True  False  False  False   
11     True      False  Fals

This is the one-hot format we need. Each column is an item in the store and each row is a transaction; a value of True means the item was bought in that transaction. The data is now ready to be fed to the algorithm.
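
To make the one-hot layout concrete, here is a minimal sketch that builds the same kind of table by hand in plain Python, without mlxtend. It uses three transactions copied from the head of the dataset; the names `sample_transactions`, `columns`, and `rows` are introduced here for illustration only.

```python
# Three transactions copied from the head of the dataset
sample_transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["MAGGI", "TEA", "BISCUIT"],
]

# Columns are the unique items, sorted alphabetically
columns = sorted({item for t in sample_transactions for item in t})

# One row per transaction, one boolean per column:
# True when the item appears in that transaction
rows = [[item in t for item in columns] for t in sample_transactions]

print(columns)
for row in rows:
    print(row)
```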

## Find the frequent itemsets using Apriori

We will use Apriori to find the frequent itemsets in the one-hot transaction DataFrame. Restricting the search to frequent itemsets reduces the computational workload of the association-rule step.

An itemset is frequent when its support (the fraction of transactions that contain it) is at least the minimum support.
Here we set min_support to 0.2.
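
As a rough illustration of what support measures, the sketch below computes it directly for a few itemsets. The helper `support` is hypothetical (it is not an mlxtend function), and the numbers apply only to the five sample transactions copied from the head of the dataset, not to the full file.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

# First five transactions, copied from the head of the dataset
sample = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
]

min_support = 0.2
for candidate in (["BREAD"], ["BREAD", "MILK"], ["JAM", "TEA"]):
    s = support(candidate, sample)
    # An itemset is kept when its support reaches the threshold
    print(candidate, s, "frequent" if s >= min_support else "dropped")
```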

In [11]:
from mlxtend.frequent_patterns import apriori, association_rules

In [12]:
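# Keep itemsets with support of at least 0.2; note that this reassigns
# `onehot` to the DataFrame of frequent itemsets
onehot = apriori(onehot, min_support=0.2, use_colnames=True)
onehot.sort_values(['support'], ascending=False, inplace=True)
onehot.head()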


Unnamed: 0,support,itemsets
2,0.65,(BREAD)
3,0.4,(COFFEE)
0,0.35,(BISCUIT)
8,0.35,(TEA)
4,0.3,(CORNFLAKES)


Since we set min_support to 0.2, only itemsets whose support is at least 0.2 (that is, itemsets that appear in at least 20% of the 20 transactions, or 4 transactions) are kept.

Only these itemsets, which are considered important, proceed to the association-rule step.

## Finding the Association Rules

The association_rules function calculates the key metrics of each rule from our transaction data, including support, confidence, lift, leverage, and conviction. With metric="lift" and min_threshold=1, only rules whose lift is at least 1 are returned.

In [18]:
grocery_ar = association_rules(onehot, metric="lift", min_threshold=1)
grocery_ar.head(10)


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(MILK),(BREAD),0.25,0.65,0.2,0.8,1.230769,0.0375,1.75,0.25
1,(BREAD),(MILK),0.65,0.25,0.2,0.307692,1.230769,0.0375,1.083333,0.535714
2,(SUGER),(BREAD),0.3,0.65,0.2,0.666667,1.025641,0.005,1.05,0.035714
3,(BREAD),(SUGER),0.65,0.3,0.2,0.307692,1.025641,0.005,1.011111,0.071429
4,(CORNFLAKES),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8,0.571429
5,(COFFEE),(CORNFLAKES),0.4,0.3,0.2,0.5,1.666667,0.08,1.4,0.666667
6,(SUGER),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8,0.571429
7,(COFFEE),(SUGER),0.4,0.3,0.2,0.5,1.666667,0.08,1.4,0.666667
8,(TEA),(MAGGI),0.35,0.25,0.2,0.571429,2.285714,0.1125,1.75,0.865385
9,(MAGGI),(TEA),0.25,0.35,0.2,0.8,2.285714,0.1125,3.25,0.75
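
The metrics in this table can be recomputed from three supports using the standard definitions of confidence, lift, leverage, and conviction. The sketch below does so for the (TEA) -> (MAGGI) row; `rule_metrics` is a hypothetical helper written for illustration and is not part of mlxtend.

```python
import math

def rule_metrics(support_a, support_c, support_ac):
    """Metrics for the rule A -> C, given the supports of A, C, and A together with C."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction divides by (1 - confidence), so treat it as infinite
    # when the confidence is 1
    if math.isclose(confidence, 1.0):
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Supports read from the (TEA) -> (MAGGI) row of the table
tea_maggi = rule_metrics(support_a=0.35, support_c=0.25, support_ac=0.2)
for name, value in tea_maggi.items():
    print(f"{name}: {value:.6f}")
```

The printed values should agree with row 8 of the table up to rounding.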


## Some insights 

Based on the association rules, we can draw some interpretations to guide business actions.
For example:

- The rule at index 0, (MILK) $\rightarrow$ (BREAD), has a confidence of 0.8, the highest in the table (shared with index 9). This means that 80% of the transactions that contain milk also contain bread. Keep in mind, however, that confidence is not everything. Much of the high confidence in this row comes from the high support of bread (consequent support 0.65): bread occurs in many transactions, so the lift is only 1.23, and trying to sell bread with milk is unlikely to make much business impact.



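- Lift and leverage are better metrics. Among the rules shown, indexes 8 and 9, the two directions of the tea and Maggi rule, have the highest lift and leverage, at 2.29 and 0.1125 respectively. This means that customers who buy tea are much more likely than average to buy Maggi as well.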

#### What can we do after knowing this?

1. The shop owner can change the shelf layout to place Maggi far from tea. This gives customers an opportunity to see more products on their way from one to the other, which increases the chance that they buy more products.

2. The shop owner can promote Maggi together with tea.

#### A question arises here:
If the owner wants to run promotions, should he use Maggi to promote tea, or use tea to promote Maggi?

We now know that there is a relationship between Maggi and tea, but what is the best direction for this relationship?

(Tea) $\rightarrow$ (Maggi)   or   (Maggi) $\rightarrow$ (Tea)

Confidence is directional, so it can help us here.
Based on the table above, the confidence of (Maggi) $\rightarrow$ (Tea), 0.8, is greater than the confidence of (Tea) $\rightarrow$ (Maggi), 0.57. So the owner should use Maggi to promote tea.
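
The reason confidence can settle the direction while lift cannot is that lift is symmetric in the two items and confidence is not. A minimal sketch, using only the supports reported in the rules table (the variable names are introduced here for illustration):

```python
# Supports read from the rules table
support_tea = 0.35
support_maggi = 0.25
support_both = 0.2

# Confidence divides by the antecedent's support, so it depends on direction
conf_tea_to_maggi = support_both / support_tea
conf_maggi_to_tea = support_both / support_maggi

# Lift divides by both supports, so it is the same in either direction
lift = support_both / (support_tea * support_maggi)

print(f"confidence (Tea) -> (Maggi): {conf_tea_to_maggi:.3f}")
print(f"confidence (Maggi) -> (Tea): {conf_maggi_to_tea:.3f}")
print(f"lift (either direction):     {lift:.3f}")
```

Maggi buyers are the smaller group, so the same number of joint purchases makes up a larger share of them, which is why (Maggi) $\rightarrow$ (Tea) has the higher confidence.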