# Association rule mining

---

Association rule mining is an unsupervised machine learning technique, like clustering and dimensionality reduction. It is a descriptive method used to discover relationships hidden in large datasets. Three commonly used association rule mining algorithms are:


*   Apriori
*   ECLAT
*   FP-Growth

### Metrics

All association rule mining techniques are based on the following metrics:

*   **Support:** Fraction of all transactions in which a product, or a set of products, is purchased.
*   **Confidence:** Fraction of the transactions containing the first product that also contain the second product.
*   **Lift:** Ratio of the support of two products purchased together to the product of their individual supports, that is, the support expected if they were purchased independently. A lift above 1 means the products appear together more often than chance would suggest.
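
As a rough illustration of the three metrics, the sketch below computes them as fractions of transactions on a handful of toy baskets. The helper names `support`, `confidence`, and `lift` are made up for this example and are not part of mlxtend or pyfpgrowth.

```python
# Toy baskets, for illustration only
transactions = [
    {"Milk", "Bread", "Nutmeg", "Eggs"},
    {"Milk", "Nutmeg", "Flour", "Icecream"},
    {"Bread", "Eggs", "Nutmeg", "Milk"},
    {"Bread", "Eggs", "Chocolate", "Icecream"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

def lift(antecedent, consequent):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"Milk", "Nutmeg"}))       # 0.75
print(confidence({"Milk"}, {"Nutmeg"}))  # 1.0
print(lift({"Milk"}, {"Nutmeg"}))        # about 1.33
```

Here Milk and Nutmeg appear together in three of the four baskets, and every basket with Milk also has Nutmeg, so the rule Milk → Nutmeg has confidence 1.0 and a lift above 1.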


In this Colab notebook, Apriori and FP-Growth are run using the mlxtend and pyfpgrowth libraries, respectively.



In [1]:
# install required packages
!pip install openpyxl==3.0.9
!pip install mlxtend
!pip install pyfpgrowth



In [2]:
# import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyfpgrowth
from mlxtend.frequent_patterns import apriori,association_rules

# Apriori algorithm

The Apriori algorithm uses support, confidence, and lift to keep only those combinations of products that satisfy minimum support and confidence thresholds. Discarding infrequent combinations early avoids generating too many candidates and saves computation.

At a high level, the algorithm performs the following steps:


*   Find the frequent combinations of items, called 'itemsets', that satisfy the minimum support threshold.
*   Generate association rules from the frequent itemsets, keeping those that meet the minimum confidence threshold.
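
The two steps above can be sketched in plain Python. This is a minimal, unoptimized illustration and not mlxtend's implementation; the function names `apriori_sketch` and `rules_sketch` and the toy baskets are assumptions made for the example.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Step 1: return {itemset: support} for every frequent itemset."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)

    def support(itemset):
        return sum(itemset <= row for row in rows) / n

    # Level 1: frequent single items
    items = {item for row in rows for item in row}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def rules_sketch(frequent, min_confidence):
    """Step 2: return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are always frequent
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Toy baskets, for illustration only
baskets = [
    ["Milk", "Bread", "Nutmeg", "Eggs"],
    ["Milk", "Nutmeg", "Flour", "Icecream"],
    ["Bread", "Eggs", "Nutmeg", "Milk"],
    ["Bread", "Eggs", "Chocolate", "Icecream"],
]
frequent = apriori_sketch(baskets, min_support=0.5)
rules = rules_sketch(frequent, min_confidence=0.8)
print(len(frequent), "frequent itemsets,", len(rules), "rules")
```

The prune step is where the savings come from: a candidate is only counted against the data if all of its smaller subsets already passed the support threshold.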



## Importing data

In [3]:
# Read data from google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
path = "/content/drive/MyDrive/datamining/datasets/Online Retail.xlsx"
df = pd.read_excel(path)
df

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
...,...,...,...,...,...,...,...,...
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,2011-12-09 12:50:00,0.85,12680.0,France
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,2011-12-09 12:50:00,2.10,12680.0,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,2011-12-09 12:50:00,4.15,12680.0,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,2011-12-09 12:50:00,4.15,12680.0,France


## Data cleaning

In [5]:
df.Country.value_counts()

United Kingdom          495478
Germany                   9495
France                    8557
EIRE                      8196
Spain                     2533
Netherlands               2371
Belgium                   2069
Switzerland               2002
Portugal                  1519
Australia                 1259
Norway                    1086
Italy                      803
Channel Islands            758
Finland                    695
Cyprus                     622
Sweden                     462
Unspecified                446
Austria                    401
Denmark                    389
Japan                      358
Poland                     341
Israel                     297
USA                        291
Hong Kong                  288
Singapore                  229
Iceland                    182
Canada                     151
Greece                     146
Malta                      127
United Arab Emirates        68
European Community          61
RSA                         58
Lebanon 

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[ns]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB


In [7]:
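# Drop rows without an invoice number and normalise the text columns
df.dropna(axis=0,subset=['InvoiceNo'],inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df['Description'] = df['Description'].str.strip()

# Invoice numbers containing 'C' are cancellations; exclude them
df = df[~df['InvoiceNo'].str.contains('C')]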

In [8]:
# One row per invoice, one column per product, values = total quantity bought
df_France = (df[df['Country'] == 'France']
             .groupby(['InvoiceNo','Description'])['Quantity']
             .sum().unstack().reset_index().fillna(0)
             .set_index('InvoiceNo'))

In [9]:
# Encode the data in the 0/1 format expected by the libraries
def encode(x):
  # 1 if the product appears in the invoice, 0 otherwise
  return 1 if x >= 1 else 0

# encoding the France dataset
df_France_encoded = df_France.applymap(encode)
df_France = df_France_encoded
df_France

Description,10 COLOUR SPACEBOY PEN,12 COLOURED PARTY BALLOONS,12 EGG HOUSE PAINTED WOOD,12 MESSAGE CARDS WITH ENVELOPES,12 PENCIL SMALL TUBE WOODLAND,12 PENCILS SMALL TUBE RED RETROSPOT,12 PENCILS SMALL TUBE SKULL,12 PENCILS TALL TUBE POSY,12 PENCILS TALL TUBE RED RETROSPOT,12 PENCILS TALL TUBE WOODLAND,15CM CHRISTMAS GLASS BALL 20 LIGHTS,16 PIECE CUTLERY SET PANTRY DESIGN,18PC WOODEN CUTLERY SET DISPOSABLE,20 DOLLY PEGS RETROSPOT,200 RED + WHITE BENDY STRAWS,3 HOOK HANGER MAGIC GARDEN,3 PIECE SPACEBOY COOKIE CUTTER SET,3 RAFFIA RIBBONS 50'S CHRISTMAS,3 STRIPEY MICE FELTCRAFT,3 TIER CAKE TIN RED AND CREAM,3 TRADITIONAl BISCUIT CUTTERS SET,36 DOILIES DOLLY GIRL,36 DOILIES VINTAGE CHRISTMAS,36 FOIL HEART CAKE CASES,36 FOIL STAR CAKE CASES,36 PENCILS TUBE RED RETROSPOT,36 PENCILS TUBE SKULLS,36 PENCILS TUBE WOODLAND,3D DOG PICTURE PLAYING CARDS,3D HEARTS HONEYCOMB PAPER GARLAND,3D SHEET OF DOG STICKERS,3D TRADITIONAL CHRISTMAS STICKERS,3D VINTAGE CHRISTMAS STICKERS,4 IVORY DINNER CANDLES SILVER FLOCK,4 PINK DINNER CANDLE SILVER FLOCK,4 TRADITIONAL SPINNING TOPS,5 HOOK HANGER MAGIC TOADSTOOL,5 HOOK HANGER RED MAGIC TOADSTOOL,50'S CHRISTMAS GIFT BAG LARGE,6 GIFT TAGS 50'S CHRISTMAS,...,WOODLAND DESIGN COTTON TOTE BAG,WOODLAND LARGE BLUE FELT HEART,WOODLAND LARGE PINK FELT HEART,WOODLAND LARGE RED FELT HEART,WOODLAND MINI BACKPACK,WOODLAND PARTY BAG + STICKER SET,WOODLAND SMALL BLUE FELT HEART,WOODLAND SMALL PINK FELT HEART,WOODLAND SMALL RED FELT HEART,WOODLAND STORAGE BOX LARGE,WOODLAND STORAGE BOX SMALL,WORLD WAR 2 GLIDERS ASSTD DESIGNS,WRAP VINTAGE DOILY,WRAP 50'S CHRISTMAS,WRAP ALPHABET DESIGN,WRAP CAROUSEL,WRAP CHRISTMAS VILLAGE,WRAP CIRCUS PARADE,WRAP DOILEY DESIGN,WRAP DOLLY GIRL,WRAP ENGLISH ROSE,WRAP GINGHAM ROSE,WRAP GREEN PEARS,WRAP I LOVE LONDON,WRAP PAISLEY PARK,WRAP PINK FAIRY CAKES,WRAP POPPIES DESIGN,WRAP RED APPLES,WRAP RED VINTAGE DOILY,WRAP SUKI AND FRIENDS,WRAP VINTAGE PETALS DESIGN,YELLOW COAT RACK PARIS FASHION,YELLOW GIANT GARDEN THERMOMETER,YELLOW SHARK HELICOPTER,ZINC STAR T-LIGHT HOLDER,ZINC FOLKART SLEIGH BELLS,ZINC HERB GARDEN CONTAINER,ZINC METAL HEART DECORATION,ZINC T-LIGHT HOLDER STAR LARGE,ZINC T-LIGHT HOLDER STARS SMALL
InvoiceNo
536370,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
536852,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
536974,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
537065,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
537463,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
580986,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
581001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
581171,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
581279,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [10]:
pd.set_option('display.max_columns', None)

## Building the model

In [11]:
# Find frequent items
freq_items = apriori(df_France,min_support=0.05,use_colnames=True)

# Collect the inferred rules in the France dataframe
rules = association_rules(freq_items, metric='lift',min_threshold=1)
rules = rules.sort_values(['confidence','lift'],ascending = [False,False])
print(rules.head())

                                           antecedents  \
45                        (JUMBO BAG WOODLAND ANIMALS)   
260  (PLASTERS IN TIN CIRCUS PARADE, RED TOADSTOOL ...   
272  (RED TOADSTOOL LED NIGHT LIGHT, PLASTERS IN TI...   
301  (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...   
302  (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...   

                         consequents  antecedent support  consequent support  \
45                         (POSTAGE)            0.076531            0.765306   
260                        (POSTAGE)            0.051020            0.765306   
272                        (POSTAGE)            0.053571            0.765306   
301  (SET/6 RED SPOTTY PAPER PLATES)            0.102041            0.127551   
302    (SET/6 RED SPOTTY PAPER CUPS)            0.102041            0.137755   

      support  confidence      lift  leverage  conviction  
45   0.076531       1.000  1.306667  0.017961         inf  
260  0.051020       1.000  1.306667  0.011974     

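From the output we can see that paper cups and paper plates are frequently bought together in France. The rules with POSTAGE as the consequent are less informative: POSTAGE appears in about 77% of the French invoices (consequent support 0.765), so their lift is only about 1.31.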

# FP-Growth Algorithm

FP-Growth (Frequent Pattern Growth) uses a special tree structure called an FP-Tree, which stores the transactions in a compressed form in one place. The FP-Tree drastically reduces the number of scans over the data, so FP-Growth is usually faster than the Apriori algorithm. However, the FP-Tree has to fit in memory, which can make FP-Growth impractical for very large datasets.

Similar to Apriori, FP-Growth generates association rules in the following steps:

*   Build the FP-Tree and use it to find the frequent combinations, called 'itemsets', that satisfy the minimum support threshold.
*   Generate rules from the frequent itemsets that satisfy the minimum confidence threshold.
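
To make the tree structure more concrete, here is a minimal sketch of FP-Tree construction only (the mining step is omitted). It is not pyfpgrowth's internal code; the class `FPNode`, the function `build_fp_tree`, and the absolute-count parameter `min_count` are names assumed for this illustration.

```python
from collections import Counter

class FPNode:
    """One node of the tree: an item, a count, and child nodes."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count how many transactions contain each item
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Pass 2: insert each transaction as a path from the root,
    # keeping frequent items only, most frequent first
    # (ties broken alphabetically so the order is deterministic)
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

def show(node, depth=0):
    """Print the tree with indentation, skipping the empty root."""
    if node.item is not None:
        print("  " * depth + f"{node.item}: {node.count}")
    for child in node.children.values():
        show(child, depth + 1)

# Toy baskets, for illustration only
baskets = [
    ["Milk", "Bread", "Nutmeg", "Eggs"],
    ["Milk", "Nutmeg", "Flour", "Icecream"],
    ["Bread", "Eggs", "Nutmeg", "Milk"],
    ["Bread", "Eggs", "Chocolate", "Icecream"],
]
root, frequent = build_fp_tree(baskets, min_count=2)
show(root)
```

Baskets that share a prefix of frequent items share a path in the tree, which is why the data only needs to be scanned twice: once to count items and once to build the tree.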



## Build a dataset

In [12]:
# build a dataset
shopping_data = [['Milk','Bread','Nutmeg','Eggs'],
                 ['Milk','Nutmeg','Flour','Icecream'],
                 ['Bread','Eggs','Nutmeg','Milk'],
                 ['Bread','Eggs','Chocolate','Icecream']]

shopping_data

[['Milk', 'Bread', 'Nutmeg', 'Eggs'],
 ['Milk', 'Nutmeg', 'Flour', 'Icecream'],
 ['Bread', 'Eggs', 'Nutmeg', 'Milk'],
 ['Bread', 'Eggs', 'Chocolate', 'Icecream']]

## Build the model

In [13]:
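# Find the frequent patterns with support_threshold=0.5.
# Note: pyfpgrowth treats support_threshold as an absolute count, not a
# fraction, so 0.5 keeps every pattern that occurs at least once.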
freq_patterns = pyfpgrowth.find_frequent_patterns(transactions=shopping_data,
                                                  support_threshold=0.5)
print(freq_patterns)

{('Flour',): 1, ('Flour', 'Icecream'): 1, ('Flour', 'Nutmeg'): 1, ('Flour', 'Milk'): 1, ('Flour', 'Icecream', 'Nutmeg'): 1, ('Flour', 'Icecream', 'Milk'): 1, ('Flour', 'Milk', 'Nutmeg'): 1, ('Flour', 'Icecream', 'Milk', 'Nutmeg'): 1, ('Chocolate',): 1, ('Chocolate', 'Icecream'): 1, ('Chocolate', 'Eggs'): 1, ('Bread', 'Chocolate'): 1, ('Chocolate', 'Eggs', 'Icecream'): 1, ('Bread', 'Chocolate', 'Icecream'): 1, ('Bread', 'Chocolate', 'Eggs'): 1, ('Bread', 'Chocolate', 'Eggs', 'Icecream'): 1, ('Icecream', 'Nutmeg'): 1, ('Icecream', 'Milk'): 1, ('Icecream', 'Milk', 'Nutmeg'): 1, ('Eggs', 'Icecream'): 1, ('Bread', 'Icecream'): 1, ('Bread', 'Eggs', 'Icecream'): 1, ('Milk',): 3, ('Milk', 'Nutmeg'): 3, ('Eggs', 'Milk'): 2, ('Bread', 'Milk'): 2, ('Eggs', 'Milk', 'Nutmeg'): 2, ('Bread', 'Milk', 'Nutmeg'): 2, ('Bread', 'Eggs', 'Milk'): 2, ('Bread', 'Eggs', 'Milk', 'Nutmeg'): 2, ('Bread',): 3, ('Eggs', 'Nutmeg'): 2, ('Bread', 'Eggs', 'Nutmeg'): 2, ('Bread', 'Nutmeg'): 2, ('Eggs',): 3, ('Bread', 'E

## Generate rules

In [14]:
# generate rules with minimum confidence_threshold = 0.5
fp_rules = pyfpgrowth.generate_association_rules(patterns=freq_patterns,confidence_threshold=0.5)
fp_rules

{('Bread',): (('Eggs',), 1.0),
 ('Bread', 'Chocolate'): (('Eggs', 'Icecream'), 1.0),
 ('Bread', 'Chocolate', 'Eggs'): (('Icecream',), 1.0),
 ('Bread', 'Chocolate', 'Icecream'): (('Eggs',), 1.0),
 ('Bread', 'Eggs'): (('Nutmeg',), 0.6666666666666666),
 ('Bread', 'Eggs', 'Icecream'): (('Chocolate',), 1.0),
 ('Bread', 'Eggs', 'Milk'): (('Nutmeg',), 1.0),
 ('Bread', 'Eggs', 'Nutmeg'): (('Milk',), 1.0),
 ('Bread', 'Icecream'): (('Eggs',), 1.0),
 ('Bread', 'Milk'): (('Eggs', 'Nutmeg'), 1.0),
 ('Bread', 'Milk', 'Nutmeg'): (('Eggs',), 1.0),
 ('Bread', 'Nutmeg'): (('Eggs',), 1.0),
 ('Chocolate',): (('Bread', 'Eggs', 'Icecream'), 1.0),
 ('Chocolate', 'Eggs'): (('Bread', 'Icecream'), 1.0),
 ('Chocolate', 'Eggs', 'Icecream'): (('Bread',), 1.0),
 ('Chocolate', 'Icecream'): (('Bread', 'Eggs'), 1.0),
 ('Eggs',): (('Bread',), 1.0),
 ('Eggs', 'Icecream'): (('Bread',), 1.0),
 ('Eggs', 'Milk'): (('Bread', 'Nutmeg'), 1.0),
 ('Eggs', 'Milk', 'Nutmeg'): (('Bread',), 1.0),
 ('Eggs', 'Nutmeg'): (('Bread',), 1.