WHAT IS THE APRIORI ALGORITHM?

The Apriori algorithm mines frequent itemsets and the association rules that follow from them. It typically operates on a database containing a large number of transactions, for example the items customers buy at a Big Bazar store.

Apriori is generally considered an unsupervised learning approach, since it is typically used to discover interesting patterns and relationships in unlabelled data. It can also be adapted to do classification based on labelled data.

WHY USE THE APRIORI ALGORITHM?

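The Apriori algorithm is used for mining frequent itemsets and deriving association rules from a transactional database. The items in a transaction form an itemset. Two parameters control the search: "support" and "confidence". The support of an itemset is the fraction of transactions that contain it. The confidence of a rule A -> B is the conditional probability that a transaction contains B given that it contains A, that is, support(A and B together) / support(A).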
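To make the two parameters concrete, here is a minimal sketch that computes them by hand. The toy transactions and the helper names `support` and `confidence` are made up for illustration and are not part of mlxtend or the retail dataset.

```python
# Toy transactions (hypothetical data, not taken from Online_Retail.csv)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# {bread, milk} appears in 2 of 4 transactions; bread appears in 3 of 4
bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
print(bread_milk_support)
print(bread_to_milk_confidence)
```

In this toy data the rule bread -> milk holds in two of the three transactions that contain bread, so its confidence is two thirds.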

The Apriori property makes the level-wise generation of frequent itemsets efficient. It states that every subset of a frequent itemset must itself be frequent; equivalently, if an itemset is infrequent, none of its supersets can be frequent. This lets the algorithm discard any candidate k-itemset that has an infrequent (k-1)-subset without counting it, which shrinks the search space at every level.
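The level-wise search with pruning can be sketched in plain Python as follows. This is a simplified illustration under assumed toy data, not the implementation mlxtend uses; the function name `frequent_itemsets` and the sample transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {frozenset: support} for frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: keep the single items that meet the support threshold
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            current[candidate] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): drop a candidate if any of its
        # (k-1)-subsets is not frequent, without counting its support
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(toy_transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With these transactions, {milk, butter} falls below the threshold at level 2, so the candidate {bread, milk, butter} is pruned at level 3 before its support is ever counted.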

HOW TO USE THE APRIORI ALGORITHM?

In [None]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules


In [None]:

# Loading the Data
data = pd.read_csv('Online_Retail.csv')
data.head()


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6.0,01-12-2010 08:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6.0,01-12-2010 08:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8.0,01-12-2010 08:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6.0,01-12-2010 08:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6.0,01-12-2010 08:26,3.39,17850.0,United Kingdom


In [None]:
# Exploring the columns of the data
data.columns


Index(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'],
      dtype='object')

In [None]:
# Exploring the different regions of transactions
data.Country.unique()


array(['United Kingdom', 'France', 'Australia', 'Netherlands', 'Germany',
       'Norway', 'EIRE', 'Switzerland', 'Spain', 'Poland', 'Portugal',
       'Italy', 'Belgium', 'Lithuania', 'Japan', 'Iceland',
       'Channel Islands', 'Denmark', 'Cyprus', 'Sweden', 'Austria',
       'Israel', 'Finland', 'Bahrain', 'Greece', 'Hong Kong', 'Singapore',
       'Lebanon', 'United Arab Emirates', 'Saudi Arabia',
       'Czech Republic', 'Canada', nan], dtype=object)

In [None]:
# Stripping extra spaces in the description
data['Description'] = data['Description'].str.strip()

# Dropping the rows without any invoice number
data.dropna(axis = 0, subset =['InvoiceNo'], inplace = True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')

# Dropping cancelled transactions (invoice numbers containing 'C')
data = data[~data['InvoiceNo'].str.contains('C')]


In [None]:
# Transactions done in France
basket_France = (data[data['Country'] =="France"]
		.groupby(['InvoiceNo', 'Description'])['Quantity']
		.sum().unstack().reset_index().fillna(0)
		.set_index('InvoiceNo'))

# Transactions done in the United Kingdom
basket_UK = (data[data['Country'] =="United Kingdom"]
		.groupby(['InvoiceNo', 'Description'])['Quantity']
		.sum().unstack().reset_index().fillna(0)
		.set_index('InvoiceNo'))

# Transactions done in Portugal
basket_Por = (data[data['Country'] =="Portugal"]
		.groupby(['InvoiceNo', 'Description'])['Quantity']
		.sum().unstack().reset_index().fillna(0)
		.set_index('InvoiceNo'))

# Transactions done in Sweden
basket_Sweden = (data[data['Country'] =="Sweden"]
		.groupby(['InvoiceNo', 'Description'])['Quantity']
		.sum().unstack().reset_index().fillna(0)
		.set_index('InvoiceNo'))
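To see what the groupby / unstack chain produces, here is a small sketch on a hypothetical three-product table that reuses the column names from the dataset. The values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of the retail data, using the same column names
toy = pd.DataFrame({
    "InvoiceNo": ["1", "1", "1", "2", "2"],
    "Description": ["LANTERN", "MUG", "MUG", "MUG", "PLATE"],
    "Quantity": [6, 2, 1, 4, 3],
})

# Sum quantities per (invoice, product), then pivot products into columns;
# products missing from an invoice become NaN and are filled with 0
toy_basket = (toy.groupby(["InvoiceNo", "Description"])["Quantity"]
              .sum().unstack().reset_index().fillna(0)
              .set_index("InvoiceNo"))
print(toy_basket)
```

Each row is one invoice and each column one product, so the two MUG lines on invoice 1 collapse into a single cell holding their summed quantity.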


In [None]:
# Defining the hot encoding function to make the data suitable
# for the concerned libraries
def hot_encode(x):
    # 1 if at least one unit was bought, 0 otherwise
    # (also covers zero, negative and fractional quantities)
    return 1 if x >= 1 else 0

# Encoding the datasets
basket_France = basket_France.applymap(hot_encode)
basket_UK = basket_UK.applymap(hot_encode)
basket_Por = basket_Por.applymap(hot_encode)
basket_Sweden = basket_Sweden.applymap(hot_encode)
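As an alternative to an element-wise function, the same encoding can be sketched as a single vectorized comparison. The toy basket below is hypothetical; the sketch relies on mlxtend's `apriori` accepting boolean DataFrames as well as 0/1 values, and on recent pandas releases deprecating `applymap` in favour of `DataFrame.map`.

```python
import pandas as pd

# Hypothetical basket of summed quantities (invoice rows, product columns)
toy_basket = pd.DataFrame(
    {"LANTERN": [6.0, 0.0], "MUG": [3.0, 4.0], "PLATE": [0.0, -2.0]},
    index=pd.Index(["1", "2"], name="InvoiceNo"),
)

# True where at least one unit was bought, False otherwise
# (zero and negative quantities both map to False)
toy_encoded = toy_basket >= 1
print(toy_encoded)
```

The comparison runs on whole columns at once, which tends to be noticeably faster than calling a Python function per cell on a wide basket such as the United Kingdom one.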




## FRANCE

In [None]:
# Building the model
frq_items = apriori(basket_France, min_support = 0.05, use_colnames = True)

# Collecting the inferred rules in a dataframe
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules.head())


                                            antecedents  \
2414  (SET/6 RED SPOTTY PAPER PLATES, PACK OF 20 SKU...   
2419  (PACK OF 6 SKULL PAPER PLATES, SET/6 RED SPOTT...   
3123  (PACK OF 6 SKULL PAPER PLATES, POSTAGE, SET/6 ...   
3127  (SET/6 RED SPOTTY PAPER PLATES, PACK OF 20 SKU...   
3134  (PACK OF 6 SKULL PAPER PLATES, SET/6 RED SPOTT...   

                                            consequents  antecedent support  \
2414  (PACK OF 6 SKULL PAPER PLATES, SET/6 RED SPOTT...            0.053763   
2419  (SET/6 RED SPOTTY PAPER PLATES, PACK OF 20 SKU...            0.053763   
3123  (SET/6 RED SPOTTY PAPER PLATES, PACK OF 20 SKU...            0.053763   
3127  (PACK OF 6 SKULL PAPER PLATES, SET/6 RED SPOTT...            0.053763   
3134  (SET/6 RED SPOTTY PAPER PLATES, PACK OF 20 SKU...            0.053763   

      consequent support   support  confidence  lift  leverage  conviction  
2414            0.053763  0.053763         1.0  18.6  0.050873         inf  
2419            
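The metric columns in this output can be reproduced from the three support values alone. The sketch below uses the standard definitions of confidence, lift, leverage and conviction; the helper `rule_metrics` is a hypothetical name, and the inputs are the rounded supports printed for rule 2414.

```python
import math

def rule_metrics(antecedent_support, consequent_support, joint_support):
    """Derive rule metrics from the supports of A, C and A-and-C together."""
    confidence = joint_support / antecedent_support
    lift = confidence / consequent_support
    leverage = joint_support - antecedent_support * consequent_support
    # Conviction is infinite when the rule never fails (confidence == 1)
    if confidence >= 1:
        conviction = math.inf
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Rounded supports as printed for rule 2414 in the output above
metrics = rule_metrics(0.053763, 0.053763, 0.053763)
for name, value in metrics.items():
    print(name, round(value, 6))
```

A lift well above 1 means the antecedent and consequent occur together far more often than they would if they were independent, and a confidence of 1.0 means every invoice containing the antecedent also contained the consequent.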

## UNITED KINGDOM

In [None]:
frq_items = apriori(basket_UK, min_support = 0.01, use_colnames = True)
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules.head())


                                            antecedents  \
8731  (CHRISTMAS TREE STAR DECORATION, CHRISTMAS TRE...   
6815         (HERB MARKER CHIVES, HERB MARKER ROSEMARY)   
8484  (HERB MARKER CHIVES, HERB MARKER ROSEMARY, HER...   
8093  (CHRISTMAS TREE STAR DECORATION, CHRISTMAS TRE...   
8725  (CHRISTMAS TREE STAR DECORATION, CHRISTMAS TRE...   

                                            consequents  antecedent support  \
8731  (CHRISTMAS TREE DECORATION WITH BELL, DOTCOM P...            0.010249   
6815                                (HERB MARKER THYME)            0.010249   
8484                                (HERB MARKER THYME)            0.010031   
8093              (CHRISTMAS TREE DECORATION WITH BELL)            0.010249   
8725              (CHRISTMAS TREE DECORATION WITH BELL)            0.010249   

      consequent support   support  confidence       lift  leverage  \
8731            0.013519  0.010249         1.0  73.967742  0.010110   
6815            0.014610  0.