# Using the Apriori Algorithm of the Association Rules Technique for Market Basket Analysis

The “Beer and Diapers” case is the most commonly used example in market basket analysis. A large retailer noticed an unusual purchase pattern: customers were buying beer and baby diapers at the same time. The pattern was found by mining the transaction data of all its customers, and similar insights can be obtained from any large set of purchase transactions. Using such insights, large retailers can set up recommendations for customers that help them make more sales. The program below applies the Apriori algorithm of the association rules technique to the Online Retail data set to identify interesting purchase combinations. Apriori finds the frequent itemsets (groups of items that appear together in at least a minimum fraction of transactions) from which the association rules are then derived.
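The level-wise idea behind Apriori can be sketched in plain Python. This is a minimal illustration, not mlxtend's implementation: `apriori_sketch` is a hypothetical name, transactions are assumed to be Python sets of item names, and support is taken as the fraction of transactions that contain an itemset. Each pass joins frequent itemsets of size k-1 into size-k candidates and discards any candidate that has an infrequent subset, because a superset can never be more frequent than its subsets.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Hypothetical helper: return {frozenset(itemset): support} for every
    itemset contained in at least min_support of the transactions."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    # Pass 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune: every (k-1)-subset must be frequent (downward closure)
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support only for the candidates that survived pruning
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical baskets echoing the beer-and-diapers story
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.5`, the pair `{diapers, chips}` (support 0.25) is not frequent, so the three-item candidate `{beer, diapers, chips}` is pruned without its support ever being counted.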

In [2]:
import pandas as pd

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules



Read the online retail transaction data of purchases. The data set can be downloaded from http://archive.ics.uci.edu/ml/datasets/online+retail and saved as OnlineRetailData.xlsx.

In [3]:
# Download the data set from the link below and save it as OnlineRetailData.xlsx:
# http://archive.ics.uci.edu/ml/datasets/online+retail

df = pd.read_excel('./OnlineRetailData.xlsx')
print("Completed reading the xlsx file")
print(df.head())
print("Head of the xlsx file")

# Clean the data: trim whitespace from item descriptions, drop rows without
# an invoice number, and remove cancelled transactions (invoice numbers
# containing 'C')
df['Description'] = df['Description'].str.strip()
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]

# Build the basket for Ireland (EIRE): one row per invoice, one column per
# item description, each cell holding the total quantity (0 if not bought)
basket = (df[df['Country'] == "EIRE"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

print(basket.head())
print(basket)


Completed reading the xlsx file
  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   

          InvoiceDate  UnitPrice  CustomerID         Country  
0 2010-12-01 08:26:00       2.55     17850.0  United Kingdom  
1 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
2 2010-12-01 08:26:00       2.75     17850.0  United Kingdom  
3 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
4 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
Head of the xlsx file
Description  10 COLOUR SPACEBOY PEN  12 COLOURED PARTY BALLOONS  \
InvoiceNo                                                         
5
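The groupby/unstack chain above turns the long transaction table into an invoice × item matrix of quantities, and the next cell reduces it to 0/1 with `encode_units` before calling `apriori`. The plain-Python sketch below makes both steps explicit. The rows are hypothetical, `build_basket` is an illustrative helper (not part of pandas or mlxtend), and it assumes, like the filter above, that cancelled invoices carry a `C` in the invoice number. It also shows that a column's total quantity (what `basket[...].sum()` prints later) is not the same as its support (the fraction of invoices containing the item).

```python
from collections import defaultdict

# Hypothetical rows shaped like the Online Retail sheet:
# (InvoiceNo, Description, Quantity, Country)
rows = [
    ("100001", "WHITE METAL LANTERN ", 6, "EIRE"),
    ("100001", "WHITE METAL LANTERN", 2, "EIRE"),
    ("100001", "REGENCY CAKESTAND 3 TIER", 1, "EIRE"),
    ("100002", "REGENCY CAKESTAND 3 TIER", 12, "EIRE"),
    ("C100003", "REGENCY CAKESTAND 3 TIER", -1, "EIRE"),  # cancellation
    ("100004", "ALARM CLOCK BAKELIKE RED", 24, "France"),
    (None, "WHITE METAL LANTERN", 3, "EIRE"),             # missing invoice
]


def build_basket(rows, country):
    """Hypothetical helper: {invoice: {item: total quantity}}, 0.0 if absent."""
    quantities = defaultdict(float)
    for invoice, description, quantity, row_country in rows:
        if invoice is None:              # like dropna(subset=['InvoiceNo'])
            continue
        if "C" in str(invoice):          # like ~str.contains('C')
            continue
        if row_country != country:       # like df['Country'] == country
            continue
        # strip() merges "LANTERN " with "LANTERN"; += sums like groupby().sum()
        quantities[(str(invoice), description.strip())] += quantity
    invoices = sorted({inv for inv, _ in quantities})
    items = sorted({name for _, name in quantities})
    # Pairs never seen get 0.0, like unstack().fillna(0)
    return {inv: {name: quantities.get((inv, name), 0.0) for name in items}
            for inv in invoices}


toy_basket = build_basket(rows, "EIRE")
# Same rule as encode_units: 1 if the item was bought in the invoice, else 0
toy_basket_sets = {inv: {name: 1 if q >= 1 else 0 for name, q in row.items()}
                   for inv, row in toy_basket.items()}

item = "WHITE METAL LANTERN"
total_quantity = sum(row[item] for row in toy_basket.values())
support = sum(row[item] for row in toy_basket_sets.values()) / len(toy_basket_sets)
print(toy_basket)
print(toy_basket_sets)
print(f"{item}: total quantity {total_quantity}, support {support}")
```

The lantern sells 8 units in total, yet it appears in only half of the invoices; Apriori works with the second number.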

In [4]:
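basket.to_csv("country.csv")

# Encode each quantity as 1 if the item appears in the invoice, 0 otherwise
def encode_units(x):
    return 1 if x >= 1 else 0

# applymap applies encode_units to every cell; pandas >= 2.1 prefers the
# equivalent DataFrame.map and emits a FutureWarning for applymap
basket_sets = basket.applymap(encode_units)
#basket_sets.drop('POSTAGE', inplace=True, axis=1)

# Frequent itemsets: item combinations present in at least 7% of invoices
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)

# Generate association rules, keeping those with lift >= 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print("The rules are given below: ")
print(rules.head())

rules.to_csv("rules.csv")

# Keep only strong rules: lift >= 6 and confidence >= 0.8
print(rules[(rules['lift'] >= 6) &
            (rules['confidence'] >= 0.8)])

# Total quantity of each teacup colour across the EIRE invoices
print(basket['PINK REGENCY TEACUP AND SAUCER'].sum())
print(basket['GREEN REGENCY TEACUP AND SAUCER'].sum())
print(basket['ROSES REGENCY TEACUP AND SAUCER'].sum())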



The rules are given below: 
                         antecedents                        consequents  \
0   (PINK REGENCY TEACUP AND SAUCER)  (GREEN REGENCY TEACUP AND SAUCER)   
1  (GREEN REGENCY TEACUP AND SAUCER)   (PINK REGENCY TEACUP AND SAUCER)   
2         (REGENCY CAKESTAND 3 TIER)  (GREEN REGENCY TEACUP AND SAUCER)   
3  (GREEN REGENCY TEACUP AND SAUCER)         (REGENCY CAKESTAND 3 TIER)   
4  (ROSES REGENCY TEACUP AND SAUCER)  (GREEN REGENCY TEACUP AND SAUCER)   

   antecedent support  consequent support   support  confidence      lift  \
0            0.097222            0.125000  0.090278    0.928571  7.428571   
1            0.125000            0.097222  0.090278    0.722222  7.428571   
2            0.246528            0.125000  0.086806    0.352113  2.816901   
3            0.125000            0.246528  0.086806    0.694444  2.816901   
4            0.166667            0.125000  0.114583    0.687500  5.500000   

   leverage  conviction  
0  0.078125   12.250000  
1  0.0
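Every metric in the rules table follows from the three support columns. As a check, the sketch below recomputes row 0 (PINK REGENCY TEACUP AND SAUCER → GREEN REGENCY TEACUP AND SAUCER) from its printed supports, using the standard definitions of confidence, lift, leverage and conviction that mlxtend's `association_rules` reports. Because the inputs are copied at six decimal places, the results match the table only up to that rounding.

```python
# Supports copied from row 0 of the rules output (rounded to 6 decimals)
antecedent_support = 0.097222   # PINK REGENCY TEACUP AND SAUCER
consequent_support = 0.125000   # GREEN REGENCY TEACUP AND SAUCER
support = 0.090278              # invoices containing both items

# Confidence: share of invoices with the antecedent that also contain the consequent
confidence = support / antecedent_support
# Lift: how many times more often the items co-occur than under independence
lift = confidence / consequent_support
# Leverage: observed joint support minus the joint support expected under independence
leverage = support - antecedent_support * consequent_support
# Conviction: expected over observed rate of "antecedent without consequent"
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence {confidence:.4f}")
print(f"lift       {lift:.4f}")
print(f"leverage   {leverage:.4f}")
print(f"conviction {conviction:.4f}")
```

A lift of about 7.43 means the two teacups are bought together roughly seven times more often than independent purchases would predict; with a confidence of about 0.93, this rule also meets the `lift >= 6` and `confidence >= 0.8` filter used in the cell above.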

In [None]:
# Total quantity of the two alarm clock colours across the EIRE invoices
print(basket['ALARM CLOCK BAKELIKE GREEN'].sum())

print(basket['ALARM CLOCK BAKELIKE RED'].sum())

# Repeat the analysis for Australia with a lower minimum support (5%)
basket2 = (df[df['Country'] == "Australia"]
           .groupby(['InvoiceNo', 'Description'])['Quantity']
           .sum().unstack().reset_index().fillna(0)
           .set_index('InvoiceNo'))

basket_sets2 = basket2.applymap(encode_units)
#basket_sets2.drop('DOTCOM%20POSTAGE', inplace=True, axis=1)
frequent_itemsets2 = apriori(basket_sets2, min_support=0.05, use_colnames=True)

frequent_itemsets2.to_csv("Aus.csv")
rules2 = association_rules(frequent_itemsets2, metric="lift", min_threshold=1)

rules2.to_csv("rules2.csv")
# Strong rules for Australia: lift >= 8 and confidence >= 0.9
print(rules2[(rules2['lift'] >= 8) &
             (rules2['confidence'] >= 0.9)])


49.0
284.0
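`association_rules` derives rules from the frequent itemsets: each frequent itemset with at least two items is split into an antecedent and a consequent, and the supports of the parts give the confidence and lift. The plain-Python sketch below shows that step together with the lift/confidence filters used in the cells above. `generate_rules` and the supports are hypothetical, and the sketch assumes `frequent` maps every frequent itemset (as a `frozenset`) to its support; Apriori's downward-closure property guarantees that every subset of a frequent itemset is also present.

```python
from collections import namedtuple
from itertools import combinations

Rule = namedtuple("Rule", "antecedent consequent support confidence lift")


def generate_rules(frequent, min_lift=1.0):
    """Hypothetical helper: all rules X -> Y with X | Y frequent and lift >= min_lift."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset can serve as the antecedent
        for size in range(1, len(itemset)):
            for subset in combinations(itemset, size):
                antecedent = frozenset(subset)
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if lift >= min_lift:
                    rules.append(Rule(antecedent, consequent, support, confidence, lift))
    return rules


# Hypothetical supports (fractions of invoices)
frequent = {
    frozenset({"A"}): 0.125,
    frozenset({"B"}): 0.125,
    frozenset({"C"}): 0.5,
    frozenset({"A", "B"}): 0.125,
    frozenset({"A", "C"}): 0.0625,
}

# Like association_rules(..., metric="lift", min_threshold=1)
all_rules = generate_rules(frequent, min_lift=1.0)
# Like the rules2 filter: lift >= 8 and confidence >= 0.9
strong_rules = [r for r in all_rules if r.lift >= 8 and r.confidence >= 0.9]
for r in strong_rules:
    print(sorted(r.antecedent), "->", sorted(r.consequent),
          f"confidence={r.confidence} lift={r.lift}")
```

Here `{A} -> {C}` and `{C} -> {A}` pass `min_lift=1.0` with a lift of exactly 1, meaning no association at all; the stricter lift and confidence filter narrows the list to `{A} -> {B}` and `{B} -> {A}`.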
