*Identifying relationships between items that people frequently buy together, using Association Rule Mining, an unsupervised learning technique*

* Association Rule Mining
* Apriori Algorithm


In [1]:
# Install apyori, an external package that has to be installed separately
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=f1fb0db0ddd293e95001a9ec31a5b1e33b1362fcc03d735edb1136590f6706d1
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2

In [2]:
import pandas as pd #Dataframe Manipulation library
import numpy as np #Data Manipulation library

# apyori: a Python implementation of the Apriori algorithm
from apyori import apriori

##### Loading the dataset 
*The dataset holds the purchase orders placed by customers at grocery stores, one order per row*

In [3]:
df = pd.read_csv("../input/groceries/groceries - groceries.csv")
print(f"The shape of the dataset is:  {df.shape}")
df.head(2)

The shape of the dataset is:  (9835, 33)


Unnamed: 0,Item(s),Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,...,Item 23,Item 24,Item 25,Item 26,Item 27,Item 28,Item 29,Item 30,Item 31,Item 32
0,4,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,...,,,,,,,,,,
1,3,tropical fruit,yogurt,coffee,,,,,,,...,,,,,,,,,,


In [4]:
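# Slice the dataset to drop the first column, "Item(s)", which holds the item count of each transaction
df_s = df.iloc[:,1:]
print(f"The shape of the dataset is:  {df_s.shape}")
pd.set_option('display.max_columns', 35)  # full option name; the short 'max_columns' may be ambiguous in newer pandas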
df_s.head(2)

The shape of the dataset is:  (9835, 32)


Unnamed: 0,Item 1,Item 2,Item 3,Item 4,Item 5,Item 6,Item 7,Item 8,Item 9,Item 10,Item 11,Item 12,Item 13,Item 14,Item 15,Item 16,Item 17,Item 18,Item 19,Item 20,Item 21,Item 22,Item 23,Item 24,Item 25,Item 26,Item 27,Item 28,Item 29,Item 30,Item 31,Item 32
0,citrus fruit,semi-finished bread,margarine,ready soups,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,tropical fruit,yogurt,coffee,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


##### Data Cleaning

*The Apriori implementation expects a list of lists, where each inner list is one complete transaction*

In [5]:
# Convert the pandas DataFrame into a list of lists so that each row of df_s becomes one transaction
transactions = []
for i in range(0,1000): # uses only the first 1,000 of the 9,835 rows
    transactions.append([str(df_s.values[i,j]) for j in range(0, df_s.shape[1])]) # str() turns empty cells (NaN) into the item 'nan'
    
# Each entry in transactions lists the items bought by one customer in one order
#transactions[0:1]
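
Because the loop above applies `str()` to every cell, each empty cell becomes the string `'nan'` and is then treated as a purchased item. A minimal sketch of a conversion that skips missing cells instead, assuming they arrive as `NaN` floats (which is what pandas produces for empty CSV cells). The helper name `rows_to_transactions` is hypothetical and not part of the notebook, and rerunning the analysis with it would change the numbers reported in this notebook.

```python
import math

def rows_to_transactions(rows):
    """Convert table rows to transactions, skipping missing cells."""
    transactions = []
    for row in rows:
        basket = [
            str(cell)
            for cell in row
            if cell is not None
            and not (isinstance(cell, float) and math.isnan(cell))
        ]
        if basket:  # ignore rows that contain no items at all
            transactions.append(basket)
    return transactions

# Toy rows standing in for df_s.values.tolist() (assumption: same layout)
rows = [
    ["citrus fruit", "margarine", float("nan")],
    ["tropical fruit", "yogurt", "coffee"],
]
clean = rows_to_transactions(rows)
print(clean)
```

With the real data, the call would look like `rows_to_transactions(df_s.values.tolist())`, which also covers all rows rather than only the first 1,000.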

In [6]:
# Count the distinct items across all transactions (the count includes the 'nan' placeholder for empty cells)
unique_items = [x for y in transactions for x in y]
print(f"Unique grocery items are: {len(set(unique_items))}")

Unique grocery items are: 157


##### Apriori Algorithm/Association Rule Mining

* Associates events (here, purchased items) that tend to occur together
* Mines rules through statistical analysis of the history of past events (here, past transactions)
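
The statistics behind these rules are support, confidence and lift. A minimal sketch of their standard definitions on a made-up toy dataset; the function names and the `toy` transactions are illustrative assumptions, not part of apyori or of the grocery data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, base, add):
    """Estimate of P(add | base): support(base and add) / support(base)."""
    both = set(base) | set(add)
    return support(transactions, both) / support(transactions, base)

def lift(transactions, base, add):
    """Confidence relative to how common add is overall.
    Values above 1 mean base and add co-occur more than chance predicts."""
    return confidence(transactions, base, add) / support(transactions, add)

# Made-up transactions for illustration only
toy = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread"],
    ["milk"],
]

s = support(toy, ["bread", "butter"])       # 2 of 4 transactions
c = confidence(toy, ["bread"], ["butter"])  # 0.5 / 0.75
l = lift(toy, ["bread"], ["butter"])        # (2/3) / 0.5
print(s, c, l)
```

Note that confidence is directional (bread to butter differs from butter to bread), while support and lift are symmetric in the two sides of the rule.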

In [7]:
association_rules = apriori(transactions,
                            min_support = 0.008,
                            min_confidence = 0.3, # Strength of the rule: share of antecedent purchases that also include the consequent
                            min_lift = 3, # Lift above 1 means the items co-occur more often than chance alone would predict
                            min_length = 2) # Not among the options apyori documents (it has max_length); appears to be ignored
association_results = list(association_rules)

# min_support = 0.008: an itemset must appear in at least 0.8% of the transactions to count as frequent
# min_confidence = 0.3, min_lift = 3: a rule derived from a frequent itemset is kept only if it meets both thresholds
# min_length = 2: intended to require at least 2 items per rule

In [8]:
# Review Rules generated by Apriori Algorithm
print(f"Total rules or commonly observed itemsets in dataset satisfying the support, confidence and lift threshold are: {len(association_results)}")

Total rules or commonly observed itemsets in dataset satisfying the support, confidence and lift threshold are: 78
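
Each entry of `association_results` is an apyori `RelationRecord` with the fields `items`, `support` and `ordered_statistics`; each ordered statistic carries `items_base` (antecedent), `items_add` (consequent), `confidence` and `lift`. Since `items` is a frozenset with no guaranteed order, the direction of a rule is best read from `items_base` and `items_add`. A minimal sketch of that reading, using stand-in namedtuples with the same field names so it runs without apyori; the helper `describe` and the example values are made up for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring the field names of apyori's result tuples.
# Assumption: real records expose the same attributes.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

def describe(record):
    """Return one readable line per directed rule in a record."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {add} (support={record.support:.3f}, "
            f"confidence={stat.confidence:.2%}, lift={stat.lift:.2f})"
        )
    return lines

# Made-up record, not taken from the grocery results
example = RelationRecord(
    items=frozenset({"bread", "butter"}),
    support=0.02,
    ordered_statistics=[
        OrderedStatistic(frozenset({"butter"}), frozenset({"bread"}), 0.4, 3.5)
    ],
)

for line in describe(example):
    print(line)
```

Looping over every ordered statistic also covers records that hold more than one rule or more than two items.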


In [9]:
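for item in association_results:
    pair = item[0] # frozenset holding the items of this itemset
    items = [x for x in pair] # frozenset order is arbitrary, so items[0] is not necessarily the antecedent
    print(f"\nRule : {items[0]} and {items[1]} ") # only the first two items are shown, even for larger itemsets
    print(f"The Support is {str(item[1])}")
    conf = round(item[2][0][2] * 100,2) # confidence of the first ordered statistic, as a percentage
    print(f"The Confidence is {conf}") 
    print(f"The Lift is {str(item[2][0][3])}") # lift of the first ordered statistic
    # The direction stated below holds only if items[0] matches item[2][0].items_base
    print(f"Whenever a customer buys {items[0]}, {str(conf)}% of the times {items[1]} is also purchased")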


Rule : berries and whipped/sour cream 
The Support is 0.017
The Confidence is 36.17
The Lift is 4.88786658999425
Whenever a customer buys berries, 36.17% of the times whipped/sour cream is also purchased

Rule : root vegetables and butter 
The Support is 0.017
The Confidence is 36.96
The Lift is 3.3596837944664033
Whenever a customer buys root vegetables, 36.96% of the times butter is also purchased

Rule : tropical fruit and candy 
The Support is 0.009
The Confidence is 36.0
The Lift is 3.7113402061855663
Whenever a customer buys tropical fruit, 36.0% of the times candy is also purchased

Rule : long life bakery product and coffee 
The Support is 0.008
The Confidence is 30.77
The Lift is 4.048582995951417
Whenever a customer buys long life bakery product, 30.77% of the times coffee is also purchased

Rule : sliced cheese and curd 
The Support is 0.009
The Confidence is 32.14
The Lift is 4.285714285714286
Whenever a customer buys sliced cheese, 32.14% of the times curd is also purchas