# Association Rule Mining

Association rule mining is a method for identifying frequent patterns, correlations, associations, or causal structures in data sets stored in relational databases, transactional databases, and other types of data repositories.

# Free Association Rules Mining Tools

**Bart Goethals** 

Provides implementations of several well-known algorithms, including Apriori, DIC, Eclat, and FP-growth.

**FPM** 

Contains all the C modules for various frequent item set mining techniques, along with an association rules GUI and viewer.

**FrIDA**

FrIDA (A Free Intelligent Data Analysis Toolbox) is a Java-based GUI for data analysis programs written in C by Christian Borgelt. It includes basic visualization capabilities (scatter plots, bar charts, etc.) and visualization modules for decision and regression trees and prototype-based classifiers. Modules are also included for naive and full Bayes classifiers, radial basis function neural networks, multilayer perceptrons, multivariate and polynomial regression, and association rule induction. A PDF describing FrIDA is also available.

**KNIME** 

Provides basic association rule mining capability.

**Magnum Opus**

Magnum Opus is an association discovery tool that focuses on qualifying associations so that trivial and spurious rules are discarded, based on the measures the user specifies. The tool is easy to use, fast (compute time scales linearly with data size), and available in a free demo version limited to 1000 cases.


**Rapid-i** 

Includes several algorithms as part of its broad data mining functionality.

# Parts of an Association Rule

1. Antecedent (if): the item or itemset found in the data.

2. Consequent (then): the item or itemset found in combination with the antecedent.
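To make the if/then structure concrete, here is a minimal sketch that measures a rule on a handful of baskets. The baskets and the helper names `support` and `confidence` are made up for illustration; they are not part of any library.

```python
# Toy transactions (hypothetical data, for illustration only)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule: if bread (antecedent), then milk (consequent)
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and milk occur together in 2 of 4 baskets, and 2 of the 3 baskets with bread also contain milk, so the rule has support 0.5 and confidence of about 0.67.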

# Types of Association Rule Learning

**Association rule learning is commonly carried out with one of three algorithms:**

**Apriori Algorithm**

This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets efficiently. It is mainly used for market basket analysis, where it helps identify products that are likely to be bought together. It can also be used in healthcare to find adverse drug reactions in patients.
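The level-wise (breadth-first) idea can be sketched in plain Python as below. This is a simplified illustration under my own assumptions: the function name `apriori_sketch` and the baskets are hypothetical, and candidates are counted with a plain scan instead of a hash tree.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate with a full scan (no hash tree in this sketch)
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result

# Hypothetical baskets, for illustration only
baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
frequent_itemsets = apriori_sketch(baskets, min_support=0.5)
for itemset, supp in frequent_itemsets.items():
    print(sorted(itemset), supp)
```

The prune step is what keeps the search manageable: an itemset is only counted if all of its smaller subsets already passed the support threshold.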

**Eclat Algorithm**

Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first search to find frequent itemsets in a transaction database. It typically runs faster than the Apriori algorithm.
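A rough sketch of the depth-first idea is shown below. It assumes the usual vertical layout, where each item maps to the set of transaction IDs that contain it, so the count of a larger itemset comes from intersecting those sets. The function name `eclat_sketch` and the baskets are hypothetical.

```python
def eclat_sketch(transactions, min_count):
    """Return {frozenset(itemset): count} using depth-first tidset intersection."""
    # Vertical layout: item -> set of transaction IDs that contain it
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Only extend with later items so each itemset is visited once
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    extensions.append((other, shared))
            extend(itemset, extensions)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

# Hypothetical baskets, for illustration only
baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
counts = eclat_sketch(baskets, min_count=2)
for itemset, count in counts.items():
    print(sorted(itemset), count)
```

Because the counts come from set intersections, the transactions are read only once, when the vertical layout is built.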

**FP-Growth Algorithm**

FP stands for Frequent Pattern. The FP-growth algorithm is an improved version of the Apriori algorithm. It represents the database as a tree structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
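The tree structure can be illustrated with the sketch below. It only builds the FP-tree and does not perform the mining step of FP-growth; the class `FPNode`, the functions `build_fp_tree` and `count_nodes`, and the baskets are all hypothetical names and data chosen for this example.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, a count, and child nodes."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count items and keep only the frequent ones
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction, most frequent items first,
    # so that transactions with common items share a path
    for t in transactions:
        items = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-frequent[item], item),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())

# Hypothetical baskets, for illustration only
baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
tree = build_fp_tree(baskets, min_count=2)
print(tree.children["bread"].count, count_nodes(tree))
```

In this toy case the four baskets contain 8 item occurrences in total, but the tree needs only 5 nodes because three baskets share the `bread` prefix.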

# Example

Here I will give one example of the algorithms above, using the Apriori algorithm. The first step, as always, is to import the required libraries.
In the script below I import pandas, numpy, and the `apriori` and `association_rules` functions from mlxtend.

```python
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
```

```python
# Each row of the CSV holds one transaction as a comma-separated string
df = pd.read_csv("https://raw.githubusercontent.com/brandonndun/Sumarry_Week_9/main/GroceryStoreDataSet.csv", names=['products'], sep=',')
df.head()
```

| | products |
|---|---|
| 0 | MILK,BREAD,BISCUIT |
| 1 | BREAD,MILK,BISCUIT,CORNFLAKES |
| 2 | BREAD,TEA,BOURNVITA |
| 3 | JAM,MAGGI,BREAD,MILK |
| 4 | MAGGI,TEA,BISCUIT |


Let's examine the shape of the data set:

```python
df.shape
```

```
(20, 1)
```

Let's split each row's products on the commas and collect the results in a list called `data`:

```python
data = list(df["products"].apply(lambda x: x.split(",")))
data
```

```
[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]
```

# Apriori Algorithm and One-Hot Encoding

The mlxtend implementation of Apriori expects its input to be encoded as True/False or 1/0 values.
Using TransactionEncoder, we convert the list into a one-hot encoded Boolean table.
Each product that a customer bought or did not buy is then represented by 1 (True) or 0 (False).

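```python
from mlxtend.preprocessing import TransactionEncoder

# Learn the unique items, then encode each transaction as a Boolean row
a = TransactionEncoder()
a_data = a.fit(data).transform(data)
df = pd.DataFrame(a_data, columns=a.columns_)
# Show absent items as 0; present items remain True
df = df.replace(False, 0)
df
```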

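| | BISCUIT | BOURNVITA | BREAD | COCK | COFFEE | CORNFLAKES | JAM | MAGGI | MILK | SUGER | TEA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | 0 | True | 0 | 0 | 0 | 0 | 0 | True | 0 | 0 |
| 1 | True | 0 | True | 0 | 0 | True | 0 | 0 | True | 0 | 0 |
| 2 | 0 | True | True | 0 | 0 | 0 | 0 | 0 | 0 | 0 | True |
| 3 | 0 | 0 | True | 0 | 0 | 0 | True | True | True | 0 | 0 |
| 4 | True | 0 | 0 | 0 | 0 | 0 | 0 | True | 0 | 0 | True |
| 5 | 0 | True | True | 0 | 0 | 0 | 0 | 0 | 0 | 0 | True |
| 6 | 0 | 0 | 0 | 0 | 0 | True | 0 | True | 0 | 0 | True |
| 7 | True | 0 | True | 0 | 0 | 0 | 0 | True | 0 | 0 | True |
| 8 | 0 | 0 | True | 0 | 0 | 0 | True | True | 0 | 0 | True |
| 9 | 0 | 0 | True | 0 | 0 | 0 | 0 | 0 | True | 0 | 0 |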
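To show what the encoding step does, here is a small sketch of the same idea in plain Python. It is not the mlxtend implementation; the function name `one_hot_encode` is hypothetical, and the two sample rows are taken from the list above.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): one Boolean per item column for each transaction."""
    # Columns are the unique items in alphabetical order
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in columns] for t in transactions]
    return columns, rows

sample = [["MILK", "BREAD", "BISCUIT"], ["BREAD", "TEA", "BOURNVITA"]]
columns, rows = one_hot_encode(sample)
print(columns)
for row in rows:
    print(row)
```

Each transaction becomes one row with a True wherever the item was bought, which is the shape of input that the frequent itemset step works on.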


# Applying Apriori and Results

The next step is to create the Apriori model. The `apriori` function in the mlxtend package exposes several parameters that can be adjusted.
For this model I use the minimum support parameter.
I set `min_support` to a threshold of 20% and print the resulting frequent itemsets on the screen as well.

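```python
# Keep itemsets that appear in at least 20% of the transactions.
# Note: df is overwritten and now holds the frequent itemsets.
df = apriori(df, min_support=0.2, use_colnames=True, verbose=1)
df
```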

```
Processing 72 combinations | Sampling itemset size 2
Processing 42 combinations | Sampling itemset size 3
```

| | support | itemsets |
|---|---|---|
| 0 | 0.35 | (BISCUIT) |
| 1 | 0.2 | (BOURNVITA) |
| 2 | 0.65 | (BREAD) |
| 3 | 0.4 | (COFFEE) |
| 4 | 0.3 | (CORNFLAKES) |
| 5 | 0.25 | (MAGGI) |
| 6 | 0.25 | (MILK) |
| 7 | 0.3 | (SUGER) |
| 8 | 0.35 | (TEA) |
| 9 | 0.2 | (BREAD, BISCUIT) |


I chose a minimum confidence value of 60%. In other words, a rule is kept only if, when product X is purchased, product Y is also purchased in 60% or more of those transactions.

```python
# Generate rules whose confidence is at least 0.6
df_ar = association_rules(df, metric="confidence", min_threshold=0.6)
df_ar
```

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (MILK) | (BREAD) | 0.25 | 0.65 | 0.2 | 0.8 | 1.230769 | 0.0375 | 1.75 |
| 1 | (SUGER) | (BREAD) | 0.3 | 0.65 | 0.2 | 0.666667 | 1.025641 | 0.005 | 1.05 |
| 2 | (CORNFLAKES) | (COFFEE) | 0.3 | 0.4 | 0.2 | 0.666667 | 1.666667 | 0.08 | 1.8 |
| 3 | (SUGER) | (COFFEE) | 0.3 | 0.4 | 0.2 | 0.666667 | 1.666667 | 0.08 | 1.8 |
| 4 | (MAGGI) | (TEA) | 0.25 | 0.35 | 0.2 | 0.8 | 2.285714 | 0.1125 | 3.25 |


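**For example, if we examine the rule at index 1, (SUGER) → (BREAD):**

1. Sugar appears in 30% of the transactions (antecedent support).

2. Bread appears in 65% of the transactions (consequent support).

3. Sugar and bread appear together in 20% of the transactions (support).

4. 67% of those who buy sugar buy bread as well (confidence).

5. Customers who buy sugar are about 3% more likely to buy bread than customers in general (lift of about 1.03).

6. The conviction of the rule is 1.05. A value this close to 1 means the rule is only slightly stronger than what we would expect if sugar and bread were bought independently.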
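The measures for the (SUGER) → (BREAD) rule can be recomputed by hand from the three support values in the rule table. The sketch below uses the standard definitions of confidence, lift, leverage, and conviction; the function name `rule_metrics` is hypothetical and is not part of mlxtend.

```python
def rule_metrics(antecedent_support, consequent_support, joint_support):
    """Recompute rule measures from the supports of X, Y, and X together with Y."""
    confidence = joint_support / antecedent_support
    lift = confidence / consequent_support
    leverage = joint_support - antecedent_support * consequent_support
    # Conviction is unbounded when the rule always holds (confidence of 1)
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Values taken from row 1 of the rule table: (SUGER) -> (BREAD)
metrics = rule_metrics(antecedent_support=0.30, consequent_support=0.65, joint_support=0.20)
for name, value in metrics.items():
    print(name, round(value, 6))
```

Up to rounding, these agree with the table: confidence 0.666667, lift 1.025641, leverage 0.005, and conviction 1.05.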

# Conclusion

**As a result, if items X and Y are frequently bought together, several steps can be taken to increase profit. For instance:**

1. Cross-selling can be improved by offering the associated items as a bundle.

2. The shop layout can be changed so that items that sell together are placed together.

3. Promotional activities, such as advertising campaigns, can be carried out to increase the sales of goods that customers do not buy.

4. Collective discounts can be offered on these products if the customer buys both of them.
