# What Is an Association Rule?

Association rules are "if-then" statements that express how likely it is that data items occur together within large data sets, across various types of databases. Association rule mining has many applications and is widely used to discover sales correlations in transactional data and patterns in medical data sets.

# **Use cases for association rules**

1. Medicine
2. Retail 
3. User Experience (UX) Design
4. Entertainment
5. Market Basket Analysis

An association rule has two parts:

- An antecedent (if)
- A consequent (then)

An antecedent is an item or itemset found in the data, and a consequent is an item or itemset found in combination with the antecedent.
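
The two parts of a rule can be pictured as a small data structure. The following is a minimal sketch, not part of the original notebook; the `Rule` class and the helper names `applies_to` and `holds_in` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # the "if" part
    consequent: FrozenSet[str]  # the "then" part


def applies_to(rule, transaction):
    """True when the transaction contains every antecedent item."""
    return rule.antecedent <= set(transaction)


def holds_in(rule, transaction):
    """True when the transaction contains the antecedent and the consequent."""
    return (rule.antecedent | rule.consequent) <= set(transaction)


# Hypothetical rule: if Bread, then Milk
rule = Rule(frozenset({"Bread"}), frozenset({"Milk"}))
basket = ["Bread", "Milk", "Eggs"]
print(applies_to(rule, basket), holds_in(rule, basket))
```

Counting how often `applies_to` and `holds_in` are true across all transactions is the basis of the support and confidence metrics used in the rest of this notebook.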

# Types Of Association Rules In Data Mining
There are typically four types of association rules in data mining:

1. Multi-relational association rules
2. Generalized association rules
3. Interval information association rules
4. Quantitative association rules

# Types Of Algorithms In Association Rule Mining
Three algorithms are commonly used:
1. Apriori Algorithm <br>
This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets efficiently.
It is mainly used for market basket analysis, where it helps identify products that are likely to be bought together. It can also be used in healthcare to find adverse drug reactions in patients.<br>
2. Eclat Algorithm (Equivalence Class Transformation)<br>
This algorithm uses a depth-first search to find frequent itemsets in a transaction database. It typically executes faster than the Apriori algorithm.<br>
3. FP-Growth Algorithm (Frequent Pattern Growth)<br>
It is an improvement over the Apriori algorithm. It represents the database as a tree structure known as a frequent pattern tree (FP-tree), from which the most frequent patterns are extracted.
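
To make the difference between breadth-first and depth-first search concrete, here is a simplified sketch of both ideas in plain Python. It is an illustration under stated assumptions, not the implementation used by any library: the function names `apriori_sketch` and `eclat_sketch`, the five-transaction toy dataset, and the threshold `MIN_COUNT` are all chosen for this example, and the hash tree mentioned above is replaced by a plain dictionary.

```python
from itertools import combinations

# Toy transactions used only for illustration
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
MIN_COUNT = 3  # assumed minimum support count


def apriori_sketch(transactions, min_count):
    """Breadth-first: count every size-k candidate before building size k+1."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k = len(level[0]) + 1
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def eclat_sketch(transactions, min_count):
    """Depth-first: extend a prefix by intersecting transaction-id sets."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, candidates):
        for idx, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Only combine with later items so each itemset is visited once.
            rest = [(other, tids & other_tids)
                    for other, other_tids in candidates[idx + 1:]]
            extend(itemset, rest)

    extend(frozenset(), sorted(tidsets.items()))
    return frequent


by_apriori = apriori_sketch(transactions, MIN_COUNT)
by_eclat = eclat_sketch(transactions, MIN_COUNT)
print(by_apriori == by_eclat)
for itemset, count in sorted(by_apriori.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Both functions return the same frequent itemsets; they differ only in the order in which candidates are explored and in how support is counted (scanning transactions versus intersecting transaction-id sets).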

# Example

### **With Apriori**

In [63]:
#import Library
import pandas as pd
import numpy as np
!pip install mlxtend
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt



In [64]:
#Import Dataset
ditem = pd.read_csv('https://gist.githubusercontent.com/Harsh-Git-Hub/2979ec48043928ad9033d8469928e751/raw/72de943e040b8bd0d087624b154d41b2ba9d9b60/retail_dataset.csv', sep=',')
ditem.head(10)


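| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Bread | Wine | Eggs | Meat | Cheese | Pencil | Diaper |
| 1 | Bread | Cheese | Meat | Diaper | Wine | Milk | Pencil |
| 2 | Cheese | Meat | Eggs | Milk | Wine | NaN | NaN |
| 3 | Cheese | Meat | Eggs | Milk | Wine | NaN | NaN |
| 4 | Meat | Pencil | Wine | NaN | NaN | NaN | NaN |
| 5 | Eggs | Bread | Wine | Pencil | Milk | Diaper | Bagel |
| 6 | Wine | Pencil | Eggs | Cheese | NaN | NaN | NaN |
| 7 | Bagel | Bread | Milk | Pencil | Diaper | NaN | NaN |
| 8 | Bread | Diaper | Cheese | Milk | Wine | Eggs | NaN |
| 9 | Bagel | Wine | Diaper | Meat | Pencil | Eggs | Cheese |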


In [65]:
#Changing NaN values to a blank string
ditem = ditem.fillna(" ")
ditem.head(10)


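| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Bread | Wine | Eggs | Meat | Cheese | Pencil | Diaper |
| 1 | Bread | Cheese | Meat | Diaper | Wine | Milk | Pencil |
| 2 | Cheese | Meat | Eggs | Milk | Wine | | |
| 3 | Cheese | Meat | Eggs | Milk | Wine | | |
| 4 | Meat | Pencil | Wine | | | | |
| 5 | Eggs | Bread | Wine | Pencil | Milk | Diaper | Bagel |
| 6 | Wine | Pencil | Eggs | Cheese | | | |
| 7 | Bagel | Bread | Milk | Pencil | Diaper | | |
| 8 | Bread | Diaper | Cheese | Milk | Wine | Eggs | |
| 9 | Bagel | Wine | Diaper | Meat | Pencil | Eggs | Cheese |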


In [69]:
#Collecting the unique items
items = set()
for col in ditem:
    # The blank placeholder " " is not filtered out here,
    # so it is treated as an item in the steps below
    items.update(ditem[col].unique())
print(items)

{'Bread', 'Wine', 'Eggs', 'Cheese', 'Diaper', 'Bagel', 'Milk', ' ', 'Meat', 'Pencil'}



In [70]:
#One-hot encoding: one row per transaction, one 0/1 column per item
itemset = set(items)
encoded_vals = []
for index, row in ditem.iterrows():
    rowset = set(row)
    labels = {}
    # Items absent from this transaction get 0, items present get 1
    uncommons = list(itemset - rowset)
    commons = list(itemset.intersection(rowset))
    for uc in uncommons:
        labels[uc] = 0
    for com in commons:
        labels[com] = 1
    encoded_vals.append(labels)
ohe_df = pd.DataFrame(encoded_vals)
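
Because missing values were replaced with `" "` earlier, the encoding above gives the placeholder its own column, which is why `( )` appears as an itemset in the results that follow. The sketch below shows one possible way to encode transactions while skipping the placeholder. It uses plain Python on three rows copied from the dataset preview; the names `rows`, `PLACEHOLDER`, `columns`, and `encoded` are assumptions made for this example, and the sketch is not what the notebook actually ran.

```python
# Three rows copied from the dataset preview, with " " standing in
# for missing values as in the notebook.
rows = [
    ["Meat", "Pencil", "Wine", " ", " ", " ", " "],
    ["Wine", "Pencil", "Eggs", "Cheese", " ", " ", " "],
    ["Bagel", "Bread", "Milk", "Pencil", "Diaper", " ", " "],
]
PLACEHOLDER = " "  # assumed marker for "no item"

# Collect real items only, sorted so the column order is stable.
columns = sorted({item for row in rows for item in row if item != PLACEHOLDER})

# One dict per transaction: True when the item is present, else False.
encoded = [{col: (col in row) for col in columns} for row in rows]

print(columns)
print(encoded[0])
```

Passing `encoded` to `pd.DataFrame` would give a boolean frame with no placeholder column. Running the rest of the notebook on such a frame would change the supports and rules reported below, so the outputs shown here still correspond to the original encoding.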


In [71]:
#Applying Apriori
freq_items = apriori(ohe_df, min_support=0.2, use_colnames=True, verbose=1)
freq_items.head(7)

Processing 20 combinations | Sampling itemset size 43



| | support | itemsets |
|---|---|---|
| 0 | 0.501587 | (Milk) |
| 1 | 0.425397 | (Bagel) |
| 2 | 0.869841 | ( ) |
| 3 | 0.504762 | (Bread) |
| 4 | 0.438095 | (Eggs) |
| 5 | 0.438095 | (Wine) |
| 6 | 0.501587 | (Cheese) |


In [72]:
rules = association_rules(freq_items, metric="confidence", min_threshold=0.6)
rules.head()
#Metric can be set to Confidence, Lift, Leverage, Conviction or Support


| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (Milk) | ( ) | 0.501587 | 0.869841 | 0.409524 | 0.816456 | 0.938626 | -0.026778 | 0.709141 |
| 1 | (Cheese) | (Milk) | 0.501587 | 0.501587 | 0.304762 | 0.607595 | 1.211344 | 0.053172 | 1.270148 |
| 2 | (Milk) | (Cheese) | 0.501587 | 0.501587 | 0.304762 | 0.607595 | 1.211344 | 0.053172 | 1.270148 |
| 3 | (Bagel) | ( ) | 0.425397 | 0.869841 | 0.336508 | 0.791045 | 0.909413 | -0.03352 | 0.622902 |
| 4 | (Bagel) | (Bread) | 0.425397 | 0.504762 | 0.279365 | 0.656716 | 1.301042 | 0.064641 | 1.44265 |


- Support Count (σ) – Frequency of occurrence of an itemset.
- Frequent Itemset – An itemset whose support is greater than or equal to the minsup threshold.
- Association Rule – An implication expression of the form X -> Y, where X and Y are disjoint itemsets.

Rule Evaluation Metrics – <br>

Support(s) –
The number of transactions that include all items in both the {X} and {Y} parts of the rule, as a fraction of the total number of transactions. It measures how frequently the items occur together across all transactions.<br>
Support(X=>Y) = σ(X∪Y) ÷ N, where N is the total number of transactions –
It is interpreted as the fraction of transactions that contain both X and Y.<br>
Confidence(c) –
The ratio of the number of transactions that include all items in both {X} and {Y} to the number of transactions that include all items in {X}.<br>
Conf(X=>Y) = Supp(X∪Y) ÷ Supp(X) –
It measures how often the items in Y appear in transactions that contain X.<br>
Lift(l) –
The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence, assuming that the itemsets X and Y are independent of each other. The expected confidence is the support (frequency) of {Y}.<br>
Lift(X=>Y) = Conf(X=>Y) ÷ Supp(Y) –
A lift near 1 indicates that X and Y appear together about as often as expected under independence; a lift greater than 1 means they appear together more often than expected, and a lift less than 1 means less often than expected. Greater lift values indicate a stronger association.<br>
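
The rules table above also reports leverage and conviction, which are not defined in this section. The sketch below recomputes all four derived columns from the three support values of a rule, assuming the commonly used definitions leverage = Supp(X∪Y) − Supp(X)·Supp(Y) and conviction = (1 − Supp(Y)) ÷ (1 − Conf(X=>Y)). The function name `rule_metrics` is hypothetical, and the input numbers are taken from the (Cheese) -> (Milk) row of the table.

```python
def rule_metrics(supp_x, supp_y, supp_xy):
    """Derive rule metrics from the supports of X, Y and X∪Y."""
    confidence = supp_xy / supp_x
    lift = confidence / supp_y
    # Assumed standard definitions for the two metrics not defined in the text
    leverage = supp_xy - supp_x * supp_y
    conviction = (1 - supp_y) / (1 - confidence) if confidence < 1 else float("inf")
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports from the (Cheese) -> (Milk) row of the rules table
metrics = rule_metrics(supp_x=0.501587, supp_y=0.501587, supp_xy=0.304762)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Up to rounding of the displayed supports, the printed values should agree with the confidence, lift, leverage, and conviction columns of that row.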

In [None]:
import pandas as pd
df = pd.DataFrame({"TID" : [1,2,3,4,5],"Items" : ["Bread, Milk","Bread, Diaper, Beer, Eggs",
"Milk, Diaper, Beer, Coke",
"Bread, Milk, Diaper, Beer",
"Bread, Milk, Diaper, Coke"]})
df.set_index("TID")


| TID | Items |
|---|---|
| 1 | Bread, Milk |
| 2 | Bread, Diaper, Beer, Eggs |
| 3 | Milk, Diaper, Beer, Coke |
| 4 | Bread, Milk, Diaper, Beer |
| 5 | Bread, Milk, Diaper, Coke |


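Here σ({Milk, Bread, Diaper}) = 2, because only transactions 4 and 5 contain all three items.<br>
Example: {Milk, Diaper} -> {Beer}, where σ({Milk, Diaper, Beer}) = 2, σ({Milk, Diaper}) = 3 and σ({Beer}) = 3.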

In [None]:
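# Rule {Milk, Diaper} -> {Beer} over 5 transactions
Support = 2/5                    # 2 transactions contain Milk, Diaper and Beer
Confidence = round(2/3,2)        # 3 transactions contain Milk and Diaper
Lift = round(0.4/(0.6*0.6),1)    # Supp(X∪Y) / (Supp(X) * Supp(Y))
print("Support =",Support)
print("Confidence =",Confidence)
print("Lift =",Lift)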

Support = 0.4
Confidence = 0.67
Lift = 1.1
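
The hand-computed values can be cross-checked by counting directly from the five transactions. This is a minimal sketch in plain Python; the helper name `support_count` is an assumption for illustration and plays the role of σ.

```python
# The five transactions from the TID table
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)


X, Y = {"Milk", "Diaper"}, {"Beer"}
n = len(transactions)

support = support_count(X | Y, transactions) / n
confidence = support_count(X | Y, transactions) / support_count(X, transactions)
lift = confidence / (support_count(Y, transactions) / n)

print("Support =", round(support, 2))
print("Confidence =", round(confidence, 2))
print("Lift =", round(lift, 1))
```

Changing `X` and `Y` lets the same three lines evaluate any other rule over these transactions.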



# Conclusion

Association rules are very useful for analyzing datasets. In supermarkets, for example, the data is collected using bar-code scanners. Such databases consist of a large number of transaction records, each listing all items bought by a customer in a single purchase. A manager can then see whether certain groups of items are consistently purchased together and use this information to adjust store layouts and to plan cross-selling and promotions.

# 6 Free Association Rule Mining Tools
1. Bart Goethals
2. FPM
3. FrIDA
4. KNIME
5. Magnum Opus
6. Rapid-i

# Top 6 Open Source Tools for Association Rule Mining
1. Orange
2. Weka
3. Tanagra
4. RapidMiner
5. KNIME
6. FrIDA

# Drawbacks of Association Rule Mining
The primary disadvantages of association rule mining are as follows:
- A lengthy procedure of obtaining monotonous rules.
- Having a large number of discovered rules.
- Low performance of the Association Rule algorithms.
- Consideration of a lot of parameters for obtaining the rules.

References:
- Lutkevich B. (2020, September). Association Rules. techtarget.com. https://www.techtarget.com/searchbusinessanalytics/definition/association-rules-in-data-mining
- AnishaD. (2022, August 23). Association Rule. geeksforgeeks.org. https://www.geeksforgeeks.org/association-rule/
- Rai A. (2022, September 29). An Overview of Association Rule Mining & its Applications. upgrad.com. https://www.upgrad.com/blog/association-rule-mining-an-overview-and-its-applications/
- ishukatiyar16. (2022, June 22). Types of Association Rules in Data Mining. geeksforgeeks.org. https://www.geeksforgeeks.org/types-of-association-rules-in-data-mining/
- Association Rule Learning. javatpoint.com. https://www.javatpoint.com/association-rule-learning
- 6 Free Association Rules Mining Tools. (2013, July 31). butleranalytics.com. http://www.butleranalytics.com/6-free-association-rules-mining-tools/
- Das S. (2020, September 4). 6 Top Open Source Tools For Association Rule Mining. analyticsindiamag.com. https://analyticsindiamag.com/6-top-open-source-tools-for-association-rule-mining/
- Jena M. (2022, May 27). Association Rule Mining Simplified 101. hevodata.com. https://hevodata.com/learn/association-rule-mining/#a2
- Harsh. (2019, September 26). Association Analysis with Python. medium.com. https://medium.com/analytics-vidhya/association-analysis-in-python-2b955d0180c
- Torkan M. (2020, October 18). Association Rules with Python. medium.com. https://medium.com/@mervetorkan/association-rules-with-python-9158974e761a