# Association Rule

Association rules are "if-then" statements that show the probability of relationships between data items within large data sets in various types of databases. Association rule mining has a number of applications and is widely used to discover sales correlations in transactional data and patterns in medical data sets.

In data science, association rules are used to find correlations and co-occurrences between items in data sets.

Below are a few real-world use cases for association rules:
<br>a. <b>Medicine</b>. Doctors can use association rules to help diagnose patients.
<br>b. <b>Retail</b>. Retailers can collect data about purchasing patterns, recording purchase data as item barcodes are scanned by point-of-sale systems.
<br>c. <b>User experience (UX) design</b>. Developers can collect data on how consumers use a website they create.
<br>d. <b>Entertainment</b>. Services like Netflix and Spotify can use association rules to fuel their content recommendation engines.

An association rule has 2 parts:
<br>a. an antecedent (if) and
<br>b. a consequent (then)

<h3>Usage of Association Rule Mining</h3>

1) Association Rule Mining: Basic Definitions

    a) <b>Support Count (σ)</b>: The number of transactions in which an itemset occurs.
    <br>b) <b>Frequent Itemset</b>: An itemset whose support is greater than or equal to the minimum support threshold.
    <br>c) <b>Association Rule</b>: An implication expression of the form X -> Y, where X and Y are disjoint itemsets.

2) Association Rule Mining: Rule Evaluation Metrics
    <br>The rule evaluation metrics used in Association Rule Mining are as follows:

    a) <b>Support(s)</b>: The number of transactions that contain all items from both the {X} and {Y} parts of the rule, expressed as a percentage of all transactions. It shows how frequently the group of items occurs together.
    <br>b) <b>Support(X=>Y) = σ(X∪Y) ÷ total</b>: The fraction of transactions that include both X and Y.
    <br>c) <b>Confidence(c)</b>: The ratio of the number of transactions that contain all of the items in {X} and {Y} to the number of transactions that contain the items in {X}.
    <br>d) <b>Conf(X=>Y) = Supp(X∪Y) ÷ Supp(X)</b>: The fraction of transactions containing X that also contain Y.
    <br>e) <b>Lift(l)</b>: The lift of the rule X=>Y is the confidence of the rule divided by the expected confidence. The expected confidence is the confidence the rule would have if the itemsets X and Y were independent of one another, which is simply the support of {Y}.
    <br>f) <b>Lift(X=>Y) = Conf(X=>Y) ÷ Supp(Y)</b>: Lift values near 1 indicate that X and Y appear together about as often as expected under independence. Lift values greater than 1 indicate that they appear together more often than expected, and lift values less than 1 indicate that they appear together less often than expected. Larger lift values indicate a stronger association.
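The three metrics above can be checked by hand on a tiny example. The sketch below uses five made-up transactions (the items and the function names are illustrative assumptions, not part of any library) and exact fractions so that the results are easy to compare with the formulas.

```python
from fractions import Fraction

# Toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return Fraction(support_count(itemset, transactions), len(transactions))

def confidence(x, y, transactions):
    """Conf(X=>Y) = Supp(X u Y) / Supp(X)."""
    return support(x | y, transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Lift(X=>Y) = Conf(X=>Y) / Supp(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

x, y = {"bread"}, {"milk"}
print(support(x | y, transactions))     # 3/5
print(confidence(x, y, transactions))   # 3/4
print(lift(x, y, transactions))         # 15/16
```

Here the lift is slightly below 1, so in this toy data bread and milk appear together a little less often than independence would predict.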

<h3>Types Of Association Rules In Data Mining</h3>

There are typically four different types of association rules in data mining. They are:

a) Multi-relational association rules
    <br>Multi-relational association rules (MRAR) are a class of association rules that are usually derived from multi-relational databases.

<br>b) Generalized association rules
    <br>Generalized association rules are largely used to get a rough idea of the interesting patterns that often stay hidden in data.

<br>c) Interval Information Association Rules
    

<br>d) Quantitative association rules
    <br>Quantitative association rules differ from the other three types in that at least one attribute in the rule is numeric.

<h3>Algorithms For Association Rules In Data Mining</h3>

There are three main algorithms that can be used to generate association rules in data mining.

a) Apriori Algorithm
<br>The Apriori algorithm identifies the frequent individual items in a given database and then extends them to larger item sets, as long as those item sets still appear sufficiently often in the database.

b) Eclat Algorithm
<br>ECLAT stands for Equivalence Class Clustering and bottom-up Lattice Traversal. It is another widely used method for association rule mining, and some consider it a more efficient alternative to the Apriori algorithm.

c) FP-growth Algorithm
<br>Short for frequent pattern growth, this algorithm finds frequent patterns without the need for candidate generation. It operates in two stages: FP-tree construction and extraction of frequent item sets.
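To make the level-wise idea behind Apriori concrete, here is a minimal pure-Python sketch. It is a simplified illustration rather than the implementation used by any particular library; the function name `apriori_frequent_itemsets` and the toy baskets are assumptions made for this example, and it only finds frequent item sets (it does not generate rules).

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search for item sets whose support is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent individual items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-item sets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy baskets, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk", "eggs"],
]
frequent = apriori_frequent_itemsets(baskets, min_support=0.5)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
# ['bread'] 0.8
# ['milk'] 0.8
# ['bread', 'milk'] 0.6
```

The prune step is what keeps the search manageable: an item set can only be frequent if all of its subsets are frequent, so most large candidates are discarded without counting them.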

# Implementing Apriori Algorithm with Python

In [19]:
pip install apyori

Note: you may need to restart the kernel to use updated packages.


<h3>Import the Libraries</h3>

In [20]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

<h3>Importing the Dataset</h3>

In [21]:
store_data = pd.read_csv('https://raw.githubusercontent.com/Bryant35/Data_Mining/main/Week%209/store_data.csv')
store_data.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


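<h3>Removing Dataset Header</h3>

The CSV file has no header row, so the default call above treats the first transaction as column names. Passing header=None keeps that row as data.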

In [22]:
store_data = pd.read_csv('https://raw.githubusercontent.com/Bryant35/Data_Mining/main/Week%209/store_data.csv', header=None)
store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


<h2>Data Preprocessing</h2>

In [23]:
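# Convert the DataFrame into a list of transactions (a list of lists of strings).
# Note: str() turns missing cells (NaN) into the literal string 'nan'.
records = []
for i in range(len(store_data)):
    records.append([str(store_data.values[i, j]) for j in range(store_data.shape[1])])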
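Because the loop above passes every cell through str(), empty cells end up in the transactions as the string 'nan', which is why an item called nan shows up in some of the mined rules (for example "light cream -> nan" in the output further down). One possible alternative, sketched here with a hypothetical helper named `rows_to_transactions`, is to skip missing cells while building the transactions. Using it would change the set of rules found, so the counts and values shown in the rest of this notebook would no longer apply as-is.

```python
import math

def rows_to_transactions(rows):
    """Turn rows of cells into transactions, skipping missing cells."""
    def is_missing(cell):
        if cell is None:
            return True
        if isinstance(cell, float) and math.isnan(cell):
            return True
        return str(cell).strip().lower() in ("", "nan")

    return [[str(c) for c in row if not is_missing(c)] for row in rows]

# Small stand-in for the first rows of the data set (NaN marks an empty cell).
rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]
print(rows_to_transactions(rows))
# [['burgers', 'meatballs', 'eggs'], ['chutney']]
```

With the DataFrame from this notebook, the helper could be applied as `records = rows_to_transactions(store_data.values.tolist())`.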

<h3>Applying Apriori</h3>


The next step is to apply the Apriori algorithm to the dataset. To do so, we can use the apriori function that we imported from the apyori library.

The apriori function takes several parameters. The first is the list of lists (one inner list per transaction) that you want to extract rules from. The min_support parameter keeps only item sets whose support is at least the given value. The min_confidence parameter keeps only rules whose confidence is at least the given threshold, and min_lift does the same for lift. Finally, min_length is meant to specify the minimum number of items in a rule; note that the length option documented by apyori is max_length, so min_length may have no effect.

The second line converts the rules returned by apriori (a generator) into a list, since the results are easier to inspect in that form.

In [24]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)

Find the total number of rules mined by apriori:

In [27]:
print(len(association_results))

48


Inspect the first rule:

In [28]:
print(association_results[0])

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
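As a quick consistency check, the numbers in this first record can be tied back to the metric formulas given earlier. The sketch below assumes the data set holds 7,501 transactions (the row count used in the preprocessing loop) and inverts the definitions to recover the implied transaction counts; the variable names are illustrative.

```python
import math

n_transactions = 7501  # assumption: row count used in the preprocessing loop

# Values copied from the RelationRecord printed above.
support = 0.004532728969470737
confidence = 0.29059829059829057
lift = 4.84395061728395

# Implied transaction counts, recovered by inverting the metric definitions.
n_both = round(support * n_transactions)            # light cream and chicken together
n_base = round(n_both / confidence)                 # light cream
n_add = round(confidence / lift * n_transactions)   # chicken

print(n_both, n_base, n_add)  # 34 117 450

# Recompute the metrics from the counts and compare with the reported values.
assert math.isclose(n_both / n_transactions, support)
assert math.isclose(n_both / n_base, confidence)
assert math.isclose((n_both / n_base) / (n_add / n_transactions), lift)
```

Read this way, the rule says that light cream appears in 117 transactions, 34 of which also contain chicken, and that this is about 4.8 times the rate at which chicken appears overall.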


<h3>Display the rule, support, confidence, and lift for each rule more clearly</h3>

In [32]:
for item in association_results:

    # item[0] is the frozenset of all items in the rule. A frozenset has no
    # guaranteed order, so items[0] -> items[1] may not match
    # antecedent -> consequent, and only the first two items are shown.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the item set
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; its first entry holds
    # the confidence (index 2) and the lift (index 3)
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: light cream -> chicken
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
Rule: mushroom cream sauce -> escalope
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049
Rule: pasta -> escalope
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
Rule: herb & pepper -> ground beef
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
Rule: tomato sauce -> ground beef
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083
Rule: olive oil -> whole wheat pasta
Support: 0.007998933475536596
Confidence: 0.2714932126696833
Lift: 4.122410097642296
Rule: pasta -> shrimp
Support: 0.005065991201173177
Confidence: 0.3220338983050847
Lift: 4.506672147735896
Rule: light cream -> nan
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
Rule: chocolate -> frozen vegetables
Support: 0.005332622317024397
Confide
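The loop above reads the rule from the unordered item set and only looks at the first ordered statistic, so the printed direction is not guaranteed and rules with more than two items are cut short. A sketch of a more explicit alternative is shown below. It uses the field names visible in the printed RelationRecord (items_base, items_add, confidence, lift); the two namedtuples are stand-ins defined here so the example runs on its own, and `rule_rows` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-ins mirroring the field names in the printed RelationRecord above.
# With apyori installed, the objects in association_results would be used instead.
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def rule_rows(results):
    """Flatten records into (antecedent, consequent, support, confidence, lift) rows."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# The first record from the output above, rebuilt by hand.
sample = [RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"light cream"}),
        items_add=frozenset({"chicken"}),
        confidence=0.29059829059829057,
        lift=4.84395061728395,
    )],
)]

for base, add, supp, conf, rule_lift in rule_rows(sample):
    print(f"Rule: {base} -> {add} | support={supp:.4f} confidence={conf:.4f} lift={rule_lift:.4f}")
# Rule: light cream -> chicken | support=0.0045 confidence=0.2906 lift=4.8440
```

Reading items_base and items_add directly keeps the antecedent and consequent in the right order, and iterating over every ordered statistic reports all rules derived from an item set rather than only the first.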

# Conclusion

Association rule mining algorithms such as Apriori are useful for finding simple associations between data items. They are easy to implement and highly explainable, which makes them a good fit when basic associations are all a use case needs.

<h6>Lutkevich, B. (2020). <i>Association Rules in Data Mining</i>. Retrieved November 11, 2022, from https://www.techtarget.com/searchbusinessanalytics/definition/association-rules-in-data-mining/

Rai, A. (2022). <i>An Overview of Association Rule Mining & its Applications</i>. Retrieved November 11, 2022, from https://www.upgrad.com/blog/association-rule-mining-an-overview-and-its-applications/

Jena, M. (2022). <i>Association Rule Mining Simplified 101</i>. Retrieved November 11, 2022, from https://hevodata.com/learn/association-rule-mining/

Malik, U. (2021). <i>Association Rule Mining via Apriori Algorithm in Python</i>. Retrieved November 11, 2022, from https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/</h6>