# Association Rule Mining

Source: 
- https://towardsdatascience.com/market-basket-analysis-978ac064d8c6
- https://www.edureka.co/blog/apriori-algorithm/#:~:text=Apriori%20algorithm%20uses%20frequent%20itemsets,a%20threshold%20value(support).

Association Rule Mining is used to find associations between different objects in a set and to discover frequent patterns in transaction databases, relational databases, or any other information repository. Its applications include marketing, Basket Data Analysis (or Market Basket Analysis) in retailing, clustering, and classification.

The most common approach to finding these patterns is Market Basket Analysis, a key technique used by large retailers such as Amazon and Flipkart to analyze customer buying habits by finding associations between the different items that customers place in their “shopping baskets”.

# Difference between Association and Recommendation

Association rules do not extract an individual's preferences; rather, they find relationships between sets of items within each distinct transaction. This is what makes them different from collaborative filtering, which is used in recommendation systems.

<b>“Frequently Bought Together” → Association
    
“Customers who bought this item also bought” → Recommendation </b>

# Association Rule Mining

Association rules can be thought of as an IF-THEN relationship: given that a customer buys item A, the rule estimates the chance that item B is also bought under the same Transaction ID.

<img src="https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2019/06/association-rules-apriori-algorithm-527x180.png">

There are two elements of these rules:

<b>Antecedent (IF):</b> This is an item or group of items found in the itemsets or dataset.

<b>Consequent (THEN):</b> This is an item (or group of items) found in combination with the antecedent.

# Metrics to evaluate association rules:

1. Support

2. Confidence

3. Lift

## 1. Support

It gives the fraction of transactions that contain both item A and item B. In other words, support tells us how frequently an item, or a combination of items, is bought.

<img src="https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2019/06/support-apriori-300x73.png">

So with this, we can <b>filter out</b> the items that have a <b>low frequency</b>.
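As a minimal sketch of the definition above, support can be computed by counting the transactions that contain every item of an itemset. The `transactions` list and the `support` helper are made-up names for illustration, not part of any library.

```python
# Toy transaction data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# {bread, milk} appears in 3 of the 5 transactions.
print(support({"bread", "milk"}, transactions))  # 0.6
# {eggs} appears in only 1 of 5, so a 40% threshold would filter it out.
print(support({"eggs"}, transactions))  # 0.2
```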

## 2. Confidence

It tells us how often items A and B occur together, given the number of times A occurs.

<img src="https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2019/06/confidence-apriori-300x71.png">
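The confidence of a rule A -> B can be sketched as the number of transactions containing both A and B divided by the number containing A. The data and the `confidence` helper below are hypothetical and only meant to illustrate the ratio.

```python
# Toy transaction data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)

# bread occurs in 4 transactions, bread and butter together in 3.
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
```

Note that confidence is not symmetric: in this toy data, butter -> bread has confidence 1.0 because every basket with butter also contains bread.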

Typically, when you work with the Apriori Algorithm, you define minimum values for these metrics yourself. But how do you decide the values? There is no fixed rule for choosing them. Suppose you set the minimum support to 2%. This means that unless an item (or itemset) appears in at least 2% of transactions, you will not consider it for the Apriori algorithm. This makes sense, as considering items that are bought less frequently is a waste of time.

Now suppose that after filtering you still have around 5000 items left. Creating association rules for all of them by hand is practically impossible. This is where the concept of lift comes into play.

 
## 3. Lift

Lift indicates the strength of a rule relative to the random co-occurrence of A and B.

<img src="https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2019/06/lift-apriori-algorithm-300x68.png">

Focus on the denominator: it is the product of the individual support values of A and B, that is, how often A and B would occur together if they were independent. <b>The higher the lift, the stronger the rule.</b> Let’s say for A -> B, the lift value is 4. It means that a customer who buys A is <b>4 times</b> as likely to buy B as a customer picked at random.
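A small sketch of the lift calculation, assuming the same kind of toy data as before. The `lift` helper is a hypothetical name; it works with raw counts, which is algebraically the same as dividing the joint support by the product of the individual supports.

```python
# Toy transaction data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def lift(a, b, transactions):
    """support(A and B) / (support(A) * support(B)), computed from counts."""
    n = len(transactions)
    n_ab = count(a | b, transactions)
    return (n_ab * n) / (count(a, transactions) * count(b, transactions))

# bread: 4 of 5, butter: 3 of 5, both: 3 of 5 -> (3 * 5) / (4 * 3)
print(lift({"bread"}, {"butter"}, transactions))  # 1.25
```

A lift above 1 suggests the two items appear together more often than independence would predict, a lift of 1 suggests no association, and a lift below 1 suggests they appear together less often than expected.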

# Notes:

Source: https://www.geeksforgeeks.org/frequent-item-set-in-data-set-association-rule-mining/

- The minimum support and minimum confidence are set by the users, and are parameters of the Apriori algorithm for association rule generation. These parameters are used to exclude rules in the result that have a support or a confidence lower than the minimum support and minimum confidence respectively. 

- The minimum support count is expressed as a number of transactions. For example, with a minimum support of 60%, it is 60% of the total number of transactions: if the number of transactions is 5, the minimum support count is 5*60/100 = 3.

- A confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter.

- A support of 5% means that 5% of all transactions in the database contain every item in the rule.
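The conversion from a percentage threshold to a minimum support count in the notes above can be sketched as follows. The function name is hypothetical, and rounding up when the result is not a whole number is an assumption made here, since the notes only cover a case that divides evenly.

```python
def min_support_count(n_transactions, min_support_pct):
    """Smallest transaction count that reaches `min_support_pct` percent.

    Uses integer ceiling division; rounding up is an assumption,
    not something the notes above specify.
    """
    return (n_transactions * min_support_pct + 99) // 100

# The example from the notes: 5 transactions at 60% -> 5*60/100 = 3.
print(min_support_count(5, 60))  # 3
# With 7 transactions, 60% is 4.2, which rounds up to 5.
print(min_support_count(7, 60))  # 5
```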

# Apriori Algorithm

The Apriori algorithm can be used in both supervised and unsupervised settings. It is generally considered an <b>unsupervised</b> learning approach, since it is often used to discover or mine interesting patterns and relationships. Apriori can also be modified to do classification based on labelled data.

The Apriori algorithm relies on the property that any subset of a frequent itemset must also be frequent. It's the algorithm behind Market Basket Analysis.
Say, a transaction containing {Grapes, Apple, Mango} also contains {Grapes, Mango}. So, according to the principle of Apriori, if {Grapes, Apple, Mango} is frequent, then {Grapes, Mango} must also be frequent.
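The property can be checked on a few hypothetical fruit baskets: every transaction that contains the larger itemset also contains its subset, so the subset's count can never be smaller. The data and the `count` helper are invented for illustration.

```python
# Hypothetical fruit baskets, invented for illustration.
transactions = [
    {"Grapes", "Apple", "Mango"},
    {"Grapes", "Mango"},
    {"Grapes", "Apple", "Mango"},
    {"Apple"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

superset_count = count({"Grapes", "Apple", "Mango"}, transactions)
subset_count = count({"Grapes", "Mango"}, transactions)

print(superset_count, subset_count)  # 2 3
# The subset is always at least as frequent as the superset.
print(subset_count >= superset_count)  # True
```

Apriori uses the contrapositive for pruning: if {Grapes, Mango} were infrequent, no itemset containing it could be frequent, so those supersets never need to be counted.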

The steps followed in the Apriori Algorithm of data mining are:

1. <b>Join Step</b>: This step generates candidate (K+1)-itemsets from the frequent K-itemsets by joining the set of K-itemsets with itself.

2. <b>Prune Step</b>: This step scans the database to count each candidate itemset. If a candidate does not meet the minimum support, it is regarded as infrequent and removed. This step is performed to reduce the size of the candidate itemsets.
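The two steps might be sketched as a pair of small functions. This is a simplified illustration under assumed names (`join`, `prune`) and toy data, with the minimum support given as a transaction count; it is not an optimized implementation.

```python
from itertools import combinations

# Toy transaction data, invented for illustration.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "butter", "milk", "eggs"}),
]

def join(frequent_k):
    """Join step: union pairs of k-itemsets that yield a (k+1)-itemset."""
    frequent_k = list(frequent_k)
    if not frequent_k:
        return set()
    k = len(frequent_k[0])
    return {a | b for a, b in combinations(frequent_k, 2) if len(a | b) == k + 1}

def prune(candidates, transactions, min_count):
    """Prune step: keep candidates found in at least `min_count` transactions."""
    return {
        c for c in candidates
        if sum(1 for t in transactions if c <= t) >= min_count
    }

frequent_1 = {frozenset({"bread"}), frozenset({"milk"}), frozenset({"butter"})}
candidates_2 = join(frequent_1)  # three candidate pairs
frequent_2 = prune(candidates_2, transactions, min_count=3)
print(sorted(sorted(s) for s in frequent_2))
# [['bread', 'butter'], ['bread', 'milk']]
```

Here {butter, milk} occurs in only 2 transactions, so it is dropped at a minimum count of 3.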

# Steps in Apriori:

Source: https://www.softwaretestinghelp.com/apriori-algorithm/ (Well explained)

A minimum support threshold is given in the problem or it is assumed by the user.

1) In the first iteration of the algorithm, each item is taken as a candidate 1-itemset. The algorithm counts the occurrences of each item.

2) Let there be some minimum support, min_sup (e.g. 2). The set of 1-itemsets whose occurrence count satisfies min_sup is determined. Only those candidates whose count is greater than or equal to min_sup are taken ahead to the next iteration; the others are pruned.

3) Next, the frequent 2-itemsets that meet min_sup are discovered. For this, the join step generates candidate 2-itemsets by combining the frequent 1-itemsets with each other in groups of 2.

4) The 2-itemset candidates are pruned using the min_sup threshold value. Now the table will contain only the 2-itemsets that meet min_sup.

5) The next iteration forms 3-itemsets using the join and prune steps. This iteration uses the antimonotone property: every 2-itemset subset of a candidate 3-itemset must itself meet min_sup. If any 2-itemset subset is infrequent, the candidate is pruned; if all of them are frequent, the candidate is kept and its own support is counted against min_sup.

6) The next step forms 4-itemsets by joining the 3-itemsets with themselves and pruning any candidate with a subset that does not meet the min_sup criteria. The algorithm stops when no further frequent itemsets can be generated.
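The steps above can be put together in a compact sketch. This is an illustrative, unoptimized version under assumed names (`apriori`, `min_sup` as a transaction count) and made-up data, not a reference implementation.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: count} for all itemsets with count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Steps 1-2: count single items and keep the frequent ones.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        candidate = frozenset([item])
        n = count(candidate)
        if n >= min_sup:
            current[candidate] = n

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        # Join: union pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2)
            if len(a | b) == k + 1
        }
        # Antimonotone prune: every k-subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        # Support prune: scan the transactions and apply min_sup.
        current = {}
        for c in candidates:
            n = count(c)
            if n >= min_sup:
                current[c] = n
        k += 1
    return frequent

# Toy transaction data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk", "eggs"},
]
result = apriori(transactions, min_sup=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With `min_sup=2`, eggs is dropped in the first pass, all three pairs survive, and the loop ends after the single 3-itemset {bread, butter, milk} because no 4-itemset candidates can be formed.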


# Advantages

- Easy to understand algorithm
- Join and Prune steps are easy to implement on large itemsets in large databases

# Disadvantages

- It requires high computation if the itemsets are very large and the minimum support is kept very low.
- The entire database needs to be scanned repeatedly, once for each itemset size.
