## Association 


Association rule learning is an unsupervised learning technique that checks how the occurrence of one data item depends on the occurrence of another and maps these dependencies so that they can be exploited, for example to make sales more profitable. It looks for interesting relations or associations among the variables of a dataset and expresses them as rules.

Association rule learning can be divided into three types of algorithms:

- Apriori
- Eclat
- FP Growth Algorithm

An association rule has the form of an if/then statement:

- IF A then B

Here the IF element (A) is called the antecedent, and the THEN element (B) is called the consequent.

A relationship like this, where the association is between two single items, is known as <b>single cardinality</b>. Rule mining is all about creating such rules, and as the number of items in a rule increases, the cardinality increases accordingly. To measure the strength of associations among thousands of data items, several metrics are used:

- Support
- Confidence
- Lift

#### Support
Support is how frequently an itemset appears in the dataset. It is defined as the fraction of transactions that contain the itemset X. For an itemset X and a total of T transactions, it can be written as:

Support(X) = Freq(X) / T
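
As a minimal sketch of the formula above, the snippet below computes support on a small, made-up list of baskets. The `transactions` data and the `support` helper are illustrative assumptions, not part of the store dataset used later.

```python
# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"bread"}, transactions))          # 0.75
print(support({"bread", "milk"}, transactions))  # 0.5
```

Adding items to an itemset can only keep its support the same or lower it, since fewer baskets can contain all of them.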


#### Confidence
Confidence indicates how often the rule has been found to be true, that is, how often the items X and Y occur together in the dataset given that X occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X.

Confidence(X -> Y) = Freq(X,Y) / Freq(X)

This is the conditional probability P(Y | X) estimated from the data, the same kind of quantity that appears in Bayes' theorem.
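
The following sketch, again on made-up baskets, shows that confidence depends on the direction of the rule. The `count` and `confidence` helpers are hypothetical names introduced only for this illustration.

```python
# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Freq(X, Y) / Freq(X); assumes the antecedent occurs at least once."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


print(confidence({"bread"}, {"milk"}, transactions))            # 0.5
print(round(confidence({"milk"}, {"bread"}, transactions), 3))  # 0.667
```

Bread appears in four baskets and milk in three, so the same two shared baskets give a different ratio in each direction.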

#### Lift
Lift measures the strength of a rule and is defined as:

Lift(X -> Y) = Supp(X,Y) / (Supp(X) × Supp(Y))

It is the ratio of the observed support to the support expected if X and Y were independent of each other. There are three cases:

- Lift = 1: The antecedent and the consequent occur independently of each other.
- Lift > 1: The two itemsets occur together more often than expected under independence; the larger the lift, the stronger the positive dependence.
- Lift < 1: The two itemsets occur together less often than expected, so one has a negative effect on the other; the items may be substitutes for each other.
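
A small sketch of the lift calculation, using made-up baskets, may help to see the cases above in numbers. The data and the `support` and `lift` helpers are assumptions for illustration only.

```python
# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(x, y, transactions):
    """Observed joint support divided by the support expected under independence."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))


print(round(lift({"bread"}, {"butter"}, transactions), 3))  # 1.25  (> 1)
print(round(lift({"bread"}, {"milk"}, transactions), 3))    # 0.833 (< 1)
```

Unlike confidence, lift is symmetric: swapping X and Y gives the same value.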

### Apriori Algorithm

The Apriori algorithm is used for mining frequent itemsets and the association rules derived from them. It generally operates on a database containing a large number of transactions, for example the items customers buy at a Big Bazaar store.

The rules it finds help a store present related products together, which makes it easier for customers to find what they want and can increase the store's sales.

The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected. The algorithm uses a breadth-first search and a hash tree to count candidate itemsets efficiently. It is an iterative process for finding the frequent itemsets in a large dataset.

**What is a Frequent Itemset?**

Frequent itemsets are itemsets whose support is at least the user-specified minimum support threshold. If the itemset {A, B} is frequent, then A and B must each be frequent on their own, because every transaction that contains both A and B also contains each of them.
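
This property lets a candidate be discarded without counting it whenever one of its subsets is already known to be infrequent. The sketch below illustrates the check on made-up baskets; the helper names and the threshold of 0.5 are assumptions chosen for the example.

```python
from itertools import combinations

# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def all_subsets_frequent(itemset, transactions, min_support):
    """True if every non-empty proper subset of `itemset` meets min_support."""
    items = sorted(itemset)
    return all(
        support(subset, transactions) >= min_support
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )


# butter appears in only 2 of 5 baskets, so any itemset containing it
# cannot reach a minimum support of 0.5 and can be skipped.
print(all_subsets_frequent({"bread", "milk"}, transactions, 0.5))    # True
print(all_subsets_frequent({"bread", "butter"}, transactions, 0.5))  # False
```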


Below are the steps for the apriori algorithm:

- Step-1: Select the minimum support and minimum confidence, and determine the support of the itemsets in the transactional database.

- Step-2: Keep all itemsets whose support is higher than the minimum support.

- Step-3: From these itemsets, find all rules whose confidence is higher than the minimum confidence.

- Step-4: Sort the rules in decreasing order of lift.

Once every candidate itemset has been checked against the minimum support, we have the final set of frequent itemsets; confidence is then computed for the rules formed from their subsets.
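
The steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the apyori implementation: it recounts support by scanning all transactions instead of using a hash tree, and the data, function names, and thresholds are made up for the example.

```python
from itertools import combinations

# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: frequent k-itemsets are built from frequent (k-1)-itemsets."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Build X -> Y rules from the frequent itemsets, sorted by decreasing lift."""
    found = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                conf = supp / frequent[antecedent]
                lift = conf / frequent[consequent]
                if conf >= min_confidence:
                    found.append((set(antecedent), set(consequent), supp, conf, lift))
    return sorted(found, key=lambda rule: rule[4], reverse=True)


frequent = frequent_itemsets(transactions, min_support=0.4)
for antecedent, consequent, supp, conf, lift in rules(frequent, min_confidence=0.6):
    print(antecedent, "->", consequent, round(supp, 3), round(conf, 3), round(lift, 3))
```

Looking up `frequent[antecedent]` is safe here because every subset of a frequent itemset is itself frequent and was recorded at an earlier level.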

In [2]:
# Implementation 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori


In [5]:
store_data = pd.read_csv('store_data.csv',header= None)
store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [8]:
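# Convert each row of the DataFrame into a list of item strings.
# Note: empty cells are NaN, and str() turns them into the item 'nan',
# which is why 'nan' can appear in the results further down;
# skipping values where pd.isna(...) is true would avoid this.
num_records = len(store_data)
records = []
for i in range(0, num_records):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])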

In [9]:
association_rules = apriori(records,min_support = 0.0053,min_confidence = 0.20,min_lift = 3,min_length = 2)
association_results = list(association_rules)
print(len(association_results))

32


The apriori() call above takes the following arguments:

- transactions: A list of transactions (here, records).
- min_support: The minimum support as a float. The call above uses 0.0053; such a value is typically derived from the minimum number of transactions per week in which an itemset should appear, divided by the total number of transactions.
- min_confidence: The minimum confidence value. Here we have taken 0.2. It can be changed as per the business problem.
- min_lift: The minimum lift value (3 in the call above).
- min_length: The minimum number of products in an association.
- max_length: The maximum number of products in an association (not set in the call above).

In [17]:
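results = []
for item in association_results:
    # item[0] is the frozenset of items in the rule. Sets are unordered,
    # so T1/T2 are not necessarily antecedent/consequent, and any items
    # beyond the first two are not shown.
    pair = item[0]
    items = [x for x in pair]

    value0 = str(items[0])
    value1 = str(items[1])
    value2 = str(item[1])[:7]        # support of the itemset
    value3 = str(item[2][0][2])[:7]  # confidence of the first ordered rule
    value4 = str(item[2][0][3])[:7]  # lift of the first ordered rule

    rows = (value0, value1, value2, value3, value4)
    results.append(rows)

label = ['T1', 'T2', 'Support', 'Confidence', 'Lift']
store_sugg = pd.DataFrame.from_records(results, columns=label)
print(store_sugg)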

               T1                    T2  Support Confidence     Lift
0        escalope  mushroom cream sauce  0.00573    0.30069  3.79083
1           pasta              escalope  0.00586    0.37288  4.70081
2     ground beef         herb & pepper  0.01599    0.32345  3.29199
3     ground beef          tomato sauce  0.00533    0.37735  3.84065
4       olive oil     whole wheat pasta  0.00799    0.27149  4.12241
5       chocolate                shrimp  0.00533    0.23255  3.25451
6        escalope                   nan  0.00573    0.30069  3.79083
7           pasta              escalope  0.00586    0.37288  4.70081
8     ground beef             spaghetti  0.00866    0.31100  3.16532
9   mineral water                shrimp  0.00719    0.30508  3.20061
10      olive oil             spaghetti  0.00573    0.20574  3.12402
11      spaghetti                shrimp  0.00599    0.21531  3.01314
12       tomatoes             spaghetti  0.00666    0.23923  3.49804
13    ground beef             spag

## Types of Association Rules

1. **Boolean Association Rules**: If a rule involves associations between the presence or absence of items, it is a Boolean association rule. Rules obtained from market basket analysis, for example a rule of the form {bread} -> {butter}, are of this kind.

2. **Hierarchical Rules**

3. **Quantitative and Categorical Rules**: Quantitative association rules involve numeric attributes that have an implicit ordering among values (e.g., age). If a rule describes associations between quantitative items or attributes, it is a quantitative association rule. In these rules, quantitative values for items or attributes are partitioned into intervals, such as age ranges.

4. **Cyclic/Periodic Rules**

5. **Constrained Rules**: To make the mining process more efficient, rule-based constraint mining:
   - allows users to describe the rules that they would like to uncover;
   - provides a sophisticated mining query optimizer that exploits the constraints specified by the user;
   - encourages interactive exploratory mining and analysis.

6. **Sequential Rules**: A sequential rule is a rule of the form X -> Y, where X and Y are sets of items (itemsets). A rule X -> Y is interpreted as: if the items in X occur (in any order), they will be followed by the items in Y (in any order). For example, consider the rule {a} -> {e,f}. It means that if a customer buys item “a”, the customer will later buy the items “e” and “f”. The order among the items in {e,f} is not important: the customer may buy “e” before “f” or “f” before “e”.
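
To make the interpretation of a sequential rule concrete, here is a minimal sketch that checks whether one purchase sequence satisfies a rule X -> Y. It assumes, as a simplification, that a sequence is a plain list of single items in purchase order; the `follows_rule` function is a hypothetical helper written for this illustration.

```python
def follows_rule(sequence, antecedent, consequent):
    """True if all items of `antecedent` occur (in any order) and every item
    of `consequent` occurs (in any order) after the antecedent is complete."""
    remaining = set(antecedent)
    consequent = set(consequent)
    for position, item in enumerate(sequence):
        remaining.discard(item)
        if not remaining:
            # Antecedent complete at the earliest possible position;
            # the consequent must appear in the rest of the sequence.
            return consequent <= set(sequence[position + 1:])
    return False


rule_x, rule_y = {"a"}, {"e", "f"}
print(follows_rule(["a", "b", "f", "e"], rule_x, rule_y))  # True: e and f come after a
print(follows_rule(["e", "a", "f"], rule_x, rule_y))       # False: e was bought before a
```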


### ECLAT algorithm
ECLAT stands for Equivalence Class Clustering and bottom-up Lattice Traversal. It is one of the popular methods of association rule mining and is generally regarded as a more efficient and scalable alternative to the Apriori algorithm. While the Apriori algorithm works on a horizontal data layout, in the manner of a breadth-first search of a graph, the ECLAT algorithm works on a vertical layout, in the manner of a depth-first search. This vertical approach usually makes ECLAT faster than Apriori.

**How does the algorithm work?**
The basic idea is to use intersections of transaction ID sets (tidsets) to compute the support of a candidate, avoiding the generation of subsets that do not exist in the prefix tree. In the first call of the function, all single items are used along with their tidsets. The function is then called recursively, and in each recursive call every item-tidset pair is verified and combined with other item-tidset pairs. This continues until no candidate item-tidset pairs can be combined.
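
The recursion described above can be sketched as follows. This is a simplified illustration on made-up baskets, using an absolute count as the support threshold; the `eclat` function and its parameters are assumptions for the example rather than a reference implementation.

```python
from collections import defaultdict

# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


def eclat(transactions, min_count):
    """Return {itemset: support count} using tidset intersections."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that can extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of the remaining items; only
            # combinations that are still frequent are explored further.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_candidates.append((other, common))
            extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


for itemset, count in eclat(transactions, min_count=2).items():
    print(sorted(itemset), count)
```

The support of a larger itemset is simply the size of the intersected tidset, so the transactions are scanned only once, when the vertical layout is built.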

https://www.geeksforgeeks.org/ml-eclat-algorithm/

### FP Growth Algorithm

This algorithm is an improvement on the Apriori method: frequent patterns are generated without the need for candidate generation. The FP growth algorithm represents the database in the form of a tree called a frequent pattern tree, or FP tree.

This tree structure maintains the associations between the itemsets. The database is fragmented using one frequent item at a time, and each fragment is called a “pattern fragment”. The itemsets of these fragments are then analyzed, which reduces the search for frequent itemsets compared with Apriori.
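
As a rough sketch of the tree construction only (the mining of pattern fragments is omitted), the code below builds an FP tree from made-up baskets. The `Node` class, the `build_fp_tree` and `dump` functions, and the tie-breaking rule are assumptions introduced for this illustration.

```python
from collections import Counter

# Toy transactions (hypothetical data): one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Insert each transaction into a prefix tree, most frequent items first."""
    counts = Counter(item for basket in transactions for item in basket)
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = Node(None)
    for basket in transactions:
        # Drop infrequent items; order by descending frequency
        # (ties broken alphabetically) so common prefixes are shared.
        ordered = sorted((i for i in basket if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root


def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in sorted(node.children.values(), key=lambda n: n.item):
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines


root = build_fp_tree(transactions, min_count=2)
print("\n".join(dump(root)))
```

Four of the five baskets share the single `bread` node, which is how the tree compresses the database before any frequent itemsets are extracted.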

https://www.softwaretestinghelp.com/fp-growth-algorithm-data-mining/