# Apriori Algorithm: Know How to Find Frequent Itemsets

## Association Rule Mining

Association rules can be thought of as IF-THEN relationships. Suppose a customer buys item A; an association rule estimates the chance that the same customer also picks item B in the same transaction (that is, under the same Transaction ID).

![image.png](attachment:image.png)

**Antecedent (IF):**

This is an item or group of items found in the itemsets of the dataset. It forms the IF part of the rule.

**Consequent (THEN):**

This is an item or group of items that occurs together with the antecedent. It forms the THEN part of the rule.

### What are association rules?

Association rule analysis is a technique for uncovering how items are associated with each other. Association rule mining finds interesting associations and relationships among large sets of data items. A rule shows how frequently an itemset occurs across transactions. A typical example is Market Basket Analysis.

Market Basket Analysis is one of the key techniques used by large retailers to show associations between items. It allows retailers to identify relationships between the items that people frequently buy together. Given a set of transactions, we can find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.


Example:

![image.png](attachment:image.png)

## There are 3 ways to measure association:

### Support

Important definitions:

Support indicates how frequently an itemset appears in the dataset. It is one of the measures of interestingness and, together with confidence, reflects the usefulness and certainty of a rule. A support of 50% means that 50% of all transactions in the database contain the itemset.

For a rule involving items A and B, support is the fraction of transactions that contain both A and B. In other words, support tells us which items, or combinations of items, are bought frequently.

![image.png](attachment:image.png)


![image.png](attachment:image.png)

**Confidence:**

Confidence indicates how often the rule has been found to be true. It is the likelihood that item Y is purchased when item X is purchased. For example, a confidence of 75% for the rule "if apple, then beer" means that 75% of the customers who purchased an apple also bought beer.

![image.png](attachment:image.png)


![image.png](attachment:image.png)

**Lift:**

![image.png](attachment:image.png)
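The three measures can be computed directly from a list of transactions. The sketch below assumes the usual definitions: support is the fraction of transactions containing an itemset, confidence is support(X and Y) divided by support(X), and lift is confidence divided by support(Y). The eight toy transactions and the function names are illustrative assumptions, chosen so that "if apple, then beer" has the 75% confidence mentioned above.

```python
# Hypothetical toy transactions (not taken from the figures above)
transactions = [
    {"apple", "beer", "rice", "meat"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "meat"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions containing the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule relative to the baseline support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(support({"apple", "beer"}, transactions))       # 0.375
print(confidence({"apple"}, {"beer"}, transactions))  # 0.75
print(lift({"apple"}, {"beer"}, transactions))        # 1.0
```

A lift of 1 suggests that the two items are bought independently of each other, while a lift above 1 suggests that buying the antecedent makes the consequent more likely.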

## Apriori Algorithm

The Apriori algorithm uses frequent itemsets to generate association rules. It is based on the principle that every subset of a frequent itemset must also be a frequent itemset. A frequent itemset is an itemset whose support is greater than or equal to a threshold value (the minimum support).

### Let’s consider some important terms.

**Itemset:**

A set of items is referred to as an itemset, and an itemset containing k items is called a k-itemset.

**Frequent Itemset:**

Suppose min_sup is the minimum support threshold. An itemset satisfies minimum support if its occurrence frequency is greater than or equal to min_sup. An itemset that satisfies minimum support is a frequent itemset.

### So, Let’s learn about the Association Rules:

For this dataset, we can write the following association rules. (The rules are only for illustration and understanding of the concept; they might not represent actual figures.)

**Rule 1:** If an apple is purchased, then beer is also purchased in 75% of those transactions.

**Rule 2:** If beer is purchased, then meat is also purchased in 33.33% of those transactions.

Generally, association rules are written in the “IF-THEN” format. We can also use the term “Antecedent” for IF and “Consequent” for THEN.

In order to understand the concept better, let’s take a simple dataset and find frequent itemsets and generate association rules on this.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

**Step 1:** Create a frequency table of all the items that occur in all the transactions. For our case:

![image.png](attachment:image.png)

**Step 2:** Only those items are significant whose support is greater than or equal to the support threshold. Here, the support threshold is 22.2%, so only items that occur in at least 2 transactions are significant; these are I1, I2, I3, I4 and I5. Therefore, we are left with:

![image.png](attachment:image.png)
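Steps 1 and 2 can be sketched in a few lines of Python. Because the transaction table is only available as an image, the nine transactions below are an assumed reconstruction (items labelled I1 to I5) that reproduces the support counts used in the confidence calculations later in this article; `min_count = 2` corresponds to the 22.2% threshold on nine transactions.

```python
from collections import Counter

# Assumed reconstruction of the example transactions (the original table is an image)
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
min_count = 2  # 22.2% of 9 transactions

# Step 1: frequency table of every single item
item_counts = Counter(item for t in transactions for item in t)

# Step 2: keep only the items that meet the support threshold
frequent_items = {item: n for item, n in item_counts.items() if n >= min_count}

print(sorted(frequent_items.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```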

**Step 3:**

The next step is to form all possible pairs of the significant items, keeping in mind that order does not matter, i.e., AB is the same as BA. To do this, take the first item and pair it with all the others: {I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}. Then take the second item and pair it with the items that follow it: {I2, I3}, {I2, I4}, {I2, I5}. We only consider the following items because {I2, I1} (the same as {I1, I2}) already exists. So, all the pairs in our example are {I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}.

**Step 4:** We will now count the occurrences of each pair in all the transactions.

![image.png](attachment:image.png)

**Step 5:**

Again, only those itemsets that meet the support threshold are significant, and those are {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4} and {I2, I5}.
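Steps 3 to 5 amount to enumerating unordered pairs and counting them. A minimal sketch follows, again using an assumed reconstruction of the nine example transactions, since the original table is an image; `itertools.combinations` produces each pair exactly once, so {I2, I1} never appears next to {I1, I2}.

```python
from itertools import combinations

# Assumed reconstruction of the example transactions (the original table is an image)
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent_items = ["I1", "I2", "I3", "I4", "I5"]
min_count = 2

# Steps 3 and 4: build every unordered pair and count the transactions containing it
pair_counts = {
    pair: sum(set(pair) <= t for t in transactions)
    for pair in combinations(frequent_items, 2)
}

# Step 5: keep the pairs that meet the support threshold
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_count}

for pair, n in sorted(frequent_pairs.items()):
    print(pair, n)
```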

**Step 6:**
    
Now suppose we would like to look for sets of three items that are purchased together. We will use the itemsets found in step 5 to create sets of 3 items.

To create sets of 3 items, another rule, called self-join, is required. From the item pairs {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4} and {I2, I5}, we look for two pairs that share an item and combine them, which gives the following candidate itemsets:

- {I1, I2} and {I1, I3} give {I1, I2, I3}
- {I1, I2} and {I1, I5} give {I1, I2, I5}
- {I1, I2} and {I2, I4} give {I1, I2, I4}
- {I1, I3} and {I1, I5} give {I1, I3, I5}
- {I2, I3} and {I2, I4} give {I2, I3, I4}
- {I2, I3} and {I2, I5} give {I2, I3, I5}
- {I2, I4} and {I2, I5} give {I2, I4, I5}

Next, we find the frequency of these itemsets. Only two of them turn out to be frequent: {I1, I2, I3} and {I1, I2, I5}.


![image.png](attachment:image.png)

Applying the self-join rule again gives the itemset {I1, I2, I3, I5}, but it is not a frequent itemset. We can see this either by counting it directly or by checking whether all of its subsets are frequent: one of its subsets, {I1, I3, I5}, is not frequent. We stop here because no further frequent itemsets can be found.
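The join and the subset check from step 6 can be sketched as follows. The join here combines any two frequent pairs that share an item, as in the walkthrough above; the transactions are an assumed reconstruction of the example table, and the variable names are illustrative.

```python
from itertools import combinations

# Assumed reconstruction of the example transactions (the original table is an image)
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent_pairs = {
    frozenset(p)
    for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
              ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
}
min_count = 2

# Join: two pairs that share an item combine into a 3-item candidate
candidates = {a | b for a, b in combinations(frequent_pairs, 2) if len(a | b) == 3}

# Prune: a candidate can only be frequent if all of its 2-item subsets are frequent
pruned = {
    c for c in candidates
    if all(frozenset(s) in frequent_pairs for s in combinations(c, 2))
}

# Count the surviving candidates and apply the support threshold
triple_counts = {c: sum(c <= t for t in transactions) for c in pruned}
frequent_triples = {c: n for c, n in triple_counts.items() if n >= min_count}

print(len(candidates), "candidates,", len(pruned), "after pruning")
for itemset, n in frequent_triples.items():
    print(sorted(itemset), n)
```

Pruning before counting is what keeps the number of database scans small: five of the seven candidates are discarded without looking at the transactions at all.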

### Applying Rules:  

We will now create rules from the frequent 3-itemsets (F3). Assume a minimum confidence value of 60%.

For every non-empty proper subset S of a frequent itemset I, output the rule

S –> (I-S) (meaning S recommends I-S)
if support(I) / support(S) >= min_conf

For example, take I = {1,3,5} with illustrative support counts support(1,3,5) = 2 and support(1,3) = 3:

**Rule 1:** {1,3} –> ({1,3,5} – {1,3}) means 1 & 3 –> 5

Confidence = support(1,3,5)/support(1,3) = 2/3 = 66.67% > 60%

Hence Rule 1 is selected.
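The enumeration of subsets S can be written down directly. This sketch only lists the candidate rules for the itemset {1,3,5}; `candidate_rules` is a hypothetical helper name, and the confidence check is left out because only the counts for Rule 1 are given above.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield (antecedent, consequent) for every non-empty proper subset S of itemset."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

rules = list(candidate_rules({1, 3, 5}))
for antecedent, consequent in rules:
    print(antecedent, "->", consequent)
```

A 3-itemset therefore produces six candidate rules, of which Rule 1 is the one with antecedent (1, 3).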

## General Process of the Apriori algorithm:

The entire algorithm can be divided into two steps:

**Step 1:** Apply minimum support to find all the frequent sets with k items in a database.

**Step 2:** Use the self-join rule to find the frequent sets with k+1 items with the help of frequent k-itemsets.

Repeat this process from k=1 to the point when we are unable to apply the self-join rule.
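The two steps and the loop over k can be put together in a compact function. This is a minimal sketch rather than an optimized implementation: the function name is hypothetical, the join combines any two frequent k-itemsets whose union has k+1 items, and the transactions are an assumed reconstruction of the example table.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # k = 1: the candidates are the single items
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Step 1: count the candidates and keep those meeting minimum support
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Step 2: join frequent k-itemsets into (k+1)-item candidates and
        # prune any candidate that has an infrequent k-item subset
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent

# Assumed reconstruction of the example transactions (the original table is an image)
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent = apriori_frequent_itemsets(transactions, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop ends when the join produces no candidates that survive pruning, which for this data happens after the 3-itemsets.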

As an example, a confidence of 60% means that 60% of the customers who purchased milk and bread also bought butter. We now show rule generation for the frequent itemsets found above. Take the itemset {I1, I2, I3}; the candidate rules are:

[I1^I2]=>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100=50% //Rejected

[I1^I3]=>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100=50% //Rejected

[I2^I3]=>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100=50% //Rejected

[I1]=>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100=33% //Rejected

[I2]=>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100=28.6% //Rejected

[I3]=>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100=33% //Rejected

Next, take the itemset {I1, I2, I5}; the candidate rules are:

[I1^I2]=>[I5] //confidence = sup(I1^I2^I5)/sup(I1^I2) = 2/4*100=50% //Rejected

[I1^I5]=>[I2] //confidence = sup(I1^I2^I5)/sup(I1^I5) = 2/2*100=100% //Selected

[I2^I5]=>[I1] //confidence = sup(I1^I2^I5)/sup(I2^I5) = 2/2*100=100% //Selected

[I1]=>[I2^I5] //confidence = sup(I1^I2^I5)/sup(I1) = 2/6*100=33% //Rejected

[I2]=>[I1^I5] //confidence = sup(I1^I2^I5)/sup(I2) = 2/7*100=28.6% //Rejected

[I5]=>[I1^I2] //confidence = sup(I1^I2^I5)/sup(I5) = 2/2*100=100% //Selected

The minimum confidence threshold is 60%, so we have found three strong association rules.
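The calculations above can be reproduced with a short function. The support counts in the dictionary are the ones quoted in the confidence calculations; `strong_rules` is a hypothetical helper name used only for this sketch.

```python
from itertools import combinations

# Support counts as quoted in the confidence calculations above
support_count = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7,
    frozenset({"I3"}): 6, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I3"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

def strong_rules(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            conf = support_count[itemset] / support_count[antecedent]
            if conf >= min_conf:
                rules.append((sorted(antecedent), sorted(itemset - antecedent), conf))
    return rules

rules_123 = strong_rules({"I1", "I2", "I3"}, support_count, min_conf=0.6)
rules_125 = strong_rules({"I1", "I2", "I5"}, support_count, min_conf=0.6)
print(rules_123)  # []
for antecedent, consequent, conf in rules_125:
    print(antecedent, "=>", consequent, f"{conf:.0%}")
```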

In [1]:
# for basic operations
import numpy as np
import pandas as pd

# for visualizations
import matplotlib.pyplot as plt
import squarify
import seaborn as sns

# for market basket analysis
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [2]:
df=pd.read_csv('Market_Basket_Optimisation (1).csv',header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [3]:
# Build one list of items per transaction, skipping empty (NaN) cells
transactions = []
for row in df.values:
    transactions.append([str(item) for item in row if pd.notna(item)])
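The transaction lists can also be built without pandas. The sketch below uses only the standard library and drops empty cells while reading; the three sample rows are taken from the `df.head()` output above and shortened to six columns, which is an assumption made to keep the example small.

```python
import csv
import io

# Sample rows in the same shape as the CSV shown above (trailing cells are empty)
sample_csv = io.StringIO(
    "burgers,meatballs,eggs,,,\n"
    "chutney,,,,,\n"
    "turkey,avocado,,,,\n"
)

# Keep only the non-empty cells of each row
transactions = [[item for item in row if item] for row in csv.reader(sample_csv)]
print(transactions)
# [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```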

In [4]:
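# Alias the apyori import so it does not shadow mlxtend's apriori imported above
from apyori import apriori as apyori_apriori
rules = apyori_apriori(transactions=transactions, min_support=0.003, min_confidence=0.6, min_lift=3, min_length=2, max_length=2)
# Use a separate name so mlxtend's association_rules function stays available
apyori_rules = list(rules)

In [6]:
results = len(apyori_rules)
results

0

In [7]:
# No rule passes these thresholds, so the list is empty and indexing it
# would raise an IndexError; guard the access instead
print(apyori_rules[0] if apyori_rules else "No rules found")

No rules found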

In [None]:
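# Relax min_confidence to 0.2, since the 0.6 threshold above returned no rules
from apyori import apriori as apyori_apriori  # alias avoids shadowing mlxtend's apriori
rules = apyori_apriori(transactions=transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)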

In [None]:
# pad every customer's basket to the same length so it fits a 2-D array;
# empty cells become the string 'nan'
trans = []
for i in range(len(df)):
    trans.append([str(df.values[i, j]) for j in range(df.shape[1])])

# converting it into a numpy array
trans = np.array(trans)

# checking the shape of the array
print(trans.shape)

In [None]:
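import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# one-hot encode the transactions: one boolean column per item
trans_en = TransactionEncoder()
data = trans_en.fit_transform(trans)
df1 = pd.DataFrame(data, columns = trans_en.columns_)

# drop the 'nan' placeholder column created by the empty cells, if present
df1 = df1.drop(columns=['nan'], errors='ignore')

# getting the shape of the data
df1.shape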

In [None]:
# squarify is imported in the first cell; run this install first if that import fails
!pip install squarify