The Apriori algorithm uses frequent itemsets to generate association rules and is designed to work on databases that contain transactions. Using these association rules, it determines how strongly or how weakly two items are connected. The algorithm uses breadth-first search and a hash tree to count candidate itemsets efficiently. It is an iterative process that finds the frequent itemsets in a large dataset one itemset length at a time.


**What is a Frequent Itemset?**

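An itemset is frequent if its support, the fraction of transactions that contain it, is greater than or equal to a chosen minimum support. For example, consider the two transactions A = {1,2,3,4,5} and B = {2,3,7}. Items 2 and 3 appear in both transactions, so if we require an itemset to appear in both (a minimum support of 100%), the frequent itemsets are {2}, {3}, and {2,3}.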
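A quick way to check this is to compute the support of every small itemset directly. The sketch below does so for the two example transactions by brute force, which Apriori itself avoids; the `support` helper is a hypothetical name used only for illustration.

```python
from itertools import combinations

# The two example transactions from the text.
transactions = [{1, 2, 3, 4, 5}, {2, 3, 7}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)


# Brute-force check of every 1- and 2-item itemset (fine for a toy example only).
items = sorted(set().union(*transactions))
min_support = 1.0  # "frequent" here means: present in both transactions
frequent = [
    c
    for k in (1, 2)
    for c in combinations(items, k)
    if support(c, transactions) >= min_support
]
print(frequent)  # [(2,), (3,), (2, 3)]
```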

<h4>Steps for Apriori Algorithm:</h4>

**Step-1:** Determine the support of the itemsets in the transactional database, and select the minimum support and minimum confidence (confidence measures how often a rule has been found to be true: for a rule X → Y, it is the fraction of transactions containing X that also contain Y).

**Step-2:** Keep all itemsets whose support is greater than or equal to the selected minimum support.

**Step-3:** From these frequent itemsets, generate all rules whose confidence is greater than or equal to the minimum confidence threshold.

**Step-4:** Sort the rules in decreasing order of lift (lift measures how much more often X and Y occur together than they would if they were independent).
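The four steps can be sketched in plain Python as below. This is a minimal, unoptimized illustration that assumes transactions are Python sets and thresholds are fractions; it counts candidates with a simple scan of the transactions rather than the hash tree mentioned above. The function names `apriori_itemsets` and `generate_rules` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset, found level by level."""
    n = len(transactions)
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Steps 1-2: one scan per level to count candidates, keep those meeting min_support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Steps 3-4: rules X -> Y meeting min_confidence, sorted by decreasing lift."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                # Every subset of a frequent itemset is frequent, so both lookups succeed.
                confidence = sup / frequent[lhs]
                lift = confidence / frequent[rhs]
                if confidence >= min_confidence:
                    rules.append((lhs, rhs, sup, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
frequent = apriori_itemsets(baskets, min_support=0.4)
for lhs, rhs, sup, conf, lift in generate_rules(frequent, min_confidence=0.6):
    print(f"{sorted(lhs)} -> {sorted(rhs)}: "
          f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

With these baskets the candidate {bread, butter, milk} is pruned at level 3 because its subset {butter, milk} is not frequent, so the loop stops after two levels.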

<h4>Advantages of Apriori Algorithm</h4>

1. It is an easy algorithm to understand.
2. Its join and prune steps are easy to implement, even on large datasets.
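To show how little code these two steps need, here is a sketch of the classic candidate-generation step (often called apriori-gen), assuming each frequent itemset is stored as a sorted tuple. The function name and the example itemsets are hypothetical, and this prefix-based join is one common formulation rather than the only one.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets given as sorted tuples."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join: merge two itemsets that agree on everything except their last item.
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune: every (k-1)-subset of the candidate must itself be frequent.
                if all(sub in prev_set for sub in combinations(cand, len(cand) - 1)):
                    candidates.append(cand)
    return candidates


# Made-up frequent 2-itemsets over items A..D.
L2 = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(apriori_gen(L2))  # [('A', 'B', 'C')]
```

Here ('A', 'B', 'D') and ('A', 'C', 'D') are produced by the join but removed by the prune, because ('B', 'D') and ('C', 'D') are not frequent.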

<h4> Disadvantages of Apriori Algorithm</h4>

1. The Apriori algorithm is slow compared with many other frequent-itemset mining algorithms.
2. Its overall performance suffers because it scans the database multiple times, once for each itemset length.
3. Its time and space complexity is O(2^D), which is very high. Here D is the horizontal width of the database, i.e., the number of distinct items.
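The exponential bound comes from the size of the search space: D distinct items can form 2^D - 1 non-empty itemsets. The brute-force check below (illustrative only; `count_itemsets` is a hypothetical helper) enumerates them for a few small values of D and compares the count with the formula.

```python
from itertools import combinations


def count_itemsets(d):
    """Count all non-empty itemsets that can be formed from d distinct items."""
    items = range(d)
    return sum(1 for k in range(1, d + 1) for _ in combinations(items, k))


for d in (3, 5, 10):
    print(d, count_itemsets(d), 2 ** d - 1)
```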

<h3> Python Implementation of Apriori Algorithm </h3>

Now we will look at a practical implementation of the Apriori algorithm. Consider a retailer who wants to find associations between his shop's products so that he can offer "Buy this and get that" deals to his customers.

The retailer has a dataset containing a list of transactions made by his customers, where each row lists the products bought in one transaction. To solve this problem, we will perform the following steps:

1. Data Pre-processing
2. Training the Apriori model on the dataset
3. Visualizing the results

<h4> Step 1. Data Pre-processing </h4>

```shell
pip install apyori
```


```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

```python
# Import the dataset. The file has no header row, so every row is read as a transaction.
dataset = pd.read_csv('store_data.csv', header=None)
```


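```python
transactions = []
for i in range(0, 7501):
    # Keep the (up to 20) items in row i, skipping empty cells that pandas reads as NaN.
    transactions.append([str(dataset.values[i, j]) for j in range(0, 20)
                         if pd.notna(dataset.values[i, j])])
```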

The `apriori()` function that we will use to train the model expects the dataset as a list of transactions rather than a DataFrame. So we first create an empty list, `transactions`, and then append each row of the dataset to it as a list of item names. The loop covers rows 0 to 7500; we pass 7501 to `range()` because its upper bound is excluded in Python.
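The same row-to-list conversion can be tried on a tiny in-memory table. This sketch assumes, as with the retailer's file, that rows have different lengths and that the missing cells are empty (`None` here, NaN when read from a CSV). The three baskets are made up, and the names `toy` and `toy_transactions` are chosen so they do not overwrite `dataset` or `transactions` if run in the same notebook.

```python
import pandas as pd

# A made-up, header-less basket table: one transaction per row, padded with empty cells.
toy = pd.DataFrame([
    ["bread", "milk", "eggs"],
    ["butter", "jam", None],
    ["coffee", None, None],
])

# Keep only the non-empty cells of each row, as strings.
toy_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy.values
]
print(toy_transactions)
# [['bread', 'milk', 'eggs'], ['butter', 'jam'], ['coffee']]
```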

<h4> Step 2. Training the Apriori Model on the dataset</h4>

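```python
from apyori import apriori
# apriori() returns a generator of RelationRecord objects (one per itemset that passes the filters).
rules = apriori(transactions=transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)
```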


- `transactions`: the list of transactions.
- `min_support`: the minimum support, as a float. Here we use 0.003, so an itemset must appear in at least 0.3% of the 7,501 transactions (about 23 baskets). This roughly corresponds to a product bought 3 times a day over a week: 3 × 7 / 7501 ≈ 0.0028, rounded to 0.003.
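- `min_confidence`: the minimum confidence. Here we use 0.2; it can be changed to suit the business problem.
- `min_lift`: the minimum lift. Here we use 3, so only rules whose items occur together at least three times as often as expected under independence are kept.
- `min_length`: intended as the minimum number of products in a rule. Note that apyori's `apriori()` does not read this argument, so it has no effect.
- `max_length`: the maximum number of products in a rule. With `max_length=2`, only rules between two products (one on each side) are generated.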
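These thresholds refer to the usual rule metrics. For a rule X → Y, support is the fraction of transactions containing both X and Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y). The sketch below computes them for one rule on made-up baskets and applies the same thresholds as the `apriori()` call above; `rule_metrics` is a hypothetical helper, not part of apyori.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) of the rule lhs -> rhs."""
    n = len(transactions)
    both = sum(1 for t in transactions if (lhs | rhs) <= t) / n
    lhs_support = sum(1 for t in transactions if lhs <= t) / n
    rhs_support = sum(1 for t in transactions if rhs <= t) / n
    confidence = both / lhs_support
    lift = confidence / rhs_support
    return both, confidence, lift


# Ten made-up baskets.
baskets = [
    {"pasta", "sauce"}, {"pasta", "sauce", "wine"}, {"pasta", "sauce"},
    {"pasta", "bread"}, {"bread", "wine"}, {"bread"}, {"wine"},
    {"bread", "milk"}, {"milk"}, {"milk", "wine"},
]
support, confidence, lift = rule_metrics(baskets, {"pasta"}, {"sauce"})
# Same thresholds as the apriori() call: min_support=0.003, min_confidence=0.2, min_lift=3.
keep = support >= 0.003 and confidence >= 0.2 and lift >= 3
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}, kept={keep}")
# support=0.30, confidence=0.75, lift=2.50, kept=False
```

Here pasta → sauce has a high confidence but a lift of only 2.5, so `min_lift=3` would discard it.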

<h4> Step 3. Visualizing the result </h4>

```python
# apriori() returns a generator, so materialize it into a list before inspecting it.
results = list(rules)
results
```
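The raw `results` list is hard to read. One option, sketched below, is to flatten the records into a pandas DataFrame and sort it by decreasing lift, matching Step-4 of the algorithm. It assumes each record has apyori's `RelationRecord` layout (`items`, `support`, `ordered_statistics`, where each statistic has `items_base`, `items_add`, `confidence`, `lift`). To keep the example self-contained, those two record types are re-declared locally with the same field names, and the sample records use made-up items and numbers; `rules_to_frame` is a hypothetical helper.

```python
from collections import namedtuple

import pandas as pd

# Local stand-ins with the same field names as apyori's record types (assumption).
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])


def rules_to_frame(records):
    """One row per (record, ordered statistic), sorted by decreasing lift."""
    rows = [
        {
            "antecedent": ", ".join(sorted(stat.items_base)),
            "consequent": ", ".join(sorted(stat.items_add)),
            "support": rec.support,
            "confidence": stat.confidence,
            "lift": stat.lift,
        }
        for rec in records
        for stat in rec.ordered_statistics
    ]
    return pd.DataFrame(rows).sort_values("lift", ascending=False, ignore_index=True)


# Made-up records; in the notebook you would pass `results` instead.
sample = [
    RelationRecord(frozenset({"x", "y"}), 0.01,
                   [OrderedStatistic(frozenset({"x"}), frozenset({"y"}), 0.25, 3.1)]),
    RelationRecord(frozenset({"p", "q"}), 0.02,
                   [OrderedStatistic(frozenset({"p"}), frozenset({"q"}), 0.40, 5.0)]),
]
print(rules_to_frame(sample))
```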

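```python
for item in results:
    # ordered_statistics[0] is the first rule that passed the confidence and lift filters;
    # use its base/add sets so the rule direction matches the printed confidence and lift.
    stat = item.ordered_statistics[0]
    print("Rule: " + ", ".join(stat.items_base) + " -> " + ", ".join(stat.items_add))
    print("Support: " + str(item.support))
    print("Confidence: " + str(stat.confidence))
    print("Lift: " + str(stat.lift))
    print("=====================================")
```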

From the above output, we can analyze each rule. The first rule, light cream → chicken, states that customers who buy light cream often buy chicken as well. The support of this rule is 0.0045, meaning that light cream and chicken appear together in about 0.45% of all transactions, and its confidence is 29%, meaning that a customer who buys light cream has a 29% chance of also buying chicken. The other rules can be read in the same way.
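The quoted figures can be turned back into transaction counts with a little arithmetic. This assumes the 7,501 transactions loaded earlier and uses the rounded values quoted above, so the counts are approximate.

```python
n_transactions = 7501   # rows loaded from store_data.csv
support = 0.0045        # light cream -> chicken, as quoted (rounded)
confidence = 0.29       # as quoted (rounded)

# Support is a fraction of all transactions, so scale it back to a count.
both = round(support * n_transactions)    # baskets with light cream AND chicken
# Confidence = count(both) / count(light cream), so invert it.
light_cream = round(both / confidence)    # baskets with light cream
print(both, light_cream)  # 34 117
```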