# What is Apriori:

Apriori is a popular algorithm used in data mining and machine learning to discover frequent itemsets in a transactional dataset. Frequent itemsets are sets of items that appear together in many transactions or event-log entries. The algorithm underpins association rule mining, whose goal is to find interesting associations or correlations among the items in a dataset. It relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so any itemset with an infrequent subset can be discarded without counting it.

# How the Apriori algorithm works, step by step:

# Data Preparation:

The first step is to prepare the dataset. Apriori expects a transactional database in which each row represents one transaction and each column represents an item. Irrelevant or noisy records should be removed during preprocessing.

For example, let's consider a transactional dataset of items purchased by customers in a retail store. The dataset contains the following transactions:

In [None]:
{milk, bread, butter}
{milk, bread}
{milk, butter}
{bread, butter}
{milk, bread, butter, eggs}
{milk, eggs}
{eggs}
{eggs, butter}
{milk, eggs, butter}
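As a rough sketch of the row-per-transaction, column-per-item layout described above, the nine baskets can be written as Python sets and expanded into a boolean table. The names `transactions`, `items`, and `one_hot` are illustrative choices, not part of any library.

```python
# The nine example transactions, one set per basket.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "eggs"},
    {"milk", "eggs"},
    {"eggs"},
    {"eggs", "butter"},
    {"milk", "eggs", "butter"},
]

# Column order for the table: every distinct item, sorted alphabetically.
items = sorted(set().union(*transactions))

# One row per transaction, one True/False cell per item.
one_hot = [[item in basket for item in items] for basket in transactions]

print(items)
for row in one_hot:
    print(row)
```

Each row of `one_hot` answers "does this transaction contain item X?" for every column, which is the form many association-rule tools expect.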

# Support Threshold:

The second step is to set a minimum support threshold: the minimum number of transactions in which an itemset must appear to count as frequent. Itemsets that fall below this threshold are filtered out as infrequent.

For example, let's set the minimum support threshold to 2. This means that an itemset must appear in at least 2 transactions to be considered frequent.
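To make the threshold concrete, here is a minimal sketch that counts the support of a few itemsets in the nine example transactions and compares each count with the threshold of 2. The helper `support_count` and the constant `MIN_SUPPORT` are hypothetical names introduced for illustration.

```python
transactions = [
    {"milk", "bread", "butter"}, {"milk", "bread"}, {"milk", "butter"},
    {"bread", "butter"}, {"milk", "bread", "butter", "eggs"}, {"milk", "eggs"},
    {"eggs"}, {"eggs", "butter"}, {"milk", "eggs", "butter"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)

MIN_SUPPORT = 2  # absolute count, as in the example above

for candidate in [{"milk"}, {"milk", "bread"}, {"bread", "eggs"}]:
    count = support_count(candidate, transactions)
    status = "frequent" if count >= MIN_SUPPORT else "infrequent"
    print(sorted(candidate), count, status)
```

With this data, `{bread, eggs}` occurs in only one transaction, so it falls below the threshold, while `{milk}` and `{milk, bread}` pass.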

# Finding frequent itemsets:

The Apriori algorithm uses a bottom-up, level-wise approach to find frequent itemsets. It first counts the support of every individual item and keeps the items that meet the minimum support threshold. It then combines those frequent items into candidate itemsets of length two, scans the dataset to count the support of each candidate, and keeps the candidates that meet the threshold. The process repeats with candidates of length three and so on, building each level only from the frequent itemsets of the previous level, until no new frequent itemsets are found.

For example, using the dataset and minimum support threshold from steps 1 and 2, the Apriori algorithm generates the following frequent itemsets:

In [None]:
{milk} (support: 6)
{bread} (support: 4)
{butter} (support: 6)
{eggs} (support: 5)
{milk, bread} (support: 3)
{milk, butter} (support: 4)
{bread, butter} (support: 3)
{milk, eggs} (support: 3)
{butter, eggs} (support: 3)
{milk, bread, butter} (support: 2)
{milk, butter, eggs} (support: 2)
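The level-wise search can be sketched in a few lines of plain Python. This is a simplified illustration rather than an optimised implementation: it recomputes every support count directly from the nine example transactions, and the function name `apriori_frequent_itemsets` is a hypothetical choice.

```python
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"}, {"milk", "bread"}, {"milk", "butter"},
    {"bread", "butter"}, {"milk", "bread", "butter", "eggs"}, {"milk", "eggs"},
    {"eggs"}, {"eggs", "butter"}, {"milk", "eggs", "butter"},
]

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    def count(itemset):
        return sum(1 for basket in transactions if itemset <= basket)

    # Level 1: frequent single items.
    items = set().union(*transactions)
    current = {frozenset([i]) for i in items if count(frozenset([i])) >= min_support}
    frequent = {s: count(s) for s in current}
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the threshold.
        current = {c for c in candidates if count(c) >= min_support}
        frequent.update({c: count(c) for c in current})
        k += 1
    return frequent

frequent = apriori_frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is where the Apriori property pays off: a three-item candidate containing `{bread, eggs}` is dropped before any counting, because that pair is already known to be infrequent.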

# Generating association rules:

Once the frequent itemsets are identified, the next step is to generate association rules. An association rule is a relationship between two itemsets, where one itemset implies the presence of another itemset. The strength of the association is measured by the support and confidence.

The support of a rule is the number (or, equivalently, the percentage) of transactions that contain both the antecedent and the consequent itemsets; the example below uses counts. The confidence of a rule is the proportion of transactions containing the antecedent that also contain the consequent.
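As a small worked example of these two measures, the sketch below computes them for the rule `{milk} -> {bread}` on the nine example transactions. The helper `rule_metrics` is a hypothetical name, and exact fractions are used so the arithmetic stays visible.

```python
from fractions import Fraction

transactions = [
    {"milk", "bread", "butter"}, {"milk", "bread"}, {"milk", "butter"},
    {"bread", "butter"}, {"milk", "bread", "butter", "eggs"}, {"milk", "eggs"},
    {"eggs"}, {"eggs", "butter"}, {"milk", "eggs", "butter"},
]

def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)

def rule_metrics(antecedent, consequent):
    """Return (support count, relative support, confidence) for antecedent -> consequent."""
    both = support_count(antecedent | consequent)
    support = Fraction(both, len(transactions))
    confidence = Fraction(both, support_count(antecedent))
    return both, support, confidence

count, support, confidence = rule_metrics({"milk"}, {"bread"})
print(count, support, confidence)
```

Here milk and bread appear together in 3 of 9 transactions, and milk appears in 6, so the confidence is 3/6.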

For example, using the frequent itemsets generated in step 3, we can generate the following association rules:

In [None]:
{milk} -> {bread} (support: 3, confidence: 0.50)
{bread} -> {milk} (support: 3, confidence: 0.75)
{milk} -> {butter} (support: 4, confidence: 0.67)
{butter} -> {milk} (support: 4, confidence: 0.67)

In summary, the steps involved in the Apriori algorithm are:

1. Determine the minimum support:
The minimum support is a threshold that specifies how often an itemset must appear in the dataset to be considered frequent. It can be given as an absolute number of transactions or, more commonly, as a percentage of the total number of transactions.

2. Generate candidate itemsets:
Candidate itemsets are sets of items that could potentially be frequent. The algorithm starts with all individual items in the dataset and then combines the frequent itemsets of one size into larger candidates of the next size. A candidate can be discarded immediately if any of its subsets is already known to be infrequent.

3. Calculate the support of each candidate itemset:
For each candidate itemset, the algorithm calculates its support by counting the number of transactions in which the itemset appears. If the support is greater than or equal to the minimum support threshold, the itemset is considered frequent.

4. Generate association rules:
Once the frequent itemsets have been identified, the algorithm uses them to generate association rules. An association rule is a relationship between two sets of items, known as the antecedent and the consequent. The rule is written as antecedent -> consequent and indicates that if the antecedent items are present in a transaction, the consequent items are also likely to be present.

5. Calculate the confidence of each association rule:
The confidence of an association rule is the proportion of transactions containing the antecedent items that also contain the consequent items. It is calculated by dividing the support of the entire rule (i.e., the support of the antecedent and the consequent together) by the support of the antecedent alone.

6. Prune weak association rules:
Finally, the algorithm prunes any association rules that do not meet a minimum confidence threshold. This ensures that only the strongest rules are retained.
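Steps 4 to 6 can be sketched as follows, reusing the nine toy transactions from the walkthrough. The function name `generate_rules`, the two input itemsets, and the 0.7 confidence cut-off are assumptions chosen for illustration only.

```python
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"}, {"milk", "bread"}, {"milk", "butter"},
    {"bread", "butter"}, {"milk", "bread", "butter", "eggs"}, {"milk", "eggs"},
    {"eggs"}, {"eggs", "butter"}, {"milk", "eggs", "butter"},
]

def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)

def generate_rules(frequent_itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset in frequent_itemsets:
        # Every non-empty proper subset of the itemset is a possible antecedent.
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                confidence = support_count(itemset) / support_count(antecedent)
                # Step 6: prune rules below the confidence threshold.
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules

# Hypothetical input: two frequent itemsets and an assumed 0.7 cut-off.
frequent_itemsets = [frozenset({"milk", "bread"}), frozenset({"milk", "butter", "eggs"})]
rules = generate_rules(frequent_itemsets, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

With this cut-off only `{bread} -> {milk}` survives: bread appears in 4 transactions and 3 of them also contain milk.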

In [3]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr1.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [4]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr2.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [5]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr3.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [6]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr4.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [7]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr5.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [8]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr6.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [9]:
from IPython.display import Image
Image(filename = "C:/Users/thiru/OneDrive/Documents/Apr7.jpeg",width = 600,height = 100)

<IPython.core.display.Image object>

In [2]:
pip install apyori



In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Data Preprocessing:

In [6]:
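# The file has no header row; every row is one transaction.
data = pd.read_csv("C:/Users/thiru/Downloads/Resume/Machine Learning A-Z (Codes and Datasets)/Part 5 - Association Rule Learning/Section 28 - Apriori/Python/Market_Basket_Optimisation.csv", header=None)
# Convert each row to a list of strings, the list-of-lists format passed to apriori below.
# Empty cells are read as NaN, so str() turns them into the literal item 'nan'.
n_rows, n_cols = data.shape  # (7501, 20) for this file
transactions = [[str(data.values[i, j]) for j in range(n_cols)] for i in range(n_rows)]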
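The loop above stringifies every cell, so empty cells end up as the item `'nan'` in each transaction. One possible variant, sketched here with the standard-library `csv` module instead of pandas, keeps only the non-empty cells. The in-memory `sample_csv` is a stand-in for the real file and uses a few rows in the same header-less layout.

```python
import csv
import io

# Stand-in for the CSV file: a few ragged rows with no header.
sample_csv = io.StringIO(
    "burgers,meatballs,eggs,,\n"
    "chutney,,,,\n"
    "turkey,avocado,,,\n"
)

# Keep only non-empty cells so no placeholder item enters a transaction.
transactions = [[item for item in row if item] for row in csv.reader(sample_csv)]
print(transactions)
```

Dropping the placeholders shortens each transaction to the items actually bought; whether that matters depends on how the downstream thresholds treat the `'nan'` item.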

In [7]:
data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


In [8]:
transactions

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers',
  'meatballs',
  'eggs',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['chutney',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['turkey',
  'avocado',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan',
  'nan'],
 ['mineral water',
  'milk',
  'energy bar',
  'whole wheat rice',
  'green tea',
  'nan',
  'nan',
  'nan',
 

# Training the Apriori model on the dataset:

In [9]:
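from apyori import apriori
# Thresholds are relative: min_support=0.003 means at least 0.3% of transactions; max_length=2 limits itemsets to pairs.
rules = apriori(transactions=transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)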
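The call above also filters on lift, which the earlier walkthrough does not define. Lift is commonly defined as the confidence of a rule divided by the relative support of its consequent; values above 1 suggest the items occur together more often than they would if independent. The sketch below illustrates this on the nine toy transactions (not the market-basket file), and the helper names `support` and `lift` are hypothetical.

```python
from fractions import Fraction

transactions = [
    {"milk", "bread", "butter"}, {"milk", "bread"}, {"milk", "butter"},
    {"bread", "butter"}, {"milk", "bread", "butter", "eggs"}, {"milk", "eggs"},
    {"eggs"}, {"eggs", "butter"}, {"milk", "eggs", "butter"},
]
n = len(transactions)

def support(itemset):
    """Relative support: fraction of transactions containing `itemset`."""
    return Fraction(sum(1 for basket in transactions if itemset <= basket), n)

def lift(antecedent, consequent):
    """confidence(antecedent -> consequent) divided by support(consequent)."""
    confidence = support(antecedent | consequent) / support(antecedent)
    return confidence / support(consequent)

print(float(lift({"bread"}, {"milk"})))  # above 1: together more often than chance
print(float(lift({"bread"}, {"eggs"})))  # below 1: together less often than chance
```

A `min_lift` of 3, as used in the notebook, therefore keeps only rules whose items co-occur at least three times as often as independence would predict.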

# Visualising the result:

# Displaying the raw results returned by the apriori function:

In [10]:
results = list(rules)
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

# Organising the results into a pandas DataFrame:

In [14]:
def inspect(results):
    # Each result is a RelationRecord: (items, support, ordered_statistics).
    # Only the first OrderedStatistic of each record is read:
    # (items_base, items_add, confidence, lift).
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

resultsinDataFrame = pd.DataFrame(inspect(results), columns=['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
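The positional indexing above can be hard to read. A possible alternative, sketched below, accesses the same values by field name and emits one row per ordered statistic rather than only the first. The two namedtuples are stand-ins that mirror the record shapes printed in the raw output (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`), so the example runs without apyori; `flatten` and `sample_results` are hypothetical names.

```python
from collections import namedtuple

# Stand-ins mirroring the record shapes shown in the raw output above.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

# First record from the raw output, copied verbatim.
sample_results = [
    RelationRecord(
        items=frozenset({"light cream", "chicken"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

def flatten(results):
    """One row per ordered statistic, using field names instead of positions."""
    return [
        (", ".join(sorted(stat.items_base)), ", ".join(sorted(stat.items_add)),
         record.support, stat.confidence, stat.lift)
        for record in results
        for stat in record.ordered_statistics
    ]

rows = flatten(sample_results)
print(rows)
```

The resulting rows have the same five columns as the DataFrame built in the notebook, so they could be passed to `pd.DataFrame` with the same column names.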

# Displaying the unsorted results:

In [15]:
resultsinDataFrame

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
0,light cream,chicken,0.004533,0.290598,4.843951
1,mushroom cream sauce,escalope,0.005733,0.300699,3.790833
2,pasta,escalope,0.005866,0.372881,4.700812
3,fromage blanc,honey,0.003333,0.245098,5.164271
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
5,tomato sauce,ground beef,0.005333,0.377358,3.840659
6,light cream,olive oil,0.0032,0.205128,3.11471
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
8,pasta,shrimp,0.005066,0.322034,4.506672


# Displaying the results sorted by descending lift:

In [16]:
resultsinDataFrame.nlargest(n=10, columns='Lift')

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
3,fromage blanc,honey,0.003333,0.245098,5.164271
0,light cream,chicken,0.004533,0.290598,4.843951
2,pasta,escalope,0.005866,0.372881,4.700812
8,pasta,shrimp,0.005066,0.322034,4.506672
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
5,tomato sauce,ground beef,0.005333,0.377358,3.840659
1,mushroom cream sauce,escalope,0.005733,0.300699,3.790833
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
6,light cream,olive oil,0.0032,0.205128,3.11471
