# **Introduction** 

In Machine Learning, the Apriori algorithm is used to mine association rules from data. In this article, I will take you through Market Basket Analysis with the Apriori algorithm in Python.

**What is Association Mining?**

Association mining is typically performed on transaction data from a retail marketplace or an online e-commerce store. Since most transaction data is large, the Apriori algorithm helps find purchasing patterns, or rules, in it efficiently.

Association rules are used to analyze retail or transactional data. Based on the concept of strong rules, they are intended to identify the strongest relationships in transactional data using measures of interestingness.


**How does the Apriori Algorithm Work?**

The Apriori algorithm is one of the most popular algorithms for mining association rules. It finds the most frequent combinations of items (frequent itemsets) in a database and identifies rules of association between them, based on three important measures:

- Support: the probability that X and Y occur together, i.e. the fraction of transactions that contain both X and Y.
- Confidence: the conditional probability of Y given X, i.e. support(X and Y) / support(X). In other words, how often Y occurs in transactions that already contain X.
- Lift: the ratio of the rule's confidence to the support of Y, i.e. confidence(X -> Y) / support(Y). A lift of 2 means that the probability of buying Y when X is bought is twice as high as the probability of simply buying Y.
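To make the three measures concrete, here is a small sketch that computes them for a rule X -> Y over a handful of made-up baskets. The baskets and the helper names `support`, `confidence` and `lift` are hypothetical and only illustrate the definitions above; they are not taken from the dataset or from any library.

```python
# Toy baskets, invented for illustration only
baskets = [
    {"whole milk", "bread", "butter"},
    {"whole milk", "bread"},
    {"whole milk", "yogurt"},
    {"bread", "butter"},
    {"whole milk", "bread", "yogurt"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """P(Y | X) = support(X and Y together) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y); a value of 1 means X and Y are independent."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"bread"}, {"butter"}
print(support(x | y, baskets))     # 2 of 5 baskets -> 0.4
print(confidence(x, y, baskets))   # 0.4 / 0.8 -> 0.5
print(lift(x, y, baskets))         # 0.5 / 0.4 -> 1.25
```

Here butter appears in 40% of all baskets but in 50% of the baskets that contain bread, which is what a lift of 1.25 expresses.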


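Apriori uses a “bottom-up” approach, in which frequent itemsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. Because every subset of a frequent itemset must itself be frequent, any candidate that contains an infrequent subset can be discarded without counting it. The algorithm ends when no further successful extension is found.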
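The level-wise search described above can be sketched in plain Python as follows. This is a simplified, unoptimised illustration of the idea, not the implementation used by `apyori` later in this notebook; the function name `apriori_itemsets` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    A minimal, unoptimised sketch of the level-wise Apriori idea.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of `itemset`
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Candidate generation: extend frequent (k-1)-itemsets by one item
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Test the remaining candidates against the data
        scored = {c: support(c) for c in candidates}
        current = {c for c, s in scored.items() if s >= min_support}
        frequent.update((c, scored[c]) for c in current)
        k += 1
    return frequent


# Toy baskets, invented for illustration
baskets = [
    {"whole milk", "bread", "butter"},
    {"whole milk", "bread"},
    {"whole milk", "yogurt"},
    {"bread", "butter"},
    {"whole milk", "bread", "yogurt"},
]

for itemset, s in sorted(apriori_itemsets(baskets, 0.4).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With `min_support = 0.4`, all four single items are frequent but only three of the six possible pairs are; both three-item candidates are pruned because each contains an infrequent pair, so the search stops after the second level.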

Now, I will take you through the task of Market Basket Analysis with the Apriori algorithm using Python.

**Market Basket Analysis with Apriori Algorithm using Python**


Market basket analysis, also known as association rule learning or affinity analysis, is a data mining technique that can be used in various fields, such as marketing, bioinformatics, education, nuclear science, etc.

The main goal of market basket analysis in marketing is to provide the retailer with the information necessary to understand the buyer’s purchasing behaviour, which can help the retailer make better decisions.

There are different algorithms for performing market basket analysis. Many existing algorithms operate on static data and do not capture changes in the data over time. The standard Apriori algorithm also works on a fixed set of transactions, but because it is simple to re-run, it can be applied periodically to updated data to track how purchasing patterns change.

I will start this task of Market Basket Analysis with the Apriori algorithm by installing and importing the necessary Python libraries:

In [None]:
!pip install apyori

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing
import plotly.express as px
from apyori import apriori  # provided by the apyori package installed above

In [15]:
import zipfile
import os

In [16]:
!wget --no-check-certificate \
    "https://github.com/amankharwal/Website-data/archive/refs/heads/master.zip" \
    -O "/tmp/Website-data.zip"


# Open the zip file in read mode and extract its contents into the /tmp folder
with zipfile.ZipFile('/tmp/Website-data.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp')

In [17]:
data = pd.read_csv("/tmp/Website-data-master/Groceries_dataset.csv")

In [18]:
data.head()

| | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |
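Each row of this table is a single item bought by one member on one date, whereas Apriori expects a list of baskets. A minimal sketch of that reshaping with pandas is shown below on a few hypothetical rows in the same layout. Whether a basket should be one shopping visit (same `Member_number` and `Date`) or all purchases of a member is a modelling choice, so both groupings are shown.

```python
import pandas as pd

# Hypothetical rows in the same long format as Groceries_dataset.csv
rows = pd.DataFrame({
    "Member_number": [1808, 1808, 1808, 2552, 2552],
    "Date": ["21-07-2015", "21-07-2015", "05-08-2015", "05-01-2015", "05-01-2015"],
    "itemDescription": ["tropical fruit", "whole milk", "yogurt", "whole milk", "rolls/buns"],
})

# One basket per shopping visit (same member, same date)
visit_baskets = rows.groupby(["Member_number", "Date"])["itemDescription"].apply(list).tolist()

# One basket per member, pooling all of their visits
member_baskets = rows.groupby("Member_number")["itemDescription"].apply(list).tolist()

print(visit_baskets)
print(member_baskets)
```

Grouping per visit yields more, smaller baskets; grouping per member yields fewer, larger ones, which tends to raise the support of item pairs.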


**Data Exploration**

Let’s first have a look at the top 10 best-selling products:




In [19]:
print("Top 10 frequently sold products(Tabular Representation)")
# value_counts() already sorts the counts in descending order
x = data['itemDescription'].value_counts().head(10)
print(x)
fig = px.bar(x= x.index, y= x.values)
fig.update_layout(title_text= "Top 10 frequently sold products (Graphical Representation)", xaxis_title= "Products", yaxis_title="Count")
fig.show()

Top 10 frequently sold products(Tabular Representation)


**Now let’s explore which months had the highest sales:**



In [20]:
data["Year"] = data['Date'].str.split("-").str[-1]
# Dates are stored as DD-MM-YYYY, so element 1 is the month and element -1 the year
data["Month-Year"] = data['Date'].str.split("-").str[1] + "-" + data['Date'].str.split("-").str[-1]
month_counts = data["Month-Year"].value_counts()  # sorted by count, highest first
fig1 = px.bar(x=month_counts.index,
              y=month_counts.values,
              color=month_counts.values,
              labels={'x': 'Date', 'y': 'Count', 'color': 'Meter'})

fig1.update_layout(title_text="Exploring higher sales by the date")

fig1.show()

**Observations:**

From the above visualizations we can observe that:

- Whole milk is bought the most, followed by other vegetables.
- Most shopping takes place in August and September, while February and March see the fewest purchases.
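The `Month-Year` column above is built by splitting the date strings, which relies on the dates being stored as DD-MM-YYYY and orders the bars by count. An alternative sketch, shown on a few hypothetical dates, parses them with `pd.to_datetime` and an explicit format so that the months can also be listed in chronological order:

```python
import pandas as pd

# Hypothetical dates in the dataset's DD-MM-YYYY layout
dates = pd.Series(["21-07-2015", "05-01-2015", "19-09-2015",
                   "12-12-2015", "01-02-2015", "03-01-2015"])

# An explicit format avoids ambiguity between day-first and month-first parsing
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

# Count transactions per calendar month, ordered chronologically rather than by count
monthly = parsed.dt.to_period("M").value_counts().sort_index()
print(monthly)
```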


**Implementation of Apriori Algorithm using Python**

Now, I will implement the Apriori algorithm in Python for the task of market basket analysis:

In [None]:
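# apriori() expects a list of baskets; here each basket is assumed to be every item bought by one member
# (grouping by Member_number and Date instead would give one basket per shopping visit)
transactions = data.groupby("Member_number")["itemDescription"].apply(list).tolist()

rules = apriori(transactions, min_support = 0.00030, min_confidence = 0.05, min_lift = 3, max_length = 2)
association_results = list(rules)
print(association_results[0])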

In [None]:
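for item in association_results:
    # Each result is an apyori RelationRecord(items, support, ordered_statistics);
    # ordered_statistics[0] holds the rule direction, its confidence and its lift
    stats = item.ordered_statistics[0]
    base = ", ".join(stats.items_base)
    add = ", ".join(stats.items_add)

    print("Rule : ", base, " -> ", add)
    print("Support : ", str(item.support))
    print("Confidence : ", str(stats.confidence))
    print("Lift : ", str(stats.lift))

    print("=============================")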
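The thresholds passed to `apriori` act as successive filters: an itemset must first be frequent enough (`min_support`), and a rule built from it must then clear `min_confidence` and `min_lift`, with `max_length = 2` limiting rules to two items. The following sketch mimics that filtering for two-item rules in plain Python. The helper `pair_rules`, the toy baskets and the threshold values are hypothetical, and this is not `apyori`'s own code.

```python
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence, min_lift):
    """Yield (lhs, rhs, support, confidence, lift) for one-item -> one-item rules.

    Hypothetical helper that mimics the filtering idea only.
    """
    baskets = [set(b) for b in baskets]
    n = len(baskets)

    def support(*items):
        # Fraction of baskets containing all the given items
        return sum(set(items) <= basket for basket in baskets) / n

    items = sorted({i for basket in baskets for i in basket})
    for first, second in combinations(items, 2):
        s_pair = support(first, second)
        if s_pair < min_support:
            continue  # the pair is not frequent, so neither direction is kept
        for lhs, rhs in ((first, second), (second, first)):
            conf = s_pair / support(lhs)
            lift = conf / support(rhs)
            if conf >= min_confidence and lift >= min_lift:
                yield lhs, rhs, s_pair, conf, lift


# Toy baskets, invented for illustration
baskets = [
    {"whole milk", "bread", "butter"},
    {"whole milk", "bread"},
    {"whole milk", "yogurt"},
    {"bread", "butter"},
    {"whole milk", "bread", "yogurt"},
]

for rule in pair_rules(baskets, min_support=0.3, min_confidence=0.5, min_lift=1.1):
    print(rule)
```

On these baskets, `whole milk` and `bread` appear together most often, yet neither direction of that rule survives: both items are so common that the lift is below 1 (0.75 / 0.8 ≈ 0.94). The rules that remain link `bread` with `butter` and `whole milk` with `yogurt`, each with a lift of 1.25.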

# **References**

[Apriori Algorithm using Python](https://thecleverprogrammer.com/2020/11/16/apriori-algorithm-using-python/)

[Association Rule Mining with Apriori Algorithm](https://colab.research.google.com/github/jmbanda/BigDataProgramming_2019/blob/master/Class21_Basic_Data_Mining_Using_Python.ipynb#scrollTo=lOdgef9zOwYV)

[How to Efficiently Load Image Datasets into Colab from Github, Kaggle and Local Machine](https://towardsdatascience.com/an-informative-colab-guide-to-load-image-datasets-from-github-kaggle-and-local-machine-75cae89ffa1e)