
# <center style="color:Teal;">Association Rule Learning : Apriori - Market Basket Analysis</center>

**Student Details:**  
A026 Kaushal Joshi  
A038 Nikita Mhase      
A044 Ashish Paithankar  

---


**Data Description:**   
Member_number: Member ID  
Date: Date of purchase  
itemDescription: Name of the item purchased  

# <center style="color:Teal;">Importing the libraries</center> <a class="anchor"  id="chapter1"></a>

In [2]:
import pandas as pd
import numpy as np
import warnings
import seaborn as sns
import matplotlib.pyplot as plt
!pip install apyori
from apyori import apriori

### **Learnings**
**Association rule learning**, often referred to as association rule mining, is a data mining technique used to identify interesting relationships, patterns, or associations among sets of items in large datasets. It is commonly used in market basket analysis, where the goal is to discover associations between products that frequently co-occur in transactions.

**Market basket analysis** is a data mining technique used to uncover associations between products purchased together by customers during a single shopping trip. It is commonly employed in retail and e-commerce industries to understand customer purchasing behavior, optimize product placement, and enhance cross-selling and upselling strategies.
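
To make the idea of "products purchased together" concrete, here is a minimal sketch that counts how often each pair of items shares a basket. The baskets and the helper name `count_pairs` are hypothetical and only for illustration; they are not taken from the groceries dataset, and this is plain co-occurrence counting rather than the full Apriori algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (assumption: not from the groceries dataset).
baskets = [
    ["whole milk", "rolls/buns", "soda"],
    ["whole milk", "rolls/buns"],
    ["soda", "yogurt"],
    ["whole milk", "yogurt", "rolls/buns"],
]

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # set() drops repeated items; sorted() gives each pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(baskets)
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Pairs with a high count are candidates for association rules; Apriori adds support, confidence and lift thresholds on top of this kind of counting.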

# <center style="color:Teal;">Importing the data</center> <a class="anchor"  id="chapter2"></a>

In [3]:
# Import the csv file as a pandas dataframe
dataset = pd.read_csv('/content/Groceries_dataset.csv')
print('Dimensions of dataset are :', dataset.shape)
dataset.head()

Dimensions of dataset are : (38765, 3)


|   | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |


In this dataset, the 'Date' column holds the date of sale and the 'itemDescription' column holds the item sold. We must group the dataset so that all the items sold on the same day come together as one transaction.

# <center style="color:Teal;">Data pre-processing and feature selection</center> <a class="anchor"  id="chapter"></a>

In [4]:
#Drop the member number column
dataset = dataset.drop(columns='Member_number')

# Since bags are unrelated to our problem statement and are not considered as a product we shall get rid of them.
dataset = dataset[dataset['itemDescription'] != 'bags']

# Convert Date to datetime
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%d-%m-%Y')

# Aggregate all the items sold on the same date into a single column.
dataset = dataset.groupby('Date')['itemDescription'].apply(list).reset_index()
dataset.head()

|   | Date | itemDescription |
|---|---|---|
| 0 | 2014-01-01 | [cleaner, sausage, tropical fruit, whole milk,... |
| 1 | 2014-01-02 | [beef, frankfurter, hamburger meat, soda, UHT-... |
| 2 | 2014-01-03 | [frankfurter, oil, beef, long life bakery prod... |
| 3 | 2014-01-04 | [ham, chocolate, instant coffee, specialty cho... |
| 4 | 2014-01-05 | [meat, hamburger meat, sausage, liver loaf, tr... |


In [5]:
# Collect each day's list of items into a plain Python list of transactions
transactions = dataset['itemDescription'].tolist()

# <center style="color:Teal;">Training the model</center> <a class="anchor"  id="chapter4"></a>

**Apyori** is a Python library that provides an implementation of the Apriori algorithm, which is a popular algorithm for association rule mining in data mining. Association rule mining is a technique to discover relationships between variables in large datasets.

The **apriori()** function in apyori takes the following parameters:

* transactions : A list of transactions. Each transaction is itself a list containing the items bought together.
* min_support : Minimum support threshold. This parameter specifies the minimum fraction of transactions that must contain an itemset for it to be considered significant. It is a value between 0 and 1, and is usually small.
* min_confidence : Minimum confidence threshold. This parameter specifies the minimum confidence level for a rule to be considered significant. It is a value between 0 and 1.
* min_lift : Minimum lift threshold. This parameter specifies the minimum lift value for the rules to be considered significant. Lift measures how much more likely the antecedent and consequent of a rule are to occur together compared to if they were statistically independent.
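
The three measures above can be computed by hand on a small example. The sketch below uses the standard definitions: support is the fraction of transactions containing both sides of the rule, confidence is that count divided by the count of transactions containing the antecedent, and lift is confidence divided by the support of the consequent. The function name `rule_metrics` and the toy transactions are assumptions made for illustration; this is not apyori's internal code.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))           # baskets with the antecedent
    count_c = sum(1 for t in transactions if c <= set(t))           # baskets with the consequent
    count_both = sum(1 for t in transactions if (a | c) <= set(t))  # baskets with both sides
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift

# Hypothetical toy transactions (assumption: not from the groceries dataset).
toy_transactions = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread"],
    ["milk"],
    ["butter", "milk"],
]

support, confidence, lift = rule_metrics(toy_transactions, ["bread"], ["butter"])
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the two sides appear together more often than they would if they were independent; a lift of exactly 1 means no association.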

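Let's try to calculate the values for these parameters.

Let's assume that we have to create a promotional offer of "**Buy One, Get One Free!!**" for the whole week. For this, we must find out which items were sold together.

So, as per our target,
* **min_support** : An itemset must appear in at least 3 of the daily transactions, so min_support = 3 / len(dataset) = 3 / 728 ≈ 0.00412087912
* **min_confidence** : We will start with 0.8 and then increase or decrease it as per the rules observed. The run below uses 0.6.
* **min_lift** : Tuned in the same way as min_confidence. The run below uses 1.9.
* **min_length & max_length** : Since we want just one product that goes along with a product, min and max length must be 2. Note that apyori's documented keyword arguments are min_support, min_confidence, min_lift and max_length; min_length does not appear to be read by the library, so in practice single-item results are excluded by the lift threshold, since their lift is 1.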
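
The min_support arithmetic above can be checked with a few lines of Python. This is a small sketch; the helper name `min_support_for` is hypothetical, and 728 is the number of daily transactions quoted in the text.

```python
def min_support_for(min_count, n_transactions):
    """Support threshold that corresponds to a minimum number of occurrences."""
    return min_count / n_transactions

n_days = 728                      # number of daily transactions quoted in the text
threshold = min_support_for(3, n_days)
print(round(threshold, 6))

# The truncated value used in the notebook sits just below 3 / 728,
# so an itemset seen on exactly 3 days still passes, while 2 days does not.
used_in_notebook = 0.00412087912
print(3 / n_days >= used_in_notebook)
print(2 / n_days >= used_in_notebook)
```

Deriving the threshold from a count keeps it meaningful if the number of transactions changes, for example after filtering the date range.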


In [6]:
from apyori import apriori
rules = apriori(transactions=transactions, min_support=0.00412087912, min_confidence=0.6, min_lift=1.9, min_length=2, max_length=2)

In [None]:
# Materialise the generator returned by apriori() into a list of records
results = list(rules)

# Organise the results into a pandas DataFrame
def inspect(results):
    # Each record is (items, support, ordered_statistics), and each ordered
    # statistic is (items_base, items_add, confidence, lift).
    # Only the first ordered statistic of each record is read here.
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])

# Displaying the results sorted by confidence, highest first
resultsinDataFrame.sort_values('Confidence', ascending=False)
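
The `inspect` helper reads only `result[2][0]`, the first rule stored for each itemset. A pair of items can yield two rules (A → B and B → A) when both pass the thresholds, so the sketch below flattens every ordered statistic instead. The two namedtuples are stand-ins that mirror the field names of apyori's result records, defined here so the example runs without apyori; the function name `flatten_rules` and the numbers in the example record are made up for illustration.

```python
from collections import namedtuple

# Stand-ins mirroring apyori's record fields (assumption: defined locally
# so this sketch does not depend on apyori being installed).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

def flatten_rules(records):
    """Return one (lhs, rhs, support, confidence, lift) row per rule."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),  # left-hand side
                ", ".join(sorted(stat.items_add)),   # right-hand side
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# Hypothetical record with made-up numbers, for illustration only.
example = [RelationRecord(
    items=frozenset({"frozen fruits", "sugar"}),
    support=0.1,
    ordered_statistics=[
        OrderedStatistic(frozenset({"frozen fruits"}), frozenset({"sugar"}), 0.5, 2.0),
        OrderedStatistic(frozenset({"sugar"}), frozenset({"frozen fruits"}), 0.4, 2.0),
    ],
)]

rows = flatten_rules(example)
for row in rows:
    print(row)
```

The resulting rows can be passed to `pd.DataFrame` with the same five column names used above.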

# <center style="color:Teal;">Conclusion</center> <a class="anchor"  id="chapter5"></a>

In [9]:
# Keep only the two item columns (drop Support, Confidence and Lift)
resultFrame = resultsinDataFrame.iloc[:,0:-3]
resultFrame

# Collect all right-hand-side items for each left-hand-side product into a single list.
resultFrame = resultFrame.groupby('Left Hand Side')['Right Hand Side'].apply(list).reset_index()
resultFrame = resultFrame.rename(columns={'Left Hand Side': 'Product Purchased', 'Right Hand Side':'Also Purchased Along'})
resultFrame

|   | Product Purchased | Also Purchased Along |
|---|---|---|
| 0 | baby cosmetics | [beef] |
| 1 | cooking chocolate | [cream cheese ] |
| 2 | cookware | [cream cheese ] |
| 3 | decalcifier | [baking powder] |
| 4 | frozen chicken | [butter milk, cream cheese , ham, misc. bevera... |
| 5 | frozen fruits | [napkins, sugar] |
| 6 | make up remover | [chocolate, detergent, margarine, seasonal pro... |
| 7 | organic products | [berries, butter milk] |
| 8 | prosecco | [hygiene articles] |
| 9 | rubbing alcohol | [cream cheese , frozen fish, frozen meals, mea... |


Thus, using Apriori, we were able to identify the products that were purchased together. Using the table above, we can design offers such as:
* On 'baby cosmetics' get 'beef' for free!
* On 'frozen chicken' get 'butter milk or cream cheese' for free!
* On 'salad dressing' get 'butter milk, curd or oil' for free!
* On 'whisky' get 'beverages' for free!

Just don't forget to bump up the price so we don't go bankrupt because of these offers!

### **Future Scope**
* **Personalized Recommendations:** Enhance the shopping experience by providing personalized recommendations to customers based on their past purchase history and item associations. This can help increase customer satisfaction and loyalty.
* **Cross-Selling and Upselling Opportunities:** Identify cross-selling and upselling opportunities by recommending related products to customers at the point of sale or through targeted marketing campaigns.
* **Online Retail Applications:** Extend the use of Apriori analysis to online retail platforms to provide real-time recommendations and promotions based on users' browsing and purchase behavior.