# Market Basket Analysis with Apriori Algorithm


Reference: [Market Basket Analysis with Apriori Algorithm](https://www.kaggle.com/code/ozlemilgun/market-basket-analysis-with-apriori-algorithm) (Kaggle)

## Association Rule Learning (ARL)

The number of customers and transactions keeps growing. As a result, extracting meaningful results from data has become more valuable, especially for developing marketing strategies. Revealing hidden patterns in the data helps a company compete in a crowded market, maximize profit, and build value-oriented, long-term relationships with its customers. This makes pattern discovery a major input to marketing strategy.

However, in a big-data world it is no longer feasible to build these strategies from manually defined rules. Offering the right product to the right customer at the right time is the basis of cross-selling. It also underpins loyalty programs aimed at customer retention and higher customer lifetime value. It has therefore become crucial for companies to use these association patterns to make product offers and develop effective marketing strategies. Market Basket Analysis is one application of association rules: by learning patterns from customers' past behavior and habits, it lets us predict which products they are likely to buy in the future.
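Association rules are commonly scored with three measures:

- **support**: how often an itemset appears across all baskets;
- **confidence**: how often the consequent appears in baskets that contain the antecedent;
- **lift**: confidence relative to the consequent's overall frequency.

These are among the columns that mlxtend's `association_rules` reports. The sketch below computes them by hand for the rule {mug} → {tea}. The baskets and the helper names `support`, `confidence` and `lift` are hypothetical and are not taken from the Online Retail II data.

```python
# Hypothetical toy transactions: each set is the products on one invoice.
baskets = [
    {"mug", "tea"},
    {"mug", "tea", "biscuits"},
    {"mug", "candle"},
    {"tea", "biscuits"},
    {"mug", "tea"},
    {"candle"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's own support; > 1 means a positive association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(f"support(mug, tea)  = {support({'mug', 'tea'}, baskets):.3f}")       # 0.500
print(f"confidence(mug->tea) = {confidence({'mug'}, {'tea'}, baskets):.3f}")  # 0.750
print(f"lift(mug->tea)     = {lift({'mug'}, {'tea'}, baskets):.3f}")          # 1.125
```

In these toy baskets the lift is above 1. Customers who buy a mug therefore buy tea somewhat more often than customers overall, which is the kind of signal used to suggest a product.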

Several algorithms can be used for Association Rule Learning; one of them is the Apriori algorithm. In this project, product association analysis is carried out with the **Apriori algorithm** on the sales data of an e-commerce company. The resulting rules are used to make the most suitable product offers to a customer who is in the middle of a purchase.
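The core idea of Apriori is that every subset of a frequent itemset must itself be frequent. Candidate itemsets of size k are therefore built only from frequent itemsets of size k - 1, and any candidate with an infrequent subset is discarded before its support is counted. The pure-Python sketch below illustrates that level-wise search on hypothetical baskets. `apriori_frequent_itemsets` is a made-up name for illustration and is not mlxtend's implementation, which works on a one-hot DataFrame rather than Python sets.

```python
from itertools import combinations

def apriori_frequent_itemsets(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def frequent_among(candidates):
        # Count how many baskets contain each candidate, keep those above the threshold.
        counts = {c: 0 for c in candidates}
        for b in baskets:
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent single items.
    current = frequent_among({frozenset([i]) for b in baskets for i in b})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy baskets (one set of products per invoice).
baskets = [
    {"mug", "tea"},
    {"mug", "tea", "biscuits"},
    {"mug", "candle"},
    {"tea", "biscuits"},
    {"mug", "tea"},
    {"candle"},
]

result = apriori_frequent_itemsets(baskets, min_support=0.3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.3f}")
```

With `min_support=0.3`, the triple {mug, tea, biscuits} is pruned without ever being counted. This happens because its subset {mug, biscuits} appears in only one of the six baskets.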

### Dataset Story:
• The Online Retail II dataset, which contains the sales data of a UK-based online store, was used.

• The dataset covers sales between 01/12/2009 and 09/12/2011.

• The product catalog of this company includes souvenirs.

### Business Problem:
Suggest products to users at the basket stage. In this study, we apply Market Basket Analysis using the Apriori algorithm and organize the work in 5 steps:

1. Import Data & Data Preprocessing

2. Preparing the Invoice-Product Matrix for the ARL Data Structure

3. Determination of Association Rules

4. Suggesting appropriate product offers to customers at the basket stage

5. Functionalization
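Step 2 needs the data reshaped so that each row is an invoice, each column is a product, and each cell records whether that product appears on that invoice. Below is a minimal pandas sketch of that reshaping. It assumes `Invoice`, `Description` and `Quantity` columns like those in the Online Retail II data and uses a few hypothetical invoice lines.

```python
import pandas as pd

# Hypothetical invoice lines (invoice IDs are made up for illustration).
lines = pd.DataFrame({
    "Invoice": ["INV1", "INV1", "INV2", "INV3", "INV3"],
    "Description": [
        "WHITE METAL LANTERN", "CREAM CUPID HEARTS COAT HANGER",
        "WHITE METAL LANTERN",
        "WHITE METAL LANTERN", "CREAM CUPID HEARTS COAT HANGER",
    ],
    "Quantity": [6, 8, 6, 2, 4],
})

# Sum quantities per (invoice, product), pivot products into columns
# (missing combinations become 0), then mark "product present on invoice"
# as True/False.
basket = (
    lines.groupby(["Invoice", "Description"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
    .gt(0)
)
print(basket)
```

Using `Description` as the product key is one possible choice; `StockCode` would work the same way. mlxtend's `apriori` accepts a True/False DataFrame of this shape. Passing `use_colnames=True` keeps product names in the resulting itemsets.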

### Variable Descriptions:
• Invoice: Invoice number -> if this code starts with C, the transaction was cancelled.

• StockCode: Product code -> unique code for each product

• Description: Product name

• Quantity: Number of units of the product sold on the invoice

• InvoiceDate: Invoice date and time

• Price: Unit price of the product

• Customer ID: Unique customer number

• Country: Country where the customer resides
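A leading "C" marks a cancelled invoice, so cancellations can be filtered out with a string check on the invoice number. The sketch below assumes the `Invoice` column may hold a mix of integers and strings (its dtype is `object` in the `df.info()` output), so it casts to `str` first. The invoice numbers are hypothetical.

```python
import pandas as pd

# Hypothetical rows: plain invoice numbers as integers, cancellations as 'C'-prefixed strings.
df_demo = pd.DataFrame({"Invoice": [100001, "C100002", 100003, "C100004"]})

# Cast to str before using the .str accessor; on a mixed object column,
# non-string cells would otherwise come back as NaN.
is_cancelled = df_demo["Invoice"].astype(str).str.startswith("C")
print(is_cancelled.tolist())  # [False, True, False, True]

# Keep only completed (non-cancelled) transactions.
completed = df_demo[~is_cancelled]
```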

```python
# Import libraries
import warnings

import pandas as pd

# For Association Rule Learning & Apriori
# !pip install mlxtend
from mlxtend.frequent_patterns import apriori, association_rules

# Display configuration: show every column and print floats with 3 decimals
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Silence warnings; "ignore" already covers FutureWarning and DeprecationWarning
warnings.filterwarnings("ignore")
```

**1. Import Data & Data Preprocessing**

```python
# Load only the "Year 2010-2011" sheet and keep the raw data untouched in df_
df_ = pd.read_excel('online_retail_II.xlsx', sheet_name='Year 2010-2011')
df = df_.copy()

df.head()
```

|   | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541910 entries, 0 to 541909
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   Invoice      541910 non-null  object        
 1   StockCode    541910 non-null  object        
 2   Description  540456 non-null  object        
 3   Quantity     541910 non-null  int64         
 4   InvoiceDate  541910 non-null  datetime64[ns]
 5   Price        541910 non-null  float64       
 6   Customer ID  406830 non-null  float64       
 7   Country      541910 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB
```


```python
# Count the missing values in each column
df.isna().sum()
```

```text
Invoice             0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
Price               0
Customer ID    135080
Country             0
dtype: int64
```

```python
# Remove all rows that contain a missing value in any column
df.dropna(inplace=True)
```

```python
# Check how many rows are left
df.shape
```

```text
(406830, 8)
```
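The counts above line up: 541,910 - 135,080 = 406,830. So `dropna()` removed exactly the rows lacking a `Customer ID`, which implies that the 1,454 rows without a `Description` also lack a `Customer ID`. One way to check that kind of overlap directly is to count rows where one column is missing but the other is not. The small frame below is hypothetical and only illustrates the check.

```python
import pandas as pd

# Hypothetical rows: every missing Description also has a missing Customer ID.
df_demo = pd.DataFrame({
    "Description": ["LANTERN", None, "MUG", None],
    "Customer ID": [17850.0, None, None, None],
})

# Rows with Description missing but Customer ID present.
only_desc_missing = df_demo["Description"].isna() & df_demo["Customer ID"].notna()
print(only_desc_missing.sum())  # 0

# When that count is 0, dropping on Customer ID alone matches a full dropna().
print(df_demo.dropna().equals(df_demo.dropna(subset=["Customer ID"])))  # True
```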