**Market Basket Analysis using Association Rules** <br>
Source: https://www.kaggle.com/code/benroshan/market-basket-analysis/data

# Library

In [1]:
# !pip install apyori
# !pip install mlxtend

In [2]:
#Basic statistic & visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#Model Analysis
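# Only mlxtend's apriori is imported: importing apriori from apyori as well
# would be shadowed by the mlxtend import and is never used
from mlxtend.frequent_patterns import apriori, association_rules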

# Business Background

**Context**

The purchase and sale of goods by customers and businesses is called retail sales. Retail goods are usually finished goods. In most countries, retail sales are treated as an indicator of economic health because they reflect consumer buying capacity. The retail process is complicated: it involves developing value propositions, identifying customer preferences, establishing retail networks and supply chains, attracting customers, and setting up stores and stocking them with merchandise. Throughout, a retailer is expected to deliver excellent performance and a pleasant shopping experience.

**Problem Statement**

A company's sustainability depends on the consumers who transact with it. Consumers differ in behaviour and character, which creates problems for retailers in the sales process: some products run out of stock while others go unsold, and it is unclear which products are most popular and which are not in demand.

**Definition**

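* Antecedent: The item (or itemset) a customer buys first; the "if" part of a rule.
* Consequent: The item (or itemset) bought together with the antecedent; the "then" part of a rule.
<img src='ac1.png'>
* Support: The fraction of all baskets that contain the itemset.
<img src='support.png'>
* Confidence: The conditional probability that a basket containing the antecedent also contains the consequent, i.e. support(antecedent and consequent) / support(antecedent).
<img src='confidence.png'>
* Lift: The ratio of the observed support of the rule to the support expected if the antecedent and consequent were independent. Values above 1 indicate a positive association.
* Leverage: The difference between the observed support of the rule and the support expected under independence. A value of 0 indicates independence.
* Conviction: The ratio (1 - support(consequent)) / (1 - confidence). Values above 1 mean the consequent depends on the antecedent; the value tends to infinity as confidence approaches 1.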
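
As a worked check of the definitions above, the following sketch computes confidence, lift, leverage, and conviction from three support values. The function name `rule_metrics` is hypothetical; the input numbers are the supports for (yogurt) -> (whole milk) reported in the rules table in the Model Analysis section.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Metrics for the rule A -> C from three support values in [0, 1]."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is infinite when the rule always holds (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports for (yogurt) -> (whole milk), copied from the rules table.
metrics = rule_metrics(support_a=0.282966, support_c=0.458184, support_ac=0.15059)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree with the corresponding row of the rules table up to rounding of the inputs.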

**Goals**

The goal of this analysis is to find which products people buy together with other products. The results of applying association rules with the Apriori algorithm can help store owners and managers arrange products and decide on strategic steps to increase sales, such as discounts or promotions for certain products.

# Data Understanding

## Load Dataset

In [3]:
df = pd.read_csv('Groceries_dataset.csv')
df.sample(5)

Unnamed: 0,Member_number,Date,itemDescription
14177,1904,24-04-2014,newspapers
35889,3879,25-06-2015,canned vegetables
5143,2652,01-08-2015,yogurt
29244,4938,13-10-2014,mayonnaise
35089,4474,10-12-2015,detergent


**Dataset Explanation**

* Member_number: Unique ID of the customer.
* Date: Purchase date in day-month-year format.
* itemDescription: Description of the item the customer bought.

## Dataset Information

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38765 entries, 0 to 38764
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Member_number    38765 non-null  int64 
 1   Date             38765 non-null  object
 2   itemDescription  38765 non-null  object
dtypes: int64(1), object(2)
memory usage: 908.7+ KB


In [5]:
df.describe()

Unnamed: 0,Member_number
count,38765.0
mean,3003.641868
std,1153.611031
min,1000.0
25%,2002.0
50%,3005.0
75%,4007.0
max,5000.0


In [6]:
df.describe(include='object')

Unnamed: 0,Date,itemDescription
count,38765,38765
unique,728,167
top,21-01-2015,whole milk
freq,96,2502


## Change Datatype

In [7]:
df['Date'] =  pd.to_datetime(df['Date'], format='%d-%m-%Y')
df.sample(5)

Unnamed: 0,Member_number,Date,itemDescription
9323,1597,2014-10-12,UHT-milk
23945,2960,2015-08-31,whole milk
18392,1347,2015-09-15,pastry
19734,1081,2015-01-21,whole milk
8499,3400,2015-03-04,pip fruit


# Exploratory Data Analysis (EDA)

## Store Information

In [8]:
print('There are',df['itemDescription'].nunique(), 'items in store')

There are 167 items in store


In [9]:
df.head(1)

Unnamed: 0,Member_number,Date,itemDescription
0,1808,2015-07-21,tropical fruit


In [10]:
df.tail(1)

Unnamed: 0,Member_number,Date,itemDescription
38764,1521,2014-12-26,cat food


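**`df.head(1)` and `df.tail(1)` show only the first and last rows of the file, which is not sorted by date, so they do not give the date range. The sampled rows in this notebook include dates from January 2014 to December 2015.**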
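
A minimal sketch of how the date range can be read from the minimum and maximum of the `Date` column rather than from the first and last rows. The frame `toy` and its three dates are illustrative stand-ins, not rows taken from the dataset; on the real data the same two calls would be applied to `df['Date']` after the datatype conversion.

```python
import pandas as pd

# Illustrative rows only; deliberately not sorted by date.
toy = pd.DataFrame({"Date": ["21-07-2015", "05-01-2014", "26-12-2014"]})
toy["Date"] = pd.to_datetime(toy["Date"], format="%d-%m-%Y")

# The range comes from min/max, independent of row order.
first_day = toy["Date"].min()
last_day = toy["Date"].max()
print(first_day.date(), "to", last_day.date())
```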

## Identify Missing Value

In [11]:
df.isna().sum()

Member_number      0
Date               0
itemDescription    0
dtype: int64

There are no missing values.

## Dataset for Visualization

### Create Dataset

In [12]:
df1 = df.groupby(['Date','Member_number']).count()
df1.sample(10)

Date,Member_number,itemDescription
2015-04-09,3054,3
2014-01-18,1797,2
2014-12-13,2457,2
2014-01-03,1658,2
2015-10-15,1980,2
2014-01-28,2873,3
2015-06-08,2316,2
2014-12-26,1987,2
2015-10-18,2940,2
2014-06-20,4288,2


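### Export File

In [13]:
# Date and Member_number live in the index after groupby, so restore them as columns before export
df1.reset_index().to_csv('dataset_vis.csv', index = False)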

# Data Analytics

Visualization using Tableau. Source: https://public.tableau.com/app/profile/juan1691/viz/StorePerformance_16562354393370/Dashboard2

<img src='db11.png'>

* Total transactions in 2015: 6982.
* Total unique customers in 2015: 3314.
* Average items purchased per customer in 2015: 3.
* August had the most visitors and December the fewest in 2015.
* Thursday had the most visitors and Sunday the fewest in 2015.

<img src='db2.png'>

* 5 best-selling items in 2015:
    * Whole Milk
    * Other Vegetables
    * Rolls / Buns
    * Soda
    * Yogurt
* 5 least-selling items in 2015:
    * Kitchen Utensil
    * Preservation Products
    * Baby Cosmetics
    * Bags
    * Frozen Chicken
* The 21st day of the month had the most visitors in 2015.

# Preprocessing (One Hot Encoding)

**Note: To produce a clearer example, the data is grouped per member rather than per transaction.**
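
For comparison with the per-member grouping used here, the following is a minimal sketch of a per-transaction basket. It assumes that one transaction is all rows sharing a `Member_number` and `Date`, which is an assumption because the dataset has no explicit transaction ID. The frame `toy` and the name `basket_tx` are hypothetical.

```python
import pandas as pd

# Toy rows in the same shape as the dataset (values are illustrative only).
toy = pd.DataFrame({
    "Member_number": [1000, 1000, 1000, 1001, 1001],
    "Date": ["2015-01-05", "2015-01-05", "2015-02-10", "2015-01-05", "2015-01-05"],
    "itemDescription": ["whole milk", "yogurt", "whole milk", "soda", "whole milk"],
})

# Assumption: one transaction = one (Member_number, Date) pair.
counts = pd.crosstab([toy["Member_number"], toy["Date"]], toy["itemDescription"])

# One-hot encode: True when the item appears in the transaction at least once.
basket_tx = counts > 0
print(basket_tx)
```

Per-transaction baskets hold fewer items than per-member baskets, so supports will generally be lower and a smaller `min_support` may be needed.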

In [14]:
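temp = df.copy()
# qty_purchased is each member's total row count, not a true item quantity;
# only zero vs. non-zero matters once the basket is one-hot encoded below
temp['qty_purchased']=df['Member_number'].map(df['Member_number'].value_counts())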

In [15]:
basket = (temp.groupby(['Member_number','itemDescription'])['qty_purchased']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Member_number'))

basket

Member_number,Instant food products,UHT-milk,abrasive cleaner,artif. sweetener,baby cosmetics,bags,baking powder,bathroom cleaner,beef,berries,...,turkey,vinegar,waffles,whipped/sour cream,whisky,white bread,white wine,whole milk,yogurt,zwieback
1000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.0,13.0,0.0
1001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12.0,0.0,...,0.0,0.0,0.0,12.0,0.0,12.0,0.0,24.0,0.0,0.0
1002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,0.0
1003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,63.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4996,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4997,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,6.0,6.0,0.0,0.0
4998,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4999,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,32.0,...,0.0,0.0,0.0,16.0,0.0,0.0,0.0,0.0,16.0,0.0


In [16]:
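def encode_units(x):
    # 1 if the member bought the item at least once, otherwise 0
    return 1 if x >= 1 else 0

# Note: pandas 2.1+ deprecates applymap in favour of DataFrame.map
basket_sets = basket.applymap(encode_units)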

# Model Analysis

In [22]:
frequent_itemsets = apriori(basket_sets, min_support=0.15, use_colnames=True)
frequent_itemsets.sort_values(by='support',ascending=False)



Unnamed: 0,support,itemsets
14,0.458184,(whole milk)
4,0.376603,(other vegetables)
7,0.349666,(rolls/buns)
11,0.313494,(soda)
15,0.282966,(yogurt)
12,0.23371,(tropical fruit)
8,0.230631,(root vegetables)
1,0.213699,(bottled water)
9,0.206003,(sausage)
16,0.19138,"(whole milk, other vegetables)"


Support (the top 5 match the 5 best-selling items). Because the basket is grouped per member, each value is the share of members who bought the item at least once:
1. Whole milk: 46% of members.
2. Other vegetables: 38% of members.
3. Rolls/buns: 35% of members.
4. Soda: 31% of members.
5. Yogurt: 28% of members.
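
To illustrate where support values like these come from, here is a minimal pure-Python sketch of the level-wise search behind Apriori: count the support of candidate itemsets, keep those meeting `min_support`, join them into larger candidates, and prune any candidate with an infrequent subset. It is a teaching sketch on hypothetical toy baskets, not the mlxtend implementation used in this notebook; the names `frequent_itemsets_sketch`, `toy_baskets`, and `found` are made up for the example.

```python
from itertools import combinations


def frequent_itemsets_sketch(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    # Level 1: the candidates are the single items.
    candidates = {frozenset([item]) for b in baskets for item in b}
    result = {}
    k = 1
    while candidates:
        # Support = share of baskets that contain the whole candidate.
        level = {}
        for c in candidates:
            support = sum(c <= b for b in baskets) / n
            if support >= min_support:
                level[c] = support
        result.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result


# Hypothetical toy baskets, one set of items per basket.
toy_baskets = [
    {"whole milk", "yogurt"},
    {"whole milk", "soda"},
    {"whole milk", "yogurt", "soda"},
    {"rolls/buns"},
]
found = frequent_itemsets_sketch(toy_baskets, min_support=0.5)
for itemset, support in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```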

In [20]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.sort_values(by='confidence',ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
6,(yogurt),(whole milk),0.282966,0.458184,0.15059,0.532185,1.16151,0.02094,1.158185
2,(rolls/buns),(whole milk),0.349666,0.458184,0.178553,0.510638,1.114484,0.018342,1.10719
1,(other vegetables),(whole milk),0.376603,0.458184,0.19138,0.508174,1.109106,0.018827,1.101643
4,(soda),(whole milk),0.313494,0.458184,0.151103,0.481997,1.051973,0.007465,1.045971
0,(whole milk),(other vegetables),0.458184,0.376603,0.19138,0.417693,1.109106,0.018827,1.070564
3,(whole milk),(rolls/buns),0.458184,0.349666,0.178553,0.389698,1.114484,0.018342,1.065592
5,(whole milk),(soda),0.458184,0.313494,0.151103,0.329787,1.051973,0.007465,1.02431
7,(whole milk),(yogurt),0.458184,0.282966,0.15059,0.328667,1.16151,0.02094,1.068076


Confidence:
* Yogurt & Whole milk: 53% of members who buy yogurt also buy whole milk.
* Rolls/buns & Whole milk: 51% of members who buy rolls/buns also buy whole milk.
* Other vegetables & Whole milk: 51% of members who buy other vegetables also buy whole milk.
* Soda & Whole milk: 48% of members who buy soda also buy whole milk.
* Every rule has lift and conviction above 1, which indicates a positive association, although the lift values (1.05 to 1.16) show the associations are weak.

# Conclusions

## Result

* Top 5 best-selling items from 2014-2015:
    1. Whole milk: bought by 46% of members.
    2. Other vegetables: bought by 38% of members.
    3. Rolls/buns: bought by 35% of members.
    4. Soda: bought by 31% of members.
    5. Yogurt: bought by 28% of members.
* Whole milk is the most purchased item and forms rules with other vegetables, rolls/buns, soda, and yogurt.
* The store failed to sell even a single kitchen utensil or preservation product in 2015.

## Recommendation

* The store can offer special deals on whole milk and its consequents.
* Yogurt, rolls/buns, and soda can be bundled with whole milk.
* The store needs to pay attention to whole milk stock.
* The store can offer discounts on Sunday to attract customers.
* The store can place yogurt, other vegetables, soda, and rolls/buns beside whole milk.

# Reference

Source:
1. https://pbpython.com/market-basket-analysis.html
2. https://yandaafrida.medium.com/association-rule-market-basket-analysis-menggunakan-python-a9c49b4bfc69
3. https://www.researchgate.net/publication/336848615_Consumer_Customs_Analysis_Using_the_Association_Rule_and_Apriori_Algorithm_for_Determining_Sales_Strategies_in_Retail_Central