# Market basket analysis

## Introduction

Market basket analysis studies customer purchasing behavior by finding items that tend to be bought together: for example, a customer who buys products A and B is more likely to also buy product C.

Our tutorial: https://www.kaggle.com/code/khusheekapoor/market-basket-analysis-in-python/notebook

Our data: https://www.kaggle.com/datasets/mittalvasu95/the-bread-basket
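
To make the idea from the introduction concrete, the sketch below scores the rule {A, B} -> C with the three measures commonly used for association rules: support, confidence, and lift. The five baskets and the helper `support` are invented for illustration and are not taken from the Bread Basket dataset.

```python
# Toy baskets, invented for illustration (not from the dataset).
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


antecedent = {"A", "B"}
consequent = {"C"}

# Support of the rule: how often A, B and C appear together.
rule_support = support(antecedent | consequent, baskets)
# Confidence: among baskets with A and B, the share that also contain C.
confidence = rule_support / support(antecedent, baskets)
# Lift: confidence relative to how common C is overall.
lift = confidence / support(consequent, baskets)

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that baskets containing A and B include C more often than C's overall frequency would predict.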

In [4]:
# Download the data from Kaggle
# Run this cell only once

# Install Kaggle API
%pip install kaggle

# Download the dataset from Kaggle
!kaggle datasets download -d mittalvasu95/the-bread-basket

# Unzip the downloaded file
import zipfile
with zipfile.ZipFile('the-bread-basket.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

Note: you may need to restart the kernel to use updated packages.
Dataset URL: https://www.kaggle.com/datasets/mittalvasu95/the-bread-basket
License(s): CC0-1.0
the-bread-basket.zip: Skipping, found more recently modified local copy (use --force to force download)


In [None]:
# Import libraries

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from csv import reader
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [6]:
# Load the data
data = pd.read_csv('bread basket.csv')

# Head of the data
data.head()

| | Transaction | Item | date_time | period_day | weekday_weekend |
|---|---|---|---|---|---|
| 0 | 1 | Bread | 30-10-2016 09:58 | morning | weekend |
| 1 | 2 | Scandinavian | 30-10-2016 10:05 | morning | weekend |
| 2 | 2 | Scandinavian | 30-10-2016 10:05 | morning | weekend |
| 3 | 3 | Hot chocolate | 30-10-2016 10:07 | morning | weekend |
| 4 | 3 | Jam | 30-10-2016 10:07 | morning | weekend |


## Exploratory Data Analysis

In [7]:
# Info about the data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20507 entries, 0 to 20506
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Transaction      20507 non-null  int64 
 1   Item             20507 non-null  object
 2   date_time        20507 non-null  object
 3   period_day       20507 non-null  object
 4   weekday_weekend  20507 non-null  object
dtypes: int64(1), object(4)
memory usage: 801.2+ KB
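
`data.info()` reports `date_time` as `object`, so the timestamps are stored as plain strings. If time-based analysis is needed, a conversion along these lines could be used. The format string is an assumption inferred from the sample rows (day-month-year, e.g. `30-10-2016 09:58`), and `sample` is a small stand-in frame rather than the real data.

```python
import pandas as pd

# Stand-in rows copied from the head of the dataset shown earlier.
sample = pd.DataFrame({"date_time": ["30-10-2016 09:58", "30-10-2016 10:05"]})

# Assumed format: day-month-year hour:minute.
sample["date_time"] = pd.to_datetime(sample["date_time"], format="%d-%m-%Y %H:%M")

print(sample.dtypes)
print(sample["date_time"].dt.day_name())
```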


In [8]:
data.value_counts('period_day')

period_day
afternoon    11569
morning       8404
evening        520
night           14
Name: count, dtype: int64

In [9]:
data.value_counts('weekday_weekend')

weekday_weekend
weekday    12807
weekend     7700
Name: count, dtype: int64
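
The counts above are per item row (20507 rows in total), so a transaction with several items is counted several times. A sketch of counting distinct transactions instead, on a small stand-in frame `toy` that reuses the dataset's column names:

```python
import pandas as pd

# Stand-in frame with the dataset's column names; values copied from the head shown earlier.
toy = pd.DataFrame({
    "Transaction": [1, 2, 2, 3, 3],
    "Item": ["Bread", "Scandinavian", "Scandinavian", "Hot chocolate", "Jam"],
    "period_day": ["morning"] * 5,
    "weekday_weekend": ["weekend"] * 5,
})

# Item rows per period: every purchased item counts once.
rows_per_period = toy.value_counts("period_day")

# Distinct transactions per period: every basket counts once.
transactions_per_period = toy.groupby("period_day")["Transaction"].nunique()

print(rows_per_period)
print(transactions_per_period)
```

On the real data the same two lines can be applied to `data` in place of `toy`.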

## Data Cleaning

In [12]:
# Group by Transaction and collect each transaction's items into a list
basket_data = data.groupby('Transaction').agg({'Item': list})

# Attach each transaction's day type and period, one row per transaction
trans_data = (
    data[['Transaction', 'weekday_weekend', 'period_day']]
    .drop_duplicates()
    .set_index('Transaction')
    .join(basket_data)
    .reset_index()
)
df = trans_data[['Item', 'weekday_weekend', 'period_day']]

df

Unnamed: 0,Item,weekday_weekend,period_day
0,[Bread],weekend,morning
1,"[Scandinavian, Scandinavian]",weekend,morning
2,"[Hot chocolate, Jam, Cookies]",weekend,morning
3,[Muffin],weekend,morning
4,"[Coffee, Pastry, Bread]",weekend,morning
...,...,...,...
9460,[Bread],weekend,afternoon
9461,"[Truffles, Tea, Spanish Brunch, Christmas common]",weekend,afternoon
9462,"[Muffin, Tacos/Fajita, Coffee, Tea]",weekend,afternoon
9463,"[Coffee, Pastry]",weekend,afternoon
