# A practical analytical project from Quantium

## Overview

This project analyzes a customer transaction dataset to identify purchasing behavior patterns and turn them into insights the client can act on.

### Context

You are part of Quantium’s retail analytics team and have been approached by your client, the Category Manager for chips, who wants to better understand the types of customers who purchase chips and their purchasing behaviour within the region.

The insights from your analysis will feed into the supermarket’s strategic plan for the chip category in the next half year.

## Project Goals

Here are the main points of this project:
- examine and clean transaction and customer data.
- identify customer segments based on purchasing behavior.
- create charts and graphs to present data insights.
- derive commercial recommendations from data analysis.

## Actions

- Analyze transaction and customer data. 
- Develop metrics and examine sales drivers.
- Segment customers based on purchasing behavior.
- Create visualizations.
- Formulate a clear recommendation for the client's strategy.

## Data

There are two datasets provided for this project:
1. `QVI_transaction_data.xlsx` - This dataset contains customer transaction data, including....
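2. `QVI_purchase_behaviour.csv` - This dataset contains customer attributes keyed by `LYLTY_CARD_NBR`: `LIFESTAGE` and `PREMIUM_CUSTOMER`.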

## Analysis

1. Examine transaction data
1. Examine customer data
1. Data analysis and customer segments
1. Define recommendations by customer segment

## Data preparation and customer analytics

In [62]:
# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [63]:
# setting options
pd.set_option('display.max_columns', None)
pd.set_option("display.float_format", "{:.2f}".format)
# None disables column-width truncation so long product names are shown in full
pd.set_option('display.max_colwidth', None)

In [64]:
# load the datasets

transactions = pd.read_excel('QVI_transaction_data.xlsx')
purchase_behaviour = pd.read_csv('QVI_purchase_behaviour.csv')

In [65]:
# shape of datasets
print("Transactions dataset shape:", transactions.shape)
print("Purchase behaviour dataset shape:", purchase_behaviour.shape)

Transactions dataset shape: (264836, 8)
Purchase behaviour dataset shape: (72637, 3)


In [66]:
# inspect the first few rows of the transactions dataset
transactions.head()

|   | DATE | STORE_NBR | LYLTY_CARD_NBR | TXN_ID | PROD_NBR | PROD_NAME | PROD_QTY | TOT_SALES |
|---|---|---|---|---|---|---|---|---|
| 0 | 43390 | 1 | 1000 | 1 | 5 | Natural Chip Compny SeaSalt175g | 2 | 6.0 |
| 1 | 43599 | 1 | 1307 | 348 | 66 | CCs Nacho Cheese 175g | 3 | 6.3 |
| 2 | 43605 | 1 | 1343 | 383 | 61 | Smiths Crinkle Cut Chips Chicken 170g | 2 | 2.9 |
| 3 | 43329 | 2 | 2373 | 974 | 69 | Smiths Chip Thinly S/Cream&Onion 175g | 5 | 15.0 |
| 4 | 43330 | 2 | 2426 | 1038 | 108 | Kettle Tortilla ChpsHny&Jlpno Chili 150g | 3 | 13.8 |


In [79]:
# checking for missing values
transactions.isna().sum()

DATE              0
STORE_NBR         0
LYLTY_CARD_NBR    0
TXN_ID            0
PROD_NBR          0
PROD_NAME         0
PROD_QTY          0
TOT_SALES         0
dtype: int64
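The missing-value and data-type checks can also be combined into a single overview table. The following is a minimal sketch on a small toy frame; the `toy` data and the `overview` column names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the transactions table (values are illustrative only)
toy = pd.DataFrame({
    "DATE": [43390, 43599, 43605],
    "PROD_NAME": ["Natural Chip Compny SeaSalt175g", None, "CCs Nacho Cheese 175g"],
    "TOT_SALES": [6.0, 6.3, np.nan],
})

# One row per column: data type, number of missing values, share of missing values
overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing": toy.isna().sum(),
    "missing_share": toy.isna().mean(),
})
print(overview)
```

On the real data the same three expressions could be applied to `transactions` directly.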

In [67]:
# checking for data types
transactions.dtypes

DATE              int64  
STORE_NBR         int64  
LYLTY_CARD_NBR    int64  
TXN_ID            int64  
PROD_NBR          int64  
PROD_NAME         object 
PROD_QTY          int64  
TOT_SALES         float64
dtype: object

In [68]:
# convert 'DATE' column to datetime format
# Excel's date system starts on 1899-12-30
transactions['DATE'] = pd.to_datetime(transactions['DATE'], origin='1899-12-30', unit='D')
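As a quick sanity check of the conversion, the sketch below applies the same call to three serial numbers taken from the first rows of the `DATE` column. The origin `1899-12-30` rather than `1900-01-01` reflects that Excel's 1900 date system numbers 1900-01-01 as day 1 and also counts a nonexistent 29 February 1900, which shifts the effective origin back by two days for modern dates. The variable names here are illustrative.

```python
import pandas as pd

# Serial numbers taken from the first rows of the DATE column
serials = pd.Series([43390, 43599, 43605])

# Count whole days forward from 1899-12-30, the same origin as in the cell above
converted = pd.to_datetime(serials, unit="D", origin="1899-12-30")
print(converted.dt.strftime("%Y-%m-%d").tolist())
# ['2018-10-17', '2019-05-14', '2019-05-20']
```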

In [69]:
# checking for data types again
transactions.dtypes

DATE              datetime64[ns]
STORE_NBR         int64         
LYLTY_CARD_NBR    int64         
TXN_ID            int64         
PROD_NBR          int64         
PROD_NAME         object        
PROD_QTY          int64         
TOT_SALES         float64       
dtype: object

In [74]:
# inspect the first few rows of transactions after date conversion
transactions.head()

|   | DATE | STORE_NBR | LYLTY_CARD_NBR | TXN_ID | PROD_NBR | PROD_NAME | PROD_QTY | TOT_SALES |
|---|---|---|---|---|---|---|---|---|
| 0 | 2018-10-17 | 1 | 1000 | 1 | 5 | Natural Chip Compny SeaSalt175g | 2 | 6.0 |
| 1 | 2019-05-14 | 1 | 1307 | 348 | 66 | CCs Nacho Cheese 175g | 3 | 6.3 |
| 2 | 2019-05-20 | 1 | 1343 | 383 | 61 | Smiths Crinkle Cut Chips Chicken 170g | 2 | 2.9 |
| 3 | 2018-08-17 | 2 | 2373 | 974 | 69 | Smiths Chip Thinly S/Cream&Onion 175g | 5 | 15.0 |
| 4 | 2018-08-18 | 2 | 2426 | 1038 | 108 | Kettle Tortilla ChpsHny&Jlpno Chili 150g | 3 | 13.8 |


In [80]:
# rename columns for better readability
transactions.rename(columns={'DATE': 'date',
                             'STORE_NBR': 'store_number',
                             'LYLTY_CARD_NBR': 'loyalty_card_number',
                             'TXN_ID': 'transaction_id',
                             'PROD_NBR': 'product_number',
                             'PROD_NAME': 'product_name',
                             'PROD_QTY': 'product_quantity',
                             'TOT_SALES': 'total_sales'}, inplace=True)
# inspect the first few rows after renaming columns
transactions.head()

|   | date | store_number | loyalty_card_number | transaction_id | product_number | product_name | product_quantity | total_sales |
|---|---|---|---|---|---|---|---|---|
| 0 | 2018-10-17 | 1 | 1000 | 1 | 5 | Natural Chip Compny SeaSalt175g | 2 | 6.0 |
| 1 | 2019-05-14 | 1 | 1307 | 348 | 66 | CCs Nacho Cheese 175g | 3 | 6.3 |
| 2 | 2019-05-20 | 1 | 1343 | 383 | 61 | Smiths Crinkle Cut Chips Chicken 170g | 2 | 2.9 |
| 3 | 2018-08-17 | 2 | 2373 | 974 | 69 | Smiths Chip Thinly S/Cream&Onion 175g | 5 | 15.0 |
| 4 | 2018-08-18 | 2 | 2426 | 1038 | 108 | Kettle Tortilla ChpsHny&Jlpno Chili 150g | 3 | 13.8 |
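One possible variation on the rename step keeps the mapping in a dictionary and passes `errors="raise"`, so that a misspelled source column fails immediately instead of being silently skipped. This is only a sketch on toy data; `COLUMN_NAMES` and `toy` are hypothetical names, and only three of the columns are shown.

```python
import pandas as pd

# Hypothetical mapping constant; the names mirror part of the rename above
COLUMN_NAMES = {
    "STORE_NBR": "store_number",
    "LYLTY_CARD_NBR": "loyalty_card_number",
    "TOT_SALES": "total_sales",
}

toy = pd.DataFrame({"STORE_NBR": [1], "LYLTY_CARD_NBR": [1000], "TOT_SALES": [6.0]})

# Returns a new frame; errors="raise" rejects keys that are not existing columns
renamed = toy.rename(columns=COLUMN_NAMES, errors="raise")
print(list(renamed.columns))

# A typo in the source column name now raises KeyError instead of doing nothing
try:
    toy.rename(columns={"STORE_NUMBER": "store_number"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True
print("typo detected:", typo_detected)
```

Returning a new frame instead of using `inplace=True` also leaves the original column names available if an earlier cell is re-run.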


In [78]:
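# count how often each product name appears
# (the column is called 'product_name' after the rename above)
transactions['product_name'].value_counts()

product_name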
Kettle Mozzarella   Basil & Pesto 175g      3304
Kettle Tortilla ChpsHny&Jlpno Chili 150g    3296
Cobs Popd Swt/Chlli &Sr/Cream Chips 110g    3269
Tyrrells Crisps     Ched & Chives 165g      3268
Cobs Popd Sea Salt  Chips 110g              3265
                                            ... 
RRD Pc Sea Salt     165g                    1431
Woolworths Medium   Salsa 300g              1430
NCC Sour Cream &    Garden Chives 175g      1419
French Fries Potato Chips 175g              1418
WW Crinkle Cut      Original 175g           1410
Name: count, Length: 114, dtype: int64
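The product names contain irregular spacing and embed the pack size in the name. A possible cleanup step is sketched below; it assumes that every name ends with the pack size in grams (written with `g` or `G`), which holds for the names shown here but has not been verified for all 114 products. The variable names are illustrative.

```python
import pandas as pd

# Product names copied from the outputs above
names = pd.Series([
    "Natural Chip Compny SeaSalt175g",
    "Kettle Mozzarella   Basil & Pesto 175g",
    "Cobs Popd Sea Salt  Chips 110g",
    "Woolworths Medium   Salsa 300g",
])

# Collapse runs of whitespace into a single space and trim the ends
clean_names = names.str.replace(r"\s+", " ", regex=True).str.strip()

# Assumption: the pack size is the number directly before a trailing "g" or "G"
pack_size = names.str.extract(r"(\d+)\s*[gG]\s*$", expand=False).astype(int)

print(clean_names.tolist())
print(pack_size.tolist())
```

If some names do not match the pattern, `str.extract` returns missing values and the `astype(int)` call fails, which makes exceptions to the assumption visible early.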

In [72]:
# inspect the first few rows of the purchase behaviour dataset
purchase_behaviour.head()

|   | LYLTY_CARD_NBR | LIFESTAGE | PREMIUM_CUSTOMER |
|---|---|---|---|
| 0 | 1000 | YOUNG SINGLES/COUPLES | Premium |
| 1 | 1002 | YOUNG SINGLES/COUPLES | Mainstream |
| 2 | 1003 | YOUNG FAMILIES | Budget |
| 3 | 1004 | OLDER SINGLES/COUPLES | Mainstream |
| 4 | 1005 | MIDAGE SINGLES/COUPLES | Mainstream |
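Both tables share the loyalty card number, so a natural next step toward customer segments is to attach `LIFESTAGE` and `PREMIUM_CUSTOMER` to each transaction. The sketch below shows one way this could be done on toy rows; the toy frames and the card number `9999` are made up for illustration, and the choice of a left join is an assumption rather than the notebook's confirmed method.

```python
import pandas as pd

# Toy rows modelled on the two tables above (values are illustrative only)
transactions_toy = pd.DataFrame({
    "LYLTY_CARD_NBR": [1000, 1002, 1002, 9999],
    "TOT_SALES": [6.0, 2.7, 3.0, 5.1],
})
customers_toy = pd.DataFrame({
    "LYLTY_CARD_NBR": [1000, 1002, 1003],
    "LIFESTAGE": ["YOUNG SINGLES/COUPLES", "YOUNG SINGLES/COUPLES", "YOUNG FAMILIES"],
    "PREMIUM_CUSTOMER": ["Premium", "Mainstream", "Budget"],
})

# Left join keeps every transaction; validate="m:1" raises if a card number
# appears more than once on the customer side
merged = transactions_toy.merge(
    customers_toy, on="LYLTY_CARD_NBR", how="left", validate="m:1", indicator=True
)

# Transactions whose card number has no customer record
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

Because the transactions columns were renamed earlier while the customer columns were not, the real merge would need `left_on='loyalty_card_number'` and `right_on='LYLTY_CARD_NBR'` in place of `on`, or a matching rename of the customer table first.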
