Day 14 of Python Summer Party

by Interview Master

Starbucks

Loyalty Program's Impact on Transaction Patterns

You are a Business Analyst on the Starbucks Rewards team investigating customer transaction behavior. Your team wants to understand how loyalty program membership influences purchasing patterns. The goal is to compare transaction metrics between loyalty members and non-members.

In [1]:
import pandas as pd
import numpy as np

# Load the datasets and display them
fct_transactions = pd.read_csv('fct_transactions.csv')
dim_customers = pd.read_csv('dim_customers.csv')

fct_transactions_df = fct_transactions.copy()
dim_customers_df = dim_customers.copy()

print(fct_transactions_df.info())
print()
print(fct_transactions_df.head())
print()
print(dim_customers_df.info())
print()
print(dim_customers_df.head())
print()
print("=" * 150)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   customer_id        16 non-null     int64  
 1   transaction_id     16 non-null     int64  
 2   transaction_date   16 non-null     object 
 3   transaction_value  16 non-null     float64
dtypes: float64(1), int64(2), object(1)
memory usage: 644.0+ bytes
None

   customer_id  transaction_id transaction_date  transaction_value
0            1             101       2024-07-05               5.50
1            1             102       2024-07-15               7.25
2            2             103       2024-07-10               4.00
3            3             104       2024-07-20               8.75
4            4             105       2024-07-03               6.50

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 2 columns):
 #   Column             Non-Null

Question 1 of 3

For the month of July 2024, how many transactions did loyalty program members and non-members make? Compare the transaction counts between these two groups.

In [2]:
# Start by merging both dataframes into one for further analysis.
# A right join keeps every customer in dim_customers, including any without transactions.
merged_fct_df = pd.merge(fct_transactions_df, dim_customers_df, how='right', on='customer_id')
print(merged_fct_df.info())
print()
print(merged_fct_df)
print()
print("=" * 150)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   customer_id        16 non-null     int64  
 1   transaction_id     16 non-null     int64  
 2   transaction_date   16 non-null     object 
 3   transaction_value  16 non-null     float64
 4   is_loyalty_member  16 non-null     bool   
dtypes: bool(1), float64(1), int64(2), object(1)
memory usage: 660.0+ bytes
None

    customer_id  transaction_id transaction_date  transaction_value  \
0             1             101       2024-07-05               5.50   
1             1             102       2024-07-15               7.25   
2             1             116       2024-07-31               6.25   
3             2             103       2024-07-10               4.00   
4             3             104       2024-07-20               8.75   
5             3             114       2024-07-30      
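
Because the merge above is a right join, it can be worth checking that no customer appears twice in the customer table and that no customer comes back without a transaction. The sketch below is one way to do that with the `validate` and `indicator` parameters of `pd.merge`. The two small frames are illustrative stand-ins, not the real CSV contents, and the loyalty flags in them are made up.

```python
import pandas as pd

# Toy stand-ins for fct_transactions and dim_customers (values are illustrative only).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "transaction_id": [101, 102, 103, 104],
    "transaction_value": [5.50, 7.25, 4.00, 8.75],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "is_loyalty_member": [True, False, True, False],
})

checked = pd.merge(
    transactions,
    customers,
    how="right",
    on="customer_id",
    validate="many_to_one",  # raises MergeError if customer_id repeats in customers
    indicator=True,          # adds a _merge column: both / left_only / right_only
)

# Customers that exist in the customer table but have no transactions
no_transactions = checked[checked["_merge"] == "right_only"]
print(no_transactions[["customer_id", "is_loyalty_member"]])
```

In this toy data, customer 4 has no transactions, so the right join returns one row for that customer with missing transaction fields. In the notebook's data the merged frame has 16 non-null rows, so no such rows were introduced.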

In [3]:
# Now that we have merged the dataframes, we can start to look at the transaction patterns of our customers
# First, let's convert the 'transaction_date' column to datetime format
merged_fct_df['transaction_date'] = pd.to_datetime(merged_fct_df['transaction_date'], format='%Y-%m-%d', errors='coerce')
print(merged_fct_df.info())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   customer_id        16 non-null     int64         
 1   transaction_id     16 non-null     int64         
 2   transaction_date   16 non-null     datetime64[ns]
 3   transaction_value  16 non-null     float64       
 4   is_loyalty_member  16 non-null     bool          
dtypes: bool(1), datetime64[ns](1), float64(1), int64(2)
memory usage: 660.0 bytes
None
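
With `errors='coerce'`, any value that does not match the format silently becomes `NaT`, so a quick count of missing values after parsing is a cheap sanity check. A minimal sketch, using a made-up series with one deliberately bad value:

```python
import pandas as pd

# Illustrative input; the third value is intentionally malformed.
raw_dates = pd.Series(["2024-07-05", "2024-07-31", "not a date"])

parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# Unparseable values become NaT, which isna() counts as missing
n_unparsed = int(parsed.isna().sum())
print(f"{n_unparsed} value(s) could not be parsed")
```

In the notebook output above, `transaction_date` still shows 16 non-null values after conversion, which indicates that every date parsed.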


In [4]:
# Now let's filter the dataframe to include only transactions from July 2024
jul_fct_df = merged_fct_df[(merged_fct_df['transaction_date'] >= '2024-07-01') & (merged_fct_df['transaction_date'] < '2024-08-01')]
print(jul_fct_df.info())
print()
print(jul_fct_df)
print()
print("=" * 150)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   customer_id        16 non-null     int64         
 1   transaction_id     16 non-null     int64         
 2   transaction_date   16 non-null     datetime64[ns]
 3   transaction_value  16 non-null     float64       
 4   is_loyalty_member  16 non-null     bool          
dtypes: bool(1), datetime64[ns](1), float64(1), int64(2)
memory usage: 660.0 bytes
None

    customer_id  transaction_id transaction_date  transaction_value  \
0             1             101       2024-07-05               5.50   
1             1             102       2024-07-15               7.25   
2             1             116       2024-07-31               6.25   
3             2             103       2024-07-10               4.00   
4             3             104       2024-07-20               8.75  
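
The half-open date range used in the filter above (`>= '2024-07-01'` and `< '2024-08-01'`) can also be written as a calendar-month comparison. The sketch below shows both forms on a few made-up boundary dates and checks that they select the same rows. It is an alternative, not a correction.

```python
import pandas as pd

# Illustrative dates on both sides of the July 2024 boundaries
dates = pd.to_datetime(
    pd.Series(["2024-06-30", "2024-07-01", "2024-07-31", "2024-08-01"])
)

# Form 1: half-open interval, as in the notebook
in_range = (dates >= "2024-07-01") & (dates < "2024-08-01")

# Form 2: compare each date's calendar month to July 2024
in_july = dates.dt.to_period("M") == pd.Period("2024-07", freq="M")

print(dates[in_july])
```

The half-open form also works if the column ever carries a time of day, since `< '2024-08-01'` includes all of July 31.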

In [5]:
# Now we will count how many transactions customers with and without loyalty membership made this month
print("Number of transactions made by members with and without loyalty membership during July 2024:")
print(jul_fct_df['is_loyalty_member'].value_counts())


Number of transactions made by members with and without loyalty membership during July 2024:
is_loyalty_member
True     10
False     6
Name: count, dtype: int64
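
To make the comparison explicit, the counts can also be expressed as shares of all July transactions. The sketch below rebuilds a flag series from the counts printed above (10 member transactions, 6 non-member transactions) rather than from the CSV files, so it only illustrates the arithmetic.

```python
import pandas as pd

# Rebuild the flags from the printed counts: 10 True, 6 False
flags = pd.Series([True] * 10 + [False] * 6, name="is_loyalty_member")

n_members = int(flags.sum())
n_non_members = int((~flags).sum())

# Share of all July transactions made by loyalty members
member_share = n_members / len(flags)
# Member transactions per non-member transaction
ratio = n_members / n_non_members

print(f"Member share of transactions: {member_share:.1%}")
print(f"Member-to-non-member ratio: {ratio:.2f}")
```

Loyalty members account for 10 of the 16 July transactions, or 62.5%.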


Question 2 of 3

What is the average transaction value for loyalty program members and non-members during July 2024? Use this to identify which group has a higher average transaction value.

In [6]:
# Since the data is already filtered for July 2024, we can group by loyalty membership and calculate the average transaction value
jul_avg_txn_value = jul_fct_df.groupby('is_loyalty_member')['transaction_value'].mean().reset_index(name='average_transaction_value').round(2)
print("Average transaction value for members with and without loyalty membership during July 2024:")
print(jul_avg_txn_value)


Average transaction value for members with and without loyalty membership during July 2024:
   is_loyalty_member  average_transaction_value
0              False                       4.92
1               True                       8.80
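
Questions 1 and 2 both group the July data by `is_loyalty_member`, so the count and the average can be produced in a single pass with named aggregation. A minimal sketch on made-up values (the `toy` frame and its numbers are not from the dataset):

```python
import pandas as pd

# Illustrative data only
toy = pd.DataFrame({
    "is_loyalty_member": [True, True, False, False],
    "transaction_value": [6.0, 10.0, 3.0, 5.0],
})

# One row per group, with a count column and a mean column
summary = toy.groupby("is_loyalty_member").agg(
    transaction_count=("transaction_value", "size"),
    average_transaction_value=("transaction_value", "mean"),
)

# Index label (True or False) of the group with the higher average
higher_group = summary["average_transaction_value"].idxmax()

print(summary)
print("Loyalty members have the higher average:", bool(higher_group))
```

Applied to `jul_fct_df`, the same pattern would return both metrics in one table. In the notebook's own result above, members average 8.80 against 4.92 for non-members.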


Question 3 of 3

Determine the percentage difference in average transaction value between loyalty program members and non-members for July 2024.

In [9]:
# Percentage difference relative to non-members: (member average - non-member average) / non-member average * 100
member_avg = jul_avg_txn_value.loc[jul_avg_txn_value['is_loyalty_member'], 'average_transaction_value'].iloc[0]
non_member_avg = jul_avg_txn_value.loc[~jul_avg_txn_value['is_loyalty_member'], 'average_transaction_value'].iloc[0]
percentage_diff = (member_avg - non_member_avg) / non_member_avg * 100
print("Percentage difference in average transaction value between members and non-members during July 2024:")
print(percentage_diff.round(2), "%")



Percentage difference in average transaction value between members and non-members during July 2024:
78.86 %
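
The arithmetic behind this figure can be checked by hand from the two averages printed in Question 2: (8.80 - 4.92) / 4.92 * 100. The helper below is a hypothetical convenience function (its name and the zero-baseline guard are assumptions, not part of the notebook) that uses the non-member average as the baseline.

```python
def percentage_difference(value, baseline):
    """Return how far `value` is above `baseline`, as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100


# Averages as printed in Question 2 (already rounded to two decimals)
member_avg = 8.80
non_member_avg = 4.92

result = percentage_difference(member_avg, non_member_avg)
print(round(result, 2), "%")
```

Because the inputs were rounded to two decimals first, a calculation from the unrounded group means could differ slightly in the last digit.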
