Day 8 of Python Summer Party

by Interview Master

Lululemon

Payment Method Impact on Athleisure Online Sales

You are a Product Analyst for the Lululemon Online Store team investigating how alternative payment methods might influence sales performance. The team wants to understand the potential impact of introducing a new installment payment option. Your analysis will predict sales lift and customer conversion for the proposed payment method.

Question 1 of 3

Between April 1st and June 30th, 2025, what is the count of transactions for each payment method? This analysis will establish the baseline distribution of how customers currently pay.

In [1]:
import pandas as pd
import numpy as np


In [2]:
# Load the CSV file into a DataFrame and display it
fct_transactions = pd.read_csv('fct_transactions.csv')
fct_transactions_df = fct_transactions.copy()
fct_transactions_df


   customer_id  order_value payment_method  transaction_id transaction_date
0          201        250.0    credit_card               1       2025-03-15
1          202         95.0     debit_card               2       2025-03-20
2          203         75.0         paypal               3       2025-03-25
3          204        310.0    credit_card               4       2024-11-10
4          205         65.0         paypal               5       2024-12-05
5          206        265.0    credit_card               6       2024-07-15
6          207        290.0    credit_card               7       2024-08-10
7          208        275.0    credit_card               8       2024-09-05
8          209        280.0    credit_card               9       2024-10-20
9          210         90.0     debit_card              10       2024-10-25


In [3]:
# Displaying data information and statistics
print(fct_transactions_df.info())
print(fct_transactions_df.describe(include='all').T)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customer_id       50 non-null     int64  
 1   order_value       50 non-null     float64
 2   payment_method    50 non-null     object 
 3   transaction_id    50 non-null     int64  
 4   transaction_date  50 non-null     object 
dtypes: float64(1), int64(2), object(2)
memory usage: 2.1+ KB
None
                 count unique          top freq   mean        std    min  \
customer_id       50.0    NaN          NaN  NaN  137.5   35.91728  101.0   
order_value       50.0    NaN          NaN  NaN  204.9  98.736973   65.0   
payment_method      50      3  credit_card   31    NaN        NaN    NaN   
transaction_id    50.0    NaN          NaN  NaN   25.5   14.57738    1.0   
transaction_date    50     46   2025-06-25    2    NaN        NaN    NaN   

                     25%    50%     75%    ma

In [4]:
# First we need to transform the 'transaction_date' column to datetime format
fct_transactions_df['transaction_date'] = pd.to_datetime(fct_transactions_df['transaction_date'], format='%Y-%m-%d')
print(fct_transactions_df.info())



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   customer_id       50 non-null     int64         
 1   order_value       50 non-null     float64       
 2   payment_method    50 non-null     object        
 3   transaction_id    50 non-null     int64         
 4   transaction_date  50 non-null     datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 2.1+ KB
None
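
As an alternative to converting the column after loading, the dates can be parsed while the file is read. The sketch below is illustrative only: it reads a small in-memory sample (the first three rows displayed above) instead of the real fct_transactions.csv, and the names sample_csv and sample_df are placeholders.

```python
import io
import pandas as pd

# In-memory stand-in for fct_transactions.csv (first three rows shown above)
sample_csv = io.StringIO(
    "customer_id,order_value,payment_method,transaction_id,transaction_date\n"
    "201,250.0,credit_card,1,2025-03-15\n"
    "202,95.0,debit_card,2,2025-03-20\n"
    "203,75.0,paypal,3,2025-03-25\n"
)

# parse_dates converts the named column to a datetime dtype during the read
sample_df = pd.read_csv(sample_csv, parse_dates=["transaction_date"])
print(sample_df.dtypes)
```

Parsing at load time removes the separate conversion step. The explicit pd.to_datetime call with a format string, as used in this notebook, is stricter about unexpected date layouts.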


In [5]:
# Now let's filter to transactions between April 1, 2025 and June 30, 2025 (inclusive)
Apr_Jun_df = fct_transactions_df[(fct_transactions_df['transaction_date'] >= '2025-04-01') & (fct_transactions_df['transaction_date'] <= '2025-06-30')]
print(Apr_Jun_df)


    customer_id  order_value payment_method  transaction_id transaction_date
10          101        275.0    credit_card              11       2025-04-02
11          102        285.0    credit_card              12       2025-04-05
12          103        280.0    credit_card              13       2025-04-10
13          104        290.0    credit_card              14       2025-04-15
14          105        270.0    credit_card              15       2025-04-20
15          106        295.0    credit_card              16       2025-04-25
16          107        280.0    credit_card              17       2025-05-01
17          108        275.0    credit_card              18       2025-05-05
18          109        285.0    credit_card              19       2025-05-10
19          110        290.0    credit_card              20       2025-05-15
20          111        270.0    credit_card              21       2025-05-20
21          112        280.0    credit_card              22       2025-05-25
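
One caveat about the filter above: comparing against the string '2025-06-30' means midnight at the start of June 30. That is fine for this dataset, whose dates carry no time component, but it would drop transactions time-stamped later on June 30. The sketch below uses hypothetical timestamps (the demo frame is not from the real data) to show a half-open interval that avoids the issue.

```python
import pandas as pd

# Hypothetical timestamps, including one late on the last day of the quarter
demo = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "transaction_date": pd.to_datetime([
        "2025-03-31 23:59:00",
        "2025-04-01 00:00:00",
        "2025-06-30 18:30:00",
        "2025-07-01 00:00:00",
    ]),
})

start = pd.Timestamp("2025-04-01")
end_exclusive = pd.Timestamp("2025-07-01")

# Closed comparison against '2025-06-30' stops at midnight on June 30
closed = demo[(demo["transaction_date"] >= start)
              & (demo["transaction_date"] <= "2025-06-30")]

# Half-open interval keeps everything up to, but not including, July 1
half_open = demo[(demo["transaction_date"] >= start)
                 & (demo["transaction_date"] < end_exclusive)]

print(closed["transaction_id"].tolist())     # [2]
print(half_open["transaction_id"].tolist())  # [2, 3]
```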

In [6]:
# Now, in order to find out the count of transactions by payment method, we can use the groupby function
fct_transaction_count = Apr_Jun_df.groupby('payment_method').size().reset_index(name='transaction_count')
print(fct_transaction_count)


  payment_method  transaction_count
0    credit_card                 25
1     debit_card                  8
2         paypal                  7


In [7]:
# Answer to question 1: The count of transactions by payment method between April 1, 2025, and June 30, 2025
print("The count of transactions by payment method between April 1, 2025, and June 30, 2025 is as follows:")
print(fct_transaction_count)


The count of transactions by payment method between April 1, 2025, and June 30, 2025 is as follows:
  payment_method  transaction_count
0    credit_card                 25
1     debit_card                  8
2         paypal                  7
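
Since the question frames these counts as a baseline distribution, it can help to express them as shares of all transactions as well. A minimal sketch, re-entering the three counts from the result above by hand (the names counts and share are placeholders):

```python
import pandas as pd

# Counts copied from the Question 1 result above
counts = pd.Series(
    {"credit_card": 25, "debit_card": 8, "paypal": 7},
    name="transaction_count",
)

# Share of all transactions in the period for each payment method
share = (counts / counts.sum()).round(3)
print(share)
```

With these counts, credit cards account for 62.5% of transactions in the period, debit cards for 20%, and PayPal for 17.5%.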


Question 2 of 3

Between April 1st and June 30th, 2025, what is the average order value for each payment method? This metric will help us assess which payment methods are tied to higher spending levels.



In [8]:
# Showing the dataframe again
print(Apr_Jun_df)


    customer_id  order_value payment_method  transaction_id transaction_date
10          101        275.0    credit_card              11       2025-04-02
11          102        285.0    credit_card              12       2025-04-05
12          103        280.0    credit_card              13       2025-04-10
13          104        290.0    credit_card              14       2025-04-15
14          105        270.0    credit_card              15       2025-04-20
15          106        295.0    credit_card              16       2025-04-25
16          107        280.0    credit_card              17       2025-05-01
17          108        275.0    credit_card              18       2025-05-05
18          109        285.0    credit_card              19       2025-05-10
19          110        290.0    credit_card              20       2025-05-15
20          111        270.0    credit_card              21       2025-05-20
21          112        280.0    credit_card              22       2025-05-25

In [9]:
# To answer question 2, we can use the groupby function again, but this time we will calculate the mean of the 'order_value' column for each payment method
fct_avg_order_value = Apr_Jun_df.groupby('payment_method')['order_value'].mean().reset_index(name='average_order_value')
print(fct_avg_order_value)


  payment_method  average_order_value
0    credit_card                281.6
1     debit_card                 90.0
2         paypal                 70.0


In [10]:
# We can also merge the average order value into the transaction count dataframe
fct_transaction_summary = pd.merge(fct_transaction_count, fct_avg_order_value, on='payment_method')
print("The transaction summary of transaction count and average order value between April 1, 2025, and June 30, 2025 is as follows:")
print(fct_transaction_summary)


The transaction summary of transaction count and average order value between April 1, 2025, and June 30, 2025 is as follows:
  payment_method  transaction_count  average_order_value
0    credit_card                 25                281.6
1     debit_card                  8                 90.0
2         paypal                  7                 70.0
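
The same summary can also be produced in a single groupby using named aggregation, which avoids the separate merge. The sketch below runs on a small hypothetical frame (demo and its values are invented for illustration), not on the real transactions.

```python
import pandas as pd

# Hypothetical transactions, for illustration only
demo = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5],
    "payment_method": ["credit_card", "credit_card", "debit_card", "paypal", "paypal"],
    "order_value": [280.0, 290.0, 90.0, 65.0, 75.0],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = (
    demo.groupby("payment_method")
        .agg(
            transaction_count=("transaction_id", "count"),
            average_order_value=("order_value", "mean"),
        )
        .reset_index()
)
print(summary)
```

Either approach gives one row per payment method. The merge version keeps the two intermediate results available separately, which this notebook reuses when answering each question.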


In [11]:
# Answer to question 2: The average order value for each payment method between April 1, 2025, and June 30, 2025
print("The average order value for each payment method between April 1, 2025, and June 30, 2025 is as follows:")
print(fct_avg_order_value)


The average order value for each payment method between April 1, 2025, and June 30, 2025 is as follows:
  payment_method  average_order_value
0    credit_card                281.6
1     debit_card                 90.0
2         paypal                 70.0


Question 3 of 3

Between April 1st and June 30th, 2025, what would be the predicted sales lift if a 'pay over time' option were introduced? Assume that 20% of credit card transactions during this period would switch to the 'pay over time' option, and that the order value of these switched transactions would increase by 15% over the average order value of all credit card transactions in the same period.

In [12]:
# Printing dataframe again
print(Apr_Jun_df)


    customer_id  order_value payment_method  transaction_id transaction_date
10          101        275.0    credit_card              11       2025-04-02
11          102        285.0    credit_card              12       2025-04-05
12          103        280.0    credit_card              13       2025-04-10
13          104        290.0    credit_card              14       2025-04-15
14          105        270.0    credit_card              15       2025-04-20
15          106        295.0    credit_card              16       2025-04-25
16          107        280.0    credit_card              17       2025-05-01
17          108        275.0    credit_card              18       2025-05-05
18          109        285.0    credit_card              19       2025-05-10
19          110        290.0    credit_card              20       2025-05-15
20          111        270.0    credit_card              21       2025-05-20
21          112        280.0    credit_card              22       2025-05-25

In [13]:
# The question is asking for the predicted sales lift if a 'pay over time' option were introduced.
# We know that 20% of credit card transactions during this period would switch to using the 'pay over time' option.
# And that for these switched transactions, the order value is expected to increase by 15% based on the average order value of all credit card transactions in that same time period.
# Since fct_transaction_summary contains the average order value for each payment method, and the transaction count, we can use this information to calculate the predicted sales lift.
Apr_Jun_fct_trans_summary = fct_transaction_summary
print("The transaction summary of transaction count and average order value between April 1, 2025, and June 30, 2025 is as follows:")
print(Apr_Jun_fct_trans_summary)



The transaction summary of transaction count and average order value between April 1, 2025, and June 30, 2025 is as follows:
  payment_method  transaction_count  average_order_value
0    credit_card                 25                281.6
1     debit_card                  8                 90.0
2         paypal                  7                 70.0


In [14]:
# We can start by defining the switch rate and the uplift rate
switch_frac = 0.20
uplift = 0.15

credit_card_row = Apr_Jun_fct_trans_summary.query("payment_method == 'credit_card'").iloc[0]
print(credit_card_row)



payment_method         credit_card
transaction_count               25
average_order_value          281.6
Name: 0, dtype: object


In [15]:
# New 'pay over time' row predicted
pay_over_time_predicted = {
    'payment_method': 'pay_over_time_predicted',
    'transaction_count' : int(round(credit_card_row['transaction_count'] * switch_frac)),
    'average_order_value': credit_card_row['average_order_value'] * (1+uplift)
}

# Adjust credit card row (remaining 80% of txns; AOV stays the same as per assumption)
adjusted_credit_card_row = {
    "payment_method": "credit_card_adjusted",
    "transaction_count": int(round(credit_card_row["transaction_count"] * (1 - switch_frac))),
    "average_order_value": credit_card_row["average_order_value"],
}

# Now we can create a new dataframe that includes the original summary, the predicted 'pay over time' row, and the adjusted credit card row
predicted_df = pd.concat(
    [
        Apr_Jun_fct_trans_summary,                           # original summary
        pd.DataFrame([pay_over_time_predicted, adjusted_credit_card_row]) # scenario additions
    ],
    ignore_index=True
)

print(predicted_df)



            payment_method  transaction_count  average_order_value
0              credit_card                 25               281.60
1               debit_card                  8                90.00
2                   paypal                  7                70.00
3  pay_over_time_predicted                  5               323.84
4     credit_card_adjusted                 20               281.60


In [16]:
# Finally, we can calculate the total sales for each payment method by multiplying the transaction count by the average order value
predicted_df["total_sales"] = predicted_df["transaction_count"] * predicted_df["average_order_value"]
print(predicted_df)
print()


# Drop the original credit card row, which the adjusted row and the predicted 'pay over time' row replace
final_prediction_df = predicted_df[
    predicted_df["payment_method"] != "credit_card"
]
print(final_prediction_df)


            payment_method  transaction_count  average_order_value  total_sales
0              credit_card                 25               281.60       7040.0
1               debit_card                  8                90.00        720.0
2                   paypal                  7                70.00        490.0
3  pay_over_time_predicted                  5               323.84       1619.2
4     credit_card_adjusted                 20               281.60       5632.0

            payment_method  transaction_count  average_order_value  total_sales
1               debit_card                  8                90.00        720.0
2                   paypal                  7                70.00        490.0
3  pay_over_time_predicted                  5               323.84       1619.2
4     credit_card_adjusted                 20               281.60       5632.0


In [17]:
# Answer to question 3: The predicted sales lift and predicted conversion are as follows:
# Get the predicted conversion (number of txns switched)
predicted_conversion = pay_over_time_predicted["transaction_count"]

# Compute the baseline revenue of those txns if they stayed as credit card
baseline_revenue = predicted_conversion * credit_card_row["average_order_value"]

# Compute the new revenue under 'pay over time'
new_revenue = predicted_conversion * pay_over_time_predicted["average_order_value"]

# Sales lift is the difference
predicted_sales_lift = new_revenue - baseline_revenue

print(f"Predicted conversion (transactions switched): {predicted_conversion}")
print(f"Predicted sales lift (extra revenue): ${predicted_sales_lift:.2f}")


Predicted conversion (transactions switched): 5
Predicted sales lift (extra revenue): $211.20
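
The calculation above reduces to switched transactions × credit card average order value × uplift, that is 5 × 281.6 × 0.15 = 211.20. As a quick cross-check, the sketch below wraps that arithmetic in a small helper. The function name predicted_sales_lift is hypothetical and is not part of the notebook.

```python
def predicted_sales_lift(transaction_count, average_order_value,
                         switch_frac=0.20, uplift=0.15):
    """Extra revenue from the transactions assumed to switch (hypothetical helper)."""
    # Number of credit card transactions assumed to switch to 'pay over time'
    switched = int(round(transaction_count * switch_frac))
    # Each switched transaction gains uplift * average order value
    return switched * average_order_value * uplift


lift = predicted_sales_lift(25, 281.6)
print(f"Predicted sales lift: ${lift:.2f}")  # Predicted sales lift: $211.20
```

One thing to watch when reusing this with other counts: Python's round() rounds exact halves to the nearest even integer, so a switched count of 12.5 would become 12 rather than 13. With 25 credit card transactions the product is exactly 5, so rounding does not affect the result here.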

