# Day 3: Disney Parks Guest Spending Behavior

You are a data analyst working with the Disney Parks revenue team to understand guest spending patterns across different park experiences. The team wants a comprehensive view of visitor purchasing behavior. Your goal is to uncover insights that can drive personalized marketing strategies.

In [None]:
import pandas as pd
import numpy as np

fct_guest_spending_data = [
  {
    "guest_id": 1,
    "visit_date": "2024-07-05",
    "amount_spent": 50,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-07-06",
    "amount_spent": 30,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-07-10",
    "amount_spent": 20.5,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 4,
    "visit_date": "2024-07-12",
    "amount_spent": 40,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-07-15",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 5,
    "visit_date": "2024-07-20",
    "amount_spent": 60,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 6,
    "visit_date": "2024-07-25",
    "amount_spent": 25,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-03",
    "amount_spent": 55,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-15",
    "amount_spent": 45,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-05",
    "amount_spent": 22,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-20",
    "amount_spent": 38,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 7,
    "visit_date": "2024-08-10",
    "amount_spent": 15,
    "park_experience_type": "Character Meet"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-25",
    "amount_spent": 28,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-27",
    "amount_spent": 32,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-02",
    "amount_spent": 65,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-05",
    "amount_spent": 50,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 9,
    "visit_date": "2024-09-15",
    "amount_spent": 40,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 10,
    "visit_date": "2024-09-20",
    "amount_spent": 70,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-25",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-28",
    "amount_spent": 10,
    "park_experience_type": "Character Meet"
  }
]
fct_guest_spending = pd.DataFrame(fct_guest_spending_data)


## Question 1

What is the average spending per guest per visit for each park experience type during July 2024? Ensure that park experience types with no recorded transactions are shown with an average spending of 0.0. This analysis helps establish baseline spending differences essential for later segmentation.

In [None]:
# Convert date
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# Filter for July 2024 (half-open range, so any timestamp later on July 31 is still included)
july_data = fct_guest_spending[
    (fct_guest_spending['visit_date'] >= '2024-07-01') &
    (fct_guest_spending['visit_date'] < '2024-08-01')
]
print("July data shape:", july_data.shape)

# Group by park_experience_type to compute total amount and total visits
agg_data = july_data.groupby('park_experience_type').agg(
    total_spent=('amount_spent', 'sum'),
    total_visits=('guest_id', 'count')  # each row = 1 visit
).reset_index()

# Calculate avg spending per guest per visit
agg_data['avg_spending_per_visit'] = agg_data['total_spent'] / agg_data['total_visits']

# Ensure park experience types with 0 visits are included
all_experiences = pd.DataFrame(
    fct_guest_spending['park_experience_type'].unique(),
    columns=['park_experience_type']
)
print("All experience types:\n", all_experiences)

# Left join to include all types, fill NaN with 0.0
final_result = all_experiences.merge(agg_data, on='park_experience_type', how='left')
final_result['avg_spending_per_visit'] = final_result['avg_spending_per_visit'].fillna(0.0)

# Display final result
print("Final Output:\n", final_result[['park_experience_type', 'avg_spending_per_visit']])
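As a cross-check on the cell above, the same averages can be obtained more compactly with `mean()` followed by `reindex(..., fill_value=0.0)`. This is a minimal sketch, not the reference solution: the names `july`, `all_types`, and `avg_by_type` are introduced here for illustration, and the rows are the July 2024 records copied from the dataset at the top of the notebook.

```python
import pandas as pd

# July 2024 rows copied from fct_guest_spending_data (illustrative subset)
july = pd.DataFrame({
    "guest_id": [1, 2, 3, 4, 1, 5, 6],
    "amount_spent": [50, 30, 20.5, 40, 35, 60, 25],
    "park_experience_type": [
        "Attraction", "Dining", "Retail", "Entertainment",
        "Dining", "Attraction", "Retail",
    ],
})

# Every experience type that appears anywhere in the full dataset
all_types = ["Attraction", "Dining", "Retail", "Entertainment", "Character Meet"]

# Mean spend per row (one row = one visit), then add missing types as 0.0
avg_by_type = (
    july.groupby("park_experience_type")["amount_spent"]
    .mean()
    .reindex(all_types, fill_value=0.0)
)
print(avg_by_type)
```

"Character Meet" has no July rows, so it only appears in the result because of the `reindex` step.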

## Question 2

For guests who visited our parks more than once in August 2024, what is the difference in spending between their first and their last visit? This investigation, using sequential analysis, will reveal any shifts in guest spending behavior over multiple visits.

In [None]:
# Step 1: Convert visit_date to datetime
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# Step 2: Filter for August 2024
aug_data = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) &
    (fct_guest_spending['visit_date'].dt.month == 8)
]

# Step 3: Total spending per guest per visit (in case of multiple entries per day)
visit_summary = aug_data.groupby(['guest_id', 'visit_date'], as_index=False)['amount_spent'].sum()

# Step 4: Filter guests with more than one visit
visit_counts = visit_summary['guest_id'].value_counts()
repeat_guests = visit_counts[visit_counts > 1].index

repeat_visits = visit_summary[visit_summary['guest_id'].isin(repeat_guests)]

# Step 5: For each guest, calculate difference between last and first visit spending
first_last_diff = (
    repeat_visits.sort_values(['guest_id', 'visit_date'])
    .groupby('guest_id')
    .agg(first_spend=('amount_spent', 'first'),
         last_spend=('amount_spent', 'last'))
    .reset_index()
)

# Step 6: Compute difference
first_last_diff['spend_diff'] = first_last_diff['last_spend'] - first_last_diff['first_spend']

# Step 7: Display result
print(first_last_diff[['guest_id', 'spend_diff']])
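The first-versus-last comparison can also be sketched in a single aggregation that computes the first spend, last spend, and visit count together. This is an illustrative alternative, assuming one row per visit (it skips the same-day roll-up from Step 3). The names `aug`, `summary`, and `repeat` are introduced here, and the rows are the August 2024 records copied from the dataset above.

```python
import pandas as pd

# August 2024 rows copied from fct_guest_spending_data (illustrative subset)
aug = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "visit_date": pd.to_datetime([
        "2024-08-03", "2024-08-15", "2024-08-05", "2024-08-20",
        "2024-08-10", "2024-08-25", "2024-08-27",
    ]),
    "amount_spent": [55, 45, 22, 38, 15, 28, 32],
})

# Sort chronologically so 'first' and 'last' refer to visit order
summary = (
    aug.sort_values(["guest_id", "visit_date"])
    .groupby("guest_id")["amount_spent"]
    .agg(first_spend="first", last_spend="last", visits="count")
)

# Keep only guests with more than one visit, then take last minus first
repeat = summary[summary["visits"] > 1].copy()
repeat["spend_diff"] = repeat["last_spend"] - repeat["first_spend"]
print(repeat["spend_diff"])
```

Guest 7 visited only once in August and is therefore dropped. A negative `spend_diff` means the guest spent less on the last visit than on the first.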

## Question 3

In September 2024, how can guests be categorized into distinct spending segments such as Low, Medium, and High based on their total spending? Use the following thresholds for categorization:

- Low: Includes values from $0 up to, but not including, $50.
- Medium: Includes values from $50 up to, but not including, $100.
- High: Includes values from $100 and above.

Exclude guests who did not make any purchases in the period.

In [None]:
# Step 1: Convert visit_date to datetime
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# Step 2: Filter for September 2024
sep_data = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) &
    (fct_guest_spending['visit_date'].dt.month == 9)
]

# Step 3: Total spending per guest
guest_spending = sep_data.groupby('guest_id', as_index=False)['amount_spent'].sum()
guest_spending.rename(columns={'amount_spent': 'total_spent'}, inplace=True)

# Step 4: Exclude guests with no spending (.copy() avoids a SettingWithCopyWarning in Step 5)
guest_spending = guest_spending[guest_spending['total_spent'] > 0].copy()

# Step 5: Categorize into Low, Medium, High
def categorize_spending(amount):
    if amount < 50:
        return 'Low'
    elif amount < 100:
        return 'Medium'
    else:
        return 'High'

guest_spending['spending_segment'] = guest_spending['total_spent'].apply(categorize_spending)

# Step 6: Display result
print(guest_spending[['guest_id', 'total_spent', 'spending_segment']])
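The Question 3 thresholds are left-closed, right-open intervals, which is what `pd.cut` produces with `right=False`. The sketch below is an alternative to the `categorize_spending` helper, not a replacement the exercise prescribes. The `totals` frame is introduced here and holds the September 2024 per-guest totals computed from the dataset above.

```python
import pandas as pd

# September 2024 totals per guest, derived from fct_guest_spending_data:
# guest 1: 65 + 35, guest 8: 50 + 10, guest 9: 40, guest 10: 70
totals = pd.DataFrame({
    "guest_id": [1, 8, 9, 10],
    "total_spent": [100, 60, 40, 70],
})

# right=False makes the bins [0, 50), [50, 100), [100, inf)
totals["spending_segment"] = pd.cut(
    totals["total_spent"],
    bins=[0, 50, 100, float("inf")],
    labels=["Low", "Medium", "High"],
    right=False,
)
print(totals)
```

Because the first bin starts at 0 inclusive, a total of exactly 0 would be labeled Low, so the `total_spent > 0` filter from Step 4 is still needed before binning.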

Made with ❤️ by [Interview Master](https://www.interviewmaster.ai)