# Day 3: Disney Parks Guest Spending Behavior

You are a data analyst working with the Disney Parks revenue team to understand nuanced guest spending patterns across different park experiences. The team wants to develop a comprehensive view of visitor purchasing behaviors. Your goal is to uncover meaningful insights that can drive personalized marketing strategies.

In [None]:
import pandas as pd
import numpy as np

fct_guest_spending_data = [
  {
    "guest_id": 1,
    "visit_date": "2024-07-05",
    "amount_spent": 50,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-07-06",
    "amount_spent": 30,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-07-10",
    "amount_spent": 20.5,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 4,
    "visit_date": "2024-07-12",
    "amount_spent": 40,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-07-15",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 5,
    "visit_date": "2024-07-20",
    "amount_spent": 60,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 6,
    "visit_date": "2024-07-25",
    "amount_spent": 25,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-03",
    "amount_spent": 55,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-08-15",
    "amount_spent": 45,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-05",
    "amount_spent": 22,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 2,
    "visit_date": "2024-08-20",
    "amount_spent": 38,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 7,
    "visit_date": "2024-08-10",
    "amount_spent": 15,
    "park_experience_type": "Character Meet"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-25",
    "amount_spent": 28,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 3,
    "visit_date": "2024-08-27",
    "amount_spent": 32,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-02",
    "amount_spent": 65,
    "park_experience_type": "Attraction"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-05",
    "amount_spent": 50,
    "park_experience_type": "Retail"
  },
  {
    "guest_id": 9,
    "visit_date": "2024-09-15",
    "amount_spent": 40,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 10,
    "visit_date": "2024-09-20",
    "amount_spent": 70,
    "park_experience_type": "Entertainment"
  },
  {
    "guest_id": 1,
    "visit_date": "2024-09-25",
    "amount_spent": 35,
    "park_experience_type": "Dining"
  },
  {
    "guest_id": 8,
    "visit_date": "2024-09-28",
    "amount_spent": 10,
    "park_experience_type": "Character Meet"
  }
]
fct_guest_spending = pd.DataFrame(fct_guest_spending_data)


## Question 1

What is the average spending per guest per visit for each park experience type during July 2024? Ensure that park experience types with no recorded transactions are shown with an average spending of 0.0. This analysis helps establish baseline spending differences essential for later segmentation.

In [None]:
# Convert 'visit_date' to datetime objects for easy filtering
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# Filter the DataFrame for visits that occurred in July 2024
july_2024_spending = fct_guest_spending[(fct_guest_spending['visit_date'].dt.year == 2024) & (fct_guest_spending['visit_date'].dt.month == 7)]

# Group by park experience type and calculate the average spending
avg_spending_july_2024 = july_2024_spending.groupby('park_experience_type')['amount_spent'].mean()

# Get all unique park experience types from the original table
all_experience_types = fct_guest_spending['park_experience_type'].unique()

# Reindex the result to include all experience types, filling missing ones with 0.0
final_avg_spending = avg_spending_july_2024.reindex(all_experience_types, fill_value=0.0)

# Rename the series for clarity in the output
final_avg_spending.name = "average_spending"

print(final_avg_spending)
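
As a quick sanity check on the reindexing step, the sketch below rebuilds only the July 2024 rows from the table above and confirms that an experience type with no July transactions comes back as 0.0. The names `july_rows`, `all_types`, and `check` are illustrative and not part of the original solution.

```python
import pandas as pd

# July 2024 rows copied from fct_guest_spending (illustrative subset)
july_rows = pd.DataFrame({
    "park_experience_type": [
        "Attraction", "Dining", "Retail", "Entertainment",
        "Dining", "Attraction", "Retail",
    ],
    "amount_spent": [50, 30, 20.5, 40, 35, 60, 25],
})

# Every experience type that appears anywhere in the full table
all_types = ["Attraction", "Dining", "Retail", "Entertainment", "Character Meet"]

# Average per type, then add the missing types with a 0.0 average
check = (
    july_rows.groupby("park_experience_type")["amount_spent"]
    .mean()
    .reindex(all_types, fill_value=0.0)
)
print(check)
```

"Character Meet" has no July rows, so `reindex` adds it with the fill value instead of leaving it out of the result.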

## Question 2

For guests who visited our parks more than once in August 2024, what is the difference in spending between their first and their last visit? This investigation, using sequential analysis, will reveal any shifts in guest spending behavior over multiple visits.

In [None]:
# Ensure 'visit_date' is in datetime format
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# 1. Filter for visits that occurred in August 2024
august_2024_spending = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) & 
    (fct_guest_spending['visit_date'].dt.month == 8)
]

# 2. Isolate guests who visited more than once in the month
multi_visit_df = august_2024_spending.groupby('guest_id').filter(lambda x: len(x) > 1)

# 3. Sort the visits chronologically to identify the first and last
sorted_visits = multi_visit_df.sort_values(by='visit_date')

# Take each guest's earliest and latest visit (first()/last() return the first/last non-null value in each column)
first_visits = sorted_visits.groupby('guest_id').first()
last_visits = sorted_visits.groupby('guest_id').last()

# 4. Calculate the difference in spending
spending_difference = last_visits['amount_spent'] - first_visits['amount_spent']
spending_difference.name = 'spending_difference'

# Display the final result
print(spending_difference)
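
The same first-versus-last comparison can also be done in a single aggregation pass. The sketch below is one possible alternative, not the original method: it rebuilds only the August 2024 rows from the table above, and the names `august_rows`, `summary`, `repeat_guests`, and `diff` are illustrative.

```python
import pandas as pd

# August 2024 rows copied from fct_guest_spending (illustrative subset)
august_rows = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "visit_date": pd.to_datetime([
        "2024-08-03", "2024-08-15", "2024-08-05", "2024-08-20",
        "2024-08-10", "2024-08-25", "2024-08-27",
    ]),
    "amount_spent": [55, 45, 22, 38, 15, 28, 32],
})

# Sort chronologically, then collect first spend, last spend, and visit count per guest
summary = (
    august_rows.sort_values("visit_date")
    .groupby("guest_id")["amount_spent"]
    .agg(first_spend="first", last_spend="last", visits="count")
)

# Keep only guests with more than one August visit
repeat_guests = summary[summary["visits"] > 1]
diff = (repeat_guests["last_spend"] - repeat_guests["first_spend"]).rename("spending_difference")
print(diff)
```

Guest 7 visited only once in August and drops out; a negative value means the guest spent less on the last visit than on the first.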

## Question 3

In September 2024, how can guests be categorized into distinct spending segments such as Low, Medium, and High based on their total spending? Use the following thresholds for categorization:

- Low: from $0 up to, but not including, $50.
- Medium: from $50 up to, but not including, $100.
- High: $100 and above.

Exclude guests who did not make any purchases in the period.

In [None]:
# Convert 'visit_date' to datetime objects for easy filtering
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# 1. Filter for visits that occurred in September 2024
september_2024_spending = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) & 
    (fct_guest_spending['visit_date'].dt.month == 9)
]

# 2. Calculate the total spending for each guest
total_spending_per_guest = september_2024_spending.groupby('guest_id')['amount_spent'].sum()

# 3. Exclude guests who did not make a purchase (keep only totals greater than 0)
guests_with_purchases = total_spending_per_guest[total_spending_per_guest > 0]

# 4. Define the bins and labels for segmentation
bins = [0, 50, 100, float('inf')]
labels = ['Low', 'Medium', 'High']

# Apply segmentation only to guests who made purchases
guest_segments = pd.cut(guests_with_purchases, bins=bins, labels=labels, right=False)
guest_segments.name = 'spending_segment'

# 5. Display the final result
print(guest_segments)
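
Because the thresholds are half-open, the behavior at exactly $50 and $100 is worth checking. The sketch below uses hypothetical totals (not taken from the table) placed on and around the bin edges to show how `pd.cut` with `right=False` assigns them; `totals` and `segments` are illustrative names.

```python
import pandas as pd

# Hypothetical guest totals chosen to sit on and around the bin edges
totals = pd.Series([10.0, 49.99, 50.0, 99.99, 100.0, 250.0], name="total_spent")

# right=False makes each bin include its left edge and exclude its right edge:
# [0, 50) -> Low, [50, 100) -> Medium, [100, inf) -> High
segments = pd.cut(
    totals,
    bins=[0, 50, 100, float("inf")],
    labels=["Low", "Medium", "High"],
    right=False,
)
print(pd.DataFrame({"total_spent": totals, "spending_segment": segments}))
```

This matters for the September data: guest 1 spent 65 + 35 = 100 in total, which lands in High rather than Medium.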

Made with ❤️ by [Interview Master](https://www.interviewmaster.ai)