<a href="https://colab.research.google.com/github/AnamHJ24/datascience-python-challenges/blob/main/notebooks/Day3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Day 3 - Disney

You are a data analyst working with the **Disney** Parks revenue team to understand nuanced
guest spending patterns across different park experiences. The team wants to develop a
comprehensive view of visitor purchasing behaviors. Your goal is to uncover meaningful
insights that can drive personalized marketing strategies.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np

# Import data file
url = "https://raw.githubusercontent.com/AnamHJ24/datascience-python-challenges/refs/heads/main/Data/Day3.txt"
fct_guest_spending = pd.read_csv(url)

fct_guest_spending.head()

Unnamed: 0,guest_id,visit_date,amount_spent,park_experience_type
0,1,2024-07-05,50.0,Attraction
1,2,2024-07-06,30.0,Dining
2,3,2024-07-10,20.5,Retail
3,4,2024-07-12,40.0,Entertainment
4,1,2024-07-15,35.0,Dining


## Question 1
What is the average spending per guest per visit for each park experience type during July 2024?
Ensure that park experience types with no recorded transactions are shown with an average spending
of 0.0. This analysis helps establish baseline spending differences essential for later segmentation.

## Solution


In [7]:
# Convert required column to datetime
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'])

# Filter July 2024 data
july = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) &
    (fct_guest_spending['visit_date'].dt.month == 7)]

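# Collect every experience type in the full dataset, so types with no July rows can still be reported
all_experiences = fct_guest_spending['park_experience_type'].unique()

# Total each guest's spending per visit date and experience type, then average those totals by type
spending_per_exp = july.groupby(['guest_id', 'visit_date', 'park_experience_type'])['amount_spent'].sum().reset_index()
avg_spending = spending_per_exp.groupby('park_experience_type')['amount_spent'].mean()

# Add experience types that have no July transactions, with an average of 0.0
final_spending = avg_spending.reindex(all_experiences, fill_value=0.0)

# Format as currency strings for display
formatted_spending = final_spending.apply(lambda x: f"${x:,.2f}")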
print("Average spending per guest per visit for each park experience type during July 2024:\n")
print(formatted_spending)

Average spending per guest per visit for each park experience type during July 2024:

park_experience_type
Attraction        $55.00
Dining            $32.50
Retail            $22.75
Entertainment     $40.00
Character Meet     $0.00
Name: amount_spent, dtype: object
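
The zero-fill step can be checked in isolation on a tiny hand-built frame. The sketch below is illustrative only: the `toy` data is hypothetical and not taken from the challenge dataset. It shows that a type with rows only outside July is missing from the grouped result until `reindex` adds it back with `fill_value=0.0`.

```python
import pandas as pd

# Hypothetical toy data (an assumption for illustration, not the challenge dataset)
toy = pd.DataFrame({
    "guest_id": [1, 1, 2, 3],
    "visit_date": pd.to_datetime(["2024-07-05", "2024-07-05", "2024-07-06", "2024-08-01"]),
    "amount_spent": [20.0, 30.0, 10.0, 99.0],
    "park_experience_type": ["Dining", "Dining", "Dining", "Retail"],
})

# Every type seen anywhere in the data, including months outside July
all_types = toy["park_experience_type"].unique()

july = toy[(toy["visit_date"].dt.year == 2024) & (toy["visit_date"].dt.month == 7)]

# Sum per guest, visit date, and type first, then average those totals per type
per_visit = july.groupby(["guest_id", "visit_date", "park_experience_type"])["amount_spent"].sum()
avg_by_type = per_visit.groupby(level="park_experience_type").mean()

# "Retail" has no July rows, so it is absent here; reindex adds it back as 0.0
avg_by_type = avg_by_type.reindex(all_types, fill_value=0.0)
print(avg_by_type)
```

Note that this approach only zero-fills types that appear somewhere in the data. A type that never occurs in the file at all would have to come from a separate reference list.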


## Question 2
For guests who visited our parks more than once in August 2024, what is the difference in spending
between their first and their last visit? This investigation, using sequential analysis, will reveal any shifts
in guest spending behavior over multiple visits.

## Solution

In [11]:
# Filter August 2024 data
aug_2024 = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) &
    (fct_guest_spending['visit_date'].dt.month == 8)]

# Count distinct visit dates for each guest
visit_count = aug_2024.groupby('guest_id')['visit_date'].nunique()

# Keep only guests with more than one distinct visit date
repeat_guests = visit_count[visit_count > 1].index
repeat_guests_data = aug_2024[aug_2024['guest_id'].isin(repeat_guests)]

# Total spending per guest per visit date
spending = repeat_guests_data.groupby(['guest_id', 'visit_date'])['amount_spent'].sum().reset_index()

# Keep each guest's earliest and latest visit
first_visit = spending.sort_values('visit_date').drop_duplicates('guest_id', keep='first')
last_visit = spending.sort_values('visit_date').drop_duplicates('guest_id', keep='last')
result = pd.merge(first_visit[['guest_id', 'amount_spent']],
                  last_visit[['guest_id', 'amount_spent']],
                  on='guest_id', suffixes=('_first', '_last'))

# Difference is first minus last: a positive value means the guest spent less on the last visit
result['Spending Diff'] = result['amount_spent_first'] - result['amount_spent_last']
print("The difference in spending between their first and their last visit:\n")
print(result)




The difference in spending between their first and their last visit:

   guest_id  amount_spent_first  amount_spent_last  Spending Diff
0         1                55.0               45.0           10.0
1         2                22.0               38.0          -16.0
2         3                28.0               32.0           -4.0


## Question 3
In September 2024, how can guests be categorized into distinct spending segments such as Low,
Medium, and High based on their total spending? Use the following thresholds for categorization:

*   **Low**: Includes values from \$0 up to, but not including, \$50.
*   **Medium**: Includes values from \$50 up to, but not including, \$100.
*   **High**: Includes values of \$100 and above.


Exclude guests who did not make any purchases in the period.
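
The boundary behavior of these thresholds can be sanity-checked before touching the real data. The sketch below runs `pd.cut` with `right=False` on a few hypothetical totals chosen to sit on or next to the edges. The values are assumptions for illustration, not guest data.

```python
import pandas as pd

# Hypothetical totals placed on and around the segment boundaries
totals = pd.Series([0.01, 49.99, 50.0, 99.99, 100.0, 250.0])

# right=False makes each bin left-closed: [0, 50), [50, 100), [100, inf)
segments = pd.cut(
    totals,
    bins=[0, 50, 100, float("inf")],
    labels=["Low", "Medium", "High"],
    right=False,
)

print(pd.DataFrame({"total": totals, "segment": segments}))
```

Exactly \$50 lands in Medium and exactly \$100 lands in High, which matches the "up to, but not including" wording of the thresholds.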


## Solution

In [14]:
# Filter September 2024 data
sept_2024 = fct_guest_spending[
    (fct_guest_spending['visit_date'].dt.year == 2024) &
    (fct_guest_spending['visit_date'].dt.month == 9)]

# Calculate each guest's total spending in September
total_spending = sept_2024.groupby('guest_id')['amount_spent'].sum().reset_index()

# Exclude guests with no purchases (total of 0)
total_spending = total_spending[total_spending['amount_spent'] > 0]

# Assign segments; right=False makes bins left-closed: [0, 50), [50, 100), [100, inf)
bins = [0, 50, 100, float('inf')]
labels = ['Low', 'Medium', 'High']
total_spending['Spending Segment'] = pd.cut(total_spending['amount_spent'],
                                            bins=bins, labels=labels, right=False)

print(total_spending['Spending Segment'].value_counts())
print("\nLOW SPENDERS:")
print(total_spending[total_spending['Spending Segment'] == 'Low'].head())
print("\nMEDIUM SPENDERS:")
print(total_spending[total_spending['Spending Segment'] == 'Medium'].head())
print("\nHIGH SPENDERS:")
print(total_spending[total_spending['Spending Segment'] == 'High'].head())

Spending Segment
Medium    2
Low       1
High      1
Name: count, dtype: int64

LOW SPENDERS:
   guest_id  amount_spent Spending Segment
2         9          40.0              Low

MEDIUM SPENDERS:
   guest_id  amount_spent Spending Segment
1         8          60.0           Medium
3        10          70.0           Medium

HIGH SPENDERS:
   guest_id  amount_spent Spending Segment
0         1         100.0             High
