Hey there! 👋 Welcome to Day 3 of the Python Party! 🎉

I'm Dawn, and I'll be your coding companion today. We're going to work through some fun Python challenges in data analysis with pandas.

Let's dive into today's challenge!

You are a data analyst working with the Disney Parks revenue team to understand nuanced guest spending patterns across different park experiences. The team wants to develop a comprehensive view of visitor purchasing behaviors. Your goal is to uncover meaningful insights that can drive personalized marketing strategies.

In [1]:
import pandas as pd
import numpy as np

In [2]:
fct_guest_spending = pd.read_csv(r"C:\Users\abarr\Desktop\ironhack\extra\python summer party\day3\guests.csv")
fct_guest_spending

,guest_id,visit_date,amount_spent,park_experience_type
0,1,2024-07-05,50.0,Attraction
1,2,2024-07-06,30.0,Dining
2,3,2024-07-10,20.5,Retail
3,4,2024-07-12,40.0,Entertainment
4,1,2024-07-15,35.0,Dining
5,5,2024-07-20,60.0,Attraction
6,6,2024-07-25,25.0,Retail
7,1,2024-08-03,55.0,Attraction
8,1,2024-08-15,45.0,Dining
9,2,2024-08-05,22.0,Retail


In [3]:
fct_guest_spending.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   guest_id              19 non-null     int64  
 1   visit_date            19 non-null     object 
 2   amount_spent          19 non-null     float64
 3   park_experience_type  19 non-null     object 
dtypes: float64(1), int64(1), object(2)
memory usage: 740.0+ bytes


In [4]:
fct_guest_spending['visit_date'] = pd.to_datetime(fct_guest_spending['visit_date'], format='%Y-%m-%d')

In [9]:
all_experiences = fct_guest_spending['park_experience_type'].unique()
all_experiences

array(['Attraction', 'Dining', 'Retail', 'Entertainment',
       'Character Meet'], dtype=object)

### 1
What is the average spending per guest per visit for each park experience type during July 2024? Ensure that park experience types with no recorded transactions are shown with an average spending of 0.0. This analysis helps establish baseline spending differences essential for later segmentation.

In [6]:
july24 = fct_guest_spending[(fct_guest_spending['visit_date'].dt.month == 7) & (fct_guest_spending['visit_date'].dt.year == 2024)]
july24

,guest_id,visit_date,amount_spent,park_experience_type
0,1,2024-07-05,50.0,Attraction
1,2,2024-07-06,30.0,Dining
2,3,2024-07-10,20.5,Retail
3,4,2024-07-12,40.0,Entertainment
4,1,2024-07-15,35.0,Dining
5,5,2024-07-20,60.0,Attraction
6,6,2024-07-25,25.0,Retail


In [7]:
avg_spent = july24.groupby('park_experience_type')['amount_spent'].mean()
avg_spent

park_experience_type
Attraction       55.00
Dining           32.50
Entertainment    40.00
Retail           22.75
Name: amount_spent, dtype: float64

In [10]:
avg_spent_all = avg_spent.reindex(all_experiences, fill_value=0.0)
print(avg_spent_all)

park_experience_type
Attraction        55.00
Dining            32.50
Retail            22.75
Entertainment     40.00
Character Meet     0.00
Name: amount_spent, dtype: float64
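The three July steps above can be chained into one expression. A minimal sketch using the seven July rows shown earlier; the list `experience_types` is hard-coded here as an assumption, whereas the notebook derives it from `unique()` over the whole dataset:

```python
import pandas as pd

# The seven July 2024 rows shown above
july = pd.DataFrame({
    "guest_id": [1, 2, 3, 4, 1, 5, 6],
    "amount_spent": [50.0, 30.0, 20.5, 40.0, 35.0, 60.0, 25.0],
    "park_experience_type": ["Attraction", "Dining", "Retail", "Entertainment",
                             "Dining", "Attraction", "Retail"],
})

# Assumption: the full list of experience types is known up front
experience_types = ["Attraction", "Dining", "Retail", "Entertainment", "Character Meet"]

avg_spent_all = (
    july.groupby("park_experience_type")["amount_spent"]
        .mean()
        .reindex(experience_types, fill_value=0.0)  # types with no July rows get 0.0
)
print(avg_spent_all)
```

Because `reindex` only knows about the labels it is given, a type that never appears anywhere in the data would still be missing unless it is listed explicitly.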


### 2
For guests who visited our parks more than once in August 2024, what is the difference in spending between their first and their last visit? This investigation, using sequential analysis, will reveal any shifts in guest spending behavior over multiple visits.

In [12]:
aug2024 = fct_guest_spending[(fct_guest_spending['visit_date'].dt.month == 8) & (fct_guest_spending['visit_date'].dt.year == 2024)].copy()
aug2024

,guest_id,visit_date,amount_spent,park_experience_type
7,1,2024-08-03,55.0,Attraction
8,1,2024-08-15,45.0,Dining
9,2,2024-08-05,22.0,Retail
10,2,2024-08-20,38.0,Entertainment
11,7,2024-08-10,15.0,Character Meet
12,3,2024-08-25,28.0,Retail
13,3,2024-08-27,32.0,Dining


In [17]:
aug2024_sorted = aug2024.sort_values(['guest_id','visit_date'])
aug2024_sorted

,guest_id,visit_date,amount_spent,park_experience_type
7,1,2024-08-03,55.0,Attraction
8,1,2024-08-15,45.0,Dining
9,2,2024-08-05,22.0,Retail
10,2,2024-08-20,38.0,Entertainment
12,3,2024-08-25,28.0,Retail
13,3,2024-08-27,32.0,Dining
11,7,2024-08-10,15.0,Character Meet


In [20]:
aug2024['rank'] = aug2024.groupby('guest_id')['visit_date'].rank().astype(int)
aug2024


,guest_id,visit_date,amount_spent,park_experience_type,rank
7,1,2024-08-03,55.0,Attraction,1
8,1,2024-08-15,45.0,Dining,2
9,2,2024-08-05,22.0,Retail,1
10,2,2024-08-20,38.0,Entertainment,2
11,7,2024-08-10,15.0,Character Meet,1
12,3,2024-08-25,28.0,Retail,1
13,3,2024-08-27,32.0,Dining,2
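Assigning a new column to a frame produced by boolean indexing is what triggers pandas' `SettingWithCopyWarning` in these cells. A minimal sketch, assuming a small sample frame: taking an explicit `.copy()` of the filtered rows makes the new frame independent of the original, and `rank(method='first')` keeps ranks integral even if a guest had two visits on the same date (the default `method='average'` would give both visits 1.5, which `astype(int)` silently truncates to 1):

```python
import pandas as pd

# Illustrative sample rows (not the full CSV)
df = pd.DataFrame({
    "guest_id": [1, 1, 2, 7],
    "visit_date": pd.to_datetime(["2024-08-03", "2024-08-15", "2024-08-05", "2024-08-10"]),
})

# .copy() makes the filtered frame independent of df before we add a column
aug = df[df["visit_date"].dt.month == 8].copy()

# method="first" gives tied dates distinct ranks, so the integer cast is safe
aug["rank"] = aug.groupby("guest_id")["visit_date"].rank(method="first").astype(int)
print(aug)
```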


In [22]:
aug2024['prev_visit_spend'] = aug2024.groupby('guest_id')['amount_spent'].shift(1)
aug2024


,guest_id,visit_date,amount_spent,park_experience_type,rank,prev_visit_spend
7,1,2024-08-03,55.0,Attraction,1,
8,1,2024-08-15,45.0,Dining,2,55.0
9,2,2024-08-05,22.0,Retail,1,
10,2,2024-08-20,38.0,Entertainment,2,22.0
11,7,2024-08-10,15.0,Character Meet,1,
12,3,2024-08-25,28.0,Retail,1,
13,3,2024-08-27,32.0,Dining,2,28.0


In [23]:
aug2024['spending_diff'] = aug2024['prev_visit_spend'] - aug2024['amount_spent']
aug2024


,guest_id,visit_date,amount_spent,park_experience_type,rank,prev_visit_spend,spending_diff
7,1,2024-08-03,55.0,Attraction,1,,
8,1,2024-08-15,45.0,Dining,2,55.0,10.0
9,2,2024-08-05,22.0,Retail,1,,
10,2,2024-08-20,38.0,Entertainment,2,22.0,-16.0
11,7,2024-08-10,15.0,Character Meet,1,,
12,3,2024-08-25,28.0,Retail,1,,
13,3,2024-08-27,32.0,Dining,2,28.0,-4.0


In [25]:
second_visits = aug2024.dropna()
second_visits

,guest_id,visit_date,amount_spent,park_experience_type,rank,prev_visit_spend,spending_diff
8,1,2024-08-15,45.0,Dining,2,55.0,10.0
10,2,2024-08-20,38.0,Entertainment,2,22.0,-16.0
13,3,2024-08-27,32.0,Dining,2,28.0,-4.0


In [28]:
second_visits_clean = second_visits[['guest_id','spending_diff']]
second_visits_clean

,guest_id,spending_diff
8,1,10.0
10,2,-16.0
13,3,-4.0
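`GroupBy.diff` computes a consecutive-visit change in one call. Its sign convention is current minus previous, the opposite of `prev_visit_spend - amount_spent` used above. A minimal sketch on the August rows shown earlier; `visit_change` is an illustrative column name:

```python
import pandas as pd

# The seven August 2024 rows shown above
aug = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "visit_date": pd.to_datetime(["2024-08-03", "2024-08-15", "2024-08-05", "2024-08-20",
                                  "2024-08-10", "2024-08-25", "2024-08-27"]),
    "amount_spent": [55.0, 45.0, 22.0, 38.0, 15.0, 28.0, 32.0],
})

# Sort so each guest's visits are in date order, then take current minus previous
aug = aug.sort_values(["guest_id", "visit_date"])
aug["visit_change"] = aug.groupby("guest_id")["amount_spent"].diff()

# Each guest's first visit has no previous visit, so its change is NaN
print(aug.dropna(subset=["visit_change"])[["guest_id", "visit_change"]])
```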


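### Correction: the cells above compare consecutive visits (previous minus current spend), but the question asks for the change from each guest's first visit to their last visit (last minus first), so we recompute it below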

In [32]:
aug24 = fct_guest_spending[(fct_guest_spending['visit_date'].dt.month == 8) & (fct_guest_spending['visit_date'].dt.year == 2024)].copy()
aug24['visit_number'] = aug24.groupby('guest_id')['visit_date'].rank(method='first').astype(int)
aug24


,guest_id,visit_date,amount_spent,park_experience_type,visit_number
7,1,2024-08-03,55.0,Attraction,1
8,1,2024-08-15,45.0,Dining,2
9,2,2024-08-05,22.0,Retail,1
10,2,2024-08-20,38.0,Entertainment,2
11,7,2024-08-10,15.0,Character Meet,1
12,3,2024-08-25,28.0,Retail,1
13,3,2024-08-27,32.0,Dining,2


In [43]:
guests_multiple_visits = aug24.groupby('guest_id')['visit_number'].max()
guests_to_keep = guests_multiple_visits[guests_multiple_visits > 1].index

# Filter the dataframe to keep only these guests
df_multiple_visits = aug24[aug24['guest_id'].isin(guests_to_keep)]
df_multiple_visits

,guest_id,visit_date,amount_spent,park_experience_type,visit_number
7,1,2024-08-03,55.0,Attraction,1
8,1,2024-08-15,45.0,Dining,2
9,2,2024-08-05,22.0,Retail,1
10,2,2024-08-20,38.0,Entertainment,2
12,3,2024-08-25,28.0,Retail,1
13,3,2024-08-27,32.0,Dining,2
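An alternative to taking the maximum `visit_number` is to broadcast each guest's visit count onto every row with `transform('size')` and filter on it. A minimal sketch on the August rows shown earlier (dates are omitted because they are not needed to count visits); `visits_per_guest` and `multi` are illustrative names:

```python
import pandas as pd

# Guest IDs and spend from the August 2024 rows shown above
aug = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "amount_spent": [55.0, 45.0, 22.0, 38.0, 15.0, 28.0, 32.0],
})

# transform("size") returns each guest's row count, aligned to every original row
visits_per_guest = aug.groupby("guest_id")["amount_spent"].transform("size")

# Keep only rows belonging to guests with more than one visit
multi = aug[visits_per_guest > 1]
print(multi)
```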


In [44]:
last_visit_idx = df_multiple_visits.groupby('guest_id')['visit_number'].idxmax()

In [45]:
last_visits = df_multiple_visits.loc[last_visit_idx]

In [46]:
first_visits = df_multiple_visits[df_multiple_visits['visit_number'] == 1]

In [47]:
comparison = first_visits[['guest_id', 'amount_spent']].merge(last_visits[['guest_id', 'amount_spent']], on='guest_id', suffixes=('_first', '_last'))
comparison

,guest_id,amount_spent_first,amount_spent_last
0,1,55.0,45.0
1,2,22.0,38.0
2,3,28.0,32.0


In [48]:
comparison['spending_diff'] = comparison['amount_spent_last'] - comparison['amount_spent_first']
print(comparison)

   guest_id  amount_spent_first  amount_spent_last  spending_diff
0         1                55.0               45.0          -10.0
1         2                22.0               38.0           16.0
2         3                28.0               32.0            4.0
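After sorting by guest and date, `GroupBy.first` and `GroupBy.last` return each guest's earliest and latest spend directly, so the `idxmax` and merge steps can be collapsed. A minimal sketch on the August rows shown earlier; `spend`, `summary`, and `n_visits` are illustrative names. Note that `first()` and `last()` return the first and last non-null value in each group:

```python
import pandas as pd

# The seven August 2024 rows shown above
aug = pd.DataFrame({
    "guest_id": [1, 1, 2, 2, 7, 3, 3],
    "visit_date": pd.to_datetime(["2024-08-03", "2024-08-15", "2024-08-05", "2024-08-20",
                                  "2024-08-10", "2024-08-25", "2024-08-27"]),
    "amount_spent": [55.0, 45.0, 22.0, 38.0, 15.0, 28.0, 32.0],
})

# Sort so that "first" and "last" within each guest follow visit order
spend = aug.sort_values(["guest_id", "visit_date"]).groupby("guest_id")["amount_spent"]

summary = pd.DataFrame({
    "amount_spent_first": spend.first(),
    "amount_spent_last": spend.last(),
    "n_visits": spend.size(),
})

# Keep guests with more than one August visit, then compute last minus first
summary = summary[summary["n_visits"] > 1].assign(
    spending_diff=lambda d: d["amount_spent_last"] - d["amount_spent_first"]
)
print(summary)
```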


### 3
In September 2024, how can guests be categorized into distinct spending segments such as Low, Medium, and High based on their total spending? Use the following thresholds for categorization:
- Low: includes values from $0 up to, but not including, $50.
- Medium: includes values from $50 up to, but not including, $100.
- High: includes values from $100 and above.
Exclude guests who did not make any purchases in the period.

In [50]:
sep24 = fct_guest_spending[(fct_guest_spending['visit_date'].dt.month == 9) & (fct_guest_spending['visit_date'].dt.year == 2024)].copy()
sep24

,guest_id,visit_date,amount_spent,park_experience_type
14,1,2024-09-02,65.0,Attraction
15,8,2024-09-05,50.0,Retail
16,9,2024-09-15,40.0,Dining
17,10,2024-09-20,70.0,Entertainment
18,1,2024-09-25,35.0,Dining


In [54]:
sep24['guest_spending'] = sep24.groupby('guest_id')['amount_spent'].transform('sum')


In [57]:
sep24

,guest_id,visit_date,amount_spent,park_experience_type,guest_spending
14,1,2024-09-02,65.0,Attraction,100.0
15,8,2024-09-05,50.0,Retail,50.0
16,9,2024-09-15,40.0,Dining,40.0
17,10,2024-09-20,70.0,Entertainment,70.0
18,1,2024-09-25,35.0,Dining,100.0


In [69]:
# Total September spend per guest (one row per guest)
guest_category = sep24.groupby('guest_id')['amount_spent'].sum().to_frame('guest_spending')
guest_category

guest_id,guest_spending
1,100.0
8,50.0
9,40.0
10,70.0
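The two groupby calls in this question return different shapes: `transform('sum')` gives one value aligned to every original row (so guest 1's total appears on both of their rows), while `sum()` gives one value per guest. A small sketch on the September rows shown above; `row_totals` and `guest_totals` are illustrative names:

```python
import pandas as pd

# The five September 2024 rows shown above
sep = pd.DataFrame({
    "guest_id": [1, 8, 9, 10, 1],
    "amount_spent": [65.0, 50.0, 40.0, 70.0, 35.0],
})

# transform: one value per original row, in the original row order
row_totals = sep.groupby("guest_id")["amount_spent"].transform("sum")

# sum: one value per guest, indexed by guest_id
guest_totals = sep.groupby("guest_id")["amount_spent"].sum()

print(row_totals.tolist())
print(guest_totals.to_dict())
```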


In [73]:
# Exclude guests with $0 total spending (safeguard: guests with no September purchases are already absent from sep24)
guest_category_filtered = guest_category[guest_category['guest_spending'] > 0].copy()


In [80]:
guest_category_filtered

guest_id,guest_spending
1,100.0
8,50.0
9,40.0
10,70.0


In [85]:
def categorize(spending):
    if spending < 50:      # $0 up to, but not including, $50
        return 'Low'
    elif spending < 100:   # $50 up to, but not including, $100
        return 'Medium'
    else:                  # $100 and above
        return 'High'

guest_category_filtered['category'] = guest_category_filtered['guest_spending'].apply(categorize)


In [86]:
guest_category_filtered

guest_id,guest_spending,category
1,100.0,High
8,50.0,Medium
9,40.0,Low
10,70.0,Medium
