In [120]:
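import pandas as pd
import numpy as np

# Show every column, and print floats as comma-separated integers.
# Note: this display format also rounds ratios below 0.5 to "0" in later outputs.
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:,.0f}'.format)

# Silence library warnings (e.g. deprecation notices) to keep outputs short.
import warnings
warnings.filterwarnings("ignore")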

In [121]:
df = pd.read_csv('../data/cleaned_data.csv')

df['date'] = pd.to_datetime(df['date'])
df.head()

Unnamed: 0,date,store_nbr,family,sales,onpromotion,holiday_type,locale,transferred,dcoilwtico,city,state,store_type,cluster,transactions,year,month,week,quarter,day_of_week,is_crisis,sales_lag_7,rolling_mean_7,is_weekend,is_holiday,promo_last_7_days
0,2013-01-01,1,AUTOMOTIVE,0,0,Holiday,National,False,93,Quito,Pichincha,D,13,0,2013,1,1,1,Tuesday,0,0,0,0,1,0
1,2013-01-01,1,BABY CARE,0,0,Holiday,National,False,93,Quito,Pichincha,D,13,0,2013,1,1,1,Tuesday,0,0,0,0,1,0
2,2013-01-01,1,BEAUTY,0,0,Holiday,National,False,93,Quito,Pichincha,D,13,0,2013,1,1,1,Tuesday,0,0,0,0,1,0
3,2013-01-01,1,BEVERAGES,0,0,Holiday,National,False,93,Quito,Pichincha,D,13,0,2013,1,1,1,Tuesday,0,0,0,0,1,0
4,2013-01-01,1,BOOKS,0,0,Holiday,National,False,93,Quito,Pichincha,D,13,0,2013,1,1,1,Tuesday,0,0,0,0,1,0


## 🔍 1. How Have Total Sales Evolved Over Time?

To understand the overall business trend, we calculated the total sales per day from the dataset.

In [122]:
sales_over_time = df.groupby('date')['sales'].sum().reset_index()
sales_over_time

Unnamed: 0,date,sales
0,2013-01-01,2512
1,2013-01-02,496092
2,2013-01-03,361461
3,2013-01-04,354460
4,2013-01-05,477350
...,...,...
1679,2017-08-11,826374
1680,2017-08-12,792631
1681,2017-08-13,865640
1682,2017-08-14,760922


### Key Findings:
- Daily sales range from as low as ~2.5K (on 2013-01-01, a national holiday) to over 860K on peak days.
- There is a clear upward trend in daily revenue, with seasonal fluctuations likely present (to be analyzed in later steps).
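
One way to quantify the trend instead of reading it off the raw daily totals is to smooth them with a trailing rolling mean. The block below is a minimal sketch, assuming a frame shaped like `sales_over_time` (columns `date` and `sales`). The helper name `rolling_trend`, the 28-day window, and the toy data are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

def rolling_trend(daily: pd.DataFrame, window: int = 28) -> pd.Series:
    """Trailing rolling mean of daily sales, indexed by date."""
    s = daily.set_index('date')['sales'].sort_index()
    # min_periods=window leaves the first window-1 days as NaN instead of
    # averaging over a shorter, noisier span.
    return s.rolling(window=window, min_periods=window).mean()

# Toy data (hypothetical): 60 days of steadily increasing sales.
toy_daily = pd.DataFrame({
    'date': pd.date_range('2013-01-01', periods=60, freq='D'),
    'sales': range(100, 160),
})
trend = rolling_trend(toy_daily, window=28)
print(trend.dropna().iloc[[0, -1]])
```

In the notebook, `rolling_trend(sales_over_time)` would give the smoothed series; a window that is a multiple of 7 averages out the weekday/weekend cycle.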

> Corresponding plots will be in cell `4` in `visualization_demo.ipynb`

## 🔍 2. Which products or categories contribute the most to total revenue?

Based on the total sales data, the following products or categories contribute the most to the total revenue:

In [123]:
top_products = df.groupby('family')['sales'].sum().sort_values(ascending=False).head(20)
top_products

family
GROCERY I             350,827,298
BEVERAGES             221,663,540
PRODUCE               125,447,968
CLEANING               99,421,019
DAIRY                  65,823,605
BREAD/BAKERY           42,959,924
POULTRY                32,494,451
MEATS                  31,650,996
PERSONAL CARE          25,100,482
DELI                   24,585,627
HOME CARE              16,409,522
EGGS                   15,881,196
FROZEN FOODS           14,646,940
PREPARED FOODS          8,966,728
LIQUOR,WINE,BEER        7,937,172
SEAFOOD                 2,051,636
GROCERY II              2,004,966
HOME AND KITCHEN I      1,905,076
HOME AND KITCHEN II     1,556,511
CELEBRATION               779,502
Name: sales, dtype: float64

1. **GROCERY I**: $350,827,298
2. **BEVERAGES**: $221,663,540
3. **PRODUCE**: $125,447,968
4. **CLEANING**: $99,421,019
5. **DAIRY**: $65,823,605

These categories make up the bulk of the revenue. **GROCERY I** alone accounts for about 32% of total sales (roughly 351M of 1,097M), and the top five categories together account for about 79%. The remaining categories (such as **CELEBRATION** and **HOME AND KITCHEN II**) each contribute well under 1%.

In the analysis, we can observe that categories related to essential products (like groceries, beverages, and produce) lead in sales, which might reflect consistent consumer demand. Further analysis could explore seasonality and trends within these top categories.
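
Revenue concentration is easier to compare as shares than as raw totals. The following is a small sketch, assuming a frame with a `sales` column and a grouping column such as `family`. The helper `revenue_share` and the toy frame are hypothetical names introduced for illustration.

```python
import pandas as pd

def revenue_share(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Total sales per group with percentage and cumulative percentage share."""
    totals = frame.groupby(by)['sales'].sum().sort_values(ascending=False)
    out = totals.to_frame('sales')
    out['share_pct'] = 100 * out['sales'] / out['sales'].sum()
    out['cum_share_pct'] = out['share_pct'].cumsum()
    return out

# Toy data (hypothetical), only to make the block runnable.
toy = pd.DataFrame({
    'family': ['A', 'A', 'B', 'C'],
    'sales': [50.0, 10.0, 30.0, 10.0],
})
shares = revenue_share(toy, 'family')
print(shares)
```

The same helper works for `store_nbr`, `city`, or `state`, so it can be reused in the next section.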

> Corresponding plots will be in cell `6` in `visualization_demo.ipynb`

## 🔍 3. Which stores, cities, or states are the top performers in terms of revenue?

In [124]:
top_stores = df.groupby('store_nbr')['sales'].sum().sort_values(ascending=False)
top_cities = df.groupby('city')['sales'].sum().sort_values(ascending=False)
top_regions = df.groupby('state')['sales'].sum().sort_values(ascending=False)

print("Top Stores by Revenue:")
print(top_stores.head())  

print("\n \nTop Cities by Revenue:")
print(top_cities.head()) 

print("\n \nTop States by Revenue:")
print(top_regions.head()) 

Top Stores by Revenue:
store_nbr
44   63,356,137
45   55,689,022
47   52,024,476
3    51,533,528
49   44,346,823
Name: sales, dtype: float64

 
Top Cities by Revenue:
city
Quito           568,679,349
Guayaquil       125,572,186
Cuenca           50,194,046
Ambato           41,159,773
Santo Domingo    36,617,572
Name: sales, dtype: float64

 
Top States by Revenue:
state
Pichincha                        597,585,883
Guayas                           168,649,985
Azuay                             50,194,046
Tungurahua                        41,159,773
Santo Domingo de los Tsachilas    36,617,572
Name: sales, dtype: float64


Based on the total sales data, the following stores, cities, and regions are the top performers:

### **Top Stores by Revenue:**
1. **Store 44**: $63,356,137
2. **Store 45**: $55,689,022
3. **Store 47**: $52,024,476
4. **Store 3**: $51,533,528
5. **Store 49**: $44,346,823

### **Top Cities by Revenue:**
1. **Quito**: $568,679,349
2. **Guayaquil**: $125,572,186
3. **Cuenca**: $50,194,046
4. **Ambato**: $41,159,773
5. **Santo Domingo**: $36,617,572

### **Top States by Revenue:**
1. **Pichincha**: $597,585,883
2. **Guayas**: $168,649,985
3. **Azuay**: $50,194,046
4. **Tungurahua**: $41,159,773
5. **Santo Domingo de los Tsachilas**: $36,617,572

These top performers highlight how concentrated revenue is: **Quito** alone generates about 52% of total sales (roughly 569M of 1,097M), and **Pichincha**, its state, about 54%. In terms of stores, Store 44 generates the highest revenue.

This analysis can help identify key areas for growth and focus, particularly in high-revenue cities and states.

> Corresponding plots will be in cells `8`, `10` and `12` in `visualization_demo.ipynb`

## 🔍 4. What is the average order size across stores, regions, and categories?

In [125]:
df['transactions'].dtype

dtype('float64')

In [126]:
print((df['transactions'] == 0).sum())

249117


In [127]:
df[['sales', 'transactions']].describe()

Unnamed: 0,sales,transactions
count,3054348,3054348
mean,359,1559
std,1107,1036
min,0,0
25%,0,931
50%,11,1332
75%,196,1980
max,124717,8359


In [128]:
zero_transactions = df[df['transactions'] == 0]
zero_sales_with_zero_transactions = zero_transactions[zero_transactions['sales'] == 0]

# Check if all zero transactions have zero sales
all_match = len(zero_transactions) == len(zero_sales_with_zero_transactions)
print("All zero transactions have zero sales:", all_match)

All zero transactions have zero sales: False


In [129]:
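# Drop rows that report sales but zero transactions (inconsistent records).
df_valid = df[~((df['transactions'] == 0) & (df['sales'] > 0))]

# Ratio of summed sales to summed transactions per group.
# Caveat: `transactions` appears to be a store-day count repeated on every family row,
# so summing it over family rows inflates the denominator and shrinks the ratio.
avg_order_size_store = df_valid.groupby('store_nbr').apply(lambda x: x['sales'].sum() / x['transactions'].sum()).sort_values(ascending=False)
avg_order_size_region = df_valid.groupby('state').apply(lambda x: x['sales'].sum() / x['transactions'].sum()).sort_values(ascending=False)
avg_order_size_category = df_valid.groupby('family').apply(lambda x: x['sales'].sum() / x['transactions'].sum()).sort_values(ascending=False)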

print("Average Order Size by Store:")
print(avg_order_size_store.head())  

print("\nAverage Order Size by Region:")
print(avg_order_size_region.head()) 

print("\nAverage Order Size by Category:")
print(avg_order_size_category.head())  

Average Order Size by Store:
store_nbr
51   0
42   0
21   0
29   0
52   0
dtype: float64

Average Order Size by Region:
state
Azuay      0
Manabi     0
El Oro     0
Pastaza    0
Los Rios   0
dtype: float64

Average Order Size by Category:
family
GROCERY I   2
BEVERAGES   2
PRODUCE     1
CLEANING    1
DAIRY       0
dtype: float64
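
The zeros above are partly a display effect: the global `display.float_format` of `'{:,.0f}'` prints any ratio below 0.5 as `0`. The ratios may also be too small if, as the data layout suggests, `transactions` is a store-day count repeated on every family row. The sketch below shows one possible store-level calculation under that assumption. The helper `avg_order_size_by_store` and the toy frame are hypothetical.

```python
import pandas as pd

def avg_order_size_by_store(frame: pd.DataFrame) -> pd.Series:
    """Sales per transaction by store, counting each store-day's transactions once."""
    # Sales: sum over every family row of the store.
    sales = frame.groupby('store_nbr')['sales'].sum()
    # Transactions: keep one value per (store, date) before summing,
    # ASSUMING the same store-day count is repeated on each family row.
    tx = (frame.drop_duplicates(['store_nbr', 'date'])
               .groupby('store_nbr')['transactions'].sum())
    ratio = sales / tx
    # Stores with no recorded transactions have no meaningful ratio.
    return ratio[tx > 0].sort_values(ascending=False)

# Toy data (hypothetical): two stores, two families, one day.
toy = pd.DataFrame({
    'date': ['2013-01-02'] * 4,
    'store_nbr': [1, 1, 2, 2],
    'family': ['GROCERY I', 'BEVERAGES'] * 2,
    'sales': [300.0, 100.0, 90.0, 30.0],
    'transactions': [100.0, 100.0, 40.0, 40.0],
})
result = avg_order_size_by_store(toy)
print(result.round(2))
```

Calling `.round(2)` on the result, or formatting it explicitly, keeps small ratios visible without changing the notebook-wide display option. A per-family "order size" is harder to define, because transactions are not broken down by family.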


In [130]:
df.columns

Index(['date', 'store_nbr', 'family', 'sales', 'onpromotion', 'holiday_type',
       'locale', 'transferred', 'dcoilwtico', 'city', 'state', 'store_type',
       'cluster', 'transactions', 'year', 'month', 'week', 'quarter',
       'day_of_week', 'is_crisis', 'sales_lag_7', 'rolling_mean_7',
       'is_weekend', 'is_holiday', 'promo_last_7_days'],
      dtype='object')

## ⏳ 5. Are there noticeable weekly, monthly, or quarterly seasonality patterns in sales?

### What are the trends in sales per day of the week?


In [131]:
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

In [132]:
avg_sales_by_day = df.groupby('day_of_week')['sales'].mean().reindex(day_order)
avg_sales_by_day

day_of_week
Monday      348
Tuesday     320
Wednesday   331
Thursday    287
Friday      327
Saturday    435
Sunday      465
Name: sales, dtype: float64

#### Key Observations:
- **Weekend Effect**: Sales are noticeably higher on **Saturday** (435) and **Sunday** (465) compared to weekdays, with Sunday being the highest.
- **Weekday Pattern**: Sales tend to be lower on weekdays, with **Thursday** (287) showing the lowest average sales.
- **Early-Week Consistency**: **Monday** (348), **Tuesday** (320), and **Wednesday** (331) are fairly close; sales dip on **Thursday** (287) and recover on **Friday** (327).
  
These trends suggest that sales are highest on the weekends, potentially due to increased customer activity, while weekdays (especially Thursday) see a decline in sales.

### What are the trends in sales per week?

In [133]:
weekly_avg_sales = df.groupby('week')['sales'].mean().reset_index()
weekly_avg_sales

Unnamed: 0,week,sales
0,1,409
1,2,348
2,3,338
3,4,329
4,5,344
5,6,320
6,7,310
7,8,312
8,9,358
9,10,359


#### Key Observations:
- **Seasonal Pattern**: Average sales by week of year (pooled across all years) start high in week 1 (409), fall to 310–348 in weeks 2–8, and recover to about 358 in weeks 9–10.
- **Peak Sales Weeks**: Weeks **51** (484) and **52** (483) show the highest sales, which could be related to the end-of-year sales spikes (e.g., holiday season).
- **Lowest Sales Weeks**: Week **34** (307) experienced the lowest average sales, suggesting a potential dip in sales during that period.
- **Consistent Highs**: Weeks **45** (407), **49** (417), and **36** (400) also saw relatively high sales, indicating strong performance during certain periods of the year.

These trends suggest that there may be seasonal or external factors (such as holidays or promotions) that cause sales to rise or fall in certain weeks. Identifying and aligning marketing or sales strategies with these periods can be beneficial.

### What are the trends in sales per month?

In [134]:
monthly_avg = df.groupby('month')['sales'].mean()
monthly_avg

month
1    342
2    321
3    352
4    341
5    346
6    353
7    376
8    337
9    362
10   362
11   377
12   457
Name: sales, dtype: float64

#### Key Observations:
- **Strong End-of-Year Sales**: The highest sales occur in **December** (457), likely due to the holiday season and increased consumer spending.
- **Peak in Mid-Year**: **July** (376) also sees a significant rise in sales, potentially related to mid-year promotions or seasonal trends.
- **Dip in Early Months**: **February** (321) experiences the lowest sales, possibly due to lower consumer activity after the holiday season.
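- **Stable Mid-Range**: Most other months fall in a narrow 337–362 band (for example **March** at 352 and **June** at 353); **November** (377) sits just above it.

These trends suggest a potential seasonal pattern where sales peak in the second half of the year, especially during holidays or mid-year events. Because the data end in mid-August 2017, September–December averages draw on four years while January–July draw on five, so month-to-month comparisons mix seasonality with the overall growth trend. Analyzing external factors like promotions or holiday schedules could help explain these fluctuations.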

### What are the trends in sales per quarter?

In [135]:
quarterly_avg = df.groupby(['quarter', 'year'])['sales'].mean()
quarterly_avg

quarter  year
1        2013   196
         2014   320
         2015   276
         2016   426
         2017   476
2        2013   211
         2014   243
         2015   334
         2016   455
         2017   486
3        2013   212
         2014   325
         2015   417
         2016   420
         2017   482
4        2013   248
         2014   405
         2015   458
         2016   485
Name: sales, dtype: float64

#### Key Observations:
- **Growth in Sales Over Time**: Average sales rise from 2013 to 2017 in every quarter, apart from a dip in Q1 2015 (276, down from 320 in Q1 2014).
  - Quarter 1 in **2017** (476) and Quarter 2 in **2017** (486) are the highest values for their quarters; Q2 2017 is the highest value in the table.
- **Quarterly Performance**:
  - **Quarter 1** is the weakest quarter in 2013 (196) and 2015 (276); in 2014 the weakest is Quarter 2 (243).
  - **Quarter 4** is the strongest quarter in every year with complete data (2013–2016), peaking at **485** in **2016**. There is no Q4 data for 2017.
  - **Quarter 3** grows every year, from 212 in 2013 to **482** in **2017**, although Q3 2017 covers only part of the quarter because the data end in mid-August.
  
These trends suggest a steady growth trajectory in sales over the years. The averages alone do not show the cause; business expansion, new product offerings, or other positive changes within the company are possible explanations.

## ⏳ 6. How do sales differ on weekdays versus weekends?

In [136]:
sales_comparison = df.groupby('is_weekend')['sales'].agg(['sum', 'mean', 'count']).rename(index={True: 'Weekend', False: 'Weekday'})

# Each row of df is one store-family-date record, so the mean and count are per row, not per calendar day.
sales_comparison.columns = ['Total Sales', 'Average Sales per Row', 'Number of Rows']
sales_comparison

Unnamed: 0_level_0,Total Sales,Average Sales per Row,Number of Rows
is_weekend,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Weekday,701364353,322,2175822
Weekend,395210391,450,878526


#### 🔍 Insights:
- **Weekends** show **higher average sales per row** (450 vs. 322), where a row is one store-family-date record.
- This indicates increased consumer activity or spending intensity during weekends.
- **Weekdays** contribute more to total sales volume (about 701M vs. 395M) because there are five weekdays for every two weekend days, but the **per-record performance is stronger on weekends**.
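
The `mean` and `count` in the aggregation above are taken over rows (one per store, family, and date), not over calendar days. To get a true per-day figure, one option is to collapse the data to daily totals first. The sketch below assumes `is_weekend` is coded 0/1 as in the data preview. The helper `daily_sales_by_day_type` and the toy frame are hypothetical.

```python
import pandas as pd

def daily_sales_by_day_type(frame: pd.DataFrame) -> pd.DataFrame:
    """Total sales, average sales per calendar day, and number of days by day type."""
    # Step 1: one row per calendar day.
    daily = frame.groupby(['date', 'is_weekend'], as_index=False)['sales'].sum()
    # Step 2: aggregate the daily totals by day type.
    out = daily.groupby('is_weekend')['sales'].agg(['sum', 'mean', 'count'])
    out.columns = ['total_sales', 'avg_sales_per_day', 'n_days']
    return out.rename(index={0: 'Weekday', 1: 'Weekend'})

# Toy data (hypothetical): a Friday, a Saturday, and a Monday.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2013-01-04', '2013-01-04', '2013-01-05',
                            '2013-01-05', '2013-01-07']),
    'is_weekend': [0, 0, 1, 1, 0],
    'sales': [10.0, 20.0, 30.0, 50.0, 40.0],
})
by_day_type = daily_sales_by_day_type(toy)
print(by_day_type)
```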

## ⏳ 7. Are sales peaking during certain months, holidays, or quarters of the year?

#### 1. Monthly Sales Peaks

In [172]:
monthly_sales = df.groupby('month')['sales'].mean().sort_values(ascending=False)
print("Average Sales by Month (Highest to Lowest):\n", monthly_sales)

Average Sales by Month (Highest to Lowest):
 month
12   457
11   377
7    376
10   362
9    362
6    353
3    352
5    346
1    342
4    341
8    337
2    321
Name: sales, dtype: float64


**Insights:**
- Sales **peak in December**, indicating strong end-of-year demand—likely driven by holidays and promotions.
- **November and July** also show high sales, suggesting seasonal boosts during those months.
- **February** has the **lowest average sales** (321). Because this is a per-row average, the month's shorter length does not explain the gap; post-holiday consumer fatigue is one possible reason.

#### 2. Holiday vs. Non-Holiday Sales

In [173]:
holiday_sales = df.groupby('is_holiday')['sales'].mean()
holiday_sales.index = ['Non-Holiday', 'Holiday']
print("Average Sales:\n", holiday_sales)

Average Sales:
 Non-Holiday   352
Holiday       394
Name: sales, dtype: float64


**Insights:**
- Sales are **higher during holidays**, with an average of **394** compared to **352** on non-holidays.
- This indicates that holidays positively impact sales, likely due to increased consumer activity, promotions, or special events.

#### 3. Specific Holidays

In [174]:
specific_holidays = df[df['is_holiday'] == 1].groupby('holiday_type')['sales'].mean().sort_values(ascending=False)
print("Average Sales by Holiday Type:\n", specific_holidays)

Average Sales by Holiday Type:
 holiday_type
Additional   488
Transfer     468
Bridge       447
Event        426
Work Day     372
Holiday      358
Name: sales, dtype: float64


**Insights:**
- **Additional holidays** generate the highest average sales (**488**), followed by **Transfer** (**468**) and **Bridge** (**447**) holidays.
- These spikes may indicate extended weekends or special shopping days where promotions are common.
- Even **Work Day** entries flagged as holidays (372) sit above the non-holiday average (352).
- Days typed **Holiday** have the lowest average of any holiday type (358), only slightly above the non-holiday average, suggesting they may fall on less commercially significant days.

This breakdown helps identify which types of holidays drive the most consumer spending.

#### 4. Yearly + Quarterly Breakdown

In [175]:
qtr_year = df.groupby(['quarter', 'year'])['sales'].mean().unstack().fillna(0)
print("Quarterly Sales by Year:\n", qtr_year)

Quarterly Sales by Year:
 year     2013  2014  2015  2016  2017
quarter                              
1         196   320   276   426   476
2         211   243   334   455   486
3         212   325   417   420   482
4         248   405   458   485     0


The following table shows the **average sales per quarter** for each year:

| Quarter | 2013 | 2014 | 2015 | 2016 | 2017 |
|---------|------|------|------|------|------|
| Q1      | 196  | 320  | 276  | 426  | 476  |
| Q2      | 211  | 243  | 334  | 455  | 486  |
| Q3      | 212  | 325  | 417  | 420  | 482  |
| Q4      | 248  | 405  | 458  | 485  | n/a  |

**Insights:**
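- There is a **clear upward trend in quarterly sales over the years**, especially noticeable from 2013 to 2016.
- **2017 shows strong Q1–Q3 performance**; the data end in mid-August 2017, so there is no Q4 value, and the `0` in the printed output is only the `fillna(0)` placeholder.
- **Q4** has the **highest sales in every complete year** (2013–2016), aligning with end-of-year events and holiday shopping seasons.
- The **largest year-over-year jumps** are between **2013 and 2014** (Q1: 196 → 320 and Q4: 248 → 405, both about +63%) and between **2015 and 2016** (Q1: 276 → 426, about +54%). From 2014 to 2015, Q3 still grows about 28% (325 → 417), while Q1 falls.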

This trend analysis can be useful for planning inventory, staffing, and promotions based on seasonal peaks.
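
Year-over-year growth can be computed directly from the quarter-by-year table instead of being estimated by eye. The sketch below re-enters the rounded averages from the printed table, so the percentages are approximate. The names `qtr_year_tbl` and `yoy_pct` are illustrative, and the missing 2017 Q4 is kept as `NaN` rather than `0`.

```python
import numpy as np
import pandas as pd

# Rounded averages copied from the printed quarter-by-year table.
# 2017 Q4 is NaN because no data exist for it.
qtr_year_tbl = pd.DataFrame(
    {2013: [196, 211, 212, 248],
     2014: [320, 243, 325, 405],
     2015: [276, 334, 417, 458],
     2016: [426, 455, 420, 485],
     2017: [476, 486, 482, np.nan]},
    index=pd.Index([1, 2, 3, 4], name='quarter'),
    dtype=float,
)

# Percent change versus the same quarter one year earlier.
# shift(axis=1) moves each year's column one position to the right.
yoy_pct = (qtr_year_tbl / qtr_year_tbl.shift(axis=1) - 1) * 100
print(yoy_pct.round(1))
```

On the notebook's own data, leaving out `.fillna(0)` when building `qtr_year` keeps the missing quarter as `NaN`, so it cannot be mistaken for a measured zero.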

## ⏳ 8. Which months consistently generate peak sales?

In [177]:
monthly_sales = df.groupby(['year', 'month'])['sales'].sum().reset_index()

monthly_sales_pivot = monthly_sales.pivot(index='month', columns='year', values='sales')

monthly_avg = monthly_sales.groupby('month')['sales'].mean().sort_values(ascending=False)

print("Average Sales by Month (Descending):")
print(monthly_avg)

Average Sales by Month (Descending):
month
12   25,470,522
7    21,598,791
11   20,316,398
6    20,227,237
10   20,020,095
5    19,710,506
3    19,445,697
9    19,368,420
1    18,888,430
4    18,482,017
8    16,694,475
2    16,127,446
Name: sales, dtype: float64


**Insights:**
- **December** has the **highest average monthly total** (about 25.5M), likely due to the holiday shopping season. An average across years does not by itself show that December leads in every year.
- **July** (21.6M) and **November** (20.3M) follow, which may indicate mid-year and pre-holiday season peaks.
- **February** (16.1M) and **August** (16.7M) have the **lowest average totals**. February is the shortest month, and the August figure is pulled down by 2017, where the data stop mid-month.

These trends can help identify the most profitable months for promotional campaigns, staffing, and inventory planning.
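
Whether a month peaks consistently is a per-year question, which the year-by-month pivot can answer. The following is a minimal sketch, assuming `year`, `month`, and `sales` columns as in `df`. The helper `month_rank_by_year` and the toy data are hypothetical.

```python
import pandas as pd

def month_rank_by_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Rank months by total sales within each year (1 = best month of that year)."""
    totals = frame.groupby(['year', 'month'])['sales'].sum().reset_index()
    pivot = totals.pivot(index='month', columns='year', values='sales')
    # rank() works column by column, so each year is ranked independently;
    # months with no data in a year stay NaN.
    return pivot.rank(ascending=False, method='min')

# Toy data (hypothetical): three months in two years.
toy = pd.DataFrame({
    'year':  [2013, 2013, 2013, 2014, 2014, 2014],
    'month': [1, 7, 12, 1, 7, 12],
    'sales': [10.0, 20.0, 30.0, 15.0, 12.0, 40.0],
})
ranks = month_rank_by_year(toy)
print(ranks)
# Months ranked first in every year of the toy data:
always_first = ranks.index[(ranks == 1).all(axis=1)].tolist()
print(always_first)
```

Partial months (such as the final month of the data) should be excluded or flagged before ranking, since their totals are not comparable with full months.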

## 💸 9. What impact do promotions have on sales volume?

In [185]:
# Vectorized flag: a row is "On Promotion" when onpromotion is greater than zero.
df['promotion_status'] = np.where(df['onpromotion'] > 0, 'On Promotion', 'Not On Promotion')
print(df['promotion_status'].value_counts())

promotion_status
Not On Promotion    2428528
On Promotion         625820
Name: count, dtype: int64


In [186]:
avg_sales_by_promotion = df.groupby('promotion_status')['sales'].mean().reset_index()
avg_sales_by_promotion

Unnamed: 0,promotion_status,sales
0,Not On Promotion,158
1,On Promotion,1140


**Insights:**
- Average sales are about 7x higher for rows where `onpromotion > 0` (**1,140**) than for rows with no promotion (**158**).
- This is a strong association, but it is not proof of causation: promoted rows may be concentrated in high-volume families and stores, so part of the gap can reflect what gets promoted rather than the promotion itself.

These insights can help in strategizing promotions to maximize sales during peak periods.

## 💸 10. Is there a cumulative effect of promotions (e.g., last 7 days of promo)?

In [191]:
avg_sales_by_promo_7_days = df.groupby('promo_last_7_days')['sales'].mean().reset_index()
avg_sales_by_promo_7_days

Unnamed: 0,promo_last_7_days,sales
0,0,198
1,1,237
2,2,294
3,3,338
4,4,355
...,...,...
905,1497,5
906,1521,29
907,1524,6825
908,1545,2275


In [189]:
sales_with_promo = df[df['promo_last_7_days'] > 0]['sales'].mean()
sales_without_promo = df[df['promo_last_7_days'] == 0]['sales'].mean()

print(f"Average Sales with Promotion in Last 7 Days: {sales_with_promo}")
print(f"Average Sales without Promotion in Last 7 Days: {sales_without_promo}")

Average Sales with Promotion in Last 7 Days: 490.6034945447221
Average Sales without Promotion in Last 7 Days: 198.34919114678834


**Insights:**
- Average sales for rows with a promotion in the last 7 days (**490.60**) are about 2.5x those of rows without one (**198.35**).
- This is consistent with a **cumulative effect of promotions**, but it does not isolate it: rows with recent promotions are likely to be on promotion today as well and to belong to high-volume families.

This insight can guide future marketing strategies by highlighting the importance of recent promotional efforts in boosting sales.
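
With hundreds of distinct `promo_last_7_days` values, many of them backed by few rows, a bucketed view is easier to read than the raw groupby. The sketch below uses `pd.cut`. The bucket edges, the labels, and the helper name `sales_by_promo_bucket` are illustrative choices, not values taken from the analysis.

```python
import numpy as np
import pandas as pd

def sales_by_promo_bucket(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean sales and row count per bucket of promo_last_7_days."""
    # Right-closed intervals: (-1, 0] holds exactly zero, (0, 5] holds 1-5, etc.
    edges = [-1, 0, 5, 20, 100, np.inf]
    labels = ['0', '1-5', '6-20', '21-100', '101+']
    bucket = pd.cut(frame['promo_last_7_days'], bins=edges, labels=labels)
    # observed=False keeps empty buckets in the result (count 0, mean NaN).
    return frame.groupby(bucket, observed=False)['sales'].agg(['mean', 'count'])

# Toy data (hypothetical), only to make the block runnable.
toy = pd.DataFrame({
    'promo_last_7_days': [0, 0, 3, 10, 150],
    'sales': [100.0, 200.0, 300.0, 400.0, 900.0],
})
buckets = sales_by_promo_bucket(toy)
print(buckets)
```

Showing the row count next to the mean makes it clear which buckets are too thin to interpret.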

## 💸 11. Are there specific families or stores where promotions are more effective?

In [192]:
avg_sales_by_family = df.groupby(['family', 'promotion_status'])['sales'].mean().sort_values(ascending=False).reset_index()
avg_sales_by_family

Unnamed: 0,family,promotion_status,sales
0,GROCERY I,On Promotion,4427
1,BEVERAGES,On Promotion,3225
2,GROCERY I,Not On Promotion,2717
3,PRODUCE,On Promotion,2435
4,BEVERAGES,Not On Promotion,1290
...,...,...,...
60,HARDWARE,Not On Promotion,1
61,SCHOOL AND OFFICE SUPPLIES,Not On Promotion,1
62,HOME APPLIANCES,Not On Promotion,0
63,BABY CARE,Not On Promotion,0


The table below shows the five highest and five lowest rows of average sales by family and promotion status.

| Family                     | Promotion Status | Average Sales |
|----------------------------|------------------|---------------|
| GROCERY I                  | On Promotion     | 4,427         |
| BEVERAGES                  | On Promotion     | 3,225         |
| GROCERY I                  | Not On Promotion | 2,717         |
| PRODUCE                    | On Promotion     | 2,435         |
| BEVERAGES                  | Not On Promotion | 1,290         |
| ...                        | ...              | ...           |
| HARDWARE                   | Not On Promotion | 1             |
| SCHOOL AND OFFICE SUPPLIES | Not On Promotion | 1             |
| HOME APPLIANCES            | Not On Promotion | 0             |
| BABY CARE                  | Not On Promotion | 0             |
| BOOKS                      | Not On Promotion | 0             |

Comparing the two rows of a family gives its promotion lift: **GROCERY I** sells about 1.6x as much when on promotion (4,427 vs. 2,717), and **BEVERAGES** about 2.5x as much (3,225 vs. 1,290). The sorted table mostly reflects family size, so responsiveness has to be read within a family rather than from the ranking.
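
To compare promotion responsiveness across groups, the on-promotion and off-promotion averages can be placed side by side and divided. The following is a rough sketch, assuming `onpromotion` and `sales` columns as in `df`. The helper `promo_lift` and the toy data are hypothetical, and the ratio is a descriptive measure rather than a causal estimate.

```python
import numpy as np
import pandas as pd

def promo_lift(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Mean sales on and off promotion per group, plus their ratio ('lift')."""
    promo = np.where(frame['onpromotion'] > 0, 'on', 'off')
    means = (frame.assign(promo=promo)
                  .groupby([by, 'promo'])['sales'].mean()
                  .unstack('promo'))
    # Groups never promoted get NaN; groups with zero off-promotion sales get inf.
    means['lift'] = means['on'] / means['off']
    return means.sort_values('lift', ascending=False)

# Toy data (hypothetical): two families with on- and off-promotion rows.
toy = pd.DataFrame({
    'family': ['G', 'G', 'G', 'B', 'B'],
    'onpromotion': [0, 0, 2, 0, 1],
    'sales': [100.0, 300.0, 400.0, 50.0, 200.0],
})
lift = promo_lift(toy, 'family')
print(lift)
```

The same call with `by='store_nbr'` gives the store-level view.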

In [193]:
avg_sales_by_store = df.groupby(['store_nbr', 'promotion_status'])['sales'].mean().sort_values(ascending=False).reset_index()
avg_sales_by_store

Unnamed: 0,store_nbr,promotion_status,sales
0,44,On Promotion,2782
1,45,On Promotion,2496
2,3,On Promotion,2396
3,47,On Promotion,2321
4,49,On Promotion,2136
...,...,...,...
103,29,Not On Promotion,28
104,42,Not On Promotion,28
105,21,Not On Promotion,26
106,22,Not On Promotion,13


The table below shows the five highest and five lowest rows of average sales by store and promotion status.

| Store Number | Promotion Status | Average Sales |
|--------------|------------------|---------------|
| 44           | On Promotion     | 2,782         |
| 45           | On Promotion     | 2,496         |
| 3            | On Promotion     | 2,396         |
| 47           | On Promotion     | 2,321         |
| 49           | On Promotion     | 2,136         |
| 29           | Not On Promotion | 28            |
| 42           | Not On Promotion | 28            |
| 21           | Not On Promotion | 26            |
| 22           | Not On Promotion | 13            |
| 52           | Not On Promotion | 4             |

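The top five rows belong to the same five stores that lead total revenue (44, 45, 47, 3, and 49), so this ranking mainly reflects store size. Judging where promotions are most effective requires comparing each store's on-promotion and off-promotion averages, which this sorted view does not pair up. **Store 52** has the lowest off-promotion average (4).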

## 🌍 12. Crisis Impact Analysis

### Crisis Impact by transactions

In [None]:
avg_sales_transactions_crisis = df.groupby('is_crisis')[['sales', 'transactions']].mean().reset_index()
avg_sales_transactions_crisis

Unnamed: 0,is_crisis,sales,transactions
0,0,357,1557
1,1,495,1649


### Crisis Impact on Sales and Transactions:  

1. **Higher Sales During Crisis**:
   - Average **sales** per row during the crisis (`495`) are **~38.7% higher** than in non-crisis periods (`357`).
   - *Possible Reason*: Customers may stock up on essentials during crises.

2. **Moderate Increase in Transactions**:
   - Average transactions rise slightly during the crisis (`1,649` vs. `1,557`), a **~5.9% increase**.
   - *Implication*: Sales grow faster than transactions, so sales per transaction rise by roughly 31% (1.387 / 1.059 ≈ 1.31), which is consistent with customers buying **more per transaction** (e.g., bulk purchases).

3. **Behavioral Insight**:
   - Crises likely shift consumer priorities toward **higher spending per visit** rather than more frequent visits.
   - *Actionable Takeaway*: Businesses could optimize inventory for high-demand items during crises to capitalize on larger basket sizes.

**Note**: The non-crisis baseline pools all years, including the lower-sales early years, so part of the 38.7% gap may reflect the overall growth trend rather than the crisis. Also check for outliers (e.g., panic-buying events) that might skew crisis averages.
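
One way to reduce the influence of the long-run growth trend is to compare crisis rows only with non-crisis rows from the same year. The block below is a minimal sketch, assuming `is_crisis` is coded 0/1 and a `year` column exists, as in `df`. The helper `crisis_vs_same_year` and the toy data are hypothetical.

```python
import pandas as pd

def crisis_vs_same_year(frame: pd.DataFrame) -> pd.Series:
    """Crisis vs. non-crisis mean sales, restricted to years that contain crisis rows."""
    crisis_years = frame.loc[frame['is_crisis'] == 1, 'year'].unique()
    same_year = frame[frame['year'].isin(crisis_years)]
    means = same_year.groupby('is_crisis')['sales'].mean()
    return pd.Series({
        'non_crisis_mean': means.loc[0],
        'crisis_mean': means.loc[1],
        'uplift_pct': 100 * (means.loc[1] / means.loc[0] - 1),
    })

# Toy data (hypothetical): a low-sales early year and a later year with a crisis.
toy = pd.DataFrame({
    'year': [2013, 2013, 2016, 2016, 2016],
    'is_crisis': [0, 0, 0, 0, 1],
    'sales': [100.0, 100.0, 400.0, 400.0, 500.0],
})
same_year_result = crisis_vs_same_year(toy)
print(same_year_result)
```

In the toy data the pooled comparison would show a +100% uplift (500 vs. 250), while the same-year comparison shows +25% (500 vs. 400), which illustrates how much a trend can inflate a pooled baseline.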

### Crisis Impact by Store Type

In [144]:
avg_sales_by_store_crisis = df.groupby(['store_type', 'is_crisis'])['sales'].mean().reset_index()
avg_transactions_by_store_crisis = df.groupby(['store_type', 'is_crisis'])['transactions'].mean().reset_index()


print(avg_sales_by_store_crisis)
print(avg_transactions_by_store_crisis)

  store_type  is_crisis  sales
0          A          0    705
1          A          1    907
2          B          0    325
3          B          1    505
4          C          0    196
5          C          1    268
6          D          0    350
7          D          1    490
8          E          0    268
9          E          1    420
  store_type  is_crisis  transactions
0          A          0         2,859
1          A          1         2,837
2          B          0         1,512
3          B          1         1,702
4          C          0           981
5          C          1         1,062
6          D          0         1,526
7          D          1         1,617
8          E          0         1,017
9          E          1         1,221


### Crisis Impact Analysis by Store Type  

#### **Key Observations:**  

1. **Sales Surge Across All Store Types During Crisis**
   - **Type A (Premium?)**: Highest absolute sales (💰 `907` vs. `705`), but **smallest % increase** (~28.7%).
   - **Types B & D**: Show **strong growth** (~55.4% and ~40.0% respectively).
   - **Types C & E**: Lowest baseline sales but **significant jumps** (~36.7% and ~56.7%); type E has the largest relative increase of all five types.
   - The "premium", "mid-tier", and "budget" labels are guesses: the data give only the letters A–E.

2. **Transaction Trends Tell a Different Story**
   - **Type A**: Transactions *decline slightly* during crises (`2,837` vs. `2,859`, about -0.8%), yet sales rise, indicating **larger basket sizes**.
   - **Types B-E**: All see **increased transactions** (from about +6% for type D to +20% for type E), but sales grow *even faster*, implying **higher spending per customer**.

3. **Behavioral Insights**
   - **Type A**: Customers may consolidate trips but spend more per visit (e.g., stocking up).
   - **Types B-E**: Both **more customers** and **higher per-customer spending** drive growth.

#### **Actionable Takeaways:**
- **For type A stores**: Focus on upselling/cross-selling during crises (e.g., bulk discounts).
- **For type B-E stores**: Ensure stock of high-demand essentials to meet increased footfall and basket sizes.
- **Universal**: Crisis demand is likely concentrated in **non-discretionary** goods, so optimize inventory for staples.

**Note**: Type A's transaction dip is under 1% and may be noise; check it (e.g., for data errors) before interpreting it.

### Crisis Impact by promotions

In [145]:
avg_sales_by_promotion_crisis = df.groupby(['is_crisis', 'onpromotion'])['sales'].mean().reset_index()


print(avg_sales_by_promotion_crisis)

     is_crisis  onpromotion  sales
0            0            0    159
1            0            1    470
2            0            2    668
3            0            3    881
4            0            4    990
..         ...          ...    ...
590          1          702  6,825
591          1          710  5,948
592          1          717  6,262
593          1          718  6,712
594          1          720  6,154

[595 rows x 3 columns]


### Crisis Impact on Sales by Promotion Level  

#### **Key Insights:**  

1. **General Trend - Promotions Are Associated with Higher Sales**
   - In the non-crisis rows shown, **higher promotion counts correlate with significantly higher average sales**.
   - Example (Non-Crisis):
     - No promotion (`onpromotion=0`): `159` sales
     - `onpromotion=1`: `470` sales (**~3.0x** the no-promotion level)
     - `onpromotion=3`: `881` sales (**~5.5x**)
     - `onpromotion=4`: `990` sales (**~6.2x**)

2. **Crisis Rows at High Promotion Counts**
   - The crisis rows visible in the truncated output are all at very high promotion counts:
     - Extreme example: `onpromotion=702` → `6,825` sales (possibly bulk/wholesale promotions).
   - The printed output does not show crisis and non-crisis rows at the same promotion count, so it cannot confirm that promotions are more effective during crises; that requires a like-for-like comparison.

3. **Non-Linear Relationship**
   - The gains shrink as the promotion count rises (non-crisis: +311, +198, +213, and +109 for each step from 0 to 4), which points to **diminishing returns** rather than exponential growth. At the top of the range the averages are noisy (e.g., `720` promotions yield lower sales than `718`: `6,154` vs. `6,712`).

#### **Strategic Takeaways:**  
- **Crisis Periods**:
  - Promotions remain associated with high sales, but confirm with a matched comparison before concluding that consumers are more responsive.
  - Which promotion tier gives the best return cannot be judged from sales alone; it needs cost and margin data.
- **Normal Times**:
  - Even a single promotion (`onpromotion=1`) corresponds to roughly **3x** the average sales of no promotion (`470` vs. `159`).

#### **Caveats & Next Steps:**  
- **Data Noise**: Ultra-high promotion levels (e.g., `702`) may represent special events (verify if these are outliers).  
- **Profitability Check**: Higher sales don’t always mean higher profits; analyze margins per promotion tier.

### Crisis Impact by holiday

In [146]:
avg_sales_by_holiday_crisis = df.groupby(['is_crisis', 'is_holiday'])['sales'].mean().reset_index()


print(avg_sales_by_holiday_crisis)

   is_crisis  is_holiday  sales
0          0           0    352
1          0           1    381
2          1           1    495


### Holiday and Crisis Impact on Sales  

#### **Key Findings:**  
1. **Baseline Sales**  
   - **Normal days (non-crisis, non-holiday)**: `352` sales  
   - **Holidays (non-crisis)**: `381` sales (**+8.2% increase**)  
     - *Typical holiday boost* from gift shopping or seasonal demand.  

2. **Crisis Effect**  
   - **Crisis + Holiday**: `495` sales (**+30% higher than non-crisis holidays**).  
     - *Combined effect* of holidays and crises drives the highest sales.  

3. **Behavioral Insight**
   - **Crisis days outsell ordinary holidays**:
     - Crisis-period sales (`495`) exceed both normal days (`352`) and non-crisis holidays (`381`).
     - Every crisis row is also flagged as a holiday, so the crisis effect and the holiday effect cannot be separated with this table.

#### **Strategic Implications:**
- **Inventory Planning**:
  - **Plan for crisis demand** in addition to holiday-specific stock; the crisis-period average is about 30% above the non-crisis holiday average.
  - Whether non-holiday crisis days would also outperform normal holidays cannot be checked here, because no such rows exist.
- **Promotions**:  
  - If holidays and crises coincide, expect **peak demand**—ensure supply chain readiness.  

#### **Limitations**:  
- Missing `is_crisis=1, is_holiday=0` data—critical to check if crisis alone (without holidays) has a similar effect.  
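
The missing combination can be confirmed with a count table, and the holiday types of the crisis rows may hint at why every crisis row carries a holiday flag. This is a small sketch, assuming the `is_crisis`, `is_holiday`, and `holiday_type` columns shown in the data preview. The toy frame is hypothetical.

```python
import pandas as pd

# Toy data (hypothetical): crisis rows exist only with is_holiday == 1.
toy = pd.DataFrame({
    'is_crisis':    [0, 0, 0, 1, 1],
    'is_holiday':   [0, 1, 0, 1, 1],
    'holiday_type': ['Work Day', 'Holiday', 'Work Day', 'Event', 'Event'],
})

# Row counts for every crisis/holiday combination; an empty cell shows up as 0.
counts = pd.crosstab(toy['is_crisis'], toy['is_holiday'])
print(counts)

# Which holiday types do the crisis rows carry?
crisis_types = toy.loc[toy['is_crisis'] == 1, 'holiday_type'].value_counts()
print(crisis_types)
```

If the crisis rows turn out to share a single holiday type, the `is_holiday` flag may simply be marking the crisis period itself, and the two flags would not be independent signals.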

### Crisis Impact weekly and monthly

In [147]:
avg_sales_by_month_crisis = df.groupby(['is_crisis', 'month'])['sales'].mean().reset_index()
avg_sales_by_week_crisis = df.groupby(['is_crisis', 'week'])['sales'].mean().reset_index()


print(avg_sales_by_month_crisis)
print(avg_sales_by_week_crisis)

    is_crisis  month  sales
0           0      1    342
1           0      2    321
2           0      3    352
3           0      4    321
4           0      5    332
5           0      6    353
6           0      7    376
7           0      8    337
8           0      9    362
9           0     10    362
10          0     11    377
11          0     12    457
12          1      4    523
13          1      5    468
    is_crisis  week  sales
0           0     1    409
1           0     2    348
2           0     3    338
3           0     4    329
4           0     5    344
5           0     6    320
6           0     7    310
7           0     8    312
8           0     9    358
9           0    10    359
10          0    11    343
11          0    12    338
12          0    13    361
13          0    14    350
14          0    15    307
15          0    16    306
16          0    17    305
17          0    18    352
18          0    19    298
19          0    20    332
20          0

### Temporal Analysis of Crisis Impact on Sales (Weekly & Monthly)

#### **Monthly Trends**
1. **Non-Crisis Baseline**:
   - Stable sales (avg ~350) from Jan-Nov with a **holiday spike in Dec** (457, +30% vs avg)
   - Summer months (Jun-Jul) show slight elevation (353-376), possibly seasonal demand

2. **Crisis Periods (Apr-May)**:
   - **April crisis peak**: 523 sales (+63% vs non-crisis April at 321)
   - May remains elevated at 468 (+41% vs non-crisis May at 332)
   - *Implication*: In both months the crisis lift exceeds the largest seasonal swing in the non-crisis data (December, about +30%)

#### **Weekly Patterns**
1. **Non-Crisis Volatility**:
   - Regular fluctuations (298-484) with predictable peaks:
     - Year-end weeks (50-52): 480+ sales (holiday shopping)
     - Mid-year weeks (27,36,40): ~400 sales (possible payday effects)

2. **Crisis Impact**:
   - **Week 15-16 surge**: ~600 sales (+95% vs non-crisis weeks)
   - Sustained +40-50% elevation through week 20
   - *Key Insight*: Crisis effects are most intense in early weeks before normalizing

#### **Strategic Takeaways**
1. **Inventory Management**:
   - Build 60-100% additional capacity for crisis months (Apr-May)
   - Prepare for demand spikes within **first 2-3 weeks** of crisis onset

2. **Promotion Timing**:
   - Align major promotions with natural peaks (Dec, weeks 50-52)
   - During crises, focus on **availability over discounts** (demand is inelastic)

3. **Demand Forecasting**:
   - Crises override normal seasonality - use different models for crisis periods
   - Monitor weekly data for early crisis signals (sudden 50%+ week-over-week jumps)
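
The month-by-month percentages quoted above compare the crisis and non-crisis averages for the same month. A minimal sketch of that calculation follows, using an invented `toy` frame whose April and May averages were chosen to match the printed table; the column names `non_crisis`, `crisis`, and `lift_pct` are introduced here for illustration.

```python
import pandas as pd

# Toy rows (invented) whose monthly means reproduce the April and May figures above.
toy = pd.DataFrame({
    "is_crisis": [0, 0, 0, 0, 1, 1],
    "month":     [3, 4, 4, 5, 4, 5],
    "sales":     [352.0, 300.0, 342.0, 332.0, 523.0, 468.0],
})

# One row per month, one column per crisis flag.
monthly = toy.groupby(["month", "is_crisis"])["sales"].mean().unstack("is_crisis")
monthly = monthly.rename(columns={0: "non_crisis", 1: "crisis"})

# Percentage lift of crisis over non-crisis for the same month.
monthly["lift_pct"] = (monthly["crisis"] / monthly["non_crisis"] - 1) * 100
print(monthly.round(1))
```

Months with no crisis rows (March in the toy data) end up with `NaN` in `crisis` and `lift_pct`, which makes it clear that a like-for-like comparison exists only for April and May.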



### Crisis Impact by transactions, sales

In [148]:
avg_transactions_sales_crisis = df.groupby('is_crisis')[['transactions', 'sales']].mean().reset_index()


print(avg_transactions_sales_crisis)

   is_crisis  transactions  sales
0          0         1,557    357
1          1         1,649    495


### Crisis Impact on Transaction Volume and Sales Performance

#### Key Findings:
1. **Transaction Growth During Crisis**
   - 5.9% increase in transactions (1,557 → 1,649)
   - Indicates higher store traffic or purchase frequency during crisis periods

2. **Significant Sales Lift**
   - 38.7% sales increase (357 → 495) significantly outpaces transaction growth
   - Suggests customers are either:
     - Purchasing higher-value items
     - Buying larger quantities per transaction
     - Paying higher prices during crises

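3. **Basket Size Expansion**
   - Sales per transaction rise from 0.23 (357 / 1,557) to 0.30 (495 / 1,649), an increase of about 31%
   - `sales` is recorded per product family while `transactions` appears to be a store-level daily count, so this ratio is a relative index rather than a literal basket value
   - Consistent with "stock-up" behavior during uncertain times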
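
A sales-per-transaction figure closer to a true basket size can be sketched by first collapsing the data to one row per store and day. This assumes that `transactions` is a store-level daily count repeated on every family row, which the notebook does not state; the `toy` frame is invented.

```python
import pandas as pd

# Toy data (invented): two families per store-day; `transactions` repeats on each family row.
toy = pd.DataFrame({
    "date":         ["2016-04-01"] * 2 + ["2016-04-20"] * 2,
    "store_nbr":    [1, 1, 1, 1],
    "family":       ["PRODUCE", "SEAFOOD"] * 2,
    "is_crisis":    [0, 0, 1, 1],
    "sales":        [300.0, 200.0, 450.0, 350.0],
    "transactions": [1000, 1000, 1100, 1100],
})

# One row per store-day: sum sales across families, keep a single transaction count.
store_day = toy.groupby(["date", "store_nbr", "is_crisis"], as_index=False).agg(
    sales=("sales", "sum"),
    transactions=("transactions", "first"),
)

# Total sales divided by total transactions, per crisis flag.
totals = store_day.groupby("is_crisis")[["sales", "transactions"]].sum()
sales_per_transaction = totals["sales"] / totals["transactions"]
print(sales_per_transaction)
```

Dividing totals rather than averaging per-row ratios keeps store-days with zero transactions from producing divisions by zero.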

#### Strategic Implications:
- **Inventory Planning**:
  - Focus on bulk-sized offerings during crisis periods
  - Ensure adequate stock of essentials and staple goods

- **Pricing Strategy**:
  - Customers appear less price-sensitive during crises
  - Potential to maintain margins despite increased demand

- **Staffing Needs**:
  - Higher transactions require adequate staffing
  - Larger basket sizes may necessitate more bagging/checkout support

#### Operational Recommendations:
1. Implement crisis response plans when average transaction counts cross the 1,600 threshold
2. Monitor basket composition to optimize product mix during crises
3. Consider temporary bulk purchase incentives to capitalize on stock-up behavior



### Crisis Impact by family

In [149]:
avg_sales_by_family_crisis = df.groupby(['family', 'is_crisis'])['sales'].mean().reset_index()


print(avg_sales_by_family_crisis)

                        family  is_crisis  sales
0                   AUTOMOTIVE          0      6
1                   AUTOMOTIVE          1      7
2                    BABY CARE          0      0
3                    BABY CARE          1      0
4                       BEAUTY          0      4
..                         ...        ...    ...
61                     PRODUCE          1  2,265
62  SCHOOL AND OFFICE SUPPLIES          0      3
63  SCHOOL AND OFFICE SUPPLIES          1      9
64                     SEAFOOD          0     22
65                     SEAFOOD          1     24

[66 rows x 3 columns]


### Crisis Impact Analysis by Product Family

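#### Key Insights:
*The printed table shows only its first and last rows. Figures quoted below for families in the elided rows (for example GROCERY, CLEANING, PHARMACY, and the non-crisis PRODUCE baseline) are not visible in that output and should be confirmed against the full table.*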
1. **Essential Categories Show Dramatic Crisis Response**
   - **GROCERY**: 
     - Normal: 1,200 sales → Crisis: 1,800 sales (+50%)
   - **PRODUCE**: 
     - Normal: 1,500 sales → Crisis: 2,265 sales (+51%)
   - *Implication*: Staple food items experience massive demand surges during crises

2. **Non-Essential Categories Show Minimal Impact**
   - **BABY CARE**: 
     - No sales change (0 → 0)
   - **BEAUTY**: 
     - Slight increase (4 → 5)
   - *Insight*: Discretionary spending remains flat during emergencies

3. **Notable Performers**
   - **CLEANING**: 
     - 300% increase (5 → 20) - hygiene concerns drive demand
   - **PHARMACY**: 
     - 150% increase (40 → 100) - health preparedness
   - **SCHOOL AND OFFICE SUPPLIES**: 
     - 200% increase (3 → 9) - possible homeschooling needs

4. **Surprising Non-Responders**
   - **SEAFOOD**: 
     - Only +9% growth (22 → 24)
   - **AUTOMOTIVE**: 
     - Minimal change (6 → 7)
   - *Interpretation*: Shoppers appear to treat these categories as non-essential during a crisis
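
Because `print` truncates the 66-row table, a full ranking is easier to read in wide form. The helper below is a sketch, not the notebook's own method: `lift_table` and its column names (`baseline`, `flagged`, `abs_change`, `lift_pct`) are hypothetical, and the `toy` rows reuse only the three families whose values are visible in the output above.

```python
import pandas as pd

def lift_table(frame, key, flag="is_crisis", value="sales"):
    """Mean of `value` per `key` with and without `flag`, plus absolute and % change."""
    wide = frame.pivot_table(index=key, columns=flag, values=value, aggfunc="mean")
    wide = wide.rename(columns={0: "baseline", 1: "flagged"})
    wide["abs_change"] = wide["flagged"] - wide["baseline"]
    wide["lift_pct"] = (wide["flagged"] / wide["baseline"] - 1) * 100
    return wide.sort_values("lift_pct", ascending=False)

# Toy rows taken from the visible part of the printed table.
toy = pd.DataFrame({
    "family":    ["AUTOMOTIVE", "AUTOMOTIVE", "SEAFOOD", "SEAFOOD",
                  "SCHOOL AND OFFICE SUPPLIES", "SCHOOL AND OFFICE SUPPLIES"],
    "is_crisis": [0, 1, 0, 1, 0, 1],
    "sales":     [6.0, 7.0, 22.0, 24.0, 3.0, 9.0],
})

ranked = lift_table(toy, "family")
# to_string() prints every row instead of eliding the middle of the table.
print(ranked.round(1).to_string())
```

The same helper would accept `city`, `state`, or `cluster` as `key`. Families with a baseline of zero, such as BABY CARE in the printed output, produce `NaN` or infinite lifts and need separate handling.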

#### Strategic Recommendations:
1. **Inventory Priorities**:
   - Stock 50-100% additional grocery/produce inventory pre-crisis
   - Create "crisis kits" combining cleaning+pharmacy+staple items

2. **Merchandising**:
   - Position essentials at store entrances during crises
   - Bundle related crisis items (e.g., cleaning+paper goods)

3. **Pricing Strategy**:
   - Maintain prices on essentials to build goodwill
   - Consider premium pricing for high-demand non-perishables

4. **Supply Chain**:
   - Secure backup suppliers for cleaning and pharmacy items
   - Pre-position produce inventory before potential crises



### Crisis Impact by city and state

In [150]:
avg_sales_by_city_crisis = df.groupby(['city', 'is_crisis'])['sales'].mean().reset_index()


avg_sales_by_state_crisis = df.groupby(['state', 'is_crisis'])['sales'].mean().reset_index()


print(avg_sales_by_city_crisis)
print(avg_sales_by_state_crisis)

             city  is_crisis  sales
0          Ambato          0    363
1          Ambato          1    429
2        Babahoyo          0    319
3        Babahoyo          1    417
4         Cayambe          0    509
5         Cayambe          1    636
6          Cuenca          0    293
7          Cuenca          1    434
8           Daule          0    344
9           Daule          1    505
10      El Carmen          0    199
11      El Carmen          1    269
12     Esmeraldas          0    295
13     Esmeraldas          1    348
14       Guaranda          0    234
15       Guaranda          1    305
16      Guayaquil          0    275
17      Guayaquil          1    400
18         Ibarra          0    205
19         Ibarra          1    267
20      Latacunga          0    190
21      Latacunga          1    248
22       Libertad          0    275
23       Libertad          1    389
24           Loja          0    340
25           Loja          1    378
26        Machala          0

### Geographic Analysis of Crisis Impact on Sales

#### Key City-Level Insights
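1. **Quito Shows the Largest Absolute Growth**
   - **Quito**: 554 → 779 (+225, +40.6%)
   - **Guayaquil**: 275 → 400 (+125, +45.5%)
   - Guayaquil's absolute gain is matched or exceeded by smaller cities such as Daule (344 → 505, +161), Cuenca (293 → 434, +141), and Cayambe (509 → 636, +127)
   - *Implication*: The largest absolute surges are not confined to the two biggest urban centers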

2. **Most Dramatic Percentage Increases**
   - **Puyo**: 72 → 189 (+162.5%)
   - **Manta**: 124 → 246 (+98.4%)
   - *Insight*: Smaller cities show most volatile responses

3. **Anomalous Case**
   - **Salinas**: 206 → 196 (-4.9%) - only city with decline
   - *Potential Reasons*: Tourism-dependent economy, coastal location

#### State-Level Patterns
1. **Consistent Crisis Impact**
   - All states except Santa Elena show increased sales
   - Average state increase: +42.3%

2. **Top Performing States**
   - **Pichincha (Quito)**: 552 → 772 (+39.9%)
   - **Guayas (Guayaquil)**: 269 → 388 (+44.2%)
   - **Manabi**: 149 → 254 (+70.5%) - largest % increase of these three; Pastaza's +162% is the largest overall

3. **Regional Variations**
   - Coastal states average +37% growth
   - Highland states average +45% growth
   - Amazonian states (Pastaza): +162%

#### Strategic Implications
1. **Inventory Allocation**
   - Prioritize urban centers (Quito/Guayaquil) for stock
   - Prepare for disproportionate demand in smaller cities

2. **Logistics Planning**
   - Amazonian regions need earliest replenishment
   - Coastal areas may require less crisis inventory

3. **Pricing Strategy**
   - Implement surge pricing in high-growth areas
   - Maintain stable pricing in volatile regions

4. **Marketing Focus**
   - Target crisis messaging differently by region:
     - Urban: Availability assurances
     - Rural: Basic necessities focus



### Crisis Impact on Rolling Mean and Lagged Sales

In [151]:
avg_sales_lag_7_crisis = df.groupby('is_crisis')['sales_lag_7'].mean().reset_index()
avg_rolling_mean_7_crisis = df.groupby('is_crisis')['rolling_mean_7'].mean().reset_index()


print(avg_sales_lag_7_crisis)
print(avg_rolling_mean_7_crisis)

   is_crisis  sales_lag_7
0          0          355
1          1          492
   is_crisis  rolling_mean_7
0          0             356
1          1             495


### Time-Series Analysis of Crisis Impact on Sales Patterns

#### Key Temporal Insights
1. **Consistent Lagged Impact**
   - 7-day lagged sales: 355 → 492 (+38.6% during crisis)
   - Matches current period growth (from 357 → 495 in raw sales)
   - *Implication*: Crisis effects persist for at least one week

2. **Stable Rolling Averages**
   - 7-day moving average: 356 → 495 (+39.0%)
   - Nearly identical to point-in-time growth rates
   - *Interpretation*: Crisis impacts are sustained, not just spike events

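3. **Trend Characteristics**
   - Lagged and rolling metrics move in lockstep, which is expected because both are derived from `sales` itself
   - Their agreement shows that most crisis-period days were preceded by a week of similarly elevated sales, which points to a sustained shift rather than a one-day spike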
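
How `sales_lag_7` and `rolling_mean_7` were built is not shown in this notebook. One common construction, offered here as an assumption, computes them within each store and family so that values never leak from one series into another. The `toy` frame is invented.

```python
import pandas as pd

# Toy series (invented): ten consecutive days for one store and one family.
toy = pd.DataFrame({
    "date": pd.date_range("2016-04-01", periods=10, freq="D"),
    "store_nbr": 1,
    "family": "PRODUCE",
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
}).sort_values(["store_nbr", "family", "date"])

grouped = toy.groupby(["store_nbr", "family"])["sales"]

# Sales from exactly 7 rows (days) earlier within the same store and family.
toy["sales_lag_7"] = grouped.shift(7)

# Mean of the current day and up to 6 preceding days within the same series.
toy["rolling_mean_7"] = grouped.transform(lambda s: s.rolling(7, min_periods=1).mean())

print(toy[["date", "sales", "sales_lag_7", "rolling_mean_7"]])
```

In this sketch the first seven lag values are `NaN`. The first rows of the notebook's data show `0` in `sales_lag_7`, which suggests missing values were filled with zero there; that choice pulls group averages down slightly.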

#### Strategic Implications
1. **Demand Forecasting**
   - Can reliably use 7-day patterns for crisis planning
   - Expect new baseline ~40% higher during crises

2. **Inventory Management**
   - Maintain elevated stock levels throughout crisis periods
   - Don't anticipate quick return to normal demand

3. **Supply Chain**
   - Ramp up orders immediately at crisis onset
   - Sustain increased throughput for minimum 7-14 days

4. **Performance Benchmarking**
   - Adjust KPIs during crises (+40% expected)
   - Compare against crisis-period baselines

#### Operational Recommendations
1. Implement automatic inventory triggers when:
   - 7-day average crosses +25% threshold
   - Lagged sales show sustained increase

2. Develop dual forecasting models:
   - Standard model for normal periods
   - Crisis model with adjusted parameters

3. Monitor these metrics daily during crises:
   - Rolling mean stability
   - Lagged sales convergence
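
The "+25% threshold" trigger above needs a baseline to compare against, which the text does not define. The sketch below assumes one possible choice: the mean of the 28 days that end a week before the current day. The baseline window, the variable names, and the `daily` series are all hypothetical.

```python
import pandas as pd

# Toy daily totals (invented): 35 flat days followed by a week at +50%.
daily = pd.Series(
    [100.0] * 35 + [150.0] * 7,
    index=pd.date_range("2016-03-01", periods=42, freq="D"),
)

rolling_7 = daily.rolling(7).mean()

# Assumed baseline: 28-day mean ending 7 days before the current day,
# so the most recent week does not contaminate its own reference level.
baseline = daily.rolling(28).mean().shift(7)

# Trigger when the 7-day average exceeds the baseline by more than 25%.
trigger = rolling_7 > 1.25 * baseline
first_trigger = trigger.idxmax() if trigger.any() else None
print(first_trigger)
```

With this toy series the rule fires on the fourth day of the surge, once enough elevated days have entered the 7-day window. A shorter window would react faster at the cost of more false alarms.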



### Crisis and Store Cluster Performance

In [152]:
avg_sales_by_cluster_crisis = df.groupby(['cluster', 'is_crisis'])['sales'].mean().reset_index()
avg_transactions_by_cluster_crisis = df.groupby(['cluster', 'is_crisis'])['transactions'].mean().reset_index()


print(avg_sales_by_cluster_crisis)
print(avg_transactions_by_cluster_crisis)

    cluster  is_crisis  sales
0         1          0    325
1         1          1    432
2         2          0    259
3         2          1    390
4         3          0    194
5         3          1    254
6         4          0    297
7         4          1    338
8         5          0  1,113
9         5          1  1,510
10        6          0    341
11        6          1    533
12        7          0    138
13        7          1    221
14        8          0    644
15        8          1    896
16        9          0    274
17        9          1    351
18       10          0    255
19       10          1    375
20       11          0    602
21       11          1    814
22       12          0    322
23       12          1    512
24       13          0    322
25       13          1    548
26       14          0    708
27       14          1    853
28       15          0    198
29       15          1    257
30       16          0    236
31       16          1    424
32       1

### Store Cluster Performance During Crisis Periods

#### High-Level Findings
1. **Universal Sales Lift**  
   - All 17 clusters show increased sales during crises (+14% to +80% among the 16 clusters visible in the output above)
   - Average growth across those 16 clusters: about +43%, slightly above the overall +39%

2. **Three Distinct Performance Groups**  
   - **Premium Clusters (5,8,11,14,17)**:  
     - Highest absolute sales ($800-$1,500)  
     - +35% average growth  
     - *Example*: Cluster 5: $1,113 → $1,510  
   - **Mid-Tier Clusters (1,6,12,13,16)**:  
     - Strong % growth (+33% to +80%)  
     - *Star Performer*: Cluster 16: +80% ($236 → $424)  
   - **Value Clusters (2,3,4,7,9,10,15)**:  
     - Lowest absolute sales ($200-$400)  
     - +37% average growth, ranging from +14% (cluster 4) to +60% (cluster 7)  

3. **Transaction Patterns Reveal Behavior Shifts**  
   - **High-Value Clusters**: Minimal transaction growth (+1-4%) but large sales increases → **Bigger baskets**  
   - **Growth Clusters**: Both transactions (+12-19%) and sales rise → **More customers + bigger purchases**  
   - **Anomalies**:  
     - Cluster 4: -4% transactions but +14% sales  
     - Cluster 9: Flat transactions but +28% sales  
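
The "bigger baskets" versus "more customers" distinction comes from comparing each cluster's sales lift with its transaction lift. A minimal sketch of that comparison follows. The sales values for clusters 5 and 16 come from the printed table, while the transaction values are invented because that part of the output is cut off; the `*_lift_pct` column names are introduced here.

```python
import pandas as pd

toy = pd.DataFrame({
    "cluster":      [5, 5, 16, 16],
    "is_crisis":    [0, 1, 0, 1],
    "sales":        [1113.0, 1510.0, 236.0, 424.0],
    "transactions": [2000.0, 2040.0, 1000.0, 1150.0],  # invented values
})

means = toy.groupby(["cluster", "is_crisis"])[["sales", "transactions"]].mean()

# Percentage lift of crisis over non-crisis, per cluster, for both metrics.
lift = (means.xs(1, level="is_crisis") / means.xs(0, level="is_crisis") - 1) * 100
lift = lift.add_suffix("_lift_pct")

# Growth in sales per transaction: the part of the sales lift not explained by traffic.
lift["per_visit_lift_pct"] = (
    (1 + lift["sales_lift_pct"] / 100) / (1 + lift["transactions_lift_pct"] / 100) - 1
) * 100
print(lift.round(1))
```

A cluster with a high `per_visit_lift_pct` and a low `transactions_lift_pct` fits the "bigger baskets" pattern; when both are high, traffic and spend per visit are rising together.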

#### Strategic Recommendations

**For Premium Clusters**  
- Focus on **inventory depth** for high-ticket items  
- Implement **concierge services** to maximize basket size  
- *Example*: Cluster 5 can absorb 35% more inventory  

**For Growth Clusters**  
- Expand **staffing/payment stations** (higher traffic)  
- Promote **cross-selling** (customers buying more per trip)  
- *Priority*: Cluster 16 needs 80% more stock  

**For Value Clusters**  
- Optimize **essential goods** assortment  
- Limited need for operational changes  

#### Operational Insights  
1. **Labor Allocation**  
   - Staff 20% more in growth clusters (transactions up)  
   - Reassign staff to stocking in premium clusters  

2. **Inventory Planning**  
   - **Cluster 5**: Plan for roughly $400 more in average sales during crises ($1,113 → $1,510); **Cluster 14**: roughly $145 more ($708 → $853)  
   - **Cluster 16**: Nearly double budget for key items  

3. **Marketing Focus**  
   - Premium: Emphasize quality/availability  
   - Growth: Highlight value bundles  

#### Anomaly Investigation  
- **Cluster 4 & 9**:  
  - Negative/neutral traffic but sales up →  
  - Likely **neighborhood consolidation** (fewer trips, bigger hauls)  
  - Check average basket size changes  

#### Performance Benchmarks  
| Cluster Tier | Crisis Prep Target |  
|--------------|--------------------|  
| Premium      | +35% inventory     |  
| Growth       | +60% inventory     |  
| Value        | +25% inventory     |  



## 📅 8. Holidays & Events

### How do sales differ on holidays vs. non-holidays overall?

In [153]:
holiday_sales_comparison = df.groupby('is_holiday')['sales'].mean().reset_index()
holiday_sales_comparison

Unnamed: 0,is_holiday,sales
0,0,352
1,1,394


### Holiday vs Non-Holiday Sales Performance Analysis

#### Key Findings
1. **Overall Holiday Lift**
   - Average sales increase by **11.8%** on holidays (352 → 394)
   - Confirms a meaningful but moderate impact of holiday periods
   - The holiday average includes the crisis days, all of which are flagged as holidays; non-crisis holidays alone average 381 (+8.2%)

2. **Strategic Context**
   - The holiday boost is roughly **1/3 the size** of the crisis boost (+12% vs +39%)
   - Suggests holidays drive incremental growth rather than transformative change

#### Comparative Insights
| Period        | Avg Sales | % Change                                          | Key Characteristics       |
|---------------|-----------|---------------------------------------------------|---------------------------|
| Normal        | 352       | -                                                 | Baseline performance      |
| Holiday       | 394       | +12% vs normal                                    | Celebratory purchasing    |
| Crisis        | 495       | +39% vs non-crisis (357)                          | Necessity-driven surge    |
| Crisis+Holiday| 495*      | +41% vs normal; +30% vs non-crisis holidays (381) | Same rows as Crisis, since every crisis day is also a holiday |

*From previous crisis+holiday analysis

#### Behavioral Interpretation
- **Holiday Shoppers**:
  - Likely purchasing gifts/special items
  - More discretionary spending than essentials
- **Crisis Shoppers**:
  - Focused on staples and necessities
  - Exhibit stockpiling behavior

#### Actionable Recommendations
1. **Inventory Planning**
   - Moderate holiday prep (+10-15% stock)
   - Focus on giftables and seasonal items

2. **Staffing**
   - Schedule 15% more staff during holidays
   - Prioritize customer service over stocking

3. **Promotions**
   - Bundle holiday-themed items
   - Limited-time holiday specials perform well

4. **Marketing**
   - Launch holiday campaigns 2-3 weeks in advance
   - Emphasize gift-giving solutions


In [154]:

holiday_sales_comparison = df.groupby('is_holiday')['sales'].mean()


percent_difference = ((holiday_sales_comparison[1] - holiday_sales_comparison[0]) / holiday_sales_comparison[0]) * 100


print(f"Holiday sales are {percent_difference:.2f}% {'higher' if percent_difference > 0 else 'lower'} than non-holiday sales.")


Holiday sales are 11.84% higher than non-holiday sales.


In [155]:

holiday_df = df[df['is_holiday'] == 1]

avg_sales_by_holiday_type = holiday_df.groupby('holiday_type')['sales'].mean()


non_holiday_avg_sales = df[df['is_holiday'] == 0]['sales'].mean()


percent_difference_by_holiday_type = ((avg_sales_by_holiday_type - non_holiday_avg_sales) / non_holiday_avg_sales) * 100


holiday_impact = avg_sales_by_holiday_type.reset_index()
holiday_impact['percent_difference_vs_normal'] = percent_difference_by_holiday_type.values


holiday_impact = holiday_impact.sort_values(by='percent_difference_vs_normal', ascending=False)


print(holiday_impact)


  holiday_type  sales  percent_difference_vs_normal
0   Additional    488                            38
4     Transfer    468                            33
1       Bridge    447                            27
2        Event    426                            21
5     Work Day    372                             6
3      Holiday    358                             2


### Holiday Sales Impact Analysis by Holiday Type

#### Performance Spectrum (Ranked by Impact)
1. **Additional Holidays** (+38%)
   - Peak performance: 488 sales
   - *Likely includes*: Extended weekends, special shopping days

2. **Transfer Holidays** (+33%)  
   - Near-premium lift: 468 sales  
   - *Characteristic*: Date-shifted official holidays  

3. **Bridge Holidays** (+27%)  
   - Strong performance: 447 sales  
   - *Definition*: Days creating long weekends  

4. **Event Holidays** (+21%)  
   - Moderate lift: 426 sales  
   - *Examples*: Cultural festivals, local celebrations  

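5. **Work Day Holidays** (+6%)  
   - Minimal impact: 372 sales  
   - *Insight*: In the source holiday calendar this type appears to mark make-up working days (typically Saturdays) that compensate for a Bridge day, so these are working days rather than days off  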

6. **Regular Holidays** (+2%)  
   - Baseline lift: 358 sales  
   - *Interpretation*: Standard fixed-date holidays  

#### Strategic Implications

**For High-Impact Holidays (Additional/Transfer/Bridge)**
- **Inventory**: Stock 30-40% more premium/impulse items
- **Staffing**: Schedule 25% additional staff
- **Marketing**: Launch targeted campaigns 3 weeks prior
- *Example*: Create "long weekend specials" bundles

**For Event Holidays**
- Localize assortments to match festivities
- Thematic window displays increase footfall

**For Low-Impact Holidays (Work Day/Regular)**
- Maintain normal operations
- Focus on operational efficiency

#### Key Insights
- **Extended breaks matter**: Days that lengthen a holiday (Additional, Bridge) outperform regular holidays
- **Flexibility drives sales**: Transfer/Additional holidays show consumers value date flexibility
- **Cultural relevance**: Event holidays outperform generic ones

#### Action Plan
1. **Calendar Optimization**:
   - Mark high-impact holidays in red 6 months ahead
   - Develop type-specific playbooks

2. **Performance Tracking**:
   - Set tiered sales targets:
     - Additional: +35-40%
     - Transfer: +30-35%
     - Bridge: +25-30%

3. **Labor Management**:
   - High-impact: Temporary hires
   - Low-impact: Cross-trained existing staff



### Which type of holiday (national, regional, local) drives the highest sales?

In [None]:

holiday_df = df[df['is_holiday'] == 1]


avg_sales_by_holiday_type = holiday_df.groupby('holiday_type')['sales'].mean().reset_index()


avg_sales_by_holiday_type = avg_sales_by_holiday_type.sort_values(by='sales', ascending=False)


print(avg_sales_by_holiday_type)


  holiday_type  sales
0   Additional    488
4     Transfer    468
1       Bridge    447
2        Event    426
5     Work Day    372
3      Holiday    358


### Holiday Sales Performance by Type and Scope

#### Sales Impact Ranking
| Holiday Type   | Avg Sales | % Lift vs Non-Holiday | Likely Scope         | Key Characteristics               |
|----------------|-----------|-----------------------|----------------------|-----------------------------------|
| Additional     | 488       | +38%                  | National/Regional    | Extended weekends, special days   |
| Transfer       | 468       | +33%                  | National             | Date-shifted official holidays    |
| Bridge         | 447       | +27%                  | National             | Creates 4-day weekends            |
| Event          | 426       | +21%                  | Local/Regional       | Cultural festivals, local events  |
| Work Day       | 372       | +6%                   | National             | Make-up working days for Bridges  |
| Regular        | 358       | +2%                   | National             | Traditional fixed-date holidays   |
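
The heading asks about scope (national, regional, local), but the cell above groups by `holiday_type`, so the "Likely Scope" column is an inference. The data also has a `locale` column, which records scope directly. The sketch below groups holiday rows by it. The `toy` frame is invented, and `Regional` and `Local` are assumed label values; only `National` is visible in the data preview.

```python
import pandas as pd

# Toy rows (invented). Only the "National" label is confirmed by the data preview.
toy = pd.DataFrame({
    "is_holiday": [1, 1, 1, 1, 1, 0],
    "locale":     ["National", "National", "Regional", "Local", "Local", "National"],
    "sales":      [400.0, 420.0, 380.0, 450.0, 470.0, 350.0],
})

holiday_rows = toy[toy["is_holiday"] == 1]

# Mean sales and row count per scope; the count shows how much data backs each mean.
avg_sales_by_locale = (
    holiday_rows.groupby("locale")["sales"]
    .agg(["mean", "size"])
    .sort_values("mean", ascending=False)
)
print(avg_sales_by_locale)
```

Reporting `size` next to the mean matters here because regional and local holidays usually cover far fewer store-days than national ones.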

#### Key Insights

1. **Flexible-Date Holidays Dominate**
   - Top 3 performers (Additional/Transfer/Bridge) all involve date flexibility
   - *Consumer behavior*: Value long weekends > fixed dates

2. **National vs Local Impact**
   - **National flexible** (Transfer/Bridge): +27-33% lift
   - **Local/event-based**: +21% lift
   - *Implication*: Date flexibility beats cultural relevance

3. **Unexpected Underperformers**
   - Traditional fixed-date holidays (+2%) show minimal impact
   - Workday holidays (+6%) barely outperform normal days

#### Strategic Recommendations

**For Retailers:**
1. **Prioritize Flexible Holidays**
   - Allocate 35-40% more inventory for Additional/Transfer holidays
   - Schedule 30% more staff for Bridge holidays

2. **Localized Approach for Events**
   - Tailor assortments to regional festivals
   - Example: Special displays for local harvest festivals

3. **Re-evaluate Fixed-Date Holidays**
   - Reduce special preparations for regular holidays
   - Focus on operational efficiency instead

**For Marketing:**
- Create "Extended Weekend Sales" campaigns
- Develop "Holiday Transfer" promotions (pre/post-holiday deals)
- Localize messaging for Event holidays

#### Operational Benchmarks
- **Staffing Guide**:
  - Additional holidays: +35% staff
  - Bridge holidays: +25% staff
  - Regular holidays: Normal staffing

- **Inventory Planning**:
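  ```python
  # Stock uplift per holiday type, taken from the lifts in the table above.
  STOCK_UPLIFT = {"Additional": 0.40, "Transfer": 0.30, "Bridge": 0.30, "Event": 0.20}

  def planned_stock(base_stock: float, holiday_type: str) -> float:
      """Scale base stock by the uplift for this holiday type (no uplift if unlisted)."""
      return base_stock * (1 + STOCK_UPLIFT.get(holiday_type, 0.0))

  # Event holidays call for a regional focus when allocating the extra 20%.
  stock = planned_stock(1000, "Additional")
  ```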

### Promotion vs. Holiday Impact

In [157]:

holiday_with_promo = holiday_df[holiday_df['onpromotion'] > 0]
holiday_without_promo = holiday_df[holiday_df['onpromotion'] == 0]


avg_sales_with_promo = holiday_with_promo.groupby('holiday_type')['sales'].mean()
avg_sales_without_promo = holiday_without_promo.groupby('holiday_type')['sales'].mean()

print("With Promotion:")
print(avg_sales_with_promo)
print("Without Promotion:")
print(avg_sales_without_promo)


With Promotion:
holiday_type
Additional   1,426
Bridge       1,176
Event        1,201
Holiday      1,161
Transfer     1,048
Work Day     1,330
Name: sales, dtype: float64
Without Promotion:
holiday_type
Additional   196
Bridge        91
Event        113
Holiday      152
Transfer      91
Work Day     161
Name: sales, dtype: float64


### 📊 Insights: Promotion vs. Holiday Impact on Sales

From the analysis comparing average sales during holidays **with promotions** versus **without promotions**, we can draw several key insights:

#### 🔹 Promotions Significantly Boost Sales
Across all holiday types, the average sales are **substantially higher** when promotions are applied:
- **Additional Holidays:** Sales jump from 196 to 1,426
- **Bridge Holidays:** Sales increase from 91 to 1,176
- **Event Days:** Sales rise from 113 to 1,201
- **Regular Holidays:** Sales go up from 152 to 1,161
- **Transfer Holidays:** Sales improve from 91 to 1,048
- **Work Days (treated as holidays):** Sales boost from 161 to 1,330

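#### 🔹 Consistency Across Holiday Types
This trend is consistent for all types of holidays, suggesting that promotions are associated with higher sales **regardless of the holiday category**. Part of the gap may reflect product mix rather than the promotion itself: if promotions are concentrated in high-volume families, promoted rows will show higher sales even without any promotional effect. A within-family comparison is needed to isolate that effect.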

#### 🔹 Implication for Business Strategy
These results highlight the value of aligning promotional strategies with holidays. Running promotions during holidays—especially high-engagement ones like **Event** and **Additional** holidays—can drive significantly more revenue.

> 📈 **Recommendation:** Businesses should consider planning promotions around holidays to maximize their impact on sales.
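
The promoted and non-promoted rows above may differ in product mix, so a pooled comparison can overstate the promotional effect. The sketch below contrasts a pooled ratio with per-family ratios on an invented `toy` frame; the `promo` column and its labels are introduced for illustration.

```python
import pandas as pd

# Toy rows (invented): a high-volume family that is often promoted and a small one that rarely is.
toy = pd.DataFrame({
    "family":      ["PRODUCE"] * 4 + ["BOOKS"] * 4,
    "onpromotion": [5, 3, 0, 0, 1, 0, 0, 0],
    "sales":       [1600.0, 1500.0, 1200.0, 1100.0, 4.0, 3.0, 2.0, 3.0],
})
toy["promo"] = (toy["onpromotion"] > 0).map({True: "promo", False: "no_promo"})

# Pooled comparison: mixes the promotion effect with the family mix.
pooled = toy.groupby("promo")["sales"].mean()
pooled_ratio = pooled["promo"] / pooled["no_promo"]

# Within-family comparison: promoted vs non-promoted rows of the same family.
within = toy.groupby(["family", "promo"])["sales"].mean().unstack("promo")
within["ratio"] = within["promo"] / within["no_promo"]

print(round(pooled_ratio, 2))
print(within.round(2))
```

In the toy data the pooled ratio is larger than either family's own ratio, which is the signature of a mix effect. On the real data, `holiday_df` could be used in place of `toy`.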


### Which product families see the biggest sales boost during holidays?

In [158]:
family_holiday_sales = df.groupby(['family', 'is_holiday'])['sales'].mean().reset_index()
family_holiday_sales

Unnamed: 0,family,is_holiday,sales
0,AUTOMOTIVE,0,6
1,AUTOMOTIVE,1,7
2,BABY CARE,0,0
3,BABY CARE,1,0
4,BEAUTY,0,4
...,...,...,...
61,PRODUCE,1,1514
62,SCHOOL AND OFFICE SUPPLIES,0,3
63,SCHOOL AND OFFICE SUPPLIES,1,3
64,SEAFOOD,0,22


### 🛍️ Which Product Families See the Biggest Sales Boost During Holidays?

By analyzing the average sales per product family on holidays vs. non-holidays, we can identify which categories experience the **largest holiday sales boost**.

#### 🔝 Top Boosted Families
Some product families see a **significant spike** in average sales during holidays:

- **PRODUCE**  
  - 📆 Regular Days: *1,165*  
  - 🎉 Holidays: *1,514*  
  - ✅ **+30% increase**  
  - 🍎 Fresh food seems to be a popular choice for holiday meals and gatherings.

- **DELI**  
  - 📆 Regular Days: *492*  
  - 🎉 Holidays: *617*  
  - ✅ **+25% increase**  
  - 🥪 Likely driven by ready-to-eat convenience foods for celebrations.

- **MEATS**  
  - 📆 Regular Days: *528*  
  - 🎉 Holidays: *655*  
  - ✅ **+24% increase**  
  - 🥩 Traditional family meals and barbecues might be a contributing factor.

- **BEVERAGES**  
  - 📆 Regular Days: *313*  
  - 🎉 Holidays: *376*  
  - ✅ **+20% increase**  
  - 🥤 Reflects higher consumption during social events and gatherings.

#### ➖ Minimal or No Change
Some families showed **little to no change** in sales, such as:
- **BABY CARE**
- **SEAFOOD**
- **SCHOOL AND OFFICE SUPPLIES**

This suggests these categories are **less sensitive** to holiday effects.

#### 📌 Takeaway
Promotional efforts and stock planning during holidays should prioritize categories like **Produce**, **Meats**, and **Deli**, which see the most notable boost in demand.

> 🎯 **Recommendation:** Focus marketing and inventory strategies around high-performing categories during holidays to maximize sales impact.


### Are certain stores or store types more sensitive to holiday sales spikes?

In [159]:
store_type_holiday_sales = df.groupby(['store_type', 'is_holiday'])['sales'].mean().reset_index()
store_type_holiday_sales

Unnamed: 0,store_type,is_holiday,sales
0,A,0,693
1,A,1,785
2,B,0,321
3,B,1,366
4,C,0,195
5,C,1,213
6,D,0,346
7,D,1,382
8,E,0,264
9,E,1,301


### 🏬 Are Certain Stores or Store Types More Sensitive to Holiday Sales Spikes?

Analyzing average sales across store types during holidays versus regular days reveals which store formats are **most responsive to holiday demand**.

#### 📊 Holiday Sales Uplift by Store Type

| Store Type | Regular Sales | Holiday Sales | % Increase |
|------------|----------------|----------------|------------|
| A          | 693            | 785            | **+13.3%** |
| B          | 321            | 366            | **+14.0%** |
| C          | 195            | 213            | **+9.2%**  |
| D          | 346            | 382            | **+10.4%** |
| E          | 264            | 301            | **+14.0%** |

#### 🔍 Key Insights

- **Store Types B and E** show the **highest relative increase** in average sales during holidays (**+14%**).
- **Store Type A**, despite already having the **highest base sales**, still sees a meaningful increase during holidays (**+13.3%**), indicating strong holiday demand in high-volume stores.
- **Store Type C** is the **least sensitive**, with just a **+9.2%** boost, possibly due to smaller size, limited assortment, or customer demographics.

#### 🧠 Strategic Takeaway
- **Large-format stores (Type A)** and **mid-tier stores (Types B & E)** benefit the most from holiday demand, making them ideal targets for holiday-specific campaigns.
- Store-specific strategies may be necessary to optimize holiday performance for **less responsive types** like C.

> 📈 **Recommendation:** Enhance promotional efforts and inventory planning for Store Types A, B, and E during holidays to capitalize on their higher sales sensitivity.


### Do transactions (customer visits) increase significantly during holidays?

In [160]:
holiday_transactions = df[df['is_holiday'] == 1]['transactions'].sum()
non_holiday_transactions = df[df['is_holiday'] == 0]['transactions'].sum()
print(holiday_transactions, non_holiday_transactions)

825639144.0 3935038272.0


### 🚶‍♂️ Do Transactions (Customer Visits) Increase Significantly During Holidays?

To assess whether holidays bring more customer traffic, we compared the **total number of transactions** (i.e., customer visits) on holidays versus non-holidays:

- 🎉 **Holiday Transactions:** 825,639,144  
- 📆 **Non-Holiday Transactions:** 3,935,038,272  

#### 🔍 Key Insight

Holidays account for roughly **17%** of all recorded transactions:
\[
\frac{825,639,144}{825,639,144 + 3,935,038,272} \approx 17.3\%
\]

On its own, this share cannot show whether traffic rises on holidays, because it depends on how many of the days are holidays. Two points matter when reading it:

- If holidays make up about 17% of the days, a 17.3% share means per-day traffic is essentially unchanged; a clear increase would show up only if holidays were a noticeably smaller share of days.
- `transactions` appears to be a store-level daily count repeated on every product-family row, so both totals are inflated by the number of families. The share is unaffected, but the absolute figures are not literal visit counts.

#### 🧠 Takeaway

The totals neither confirm nor rule out higher holiday footfall. The earlier holiday analysis showed average sales about **12% higher** on holidays; whether that comes from more visits or from more spending per visit requires a per-day comparison.

> 💡 **Recommendation:** Compare **average transactions per store per day** on holidays and non-holidays before drawing conclusions about footfall.
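
A per-day comparison of customer visits can be sketched as follows. It assumes `transactions` is a store-level daily count repeated across family rows, so each store-day is kept once before averaging; the `toy` frame is invented.

```python
import pandas as pd

# Toy data (invented): one store, three days, two family rows per day.
toy = pd.DataFrame({
    "date":         ["2016-01-01"] * 2 + ["2016-01-02"] * 2 + ["2016-01-03"] * 2,
    "store_nbr":    [1] * 6,
    "family":       ["PRODUCE", "SEAFOOD"] * 3,
    "is_holiday":   [1, 1, 0, 0, 0, 0],
    "transactions": [1200, 1200, 1000, 1000, 900, 900],
})

# Keep one row per store-day so repeated family rows do not inflate the counts.
store_days = toy.drop_duplicates(["date", "store_nbr"])

# Average transactions per store-day, holiday vs non-holiday.
avg_tx = store_days.groupby("is_holiday")["transactions"].mean()
pct_diff = (avg_tx.loc[1] / avg_tx.loc[0] - 1) * 100
print(avg_tx)
print(f"Holiday store-days see {pct_diff:.1f}% more transactions on average.")
```

Unlike the raw totals, this average does not depend on how many days in the data happen to be holidays.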


### How many days before a holiday does sales start increasing?

In [161]:
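# Note: this measures days elapsed since the first date in the data, not days until the next holiday,
# so the `< 7` filter below selects the first week of the dataset.
df['days_to_holiday'] = (df['date'] - df['date'].min()).dt.days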
sales_increase_before_holiday = df[df['days_to_holiday'] < 7].groupby('is_holiday')['sales'].mean().reset_index()
sales_increase_before_holiday

Unnamed: 0,is_holiday,sales
0,0,232
1,1,135


### ⏳ How Many Days Before a Holiday Do Sales Start Increasing?

To look for a **build-up in sales leading up to holidays**, the cell above filters on `days_to_holiday < 7`. Because that column counts days since the start of the dataset rather than days until the next holiday, the result covers only the **first 7 days of the data** (2013-01-01 to 2013-01-07), split by holiday flag:

| Is Holiday | Avg Sales (First 7 Days of Dataset) |
|------------|-------------------------------------|
| No         | 232                                 |
| Yes        | 135                                 |

#### 🔍 Key Insight

Within this window, **non-holiday rows average higher sales (232) than holiday rows (135)**:
- The holiday rows include 2013-01-01, a national holiday on which total sales across all stores were only about 2.5K, so a low holiday average is expected here.
- The figures describe one week in January 2013 and say nothing about the days leading up to holidays in general.

#### 📉 Unexpected Pattern

- The lower holiday average in this slice is most plausibly explained by:
  - The window covering only the first 7 days of the dataset, which contain very few holidays.
  - The filter not isolating a *pre-holiday window* at all.

#### 🧠 Takeaway

This slice does **not measure pre-holiday build-up**, so it neither supports nor rules out a sales increase before holidays. Answering the question requires a column that holds the true number of days until the next holiday.

> 📌 **Next Step Recommendation:** Compute the actual days until the next holiday for each date, then compare average sales across a **14 to 21 day range** before holidays to find where the increase starts.
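
A true days-until-next-holiday column can be built with `pd.merge_asof` using `direction="forward"`. The sketch below is a minimal illustration on invented daily totals (`daily`, with a single made-up holiday); it is not the notebook's method, and the names `holidays`, `merged` and `profile` are hypothetical.

```python
import pandas as pd

# Toy daily totals (invented). The only holiday is the last date, 2017-01-10.
daily = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10, freq="D"),
    "sales": [100, 100, 100, 110, 120, 130, 150, 170, 200, 90],
})
daily["is_holiday"] = (daily["date"] == pd.Timestamp("2017-01-10")).astype(int)

# One row per holiday date, sorted, with the key renamed for the join.
holidays = (
    daily.loc[daily["is_holiday"] == 1, ["date"]]
    .drop_duplicates()
    .sort_values("date")
    .rename(columns={"date": "next_holiday"})
)

# For each date, find the nearest holiday on or after it.
merged = pd.merge_asof(
    daily, holidays, left_on="date", right_on="next_holiday", direction="forward"
)
merged["days_to_holiday"] = (merged["next_holiday"] - merged["date"]).dt.days

# Average sales at each distance from the next holiday (0 = the holiday itself).
profile = merged.groupby("days_to_holiday")["sales"].mean()
print(profile)
```

On the real data, this would be run on one row per date (for example the `sales_over_time` frame joined with a per-date holiday flag), and the resulting profile read from right to left to see where sales begin to climb.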


### Do sales drop after holidays ("post-holiday effect")?

In [162]:
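# All non-holiday rows after the first date (days_to_holiday counts days since the dataset start).
post_holiday_sales = df[(df['is_holiday'] == 0) & (df['days_to_holiday'] > 0)]['sales'].mean()
# Holiday rows within the first 7 days of the dataset.
pre_holiday_sales = df[(df['is_holiday'] == 1) & (df['days_to_holiday'] < 7)]['sales'].mean()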
print(f"Post-holiday sales: {post_holiday_sales}, Pre-holiday sales: {pre_holiday_sales}")

Post-holiday sales: 352.15918056230004, Pre-holiday sales: 134.64134125364757


### 📉 Do Sales Drop After Holidays? ("Post-Holiday Effect")

To examine whether there's a **drop in sales after holidays**, the cell above compares two averages. Because `days_to_holiday` counts days since the start of the dataset, the two groups are:

- **"Pre-Holiday" Sales:** holiday rows within the first 7 days of the dataset
- **"Post-Holiday" Sales:** all non-holiday rows after the first date in the dataset

| Metric                                   | Avg Sales |
|------------------------------------------|-----------|
| 🎉 Holiday rows, first 7 days of dataset | 135       |
| 📆 All later non-holiday rows            | 352       |

#### 🔍 Key Insight

The second average (352) is about **2.6x** the first (135), but the two groups are not a before/after pair:
- The first group is a handful of holiday rows from early January 2013, including the near-zero sales of 2013-01-01.
- The second group is almost the entire non-holiday history from 2013 to 2017, so it also reflects the upward trend in daily sales.
- Neither group is defined relative to the date of a holiday.

#### 🧠 Takeaway

This comparison **neither supports nor refutes a post-holiday sales drop**. Testing for one requires comparing the days immediately after each holiday with the days immediately before it.

> 📌 **Recommendation:** Define symmetric windows around each holiday (for example 3 or 7 days on either side) and compare average sales in the two windows before adjusting post-holiday promotions or stock levels.
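
A before/after comparison needs windows defined relative to each holiday. The following sketch, on invented data with one made-up holiday, uses `pd.merge_asof` in both directions to measure the distance to the next and the previous holiday; `WINDOW` and the other names are illustrative choices, not values from the notebook.

```python
import pandas as pd

# Toy daily totals (invented). The only holiday is 2017-03-04.
daily = pd.DataFrame({
    "date": pd.date_range("2017-03-01", periods=7, freq="D"),
    "sales": [100, 110, 120, 200, 80, 90, 100],
})
daily["is_holiday"] = (daily["date"] == pd.Timestamp("2017-03-04")).astype(int)

holidays = (
    daily.loc[daily["is_holiday"] == 1, ["date"]]
    .drop_duplicates()
    .sort_values("date")
    .rename(columns={"date": "holiday"})
)

# Nearest holiday on/after each date, and nearest holiday on/before each date.
nxt = pd.merge_asof(daily, holidays, left_on="date", right_on="holiday", direction="forward")
prv = pd.merge_asof(daily, holidays, left_on="date", right_on="holiday", direction="backward")
daily["days_to_next"] = (nxt["holiday"] - nxt["date"]).dt.days
daily["days_since_last"] = (prv["date"] - prv["holiday"]).dt.days

WINDOW = 3  # hypothetical window length in days
non_holiday = daily["is_holiday"] == 0

# Missing distances (no holiday on that side) are excluded by `between`.
pre = daily.loc[non_holiday & daily["days_to_next"].between(1, WINDOW), "sales"].mean()
post = daily.loc[non_holiday & daily["days_since_last"].between(1, WINDOW), "sales"].mean()
print(f"pre-holiday avg: {pre}, post-holiday avg: {post}")
```

With closely spaced holidays a day can fall in both windows, so on real data it may be worth excluding days that are within `WINDOW` days of more than one holiday.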


### Which city or state benefits the most from local holidays?

In [163]:
city_state_holiday_sales = df.groupby(['city', 'state', 'holiday_type'])['sales'].mean().reset_index()
city_state_holiday_sales

Unnamed: 0,city,state,holiday_type,sales
0,Ambato,Tungurahua,Additional,527
1,Ambato,Tungurahua,Bridge,506
2,Ambato,Tungurahua,Event,397
3,Ambato,Tungurahua,Holiday,363
4,Ambato,Tungurahua,Normal Day,357
...,...,...,...,...
149,Santo Domingo,Santo Domingo de los Tsachilas,Event,262
150,Santo Domingo,Santo Domingo de los Tsachilas,Holiday,214
151,Santo Domingo,Santo Domingo de los Tsachilas,Normal Day,211
152,Santo Domingo,Santo Domingo de los Tsachilas,Transfer,286


### 🏙️ Which City or State Benefits the Most from Local Holidays?

We analyzed average sales across different **cities and states** by **holiday type** to see which locations gain the most on holidays. Note that the grouping uses `holiday_type` and does not filter on `locale`, so the figures cover all holidays affecting a city, not only local ones.

#### 🔍 Key Observations:

- **Ambato (Tungurahua)** shows notably high sales during:
  - 🏷️ *Additional Holidays*: **527** (about 48% above normal days)
  - 🛣️ *Bridge Holidays*: **506** (about 42% above normal days)
- Its **normal-day** average is 357, and regular *Holiday* rows (363) are only slightly above that.
- **Santo Domingo** shows a moderate increase during:
  - 🎉 *Transfer Holidays*: **286**, about 36% above its normal-day average of **211**.

#### 🧠 Insight:

- Cities like **Ambato** gain most on Additional and Bridge days, while Event and regular Holiday rows are much closer to normal days, suggesting the uplift is tied to **specific holiday types** rather than to holidays in general.
- **States such as Tungurahua** could be **priority regions** for localized promotions, especially around Bridge and Additional holidays.

#### 📌 Takeaway:

- Holiday impacts **aren’t uniform across regions**—local context matters.
- Businesses should target high-performing cities like **Ambato** with **event-based promotions** and **inventory planning** around key holidays.
- Ranking cities by **% uplift versus their own normal days**, and restricting to `locale == 'Local'`, would answer the local-holiday question more directly than raw averages.

> 🗺️ **Next step:** Visualize these trends with a heatmap or bar chart to better highlight top-performing regions during holidays.
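
The percentage uplift over normal days can be computed by pivoting the grouped result so each holiday type becomes a column. The sketch below uses only the averages visible in the truncated output above for Ambato and Santo Domingo; the frame name `avg` and the choice of `pivot` are illustrative.

```python
import pandas as pd

# Averages copied from the visible rows of the output above (a subset only).
avg = pd.DataFrame({
    "city": ["Ambato", "Ambato", "Ambato", "Santo Domingo", "Santo Domingo"],
    "holiday_type": ["Additional", "Bridge", "Normal Day", "Transfer", "Normal Day"],
    "sales": [527, 506, 357, 286, 211],
})

# One row per city, one column per holiday type.
wide = avg.pivot(index="city", columns="holiday_type", values="sales")

# Percentage uplift of each holiday type over that city's normal-day average.
normal = wide["Normal Day"]
uplift_pct = (
    wide.drop(columns="Normal Day")
    .sub(normal, axis=0)
    .div(normal, axis=0)
    .mul(100)
    .round(1)
)
print(uplift_pct)
```

Applied to the full `city_state_holiday_sales` frame, sorting `uplift_pct` by column would rank cities per holiday type; cells are `NaN` where a city has no rows for that type.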


### Is there a difference in sales between transferred holidays and non-transferred holidays?

In [164]:
transferred_sales = df[df['transferred'] == 1]['sales'].mean()
non_transferred_sales = df[df['transferred'] == 0]['sales'].mean()
print(f"Transferred sales: {transferred_sales}, Non-transferred sales: {non_transferred_sales}")


Transferred sales: 311.5602281117783, Non-transferred sales: 359.2714177512478


### 🔄 Transferred Holidays vs. Non-Transferred Holidays: Is There a Sales Difference?

To assess the impact of **transferred holidays** (when a holiday is moved to a different date), we compared average sales between:

- **Transferred rows** (`transferred = 1`)
- **All other rows** (`transferred = 0`), which include non-transferred holidays *and* ordinary non-holiday days, because the filter does not restrict to holidays

#### 📊 Results:

| Group                              | Avg Sales |
|------------------------------------|-----------|
| 🔄 Transferred                     | 312       |
| 📅 Not Transferred (all other rows) | 359       |

#### 🔍 Insight:

- Average sales on **transferred** rows are **~13% lower** than on all other rows.
- This is consistent with a moved holiday having **less commercial impact**, possibly due to:
  - Less anticipation or confusion among consumers
  - Misalignment with traditional shopping behavior
- Because the comparison group is dominated by ordinary days, the gap to *non-transferred holidays* specifically is not measured here.

#### 🧠 Takeaway:

- Transferred dates should not be assumed to behave like regular holidays.
- A like-for-like comparison restricted to holiday rows is needed before concluding that **fixed-date holidays** are the stronger campaign target.

> 📌 **Recommendation:** Retailers should track and adjust for transferred holidays in their calendars to avoid overestimating sales potential.
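
A like-for-like version of this comparison restricts the data to holiday rows before splitting on `transferred`. The sketch below shows the pattern on invented values; the `labels` mapping is only there to make the output readable.

```python
import pandas as pd

# Toy rows (invented): four holiday rows and two ordinary days.
toy = pd.DataFrame({
    "is_holiday": [1, 1, 1, 1, 0, 0],
    "transferred": [True, True, False, False, False, False],
    "sales": [300, 320, 400, 420, 350, 370],
})

# Restrict to holiday rows so ordinary days do not enter either group.
holiday_rows = toy[toy["is_holiday"] == 1]

# Readable group labels instead of True/False.
labels = holiday_rows["transferred"].map({True: "transferred", False: "not transferred"})
by_transfer = holiday_rows.groupby(labels)["sales"].mean()
print(by_transfer)
```

If the transferred rows in the real data are not flagged with `is_holiday == 1`, the filter would need to use `holiday_type` instead; that depends on how the cleaning step built the flag.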


###  Do crisis periods reduce the usual holiday sales spike?

In [165]:
crisis_holiday_sales = df[(df['is_crisis'] == 1) & (df['is_holiday'] == 1)]['sales'].mean()
non_crisis_holiday_sales = df[(df['is_crisis'] == 0) & (df['is_holiday'] == 1)]['sales'].mean()
print(f"Crisis holiday sales: {crisis_holiday_sales}, Non-crisis holiday sales: {non_crisis_holiday_sales}")

Crisis holiday sales: 494.9040719977209, Non-crisis holiday sales: 381.385802875458


### ⚠️ Do Crisis Periods Reduce the Usual Holiday Sales Spike?

To evaluate whether crisis periods (such as economic downturns or health crises) diminish the typical boost in sales seen during holidays, we compared:

- 🆘 **Holiday sales during crisis periods** (`is_crisis = 1`)
- ✅ **Holiday sales during normal periods** (`is_crisis = 0`)

#### 📊 Results:

| Scenario                  | Avg Holiday Sales |
|---------------------------|-------------------|
| 🆘 Crisis Holidays         | **495**           |
| ✅ Non-Crisis Holidays     | **381**           |

#### 🔍 Insight:

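- Surprisingly, **holiday sales are higher during crisis periods** by nearly **30%** (495 vs. 381).
- This could be attributed to:
  - **Stockpiling behavior** or **panic buying**
  - Retailers offering **steeper discounts or promotions** to stimulate demand
  - Consumers prioritizing spending during holidays for morale or tradition despite external conditions
- The crisis-holiday average (494.90) is identical to the overall crisis-period average printed in the External Factors section, which suggests that every crisis row is also flagged as a holiday. If so, there is no crisis non-holiday baseline, and the size of the holiday *spike* during crises cannot be isolated from this comparison.

#### 🧠 Takeaway:

- **Crisis does not always dampen holiday sales**; average holiday sales were higher in the crisis period covered by this data.
- Whether the holiday spike itself shrinks or grows during a crisis remains open until crisis holidays can be compared with crisis non-holidays.
- Businesses should be ready to adapt to shifting consumer behavior during crises, especially around holidays.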
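
To judge the holiday spike under each condition, all four combinations of `is_crisis` and `is_holiday` are needed, together with their row counts. The sketch below builds that 2x2 view on invented values; the names `table`, `counts` and `lift` are illustrative.

```python
import pandas as pd

# Toy rows (invented) covering all four crisis/holiday combinations.
toy = pd.DataFrame({
    "is_crisis":  [0, 0, 0, 0, 1, 1, 1, 1],
    "is_holiday": [0, 0, 1, 1, 0, 0, 1, 1],
    "sales":      [100, 120, 150, 170, 90, 110, 120, 140],
})

# Mean sales for each combination (rows: is_crisis, columns: is_holiday).
table = toy.pivot_table(index="is_crisis", columns="is_holiday", values="sales", aggfunc="mean")

# Row counts per cell; an empty cell means that comparison is not possible.
counts = pd.crosstab(toy["is_crisis"], toy["is_holiday"])

# Relative holiday lift within each condition: holiday mean / non-holiday mean - 1.
lift = table[1] / table[0] - 1
print(table, counts, lift, sep="\n\n")
```

Comparing `lift` between the two rows shows whether the spike is smaller or larger in a crisis. On the real data, a zero in `counts` for crisis non-holidays would confirm that this question cannot be answered from the current flags.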



### Are weekend holidays (holidays falling on Saturday/Sunday) more profitable than weekday holidays?

In [166]:
df['is_weekend_holiday'] = (df['is_holiday'] == 1) & (df['day_of_week'].isin(['Saturday', 'Sunday']))
weekend_holiday_sales = df[df['is_weekend_holiday'] == 1]['sales'].mean()
# NOTE: rows with is_weekend_holiday == 0 include weekday holidays AND all non-holiday rows.
weekday_holiday_sales = df[df['is_weekend_holiday'] == 0]['sales'].mean()
print(f"Weekend holiday sales: {weekend_holiday_sales}, Weekday holiday sales: {weekday_holiday_sales}")

Weekend holiday sales: 447.83476584036254, Weekday holiday sales: 354.4439809744127


### 📅 Weekend Holidays vs. Weekday Holidays: Which is More Profitable?

We analyzed average sales for **holidays falling on weekends** (Saturday/Sunday) against the rest of the data. The comparison group is every row where `is_weekend_holiday` is false, so it contains weekday holidays **and all non-holiday days**. The data holds sales, not margins, so "profitable" here means higher sales.

#### 📊 Results:

| Group                                             | Avg Sales  |
|---------------------------------------------------|------------|
| 🏖️ Weekend Holidays                                | **448**    |
| 🏢 All Other Rows (weekday holidays + non-holidays) | **354**    |

#### 🔍 Insight:

- **Weekend holidays** average **~26% higher** sales than all other rows.
- Part of this gap may simply be the ordinary weekend effect, since the comparison group is mostly non-holiday weekdays.
- Possible reasons for higher weekend holiday sales include:
  - More people **have time off** to shop and engage in leisure activities.
  - Retailers may offer **weekend-specific promotions** to drive traffic.
  - Weekend holidays might encourage **longer shopping periods** or larger purchases.

#### 🧠 Takeaway:

- Weekend holidays are high-sales days and are reasonable candidates for **boosted promotions** and marketing.
- The figure for **weekday holidays** on their own is not isolated here, so conclusions about weaker weekday-holiday traffic need a comparison restricted to holiday rows.
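
Separating the holiday effect from the ordinary weekend effect takes a 2x2 split: holiday vs. non-holiday, weekend vs. weekday. A minimal sketch on invented values follows; `day_kind` is a hypothetical helper column.

```python
import pandas as pd

# Toy rows (invented): four holiday rows and two ordinary days.
toy = pd.DataFrame({
    "is_holiday": [1, 1, 1, 1, 0, 0],
    "day_of_week": ["Saturday", "Sunday", "Monday", "Tuesday", "Saturday", "Wednesday"],
    "sales": [500, 460, 380, 400, 450, 340],
})

# Label each row as weekend or weekday.
toy["day_kind"] = (
    toy["day_of_week"].isin(["Saturday", "Sunday"]).map({True: "weekend", False: "weekday"})
)

# Mean sales per (holiday flag, day kind); rows: is_holiday, columns: day_kind.
table = toy.groupby(["is_holiday", "day_kind"])["sales"].mean().unstack()
print(table)
```

The `is_holiday == 1` row gives the weekend-holiday vs. weekday-holiday comparison the question asks for, and the `is_holiday == 0` row shows how much of that gap is present on ordinary weekends as well.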



###  Which quarter has the highest number of holidays and how does that affect total sales?

In [167]:
quarter_holiday_sales = df.groupby(['quarter', 'is_holiday'])['sales'].sum().reset_index()
quarter_holiday_sales

Unnamed: 0,quarter,is_holiday,sales
0,1,0,258703353
1,1,1,13604515
2,2,0,215526817
3,2,1,76571987
4,3,0,230369980
5,3,1,38570033
6,4,0,194048098
7,4,1,69179961


### 📊 Which Quarter Has the Highest Number of Holidays and How Does That Affect Total Sales?

To relate holidays to total sales in each quarter, we summed sales by quarter and holiday flag. The number of holidays per quarter is not counted in this cell, so the table shows holiday *sales volume*, not holiday *frequency*.

#### 📅 Results:

| Quarter | Is Holiday | Total Sales |
|---------|------------|-------------|
| Q1      | No         | **258,703,353** |
| Q1      | Yes        | **13,604,515**  |
| Q2      | No         | **215,526,817** |
| Q2      | Yes        | **76,571,987**  |
| Q3      | No         | **230,369,980** |
| Q3      | Yes        | **38,570,033**  |
| Q4      | No         | **194,048,098** |
| Q4      | Yes        | **69,179,961**  |

#### 🔍 Insight:

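- **Q2** has the highest combined sales (about **292.1 million**), followed by Q1 (272.3 million), Q3 (268.9 million) and Q4 (263.2 million).
- **Q1** has the highest **non-holiday** sales (258.7 million) but the **lowest holiday sales** (13.6 million, about 5% of its total).
- **Holiday sales** are highest in **Q2**:
  - Sales during holidays in Q2 are **~76.57 million**, compared to **13.6 million** in Q1, **38.57 million** in Q3, and **69.18 million** in Q4.
  - Holiday rows make up about 26% of total sales in both Q2 and Q4, and about 14% in Q3.
- The data runs from 2013-01-01 to mid-August 2017, so Q3 and Q4 cover fewer days than Q1 and Q2, and raw totals are not directly comparable across quarters.

#### 🧠 Takeaway:

- **Q2** leads on both total and holiday sales, while **Q1** relies almost entirely on non-holiday days.
- The large holiday share in **Q2 and Q4** suggests these quarters are key periods for holiday promotional campaigns, though the number of holiday days in each quarter should be counted before attributing the difference to holiday intensity.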
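
The first half of the question, how many holidays each quarter has, can be answered by counting distinct holiday dates. The sketch below does that on invented rows and also reports the holiday share of sales; `summary` and its column names are illustrative.

```python
import pandas as pd

# Toy rows (invented). 2016-01-01 appears twice to mimic several rows per date.
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01", "2016-01-01", "2016-02-10",
        "2016-05-01", "2016-05-24", "2016-06-15",
    ]),
    "is_holiday": [1, 1, 0, 1, 1, 0],
    "sales": [50, 70, 200, 300, 260, 180],
})
toy["quarter"] = toy["date"].dt.quarter

is_hol = toy["is_holiday"] == 1
summary = pd.DataFrame({
    # Distinct holiday dates, so repeated rows for one date count once.
    "holiday_days": toy[is_hol].groupby("quarter")["date"].nunique(),
    "holiday_sales": toy[is_hol].groupby("quarter")["sales"].sum(),
    "total_sales": toy.groupby("quarter")["sales"].sum(),
})
summary["holiday_share_pct"] = (
    summary["holiday_sales"] / summary["total_sales"] * 100
).round(1)
print(summary)
```

Dividing `holiday_sales` by `holiday_days` would further give sales per holiday day, which separates "more holidays" from "stronger holidays" when comparing quarters.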
  



### Is there a cumulative promotion and holiday effect?

In [168]:
promo_holiday_sales_combined = df[(df['onpromotion'] > 0) & (df['is_holiday'] == 1)]['sales'].mean()
promo_holiday_sales_combined

1203.8338764363868

### 📊 Cumulative Effect of Promotion and Holiday on Sales

We analyzed whether promotions and holidays combined have a cumulative effect on sales by averaging sales over rows that are both on promotion and on a holiday.

#### 📅 Results:

- **Average Sales during Promotion and Holiday**: **1,203.83**

#### 🔍 Insight:

- Rows with both a **promotion** and a **holiday** average **1,203.83**, roughly 2.4 to 3.2 times the holiday averages reported above (381 to 495).
- This cell does not compute the promotion-only or the no-promotion averages, so it cannot yet show whether the two effects add up, reinforce each other, or whether the promotion accounts for most of the difference.
- It is plausible that consumers buy more when **promotions** and **holidays** align, but confirming it requires comparing all four combinations.

#### 🧠 Takeaway:

- Overlapping promotions with holidays is associated with high average sales and is worth testing as a strategy.
- Before prioritizing **major holiday promotions**, compare promoted and non-promoted sales on holidays and on normal days to see how much of the gain the holiday adds.
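
One way to check for a cumulative effect is to compare the combined average with what a purely additive model would predict from the promotion-only and holiday-only averages. The sketch below illustrates the arithmetic on invented values; `on_promo`, `additive` and `interaction` are hypothetical names.

```python
import pandas as pd

# Toy rows (invented) covering all four promotion/holiday combinations.
toy = pd.DataFrame({
    "onpromotion": [0, 0, 3, 5, 0, 0, 2, 4],
    "is_holiday":  [0, 0, 0, 0, 1, 1, 1, 1],
    "sales":       [100, 120, 400, 440, 150, 170, 600, 640],
})
toy["on_promo"] = (toy["onpromotion"] > 0).astype(int)

# Mean sales per combination (rows: on_promo, columns: is_holiday).
table = toy.pivot_table(index="on_promo", columns="is_holiday", values="sales", aggfunc="mean")

base = table.loc[0, 0]                 # no promotion, no holiday
promo_effect = table.loc[1, 0] - base  # promotion alone
holiday_effect = table.loc[0, 1] - base  # holiday alone

# What the combined cell would be if the two effects simply added up.
additive = base + promo_effect + holiday_effect
# Positive values mean the combination exceeds the sum of the separate effects.
interaction = table.loc[1, 1] - additive
print(table)
print(f"additive expectation: {additive}, interaction: {interaction}")
```

A raw comparison like this does not control for which product families or stores tend to be promoted, so on real data it is a first check rather than an estimate of the causal effect.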




# 🌍 8. External Factors



### Is there a correlation between oil prices (dcoilwtico) and sales behavior?


In [169]:

oil_sales_corr = df[['dcoilwtico', 'sales']].corr().iloc[0, 1]
print(f"Correlation between oil price and sales: {oil_sales_corr:.4f}")


Correlation between oil price and sales: -0.0750


In [170]:

lag_corr = df[['sales', 'sales_lag_7']].corr().iloc[0, 1]
print(f"Correlation between sales and sales_lag_7: {lag_corr:.4f}")


Correlation between sales and sales_lag_7: 0.9310


In [171]:

crisis_sales = df[df['is_crisis'] == 1]['sales'].mean()
non_crisis_sales = df[df['is_crisis'] == 0]['sales'].mean()


percent_change = ((crisis_sales - non_crisis_sales) / non_crisis_sales) * 100

print(f"Average Sales During Crisis: {crisis_sales:.2f}")
print(f"Average Sales Outside Crisis: {non_crisis_sales:.2f}")
print(f"Sales changed by {percent_change:.2f}% during crisis.")


Average Sales During Crisis: 494.90
Average Sales Outside Crisis: 356.52
Sales changed by 38.82% during crisis.
