# 📈 Elasticity Project: Model Summary

This model focuses on **Price Elasticity of Demand (PED)** and its effect on total revenue. It allows users to explore how changes in price influence quantity demanded and overall sales performance.

## ✅ **Key Components:**

1. **Price Elasticity of Demand (PED) Calculation**

   Elasticity is calculated with the midpoint (arc) formula, which divides each change by the average of the old and new values, so the estimate is the same whether the price rises or falls:

   $$
   E_d = \frac{\frac{Q_2 - Q_1}{(Q_2 + Q_1)/2}}{\frac{P_2 - P_1}{(P_2 + P_1)/2}}
   $$

   Where:
   - \( Q_1 \), \( Q_2 \) = Original and new quantity demanded.
   - \( P_1 \), \( P_2 \) = Original and new price.

2. **Elasticity Classification**

   Because demand curves slope downward, \( E_d \) is normally negative, so elasticity is classified by its magnitude:
   - **Elastic** if \( |E_d| > 1 \)
   - **Inelastic** if \( |E_d| < 1 \)
   - **Unitary Elastic** if \( |E_d| = 1 \)

3. **Revenue Impact Calculation**

   We calculate **Total Revenue (TR)** before and after the price change:

   $$
   TR_1 = P_1 \times Q_1
   $$

   $$
   TR_2 = P_2 \times Q_2
   $$

   The **change in revenue** is expressed as:

   $$
   \Delta TR = TR_2 - TR_1
   $$

4. **Visualizations**

   - **Demand Curve Plot:**
     Shows the demand curve and how quantity demanded moves along it as the user changes the price.
   - **Revenue Comparison:**
     Displays side-by-side revenue before and after the price change.

5. **User Inputs (via Sliders):**
   - Initial price (\( P_1 \))
   - Initial quantity (\( Q_1 \))
   - % change in price (\( \%\Delta P \))

6. **Output:**
   - New price & quantity estimates.
   - Elasticity classification (with interpretation).
   - Revenue before & after (with impact summary).
   - Interactive graph updates in real-time.
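
The formulas above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual implementation: the function names (`midpoint_elasticity`, `classify_elasticity`, `revenue_change`) and the example prices and quantities are hypothetical, and classification uses \( |E_d| \) because PED is normally negative.

```python
def midpoint_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Arc (midpoint) price elasticity of demand, E_d."""
    if p1 == p2:
        raise ValueError("Price must change to compute elasticity")
    pct_change_q = (q2 - q1) / ((q2 + q1) / 2)
    pct_change_p = (p2 - p1) / ((p2 + p1) / 2)
    return pct_change_q / pct_change_p


def classify_elasticity(ed: float, tol: float = 1e-9) -> str:
    """Classify by |E_d|, since PED is normally negative."""
    magnitude = abs(ed)
    if abs(magnitude - 1.0) <= tol:
        return "Unitary Elastic"
    return "Elastic" if magnitude > 1.0 else "Inelastic"


def revenue_change(p1: float, q1: float, p2: float, q2: float) -> tuple[float, float, float]:
    """Return (TR_1, TR_2, delta TR) where TR = P * Q."""
    tr1 = p1 * q1
    tr2 = p2 * q2
    return tr1, tr2, tr2 - tr1


# Hypothetical inputs: price rises from 10 to 12, quantity falls from 100 to 80
p1, q1, p2, q2 = 10.0, 100.0, 12.0, 80.0
ed = midpoint_elasticity(p1, q1, p2, q2)
tr1, tr2, delta_tr = revenue_change(p1, q1, p2, q2)

print(f"E_d = {ed:.3f} ({classify_elasticity(ed)})")  # E_d = -1.222 (Elastic)
print(f"TR_1 = {tr1:.0f}, TR_2 = {tr2:.0f}, delta TR = {delta_tr:.0f}")  # TR_1 = 1000, TR_2 = 960, delta TR = -40
```

Here a price increase of about 18% (midpoint basis) cuts quantity by about 22%, so \( |E_d| > 1 \) and revenue falls, which is the expected pattern for elastic demand.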


```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
```

```python
processed_data = pd.read_csv('../data/processed/processed_data.csv')
```

## ✅ Check data is clean

```python
processed_data.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 843482 entries, 0 to 843481
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Date           843482 non-null  object
 1   Store          843482 non-null  int64 
 2   DayOfWeek      843482 non-null  int64 
 3   Sales          843482 non-null  int64 
 4   Customers      843482 non-null  int64 
 5   Open           843482 non-null  int64 
 6   Promo          843482 non-null  int64 
 7   StateHoliday   843482 non-null  int64 
 8   SchoolHoliday  843482 non-null  int64 
dtypes: int64(8), object(1)
memory usage: 57.9+ MB
```


```python
processed_data.head()
```


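|   | Date       | Store | DayOfWeek | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday |
|---|------------|-------|-----------|-------|-----------|------|-------|--------------|---------------|
| 0 | 2015-07-31 | 1     | 5         | 5263  | 555       | 1    | 1     | 0            | 1             |
| 1 | 2015-07-31 | 2     | 5         | 6064  | 625       | 1    | 1     | 0            | 1             |
| 2 | 2015-07-31 | 3     | 5         | 8314  | 821       | 1    | 1     | 0            | 1             |
| 3 | 2015-07-31 | 4     | 5         | 13995 | 1498      | 1    | 1     | 0            | 1             |
| 4 | 2015-07-31 | 5     | 5         | 4822  | 559       | 1    | 1     | 0            | 1             |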


## 🔥 First elasticity-style insight: Promo effect
- We can directly model the effect of `Promo` (binary: 0/1) on `Sales`. This tells you how much more (or less) you sell when running a promo than when not running one.

- Even a simple OLS regression gives you the coefficient for `Promo`, which acts as a proxy elasticity: a measure of how responsive sales are to promotions.

## 💡 Let’s draft the steps:


### 1️⃣ Convert Date as before:

```python
processed_data['Date'] = pd.to_datetime(processed_data['Date'])
processed_data['Month'] = processed_data['Date'].dt.month
processed_data['Year'] = processed_data['Date'].dt.year
processed_data['WeekOfYear'] = processed_data['Date'].dt.isocalendar().week
```


### 2️⃣ Filter to open stores only (because closed = 0 sales):

```python
data_open = processed_data[processed_data['Open'] == 1]
```


### 3️⃣ Set up features & target:

```python
features = ['Promo', 'StateHoliday', 'SchoolHoliday', 'DayOfWeek', 'Month', 'Year']
X = data_open[features]
y = data_open['Sales']
```


### 4️⃣ Linear regression:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)

print("Train R^2:", lr.score(X_train, y_train))
print("Test R^2:", lr.score(X_test, y_test))
```


```text
Train R^2: 0.1502372392327186
Test R^2: 0.1487795746920464
```


### 5️⃣ Elasticity-like insight: Promo effect
After training, check the coefficients:

```python
coef_table = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': lr.coef_
})
print(coef_table)
```


```text
         Feature   Coefficient
0          Promo  2.158871e+03
1   StateHoliday  4.831691e-13
2  SchoolHoliday  7.052238e+01
3      DayOfWeek -1.367336e+02
4          Month  8.163864e+01
5           Year  2.024603e+02
```


## 🔍 Analysis of the model

### 🟢 Promo: +2158.87
💥 BOOM—this is your headline stat.

✅ On average, holding the other features fixed, sales are about 2,159 units higher on days when a promo is running than on days without one.

📈 This is your "promotion elasticity proxy"—while it’s not a percentage change (since we don’t have price), it tells you how sensitive sales are to the presence of a promotion.
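
The `Promo` coefficient is an absolute change in `Sales`. To read it as a relative (percentage) effect, which is closer to an elasticity, two common options are to divide it by the no-promo baseline, or to fit a log-linear model, where \( e^{b} - 1 \) is approximately the percentage change. The sketch below is illustrative only: it uses synthetic data with a hypothetical 30% promo lift, not the notebook's `data_open`, and the variable names are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical synthetic data: promo days sell ~30% more on average
promo = rng.integers(0, 2, size=n)
base_sales = rng.lognormal(mean=np.log(6000), sigma=0.3, size=n)
demo_df = pd.DataFrame({
    "Promo": promo,
    "Sales": base_sales * np.where(promo == 1, 1.3, 1.0),
})

# Option 1: absolute Promo effect relative to the no-promo baseline
lin_model = LinearRegression().fit(demo_df[["Promo"]], demo_df["Sales"])
baseline = demo_df.loc[demo_df["Promo"] == 0, "Sales"].mean()
pct_lift_linear = lin_model.coef_[0] / baseline * 100

# Option 2: log-linear model; exp(b) - 1 approximates the proportional change
# in typical (geometric-mean) Sales
log_model = LinearRegression().fit(demo_df[["Promo"]], np.log(demo_df["Sales"]))
pct_lift_log = (np.exp(log_model.coef_[0]) - 1) * 100

print(f"Linear: {pct_lift_linear:.1f}% lift, log-linear: {pct_lift_log:.1f}% lift")
```

On the real data, rows with `Sales == 0` would have to be excluded before taking the log, and the remaining features would be included as in the notebook's model.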



### 🟠 StateHoliday: ~ 0 (4.8e-13)
That's effectively zero: no measurable effect.

On its face, this suggests:

🏖️ Whether or not it's a state holiday doesn't seem to affect sales in this data.

Before trusting that reading, check whether the column actually varies (are there any holidays at all, or is it just sparse?):

```python
print(processed_data['StateHoliday'].value_counts())
```


```text
StateHoliday
0    843482
Name: count, dtype: int64
```


### This tells us:

- ✅ 100% of your data points (843,482 rows) have StateHoliday = 0.
- ❌ No actual state holidays are present.

### 💡 Why did the model give us that tiny coefficient (~4.8e-13)?
- Because the `StateHoliday` feature is constant (it never changes), it gives the model no real signal at all.

- In a linear regression with an intercept, a constant column is perfectly collinear with the intercept, so its effect cannot be separated from it. The solver folds it into the intercept and returns a coefficient that is zero apart from floating-point noise, which is why it appears as ~4.8e-13 rather than exactly 0.
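
A small synthetic check illustrates this behaviour with scikit-learn's `LinearRegression`. The data is made up (not the notebook's): a hypothetical all-zero column stands in for `StateHoliday`, and the `_demo` suffixes avoid overwriting the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1_000

x_varying = rng.normal(size=n)
x_constant = np.zeros(n)  # like StateHoliday here: always 0
X_demo = np.column_stack([x_varying, x_constant])
y_demo = 3.0 * x_varying + 5.0 + rng.normal(scale=0.1, size=n)

model_demo = LinearRegression().fit(X_demo, y_demo)

# The varying feature recovers its true slope (~3.0); the constant column
# carries no information, so its coefficient is (numerically) zero.
print(model_demo.coef_)
print(model_demo.intercept_)  # ~5.0
```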

### ✅ Next Steps?
- Remove `StateHoliday` from the feature list going forward because:
    - It's useless here (no variation means no predictive power).
    - It only adds clutter: a linear model can't estimate a meaningful coefficient for it, and tree-based models will simply never split on it.
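
Rather than checking columns one at a time with `value_counts()`, constant features can be flagged programmatically. A minimal sketch, assuming a hypothetical helper `constant_columns` and a toy frame (not the notebook's data); `DataFrame.nunique` is standard pandas, and scikit-learn's `VarianceThreshold` is an alternative for numeric features.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list[str]:
    """Return names of columns holding a single value (NaN counts as a value)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


# Hypothetical toy frame mimicking some of the notebook's columns
demo_frame = pd.DataFrame({
    "Promo": [0, 1, 1, 0],
    "StateHoliday": [0, 0, 0, 0],
    "SchoolHoliday": [1, 0, 0, 1],
})

candidate_features = ["Promo", "StateHoliday", "SchoolHoliday"]
to_drop = constant_columns(demo_frame[candidate_features])
kept = [f for f in candidate_features if f not in to_drop]
print(to_drop, kept)  # ['StateHoliday'] ['Promo', 'SchoolHoliday']
```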

### 🟡 SchoolHoliday: +70.5
This one's interesting:

- When there’s a school holiday, sales increase by ~71 units on average.
- Not a massive effect, but it’s positive.

✅ This makes intuitive sense—families might shop more when kids are out of school.

### 🔵 DayOfWeek: -136.7
This one tells us that as the day of the week increases (likely Monday=1 up to Sunday=7):

- Sales fall by about 137 units for each step later in the week.
- It’s linear here, so it might not fully capture patterns like weekend spikes—this could be better handled later with dummy variables (categoricals).
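
Treating `DayOfWeek` as a category gives each day its own effect instead of forcing a constant per-day slope. A minimal sketch with `pd.get_dummies`, using hypothetical toy sales figures (not the notebook's data) and Monday as the reference day:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily sales with a dip late in the week (toy numbers)
dow_demo = pd.DataFrame({
    "DayOfWeek": [1, 2, 3, 4, 5, 6, 7] * 4,
    "Sales": [7000, 6500, 6400, 6500, 6900, 5800, 1000] * 4,
})

# One 0/1 column per day; drop_first=True makes Monday (1) the reference level
dow_dummies = pd.get_dummies(dow_demo["DayOfWeek"], prefix="DoW", drop_first=True, dtype=int)
dow_model = LinearRegression().fit(dow_dummies, dow_demo["Sales"])

# Each coefficient is that day's average difference from Monday
effects = pd.Series(dow_model.coef_, index=dow_dummies.columns)
print(effects.round(1))
```

Unlike the single linear `DayOfWeek` term, these coefficients can capture a dip or spike on any particular day. In the notebook, the same `get_dummies` call on `data_open['DayOfWeek']` would replace the `DayOfWeek` column in the feature matrix; this is a suggestion, not something the original analysis ran.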

### 🟣 Month: +81.6
Each month later in the year is associated with ~82 more units of sales.

- This may reflect seasonality trends (e.g., Q4 increases), but it’s a pretty small per-month bump.

### 🟤 Year: +202.5
Each additional year (e.g., moving from one year in the data to the next) is associated with ~202 extra sales units.

- This suggests an upward trend year over year (maybe business growth, inflation, or other market factors).

## 🚦 What’s the Big Takeaway?

| 📊 **Feature**      | 💥 **Interpretation**                                                                                      |
|---------------------|----------------------------------------------------------------------------------------------------------|
| **Promo**           | 🔥 **Major impact: +2159 sales boost.** This is your *main elasticity-like driver.*                       |
| **StateHoliday**    | 💤 **No real effect.**                                                                                   |
| **SchoolHoliday**   | 👍 Small positive bump (~71 units).                                                                      |
| **DayOfWeek**       | 📉 Sales **decline by ~137 units** later in the week (might hint at a weekend lull—worth deeper analysis). |
| **Month**           | 📈 Slight positive trend across months (~82 units increase per month).                                    |
| **Year**            | 🚀 Solid +200 unit boost per year—suggests business growth or other long-term upward trend.               |


**Note:** The `StateHoliday` feature was removed from further modeling because the dataset contains no actual state holidays (100% of rows have `StateHoliday = 0`), making it a constant feature with no predictive value.


```python
# StateHoliday dropped: it is always 0 in this dataset
features = ['Promo', 'SchoolHoliday', 'DayOfWeek', 'Month', 'Year']
```
