
# 📊 Elasticity Project — Phase 2: Data Cleaning and Feature Engineering

---

## 📝 Purpose of this Notebook

This notebook initiates **Phase 2** of the elasticity modeling project:
- Further clean the dataset saved at the end of the initial exploration phase
- Engineer features necessary for elasticity regression modeling
- Prepare a finalized dataset ready for modeling

---

## 📚 Tasks Covered

- Remove zero-sales observations: `log(0)` is undefined, and zero-sales days would skew elasticity estimates
- Create a log-transformed sales feature (`Log_Sales`)
- Engineer promotional flags and calendar features (Month, Weekday, Year)
- Output a clean dataset for modeling
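The cleaning and feature steps above can be sketched on a small synthetic frame. This is a minimal sketch, not this notebook's implementation. The column names `Sales` and `Promo`, the derived `Promo_Flag` column, and the toy values are assumptions. Only `Date`, `Log_Sales`, `Month`, `Weekday` and `Year` come from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the processed CSV.
# The Sales/Promo column names are assumptions about the schema.
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-27", "2015-07-28", "2015-07-29", "2015-08-02"]),
    "Sales": [5263, 0, 8314, 4822],
    "Promo": [1, 0, 0, 1],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: drop zero sales, log-transform sales, add promo and calendar features."""
    # log(0) is undefined, so keep strictly positive sales only
    out = df.loc[df["Sales"] > 0].copy()

    # Natural log of sales, the dependent variable for elasticity models
    out["Log_Sales"] = np.log(out["Sales"])

    # Hypothetical 0/1 promotion flag (also handles non-binary promo codes)
    out["Promo_Flag"] = (out["Promo"] > 0).astype(int)

    # Calendar features derived from the Date column
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    out["Weekday"] = out["Date"].dt.dayofweek  # Monday=0 ... Sunday=6
    return out


clean = engineer_features(raw)
print(clean)
```

Note that pandas' `dayofweek` counts Monday as 0; if the source data already carries its own weekday column, check which convention it uses before mixing the two. Writing `clean` out with `DataFrame.to_csv` would complete the final step.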

---

## 🔥 Next Steps After This Notebook

- Model log-sales as a function of price and promotions
- Estimate price elasticity across stores and products
- Build a Streamlit dashboard to visualize elasticity curves
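For reference, one common way to turn "log-sales as a function of price" into an elasticity estimate is a log-log regression. This is a sketch under assumptions: the subscripts (store or product $i$, date $t$), the `Price` regressor, and the single promotion term are illustrative, since this notebook does not yet fix the model specification.

$$
\ln(\text{Sales}_{it}) = \alpha + \beta\,\ln(\text{Price}_{it}) + \gamma\,\text{Promo}_{it} + \varepsilon_{it},
\qquad
\beta = \frac{\partial \ln(\text{Sales})}{\partial \ln(\text{Price})}
$$

In this form $\beta$ is the price elasticity directly. If price instead enters in levels, $\ln(\text{Sales}_{it}) = \alpha + \beta\,\text{Price}_{it} + \dots$, the elasticity becomes $\beta \cdot \text{Price}_{it}$ and varies with the price level.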

---

## 🚀 Let's Get Started!

In [1]:
# 📚 Import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# 🏗️ Plotting defaults (the 'seaborn-v0_8-*' style names require matplotlib >= 3.6)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context('talk')


In [5]:
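# Load the dataset saved at the end of the exploration phase
train_df2 = pd.read_csv(
    '../data/processed/train_df_exploration_clean.csv',
    index_col=0,          # treat the first CSV column as the index
    parse_dates=['Date'],
    on_bad_lines='skip',  # silently drops malformed rows (pandas >= 1.3)
    low_memory=False      # parse in one pass to avoid mixed-type dtype inference
)

# If any 'Date' value failed to parse above, read_csv leaves the column as object dtype;
# coercing guarantees datetime64 and turns unparseable values into NaT
train_df2['Date'] = pd.to_datetime(train_df2['Date'], errors='coerce')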
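Because `errors='coerce'` hides bad dates as `NaT` rather than raising, it is worth counting them before building calendar features. A minimal, self-contained sketch follows. The toy CSV (with one deliberately malformed date) is hypothetical and only stands in for the processed file.

```python
import io

import pandas as pd

# Hypothetical toy CSV; the second row has a malformed date on purpose
csv_text = """,Date,Sales
0,2015-07-31,5263
1,not-a-date,6064
2,2015-07-29,8314
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# errors='coerce' maps any unparseable value to NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

n_bad_dates = df["Date"].isna().sum()
print(f"Rows with unparseable dates: {n_bad_dates}")

# Drop them before deriving Month/Weekday/Year, since NaT would propagate as NaN
df = df.dropna(subset=["Date"])
```

On the real data, the same `isna().sum()` check on `train_df2['Date']` shows whether any rows need dropping or investigating.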

