# Feature Creation Cheat Sheet 📊

This cheat sheet covers common feature creation techniques in pandas for categorical and time series data: dummy variables, lags, first differences, and year-over-year change.

---

## 📌 Importing Required Libraries
```python
import pandas as pd
import numpy as np
```

---

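## 1️⃣ Creating Dummy Variables
Convert a categorical column into 0/1 indicator (dummy) columns, one per category. Passing `drop_first=True` drops the first category so the remaining columns are not perfectly collinear.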

```python
# Sample Data
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'C']})

# Convert categorical column to dummy variables
df_dummies = pd.get_dummies(df, columns=['Category'], drop_first=True)

print(df_dummies)
```
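
To make the effect of `drop_first` concrete, here is a small sketch that builds the dummies both ways on the same sample data. The `dtype=int` argument is an addition for readability: pandas 2.0 and later return boolean columns by default.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'C']})

# Keep every level: one indicator column per category
full = pd.get_dummies(df, columns=['Category'], dtype=int)

# Drop the first level ('A'); it becomes the implicit baseline
reduced = pd.get_dummies(df, columns=['Category'], drop_first=True, dtype=int)

print(list(full.columns))     # ['Category_A', 'Category_B', 'Category_C']
print(list(reduced.columns))  # ['Category_B', 'Category_C']
```

A row where every remaining dummy is 0 belongs to the dropped baseline category (`A` here).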

---

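## 2️⃣ Creating Lagged Variables
Lagging a variable aligns each row with the value from an earlier period, so past values can be used as features. In pandas, `shift(k)` moves values down by `k` rows, which leaves `NaN` in the first `k` rows.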

```python
# Sample Time Series Data
df = pd.DataFrame({
    'date': pd.date_range(start='2024-01-01', periods=6, freq='MS'),  # month-start; 'M' is deprecated in pandas 2.2+
    'sales': [100, 110, 120, 115, 125, 130]
})

df['sales_lag1'] = df['sales'].shift(1)  # 1-period lag
df['sales_lag2'] = df['sales'].shift(2)  # 2-period lag

print(df)
```
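
When several lags are needed, a loop avoids repeating the `shift` call. The sketch below assumes three lags (`n_lags` is an illustrative choice, not a recommendation) and drops the rows that lack a full history.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range(start='2024-01-01', periods=6, freq='MS'),
    'sales': [100, 110, 120, 115, 125, 130]
})

n_lags = 3  # hypothetical number of lags, chosen for illustration
for k in range(1, n_lags + 1):
    df[f'sales_lag{k}'] = df['sales'].shift(k)

# The first n_lags rows contain NaN in at least one lag column; drop them
model_df = df.dropna().reset_index(drop=True)
print(model_df)
```

Each lag costs one row of usable data, so `model_df` keeps 3 of the original 6 rows.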

---

## 3️⃣ Calculating First Difference (ΔX = X_t - X_{t-1})
The first difference is the change from one period to the next. It helps remove a trend from time series data; the first row is `NaN` because it has no previous value.

```python
df['sales_diff'] = df['sales'].diff()  # First difference

print(df)
```
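
As a quick check of the formula in the heading, this sketch computes the difference by hand with `shift(1)` and compares it with `diff()`, using the same sales values as above.

```python
import pandas as pd

sales = pd.Series([100, 110, 120, 115, 125, 130], name='sales')

diff_builtin = sales.diff()             # X_t - X_{t-1}
diff_manual = sales - sales.shift(1)    # the same calculation written out

print(diff_builtin.tolist())  # [nan, 10.0, 10.0, -5.0, 10.0, 5.0]
print(diff_builtin.equals(diff_manual))  # True
```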

---

## 4️⃣ Calculating Year-over-Year (YoY) Change
YoY change compares each period with the same period one year earlier. For monthly data with no missing months, that is an offset of 12 rows.

```python
# Sample Data with Multiple Years
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=24, freq='MS'),  # month-start; 'M' is deprecated in pandas 2.2+
    'sales': np.random.default_rng(42).integers(100, 500, size=24)  # seeded for reproducibility
})

df['sales_yoy'] = df['sales'].pct_change(periods=12) * 100  # YoY % change

print(df)
```
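
The YoY percentage can also be written out explicitly, which makes the 12-row offset visible. This is a minimal sketch on seeded random data (the seed `0` is arbitrary); it assumes one row per month with no gaps.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed so the numbers are repeatable
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=24, freq='MS'),
    'sales': rng.integers(100, 500, size=24)
})

# Value from the same month one year earlier
prev_year = df['sales'].shift(12)

# (current - previous) / previous, expressed as a percentage
df['sales_yoy_manual'] = (df['sales'] - prev_year) / prev_year * 100
df['sales_yoy'] = df['sales'].pct_change(periods=12) * 100

print(df.tail())
```

The first 12 rows are `NaN` in both columns because there is no prior-year value to compare against.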

---

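## 🔹 Additional Tips:
- `shift()`, `diff()`, and `pct_change()` leave `NaN` in the leading rows. Drop those rows with `.dropna()`, or use `.fillna(0)` only when zero is a meaningful value for the feature.
- Set the `date` column as a **datetime index**: `df.set_index('date', inplace=True)`.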
- Ensure data is **sorted chronologically** before shifting or differencing.
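
The tips above can be combined into one short preparation step. The sketch below uses deliberately unsorted, hypothetical sample rows to show why sorting matters before shifting or differencing.

```python
import pandas as pd

# Hypothetical sample data, intentionally out of chronological order
df = pd.DataFrame({
    'date': ['2024-03-01', '2024-01-01', '2024-02-01'],
    'sales': [120, 100, 110]
})

# Parse dates, sort chronologically, then use the dates as the index
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date').set_index('date')

df['sales_lag1'] = df['sales'].shift(1)
df['sales_diff'] = df['sales'].diff()

# Drop the leading row, which has no previous period
clean = df.dropna()
print(clean)
```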

Happy Feature Engineering! 🚀
