# Q4: Feature Engineering

**Phase 5:** Feature Engineering & Aggregation  
**Points: 9 points**

**Focus:** Create derived features, perform time-based aggregations, calculate rolling windows.

**Lecture Reference:** Lecture 11, Notebook 2 ([`11/demo/02_wrangling_feature_engineering.ipynb`](https://github.com/christopherseaman/datasci_217/blob/main/11/demo/02_wrangling_feature_engineering.ipynb)), Phase 5. Also see Lecture 09 (rolling windows).

---

## Setup

In [24]:
# Import libraries
import pandas as pd
import numpy as np
import os
from IPython.display import display, Markdown

# Load wrangled data from Q3
df = pd.read_csv('output/q3_wrangled_data.csv', parse_dates=['Measurement Timestamp'], index_col='Measurement Timestamp')
# Or if you saved without index:
# df = pd.read_csv('output/q3_wrangled_data.csv')
# df['Measurement Timestamp'] = pd.to_datetime(df['Measurement Timestamp'])
# df = df.set_index('Measurement Timestamp')
print(f"Loaded {len(df):,} records with datetime index")

Loaded 78,177 records with datetime index


---

## Objective

Create derived features, perform time-based aggregations, and calculate rolling windows for time series analysis.

**Time Series Note:** Rolling windows are essential for time series data. They capture temporal dependencies (e.g., 7-hour rolling mean captures short-term patterns). See **Lecture 09** for time series rolling window operations. For hourly data, common window sizes are 7-24 hours (capturing daily patterns). Use pandas `rolling()` method with `window` parameter to specify the number of periods.

---

## Required Artifacts

You must create exactly these 3 files in the `output/` directory:

### 1. `output/q4_features.csv`
**Format:** CSV file
**Content:** Dataset with all derived features added
**Requirements:**
- All original columns from Q3
- All new derived features added as columns
- **No index column** (save with `index=False`)

### 2. `output/q4_rolling_features.csv`
**Format:** CSV file
**Content:** Dataset with rolling window features
**Required Columns:**
- Original datetime column
- At least one rolling window calculation column (e.g., `water_temp_rolling_7h`, `air_temp_rolling_24h`)

**Requirements:**
- Must include at least one rolling window calculation
- Rolling window names should be descriptive (e.g., `temp_rolling_7h` for 7-hour rolling mean)
- **No index column** (save with `index=False`)

**Example columns:**
```csv
Measurement Timestamp,wind_speed_rolling_7h,humidity_rolling_24h,pressure_rolling_7h
2022-01-01 00:00:00,6.8,65.2,1013.5
2022-01-01 01:00:00,6.9,65.3,1013.6
...
```

**Note:** The example shows rolling windows of predictor variables (wind speed, humidity, pressure), not the target variable. If you're predicting Air Temperature, do NOT create rolling windows of Air Temperature - this causes data leakage.

### 3. `output/q4_feature_list.txt`
**Format:** Plain text file
**Content:** List of new features created (one per line)
**Requirements:**
- One feature name per line
- No extra text, just feature names
- Include all derived features, rolling features, and categorical features created

**Example format:**
```
temp_difference
temp_ratio
wind_speed_squared
comfort_index
water_temp_rolling_7h
air_temp_rolling_24h
wind_speed_rolling_7h
temp_category
wind_category
```

---

## Requirements Checklist

- [ ] Derived features created (differences, ratios, interactions, etc.)
- [ ] Time-based aggregations performed (by hour, day, month, etc.) - optional but recommended
- [ ] At least one rolling window calculation (rolling mean, rolling median, etc.)
- [ ] Categorical features created (if applicable)
- [ ] Feature list documented
- [ ] All 3 required artifacts saved with exact filenames

---

## Your Approach

1. **Create derived features** - Differences, ratios, interactions between variables (watch for division by zero)
2. **Calculate rolling windows** - Use `.rolling()` on predictor variables to capture temporal patterns

   ‚ö†Ô∏è **Data Leakage Warning:** Do not create ANY features that use your target variable - this includes rolling windows, differences, ratios, or interactions involving the target. For example, if predicting Air Temperature, do not create `air_temp * humidity` or `air_temp - wet_bulb`. Only derive features from other predictor variables.

3. **Create categorical features** - Bin continuous variables if useful (optional)
4. **Check for infinity values** - Ratios can produce infinity; replace with NaN and handle appropriately
5. **Document and save** - Remember to `reset_index()` before saving CSVs

---

## Decision Points

- **Derived features:** What relationships might be useful? Temperature differences? Ratios? Interactions between variables?
- **Rolling windows:** What window size makes sense? 7 hours? 24 hours? Consider the temporal scale of your data. For hourly data, 7-24 hours captures daily patterns.
- **Time-based aggregations:** Aggregate by hour? Day? Week? What temporal granularity is useful for your analysis?

---

## Checkpoint

After Q4, you should have:
- [ ] Derived features created
- [ ] At least one rolling window calculation
- [ ] Feature list documented
- [ ] All 3 artifacts saved: `q4_features.csv`, `q4_rolling_features.csv`, `q4_feature_list.txt`

---

**Next:** Continue to `q5_pattern_analysis.md` for Pattern Analysis.


In [25]:
#1. CREATE DERIVED FEATURES FROM DATETIME INDEX
# Temperature derived features
df['wet_bulb_difference'] = df['Wet Bulb Temperature'].diff().fillna(0)
df['wet_bulb_humidity_ratio'] = df['Wet Bulb Temperature'] / (df['Humidity']) 
df['wet_bulb_humidity_ratio'] = df['wet_bulb_humidity_ratio'].replace([np.inf, -np.inf], np.nan)  # Handle division by zero 
df['wet_bulb_humidity_interaction'] = df['Wet Bulb Temperature'] * df['Humidity']

# Rain dervived features
df['rain_difference'] = df['Total Rain'] - df['Interval Rain']
df['rain_intensity_ratio'] = df['Rain Intensity'] / (df['Total Rain']) 
df['rain_intensity_ratio'] = df['rain_intensity_ratio'].replace([np.inf, -np.inf], np.nan)  # Handle division by zero
df['rain_humidity_interaction'] = df['Rain Intensity'] * df['Humidity']
df['Intervrain_humidity_interaction'] = df['Interval Rain'] * df['Humidity']
df['rain_pressure_interaction'] = df['Rain Intensity'] * df['Barometric Pressure']
df['rain_wind_interaction'] = df['Rain Intensity'] * df['Wind Speed']

# Wind-derived features
df['wind_range'] = df['Maximum Wind Speed'] - df['Wind Speed']
df['wind_speed_ratio'] = df['Wind Speed']/(df['Maximum Wind Speed'])
df['wind_speed_ratio'] = df['wind_speed_ratio'].replace([np.inf, -np.inf], np.nan)  # Handle division by zero
df['wind_speed_interaction'] = df['Wind Speed'] * df['Maximum Wind Speed']

# Wind-Humidity
df['wind_humidity_ratio'] = df['Wind Speed'] / (df['Humidity']) 
df['wind_humidity_ratio'] = df['wind_humidity_ratio'].replace([np.inf, -np.inf], np.nan)  # Handle division by zero
df['wind_pressure_interaction'] = df['Wind Speed'] * df['Barometric Pressure']

# Pressure-Humidity
df['pressure_humidity_ratio'] = df['Barometric Pressure'] / df['Humidity']
df['pressure_humidity_ratio'] = df['pressure_humidity_ratio'].replace([np.inf, -np.inf], np.nan)  # Handle division by zero
df['pressure_humidity_interaction'] = df['Barometric Pressure'] * df['Humidity']

# Solar-himoudity-rain and wind interactions
df['solar_totalrain_interaction'] = df['Solar Radiation'] * df['Total Rain']
df['solar_humidity_interaction'] = df['Solar Radiation'] * df['Humidity']
df['solar_pressure_ratio'] = df['Solar Radiation'] / (df['Barometric Pressure'])
df['solar_pressure_ratio'] = df['solar_pressure_ratio'].replace([np.inf, -np.inf], np.nan)  # Handle division by zero
df['solar_wind_interaction'] = df['Solar Radiation'] * df['Wind Speed']

# Save the final wrangled data
df.to_csv('output/q4_features.csv', index=False)
print("‚úì Saved: output/q4_features.csv")


‚úì Saved: output/q4_features.csv


In [57]:
# CALUCLAUTING ROLLING ON PREDICTORS VARIABLES (7-DAY ROLLING MEAN AND STD)
# Resample to hourly for rolling calculations
hourly_data = df.resample('h').agg({
    'Wet Bulb Temperature': 'mean',
    'Humidity': 'mean',
    'Total Rain': 'sum',
    'Wind Speed': 'mean',
    'Barometric Pressure': 'mean',
    'Solar Radiation': 'mean',
    'Wind Direction': 'mean',
    'Precipitation Type':'count',
    'Rain Intensity':'mean',
    'Interval Rain':'sum',
    'Battery Life':'mean'
})

hourly_data.columns = ['Wet Bulb Temperature', 'Humidity', 'Total Rain', 'Wind Speed', 'Barometric Pressure','Solar Radiation',
                        'Wind Direction','Precipitation Type','Rain Intensity','Interval Rain','Battery Life']
                    
hourly_data = hourly_data[['Wet Bulb Temperature', 'Humidity', 'Total Rain', 'Wind Speed', 'Barometric Pressure','Solar Radiation',
                        'Wind Direction','Precipitation Type','Rain Intensity','Interval Rain','Battery Life']]
display(Markdown("### ‚è±Ô∏è Resampled Hourly Data for Rolling Calculations"))       
display(hourly_data.head(20).round(2))

# Calculate 7-hour rolling mean
ROLLING_WINDOW_HOURS = 7  # 7-hour window 
for col in hourly_data.columns:
    hourly_data[f'{col}_7h_mean'] = hourly_data[col].rolling(window=ROLLING_WINDOW_HOURS, min_periods=1).mean()

display(Markdown("### üìà 7-Hour Rolling Mean"))
display(hourly_data[[f'{col}_7h_mean' for col in hourly_data.columns if not col.endswith('_7h_mean')]].head(20).round(2))

# Calculate 24-hour rolling mean and std
ROLLING_WINDOW_24HOURS = 24  # 24-hour window   
for col in hourly_data.columns:
    hourly_data[f'{col}_24h_mean'] = hourly_data[col].rolling(window=ROLLING_WINDOW_24HOURS, min_periods=1).mean()
display(Markdown("### üìà 24-Hour Rolling Mean"))
display(hourly_data[[f'{col}_24h_mean' for col in hourly_data.columns if not col.endswith('_24h_mean')]].head(20).round(2))

# Calculate monthly rolling mean and std
ROLLING_WINDOW_30DAYS = 30 * 24  # 30 days in hours
for col in hourly_data.columns:
    hourly_data[f'{col}_30d_mean'] = hourly_data[col].rolling(window=ROLLING_WINDOW_30DAYS, min_periods=1).mean()
display(Markdown("### üìà 30-Day Rolling Mean and Std Dev"))
display(hourly_data[[f'{col}_30d_mean' for col in hourly_data.columns if not col.endswith('_30d_mean')]].head(20).round(2))

#CREATE A SUBSET OF ONLY ROLLING VARIABLES
keep_cols = []

for col in hourly_data.columns:
    if col.endswith('_7h_mean') or col.endswith('_24h_mean') or col.endswith('_30d_mean'):
        keep_cols.append(col)
hourly_data = hourly_data[keep_cols]

# save rooling features     
hourly_data.to_csv('output/q4_rolling_features.csv', index=True)
print("‚úì Saved: output/q4_rolling_features.csv")



### ‚è±Ô∏è Resampled Hourly Data for Rolling Calculations

Unnamed: 0_level_0,Wet Bulb Temperature,Humidity,Total Rain,Wind Speed,Barometric Pressure,Solar Radiation,Wind Direction,Precipitation Type,Rain Intensity,Interval Rain,Battery Life
Measurement Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2015-05-22 19:00:00,14.8,58.0,7.3,1.9,990.4,79.0,115.0,1,0.0,0.0,15.1
2015-05-22 20:00:00,14.8,59.0,7.3,2.1,990.4,5.0,127.0,1,0.0,0.0,15.2
2015-05-22 21:00:00,14.8,62.0,7.3,2.8,990.4,0.0,65.0,1,0.0,0.0,15.2
2015-05-22 22:00:00,14.8,66.0,7.3,1.8,990.4,0.0,81.0,1,0.0,0.0,15.2
2015-05-22 23:00:00,14.8,63.0,7.3,2.0,990.4,0.0,145.0,1,0.0,0.0,15.1
2015-05-23 00:00:00,,,0.0,,,,,0,,0.0,
2015-05-23 01:00:00,,,0.0,,,,,0,,0.0,
2015-05-23 02:00:00,,,0.0,,,,,0,,0.0,
2015-05-23 03:00:00,,,0.0,,,,,0,,0.0,
2015-05-23 04:00:00,,,0.0,,,,,0,,0.0,


### üìà 7-Hour Rolling Mean

Unnamed: 0_level_0,Wet Bulb Temperature_7h_mean,Humidity_7h_mean,Total Rain_7h_mean,Wind Speed_7h_mean,Barometric Pressure_7h_mean,Solar Radiation_7h_mean,Wind Direction_7h_mean,Precipitation Type_7h_mean,Rain Intensity_7h_mean,Interval Rain_7h_mean,Battery Life_7h_mean
Measurement Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2015-05-22 19:00:00,14.8,58.0,7.3,1.9,990.4,79.0,115.0,1.0,0.0,0.0,15.1
2015-05-22 20:00:00,14.8,58.5,7.3,2.0,990.4,42.0,121.0,1.0,0.0,0.0,15.15
2015-05-22 21:00:00,14.8,59.67,7.3,2.27,990.4,28.0,102.33,1.0,0.0,0.0,15.17
2015-05-22 22:00:00,14.8,61.25,7.3,2.15,990.4,21.0,97.0,1.0,0.0,0.0,15.18
2015-05-22 23:00:00,14.8,61.6,7.3,2.12,990.4,16.8,106.6,1.0,0.0,0.0,15.16
2015-05-23 00:00:00,14.8,61.6,6.08,2.12,990.4,16.8,106.6,0.83,0.0,0.0,15.16
2015-05-23 01:00:00,14.8,61.6,5.21,2.12,990.4,16.8,106.6,0.71,0.0,0.0,15.16
2015-05-23 02:00:00,14.8,62.5,4.17,2.17,990.4,1.25,104.5,0.57,0.0,0.0,15.18
2015-05-23 03:00:00,14.8,63.67,3.13,2.2,990.4,0.0,97.0,0.43,0.0,0.0,15.17
2015-05-23 04:00:00,14.8,64.5,2.09,1.9,990.4,0.0,113.0,0.29,0.0,0.0,15.15


### üìà 24-Hour Rolling Mean

Unnamed: 0_level_0,Wet Bulb Temperature_24h_mean,Humidity_24h_mean,Total Rain_24h_mean,Wind Speed_24h_mean,Barometric Pressure_24h_mean,Solar Radiation_24h_mean,Wind Direction_24h_mean,Precipitation Type_24h_mean,Rain Intensity_24h_mean,Interval Rain_24h_mean,...,Humidity_7h_mean_24h_mean,Total Rain_7h_mean_24h_mean,Wind Speed_7h_mean_24h_mean,Barometric Pressure_7h_mean_24h_mean,Solar Radiation_7h_mean_24h_mean,Wind Direction_7h_mean_24h_mean,Precipitation Type_7h_mean_24h_mean,Rain Intensity_7h_mean_24h_mean,Interval Rain_7h_mean_24h_mean,Battery Life_7h_mean_24h_mean
Measurement Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-05-22 19:00:00,14.8,58.0,7.3,1.9,990.4,79.0,115.0,1.0,0.0,0.0,...,58.0,7.3,1.9,990.4,79.0,115.0,1.0,0.0,0.0,15.1
2015-05-22 20:00:00,14.8,58.5,7.3,2.0,990.4,42.0,121.0,1.0,0.0,0.0,...,58.25,7.3,1.95,990.4,60.5,118.0,1.0,0.0,0.0,15.12
2015-05-22 21:00:00,14.8,59.67,7.3,2.27,990.4,28.0,102.33,1.0,0.0,0.0,...,58.72,7.3,2.06,990.4,49.67,112.78,1.0,0.0,0.0,15.14
2015-05-22 22:00:00,14.8,61.25,7.3,2.15,990.4,21.0,97.0,1.0,0.0,0.0,...,59.35,7.3,2.08,990.4,42.5,108.83,1.0,0.0,0.0,15.15
2015-05-22 23:00:00,14.8,61.6,7.3,2.12,990.4,16.8,106.6,1.0,0.0,0.0,...,59.8,7.3,2.09,990.4,37.36,108.39,1.0,0.0,0.0,15.15
2015-05-23 00:00:00,14.8,61.6,6.08,2.12,990.4,16.8,106.6,0.83,0.0,0.0,...,60.1,7.1,2.09,990.4,33.93,108.09,0.97,0.0,0.0,15.15
2015-05-23 01:00:00,14.8,61.6,5.21,2.12,990.4,16.8,106.6,0.71,0.0,0.0,...,60.32,6.83,2.1,990.4,31.49,107.88,0.94,0.0,0.0,15.15
2015-05-23 02:00:00,14.8,61.6,4.56,2.12,990.4,16.8,106.6,0.62,0.0,0.0,...,60.59,6.5,2.11,990.4,27.71,107.45,0.89,0.0,0.0,15.16
2015-05-23 03:00:00,14.8,61.6,4.06,2.12,990.4,16.8,106.6,0.56,0.0,0.0,...,60.93,6.12,2.12,990.4,24.63,106.29,0.84,0.0,0.0,15.16
2015-05-23 04:00:00,14.8,61.6,3.65,2.12,990.4,16.8,106.6,0.5,0.0,0.0,...,61.29,5.72,2.1,990.4,22.16,106.96,0.78,0.0,0.0,15.16


### üìà 30-Day Rolling Mean and Std Dev

Unnamed: 0_level_0,Wet Bulb Temperature_30d_mean,Humidity_30d_mean,Total Rain_30d_mean,Wind Speed_30d_mean,Barometric Pressure_30d_mean,Solar Radiation_30d_mean,Wind Direction_30d_mean,Precipitation Type_30d_mean,Rain Intensity_30d_mean,Interval Rain_30d_mean,...,Humidity_7h_mean_24h_mean_30d_mean,Total Rain_7h_mean_24h_mean_30d_mean,Wind Speed_7h_mean_24h_mean_30d_mean,Barometric Pressure_7h_mean_24h_mean_30d_mean,Solar Radiation_7h_mean_24h_mean_30d_mean,Wind Direction_7h_mean_24h_mean_30d_mean,Precipitation Type_7h_mean_24h_mean_30d_mean,Rain Intensity_7h_mean_24h_mean_30d_mean,Interval Rain_7h_mean_24h_mean_30d_mean,Battery Life_7h_mean_24h_mean_30d_mean
Measurement Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-05-22 19:00:00,14.8,58.0,7.3,1.9,990.4,79.0,115.0,1.0,0.0,0.0,...,58.0,7.3,1.9,990.4,79.0,115.0,1.0,0.0,0.0,15.1
2015-05-22 20:00:00,14.8,58.5,7.3,2.0,990.4,42.0,121.0,1.0,0.0,0.0,...,58.12,7.3,1.92,990.4,69.75,116.5,1.0,0.0,0.0,15.11
2015-05-22 21:00:00,14.8,59.67,7.3,2.27,990.4,28.0,102.33,1.0,0.0,0.0,...,58.32,7.3,1.97,990.4,63.06,115.26,1.0,0.0,0.0,15.12
2015-05-22 22:00:00,14.8,61.25,7.3,2.15,990.4,21.0,97.0,1.0,0.0,0.0,...,58.58,7.3,2.0,990.4,57.92,113.65,1.0,0.0,0.0,15.13
2015-05-22 23:00:00,14.8,61.6,7.3,2.12,990.4,16.8,106.6,1.0,0.0,0.0,...,58.83,7.3,2.01,990.4,53.81,112.6,1.0,0.0,0.0,15.13
2015-05-23 00:00:00,14.8,61.6,6.08,2.12,990.4,16.8,106.6,0.83,0.0,0.0,...,59.04,7.27,2.03,990.4,50.49,111.85,1.0,0.0,0.0,15.14
2015-05-23 01:00:00,14.8,61.6,5.21,2.12,990.4,16.8,106.6,0.71,0.0,0.0,...,59.22,7.2,2.04,990.4,47.78,111.28,0.99,0.0,0.0,15.14
2015-05-23 02:00:00,14.8,61.6,4.56,2.12,990.4,16.8,106.6,0.62,0.0,0.0,...,59.39,7.12,2.05,990.4,45.27,110.8,0.97,0.0,0.0,15.14
2015-05-23 03:00:00,14.8,61.6,4.06,2.12,990.4,16.8,106.6,0.56,0.0,0.0,...,59.56,7.0,2.05,990.4,42.98,110.3,0.96,0.0,0.0,15.14
2015-05-23 04:00:00,14.8,61.6,3.65,2.12,990.4,16.8,106.6,0.5,0.0,0.0,...,59.74,6.88,2.06,990.4,40.89,109.97,0.94,0.0,0.0,15.14


‚úì Saved: output/q4_rolling_features.csv


In [None]:
# 3. FEATURES INFORMATION IN TEXT
# List of  hourly rolling features
rolling_variables = [col for col in hourly_data.columns
                    if col.endswith('_7h_mean') or col.endswith('_24h_mean') or col.endswith('_30d_mean')]
print(rolling_variables)
# derived features

derived_features = [
    'wet_bulb_difference', 'wet_bulb_humidity_ratio', 'wet_bulb_humidity_interaction',
    'rain_difference', 'rain_intensity_ratio', 'rain_humidity_interaction',
    'Intervrain_humidity_interaction', 'rain_pressure_interaction', 'rain_wind_interaction',
    'wind_range', 'wind_speed_ratio', 'wind_speed_interaction',
    'wind_humidity_ratio', 'wind_pressure_interaction',
    'pressure_humidity_ratio', 'pressure_humidity_interaction',
    'solar_totalrain_interaction', 'solar_humidity_interaction',
    'solar_pressure_ratio', 'solar_wind_interaction'
]


# Create report
report = []
report.append("Features List:")
report.append(f"rolling :")
for col in rolling_varaibles:
    report.append(f"  - {col}") 
report.append(f" Derived features:") 
for col in derived_features:
    report.append(f"  - {col}") 

report.append("Categorical features:")
for col in category_created:
    report.append(f"  - {col}") 
                 

# Save report to file
with open('output/q4_features_list.txt', 'w') as f:
    for line in report:
        f.write(line + '\n')
print("‚úì Saved: output/q4_features_list.txt")


['Wet Bulb Temperature_7h_mean', 'Humidity_7h_mean', 'Total Rain_7h_mean', 'Wind Speed_7h_mean', 'Barometric Pressure_7h_mean', 'Solar Radiation_7h_mean', 'Wind Direction_7h_mean', 'Precipitation Type_7h_mean', 'Rain Intensity_7h_mean', 'Interval Rain_7h_mean', 'Battery Life_7h_mean', 'Wet Bulb Temperature_24h_mean', 'Humidity_24h_mean', 'Total Rain_24h_mean', 'Wind Speed_24h_mean', 'Barometric Pressure_24h_mean', 'Solar Radiation_24h_mean', 'Wind Direction_24h_mean', 'Precipitation Type_24h_mean', 'Rain Intensity_24h_mean', 'Interval Rain_24h_mean', 'Battery Life_24h_mean', 'Wet Bulb Temperature_7h_mean_24h_mean', 'Humidity_7h_mean_24h_mean', 'Total Rain_7h_mean_24h_mean', 'Wind Speed_7h_mean_24h_mean', 'Barometric Pressure_7h_mean_24h_mean', 'Solar Radiation_7h_mean_24h_mean', 'Wind Direction_7h_mean_24h_mean', 'Precipitation Type_7h_mean_24h_mean', 'Rain Intensity_7h_mean_24h_mean', 'Interval Rain_7h_mean_24h_mean', 'Battery Life_7h_mean_24h_mean', 'Wet Bulb Temperature_30d_mean', '

In [None]:
#Decision
# I created derived features by combining existing numeric columns through arithmetic operations like difference, ratio, and interaction terms.
# These new features can help capture complex relationships in the data that may improve model performance.
# While the difference in temperature features can capture sudden changes, interaction terms can reveal how two variables jointly influence our target variable(Air Temperature)
# While the 7-hour rolling mean captures short-term trends, daily temporal changes less than 6hours might not be captured in the analysis.
# However, the 24-hour may help identify daily trends
# For this analysis, I think weekly (7-day) rolling features will be most useful to capture both short-term and medium-term trends in the beach sensor data. It will smooth out daily fluctuations while preserving the underlying trend.