# Feature Engineering

## External Data Integration

### Data Source

Historical weather data is sourced from a fictional provider, "Global Weather Services (GWS)". It is provided in CSV format and contains daily weather observations for Hong Kong.

### Data Files

- `hk_weather_2023.csv`: Daily weather data for Hong Kong for the year 2023.
- `hk_weather_2024_jan.csv`: Daily weather data for Hong Kong for January 2024.

### Features

The weather dataset includes the following columns:

- `date`: The date of the weather observation (YYYY-MM-DD).
- `temperature_celsius`: The average daily temperature in Celsius.
- `humidity_percent`: The average daily humidity in percent.
- `wind_speed_kmh`: The average daily wind speed in kilometers per hour.
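
As a minimal sketch of how files with these columns could be loaded and combined, the snippet below reads each CSV with pandas, parses `date`, and concatenates the results in date order. The helper name `load_weather` is hypothetical, and the in-memory sample rows are made-up placeholders used only so the example runs without the real files.

```python
import io

import pandas as pd

# Illustrative rows only: these values are invented and are not GWS data.
SAMPLE_2023 = """date,temperature_celsius,humidity_percent,wind_speed_kmh
2023-12-30,18.5,70,12.0
2023-12-31,17.9,72,10.5
"""
SAMPLE_2024_JAN = """date,temperature_celsius,humidity_percent,wind_speed_kmh
2024-01-01,17.2,75,14.0
"""


def load_weather(sources):
    """Read one or more weather CSVs and return a single frame sorted by date.

    Hypothetical helper; `sources` may be file paths or file-like objects.
    """
    frames = [pd.read_csv(src, parse_dates=["date"]) for src in sources]
    weather = pd.concat(frames, ignore_index=True)
    return weather.sort_values("date").reset_index(drop=True)


# With the real files, the sources would be the paths
# ["hk_weather_2023.csv", "hk_weather_2024_jan.csv"].
weather = load_weather([io.StringIO(SAMPLE_2023), io.StringIO(SAMPLE_2024_JAN)])
```

Parsing `date` at load time keeps the column ready for a later join against a timestamped dataset.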

## Time-Based Features

We derive several time-based features from the `timestamp` column to help the model capture cyclical patterns, such as daily and weekly cycles.

In [None]:
import pandas as pd
from src.feature_engineering.time_features import create_time_features

# Create a sample dataframe
df = pd.DataFrame({'timestamp': pd.to_datetime(['2023-01-01 10:00', '2023-01-02 12:00'])})

# Create time features
df_with_features = create_time_features(df.copy(), 'timestamp')
df_with_features.head()
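
The implementation of `create_time_features` is not shown here, so the following is only a sketch of what such a function might compute. The name `sketch_time_features` and every output column name are assumptions made for illustration. It extracts calendar parts from the timestamp and adds a sine/cosine encoding of the hour, so that 23:00 and 00:00 end up close together.

```python
import numpy as np
import pandas as pd


def sketch_time_features(df, timestamp_col):
    """Hypothetical stand-in for create_time_features; the real one may differ."""
    out = df.copy()
    ts = out[timestamp_col].dt
    out["hour"] = ts.hour
    out["day_of_week"] = ts.dayofweek  # Monday is 0, Sunday is 6
    out["month"] = ts.month
    out["is_weekend"] = (ts.dayofweek >= 5).astype(int)
    # Cyclical encoding: map the hour onto a circle with a 24-hour period.
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    return out


df = pd.DataFrame({"timestamp": pd.to_datetime(["2023-01-01 10:00", "2023-01-02 12:00"])})
features = sketch_time_features(df, "timestamp")
```

The same sine/cosine pattern applies to any periodic quantity, for example day of week with a period of 7.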

## Lag Features

Lag features and rolling-window statistics give the model access to recent history, which is crucial for time series forecasting. We create them for the key numerical columns.

In [None]:
import numpy as np
from src.feature_engineering.lag_features import create_lag_features

# Create a sample dataframe
df = pd.DataFrame({'A': np.arange(10), 'B': np.arange(10, 20)})

# Create lag features
df_with_lag_features = create_lag_features(df.copy(), cols_to_lag=['A', 'B'], window_sizes=[1, 3])
df_with_lag_features.head()
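
To make the idea concrete, here is a sketch of one way lag and rolling features could be built with `shift` and `rolling`. It is not the code behind `create_lag_features`: the function name, the output column names, and the choice to reuse each entry of `window_sizes` as both a lag and a rolling window are assumptions.

```python
import numpy as np
import pandas as pd


def sketch_lag_features(df, cols_to_lag, window_sizes):
    """Hypothetical stand-in for create_lag_features; the real one may differ."""
    out = df.copy()
    for col in cols_to_lag:
        for w in window_sizes:
            # Value observed w rows earlier.
            out[f"{col}_lag_{w}"] = out[col].shift(w)
            # Mean of the previous w rows; shifting by 1 first keeps the
            # current row out of its own window.
            out[f"{col}_rolling_mean_{w}"] = out[col].shift(1).rolling(window=w).mean()
    return out


df = pd.DataFrame({"A": np.arange(10), "B": np.arange(10, 20)})
lagged = sketch_lag_features(df, cols_to_lag=["A", "B"], window_sizes=[1, 3])
```

The first rows of each new column are `NaN` because there is not yet enough history, so they usually need to be dropped or imputed before training.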

## Weather Features

We can create interaction features between the weather data and the time-based features to capture relationships that neither captures alone, for example how the effect of temperature varies with the hour of day.

In [None]:
from src.feature_engineering.weather_features import create_weather_features

# Create a sample dataframe
df = pd.DataFrame({'temperature_celsius': [25, 26], 'humidity_percent': [80, 82], 'hour': [10, 11]})

# Create weather features
df_with_weather_features = create_weather_features(df.copy())
df_with_weather_features.head()
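
The exact interactions produced by `create_weather_features` are not listed here. As an illustration only, the sketch below forms two simple product terms from the sample columns; the function name and the output column names are assumptions.

```python
import pandas as pd


def sketch_weather_features(df):
    """Hypothetical stand-in for create_weather_features; the real one may differ."""
    out = df.copy()
    # Weather-by-weather interaction: warm and humid conditions together.
    out["temp_x_humidity"] = out["temperature_celsius"] * out["humidity_percent"]
    # Weather-by-time interaction: lets the temperature effect vary by hour.
    out["temp_x_hour"] = out["temperature_celsius"] * out["hour"]
    return out


df = pd.DataFrame({"temperature_celsius": [25, 26], "humidity_percent": [80, 82], "hour": [10, 11]})
weather_features = sketch_weather_features(df)
```

Product terms like these matter most for linear models; tree-based models can often learn such interactions without them.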

## Technical Features

Domain-specific features can give the model more insight than the raw sensor readings alone. Here, we calculate the temperature difference (delta T) for each chiller from its `CHWSWT` and `CHWRWT` columns.

In [None]:
from src.feature_engineering.technical_features import create_technical_features

# Create a sample dataframe
df = pd.DataFrame({'CHR-01-CHWSWT': [10, 11], 'CHR-01-CHWRWT': [15, 16]})

# Create technical features
df_with_technical_features = create_technical_features(df.copy())
df_with_technical_features.head()
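
A delta T calculation could look like the sketch below. It assumes that `CHWSWT` is the chilled-water supply temperature, that `CHWRWT` is the return temperature, and that delta T is return minus supply. None of this is confirmed above, and the function name and the `-DELTA_T` column suffix are hypothetical.

```python
import pandas as pd

SUPPLY_SUFFIX = "-CHWSWT"  # assumed: chilled-water supply temperature
RETURN_SUFFIX = "-CHWRWT"  # assumed: chilled-water return temperature


def sketch_delta_t(df):
    """Hypothetical stand-in for create_technical_features; the real one may differ."""
    out = df.copy()
    for col in df.columns:
        if not col.endswith(SUPPLY_SUFFIX):
            continue
        chiller = col[: -len(SUPPLY_SUFFIX)]  # e.g. "CHR-01"
        return_col = chiller + RETURN_SUFFIX
        # Only compute delta T when both sensors exist for this chiller.
        if return_col in df.columns:
            out[f"{chiller}-DELTA_T"] = df[return_col] - df[col]
    return out


df = pd.DataFrame({"CHR-01-CHWSWT": [10, 11], "CHR-01-CHWRWT": [15, 16]})
technical = sketch_delta_t(df)
```

Matching on the column suffix lets the same function handle any number of chillers that follow this naming pattern.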