# Feature Engineering for Bike Sharing Demand Prediction

This notebook engineers features for the bike-sharing demand prediction model. It derives calendar features from the timestamp, one-hot encodes the categorical variables, and standardizes the numerical features.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

In [None]:
# Load the processed dataset
data = pd.read_csv('../data/processed/bike_sharing_data.csv')
data.head()

In [None]:
# Feature Engineering

# Parse the datetime column into pandas datetime values
data['datetime'] = pd.to_datetime(data['datetime'])

# Extract calendar features from the timestamp
data['year'] = data['datetime'].dt.year
data['month'] = data['datetime'].dt.month
data['day'] = data['datetime'].dt.day
data['hour'] = data['datetime'].dt.hour
data['weekday'] = data['datetime'].dt.weekday  # Monday=0, Sunday=6

# Drop the original datetime column now that its components are extracted
data = data.drop(columns='datetime')

# Handle categorical variables
categorical_features = ['season', 'weather', 'month', 'hour', 'weekday']
# Return a dense array; `sparse_output` replaced the `sparse` argument in scikit-learn 1.2
one_hot_encoder = OneHotEncoder(sparse_output=False)
encoded_categorical = one_hot_encoder.fit_transform(data[categorical_features])

# Create a DataFrame with the new one-hot encoded features
encoded_df = pd.DataFrame(encoded_categorical, columns=one_hot_encoder.get_feature_names_out(categorical_features))

# Concatenate the original data with the new features
data = pd.concat([data.reset_index(drop=True), encoded_df.reset_index(drop=True)], axis=1)

# Drop the original categorical columns
data = data.drop(columns=categorical_features)

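# Standardize numerical features to zero mean and unit variance.
# Note: the scaler is fit on the full dataset here; when evaluating on a
# held-out split, fit it on the training rows only to avoid leakage.
numerical_features = ['temp', 'humidity', 'windspeed', 'year', 'day']
scaler = StandardScaler()
data[numerical_features] = scaler.fit_transform(data[numerical_features])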

# Display the transformed dataset
data.head()
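
The cell above fits the encoder and the scaler on every row, so statistics from rows later used for testing can influence the training features. One possible alternative is to split first and fit the transformers on the training rows only, using the `ColumnTransformer` and `train_test_split` imports from the top of the notebook. The following is a sketch under stated assumptions, not the notebook's method: the synthetic `raw` frame, the target column name `count`, and the helper `add_calendar_features` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the raw data. Column names follow the cells above;
# the values and the 'count' target column are invented for illustration.
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "datetime": pd.Timestamp("2011-01-01") + pd.to_timedelta(np.arange(n), unit="h"),
    "season": rng.integers(1, 5, n),
    "weather": rng.integers(1, 5, n),
    "temp": rng.normal(20, 8, n),
    "humidity": rng.uniform(20, 100, n),
    "windspeed": rng.uniform(0, 40, n),
    "count": rng.integers(0, 500, n),
})


def add_calendar_features(frame):
    """Hypothetical helper: derive calendar columns, then drop the timestamp."""
    out = frame.copy()
    stamp = pd.to_datetime(out["datetime"])
    out["year"] = stamp.dt.year
    out["month"] = stamp.dt.month
    out["day"] = stamp.dt.day
    out["hour"] = stamp.dt.hour
    out["weekday"] = stamp.dt.weekday  # Monday=0, Sunday=6
    return out.drop(columns="datetime")


features = add_calendar_features(raw)
X = features.drop(columns="count")
y = features["count"]

# Split before fitting any transformer so the test rows stay unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

categorical_features = ["season", "weather", "month", "hour", "weekday"]
numerical_features = ["temp", "humidity", "windspeed", "year", "day"]

preprocessor = ColumnTransformer([
    # handle_unknown="ignore" encodes categories unseen in training as all zeros.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ("numerical", StandardScaler(), numerical_features),
])

# Fit on the training rows only, then reuse the fitted transformers on the test rows.
X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```

Because both splits pass through the same fitted `preprocessor`, they end up with the same number of columns, and the scaling parameters come from the training rows alone.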

## Conclusion

This notebook extracted the year, month, day, hour, and weekday from the datetime column, one-hot encoded season, weather, month, hour, and weekday, and standardized temp, humidity, windspeed, year, and day. The transformed dataset is now ready for modeling.