# Calendar features

Calendar features are key elements in time series forecasting. They decompose date and time into basic units such as year, month, day, and weekday, allowing models to identify recurring patterns, capture seasonal variations, and detect trends. Calendar features can be used as [exogenous variables](https://skforecast.org/latest/user_guides/exogenous-variables) because they are known in advance for the period to be predicted (the forecast horizon).
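
Because calendar features depend only on the timestamp, they can be computed for dates that have not occurred yet. The following is a minimal sketch, assuming an hourly series and a 24-step horizon; the names `last_observed`, `horizon` and `future_exog` are illustrative and not part of skforecast.

```python
import pandas as pd

# Hypothetical last timestamp of the training data and forecast horizon
last_observed = pd.Timestamp("2012-12-31 23:00:00")
horizon = 24  # number of hourly steps to predict

# Timestamps covered by the forecast horizon
future_index = pd.date_range(
    start=last_observed + pd.Timedelta(hours=1),
    periods=horizon,
    freq=pd.Timedelta(hours=1),
)

# Calendar features are fully determined by the timestamps
future_exog = pd.DataFrame(
    {
        "month": future_index.month,
        "day_of_week": future_index.dayofweek,
        "hour": future_index.hour,
    },
    index=future_index,
)
print(future_exog.head(3))
```

A frame built this way covers exactly the steps to be predicted, which is what makes calendar features usable as exogenous variables at prediction time.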


**Dates and time in Pandas**

Pandas provides a comprehensive set of capabilities tailored for handling time series data in various domains. Using the NumPy datetime64 and timedelta64 data types, Pandas combines a wide range of functionality from various Python libraries while introducing a wealth of novel tools to effectively manipulate time series data. This includes:

+ Parsing dates and times from multiple sources and formats.

+ Generating sequences of fixed-frequency dates and time spans.

+ Manipulating and converting date-time information, including time zones.

+ Resampling or converting time series data to specific frequencies.

For an in-depth exploration of Pandas' time series and date capabilities, please refer to this [resource](https://pandas.pydata.org/docs/user_guide/timeseries.html).
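
The capabilities listed above can be illustrated with a short sketch. The timestamps and values are made up for the example and do not come from the bike sharing dataset.

```python
import pandas as pd

# Parse dates stored as text in a day/month/year format
parsed = pd.to_datetime(
    ["01/02/2011 10:00", "01/02/2011 11:00"], format="%d/%m/%Y %H:%M"
)

# Generate a fixed-frequency sequence of timestamps (every 30 minutes)
index = pd.date_range(
    start="2011-01-01 00:00", periods=4, freq=pd.Timedelta(minutes=30)
)

# Attach a time zone and convert to another one
index_utc = index.tz_localize("UTC")
index_dc = index_utc.tz_convert("America/New_York")

# Resample to a lower frequency (hourly sums)
series = pd.Series([1, 2, 3, 4], index=index)
hourly = series.resample(pd.Timedelta(hours=1)).sum()
print(hourly)
```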

## Libraries and data

In [1]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5

from lightgbm import LGBMRegressor
from skforecast.datasets import fetch_dataset
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregDirect import ForecasterAutoregDirect

In [2]:
# Downloading data
# ==============================================================================
data = fetch_dataset(name="bike_sharing", raw=True)
data = data[['date_time', 'users']]
data.head()

bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, information
about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 12)


,date_time,users
0,2011-01-01 00:00:00,16.0
1,2011-01-01 01:00:00,40.0
2,2011-01-01 02:00:00,32.0
3,2011-01-01 03:00:00,13.0
4,2011-01-01 04:00:00,1.0


## Extract calendar features



To take advantage of the date-time functionality offered by Pandas, the column of interest must be stored as `datetime`. Although not required, it is recommended to set it as an index for further integration with skforecast.

In [3]:
# Preprocess data
# ==============================================================================
data['date_time'] = pd.to_datetime(data['date_time'], format='%Y-%m-%d %H:%M:%S')
data = data.set_index('date_time')
data = data.asfreq('H')
data = data.sort_index()
data.head(3)

date_time,users
2011-01-01 00:00:00,16.0
2011-01-01 01:00:00,40.0
2011-01-01 02:00:00,32.0


Next, several features are created from the date and time information: year, month, day of the week and hour.

In [8]:
# Create calendar features
# ==============================================================================
data['year'] = data.index.year
data['month'] = data.index.month
data['day_of_week'] = data.index.dayofweek
data['hour'] = data.index.hour
data.head(3)

date_time,users,year,month,day_of_week,hour
2011-01-01 00:00:00,16.0,2011,1,5,0
2011-01-01 01:00:00,40.0,2011,1,5,1
2011-01-01 02:00:00,32.0,2011,1,5,2


<script src="https://kit.fontawesome.com/d20edc211b.js" crossorigin="anonymous"></script>
<div class="admonition note" name="html-admonition" style="background: rgba(0,184,212,.1); padding-top: 0px; padding-bottom: 6px; border-radius: 8px;
border-left: 8px solid #00b8d4; border-color: #00b8d4; padding-left: 10px;">
<p class="title">
    <i class="fa-light fa-pencil fa" style="font-size: 18px; color:#00b8d4;"></i>
    <b> &nbsp; Note</b>
</p>


Many other calendar features can be generated, including day of the year, week of the year, and hour of the day. A convenient way to automate their extraction is the <code>DatetimeFeatures</code> transformer from the **Feature-engine** Python library. This class integrates into scikit-learn pipelines, which makes it compatible with skforecast as well. For more details, please refer to <a href="https://feature-engine.trainindata.com/en/latest/user_guide/datetime/DatetimeFeatures.html#datetime-features">DatetimeFeatures</a>.

</div>

In [4]:
# Create calendar features with Feature-engine
# ==============================================================================
from feature_engine.datetime import DatetimeFeatures

transformer = DatetimeFeatures(
                variables="index",
                features_to_extract="all" # It is also possible to select specific features
              )
calendar_features = transformer.fit_transform(data)
calendar_features.head(3)

date_time,users,month,quarter,semester,year,week,day_of_week,day_of_month,day_of_year,weekend,...,month_end,quarter_start,quarter_end,year_start,year_end,leap_year,days_in_month,hour,minute,second
2011-01-01 00:00:00,16.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,1,0,0,31,0,0,0
2011-01-01 01:00:00,40.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,1,0,0,31,1,0,0
2011-01-01 02:00:00,32.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,1,0,0,31,2,0,0


## Sunlight-related features


Sunlight often plays a key role in time series patterns. For example, a household's hourly electricity consumption may correlate strongly with whether it is nighttime, since more electricity is typically used for lighting during those hours. Features such as sunrise and sunset times, or the number of daylight hours as it changes with the seasons, give the model this context and can help predict fluctuations in consumption.

Several Python libraries are available for extracting sunrise and sunset times. Two of the most commonly used are `ephem` and `astral`.

In [43]:
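# Features based on sunlight
# ==============================================================================
from astral.sun import sun
from astral import LocationInfo

# Note: only the name and region are set here. Unless latitude, longitude and
# timezone are also passed, LocationInfo keeps its default coordinates, and
# sun() returns times in UTC unless tzinfo is given. Set them explicitly to
# obtain local sunrise and sunset times for the location of interest.
location = LocationInfo("Washington, D.C.", "USA")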
sunrise_hour = [sun(location.observer, date=date)['sunrise'].hour for date in data.index]
sunset_hour = [sun(location.observer, date=date)['sunset'].hour for date in data.index]
sun_light_features = pd.DataFrame({
        'sunrise_hour': sunrise_hour,
        'sunset_hour': sunset_hour,
    }, index=data.index)
sun_light_features['daylight_hours'] = sun_light_features['sunset_hour'] - sun_light_features['sunrise_hour']
sun_light_features.head(3)

date_time,sunrise_hour,sunset_hour,daylight_hours
2011-01-01 00:00:00,8,16,8
2011-01-01 01:00:00,8,16,8
2011-01-01 02:00:00,8,16,8
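
Sunrise and sunset hours can also be turned into a flag indicating whether each observation falls in daylight. The following is a minimal sketch, assuming a frame with `sunrise_hour` and `sunset_hour` columns like the one above. The values are illustrative placeholders and `is_daylight` is a hypothetical column name.

```python
import pandas as pd

# Illustrative frame: one day of hourly data with fixed sunrise/sunset hours
index = pd.date_range("2011-01-01 00:00", periods=24, freq=pd.Timedelta(hours=1))
sun_features = pd.DataFrame({"sunrise_hour": 8, "sunset_hour": 16}, index=index)

# Hour of each observation, aligned on the same index
hour = pd.Series(sun_features.index.hour, index=sun_features.index)

# 1 if the hour lies between sunrise (inclusive) and sunset (exclusive), else 0
sun_features["is_daylight"] = (
    (hour >= sun_features["sunrise_hour"]) & (hour < sun_features["sunset_hour"])
).astype(int)
print(sun_features["is_daylight"].sum())
```

Because the comparison uses whole hours, the flag is only an approximation of the true daylight period.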


## Cyclical encoding

Certain aspects of the calendar, such as hours or days of the week, are cyclical: hour 23 is followed by hour 0, yet the two values are far apart when encoded as plain integers. Transforming each feature with sine and cosine functions represents its position on a circle, so that the end of one cycle sits next to the start of the next. This technique is called cyclical encoding and can improve the predictive power of models.
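
To make the idea concrete, compare hour 23 and hour 0, which are adjacent in time. The sketch below uses an illustrative helper, `encode_hour`, to show that the sine/cosine representation places them as close together as any other pair of consecutive hours.

```python
import numpy as np

def encode_hour(hour, period=24):
    """Map an hour to a point (sin, cos) on the unit circle."""
    angle = 2 * np.pi * hour / period
    return np.array([np.sin(angle), np.cos(angle)])

# Distance between hour 23 and hour 0 in the raw integer representation
raw_distance = abs(23 - 0)

# Euclidean distance between the same hours after the transformation
encoded_distance = np.linalg.norm(encode_hour(23) - encode_hour(0))

# Distance between two consecutive hours in the middle of the day
reference_distance = np.linalg.norm(encode_hour(12) - encode_hour(11))

print(raw_distance, encoded_distance, reference_distance)
```

Both sine and cosine are needed: a single component maps two different hours to the same value, whereas the pair identifies each position in the cycle uniquely.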

In [49]:
calendar_features

date_time,users,month,quarter,semester,year,week,day_of_week,day_of_month,day_of_year,weekend,...,quarter_end,year_start,year_end,leap_year,days_in_month,hour,minute,second,month_sin,month_cos
2011-01-01 00:00:00,16.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,0,31,0,0,0,5.000000e-01,0.866025
2011-01-01 01:00:00,40.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,0,31,1,0,0,5.000000e-01,0.866025
2011-01-01 02:00:00,32.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,0,31,2,0,0,5.000000e-01,0.866025
2011-01-01 03:00:00,13.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,0,31,3,0,0,5.000000e-01,0.866025
2011-01-01 04:00:00,1.0,1,1,1,2011,52,5,1,1,1,...,0,1,0,0,31,4,0,0,5.000000e-01,0.866025
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2012-12-31 19:00:00,119.0,12,4,2,2012,1,0,31,366,0,...,1,0,1,1,31,19,0,0,-2.449294e-16,1.000000
2012-12-31 20:00:00,89.0,12,4,2,2012,1,0,31,366,0,...,1,0,1,1,31,20,0,0,-2.449294e-16,1.000000
2012-12-31 21:00:00,90.0,12,4,2,2012,1,0,31,366,0,...,1,0,1,1,31,21,0,0,-2.449294e-16,1.000000
2012-12-31 22:00:00,61.0,12,4,2,2012,1,0,31,366,0,...,1,0,1,1,31,22,0,0,-2.449294e-16,1.000000


In [52]:
# Cyclical encoding of month, day of week and hour
# ==============================================================================
def cyclical_encoding(data, column, max_val):
    """
    Encode a cyclical column as sine and cosine components. `max_val` is the
    length of the cycle (number of distinct values), e.g. 24 for hours.
    """
    data = data[[column]].copy()
    data[f'{column}_sin'] = np.sin(2 * np.pi * data[column]/max_val)
    data[f'{column}_cos'] = np.cos(2 * np.pi * data[column]/max_val)
    return data

cyclical_features_month = cyclical_encoding(calendar_features, 'month', 12)

# max_val = 7 because day_of_week takes 7 values (0 is Monday and 6 is Sunday).
# With max_val = 6, Monday and Sunday would receive the same encoding.
cyclical_features_day_of_week = cyclical_encoding(calendar_features, 'day_of_week', 7)
cyclical_features_hour = cyclical_encoding(calendar_features, 'hour', 24)
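
The encoded columns can then be gathered into a single frame of exogenous variables. The following is a minimal sketch on a small synthetic index, with a compact re-definition of the encoding helper so that the example is self-contained; `encode_cyclical` and `exog` are illustrative names.

```python
import numpy as np
import pandas as pd

def encode_cyclical(values, period, name, index):
    """Return a frame with the sine and cosine columns of a cyclical variable."""
    angle = 2 * np.pi * np.asarray(values) / period
    return pd.DataFrame(
        {f"{name}_sin": np.sin(angle), f"{name}_cos": np.cos(angle)},
        index=index,
    )

# Synthetic hourly index covering two days
index = pd.date_range("2011-01-01 00:00", periods=48, freq=pd.Timedelta(hours=1))

# Keep only the sine/cosine columns, aligned on the datetime index
exog = pd.concat(
    [
        encode_cyclical(index.month, 12, "month", index),
        encode_cyclical(index.hour, 24, "hour", index),
    ],
    axis=1,
)
print(exog.columns.tolist())
```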
