# Calendar features

Calendar features are key elements in time series forecasting. They decompose date and time into basic units such as year, month, day, weekday, and hour, allowing models to capture recurring patterns, seasonal variations, and trends. Calendar features can be used as [exogenous variables](https://skforecast.org/latest/user_guides/exogenous-variables) because they are known in advance for the period to be predicted (the forecast horizon).


**Dates and time in Pandas**

Pandas provides a comprehensive set of capabilities for handling time series data. Built on the NumPy `datetime64` and `timedelta64` data types, Pandas consolidates functionality from several Python libraries and adds many new tools for manipulating time series data. These include:

+ Parsing date and time data from multiple sources and formats.

+ Generating sequences of fixed-frequency dates and time spans.

+ Manipulating and converting date-time information, including time zones.

+ Resampling or converting time series data to specific frequencies.

For an in-depth exploration of Pandas' time series and date capabilities, see the [Pandas time series user guide](https://pandas.pydata.org/docs/user_guide/timeseries.html).
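As a brief illustration of the capabilities listed above, the following sketch uses a small synthetic series. The values, the variable names and the `America/New_York` time zone are assumptions chosen for the example and are not part of the bike sharing data.

```python
import pandas as pd

# Parse strings with a known format into datetime values
parsed = pd.to_datetime(
    ["2011-01-01 00:00:00", "2011-01-01 06:00:00"],
    format="%Y-%m-%d %H:%M:%S",
)

# Generate a fixed-frequency sequence: 48 hourly timestamps
index = pd.date_range(start="2011-01-01", periods=48, freq="h")
series = pd.Series(range(48), index=index, name="users")

# Attach a time zone to naive timestamps, then convert to another one
localized = series.tz_localize("UTC").tz_convert("America/New_York")

# Resample the hourly values to daily totals
daily = series.resample("D").sum()
```

Each step returns a new object, so the original `series` keeps its naive hourly index.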

## Libraries and data

In [29]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset

In [30]:
# Downloading data
# ==============================================================================
data = fetch_dataset(name="bike_sharing", raw=True)
data = data[['date_time', 'users']]
data.head()

bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, information
about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 12)


,date_time,users
0,2011-01-01 00:00:00,16.0
1,2011-01-01 01:00:00,40.0
2,2011-01-01 02:00:00,32.0
3,2011-01-01 03:00:00,13.0
4,2011-01-01 04:00:00,1.0


## Extract calendar features



To take advantage of the date-time functionality offered by Pandas, the column of interest must be stored as `datetime`. Although not required, it is recommended to set it as the index to simplify integration with skforecast.

In [31]:
# Preprocess data
# ==============================================================================
data['date_time'] = pd.to_datetime(data['date_time'], format='%Y-%m-%d %H:%M:%S')
data = data.set_index('date_time')
data = data.sort_index()
data = data.asfreq('h')  # Lowercase alias; 'H' is deprecated in recent pandas versions
data.head()

date_time,users
2011-01-01 00:00:00,16.0
2011-01-01 01:00:00,40.0
2011-01-01 02:00:00,32.0
2011-01-01 03:00:00,13.0
2011-01-01 04:00:00,1.0


Next, several features are created from the date and time information: year, month, day of the week and hour.

In [32]:
# Create calendar features
# ==============================================================================
data['year'] = data.index.year
data['month'] = data.index.month
data['day_of_week'] = data.index.dayofweek
data['hour'] = data.index.hour
data.head()

date_time,users,year,month,day_of_week,hour
2011-01-01 00:00:00,16.0,2011,1,5,0
2011-01-01 01:00:00,40.0,2011,1,5,1
2011-01-01 02:00:00,32.0,2011,1,5,2
2011-01-01 03:00:00,13.0,2011,1,5,3
2011-01-01 04:00:00,1.0,2011,1,5,4
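
Because calendar features depend only on the timestamps, the same features can be built for dates that have not been observed yet, which is what allows them to be used as exogenous variables over the forecast horizon. The following is a minimal sketch, assuming the training data ends at `2012-12-31 23:00:00` and a horizon of 24 hourly steps. The names `last_observed`, `steps` and `future_exog` are hypothetical.

```python
import pandas as pd

# Assumed last timestamp of the training data and forecast horizon (hypothetical values)
last_observed = pd.Timestamp("2012-12-31 23:00:00")
steps = 24

# Timestamps of the forecast horizon, starting one hour after the last observation
future_index = pd.date_range(
    start=last_observed + pd.Timedelta(hours=1),
    periods=steps,
    freq="h",
)

# Same calendar features as those created for the training data
future_exog = pd.DataFrame(
    {
        "year": future_index.year,
        "month": future_index.month,
        "day_of_week": future_index.dayofweek,
        "hour": future_index.hour,
    },
    index=future_index,
)
```

The resulting data frame has one row per step of the horizon and the same column names as the calendar features created above.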


<script src="https://kit.fontawesome.com/d20edc211b.js" crossorigin="anonymous"></script>

<div class="admonition note" name="html-admonition" style="background: rgba(0,191,191,.1); padding-top: 0px; padding-bottom: 6px; border-radius: 8px;
border-left: 8px solid #00bfa5; border-color: #00bfa5; padding-left: 10px;">
<p class="title">
    <i class="fa-solid fa-fire-flame-curved" style="font-size: 18px; color:#00bfa5;"></i>
    <b> &nbsp; Tip</b>
</p>


Many other calendar-related features can be generated, including day of the year, week of the year, and hour of the day. A convenient way to automate their extraction is the <code>DatetimeFeatures</code> transformer of the **Feature-engine** Python library. This class can be used inside scikit-learn pipelines, which also makes it compatible with skforecast. For more details, see <a href="https://feature-engine.trainindata.com/en/latest/user_guide/datetime/DatetimeFeatures.html#datetime-features">DatetimeFeatures</a>.

</div>

In [33]:
# Create calendar features with Feature-engine
# ==============================================================================
from feature_engine.datetime import DatetimeFeatures

transformer = DatetimeFeatures(
                variables="index",
                features_to_extract="all" # It is also possible to select specific features
              )
calendar_features = transformer.fit_transform(data)
calendar_features.head()

date_time,users,year,month,day_of_week,hour,quarter,semester,week,day_of_month,day_of_year,...,month_start,month_end,quarter_start,quarter_end,year_start,year_end,leap_year,days_in_month,minute,second
2011-01-01 00:00:00,16.0,2011,1,5,0,1,1,52,1,1,...,1,0,1,0,1,0,0,31,0,0
2011-01-01 01:00:00,40.0,2011,1,5,1,1,1,52,1,1,...,1,0,1,0,1,0,0,31,0,0
2011-01-01 02:00:00,32.0,2011,1,5,2,1,1,52,1,1,...,1,0,1,0,1,0,0,31,0,0
2011-01-01 03:00:00,13.0,2011,1,5,3,1,1,52,1,1,...,1,0,1,0,1,0,0,31,0,0
2011-01-01 04:00:00,1.0,2011,1,5,4,1,1,52,1,1,...,1,0,1,0,1,0,0,31,0,0


## Sunlight-related features


Sunlight often plays a key role in time series patterns. For example, a household's hourly electricity consumption may correlate strongly with whether it is nighttime, since more electricity is typically used for lighting during those hours. Features such as sunrise and sunset times, or the number of daylight hours (which changes with the seasons), give the model this context and can help predict fluctuations in consumption. Several Python libraries can compute sunrise and sunset times. Two of the most commonly used are `ephem` and `astral`.

In [34]:
# Features based on sunlight
# ==============================================================================
from astral.sun import sun
from astral import LocationInfo

# LocationInfo defaults to Greenwich (UK), so the time zone and coordinates
# of Washington, D.C. are passed explicitly.
location = LocationInfo(
    name='Washington, D.C.', region='USA', timezone='America/New_York',
    latitude=38.9072, longitude=-77.0369
)
# Times are requested in local time to match the timestamps of the data
sunrise_hour = [
    sun(location.observer, date=date.date(), tzinfo=location.tzinfo)['sunrise'].hour
    for date in data.index
]
sunset_hour = [
    sun(location.observer, date=date.date(), tzinfo=location.tzinfo)['sunset'].hour
    for date in data.index
]
sun_light_features = pd.DataFrame({
                        'sunrise_hour': sunrise_hour,
                        'sunset_hour': sunset_hour,
                    }, index=data.index)
sun_light_features['daylight_hours'] = sun_light_features['sunset_hour'] - sun_light_features['sunrise_hour']
sun_light_features.head()

date_time,sunrise_hour,sunset_hour,daylight_hours
2011-01-01 00:00:00,7,16,9
2011-01-01 01:00:00,7,16,9
2011-01-01 02:00:00,7,16,9
2011-01-01 03:00:00,7,16,9
2011-01-01 04:00:00,7,16,9
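
Sunrise and sunset hours can also be combined with the hour of each observation to flag whether it falls in daylight, which is the kind of signal the lighting example relies on. The following is a minimal sketch using a single day with fixed, illustrative sunrise and sunset hours. The `is_daylight` column and the values 7 and 16 are assumptions made for the example.

```python
import pandas as pd

# Illustrative values only: one day of hourly timestamps with fixed sunrise/sunset hours
index = pd.date_range("2011-01-01", periods=24, freq="h")
sun_light = pd.DataFrame({"sunrise_hour": 7, "sunset_hour": 16}, index=index)

# Hour of each observation, aligned with the data frame index
hour = pd.Series(sun_light.index.hour, index=sun_light.index)

# 1 if the hour lies between sunrise (inclusive) and sunset (exclusive), 0 otherwise
is_daylight = hour.ge(sun_light["sunrise_hour"]) & hour.lt(sun_light["sunset_hour"])
sun_light["is_daylight"] = is_daylight.astype(int)
```

Because only the hour component is used, the flag has hourly resolution. A finer flag would need the full sunrise and sunset timestamps.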


## Cyclical encoding

Certain aspects of the calendar, such as hours of the day or days of the week, are cyclical. For example, the hours of a day range from 0 to 23. If the hour is treated as a continuous variable, 23:00 appears to be 23 units away from 00:00, when in reality the two are only one hour apart. The same happens with the months of the year, since December is only one month away from January. Mapping each value to a point on a circle with the sine and cosine functions preserves this cyclical structure, so that the end of a cycle is placed next to its beginning. This technique is called cyclical encoding and can improve the predictive power of models.
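
The effect can be checked numerically. The following sketch, assuming a cycle length of 24 for the hour of the day, compares the gap between 23:00 and 00:00 before and after the sine and cosine transformation. The helper `encode_hour` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def encode_hour(hour: int, cycle_length: int = 24) -> np.ndarray:
    """Map an hour to the point (sin, cos) on the unit circle."""
    angle = 2 * np.pi * hour / cycle_length
    return np.array([np.sin(angle), np.cos(angle)])

# Gap between 23:00 and 00:00 when the hour is treated as a plain number
raw_gap = abs(23 - 0)

# Euclidean distances between encoded hours
gap_23_to_0 = np.linalg.norm(encode_hour(23) - encode_hour(0))
gap_0_to_1 = np.linalg.norm(encode_hour(0) - encode_hour(1))
gap_0_to_12 = np.linalg.norm(encode_hour(0) - encode_hour(12))
```

In the encoded space, 23:00 is as close to 00:00 as 01:00 is, while 12:00 lies on the opposite side of the circle, at the maximum distance of 2.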

In [35]:
# Cyclical encoding of month, day of week and hour
# ==============================================================================
def cyclical_encoding(data: pd.Series, cycle_length: int) -> pd.DataFrame:
    """
    Encode a cyclical feature with two new features sine and cosine.
    The feature is assumed to take `cycle_length` consecutive integer values,
    for example 0 to 23 for hours or 1 to 12 for months.
      
    Parameters
    ----------
    data : pd.Series
        Series with the feature to encode.
    cycle_length : int
        Length of the cycle. For example, 12 for months, 7 for days of the
        week, 24 for hours. This value is used to calculate the angle of the
        sin and cos.

    Returns
    -------
    pd.DataFrame
        Dataframe with the two new features sin and cos.
    """
    sin = np.sin(2 * np.pi * data / cycle_length)
    cos = np.cos(2 * np.pi * data / cycle_length)
    result = pd.DataFrame({
        f"{data.name}_sin": sin,
        f"{data.name}_cos": cos
    })

    return result

# The cycle length is the number of distinct values (7 days, 24 hours), not the
# largest value; otherwise the last and first values would be encoded identically.
month_encoded = cyclical_encoding(calendar_features['month'], cycle_length=12)
day_of_week_encoded = cyclical_encoding(calendar_features['day_of_week'], cycle_length=7)
hour_encoded = cyclical_encoding(calendar_features['hour'], cycle_length=24)
cyclical_features = pd.concat([month_encoded, day_of_week_encoded, hour_encoded], axis=1)
cyclical_features.head()

date_time,month_sin,month_cos,day_of_week_sin,day_of_week_cos,hour_sin,hour_cos
2011-01-01 00:00:00,0.5,0.866025,-0.974928,-0.222521,0.0,1.0
2011-01-01 01:00:00,0.5,0.866025,-0.974928,-0.222521,0.258819,0.965926
2011-01-01 02:00:00,0.5,0.866025,-0.974928,-0.222521,0.5,0.866025
2011-01-01 03:00:00,0.5,0.866025,-0.974928,-0.222521,0.707107,0.707107
2011-01-01 04:00:00,0.5,0.866025,-0.974928,-0.222521,0.866025,0.5


<script src="https://kit.fontawesome.com/d20edc211b.js" crossorigin="anonymous"></script>
<div class="admonition note" name="html-admonition" style="background: rgba(0,184,212,.1); padding-top: 0px; padding-bottom: 6px; border-radius: 8px;
border-left: 8px solid #00b8d4; border-color: #00b8d4; padding-left: 10px;">
<p class="title">
    <i class="fa-light fa-pencil fa" style="font-size: 18px; color:#00b8d4;"></i>
    <b> &nbsp; Note</b>
</p>

See <a href="https://skforecast.org/latest/faq/cyclical-features-time-series.html" target="_blank">Cyclical features in time series forecasting</a> for a more detailed description of strategies for encoding cyclic features.

</div>