# Introduction: Get Date/Time Usage

In this notebook, we show how to use the `get_datetime_info` function to extract date and time information (month, week, day of week, hour, minute, fraction of day, etc.) from a datetime column or index in a pandas dataframe. The function returns 17 attributes that we can use for modeling or for examining patterns.

In [1]:
# Standard data science libraries
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 25

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from get_datetime_info import get_datetime_info

The data we are using is building energy data, but any data with a datetime column or index will work.

In [2]:
data = pd.read_csv('building_one_with_tz.csv', header=[0, 1], index_col=0).loc[:, 'Energy']
data.head()

sensor,3
measured_at,
2017-12-14 06:00:00-05:00,6035.21404
2017-12-14 06:15:00-05:00,6182.405506
2017-12-14 06:30:00-05:00,6035.187942
2017-12-14 06:45:00-05:00,6035.192571
2017-12-14 07:00:00-05:00,6035.198581


We will use the dataframe's index, which holds time-zone-aware datetimes. __Important: if a time zone is passed in, the function converts the times to local time before calculating the attributes.__ To disable this behavior, do not pass in a time zone (`timezone=None`).
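To make the UTC versus local distinction concrete, here is a minimal sketch using plain pandas. It is not the code inside `get_datetime_info`; the variable names are hypothetical, and the only assumption is that the timestamps look like the ones in the index above.

```python
import pandas as pd

# Timestamps as they appear in the index above (local time with a -05:00 offset).
# Parsing with utc=True gives a UTC-aware DatetimeIndex.
stamps = pd.to_datetime(
    ["2017-12-14 06:00:00-05:00", "2017-12-14 06:15:00-05:00"], utc=True
)

# Dropping the time zone from the UTC-aware index keeps the UTC wall-clock time
utc_naive = stamps.tz_localize(None)

# Converting to the building's zone first, then dropping the time zone,
# keeps the local wall-clock time instead
local_naive = stamps.tz_convert("America/New_York").tz_localize(None)

print(utc_naive[0])    # 2017-12-14 11:00:00
print(local_naive[0])  # 2017-12-14 06:00:00
```

Attributes such as the hour differ depending on which of these two series they are computed from, which is why the `timezone` argument matters.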

The function takes in a number of parameters:

* `df`: The dataframe
* `date_col`: a string with the name of the column containing the datetimes. This can also be "index" to use the index of the dataframe
* `timezone`: string with the timezone of the building. If provided, the times are converted to local time
* `drop`: boolean for whether the original column should be dropped from the dataframe

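The function returns a new dataframe with 17 added columns containing the attributes, each prefixed with the name of the datetime column (here `measured_at_`). When a time zone is passed in, the output also includes `utc` and `local` columns.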
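For readers curious how attributes like these can be derived, the following is a rough sketch built on the pandas `.dt` accessor. The function `sketch_datetime_info` is a hypothetical stand-in written for illustration, not the source of `get_datetime_info`, and it covers only some of the 17 attributes.

```python
import pandas as pd


def sketch_datetime_info(times: pd.Series, prefix: str) -> pd.DataFrame:
    """Hypothetical stand-in: derive a few attributes from naive datetimes."""
    out = pd.DataFrame(index=times.index)
    attrs = ["year", "month", "day", "dayofweek", "dayofyear",
             "is_month_end", "is_month_start", "hour", "minute", "second"]
    for attr in attrs:
        # e.g. "dayofweek" becomes the column "<prefix>_Dayofweek"
        out[f"{prefix}_{attr.capitalize()}"] = getattr(times.dt, attr)
    # ISO week number; isocalendar() requires pandas 1.1 or newer
    out[f"{prefix}_Week"] = times.dt.isocalendar().week.astype(int)
    return out


times = pd.Series(pd.to_datetime(["2017-12-14 06:00:00", "2017-12-14 06:15:00"]))
info = sketch_datetime_info(times, prefix="measured_at")
print(info.columns.tolist())
```

For the first timestamp this gives week 50, day of week 3 (Thursday, with Monday as 0), and day of year 348.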

In [3]:
data_with_info = get_datetime_info(df=data, date_col='index', timezone='America/New_York', drop=False)
data_with_info.head()

sensor,3,utc,local,measured_at_Year,measured_at_Month,measured_at_Week,measured_at_Day,measured_at_Dayofweek,measured_at_Dayofyear,measured_at_Is_month_end,measured_at_Is_month_start,measured_at_Is_quarter_end,measured_at_Is_quarter_start,measured_at_Is_year_end,measured_at_Is_year_start,measured_at_Hour,measured_at_Minute,measured_at_Second,measured_at_FracDay,measured_at_FracWeek
measured_at,,,,,,,,,,,,,,,,,,,,
2017-12-14 06:00:00-05:00,6035.21404,2017-12-14 11:00:00,2017-12-14 06:00:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,0,0,0.25,0.464286
2017-12-14 06:15:00-05:00,6182.405506,2017-12-14 11:15:00,2017-12-14 06:15:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,15,0,0.260417,0.465774
2017-12-14 06:30:00-05:00,6035.187942,2017-12-14 11:30:00,2017-12-14 06:30:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,30,0,0.270833,0.467262
2017-12-14 06:45:00-05:00,6035.192571,2017-12-14 11:45:00,2017-12-14 06:45:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,45,0,0.28125,0.46875
2017-12-14 07:00:00-05:00,6035.198581,2017-12-14 12:00:00,2017-12-14 07:00:00,2017,12,50,14,3,348,False,False,False,False,False,False,7,0,0,0.291667,0.470238


We can also run this without a time zone, in which case none of the times are converted.

In [4]:
data_without_tz = get_datetime_info(df=data, date_col='index', timezone=None, drop=False)
data_without_tz.head()

sensor,3,measured_at_Year,measured_at_Month,measured_at_Week,measured_at_Day,measured_at_Dayofweek,measured_at_Dayofyear,measured_at_Is_month_end,measured_at_Is_month_start,measured_at_Is_quarter_end,measured_at_Is_quarter_start,measured_at_Is_year_end,measured_at_Is_year_start,measured_at_Hour,measured_at_Minute,measured_at_Second,measured_at_FracDay,measured_at_FracWeek
measured_at,,,,,,,,,,,,,,,,,,
2017-12-14 06:00:00-05:00,6035.21404,2017,12,50,14,3,348,False,False,False,False,False,False,11,0,0,0.458333,0.494048
2017-12-14 06:15:00-05:00,6182.405506,2017,12,50,14,3,348,False,False,False,False,False,False,11,15,0,0.46875,0.495536
2017-12-14 06:30:00-05:00,6035.187942,2017,12,50,14,3,348,False,False,False,False,False,False,11,30,0,0.479167,0.497024
2017-12-14 06:45:00-05:00,6035.192571,2017,12,50,14,3,348,False,False,False,False,False,False,11,45,0,0.489583,0.498512
2017-12-14 07:00:00-05:00,6035.198581,2017,12,50,14,3,348,False,False,False,False,False,False,12,0,0,0.5,0.5


All of the attributes are now given in UTC. For example, the first row has an hour of 11 instead of 6.
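The `FracDay` and `FracWeek` values in the two outputs above are consistent with simple fractions of a 24-hour day and a 7-day week. The formulas below are inferred from the displayed numbers, not taken from the function's source, so treat them as an assumption; the helper names are hypothetical.

```python
def frac_day(hour, minute, second=0):
    # Fraction of the day elapsed (assumed formula)
    return (hour + minute / 60 + second / 3600) / 24


def frac_week(dayofweek, hour, minute, second=0):
    # Fraction of the week elapsed, with Monday as day 0 (assumed formula)
    return (dayofweek + frac_day(hour, minute, second)) / 7


# First row: Thursday (dayofweek 3), 06:00 local time, which is 11:00 UTC
local = (round(frac_day(6, 0), 6), round(frac_week(3, 6, 0), 6))
utc = (round(frac_day(11, 0), 6), round(frac_week(3, 11, 0), 6))

print(local)  # (0.25, 0.464286)
print(utc)    # (0.458333, 0.494048)
```

Both pairs match the first rows of the local-time and UTC outputs respectively.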

We can also use a column (not the index) with the same effect.

In [5]:
data_with_info2 = get_datetime_info(data.reset_index(), date_col='measured_at', timezone='America/New_York', drop=False)
data_with_info2.head()

sensor,measured_at,3,utc,local,measured_at_Year,measured_at_Month,measured_at_Week,measured_at_Day,measured_at_Dayofweek,measured_at_Dayofyear,measured_at_Is_month_end,measured_at_Is_month_start,measured_at_Is_quarter_end,measured_at_Is_quarter_start,measured_at_Is_year_end,measured_at_Is_year_start,measured_at_Hour,measured_at_Minute,measured_at_Second,measured_at_FracDay,measured_at_FracWeek
0,2017-12-14 06:00:00-05:00,6035.21404,2017-12-14 11:00:00,2017-12-14 06:00:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,0,0,0.25,0.464286
1,2017-12-14 06:15:00-05:00,6182.405506,2017-12-14 11:15:00,2017-12-14 06:15:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,15,0,0.260417,0.465774
2,2017-12-14 06:30:00-05:00,6035.187942,2017-12-14 11:30:00,2017-12-14 06:30:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,30,0,0.270833,0.467262
3,2017-12-14 06:45:00-05:00,6035.192571,2017-12-14 11:45:00,2017-12-14 06:45:00,2017,12,50,14,3,348,False,False,False,False,False,False,6,45,0,0.28125,0.46875
4,2017-12-14 07:00:00-05:00,6035.198581,2017-12-14 12:00:00,2017-12-14 07:00:00,2017,12,50,14,3,348,False,False,False,False,False,False,7,0,0,0.291667,0.470238


In [6]:
np.all(np.equal(data_with_info.iloc[:, 3:].values, data_with_info2.iloc[:, 4:].values))

True

As we can see, both methods return the same values. 
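As an alternative to `np.all(np.equal(...))`, the same kind of check can be written with `np.array_equal` or with `pd.testing.assert_frame_equal`, which reports the first mismatch when the frames differ. The sketch below uses two small made-up frames (`by_index` and `by_column` are hypothetical names) rather than the real outputs.

```python
import numpy as np
import pandas as pd

times = pd.to_datetime(["2017-12-14 06:00:00", "2017-12-14 06:15:00"])

# Same attribute values, one frame indexed by datetime and one by position
by_index = pd.DataFrame({"Hour": [6, 6], "Minute": [0, 15]}, index=times)
by_column = pd.DataFrame({"Hour": [6, 6], "Minute": [0, 15]})

# Compare the raw values only, ignoring the index
same_values = np.array_equal(by_index.to_numpy(), by_column.to_numpy())
print(same_values)  # True

# Raises an AssertionError describing the difference if the frames do not match
pd.testing.assert_frame_equal(by_index.reset_index(drop=True), by_column)
```

Resetting the index before calling `assert_frame_equal` is needed here because the two frames deliberately have different indexes.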

We now have a dataframe with many date and time attributes that we can use as needed. This function is useful for rapidly creating new features for fitting a model or finding trends in data.