
## Datetime Feature Engineering

In [1]:
import pandas as pd
import numpy as np
import datetime
from fastai.tabular import add_datepart

 **When forecasting time series with machine learning models, datetime features can be very useful. You can create them one by one with pandas or the datetime library, or use the add_datepart function from fastai to create them automatically.**
  
  
  **In this notebook I use three different approaches to create datetime features. Let's start with the traditional one.**

## Using our very own Pandas

In [2]:
# Create a minute-by-minute series of datetime values from 2001-01-01 to 2021-12-31, with random numbers as values
df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2001-01-01', end='2021-12-31', freq='1min')
# Draw the random sample from a Poisson distribution
df['No. of visitors'] = np.random.poisson(lam=1.0, size=len(df))
df1 = df.copy()
df.head()

Unnamed: 0,datetime,No. of visitors
0,2001-01-01 00:00:00,2
1,2001-01-01 00:01:00,1
2,2001-01-01 00:02:00,3
3,2001-01-01 00:03:00,0
4,2001-01-01 00:04:00,2
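
Because the sample above is drawn without a seed, the visitor counts change on every run. A minimal sketch of a reproducible variant follows, assuming NumPy's `Generator` API; the seed value and the short one-day range are arbitrary choices made here for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Assumption: seed 0 and a one-day range are arbitrary, chosen to keep the example small
rng = np.random.default_rng(0)

small = pd.DataFrame()
small['datetime'] = pd.date_range(start='2001-01-01', end='2001-01-02', freq='min')
small['No. of visitors'] = rng.poisson(lam=1.0, size=len(small))

# Both endpoints are included, so one day at minute frequency has 24 * 60 + 1 rows
print(len(small))
print(small.head())
```

Re-running the cell with the same seed should reproduce the same counts, which makes later feature experiments easier to compare.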


When creating a dataframe like the one above, the following frequency aliases come in handy:


| Alias | Description |
|---|---|
| B | business day frequency |
| C | custom business day frequency (experimental) |
| D | calendar day frequency |
| W | weekly frequency |
| M | month end frequency |
| BM | business month end frequency |
| CBM | custom business month end frequency |
| MS | month start frequency |
| BMS | business month start frequency |
| CBMS | custom business month start frequency |
| Q | quarter end frequency |
| BQ | business quarter end frequency |
| QS | quarter start frequency |
| BQS | business quarter start frequency |
| A | year end frequency |
| BA | business year end frequency |
| AS | year start frequency |
| BAS | business year start frequency |
| BH | business hour frequency |
| H | hourly frequency |
| T, min | minutely frequency |
| S | secondly frequency |
| L, ms | milliseconds |
| U, us | microseconds |
| N | nanoseconds |
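
To make the aliases concrete, here is a small sketch that passes a few of them to `pd.date_range`. The date ranges and variable names are picked here purely for illustration; only aliases that behave the same across pandas versions (`D`, `B`, `W`, `MS`) are used.

```python
import pandas as pd

# 2021-01-01 is a Friday; all ranges below include both endpoints
daily = pd.date_range(start='2021-01-01', end='2021-01-10', freq='D')          # every calendar day
business = pd.date_range(start='2021-01-01', end='2021-01-10', freq='B')       # Monday to Friday only (holidays are not excluded)
weekly = pd.date_range(start='2021-01-01', end='2021-01-31', freq='W')         # weekly, anchored on Sundays by default
month_starts = pd.date_range(start='2021-01-01', end='2021-12-31', freq='MS')  # first day of each month

for name, idx in [('D', daily), ('B', business), ('W', weekly), ('MS', month_starts)]:
    print(name, len(idx), idx[0].date(), idx[-1].date())
```

An alias can also carry a multiplier, as in the `1min` frequency used to build the dataframe above.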


In [3]:
# Now that we have our dataframe, let's create features from the datetime column
df['date'] = df.datetime.dt.date                                                # Extracts the date from the datetime value
df['time'] = df.datetime.dt.time                                                # Extracts the time from the datetime value
df['minute'] = df.datetime.dt.minute                                            # Extracts only the minute part of the datetime value
df['Hour'] = df.datetime.dt.hour                                                # Extracts only the hour part of the datetime value
df['day'] = df.datetime.dt.day                                                  # Position of the day within its month
df['day_of_week'] = df.datetime.dt.dayofweek                                    # Position of the day within its week (Monday = 0)
# df['days_in_month'] = df.datetime.dt.days_in_month                            # Number of days in that month (not the position of the day)
df['day_of_year'] = df.datetime.dt.dayofyear                                    # Position of the day within its year
df['day_name'] = df.datetime.dt.day_name()                                      # Name of the day of the week
df['week'] = df.datetime.dt.isocalendar().week                                  # ISO week in which the datetime value falls
df['week_of_year'] = df.datetime.dt.isocalendar().week                          # Same ISO week number; replaces the deprecated dt.weekofyear
df['month'] = df.datetime.dt.month                                              # Month number of the datetime value
df['month_name'] = df.datetime.dt.month_name()                                  # Name of the month
df['year'] = df.datetime.dt.year                                                # Extracts the year part of the datetime value
df.tail()



Unnamed: 0,datetime,No. of visitors,date,time,minute,Hour,day,day_of_week,day_of_year,day_name,week,week_of_year,month,month_name,year
11043356,2021-12-30 23:56:00,0,2021-12-30,23:56:00,56,23,30,3,364,Thursday,52,52,12,December,2021
11043357,2021-12-30 23:57:00,3,2021-12-30,23:57:00,57,23,30,3,364,Thursday,52,52,12,December,2021
11043358,2021-12-30 23:58:00,2,2021-12-30,23:58:00,58,23,30,3,364,Thursday,52,52,12,December,2021
11043359,2021-12-30 23:59:00,1,2021-12-30,23:59:00,59,23,30,3,364,Thursday,52,52,12,December,2021
11043360,2021-12-31 00:00:00,2,2021-12-31,00:00:00,0,0,31,4,365,Friday,52,52,12,December,2021


These pandas commands work, but each feature needs its own line of code. The next sections look at the alternatives, ending with a more convenient one that creates the features with very little code.
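
Even within pandas, the repetition can be reduced by looping over attribute names of the `.dt` accessor. The sketch below uses a hypothetical helper, `add_dt_features`, which is not part of pandas or of the original notebook; the default attribute list and the `column_attribute` naming scheme are assumptions made for illustration.

```python
import pandas as pd

def add_dt_features(frame, column,
                    attrs=('year', 'month', 'day', 'hour', 'minute',
                           'dayofweek', 'dayofyear')):
    """Hypothetical helper: add one column per .dt attribute listed in attrs."""
    out = frame.copy()  # leave the caller's dataframe untouched
    for attr in attrs:
        # getattr looks up e.g. out[column].dt.year by name
        out[f'{column}_{attr}'] = getattr(out[column].dt, attr)
    return out

# Three timestamps straddling the last midnight shown in the output above
demo = pd.DataFrame({'datetime': pd.date_range('2021-12-30 23:58:00', periods=3, freq='min')})
demo = add_dt_features(demo, 'datetime')
print(demo.columns.tolist())
```

Method-based features such as `day_name()` would need separate handling, since this loop only reads plain attributes.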

## Datetime library

In [4]:
df = df1.copy()
# Each value is a pandas Timestamp, which subclasses datetime.datetime, so the standard datetime attributes and methods apply
df['date'] = df.datetime.apply(lambda x: x.date())
df['month'] = df.datetime.apply(lambda x: x.month)
df['year'] = df.datetime.apply(lambda x: x.year)
df['day_of_week'] = df.datetime.apply(lambda x: x.weekday())    # weekday() returns the day of the week (Monday = 0), not the week number
df['hour'] = df.datetime.apply(lambda x: x.hour)
df['time'] = df.datetime.apply(lambda x: x.time())              # time is a method, so it must be called
df['minute'] = df.datetime.apply(lambda x: x.minute)
df['second'] = df.datetime.apply(lambda x: x.second)
df['microsecond'] = df.datetime.apply(lambda x: x.microsecond)
df.head()

Unnamed: 0,datetime,No. of visitors,date,month,year,day_of_week,hour,time,minute,second,microsecond
0,2001-01-01 00:00:00,2,2001-01-01,1,2001,0,0,00:00:00,0,0,0
1,2001-01-01 00:01:00,1,2001-01-01,1,2001,0,0,00:01:00,1,0,0
2,2001-01-01 00:02:00,3,2001-01-01,1,2001,0,0,00:02:00,2,0,0
3,2001-01-01 00:03:00,0,2001-01-01,1,2001,0,0,00:03:00,3,0,0
4,2001-01-01 00:04:00,2,2001-01-01,1,2001,0,0,00:04:00,4,0,0
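
The `apply` calls above work because every element of the column is a `pd.Timestamp`, which is a subclass of `datetime.datetime`. A short sketch of that relationship, and of the fact that `apply` and the `.dt` accessor yield the same values, is shown below; the sample timestamp and the five-row series are illustrative choices.

```python
import datetime
import pandas as pd

ts = pd.Timestamp('2001-01-01 00:04:00')

# A Timestamp can be used wherever a datetime.datetime is expected
print(isinstance(ts, datetime.datetime))
print(ts.date(), ts.time(), ts.weekday(), ts.isocalendar()[1])

# Element-wise apply and the vectorised .dt accessor give the same values
stamps = pd.Series(pd.date_range('2001-01-01', periods=5, freq='min'))
via_apply = stamps.apply(lambda x: x.minute)
via_dt = stamps.dt.minute
same = bool((via_apply == via_dt).all())
print(same)
```

On a frame with millions of rows, the `.dt` accessor is typically much faster than `apply`, because `apply` calls the Python lambda once per row.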


## The add_datepart function from the fastai library

In [5]:
df = df1.copy()
df = add_datepart(df, 'datetime')
df.head()



Unnamed: 0,No. of visitors,datetimeYear,datetimeMonth,datetimeWeek,datetimeDay,datetimeDayofweek,datetimeDayofyear,datetimeIs_month_end,datetimeIs_month_start,datetimeIs_quarter_end,datetimeIs_quarter_start,datetimeIs_year_end,datetimeIs_year_start,datetimeElapsed
0,2,2001,1,1,1,0,1,False,True,False,True,False,True,978307200
1,1,2001,1,1,1,0,1,False,True,False,True,False,True,978307260
2,3,2001,1,1,1,0,1,False,True,False,True,False,True,978307320
3,0,2001,1,1,1,0,1,False,True,False,True,False,True,978307380
4,2,2001,1,1,1,0,1,False,True,False,True,False,True,978307440


As we can see, a single line of code created all the datetime features. Note, however, that the original datetime column is gone: add_datepart drops it by default (`drop=True`), so pass `drop=False` if you need to keep it.
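
For comparison, some of the columns in the output above can be approximated in plain pandas while keeping the original column. This is a sketch only, not fastai's implementation: the column names mirror the add_datepart output without the `datetime` prefix, and the three-row frame is an illustrative assumption.

```python
import pandas as pd

demo = pd.DataFrame({'datetime': pd.date_range('2001-01-01', periods=3, freq='min')})

dt = demo['datetime'].dt
demo['Is_month_start'] = dt.is_month_start
demo['Is_month_end'] = dt.is_month_end
demo['Is_quarter_start'] = dt.is_quarter_start
demo['Is_year_start'] = dt.is_year_start

# Seconds since the Unix epoch, computed without relying on the column's internal resolution
demo['Elapsed'] = (demo['datetime'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

print(demo)
```

The `Elapsed` values match those in the add_datepart output for the same timestamps, and the `datetime` column is still available for later steps.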