# Task: Extracting Features From a Date Variable

From a date variable, you can extract basic features such as:
1. Month
2. Quarter
3. Semester
4. Year
5. Day
6. Day of the week
7. Weekend or not
8. Time difference in years, months, days, etc.

Dataset: Loan Default Challenge, a competition hosted on the Analytics Vidhya website

In [1]:
import pandas as pd
import numpy as np 
import datetime


In [2]:
cols = ['Date.of.Birth', 'DisbursalDate']
data = pd.read_csv('/Users/adityaagarwal/Aditya/Jupyter Notebook/EDA On Loan Approval Dataset/Loan_Approval_train_sample.csv', usecols=cols, nrows = 100)
data

Unnamed: 0,Date.of.Birth,DisbursalDate
0,01-01-84,03-08-18
1,31-07-85,26-09-18
2,24-08-85,01-08-18
3,30-12-93,26-10-18
4,09-12-77,26-09-18
...,...,...
95,11-01-83,28-09-18
96,01-01-78,08-08-18
97,01-01-68,23-10-18
98,09-04-90,07-09-18


In [3]:
# Rename the date-of-birth and disbursal-date columns

data = data.rename(columns = {"Date.of.Birth": "dob", "DisbursalDate": "disbursedate"})
data.columns

Index(['dob', 'disbursedate'], dtype='object')

## Conversion to DateTime Datatype

### 1) pd.to_datetime Method

In [41]:
# Parse the string columns into datetime64 values
data['dob'] = pd.to_datetime(data['dob'])
data['disbursedate'] = pd.to_datetime(data['disbursedate'])

print(data.dtypes)

dob             datetime64[ns]
disbursedate    datetime64[ns]
dtype: object
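
The parsed values shown further down suggest that ambiguous strings were read month-first (the raw disbursal date 03-08-18 later appears as 2018-03-08), while the file itself looks day-first (31-07-85 can only be read that way). Two-digit birth years such as 68 also end up in the future (2068). Below is a minimal sketch of parsing with an explicit format, assuming the file is day-month-year with two-digit years. The sample strings are copied from the preview above, and the `cutoff` date used to repair future birth years is an assumption chosen for illustration.

```python
import pandas as pd

# Sample strings copied from the Date.of.Birth preview (assumed day-month-year)
raw_dob = pd.Series(["01-01-84", "31-07-85", "09-12-77", "01-01-68"])

# An explicit format removes the day/month ambiguity
dob = pd.to_datetime(raw_dob, format="%d-%m-%y")

# %y maps 00-68 to 2000-2068, so some birth dates land in the future.
# Assumption: no borrower was born after the cutoff, so shift those back 100 years.
cutoff = pd.Timestamp("2018-12-31")  # hypothetical cutoff, adjust to the data
in_future = dob > cutoff
dob = dob.where(~in_future, dob - pd.DateOffset(years=100))

print(dob)
```

With this approach 09-12-77 is read as 9 December 1977 rather than 12 September 1977, and 01-01-68 becomes a 1968 date.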


### 2) Series.dt Accessor

Some of its attributes are:
1. pandas.Series.dt.year returns the year of the datetime.
2. pandas.Series.dt.month returns the month of the datetime.
3. pandas.Series.dt.day returns the day of the datetime.
4. pandas.Series.dt.quarter returns the quarter of the datetime.

In [84]:
data['month'] = data['dob'].dt.month
data[['dob', 'month']].head()          # Printing only dob and month columns

Unnamed: 0,dob,month
0,1984-01-01,1
1,1985-07-31,7
2,1985-08-24,8
3,1993-12-30,12
4,1977-09-12,9


In [51]:
data['day'] = data['dob'].dt.day

data['year'] = data['dob'].dt.year

data['quarter'] = data['dob'].dt.quarter

data.head(10)

Unnamed: 0,dob,disbursedate,month,day,year,quarter
0,1984-01-01,2018-03-08,1,1,1984,1
1,1985-07-31,2018-09-26,7,31,1985,3
2,1985-08-24,2018-01-08,8,24,1985,3
3,1993-12-30,2018-10-26,12,30,1993,4
4,1977-09-12,2018-09-26,9,12,1977,3
5,1990-08-09,2018-09-19,8,9,1990,3
6,1988-01-06,2018-09-23,1,6,1988,1
7,1989-04-10,2018-09-16,4,10,1989,2
8,1991-11-15,2018-05-09,11,15,1991,4
9,2068-01-06,2018-09-16,1,6,2068,1


### 3) Semester from Date

The semester can be derived from the quarter column with a simple condition, using isin() together with np.where():

1. Quarters 1 and 2 map to Semester 1
2. Quarters 3 and 4 map to Semester 2

In [57]:
# Semester 1 for quarters 1 and 2, semester 2 otherwise (np.where + isin)

data['semester'] = np.where(data['quarter'].isin([1, 2]), 1, 2)
data.head(10)

Unnamed: 0,dob,disbursedate,month,day,year,quarter,semester
0,1984-01-01,2018-03-08,1,1,1984,1,1
1,1985-07-31,2018-09-26,7,31,1985,3,2
2,1985-08-24,2018-01-08,8,24,1985,3,2
3,1993-12-30,2018-10-26,12,30,1993,4,2
4,1977-09-12,2018-09-26,9,12,1977,3,2
5,1990-08-09,2018-09-19,8,9,1990,3,2
6,1988-01-06,2018-09-23,1,6,1988,1,1
7,1989-04-10,2018-09-16,4,10,1989,2,1
8,1991-11-15,2018-05-09,11,15,1991,4,2
9,2068-01-06,2018-09-16,1,6,2068,1,1
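
The same semester value can also be computed arithmetically from the month, without going through the quarter column. This is a small sketch of an alternative, not the notebook's method. The sample dates are taken from the dob values shown above.

```python
import pandas as pd

# A few dob values from the output above
dates = pd.Series(pd.to_datetime(["1984-01-01", "1989-04-10", "1985-07-31", "1993-12-30"]))

# Months 1-6 give 0 after integer division, months 7-12 give 1; add 1 for semester 1 or 2
semester = (dates.dt.month - 1) // 6 + 1

print(semester.tolist())
```

The results match the np.where version for these rows: 1, 1, 2 and 2.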


In [59]:
# Day of the week: 0 = Monday, ..., 6 = Sunday

data['dayofweek'] = data['dob'].dt.dayofweek
data.head(10)

Unnamed: 0,dob,disbursedate,month,day,year,quarter,semester,dayofweek
0,1984-01-01,2018-03-08,1,1,1984,1,1,6
1,1985-07-31,2018-09-26,7,31,1985,3,2,2
2,1985-08-24,2018-01-08,8,24,1985,3,2,5
3,1993-12-30,2018-10-26,12,30,1993,4,2,3
4,1977-09-12,2018-09-26,9,12,1977,3,2,0
5,1990-08-09,2018-09-19,8,9,1990,3,2,3
6,1988-01-06,2018-09-23,1,6,1988,1,1,2
7,1989-04-10,2018-09-16,4,10,1989,2,1,0
8,1991-11-15,2018-05-09,11,15,1991,4,2,4
9,2068-01-06,2018-09-16,1,6,2068,1,1,4


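Similarly, the Series.dt.weekday_name attribute returned the name of the day of the week in older pandas releases. It was removed in pandas 1.0 in favor of the Series.dt.day_name() method, which is why the cell below raises an error.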

In [83]:
# Raises AttributeError on pandas 1.0 and later, where weekday_name no longer exists

data['dayofweek_name'] = data['dob'].dt.weekday_name
data.head(10)

AttributeError: 'DatetimeProperties' object has no attribute 'weekday_name'
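
On current pandas versions the day name is available through the Series.dt.day_name() method. Here is a minimal sketch on a few of the dob values shown earlier. The variable names are illustrative only.

```python
import pandas as pd

# A few dob values from the output above
dates = pd.Series(pd.to_datetime(["1984-01-01", "1985-07-31", "1985-08-24"]))

# day_name() is a method (note the parentheses), unlike the removed weekday_name attribute
day_names = dates.dt.day_name()

print(day_names.tolist())  # ['Sunday', 'Wednesday', 'Saturday']
```

These names line up with the dayofweek codes 6, 2 and 5 shown for the same rows.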

In [88]:
# np.where(condition, value if true, value if false)
# dt.dayofweek runs from 0 (Monday) to 6 (Sunday), so isin([6,7]) matches Sundays only

data['isweekday'] = np.where(data['dayofweek'].isin([6,7]),1,0)
print(data[['dob', 'dayofweek', 'isweekday']])

          dob  dayofweek  isweekday
0  1984-01-01          6          1
1  1985-07-31          2          0
2  1985-08-24          5          0
3  1993-12-30          3          0
4  1977-09-12          0          0
..        ...        ...        ...
95 1983-11-01          1          0
96 1978-01-01          6          1
97 2068-01-01          6          1
98 1990-09-04          1          0
99 1998-02-24          1          0

[100 rows x 3 columns]


In [94]:
data['isweekday'].value_counts()

0    80
1    20
Name: isweekday, dtype: int64
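
The isweekday column above is 1 only for Sundays, because dt.dayofweek numbers the days 0 (Monday) through 6 (Sunday) and never produces 7. The sketch below shows a flag that covers both Saturday and Sunday. It uses a small hypothetical frame named `sample`, and `is_weekend` is an illustrative column name, not one from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample built from dob values shown earlier
sample = pd.DataFrame({"dob": pd.to_datetime(["1984-01-01", "1985-07-31", "1985-08-24"])})
sample["dayofweek"] = sample["dob"].dt.dayofweek

# Saturday is 5 and Sunday is 6 in pandas' numbering
sample["is_weekend"] = np.where(sample["dayofweek"].isin([5, 6]), 1, 0)

print(sample)
```

Here 1985-08-24, a Saturday, is flagged as a weekend day. The [6,7] check leaves it at 0.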

In [81]:
# Show the rows where the day of the week is 6, i.e. Sunday

print(data[data['dayofweek'] == 6])

          dob disbursedate  month  day  year  quarter  semester  dayofweek  \
0  1984-01-01   2018-03-08      1    1  1984        1         1          6   
12 1974-01-06   2018-08-30      1    6  1974        1         1          6   
14 1973-02-18   2018-08-31      2   18  1973        1         1          6   
17 1983-02-27   2018-09-20      2   27  1983        1         1          6   
38 1977-06-19   2018-10-20      6   19  1977        2         1          6   
40 2069-01-06   2018-04-10      1    6  2069        1         1          6   
42 1982-06-20   2018-10-21      6   20  1982        2         1          6   
49 1980-09-14   2018-10-24      9   14  1980        3         2          6   
51 1971-04-25   2018-10-23      4   25  1971        2         1          6   
56 1995-01-01   2018-07-09      1    1  1995        1         1          6   
59 2064-01-06   2018-08-24      1    6  2064        1         1          6   
62 1984-01-01   2018-09-23      1    1  1984        1         1 

## Difference Between Two Dates

In [82]:
# Elapsed time between now and each date of birth (one Timedelta per row)
print((datetime.datetime.today() - data['dob']).head())

0   13639 days 21:04:09.022861
1   13062 days 21:04:09.022861
2   13038 days 21:04:09.022861
3    9988 days 21:04:09.022861
4   15941 days 21:04:09.022861
Name: dob, dtype: timedelta64[ns]
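
The raw Timedelta values can be turned into numeric features in days, months or years. The sketch below measures age at disbursal instead of age today, so the result does not change with the run date. The frame `sample` and the `age_*` column names are hypothetical, the year figure is an approximation based on 365.25 days per year, and the month figure counts calendar-month boundaries, not completed months.

```python
import pandas as pd

# Hypothetical sample using two dob / disbursedate pairs from the tables above
sample = pd.DataFrame({
    "dob": pd.to_datetime(["1984-01-01", "1985-07-31"]),
    "disbursedate": pd.to_datetime(["2018-03-08", "2018-09-26"]),
})

# Subtracting two datetime columns gives a timedelta64 column
delta = sample["disbursedate"] - sample["dob"]

# Whole days elapsed
sample["age_days"] = delta.dt.days

# Approximate years, assuming 365.25 days per year
sample["age_years"] = (sample["age_days"] / 365.25).round(1)

# Calendar-month difference (ignores the day of the month)
sample["age_months"] = (
    (sample["disbursedate"].dt.year - sample["dob"].dt.year) * 12
    + (sample["disbursedate"].dt.month - sample["dob"].dt.month)
)

print(sample)
```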
