<br>

# <center> Date and Time Handling

<br>

---

<br>

<br>

## Import Libraries

In [2]:
# importing all the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# importing modules from the local 'mltoolsh' package
# Documentation : https://github.com/Shohrab-Hossain/mltoolsh
import mltoolsh.missingValues as _mv

<br>

## Dataset Overview

In [3]:
# dataset overview
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 396030 entries, 0 to 396029
Data columns (total 27 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   loan_amnt             396030 non-null  float64
 1   term                  396030 non-null  object 
 2   int_rate              396030 non-null  float64
 3   installment           396030 non-null  float64
 4   grade                 396030 non-null  object 
 5   sub_grade             396030 non-null  object 
 6   emp_title             373103 non-null  object 
 7   emp_length            377729 non-null  object 
 8   home_ownership        396030 non-null  object 
 9   annual_inc            396030 non-null  float64
 10  verification_status   396030 non-null  object 
 11  issue_d               396030 non-null  object 
 12  loan_status           396030 non-null  object 
 13  purpose               396030 non-null  object 
 14  title                 394275 non-null  object 
 15  

> comment : The dataset has 396,030 rows and 27 columns. The `issue_d` column holds dates but is stored with the `object` (string) dtype.

<br>

### The `'issue_d'` column will be used in this illustration to handle date and time data.


In [4]:
# viewing the column
df['issue_d']

0         Jan-2015
1         Jan-2015
2         Jan-2015
3         Nov-2014
4         Apr-2013
            ...   
396025    Oct-2015
396026    Feb-2015
396027    Oct-2013
396028    Aug-2012
396029    Jun-2010
Name: issue_d, Length: 396030, dtype: object

In [5]:
# converting the 'Mon-YYYY' strings to pandas datetime in one vectorized call
# (an explicit format avoids per-row parsing and format guessing)
df['issue_d'] = pd.to_datetime(df['issue_d'], format='%b-%Y')

In [6]:
# Checking the column 
df['issue_d']

0        2015-01-01
1        2015-01-01
2        2015-01-01
3        2014-11-01
4        2013-04-01
            ...    
396025   2015-10-01
396026   2015-02-01
396027   2013-10-01
396028   2012-08-01
396029   2010-06-01
Name: issue_d, Length: 396030, dtype: datetime64[ns]

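> comment : The column is now in pandas datetime format (`datetime64[ns]`). Because the source strings carry only a month and a year, every value is set to the first day of its month. Individual components such as the month and year can now be read directly from the values.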
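As a minimal sketch of what accessing the components looks like, the snippet below parses a small made-up sample in the same `Mon-YYYY` layout and reads the parts through the `.dt` accessor. The `sample` values and variable names are illustrative assumptions, not rows taken from the dataset.

```python
import pandas as pd

# Made-up sample in the same 'Mon-YYYY' layout as the 'issue_d' column
sample = pd.Series(['Jan-2015', 'Nov-2014', 'Apr-2013'], name='issue_d')

# Parse the strings with an explicit format
parsed = pd.to_datetime(sample, format='%b-%Y')

# The .dt accessor exposes the datetime components of the whole column
years = parsed.dt.year                # 2015, 2014, 2013
months = parsed.dt.month              # 1, 11, 4
month_names = parsed.dt.month_name()  # 'January', 'November', 'April'

print(pd.DataFrame({'year': years, 'month': months, 'name': month_names}))
```

The `.dt` accessor works on the whole column at once, so it is usually faster than calling a Python function on each row.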

In [7]:
# we want to keep the month and year, so two separate columns are created:
# one for the month and one for the year.
# the original 'issue_d' column will be deleted afterwards

df['issue_month'] = df['issue_d'].dt.month
df['issue_year']  = df['issue_d'].dt.year

In [8]:
# viewing the created month and year columns
df[['issue_month', 'issue_year']] 

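|        | issue_month | issue_year |
|--------|-------------|------------|
| 0      | 1           | 2015       |
| 1      | 1           | 2015       |
| 2      | 1           | 2015       |
| 3      | 11          | 2014       |
| 4      | 4           | 2013       |
| ...    | ...         | ...        |
| 396025 | 10          | 2015       |
| 396026 | 2           | 2015       |
| 396027 | 10          | 2013       |
| 396028 | 8           | 2012       |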


In [9]:
# dropping the original 'issue_d' column
df.drop('issue_d', axis=1, inplace=True)

In [10]:
# checking the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 396030 entries, 0 to 396029
Data columns (total 28 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   loan_amnt             396030 non-null  float64
 1   term                  396030 non-null  object 
 2   int_rate              396030 non-null  float64
 3   installment           396030 non-null  float64
 4   grade                 396030 non-null  object 
 5   sub_grade             396030 non-null  object 
 6   emp_title             373103 non-null  object 
 7   emp_length            377729 non-null  object 
 8   home_ownership        396030 non-null  object 
 9   annual_inc            396030 non-null  float64
 10  verification_status   396030 non-null  object 
 11  loan_status           396030 non-null  object 
 12  purpose               396030 non-null  object 
 13  title                 394275 non-null  object 
 14  dti                   396030 non-null  float64
 15  

> comment : Two new columns were added and the original 'issue_d' column was dropped, so the column count went from 27 to 28.
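The parse, split, and drop steps above can be bundled into one reusable function. The sketch below is one possible way to do that. `split_month_year`, its parameters, and the `demo` frame are hypothetical names introduced for illustration only and are not part of pandas or `mltoolsh`.

```python
import pandas as pd

def split_month_year(frame, column, prefix, fmt='%b-%Y'):
    """Hypothetical helper: replace a date column with month and year columns.

    Returns a new DataFrame and leaves the input frame unchanged.
    """
    # Parse the string column into datetimes
    parsed = pd.to_datetime(frame[column], format=fmt)

    # Drop the original column, then append the two derived columns
    result = frame.drop(columns=[column])
    result[f'{prefix}_month'] = parsed.dt.month
    result[f'{prefix}_year'] = parsed.dt.year
    return result

# Made-up two-row frame that mimics the layout of the loan data
demo = pd.DataFrame({
    'loan_amnt': [10000.0, 8000.0],
    'issue_d': ['Jan-2015', 'Nov-2014'],
})

demo_split = split_month_year(demo, 'issue_d', prefix='issue')
print(demo_split)
```

Returning a new frame instead of using `inplace=True` keeps the original data available, which makes it easier to re-run a notebook cell without errors about a missing column.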

<br>

### The `'earliest_cr_line'` column will be used in this illustration to handle date and time data.


In [11]:
# viewing the column
df['earliest_cr_line']

0         Jun-1990
1         Jul-2004
2         Aug-2007
3         Sep-2006
4         Mar-1999
            ...   
396025    Nov-2004
396026    Feb-2006
396027    Mar-1997
396028    Nov-1990
396029    Sep-1998
Name: earliest_cr_line, Length: 396030, dtype: object

In [12]:
# converting the 'Mon-YYYY' strings to pandas datetime in one vectorized call
df['earliest_cr_line'] = pd.to_datetime(df['earliest_cr_line'], format='%b-%Y')

In [13]:
# we want to keep only the year,
# so the year is extracted and written back over the original 'earliest_cr_line' column.
# the cast keeps the int64 dtype regardless of the pandas version

df['earliest_cr_line'] = df['earliest_cr_line'].dt.year.astype('int64')

In [14]:
# checking the column
df['earliest_cr_line']

0         1990
1         2004
2         2007
3         2006
4         1999
          ... 
396025    2004
396026    2006
396027    1997
396028    1990
396029    1998
Name: earliest_cr_line, Length: 396030, dtype: int64

> comment : The column now holds only the year, stored as an integer (`int64`).
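The conversion above assumes every entry parses cleanly. If a column like this contained missing or malformed entries (nothing in the output shown here suggests that it does), a more defensive variant could look like the sketch below. The `raw` sample, including its bad entries, is made up for illustration.

```python
import pandas as pd

# Made-up sample: two valid entries, one missing value, one malformed string
raw = pd.Series(['Jun-1990', 'Jul-2004', None, 'not a date'],
                name='earliest_cr_line')

# errors='coerce' turns anything that does not match the format into NaT
parsed = pd.to_datetime(raw, format='%b-%Y', errors='coerce')

# .dt.year yields floats when NaT is present; the nullable 'Int64' dtype
# keeps the years as integers and marks the gaps as <NA>
years = parsed.dt.year.astype('Int64')

print(years)
print('unparseable entries:', int(years.isna().sum()))
```

Counting the `<NA>` values after the conversion gives a quick check on how many entries failed to parse before deciding how to handle them.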