# Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints
In this exercise, we will learn how to create new date-related features using pandas.

The dataset we will be using in this exercise is the Financial Services Consumer Complaints dataset (seen in Chapter 4). It can be found in our GitHub repository: https://github.com/PacktWorkshops/The-Data-Science-Workshop/blob/master/Chapter12/Dataset/Consumer_Complaints.csv

Note
The original dataset can be found here:
https://catalog.data.gov/dataset/consumer-complaint-database



1. Open a new Colab notebook and import the pandas package

In [0]:
import pandas as pd

2. Assign the link to the dataset to a variable called 'file_url'




In [0]:
file_url = 'https://raw.githubusercontent.com/PacktWorkshops/The-Data-Science-Workshop/master/Chapter12/Dataset/Consumer_Complaints.csv'

3. Using the .read_csv() method from the pandas package, load the dataset into a new DataFrame called 'df'

In [21]:
df = pd.read_csv(file_url)



4. Display the first 5 rows using the .head() method

In [22]:
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,11/13/2014,11/13/2014,"Choice Recovery, Inc.",Closed with explanation,Yes,
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,11/13/2014,11/13/2014,"Expert Global Solutions, Inc.",In progress,Yes,
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,11/13/2014,11/13/2014,"FNIS (Fidelity National Information Services, ...",In progress,Yes,
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,11/13/2014,11/13/2014,"Expert Global Solutions, Inc.",In progress,Yes,
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,11/12/2014,11/13/2014,TransUnion,In progress,Yes,


5. Print out the data types for each column using the .dtypes attribute

In [23]:
df.dtypes

Complaint ID              int64
Product                  object
Sub-product              object
Issue                    object
Sub-issue                object
State                    object
ZIP code                float64
Submitted via            object
Date received            object
Date sent to company     object
Company                  object
Company response         object
Timely response?         object
Consumer disputed?       object
dtype: object

The columns 'Date received' and 'Date sent to company' have been loaded as object (string) columns rather than datetime, so we need to convert them manually.

6. Convert the columns 'Date received' and 'Date sent to company' to datetime using the pd.to_datetime() function

In [0]:
df['Date received'] = pd.to_datetime(df['Date received'])
df['Date sent to company'] = pd.to_datetime(df['Date sent to company'])
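As a minimal sketch of what this conversion does, the example below parses two date strings copied from the rows displayed earlier. The explicit format string '%m/%d/%Y' is an assumption based on those sample rows (month/day/year); it is not part of the original exercise, which lets pandas infer the format.

```python
import pandas as pd

# Two date strings taken from the first rows of the dataset
sample = pd.Series(['11/13/2014', '11/12/2014'])

# Assumption: the dates follow month/day/year, as the sample rows suggest.
# Passing format= makes the parsing explicit instead of inferred.
parsed = pd.to_datetime(sample, format='%m/%d/%Y')

print(parsed.dtype)
print(parsed)
```

Stating the format explicitly can avoid ambiguity for dates such as 01/02/2014, which could otherwise be read as either January 2 or February 1.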

7. Print out the data types for each column using the .dtypes attribute

In [25]:
df.dtypes

Complaint ID                     int64
Product                         object
Sub-product                     object
Issue                           object
Sub-issue                       object
State                           object
ZIP code                       float64
Submitted via                   object
Date received           datetime64[ns]
Date sent to company    datetime64[ns]
Company                         object
Company response                object
Timely response?                object
Consumer disputed?              object
dtype: object

These 2 columns now have the right data type. Let's create some new features from these 2 dates.

8. Create a new column called 'YearReceived' that will contain the year of each date using the .dt.year attribute

In [0]:
df['YearReceived'] = df['Date received'].dt.year

9. Create a new column called 'MonthReceived' that will contain the month of each date using the .dt.month attribute

In [0]:
df['MonthReceived'] = df['Date received'].dt.month

10. Create a new column called 'DomReceived' that will contain the day of the month for each date using the .dt.day attribute

In [0]:
df['DomReceived'] = df['Date received'].dt.day

11. Create a new column called 'DowReceived' that will contain the day of the week for each date using the .dt.dayofweek attribute

In [0]:
df['DowReceived'] = df['Date received'].dt.dayofweek
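The four .dt attributes used in the previous steps can be tried side by side on a small, hand-made Series. This is only an illustrative sketch: the variable names and the three dates are made up for the example, and the .dt.day_name() column is added purely to make the day-of-week numbering (0 for Monday) easier to read.

```python
import pandas as pd

# Three illustrative dates: a Wednesday, a Thursday and a Saturday
dates = pd.Series(pd.to_datetime(['2014-11-12', '2014-11-13', '2014-11-15']))

features = pd.DataFrame({
    'Year': dates.dt.year,
    'Month': dates.dt.month,
    'Dom': dates.dt.day,            # day of the month
    'Dow': dates.dt.dayofweek,      # 0 = Monday ... 6 = Sunday
    'DayName': dates.dt.day_name(), # readable label for 'Dow'
})

print(features)
```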

12. Display the first 5 rows using the .head() method

In [30]:
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?,YearReceived,MonthReceived,DomReceived,DowReceived
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,2014-11-13,2014-11-13,"Choice Recovery, Inc.",Closed with explanation,Yes,,2014,11,13,3
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,2014-11-13,2014-11-13,"FNIS (Fidelity National Information Services, ...",In progress,Yes,,2014,11,13,3
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,2014-11-12,2014-11-13,TransUnion,In progress,Yes,,2014,11,12,2


We can see that we have successfully created the 4 new features: YearReceived, MonthReceived, DomReceived and DowReceived. Now let's create another one that will indicate whether the date fell on a weekend or not.

13. Create a new column called 'IsWeekendReceived' that will contain binary values indicating whether the column 'DowReceived' is greater than or equal to 5 (0 corresponds to Monday; 5 and 6 correspond to Saturday and Sunday respectively)

In [0]:
df['IsWeekendReceived'] = df['DowReceived'] >= 5
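The comparison above returns a Boolean column (True/False). If a model or a later step expects 0/1 integers instead, one option is to cast the result, as in the sketch below. The toy Series and the variable names are assumptions made for the example and are not part of the exercise.

```python
import pandas as pd

# Illustrative day-of-week values: Thu, Fri, Sat, Sun, Mon
dow = pd.Series([3, 4, 5, 6, 0])

# Element-wise comparison gives a Boolean Series
is_weekend = dow >= 5

# Optional: cast True/False to 1/0
is_weekend_int = is_weekend.astype(int)

print(is_weekend.tolist())
print(is_weekend_int.tolist())
```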

14. Display the first 5 rows using the .head() method

In [32]:
df.head()

Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?,YearReceived,MonthReceived,DomReceived,DowReceived,IsWeekendReceived
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,2014-11-13,2014-11-13,"Choice Recovery, Inc.",Closed with explanation,Yes,,2014,11,13,3,False
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,2014-11-13,2014-11-13,"FNIS (Fidelity National Information Services, ...",In progress,Yes,,2014,11,13,3,False
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,2014-11-12,2014-11-13,TransUnion,In progress,Yes,,2014,11,12,2,False


We have created a new feature stating whether each complaint was received during a weekend or not. Now we will engineer a new column containing the number of days between 'Date sent to company' and 'Date received'.

15. Create a new column called 'RoutingDays' that will contain the difference between 'Date sent to company' and 'Date received'

In [0]:
df['RoutingDays'] = df['Date sent to company'] - df['Date received']

16. Print out the data type of the new column 'RoutingDays' using the attribute .dtype

In [34]:
df['RoutingDays'].dtype

dtype('<m8[ns]')

The result of subtracting 2 datetime columns is a new column of type timedelta (a duration); '<m8[ns]' is NumPy's shorthand for timedelta64[ns]. We need to convert it into an integer to get the number of days between these 2 dates.

17. Transform the column 'RoutingDays' using the .dt.days attribute

In [0]:
df['RoutingDays'] = df['RoutingDays'].dt.days
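To see the two stages in isolation, here is a small sketch on hand-made dates that mirror the rows shown earlier. The variable names are hypothetical. It first computes the timedelta and then extracts the whole number of days with .dt.days; the two steps could also be chained in a single expression.

```python
import pandas as pd

# Illustrative dates mirroring rows from the dataset
received = pd.to_datetime(pd.Series(['2014-11-12', '2014-11-13']))
sent = pd.to_datetime(pd.Series(['2014-11-13', '2014-11-13']))

# Stage 1: subtracting datetimes yields a timedelta Series
delta = sent - received
print(delta.dtype)

# Stage 2: .dt.days extracts the number of whole days as integers
days = delta.dt.days
print(days.tolist())

# Equivalent one-step version
days_one_step = (sent - received).dt.days
```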

18. Display the first 5 rows using the .head() method

In [36]:
df.head()


Unnamed: 0,Complaint ID,Product,Sub-product,Issue,Sub-issue,State,ZIP code,Submitted via,Date received,Date sent to company,Company,Company response,Timely response?,Consumer disputed?,YearReceived,MonthReceived,DomReceived,DowReceived,IsWeekendReceived,RoutingDays
0,1114245,Debt collection,Medical,Disclosure verification of debt,Not given enough info to verify debt,FL,32219.0,Web,2014-11-13,2014-11-13,"Choice Recovery, Inc.",Closed with explanation,Yes,,2014,11,13,3,False,0
1,1114488,Debt collection,Medical,Disclosure verification of debt,Right to dispute notice not received,TX,75006.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False,0
2,1114255,Bank account or service,Checking account,Deposits and withdrawals,,NY,11102.0,Web,2014-11-13,2014-11-13,"FNIS (Fidelity National Information Services, ...",In progress,Yes,,2014,11,13,3,False,0
3,1115106,Debt collection,"Other (phone, health club, etc.)",Communication tactics,Frequent or repeated calls,GA,31721.0,Web,2014-11-13,2014-11-13,"Expert Global Solutions, Inc.",In progress,Yes,,2014,11,13,3,False,0
4,1115890,Credit reporting,,Incorrect information on credit report,Information is not mine,FL,33461.0,Web,2014-11-12,2014-11-13,TransUnion,In progress,Yes,,2014,11,12,2,False,1


Congratulations! In this exercise, you put into practice different techniques for engineering new features from the datetime columns of a real-world dataset. From the 2 columns 'Date sent to company' and 'Date received', you successfully created 6 new features.