Exploring datasets that contain dates and times is often cumbersome. Python has a data type designed specifically for dates and times, called 'datetime'. However, in many datasets dates are stored as strings (dtype 'object' in pandas), which makes them harder to work with. In this tutorial you will learn how to convert an 'object'-type date column to 'datetime' format and how to use several related methods and attributes.

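We will use the 'Chicago Crime Detection' dataset. It has the 9 columns listed below (the CSV file also carries an extra 'Unnamed: 0' index column, which shows up in the outputs further down).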
*  ID: Record Id
*  Date: Date of Crime
*  LocationDescription: Crime location
*  Arrest: If an arrest was made for that crime or not
*  Domestic: Whether the crime was a domestic incident or not
*  Beat: The beat where the incident occurred. A beat is the smallest police geographic area.
*  District: Indicates the police district where the incident occurred.
*  CommunityArea: Indicates the community area where the incident occurred. Chicago has 77 community areas.
*  Year: Year the incident occurred

### Import Pandas Library and Load the Dataset

In [None]:
import pandas as pd

In [None]:
data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Chicago_Crime_Detective.csv", nrows=2000)

In [None]:
data.head()

Unnamed: 0.1,Unnamed: 0,ID,Date,LocationDescription,Arrest,Domestic,Beat,District,CommunityArea,Year
0,0,8951354,12/31/12 23:15,STREET,False,False,623,6.0,69,2012
1,1,8951141,12/31/12 22:00,STREET,False,False,1213,12.0,24,2012
2,2,8952745,12/31/12 22:00,RESIDENTIAL YARD (FRONT/BACK),False,False,1622,16.0,11,2012
3,3,8952223,12/31/12 22:00,STREET,False,False,724,7.0,67,2012
4,4,8951608,12/31/12 21:30,STREET,False,False,211,2.0,35,2012


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Unnamed: 0           2000 non-null   int64  
 1   ID                   2000 non-null   int64  
 2   Date                 2000 non-null   object 
 3   LocationDescription  2000 non-null   object 
 4   Arrest               2000 non-null   bool   
 5   Domestic             2000 non-null   bool   
 6   Beat                 2000 non-null   int64  
 7   District             1961 non-null   float64
 8   CommunityArea        2000 non-null   int64  
 9   Year                 2000 non-null   int64  
dtypes: bool(2), float64(1), int64(5), object(2)
memory usage: 129.0+ KB


You can see above that the 'Date' column has dtype 'object', i.e. its values are stored as strings. You can convert it to 'datetime' type with the pandas function pd.to_datetime(). The cell below also passes the result through pd.DatetimeIndex(), which leaves the column's dtype unchanged once it is assigned back. With the column stored as datetime64[ns], you can use the .dt accessor for the operations shown in the following sections.

In [None]:
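# Parse the 'Date' strings into pandas datetime values
data.Date = pd.to_datetime(data.Date)
# Optional: wrapping in DatetimeIndex leaves the column's dtype unchanged
data.Date = pd.DatetimeIndex(data.Date)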

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Unnamed: 0           2000 non-null   int64         
 1   ID                   2000 non-null   int64         
 2   Date                 2000 non-null   datetime64[ns]
 3   LocationDescription  2000 non-null   object        
 4   Arrest               2000 non-null   bool          
 5   Domestic             2000 non-null   bool          
 6   Beat                 2000 non-null   int64         
 7   District             1961 non-null   float64       
 8   CommunityArea        2000 non-null   int64         
 9   Year                 2000 non-null   int64         
dtypes: bool(2), datetime64[ns](1), float64(1), int64(5), object(1)
memory usage: 129.0+ KB


Now you can see that the 'Date' column has a datetime type (datetime64[ns]).
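
When the layout of the date strings is known, it can help to state it explicitly rather than let pandas infer it. The following is a minimal sketch, not part of the original notebook: it assumes the strings follow the month/day/two-digit-year pattern visible in the `data.head()` output above, and the sample series `raw` (including its deliberately invalid last entry) is made up for illustration.

```python
import pandas as pd

# Sample strings in the same layout as the 'Date' column, plus one bad value
# (illustrative data, not read from the actual CSV)
raw = pd.Series(["12/31/12 23:15", "12/31/12 22:00", "not a date"])

# An explicit format avoids month/day guessing;
# errors="coerce" turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(raw, format="%m/%d/%y %H:%M", errors="coerce")

print(parsed.dtype)         # a datetime64 dtype
print(parsed.isna().sum())  # number of values that could not be parsed
```

Counting the `NaT` values after conversion is a quick way to check how many rows did not match the assumed format.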

### Getting the day name of a particular date

In [None]:
data['weekday'] = data.Date.dt.day_name()

In [None]:
data.head()

Unnamed: 0.1,Unnamed: 0,ID,Date,LocationDescription,Arrest,Domestic,Beat,District,CommunityArea,Year,weekday
0,0,8951354,2012-12-31 23:15:00,STREET,False,False,623,6.0,69,2012,Monday
1,1,8951141,2012-12-31 22:00:00,STREET,False,False,1213,12.0,24,2012,Monday
2,2,8952745,2012-12-31 22:00:00,RESIDENTIAL YARD (FRONT/BACK),False,False,1622,16.0,11,2012,Monday
3,3,8952223,2012-12-31 22:00:00,STREET,False,False,724,7.0,67,2012,Monday
4,4,8951608,2012-12-31 21:30:00,STREET,False,False,211,2.0,35,2012,Monday


The rightmost column, 'weekday', holds the weekday name for each date.
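
If a numeric weekday is more convenient (for sorting or grouping, say), the `.dt` accessor also offers `dayofweek`. Here is a small sketch on a hand-made series; the three dates are illustrative and not taken from the dataset beyond the first one.

```python
import pandas as pd

# Three consecutive days ending on 2012-12-31 (illustrative values)
dates = pd.Series(pd.to_datetime(["2012-12-31 23:15",
                                  "2012-12-30 10:00",
                                  "2012-12-29 08:30"]))

names = dates.dt.day_name()    # weekday names as strings
numbers = dates.dt.dayofweek   # integers, Monday=0 ... Sunday=6

print(pd.DataFrame({"date": dates, "name": names, "number": numbers}))
```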

### Getting the day number

In [None]:
data['day'] = data.Date.dt.day

In [None]:
data.head()

Unnamed: 0.1,Unnamed: 0,ID,Date,LocationDescription,Arrest,Domestic,Beat,District,CommunityArea,Year,weekday,day
0,0,8951354,2012-12-31 23:15:00,STREET,False,False,623,6.0,69,2012,Monday,31
1,1,8951141,2012-12-31 22:00:00,STREET,False,False,1213,12.0,24,2012,Monday,31
2,2,8952745,2012-12-31 22:00:00,RESIDENTIAL YARD (FRONT/BACK),False,False,1622,16.0,11,2012,Monday,31
3,3,8952223,2012-12-31 22:00:00,STREET,False,False,724,7.0,67,2012,Monday,31
4,4,8951608,2012-12-31 21:30:00,STREET,False,False,211,2.0,35,2012,Monday,31


### Getting the month number

In [None]:
data['month'] = data.Date.dt.month

In [None]:
data.head()

Unnamed: 0.1,Unnamed: 0,ID,Date,LocationDescription,Arrest,Domestic,Beat,District,CommunityArea,Year,weekday,day,month
0,0,8951354,2012-12-31 23:15:00,STREET,False,False,623,6.0,69,2012,Monday,31,12
1,1,8951141,2012-12-31 22:00:00,STREET,False,False,1213,12.0,24,2012,Monday,31,12
2,2,8952745,2012-12-31 22:00:00,RESIDENTIAL YARD (FRONT/BACK),False,False,1622,16.0,11,2012,Monday,31,12
3,3,8952223,2012-12-31 22:00:00,STREET,False,False,724,7.0,67,2012,Monday,31,12
4,4,8951608,2012-12-31 21:30:00,STREET,False,False,211,2.0,35,2012,Monday,31,12
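
The same `.dt` accessor exposes other components in the same way. As a rough sketch on a small hand-made frame (the frame `df` and its three timestamps are assumptions for illustration, not rows from the dataset), several components can be extracted and then used for grouping:

```python
import pandas as pd

# Small illustrative frame with a datetime column
df = pd.DataFrame({"Date": pd.to_datetime(["2012-12-31 23:15",
                                           "2012-12-31 21:30",
                                           "2012-11-15 08:00"])})

df["day"] = df["Date"].dt.day                 # day of the month (1-31)
df["month"] = df["Date"].dt.month             # month number (1-12)
df["month_name"] = df["Date"].dt.month_name() # month name as a string
df["hour"] = df["Date"].dt.hour               # hour of the day (0-23)

# Count rows per month using the extracted component
per_month = df.groupby("month").size()
print(per_month)
```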
