### `Date and Time Variables`

- ***How to handle `Date` and `Time` based columns?***

- From a `Date` column we can derive a lot of information about the day, such as *`(day, month, year, day of the week, quarter)`*.
- From a `Time` column we can derive information about the time of day, such as *`(hour, minute, second)`*.
- When a dataset is imported with *`Pandas`* (e.g. `pd.read_csv`), date and time columns are read as *`object`* dtype by default, i.e. as plain *`string`* values.
- So before using any date- or time-specific operation, we first need to convert the column's datatype to *`datetime`*.

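The snippet below is a minimal sketch of that conversion. It uses a tiny in-memory CSV as a hypothetical stand-in for a file like `orders.csv`. It shows two ways to get a `datetime` column: converting after loading with `pd.to_datetime`, or asking `pd.read_csv` to parse the column while loading with `parse_dates`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk
csv_text = "date,orders\n2019-12-10,3\n2018-08-15,157\n"

# Default read: the date column is loaded as plain strings, not datetimes
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["date"].dtype)  # object (plain strings) in pandas 1.x/2.x

# Option 1: convert the column after loading
converted = raw.copy()
converted["date"] = pd.to_datetime(converted["date"])

# Option 2: let read_csv parse the column while loading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(converted["date"].dtype, parsed["date"].dtype)
```

Option 2 saves a step when the date columns are known in advance. Option 1 is handy when the column needs cleaning first.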
```python
# importing the libraries

import pandas as pd
import numpy as np
import datetime

# silence warnings (note: this also hides deprecation warnings)
import warnings
warnings.filterwarnings('ignore')
```

```python
# importing the datasets

date_df = pd.read_csv('datasets/orders.csv')
time_df = pd.read_csv('datasets/messages.csv')
```

```python
# checking the dataframes

date_df.head()
```

| | date | product_id | city_id | orders |
|---|---|---|---|---|
| 0 | 2019-12-10 | 5628 | 25 | 3 |
| 1 | 2018-08-15 | 3646 | 14 | 157 |
| 2 | 2018-10-23 | 1859 | 25 | 1 |
| 3 | 2019-08-17 | 7292 | 25 | 1 |
| 4 | 2019-01-06 | 4344 | 25 | 3 |


```python
time_df.head()
```

| | date | msg |
|---|---|---|
| 0 | 2013-12-15 00:50:00 | ищу на сегодня мужика 37 |
| 1 | 2014-04-29 23:40:00 | ПАРЕНЬ БИ ИЩЕТ ДРУГА СЕЙЧАС!! СМС ММС 0955532826 |
| 2 | 2012-12-30 00:21:00 | Днепр.м 43 позн.с д/ж *.о 067.16.34.576 |
| 3 | 2014-11-28 00:31:00 | КИЕВ ИЩУ Д/Ж ДО 45 МНЕ СЕЙЧАС СКУЧНО 093 629 9... |
| 4 | 2013-10-26 23:11:00 | Зая я тебя никогда не обижу люблю тебя!) Даше |


```python
# checking the column dtypes and non-null counts

date_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   date        1000 non-null   object
 1   product_id  1000 non-null   int64
 2   city_id     1000 non-null   int64
 3   orders      1000 non-null   int64
dtypes: int64(3), object(1)
memory usage: 31.4+ KB
```


```python
time_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   date    1000 non-null   object
 1   msg     1000 non-null   object
dtypes: object(2)
memory usage: 15.8+ KB
```


### Working with *`Dates`*

```python
# transforming the type of the date column to datetime

date_df['date'] = pd.to_datetime(date_df['date'])
date_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   date        1000 non-null   datetime64[ns]
 1   product_id  1000 non-null   int64
 2   city_id     1000 non-null   int64
 3   orders      1000 non-null   int64
dtypes: datetime64[ns](1), int64(3)
memory usage: 31.4 KB
```


**Notes**

- The *`date`* column now has the *`datetime64[ns]`* dtype, so the *`.dt`* accessor can be used on it.

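Real date columns often contain malformed entries, and by default these make `pd.to_datetime` raise an error. Here is a small sketch using made-up sample values. Passing `errors="coerce"` turns unparseable strings into `NaT`, pandas' missing-datetime marker. An explicit `format` makes the expected layout unambiguous.

```python
import pandas as pd

# Hypothetical raw values: one malformed string and one impossible calendar date
raw = pd.Series(["2019-12-10", "not a date", "2019-02-30"])

# errors="coerce" replaces anything that cannot be parsed with NaT instead of raising
dates = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(dates)

# count how many values failed to parse
print(dates.isna().sum())
```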
#### 1. Extract `year`

- Use the *`.dt.year`* accessor on the column.

```python
date_df['year'] = date_df['date'].dt.year
date_df.head()
```

| | date | product_id | city_id | orders | year |
|---|---|---|---|---|---|
| 0 | 2019-12-10 | 5628 | 25 | 3 | 2019 |
| 1 | 2018-08-15 | 3646 | 14 | 157 | 2018 |
| 2 | 2018-10-23 | 1859 | 25 | 1 | 2018 |
| 3 | 2019-08-17 | 7292 | 25 | 1 | 2019 |
| 4 | 2019-01-06 | 4344 | 25 | 3 | 2019 |


#### 2. Extract `month`

- Use the *`.dt.month`* accessor to get the month number.
- Use *`.dt.month_name()`* to get the month name.

```python
date_df['month'] = date_df['date'].dt.month
date_df.sample(5)
```

| | date | product_id | city_id | orders | year | month |
|---|---|---|---|---|---|---|
| 2 | 2018-10-23 | 1859 | 25 | 1 | 2018 | 10 |
| 371 | 2019-12-02 | 820 | 30 | 8 | 2019 | 12 |
| 299 | 2019-05-02 | 6438 | 24 | 8 | 2019 | 5 |
| 511 | 2018-12-02 | 1403 | 29 | 3 | 2018 | 12 |
| 581 | 2019-06-18 | 731 | 21 | 1 | 2019 | 6 |


```python
date_df['name_of_month'] = date_df['date'].dt.month_name()
date_df.sample(5)
```

| | date | product_id | city_id | orders | year | month | name_of_month |
|---|---|---|---|---|---|---|---|
| 549 | 2018-11-11 | 6656 | 26 | 10 | 2018 | 11 | November |
| 901 | 2019-07-24 | 7148 | 16 | 9 | 2019 | 7 | July |
| 583 | 2019-05-17 | 2589 | 20 | 2 | 2019 | 5 | May |
| 53 | 2019-12-04 | 5193 | 16 | 2 | 2019 | 12 | December |
| 692 | 2019-02-18 | 7419 | 16 | 1 | 2019 | 2 | February |


#### 3. Extract `day`

- Use *`.dt.day`* to get the day of the month.
- Use *`.dt.dayofweek`* to get the day of the week, where *`0`* means *`Monday`* and *`6`* means *`Sunday`*.
- Use *`.dt.day_name()`* to get the name of the day.
- Use *`.dt.dayofyear`* to get the day of the year.

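The sketch below puts the day-level accessors side by side on known inputs. It uses two dates that also appear in the orders sample, and the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Two dates taken from the orders sample
s = pd.Series(pd.to_datetime(["2019-12-10", "2019-12-01"]))

parts = pd.DataFrame({
    "day": s.dt.day,             # day of the month
    "weekday": s.dt.dayofweek,   # Monday=0 ... Sunday=6
    "dayname": s.dt.day_name(),  # English day name
    "yearday": s.dt.dayofyear,   # 1..365 (366 in leap years)
})
print(parts)
```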
```python
date_df['day'] = date_df['date'].dt.day
date_df.sample(5)
```

| | date | product_id | city_id | orders | year | month | name_of_month | day |
|---|---|---|---|---|---|---|---|---|
| 874 | 2019-05-26 | 2430 | 23 | 17 | 2019 | 5 | May | 26 |
| 427 | 2019-05-27 | 6072 | 26 | 60 | 2019 | 5 | May | 27 |
| 369 | 2019-01-02 | 2357 | 14 | 60 | 2019 | 1 | January | 2 |
| 6 | 2018-11-21 | 1282 | 26 | 1 | 2018 | 11 | November | 21 |
| 170 | 2019-03-25 | 6357 | 4 | 5 | 2019 | 3 | March | 25 |


```python
date_df['weekday'] = date_df['date'].dt.dayofweek
date_df.sample(5)
```

| | date | product_id | city_id | orders | year | month | name_of_month | day | weekday |
|---|---|---|---|---|---|---|---|---|---|
| 67 | 2018-11-18 | 3604 | 16 | 5 | 2018 | 11 | November | 18 | 6 |
| 84 | 2018-07-30 | 7350 | 16 | 76 | 2018 | 7 | July | 30 | 0 |
| 101 | 2019-12-01 | 895 | 14 | 47 | 2019 | 12 | December | 1 | 6 |
| 940 | 2019-03-02 | 7208 | 6 | 1 | 2019 | 3 | March | 2 | 5 |
| 260 | 2019-02-04 | 4483 | 25 | 1 | 2019 | 2 | February | 4 | 0 |


```python
date_df['dayname'] = date_df['date'].dt.day_name()
date_df.sample(5)
```

| | date | product_id | city_id | orders | year | month | name_of_month | day | weekday | dayname |
|---|---|---|---|---|---|---|---|---|---|---|
| 273 | 2019-11-03 | 149 | 14 | 24 | 2019 | 11 | November | 3 | 6 | Sunday |
| 895 | 2018-10-31 | 2714 | 28 | 8 | 2018 | 10 | October | 31 | 2 | Wednesday |
| 14 | 2019-05-29 | 3833 | 26 | 240 | 2019 | 5 | May | 29 | 2 | Wednesday |
| 116 | 2019-04-23 | 4523 | 13 | 1 | 2019 | 4 | April | 23 | 1 | Tuesday |
| 808 | 2019-03-04 | 995 | 25 | 1 | 2019 | 3 | March | 4 | 0 | Monday |


```python
date_df['yearday'] = date_df['date'].dt.dayofyear
date_df.sample(5)
```

| | date | product_id | city_id | orders | year | month | name_of_month | day | weekday | dayname | yearday |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 476 | 2019-04-04 | 5979 | 21 | 2 | 2019 | 4 | April | 4 | 3 | Thursday | 94 |
| 523 | 2019-06-18 | 3629 | 9 | 4 | 2019 | 6 | June | 18 | 1 | Tuesday | 169 |
| 562 | 2018-12-01 | 348 | 25 | 4 | 2018 | 12 | December | 1 | 5 | Saturday | 335 |
| 589 | 2019-01-02 | 1757 | 26 | 1 | 2019 | 1 | January | 2 | 2 | Wednesday | 2 |
| 996 | 2018-12-06 | 5521 | 7 | 1 | 2018 | 12 | December | 6 | 3 | Thursday | 340 |


***How to check whether a date falls on a `weekend`?***

```python
# 'Yes' for Saturday and Sunday, 'No' for every other day

date_df['date_is_weekend'] = np.where(date_df['dayname'].isin(['Sunday', 'Saturday']), 'Yes', 'No')

date_df.drop(columns=['product_id','city_id','orders']).sample(10)
```

| | date | year | month | name_of_month | day | weekday | dayname | yearday | date_is_weekend |
|---|---|---|---|---|---|---|---|---|---|
| 283 | 2018-10-17 | 2018 | 10 | October | 17 | 2 | Wednesday | 290 | No |
| 152 | 2019-01-25 | 2019 | 1 | January | 25 | 4 | Friday | 25 | No |
| 618 | 2019-10-14 | 2019 | 10 | October | 14 | 0 | Monday | 287 | No |
| 83 | 2018-08-11 | 2018 | 8 | August | 11 | 5 | Saturday | 223 | Yes |
| 526 | 2018-09-29 | 2018 | 9 | September | 29 | 5 | Saturday | 272 | Yes |
| 497 | 2018-11-26 | 2018 | 11 | November | 26 | 0 | Monday | 330 | No |
| 54 | 2018-09-26 | 2018 | 9 | September | 26 | 2 | Wednesday | 269 | No |
| 470 | 2019-11-22 | 2019 | 11 | November | 22 | 4 | Friday | 326 | No |
| 847 | 2019-08-17 | 2019 | 8 | August | 17 | 5 | Saturday | 229 | Yes |
| 375 | 2019-07-14 | 2019 | 7 | July | 14 | 6 | Sunday | 195 | Yes |

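The weekend check above compares English day names as strings. An alternative sketch, not the approach used above, relies on the numeric *dayofweek* instead, where Saturday is 5 and Sunday is 6. This avoids any dependence on day-name spelling or locale.

```python
import numpy as np
import pandas as pd

# Dates from the sample above: a Sunday, a Saturday and a Monday
s = pd.Series(pd.to_datetime(["2019-07-14", "2019-08-17", "2019-10-14"]))

# dayofweek is 5 for Saturday and 6 for Sunday, so ">= 5" marks the weekend
is_weekend = s.dt.dayofweek >= 5
label = np.where(is_weekend, "Yes", "No")
print(label)
```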

#### 4. Extract `week of the year`

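- Older pandas versions offered *`.dt.week`*, but it was deprecated in pandas 1.1 and removed in pandas 2.0. Use *`.dt.isocalendar().week`* instead, which returns the ISO 8601 week number.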

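Here is a sketch of `.dt.isocalendar()`, which returns the ISO year, week and weekday together. Keeping the ISO year matters near year boundaries. In the example, 30 December 2019 falls in ISO week 1 of 2020. Grouping by calendar `year` plus ISO `week` would therefore lump it together with the first days of January 2019.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2019-04-03", "2019-12-30"]))

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns (ISO 8601)
iso = s.dt.isocalendar()
print(iso)
```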
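```python
# .dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week numbers
date_df['weeknumber'] = date_df['date'].dt.isocalendar().week
date_df.drop(columns=['product_id','city_id','orders']).sample(5)
```

| | date | year | month | name_of_month | day | weekday | dayname | yearday | date_is_weekend | weeknumber |
|---|---|---|---|---|---|---|---|---|---|---|
| 844 | 2019-04-03 | 2019 | 4 | April | 3 | 2 | Wednesday | 93 | No | 14 |
| 141 | 2018-12-17 | 2018 | 12 | December | 17 | 0 | Monday | 351 | No | 51 |
| 496 | 2019-03-16 | 2019 | 3 | March | 16 | 5 | Saturday | 75 | Yes | 11 |
| 743 | 2019-11-11 | 2019 | 11 | November | 11 | 0 | Monday | 315 | No | 46 |
| 266 | 2019-12-11 | 2019 | 12 | December | 11 | 2 | Wednesday | 345 | No | 50 |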


#### 5. Extract `quarter`

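- Use the *`.dt.quarter`* accessor on the column.
- The *`semester`* can then be derived from the quarter with the same *`np.where`* approach used for the weekend flag: quarters 1 and 2 form the 1st semester, quarters 3 and 4 the 2nd.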

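Below is a small sketch of the quarter and semester logic on a few sample dates, using the same `np.where` pattern as the weekend flag. The arithmetic `(quarter + 1) // 2` is an alternative that yields the semester number directly.

```python
import numpy as np
import pandas as pd

# One sample date from each quarter
s = pd.Series(pd.to_datetime(["2019-02-14", "2019-06-21", "2019-07-06", "2018-12-04"]))

quarter = s.dt.quarter

# Label-based semester, as in this notebook
semester = np.where(quarter.isin([1, 2]), "1st Semester", "2nd Semester")

# Numeric alternative: quarters 1-2 -> 1, quarters 3-4 -> 2
semester_number = (quarter + 1) // 2
print(semester, semester_number.tolist())
```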
```python
date_df['quarternumber'] = date_df['date'].dt.quarter
date_df.drop(columns=['product_id','city_id','orders']).sample(5)
```

| | date | year | month | name_of_month | day | weekday | dayname | yearday | date_is_weekend | weeknumber | quarternumber |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 468 | 2019-02-14 | 2019 | 2 | February | 14 | 3 | Thursday | 45 | No | 7 | 1 |
| 752 | 2019-06-21 | 2019 | 6 | June | 21 | 4 | Friday | 172 | No | 25 | 2 |
| 927 | 2018-12-04 | 2018 | 12 | December | 4 | 1 | Tuesday | 338 | No | 49 | 4 |
| 476 | 2019-04-04 | 2019 | 4 | April | 4 | 3 | Thursday | 94 | No | 14 | 2 |
| 931 | 2018-12-14 | 2018 | 12 | December | 14 | 4 | Friday | 348 | No | 50 | 4 |


```python
# Extracting the semester
# Quarters 1 and 2 form the 1st semester; quarters 3 and 4 form the 2nd

date_df['semester'] = np.where(date_df['quarternumber'].isin([1, 2]), "1st Semester", "2nd Semester")
date_df.drop(columns=['product_id','city_id','orders', 'year', 'month']).sample(10)
```

| | date | name_of_month | day | weekday | dayname | yearday | date_is_weekend | weeknumber | quarternumber | semester |
|---|---|---|---|---|---|---|---|---|---|---|
| 340 | 2019-11-02 | November | 2 | 5 | Saturday | 306 | Yes | 44 | 4 | 2nd Semester |
| 210 | 2018-10-13 | October | 13 | 5 | Saturday | 286 | Yes | 41 | 4 | 2nd Semester |
| 707 | 2019-07-06 | July | 6 | 5 | Saturday | 187 | Yes | 27 | 3 | 2nd Semester |
| 535 | 2019-10-26 | October | 26 | 5 | Saturday | 299 | Yes | 43 | 4 | 2nd Semester |
| 473 | 2019-05-15 | May | 15 | 2 | Wednesday | 135 | No | 20 | 2 | 1st Semester |
| 993 | 2019-10-08 | October | 8 | 1 | Tuesday | 281 | No | 41 | 4 | 2nd Semester |
| 101 | 2019-12-01 | December | 1 | 6 | Sunday | 335 | Yes | 48 | 4 | 2nd Semester |
| 934 | 2019-02-08 | February | 8 | 4 | Friday | 39 | No | 6 | 1 | 1st Semester |
| 719 | 2018-11-14 | November | 14 | 2 | Wednesday | 318 | No | 46 | 4 | 2nd Semester |
| 780 | 2018-11-07 | November | 7 | 2 | Wednesday | 311 | No | 45 | 4 | 2nd Semester |


#### 6. Extract `Time elapsed between two dates`

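- First get the current date and time with *`datetime.datetime.today()`* from the *`datetime`* module.
- Then subtract the *`date`* column from it; the result is a *`timedelta64`* Series.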

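Because `datetime.datetime.today()` changes on every run, the elapsed-time numbers below can only be reproduced on the day they were computed. The sketch here uses a fixed reference timestamp instead, so the results stay stable. The reference is set to the notebook's run date, 2022-11-21, at midnight, which is an assumption made purely for illustration. `.dt.days` keeps only the whole-day component of each interval.

```python
import pandas as pd

# Fixed, hypothetical reference point instead of "now"
reference = pd.Timestamp("2022-11-21")

# Two order dates from the sample
dates = pd.Series(pd.to_datetime(["2019-12-10", "2018-08-15"]))

elapsed = reference - dates  # timedelta64 Series
print(elapsed.dt.days.tolist())
```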
```python
# current date and time, used as the reference point for the differences

t_day = datetime.datetime.today()
t_day
```

```text
datetime.datetime(2022, 11, 21, 12, 58, 45, 624705)
```

```python
t_day - date_df['date']
```

```text
0     1077 days 12:58:45.624705
1     1559 days 12:58:45.624705
2     1490 days 12:58:45.624705
3     1192 days 12:58:45.624705
4     1415 days 12:58:45.624705
                 ...           
995   1505 days 12:58:45.624705
996   1446 days 12:58:45.624705
997   1294 days 12:58:45.624705
998   1359 days 12:58:45.624705
999   1133 days 12:58:45.624705
Name: date, Length: 1000, dtype: timedelta64[ns]
```

```python
# keep only the whole-day component of each timedelta

(t_day - date_df['date']).dt.days
```

```text
0      1077
1      1559
2      1490
3      1192
4      1415
       ... 
995    1505
996    1446
997    1294
998    1359
999    1133
Name: date, Length: 1000, dtype: int64
```

### Working with *`time`*

```python
# Converting the datatype

time_df['date'] = pd.to_datetime(time_df['date'])
time_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   date    1000 non-null   datetime64[ns]
 1   msg     1000 non-null   object
dtypes: datetime64[ns](1), object(1)
memory usage: 15.8+ KB
```


#### 1. Extracting `hour, min and sec`

- Use the *`.dt.hour`*, *`.dt.minute`* and *`.dt.second`* accessors.

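Here is a minimal sketch of the time accessors on two timestamps taken from the messages sample. It works as a quick sanity check of what each accessor returns.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2013-12-15 00:50:00", "2013-03-01 23:36:00"]))

parts = pd.DataFrame({
    "hour": s.dt.hour,    # 0..23
    "min": s.dt.minute,   # 0..59
    "sec": s.dt.second,   # 0..59
})
print(parts)
```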
```python
time_df['hour'] = time_df['date'].dt.hour
time_df['min'] = time_df['date'].dt.minute
time_df['sec'] = time_df['date'].dt.second

time_df.sample(10)
```

| | date | msg | hour | min | sec |
|---|---|---|---|---|---|
| 755 | 2012-07-04 01:12:00 | Обнаж М один дома ищу хочу Ж/Даму для встр!ммс... | 1 | 12 | 0 |
| 867 | 2013-12-25 01:35:00 | Ищу ДЕВ./ЖЕН. без комплексов для очень горячег... | 1 | 35 | 0 |
| 179 | 2012-12-15 00:10:00 | ИЩУ Ж/Д 30-45ЛЕТ.ДЛЯ ВСТРЕЧ.:)СНАЧАЛА СМС!КОСМ... | 0 | 10 | 0 |
| 954 | 2015-08-28 01:11:00 | ИЩУ ДЕВУШКУ ДЛЯ ВСТРЕЧ ОТНОШЕНИЙ ВОЗМОЖНО С/С. | 1 | 11 | 0 |
| 970 | 2013-03-01 23:36:00 | М. ПОЦЕЛУЮ НОЖКИ.... СТРОЙНОЙ СИМП. ДАМЕ О/С.... | 23 | 36 | 0 |
| 166 | 2013-02-08 00:10:00 | М ищу Девш.Для.ЖИЗНИ..Красивую.и.простую.23/29... | 0 | 10 | 0 |
| 638 | 2015-12-13 00:07:00 | Ищу простую милую скромную девушку.мне 30 | 0 | 7 | 0 |
| 128 | 2012-04-17 00:40:00 | ИЩУ ЖЕНЩИНУ КОТОРАЯ СМОЖЕТ УДИВИТЬ СМС 097 536... | 0 | 40 | 0 |
| 786 | 2012-10-22 23:56:00 | м 52 ищу не высокую молодую симпатичную не ... | 23 | 56 | 0 |
| 607 | 2016-04-18 23:44:00 | Приятный мужчина пригласит в гости пару МЖ. Дл... | 23 | 44 | 0 |


#### 2. Extract `Time` part

- Use *`.dt.time`* to extract only the time-of-day part, as *`datetime.time`* objects.

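`.dt.time` yields plain Python `datetime.time` objects, so the resulting column has `object` dtype, and its values can be compared with another `datetime.time`. The sketch below flags "late-night" messages using a hypothetical 23:00 cutoff.

```python
import datetime

import pandas as pd

s = pd.Series(pd.to_datetime(["2013-12-15 00:50:00", "2014-04-29 23:40:00"]))
time_part = s.dt.time

# Each element is a datetime.time object
print(type(time_part.iloc[0]))

# Hypothetical cutoff: flag messages sent at or after 23:00
cutoff = datetime.time(23, 0)
after_cutoff = time_part.map(lambda t: t >= cutoff)
print(after_cutoff.tolist())
```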
```python
time_df['time_part'] = time_df['date'].dt.time
time_df.head()
```

| | date | msg | hour | min | sec | time_part |
|---|---|---|---|---|---|---|
| 0 | 2013-12-15 00:50:00 | ищу на сегодня мужика 37 | 0 | 50 | 0 | 00:50:00 |
| 1 | 2014-04-29 23:40:00 | ПАРЕНЬ БИ ИЩЕТ ДРУГА СЕЙЧАС!! СМС ММС 0955532826 | 23 | 40 | 0 | 23:40:00 |
| 2 | 2012-12-30 00:21:00 | Днепр.м 43 позн.с д/ж *.о 067.16.34.576 | 0 | 21 | 0 | 00:21:00 |
| 3 | 2014-11-28 00:31:00 | КИЕВ ИЩУ Д/Ж ДО 45 МНЕ СЕЙЧАС СКУЧНО 093 629 9... | 0 | 31 | 0 | 00:31:00 |
| 4 | 2013-10-26 23:11:00 | Зая я тебя никогда не обижу люблю тебя!) Даше | 23 | 11 | 0 | 23:11:00 |


**Finding the *`time difference`***

```python
t_day - time_df['date']
```

```text
0     3263 days 12:08:45.624705
1     3127 days 13:18:45.624705
2     3613 days 12:37:45.624705
3     2915 days 12:27:45.624705
4     3312 days 13:47:45.624705
                 ...           
995   3902 days 12:08:45.624705
996   3223 days 13:44:45.624705
997   3688 days 13:21:45.624705
998   3804 days 13:24:45.624705
999   3076 days 13:33:45.624705
Name: date, Length: 1000, dtype: timedelta64[ns]
```

```python
# in seconds

(t_day - time_df['date'])/np.timedelta64(1,'s')
```

```text
0      2.819669e+08
1      2.702207e+08
2      3.122087e+08
3      2.519009e+08
4      2.862065e+08
           ...     
995    3.371765e+08
996    2.785167e+08
997    3.186913e+08
998    3.287139e+08
999    2.658152e+08
Name: date, Length: 1000, dtype: float64
```

```python
# in minutes

(t_day - time_df['date'])/np.timedelta64(1,'m')
```

```text
0      4.699449e+06
1      4.503679e+06
2      5.203478e+06
3      4.198348e+06
4      4.770108e+06
           ...     
995    5.619609e+06
996    4.641945e+06
997    5.311522e+06
998    5.478565e+06
999    4.430254e+06
Name: date, Length: 1000, dtype: float64
```

```python
# in hours

(t_day - time_df['date'])/np.timedelta64(1,'h')
```

```text
0      78324.146007
1      75061.312674
2      86724.629340
3      69972.462674
4      79501.796007
           ...     
995    93660.146007
996    77365.746007
997    88525.362674
998    91309.412674
999    73837.562674
Name: date, Length: 1000, dtype: float64
```
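The conversions above divide a `timedelta64` Series by a one-unit `np.timedelta64`. The small sketch below checks that arithmetic on a hypothetical interval of 1 day and 2 hours. It also shows `.dt.total_seconds()` as an equivalent built-in for the seconds case.

```python
import numpy as np
import pandas as pd

# Hypothetical elapsed time: 1 day and 2 hours
delta = pd.Series(pd.to_timedelta(["1 days 02:00:00"]))

seconds = delta / np.timedelta64(1, "s")
minutes = delta / np.timedelta64(1, "m")  # lowercase "m" is minutes in NumPy ("M" is months)
hours = delta / np.timedelta64(1, "h")

# Built-in alternative for seconds, no NumPy unit needed
total = delta.dt.total_seconds()

print(seconds.iloc[0], minutes.iloc[0], hours.iloc[0], total.iloc[0])
```

Dividing by `pd.Timedelta(hours=1)` works the same way if you prefer to stay within pandas.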