# Working with Dates

In [1]:
# Load the various libraries we'll need
import numpy as np
import pandas as pd

### Read the data

First, we read a CSV file that lists some dates and times. These come from Transport for London's records of cycle rentals, which were part of the model answer to Week 3's EDA exercise. `Start.Date` gives the date and time at which a given cycle rental started, while `End.Date` gives the date and time at which it finished.

In [2]:
rental_df = pd.read_csv( "SampleDateTimeData.csv" )
rental_df.head()

Unnamed: 0,Start.Date,End.Date
0,10/01/2016 12:12,10/01/2016 13:05
1,10/01/2016 14:30,10/01/2016 14:37
2,10/01/2016 14:57,10/01/2016 15:05
3,10/01/2016 15:17,10/01/2016 15:24
4,10/01/2016 21:49,10/01/2016 22:08


Next, we look at how Pandas has treated these data. It will have recorded them with a `dtype` of `object` and, for all practical purposes, they will be strings.

In [3]:
rental_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Start.Date  100 non-null    object
 1   End.Date    100 non-null    object
dtypes: object(2)
memory usage: 1.7+ KB


### Convert the date-time strings to `datetime64` objects

Pandas is set up to handle dates and times that are recorded with NumPy's `datetime64` type. The code below converts the strings we've read into this more useful type.

**N.B.** The function that does the conversion, `to_datetime()`, accepts a `format` argument that allows one to convert date-time strings laid out differently from the ones shown here. These format strings use the same codes as Python's `datetime.strftime()` and `datetime.strptime()` methods: see the [documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) for a more complete account.
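
As a minimal sketch of how the `format` argument adapts to other layouts, the example below parses some made-up, year-first strings that include seconds. The sample values and the variable names (`iso_strings`, `mixed_strings` and so on) are illustrative assumptions and are not part of the rental data. It also shows the `errors="coerce"` option, which turns strings that do not match the format into `NaT` rather than raising an error.

```python
import pandas as pd

# Hypothetical year-first strings with seconds (not taken from the rental data)
iso_strings = pd.Series(["2016-01-10 12:12:30", "2016-01-10 14:30:05"])
iso_parsed = pd.to_datetime(iso_strings, format="%Y-%m-%d %H:%M:%S")

# Day-first strings like those in the rental data, plus one malformed entry.
# With errors="coerce", entries that do not match the format become NaT.
mixed_strings = pd.Series(["10/01/2016 12:12", "not a date"])
mixed_parsed = pd.to_datetime(
    mixed_strings, format="%d/%m/%Y %H:%M", errors="coerce"
)

print(iso_parsed)
print(mixed_parsed)
```

Giving the format explicitly also removes any ambiguity about whether a string such as `10/01/2016` means 10 January or 1 October: with `%d/%m/%Y` it is read as 10 January.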

In [4]:
# Convert the date-time strings into datetime64 values
rental_df["Start.Date"] = pd.to_datetime( rental_df["Start.Date"], format='%d/%m/%Y %H:%M' )
rental_df["End.Date"] = pd.to_datetime( rental_df["End.Date"], format='%d/%m/%Y %H:%M' )

rental_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Start.Date  100 non-null    datetime64[ns]
 1   End.Date    100 non-null    datetime64[ns]
dtypes: datetime64[ns](2)
memory usage: 1.7 KB


### Compute differences between times

One can easily compute differences between `datetime64` values and then express those differences in various units. The code below subtracts the `Start.Date` column from the `End.Date` column and returns the result as a Pandas `Series` of NumPy `timedelta64` values. Internally, the time differences are stored in nanoseconds (`ns`).

In [5]:
durations = rental_df['End.Date'] - rental_df['Start.Date']
durations.info()

<class 'pandas.core.series.Series'>
RangeIndex: 100 entries, 0 to 99
Series name: None
Non-Null Count  Dtype          
--------------  -----          
100 non-null    timedelta64[ns]
dtypes: timedelta64[ns](1)
memory usage: 928.0 bytes


Finally, we use the Pandas `dt` accessor (see the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html) for more examples) on the `Series` of `timedelta64` objects to compute the duration of the rental in units of seconds, minutes and hours.

In [6]:
secs_per_min = 60 
secs_per_hour = 60 * 60 
rental_df['Duration.in.Secs'] = durations.dt.total_seconds() 
rental_df['Duration.in.Mins'] = durations.dt.total_seconds() / secs_per_min
rental_df['Duration.in.Hrs'] = durations.dt.total_seconds() / secs_per_hour

rental_df.head()

Unnamed: 0,Start.Date,End.Date,Duration.in.Secs,Duration.in.Mins,Duration.in.Hrs
0,2016-01-10 12:12:00,2016-01-10 13:05:00,3180.0,53.0,0.883333
1,2016-01-10 14:30:00,2016-01-10 14:37:00,420.0,7.0,0.116667
2,2016-01-10 14:57:00,2016-01-10 15:05:00,480.0,8.0,0.133333
3,2016-01-10 15:17:00,2016-01-10 15:24:00,420.0,7.0,0.116667
4,2016-01-10 21:49:00,2016-01-10 22:08:00,1140.0,19.0,0.316667
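
An alternative way to change units, sketched here on a small stand-in for the first two rows of the rental data, is to divide the `timedelta64` values by a `pd.Timedelta` of the desired unit. The names `start`, `end` and `sample_durations` are assumptions made for this illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the first two rows of the rental data
fmt = "%d/%m/%Y %H:%M"
start = pd.to_datetime(pd.Series(["10/01/2016 12:12", "10/01/2016 14:30"]), format=fmt)
end = pd.to_datetime(pd.Series(["10/01/2016 13:05", "10/01/2016 14:37"]), format=fmt)
sample_durations = end - start  # a Series of timedelta64 values

# Dividing by a Timedelta of one minute gives the duration in minutes as floats
mins_via_division = sample_durations / pd.Timedelta(minutes=1)

# The same quantity via total_seconds(), as in the cell above
mins_via_seconds = sample_durations.dt.total_seconds() / 60

print(mins_via_division)
```

Both routes give the same numbers; dividing by a `Timedelta` simply avoids defining conversion constants by hand.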


### Extract time of day

Sometimes you may want to extract the time of day from a `datetime64` column. You might, for example, want to look at variation across the 24 hours of the day. The code below does this by constructing a `datetime64` value that corresponds to the instant the day begins (2016-01-10 00:00:00, for example) and then measuring time differences as above.

In [7]:
def hoursPastMidnight( datetime_series ):
    """Return the time of day, in hours after midnight, for a Series of datetime64 values."""
    dt_at_midnight = datetime_series.dt.floor('D') # Round down to midnight: H:M:S set to 0
    td_series = datetime_series - dt_at_midnight # Produces a Series of timedelta64 values
    hours_after_midnight = td_series.dt.total_seconds() / secs_per_hour # secs_per_hour is defined in In [6]
    return hours_after_midnight


In [8]:
# Add two columns to the data frame: "ToD" stands for "time of day"
rental_df["Start ToD"] = hoursPastMidnight( rental_df["Start.Date"] )
rental_df["End ToD"] = hoursPastMidnight( rental_df["End.Date"] )
rental_df.head()

Unnamed: 0,Start.Date,End.Date,Duration.in.Secs,Duration.in.Mins,Duration.in.Hrs,Start ToD,End ToD
0,2016-01-10 12:12:00,2016-01-10 13:05:00,3180.0,53.0,0.883333,12.2,13.083333
1,2016-01-10 14:30:00,2016-01-10 14:37:00,420.0,7.0,0.116667,14.5,14.616667
2,2016-01-10 14:57:00,2016-01-10 15:05:00,480.0,8.0,0.133333,14.95,15.083333
3,2016-01-10 15:17:00,2016-01-10 15:24:00,420.0,7.0,0.116667,15.283333,15.4
4,2016-01-10 21:49:00,2016-01-10 22:08:00,1140.0,19.0,0.316667,21.816667,22.133333
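
As a cross-check on the approach above, the time of day can also be assembled from the `hour`, `minute` and `second` components exposed by the `dt` accessor. This is only a sketch: the function name `hours_from_components` and the sample data are assumptions introduced for illustration, and this version ignores any sub-second part of the timestamps.

```python
import pandas as pd

def hours_from_components(datetime_series):
    """Time of day in hours, built from the hour, minute and second fields."""
    parts = datetime_series.dt
    return parts.hour + parts.minute / 60 + parts.second / 3600

# Hypothetical stand-in for the first two start times in the rental data
sample_starts = pd.to_datetime(
    pd.Series(["10/01/2016 12:12", "10/01/2016 14:30"]),
    format="%d/%m/%Y %H:%M",
)
tod = hours_from_components(sample_starts)
print(tod)
```

For timestamps recorded to the minute, as here, this should agree with the values in the `Start ToD` column.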
