- **Author:** Aisling Towey
- **Date:** 2nd June 2021

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-Modules" data-toc-modified-id="Import-Modules-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import Modules</a></span></li><li><span><a href="#Creating-Date-Time-Objects" data-toc-modified-id="Creating-Date-Time-Objects-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Creating Date Time Objects</a></span></li><li><span><a href="#DateTime-Arithmetic" data-toc-modified-id="DateTime-Arithmetic-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>DateTime Arithmetic</a></span></li><li><span><a href="#Timezones" data-toc-modified-id="Timezones-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Timezones</a></span></li><li><span><a href="#Working-with-Dates-and-Times-in-Pandas" data-toc-modified-id="Working-with-Dates-and-Times-in-Pandas-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Working with Dates and Times in Pandas</a></span></li></ul></div>

# Import Modules

Dates are not one of Python's built-in data types (like int or str), so we work with them through modules. Third-party libraries such as Arrow and Delorean exist, but the standard library covers most of what you will need.

The main modules we will use are datetime, from the standard library, and pytz, a third-party package for timezones. In this notebook we will look at how to use both of them and also how to work with datetimes in pandas.

Note: when using pytz, be aware of the pitfalls described here: https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgun.html
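The best-known pitfall from that article is easy to reproduce. The minimal sketch below (assuming pytz is installed) attaches a pytz timezone directly with tzinfo= and compares the result with localize(). With tzinfo=, pytz uses the zone's earliest historical offset (Local Mean Time for Moscow) rather than the offset actually in force in 2021.

```python
from datetime import datetime

import pytz

moscow = pytz.timezone("Europe/Moscow")

# Pitfall: passing a pytz zone via tzinfo= picks up its oldest (LMT) offset
wrong = datetime(2021, 5, 30, 12, 15, 15, tzinfo=moscow)

# Correct: localize() looks up the offset that applies on this date
right = moscow.localize(datetime(2021, 5, 30, 12, 15, 15))

print(wrong.utcoffset())  # an old Local Mean Time offset, not 3:00:00
print(right.utcoffset())  # 3:00:00
```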

In [1]:
from datetime import datetime, date, time, timedelta
import pytz
from pytz import timezone

# Creating Date Time Objects

First, let's create some datetime objects using the datetime class from the datetime module.

In [2]:
start_date_time = datetime(year=2021, month=5, day=30, hour=12, minute=15, second=15)
start_date_time

datetime.datetime(2021, 5, 30, 12, 15, 15)

If we only need a date or a time, we can use the date and time classes on their own. datetime.combine() joins a date and a time into a single datetime.

In [3]:
end_date = date(year=2021, month=7, day=5)
end_time = time(hour=16, minute=30, second=0)
end_date_time = datetime.combine(end_date, end_time)
end_date_time

datetime.datetime(2021, 7, 5, 16, 30)

We can get the date and time out of a datetime object.

In [4]:
print(end_date_time.date())
print(end_date_time.time())

2021-07-05
16:30:00
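Beyond date() and time(), a datetime exposes its individual components as attributes and provides a few convenience methods. A short sketch, re-creating end_date_time so the block runs on its own:

```python
from datetime import datetime

end_date_time = datetime(2021, 7, 5, 16, 30)

# Individual components are plain integer attributes
print(end_date_time.year, end_date_time.month, end_date_time.day)  # 2021 7 5
print(end_date_time.hour, end_date_time.minute)                     # 16 30

# weekday(): Monday is 0 and Sunday is 6
print(end_date_time.weekday())  # 0

# ISO 8601 string, and a modified copy (datetime objects are immutable)
print(end_date_time.isoformat())  # 2021-07-05T16:30:00
morning = end_date_time.replace(hour=9)
print(morning)                    # 2021-07-05 09:30:00
```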


We can get the current date time as below.

In [5]:
print(datetime.now())
print(date.today())

2022-03-10 15:01:22.536777
2022-03-10


To convert a string to a datetime we can use datetime.strptime(). Its second argument is a format string whose codes tell Python what each part of the string represents, for example %d for the day, %m for the month and %Y for the four-digit year. The full list of codes is here: https://www.programiz.com/python-programming/datetime/strptime

In [6]:
string_converted_time = datetime.strptime("30-05-2021 12:15:30", "%d-%m-%Y %H:%M:%S")
string_converted_time

datetime.datetime(2021, 5, 30, 12, 15, 30)

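To convert this datetime to a POSIX timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) we can use the timestamp() method. Because string_converted_time is naive, timestamp() interprets it as local time, so the result depends on the timezone of the machine running the code.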

In [7]:
seconds_timestamp = string_converted_time.timestamp()
seconds_timestamp

1622373330.0

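To convert a timestamp back to a datetime we can use datetime.fromtimestamp(). Without a tz argument it returns a naive datetime in local time, which is why we get the original value back.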

In [8]:
datetime.fromtimestamp(seconds_timestamp)

datetime.datetime(2021, 5, 30, 12, 15, 30)
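To get a timestamp that does not depend on the machine's timezone, the datetime can be made aware first. A minimal sketch, assuming the string represents UTC; it uses the standard library's datetime.timezone.utc, which is a different object from the pytz timezone function imported earlier:

```python
from datetime import datetime, timezone

# Attach UTC explicitly so the result no longer depends on the local timezone
utc_time = datetime(2021, 5, 30, 12, 15, 30, tzinfo=timezone.utc)
seconds = utc_time.timestamp()
print(seconds)  # 1622376930.0

# With tz=, fromtimestamp() returns an aware datetime instead of naive local time
round_trip = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(round_trip)              # 2021-05-30 12:15:30+00:00
print(round_trip == utc_time)  # True
```

The notebook's 1622373330.0 is exactly 3600 seconds smaller, which is consistent with it having been run on a machine one hour ahead of UTC.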

To convert a datetime object back to a string, use .strftime() instead of .strptime():
- datetime.strptime(): convert string to datetime object
- datetime.strftime(): convert datetime object to string

In [9]:
string_converted_time.strftime("%d:%m:%Y %H:%M:%S")

'30:05:2021 12:15:30'
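Because strptime() and strftime() share the same format codes, a string parsed with one layout can be written back out in another. strptime() also raises a ValueError when the string does not match the format, which makes it a cheap input check. A sketch; try_parse is a hypothetical helper, not part of the datetime module:

```python
from datetime import datetime

parsed = datetime.strptime("30-05-2021 12:15:30", "%d-%m-%Y %H:%M:%S")

# Re-format the same datetime in a different layout
print(parsed.strftime("%Y/%m/%d %H:%M"))  # 2021/05/30 12:15
print(parsed.strftime("%A %d %B %Y"))     # e.g. Sunday 30 May 2021 in an English locale


def try_parse(text, fmt):
    """Hypothetical helper: return a datetime, or None if text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


print(try_parse("30-05-2021", "%d-%m-%Y"))  # 2021-05-30 00:00:00
print(try_parse("2021-05-30", "%d-%m-%Y"))  # None
```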

# DateTime Arithmetic

A timedelta object represents a duration, such as the difference between two datetimes. Subtracting one datetime object from another returns a timedelta.

In [10]:
time_difference = end_date_time - start_date_time
time_difference

datetime.timedelta(days=36, seconds=15285)
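A timedelta stores its value as days, seconds and microseconds, so days=36, seconds=15285 means 36 days plus 15,285 seconds (4 h 14 min 45 s). A sketch of common ways to read the duration out:

```python
from datetime import datetime, timedelta

start_date_time = datetime(2021, 5, 30, 12, 15, 15)
end_date_time = datetime(2021, 7, 5, 16, 30)
time_difference = end_date_time - start_date_time

# Stored components: whole days plus the remaining seconds (< 86400)
print(time_difference.days, time_difference.seconds)  # 36 15285

# The whole duration in seconds, then converted to hours
total_seconds = time_difference.total_seconds()
print(total_seconds)         # 3125685.0
print(total_seconds / 3600)  # about 868.25

# Floor-dividing by another timedelta counts whole units
print(time_difference // timedelta(hours=1))  # 868
```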

We can also add a timedelta to, or subtract one from, a datetime to shift it in time.

In [11]:
print(f'End Time: {end_date_time}')
new_end_time = end_date_time - timedelta(hours=24)
print(f'New End Time: {new_end_time}')

End Time: 2021-07-05 16:30:00
New End Time: 2021-07-04 16:30:00
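A timedelta can also be multiplied by an integer, which is handy for building regular schedules. A small sketch; the 90-minute interval is an arbitrary example:

```python
from datetime import datetime, timedelta

end_date_time = datetime(2021, 7, 5, 16, 30)
interval = timedelta(minutes=90)  # arbitrary example interval

# Three slots, each 90 minutes after the previous one
slots = [end_date_time + i * interval for i in range(3)]
for slot in slots:
    print(slot)
# 2021-07-05 16:30:00
# 2021-07-05 18:00:00
# 2021-07-05 19:30:00

# Subtracting the later time from the earlier one gives a negative timedelta
print(end_date_time - (end_date_time + interval))  # -1 day, 22:30:00
```

timedelta has no months or years argument (weeks is its largest unit), because those lengths vary from one month or year to the next.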


# Timezones

The third-party pytz module can be used to work with timezones. Let's look at start_date_time again. It is a "timezone naive" datetime because it carries no timezone information.

In [12]:
start_date_time

datetime.datetime(2021, 5, 30, 12, 15, 15)

Let's make start_date_time "timezone aware" using a pytz timezone and its localize() method. localize() does not change the clock time: it assumes the naive time was already expressed in the timezone we attach to it.

In [13]:
start_date_time_aware = pytz.timezone('Europe/Moscow').localize(start_date_time)
start_date_time_aware # note the time has not changed but now this time is timezone aware

datetime.datetime(2021, 5, 30, 12, 15, 15, tzinfo=<DstTzInfo 'Europe/Moscow' MSK+3:00:00 STD>)
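Because localize() assumes the clock time is valid in the target zone, it has to choose when that time is ambiguous, such as 01:30 on the night US clocks go back. pytz's localize() takes an is_dst flag for this. A sketch using 7 November 2021 in US/Eastern:

```python
from datetime import datetime

import pytz

eastern = pytz.timezone("US/Eastern")
# 01:30 occurred twice on 2021-11-07, when clocks went back from 02:00 EDT to 01:00 EST
ambiguous = datetime(2021, 11, 7, 1, 30)

first = eastern.localize(ambiguous, is_dst=True)    # the earlier 01:30, still EDT
second = eastern.localize(ambiguous, is_dst=False)  # the later 01:30, EST
print(first)   # 2021-11-07 01:30:00-04:00
print(second)  # 2021-11-07 01:30:00-05:00

# is_dst=None refuses to guess and raises AmbiguousTimeError instead
try:
    eastern.localize(ambiguous, is_dst=None)
    raised = False
except pytz.exceptions.AmbiguousTimeError:
    raised = True
print(raised)  # True
```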

pytz.all_timezones is a list of the names of all available timezones. Here are the first five:

In [14]:
pytz.all_timezones[0:5]

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara']
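The full list is long. pytz also provides pytz.common_timezones, a shorter list of commonly used names, and pytz.country_timezones, which maps ISO 3166 two-letter country codes to zone names. A sketch of finding a zone:

```python
import pytz

# Several hundred names; the exact count depends on the pytz version
print(len(pytz.all_timezones))

# Zones used in a given country, looked up by ISO 3166 code
print(pytz.country_timezones["ie"])  # ['Europe/Dublin']

# Filter the names, e.g. every common zone under the "Europe/" prefix
europe = [name for name in pytz.common_timezones if name.startswith("Europe/")]
print("Europe/Moscow" in europe)  # True
```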

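Once a datetime is timezone aware, we can convert it to any other timezone with astimezone(). The clock time changes, but it still refers to the same moment.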

In [15]:
start_date_time_aware.astimezone(pytz.timezone('US/Eastern'))

datetime.datetime(2021, 5, 30, 5, 15, 15, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)
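Since aware datetimes represent instants, the converted value compares equal to the original. Arithmetic is where pytz needs extra care: adding a timedelta keeps the old UTC offset even if the result lands on the other side of a daylight-saving change, and pytz's normalize() method corrects it. A sketch:

```python
from datetime import datetime, timedelta

import pytz

moscow = pytz.timezone("Europe/Moscow")
eastern = pytz.timezone("US/Eastern")

start_aware = moscow.localize(datetime(2021, 5, 30, 12, 15, 15))
start_eastern = start_aware.astimezone(eastern)
# Different clock readings, same instant
print(start_eastern.hour, start_eastern == start_aware)  # 5 True

# 2021-11-06 12:00 is EDT (UTC-4); by the next day US clocks are back on EST
before = eastern.localize(datetime(2021, 11, 6, 12, 0))
after = before + timedelta(days=1)
print(after)  # 2021-11-07 12:00:00-04:00 (stale EDT offset)

# normalize() re-derives the correct offset for the resulting instant
fixed = eastern.normalize(after)
print(fixed)  # 2021-11-07 11:00:00-05:00
```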

# Working with Dates and Times in Pandas

We can easily work with datetimes in pandas dataframes. First we need to install and import pandas.

In [16]:
# !pip install pandas
import pandas as pd

Let's create a simple dataframe for us to work with.

In [17]:
df = pd.DataFrame({'date_time': ['1/6/2021 14:15:00', '2/6/2021 17:45:00', '3/6/2021 3:30:00', '3/2/2018 7:30:00', '1/10/2018 18:15:00'],
                   'count': [100, 200, 300, 1000, 500]})
df['date_time'] = pd.to_datetime(df['date_time'])
# by default pandas reads dates like '1/6/2021' month first (US style), i.e. 6 January 2021
# if the strings are day first, add dayfirst=True
# or pass an explicit format, e.g. format="%d/%m/%Y %H:%M:%S", matching the structure of the strings
df

|   | date_time | count |
|---|---|---|
| 0 | 2021-01-06 14:15:00 | 100 |
| 1 | 2021-02-06 17:45:00 | 200 |
| 2 | 2021-03-06 03:30:00 | 300 |
| 3 | 2018-03-02 07:30:00 | 1000 |
| 4 | 2018-01-10 18:15:00 | 500 |
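With the default settings, '1/6/2021' was read as 6 January 2021. If the strings are meant day first (1 June 2021), either dayfirst=True or an explicit format gives the intended dates, and errors='coerce' turns unparseable strings into NaT instead of raising. A sketch on a shortened copy of the data:

```python
import pandas as pd

raw = pd.Series(["1/6/2021 14:15:00", "2/6/2021 17:45:00", "3/6/2021 3:30:00"])

month_first = pd.to_datetime(raw)                # default: 6 January
day_first = pd.to_datetime(raw, dayfirst=True)   # 1 June
explicit = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")  # no guessing

print(month_first.iloc[0])  # 2021-01-06 14:15:00
print(day_first.iloc[0])    # 2021-06-01 14:15:00

# errors="coerce" replaces strings that cannot be parsed with NaT
mixed = pd.to_datetime(pd.Series(["1/6/2021 14:15:00", "not a date"]),
                       format="%d/%m/%Y %H:%M:%S", errors="coerce")
print(mixed.isna().tolist())  # [False, True]
```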


In pandas, the dt accessor exposes the datetime-like properties and methods of a Series. For example, dt.date and dt.time extract the date and time parts, while dt.floor() and dt.ceil() round datetimes down or up to a given frequency, such as the hour ('H') or the day ('D'). Many other attributes are available, such as dt.dayofweek and dt.month; note that dt.week has been deprecated in favour of dt.isocalendar().week.

In [18]:
df['time'] = df['date_time'].dt.time
df['date'] = df['date_time'].dt.date
df['date_time_hour_floor'] = df['date_time'].dt.floor('H')
df['date_time_hour_ceiling'] = df['date_time'].dt.ceil('H')
df.head()

|   | date_time | count | time | date | date_time_hour_floor | date_time_hour_ceiling |
|---|---|---|---|---|---|---|
| 0 | 2021-01-06 14:15:00 | 100 | 14:15:00 | 2021-01-06 | 2021-01-06 14:00:00 | 2021-01-06 15:00:00 |
| 1 | 2021-02-06 17:45:00 | 200 | 17:45:00 | 2021-02-06 | 2021-02-06 17:00:00 | 2021-02-06 18:00:00 |
| 2 | 2021-03-06 03:30:00 | 300 | 03:30:00 | 2021-03-06 | 2021-03-06 03:00:00 | 2021-03-06 04:00:00 |
| 3 | 2018-03-02 07:30:00 | 1000 | 07:30:00 | 2018-03-02 | 2018-03-02 07:00:00 | 2018-03-02 08:00:00 |
| 4 | 2018-01-10 18:15:00 | 500 | 18:15:00 | 2018-01-10 | 2018-01-10 18:00:00 | 2018-01-10 19:00:00 |
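A few more dt members, sketched on two of the timestamps above. This assumes pandas 1.1 or later, where dt.isocalendar() returns a DataFrame with year, week and day columns:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-01-06 14:15:00", "2018-03-02 07:30:00"]))

print(s.dt.dayofweek.tolist())   # [2, 4]  (Monday is 0)
print(s.dt.day_name().tolist())  # ['Wednesday', 'Friday']
print(s.dt.month.tolist())       # [1, 3]

# ISO week numbers, the replacement for the older dt.week
iso = s.dt.isocalendar()
print([int(week) for week in iso["week"]])  # [1, 9]
```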


If we set the date_time column as the dataframe index, selecting and filtering by date becomes easier.

In [19]:
df = df.set_index(['date_time'])
df

| date_time | count | time | date | date_time_hour_floor | date_time_hour_ceiling |
|---|---|---|---|---|---|
| 2021-01-06 14:15:00 | 100 | 14:15:00 | 2021-01-06 | 2021-01-06 14:00:00 | 2021-01-06 15:00:00 |
| 2021-02-06 17:45:00 | 200 | 17:45:00 | 2021-02-06 | 2021-02-06 17:00:00 | 2021-02-06 18:00:00 |
| 2021-03-06 03:30:00 | 300 | 03:30:00 | 2021-03-06 | 2021-03-06 03:00:00 | 2021-03-06 04:00:00 |
| 2018-03-02 07:30:00 | 1000 | 07:30:00 | 2018-03-02 | 2018-03-02 07:00:00 | 2018-03-02 08:00:00 |
| 2018-01-10 18:15:00 | 500 | 18:15:00 | 2018-01-10 | 2018-01-10 18:00:00 | 2018-01-10 19:00:00 |


Now we can filter rows with very little code. For example, passing a partial date string such as '2021' to .loc selects every row from 2021. We can then run aggregations and groupby operations on the filtered data.

In [20]:
df.loc['2021']

| date_time | count | time | date | date_time_hour_floor | date_time_hour_ceiling |
|---|---|---|---|---|---|
| 2021-01-06 14:15:00 | 100 | 14:15:00 | 2021-01-06 | 2021-01-06 14:00:00 | 2021-01-06 15:00:00 |
| 2021-02-06 17:45:00 | 200 | 17:45:00 | 2021-02-06 | 2021-02-06 17:00:00 | 2021-02-06 18:00:00 |
| 2021-03-06 03:30:00 | 300 | 03:30:00 | 2021-03-06 | 2021-03-06 03:00:00 | 2021-03-06 04:00:00 |
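As an example of aggregating on a DatetimeIndex, the sketch below rebuilds a minimal version of the indexed frame (only the count column) and then sums the 2021 counts, totals counts per year with a groupby on the index's year, and selects a range of months. The index is sorted first because string range slicing such as '2021-01':'2021-02' is most reliable on a sorted index.

```python
import pandas as pd

df = pd.DataFrame(
    {"count": [100, 200, 300, 1000, 500]},
    index=pd.DatetimeIndex(
        ["2021-01-06 14:15:00", "2021-02-06 17:45:00", "2021-03-06 03:30:00",
         "2018-03-02 07:30:00", "2018-01-10 18:15:00"],
        name="date_time",
    ),
).sort_index()

# Total count for all rows in 2021
print(df.loc["2021", "count"].sum())  # 600

# Total count per calendar year
print(df.groupby(df.index.year)["count"].sum().to_dict())  # {2018: 1500, 2021: 600}

# A range of months; both ends of a label slice are inclusive
print(df.loc["2021-01":"2021-02", "count"].tolist())  # [100, 200]
```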


Working with timezones in pandas is similar to what we have already seen. Let's move date_time out of the index and back into a regular column, then localize it with dt.tz_localize().

In [21]:
df.reset_index(inplace=True)
df['date_time'] = df['date_time'].dt.tz_localize(pytz.timezone('Europe/Moscow'))
df.head()

|   | date_time | count | time | date | date_time_hour_floor | date_time_hour_ceiling |
|---|---|---|---|---|---|---|
| 0 | 2021-01-06 14:15:00+03:00 | 100 | 14:15:00 | 2021-01-06 | 2021-01-06 14:00:00 | 2021-01-06 15:00:00 |
| 1 | 2021-02-06 17:45:00+03:00 | 200 | 17:45:00 | 2021-02-06 | 2021-02-06 17:00:00 | 2021-02-06 18:00:00 |
| 2 | 2021-03-06 03:30:00+03:00 | 300 | 03:30:00 | 2021-03-06 | 2021-03-06 03:00:00 | 2021-03-06 04:00:00 |
| 3 | 2018-03-02 07:30:00+03:00 | 1000 | 07:30:00 | 2018-03-02 | 2018-03-02 07:00:00 | 2018-03-02 08:00:00 |
| 4 | 2018-01-10 18:15:00+03:00 | 500 | 18:15:00 | 2018-01-10 | 2018-01-10 18:00:00 | 2018-01-10 19:00:00 |
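dt.tz_localize() faces the same problem as pytz's localize() with clock times that are ambiguous or do not exist around daylight-saving changes, and by default it raises an error for them. Its ambiguous and nonexistent parameters control what happens instead, and it accepts a zone name string as well as a pytz timezone. A sketch using US/Eastern, where 02:30 on 14 March 2021 never happened and 01:30 on 7 November 2021 happened twice:

```python
import pandas as pd

times = pd.Series(pd.to_datetime(["2021-03-14 02:30:00", "2021-11-07 01:30:00",
                                  "2021-06-01 12:00:00"]))

# Mark problem times as NaT instead of raising
localized = times.dt.tz_localize("US/Eastern", ambiguous="NaT", nonexistent="NaT")
print(localized.isna().tolist())  # [True, True, False]

# Alternatively, move non-existent times forward to the first valid instant
gap = pd.Series(pd.to_datetime(["2021-03-14 02:30:00"]))
shifted = gap.dt.tz_localize("US/Eastern", nonexistent="shift_forward")
print(shifted.iloc[0])  # 2021-03-14 03:00:00-04:00
```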


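As before, we can then convert the date_time column to any timezone we want, this time using dt.tz_convert(). Only date_time changes: the time, date, floor and ceiling columns were derived earlier from the Moscow times and keep those values.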

In [22]:
df['date_time'] = df['date_time'].dt.tz_convert('US/Eastern')
df.head()

|   | date_time | count | time | date | date_time_hour_floor | date_time_hour_ceiling |
|---|---|---|---|---|---|---|
| 0 | 2021-01-06 06:15:00-05:00 | 100 | 14:15:00 | 2021-01-06 | 2021-01-06 14:00:00 | 2021-01-06 15:00:00 |
| 1 | 2021-02-06 09:45:00-05:00 | 200 | 17:45:00 | 2021-02-06 | 2021-02-06 17:00:00 | 2021-02-06 18:00:00 |
| 2 | 2021-03-05 19:30:00-05:00 | 300 | 03:30:00 | 2021-03-06 | 2021-03-06 03:00:00 | 2021-03-06 04:00:00 |
| 3 | 2018-03-01 23:30:00-05:00 | 1000 | 07:30:00 | 2018-03-02 | 2018-03-02 07:00:00 | 2018-03-02 08:00:00 |
| 4 | 2018-01-10 10:15:00-05:00 | 500 | 18:15:00 | 2018-01-10 | 2018-01-10 18:00:00 | 2018-01-10 19:00:00 |
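If columns such as date should follow the new timezone, they have to be recomputed from the converted column. A minimal sketch on row 2, whose calendar date differs between Moscow and US/Eastern:

```python
from datetime import date

import pandas as pd

moscow = pd.Series(pd.to_datetime(["2021-03-06 03:30:00"])).dt.tz_localize("Europe/Moscow")
eastern = moscow.dt.tz_convert("US/Eastern")

# Same instant in both series
print((moscow.dt.tz_convert("UTC") == eastern.dt.tz_convert("UTC")).all())  # True

# ...but the calendar date derived from each is different
print(moscow.dt.date.iloc[0], eastern.dt.date.iloc[0])  # 2021-03-06 2021-03-05
```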


We can remove the timezone information with dt.tz_localize(None). This keeps the US/Eastern clock times and drops the UTC offset.

In [23]:
df['date_time'] = df['date_time'].dt.tz_localize(None)
df.head()

|   | date_time | count | time | date | date_time_hour_floor | date_time_hour_ceiling |
|---|---|---|---|---|---|---|
| 0 | 2021-01-06 06:15:00 | 100 | 14:15:00 | 2021-01-06 | 2021-01-06 14:00:00 | 2021-01-06 15:00:00 |
| 1 | 2021-02-06 09:45:00 | 200 | 17:45:00 | 2021-02-06 | 2021-02-06 17:00:00 | 2021-02-06 18:00:00 |
| 2 | 2021-03-05 19:30:00 | 300 | 03:30:00 | 2021-03-06 | 2021-03-06 03:00:00 | 2021-03-06 04:00:00 |
| 3 | 2018-03-01 23:30:00 | 1000 | 07:30:00 | 2018-03-02 | 2018-03-02 07:00:00 | 2018-03-02 08:00:00 |
| 4 | 2018-01-10 10:15:00 | 500 | 18:15:00 | 2018-01-10 | 2018-01-10 18:00:00 | 2018-01-10 19:00:00 |
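If you would rather end up with naive values in UTC, dt.tz_convert(None) converts to UTC first and then removes the timezone. A sketch comparing the two approaches on the first row:

```python
import pandas as pd

eastern = (
    pd.Series(pd.to_datetime(["2021-01-06 14:15:00"]))
    .dt.tz_localize("Europe/Moscow")
    .dt.tz_convert("US/Eastern")
)

local_naive = eastern.dt.tz_localize(None)  # keeps the US/Eastern wall-clock time
utc_naive = eastern.dt.tz_convert(None)     # converts to UTC, then drops the zone

print(local_naive.iloc[0])  # 2021-01-06 06:15:00
print(utc_naive.iloc[0])    # 2021-01-06 11:15:00
```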
