Following on from my last post, where I discussed working with strings in Python Pandas, today I am going to discuss working with dates.

Dates are one of the more frustrating elements that analysts and data people have to deal with. They are full of info but can be tricky to process.

In this blog we will cover the basics.


### The treasure trove of date/times

From a datetime we can get the year, month, day, hour, minute and second. Each of these components can be represented as a string or as a number.

The amount of insight you can draw from these components is what makes date and time data so valuable. So let's get started.

### Representing dates

One of the issues with dates is the number of different ways they are represented. Examples include:

- YYYY/MM/DD
- DD/MM/YYYY
- MM/DD/YYYY

But there are far more ways that people represent dates.
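As a rough sketch of why the layout matters, the same calendar day can be parsed from each of the three layouts above with the standard library's `datetime.strptime`. The sample strings and format codes here are illustrative assumptions, not taken from a real dataset.

```python
from datetime import datetime

# The same calendar day written in the three layouts listed above.
# Keys are sample strings, values are the matching strptime format codes.
samples = {
    "2021/12/24": "%Y/%m/%d",  # YYYY/MM/DD
    "24/12/2021": "%d/%m/%Y",  # DD/MM/YYYY
    "12/24/2021": "%m/%d/%Y",  # MM/DD/YYYY
}

# Parse each string with its own format and keep only the date part.
parsed = [datetime.strptime(text, fmt).date() for text, fmt in samples.items()]

# All three strings resolve to the same date.
print(parsed)
print(len(set(parsed)) == 1)
```

Note that a string such as 01/02/2021 is ambiguous without knowing the layout, which is why it helps to state the format explicitly.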

First we need to look at how Python's datetime module works before we look at the Pandas functions.

### Aware and Naive Objects

Datetimes can be aware or naive. Aware datetimes carry timezone (tz) information, including daylight saving time, while naive datetimes do not. Because of the extra complexity of aware datetimes, I will focus on naive datetimes in this blog.
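To make the distinction concrete, here is a minimal sketch using only the standard library. The choice of UTC as the timezone is an assumption made purely for illustration.

```python
from datetime import datetime, timezone

# A naive datetime: no timezone information attached.
naive = datetime(2021, 12, 24, 17, 24, 20)

# An aware datetime: the same wall-clock time, tagged as UTC (illustrative choice).
aware = datetime(2021, 12, 24, 17, 24, 20, tzinfo=timezone.utc)

print(naive.tzinfo)       # None, so this object is naive
print(aware.tzinfo)       # the attached timezone, so this object is aware
print(aware.utcoffset())  # offset from UTC as a timedelta
```

Python will not let you compare or subtract a naive and an aware datetime directly, which is one reason to stick with one kind throughout an analysis.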

### Datetime module

We are going to be working with Python's datetime module. We will use four of the classes it provides:
- datetime.date
- datetime.time
- datetime.datetime
- datetime.timedelta

Let's look at them one at a time.

## Working with Dates in Python

The date class in the datetime module uses the Gregorian calendar and gives access to the year, month and day. Let's make a date object.

In [24]:
import pandas as pd
from datetime import date

d = date(2021,12,24)
print(d)
print(type(d))

2021-12-24
<class 'datetime.date'>


### Extracting features from datetime

Python makes it very easy to extract the year, month and day.

In [14]:
print(f'day {d.day}')
print(f'month {d.month}')
print(f'year {d.year}')

day 24
month 12
year 2021


## Working with Times in Python

The time class also comes from Python's datetime module. Let's see how it works.

In [15]:
from datetime import time

t = time(17,24,20)
print(t)
print(type(t))

17:24:20
<class 'datetime.time'>


We can extract the hours, minutes and seconds from the time.

In [16]:
print(f'hours {t.hour}')
print(f'minutes {t.minute}')
print(f'seconds {t.second}')

hours 17
minutes 24
seconds 20


## Datetime in Python

Python also lets you combine a date and a time into one object. Let's see how.

In [17]:
from datetime import datetime

d1 = datetime(2021,12,24,17,24,20)
print(d1)
print(type(d1))

2021-12-24 17:24:20
<class 'datetime.datetime'>


## Separate out date and time from datetime

In [18]:
print(d1.date())
print(d1.time())

2021-12-24
17:24:20


## Get day of the week

There are two methods that can be used, weekday() and isoweekday(). weekday() returns an integer from 0 (Monday) to 6 (Sunday). isoweekday() returns an integer from 1 (Monday) to 7 (Sunday).

In [19]:
print(d1.weekday())
print(d1.isoweekday())

4
5


## Get the week number

In [20]:
d1.isocalendar()[1]

51
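The index `[1]` works because `isocalendar()` returns three values: the ISO year, the ISO week number and the ISO weekday. A small sketch that unpacks all three for the same datetime (the variable names are my own):

```python
from datetime import datetime

d1 = datetime(2021, 12, 24, 17, 24, 20)

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
iso_year, iso_week, iso_weekday = d1.isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2021 51 5

# The ISO weekday is the same value that isoweekday() returns.
print(iso_weekday == d1.isoweekday())   # True
```

Be aware that the ISO year can differ from the calendar year for dates in the first or last few days of a year.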

## Timedeltas in Python

Timedeltas are a way of storing an amount of time, e.g. 2 hours, or 3 days and 1 hour.

There are a number of applications where timedeltas can be used. One that comes to mind is the length of an exam.

To help understand the interplay between timedeltas and dates:-
- date + timedelta = date
- date - date = timedelta
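A minimal sketch of that relationship, using two made-up dates: subtracting one date from another gives a timedelta, and adding a timedelta to a date gives a date. (Adding two dates together is not supported and raises a TypeError.)

```python
from datetime import date, timedelta

start = date(2021, 12, 24)
end = date(2021, 12, 31)

# date - date gives a timedelta
gap = end - start
print(gap)       # 7 days, 0:00:00
print(gap.days)  # 7

# date + timedelta gives a date
print(start + timedelta(days=7) == end)  # True
```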

In [21]:
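# let's take a date and add a delta of 7 days
# note: importing the module here rebinds the name datetime,
# replacing the datetime class imported in the earlier cell
import datetime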
tday = datetime.datetime(2021,12,24)
tdelta = datetime.timedelta(days=7)

tday + tdelta



datetime.datetime(2021, 12, 31, 0, 0)

Let's make a timedelta of 2 hours and 30 minutes.

In [22]:
datetime.timedelta(hours=2, minutes=30)

datetime.timedelta(seconds=9000)

# Datetime in Pandas

Now let's look at using datetime in Pandas.

The main function for converting dates to datetimes is to_datetime().

You will note that the type of the result is a Timestamp, which is the Pandas equivalent of datetime.

### to_datetime()

In [25]:
date = pd.to_datetime('24th of December 2021')
print(date)
print(type(date))


2021-12-24 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>


The to_datetime function can convert a large number of different string date formats, but not all of them. For the rest, the format argument can be used.

In [63]:
# date = pd.to_datetime('24~12~2021')
# print(date)


The above code, if uncommented, will raise an error because the date uses tildes as separators, which Pandas does not recognise. So let's pass the format= argument.

The format codes can be found on this page:
https://docs.python.org/3/library/datetime.html

In [30]:
date = pd.to_datetime('24~12~2021', format='%d~%m~%Y')
print(date)

2021-12-24 00:00:00
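The same format argument also works on a whole Series, and the format codes can be used in the other direction with strftime to turn datetimes back into strings. A small sketch with made-up sample values:

```python
import pandas as pd

# Hypothetical sample values using the same tilde-separated layout as above.
raw = pd.Series(["24~12~2021", "25~12~2021", "01~01~2022"])

# Parse the whole Series in one call by stating the layout explicitly.
parsed = pd.to_datetime(raw, format="%d~%m~%Y")
print(parsed)

# strftime goes the other way: datetime -> string in a chosen layout.
formatted = parsed.dt.strftime("%Y/%m/%d").tolist()
print(formatted)  # ['2021/12/24', '2021/12/25', '2022/01/01']
```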


### to_timedelta

Timedeltas can be used to represent a duration, e.g. 1 day 3 hours 35 minutes.

To convert values to timedeltas you can use the to_timedelta() function, or you can build a single value with pd.Timedelta.

Timedelta objects are displayed as days, hours, minutes and seconds.

Let's create a Timedelta object.

In [50]:
td = pd.Timedelta('2 days 3 hours 10 minutes 5 seconds')
td

Timedelta('2 days 03:10:05')
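pd.Timedelta builds a single value. For a whole column of durations, pd.to_timedelta does the conversion in one call. A minimal sketch, assuming a made-up Series of duration strings:

```python
import pandas as pd

# Hypothetical durations written as strings.
durations = pd.Series(["1 days 03:35:00", "2 hours", "45 minutes"])

# Convert the whole Series to timedeltas in one call.
deltas = pd.to_timedelta(durations)
print(deltas)

# A number plus a unit also works, here 90 minutes.
ninety = pd.to_timedelta(90, unit="min")
print(ninety)  # 0 days 01:30:00
```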

Timedelta objects can be added to or subtracted from each other or a datetime object. Let's add a day to today's date.

In [48]:
datetime.datetime.now()+pd.to_timedelta(1, unit='D')

datetime.datetime(2021, 12, 29, 17, 10, 49, 740176)

### Extracting components from a timedelta object

It is possible to get the number of seconds or days from the Timedelta object. The seconds attribute returns only the time part expressed in seconds (here 3 hours 10 minutes 5 seconds), not including whole days. The days attribute returns the number of whole days.
Note that if you want the full timedelta, i.e. days and time together expressed in seconds, you need to use the total_seconds() method.

In [54]:
print(td.seconds)
print(td.days)
print(td.total_seconds())

11405
2
184205.0
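If you want each unit separately rather than a single number of seconds, a Pandas Timedelta also has a components attribute. The sketch below rebuilds the same Timedelta and checks how the pieces relate to seconds and total_seconds().

```python
import pandas as pd

td = pd.Timedelta("2 days 3 hours 10 minutes 5 seconds")

# components splits the duration into one field per unit.
parts = td.components
print(parts.days, parts.hours, parts.minutes, parts.seconds)  # 2 3 10 5

# seconds covers only the time part, not the whole days.
time_part = parts.hours * 3600 + parts.minutes * 60 + parts.seconds
print(td.seconds == time_part)  # True

# total_seconds() covers the days as well.
print(td.total_seconds() == td.days * 86400 + td.seconds)  # True
```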


In [62]:
# let's test to see how many seconds are in an hour
td2 = pd.Timedelta('1 hour')
print(f' seconds in an hour = {td2.total_seconds()}')
print((td2.total_seconds()/60)/60)

 seconds in an hour = 3600.0
1.0


### The dt accessor

Once you have a datetime series in Pandas you can use the dt accessor to work on the whole series in a vectorised manner, meaning you do not need to iterate through the rows. The dt accessor is similar to the str accessor I talked about in my last blog.

Let's import the taxis dataset.

In [None]:
import seaborn as sns
import pandas as pd

taxis = sns.load_dataset('taxis')
taxis.head(3)

Unnamed: 0,pickup,dropoff,passengers,distance,fare,tip,tolls,total,color,payment,pickup_zone,dropoff_zone,pickup_borough,dropoff_borough
0,2019-03-23 20:21:09,2019-03-23 20:27:24,1,1.6,7.0,2.15,0.0,12.95,yellow,credit card,Lenox Hill West,UN/Turtle Bay South,Manhattan,Manhattan
1,2019-03-04 16:11:55,2019-03-04 16:19:00,1,0.79,5.0,0.0,0.0,9.3,yellow,cash,Upper West Side South,Upper West Side South,Manhattan,Manhattan
2,2019-03-27 17:53:01,2019-03-27 18:00:25,1,1.37,7.5,2.36,0.0,14.16,yellow,credit card,Alphabet City,West Village,Manhattan,Manhattan


In [None]:
taxis.dtypes[0:2]

pickup     object
dropoff    object
dtype: object

The dates are not being read in as dates. If you are reading data in with read_csv you can use parse_dates. But in this case we are going to use the to_datetime function.

In [None]:
taxis['pickup'] = pd.to_datetime(taxis['pickup'])
taxis['dropoff'] = pd.to_datetime(taxis['dropoff'])

In [None]:
taxis.dtypes[0:2]

pickup     datetime64[ns]
dropoff    datetime64[ns]
dtype: object
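For comparison, the parse_dates route mentioned earlier could look something like this. The CSV text is a tiny made-up sample in the same shape as the taxis data, read from memory so the example needs no file on disk.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a real file (illustrative data only).
csv_text = """pickup,dropoff,fare
2019-03-23 20:21:09,2019-03-23 20:27:24,7.0
2019-03-04 16:11:55,2019-03-04 16:19:00,5.0
"""

# parse_dates tells read_csv which columns to convert while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["pickup", "dropoff"])
print(df.dtypes)
```

The date columns come back as datetime columns straight away, so no separate to_datetime step is needed.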

So now that we have our datetime columns we are ready to use the dt accessor. The dt accessor exposes a large number of attributes, including:-
- day
- date
- time
- minute
- second
- day_of_week
- day_of_year
- and the list continues.
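Here is a short sketch of a few of these attributes in action. It uses a small hand-built frame with the three pickup and dropoff times shown earlier, so it runs without downloading the dataset. The trip duration at the end is my own illustration of combining two datetime columns.

```python
import pandas as pd

# The first three pickup/dropoff times from the taxis data shown above.
trips = pd.DataFrame({
    "pickup": pd.to_datetime([
        "2019-03-23 20:21:09", "2019-03-04 16:11:55", "2019-03-27 17:53:01",
    ]),
    "dropoff": pd.to_datetime([
        "2019-03-23 20:27:24", "2019-03-04 16:19:00", "2019-03-27 18:00:25",
    ]),
})

# day_of_week: Monday is 0 and Sunday is 6.
weekdays = trips["pickup"].dt.day_of_week.tolist()
print(weekdays)  # [5, 0, 2]

# hour of the day for each pickup.
hours = trips["pickup"].dt.hour.tolist()
print(hours)     # [20, 16, 17]

# Subtracting two datetime columns gives a timedelta column,
# which has its own dt accessor.
seconds = (trips["dropoff"] - trips["pickup"]).dt.total_seconds().tolist()
print(seconds)   # [375.0, 425.0, 444.0]
```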


The day attribute returns the day of the month. You will see it matches the dataframe we loaded above.

In [None]:
taxis['dropoff'].dt.day.head(3)

0    23
1     4
2    27
Name: dropoff, dtype: int64

The time attribute returns the time. Again you will see it matches the dataframe we loaded above.

In [None]:
taxis['dropoff'].dt.time.head(3)

0    20:27:24
1    16:19:00
2    18:00:25
Name: dropoff, dtype: object

Hopefully this gives you a good start working with datetime-like objects in Python Pandas.

Note that if you need to use timezones, the Python docs suggest using the pytz library, as it holds a database of timezones. For more info, see this link:

https://docs.python.org/3.2/library/datetime.html?highlight=datetime

That is all for today. I hope you enjoyed it.