# Date and time variables in Pandas

23 March 2019

This notebook collects my current understanding of how to work with date and time variables in Pandas.

References: 
- http://strftime.org/
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
- https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

In [75]:
import pandas as pd

## Intro

Let's start with some Pandas series that consist of strings with dates and times in different formats.

In [3]:
dates1 = pd.Series(['3/5/1972', '24/5/1984', '1/11/1977'])
dates2 = pd.Series(['17 May 1982', '12 June 2016'])
dates3 = pd.Series(['23/5/1984 17:31:07'])
dates4 = pd.Series(['8 November 1948', '3/6/1983'])
dates5 = pd.Series(['12 Nov 1912', '8 July 1723', '3/7/1982', '12   May, 1932'])

Now we convert each of these to a Pandas datetime series using the to_datetime function. We pass the input along with the format in which the datetime values are expressed.

In [4]:
dates1_pd = pd.to_datetime(dates1, format="%d/%m/%Y")  # day, month, year
dates2_pd = pd.to_datetime(dates2, format="%d %B %Y")  # day, month name, year
# day, month, year and hour, minute, second
dates3_pd = pd.to_datetime(dates3, format="%d/%m/%Y %H:%M:%S")
# infer the datetime format based on the structure of the input string
# (note: infer_datetime_format is deprecated as of pandas 2.0, where inference is the default)
dates4_pd = pd.to_datetime(dates4, infer_datetime_format=True, dayfirst=True)
dates5_pd = pd.to_datetime(dates5, infer_datetime_format=True, dayfirst=True)

print(dates1_pd)
print(dates2_pd)
print(dates3_pd)
print(dates4_pd)
print(dates5_pd)

0   1972-05-03
1   1984-05-24
2   1977-11-01
dtype: datetime64[ns]
0   1982-05-17
1   2016-06-12
dtype: datetime64[ns]
0   1984-05-23 17:31:07
dtype: datetime64[ns]
0   1948-11-08
1   1983-06-03
dtype: datetime64[ns]
0   1912-11-12
1   1723-07-08
2   1982-07-03
3   1932-05-12
dtype: datetime64[ns]
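
The same format directives work in both directions. As a rough sketch (the `raw` series and its 'not a date' entry are made up for illustration), here is how unparseable strings can be turned into missing values with `errors="coerce"`, and how parsed values can be written back out as strings with `dt.strftime`:

```python
import pandas as pd

# hypothetical input: two valid day/month/year strings and one bad entry
raw = pd.Series(['3/5/1972', '24/5/1984', 'not a date'])

# errors="coerce" turns entries that do not match the format into NaT
# instead of raising an error
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)

# format the parsed values back into strings using strftime directives
formatted = parsed.dt.strftime("%d %B %Y")
print(formatted)
```

Note that month names produced by `%B` depend on the locale; the sketch assumes an English locale.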


## Working with Timestamps and extracting time information

Rather than writing a datetime as a string and then converting it to a datetime type, we can create a series of datetime values directly using the Timestamp constructor. We can specify the year, month, day, hour, minute and second.

Below we do this using keyword arguments.

In [5]:
dates6 = pd.Series([
    pd.Timestamp(year=1972, month=5, day=3),
    pd.Timestamp(year=1984, month=5, day=24),
    pd.Timestamp(year=1977, month=11, day=1),
])

Here we specify a datetime value another way, without using keyword arguments.

In [6]:
pd.Timestamp(1984, 5, 24, 9, 45, 21)

Timestamp('1984-05-24 09:45:21')
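
A Timestamp can also be built from a string, and its components read back as attributes. A minimal sketch (the variable names are just for illustration):

```python
import pandas as pd

# the same moment, built from a string and from positional arguments
ts_from_string = pd.Timestamp('1984-05-24 09:45:21')
ts_from_parts = pd.Timestamp(1984, 5, 24, 9, 45, 21)

# both constructions give equal Timestamps
print(ts_from_string == ts_from_parts)

# individual components are available as attributes and methods
print(ts_from_string.year, ts_from_string.month, ts_from_string.day)
print(ts_from_string.day_name())
```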

The <i>date_range</i> function allows us to create a sequence of datetime values (a DatetimeIndex) between a start and end date with a specified frequency, e.g., days, months, hours etc. Let's create one consisting of all days between the release of Ghostbusters and today's date.

In [7]:
# use unambiguous ISO dates (year-month-day) for both endpoints
date_range_egon = pd.date_range(start='1984-11-15', end='2019-03-23', freq='D')

In [8]:
date_range_egon

DatetimeIndex(['1984-11-15', '1984-11-16', '1984-11-17', '1984-11-18',
               '1984-11-19', '1984-11-20', '1984-11-21', '1984-11-22',
               '1984-11-23', '1984-11-24',
               ...
               '2019-03-14', '2019-03-15', '2019-03-16', '2019-03-17',
               '2019-03-18', '2019-03-19', '2019-03-20', '2019-03-21',
               '2019-03-22', '2019-03-23'],
              dtype='datetime64[ns]', length=12547, freq='D')
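
Instead of an end date, date_range also accepts a number of periods, and the frequency can be a multiple of a unit. A small sketch, with dates chosen only for illustration:

```python
import pandas as pd

# four dates, seven days apart, starting from a given day
weekly = pd.date_range(start='2019-03-01', periods=4, freq='7D')
print(weekly)

# a daily range includes both endpoints, so its length is the
# number of days between the two dates plus one
daily = pd.date_range(start='1984-11-15', end='2019-03-23', freq='D')
print(len(daily))
```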

We can put this data into a data frame and then extract the day and month names, using the <i>day_name()</i> and <i>month_name()</i> methods.

In [8]:
# convert to Pandas dataframe
df = pd.DataFrame(data={'Date': date_range_egon, 'Day': date_range_egon.day_name(), 
                        'Month': date_range_egon.month_name()})

It's sometimes useful to set the index of the dataframe to the datetime value. This then allows us to slice the dataframe based on an interval of date values.

In [9]:
# set the index to datetime value
df = df.set_index('Date')

# slice the dataframe based on time period (both endpoints are included)
df.loc['1987-12-01':'1987-12-24']

| Date | Day | Month |
|---|---|---|
| 1987-12-01 | Tuesday | December |
| 1987-12-02 | Wednesday | December |
| 1987-12-03 | Thursday | December |
| 1987-12-04 | Friday | December |
| 1987-12-05 | Saturday | December |
| 1987-12-06 | Sunday | December |
| 1987-12-07 | Monday | December |
| 1987-12-08 | Tuesday | December |
| 1987-12-09 | Wednesday | December |
| 1987-12-10 | Thursday | December |


So which day of the week was Ghostbusters released on?

In [10]:
df['1984-11-15':'1984-11-15']

| Date | Day | Month |
|---|---|---|
| 1984-11-15 | Thursday | November |
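
With a datetime index we can also select by a partial date string, such as a whole month, using `.loc`. This is a sketch that rebuilds the dataframe so it runs on its own; `december_1987` and `release_day` are names I've made up here:

```python
import pandas as pd

# rebuild the dataframe of days, indexed by date
days = pd.date_range(start='1984-11-15', end='2019-03-23', freq='D')
df = pd.DataFrame({'Date': days, 'Day': days.day_name(),
                   'Month': days.month_name()}).set_index('Date')

# a year-month string selects every row in that month
december_1987 = df.loc['1987-12']
print(len(december_1987))

# a single Timestamp label plus a column name returns one value
release_day = df.loc[pd.Timestamp('1984-11-15'), 'Day']
print(release_day)
```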


## Using accessors to extract days, years and months

Once a field has been converted to a datetime series, we can extract the day, year, month etc. using the <i>dt</i> accessor. I'm still learning about these. A nice explanation is available here: https://realpython.com/python-pandas-tricks/

In [77]:
# given a series of datetime-like values...
dt1 = pd.to_datetime(dates3, format="%d/%m/%Y %H:%M:%S")

# extract the time components using the dt accessor
print(dt1)
print(dt1.dt.minute)
print(dt1.dt.hour)
print(dt1.dt.day)
print(dt1.dt.day_name())
print(dt1.dt.month)
print(dt1.dt.month_name())
print(dt1.dt.year)
print(dt1.dt.quarter)

0   1984-05-23 17:31:07
dtype: datetime64[ns]
0    31
dtype: int64
0    17
dtype: int64
0    23
dtype: int64
0    Wednesday
dtype: object
0    5
dtype: int64
0    May
dtype: object
0    1984
dtype: int64
0    2
dtype: int64
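
The dt accessor is also handy for adding new columns or filtering rows. A minimal sketch using the first series of dates from the intro (the `df_dates` dataframe and its column names are just for illustration):

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['3/5/1972', '24/5/1984', '1/11/1977']),
                       format="%d/%m/%Y")
df_dates = pd.DataFrame({'date': dates})

# add the year and month as separate columns
df_dates['year'] = df_dates['date'].dt.year
df_dates['month'] = df_dates['date'].dt.month

# keep only the rows that fall in May
in_may = df_dates[df_dates['date'].dt.month == 5]
print(in_may)
```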


## Time differences

Taking the difference between two Timestamp objects returns a Timedelta object.

In [14]:
# the number of days between the date that Ghostbusters was released and today
time1 = pd.Timestamp(2019, 3, 23)
time2 = pd.Timestamp(1984, 11, 15)

# returns a Timedelta object
time_delta1 = (time1 - time2)

In [23]:
# how many days difference? Returns an integer
print(time_delta1.days)

# how many years difference? (approximate: assumes 365-day years, ignoring leap years)
print(time_delta1.days / 365)

12546
34.37260273972603
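
A Timedelta can be expressed in other units by dividing it by another Timedelta, and it can be added to a Timestamp to shift a date. A short sketch (the 100-day shift is an arbitrary example, and 365.25 is only an approximation of the average year length):

```python
import pandas as pd

release = pd.Timestamp(1984, 11, 15)
delta = pd.Timestamp(2019, 3, 23) - release

# dividing by a one-day Timedelta gives the length in days as a float
print(delta / pd.Timedelta(days=1))

# the whole interval in seconds
print(delta.total_seconds())

# a slightly better estimate of years, allowing for leap years on average
print(delta.days / 365.25)

# adding a Timedelta to a Timestamp gives a new Timestamp
shifted = release + pd.Timedelta(days=100)
print(shifted)
```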


## Timezones

Finally, let's look at timezones.

In [78]:
from pytz import all_timezones

In [80]:
# All the different timezones we can use
all_timezones[0:390]

['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau',
 'Africa/Blantyre',
 'Africa/Brazzaville',
 'Africa/Bujumbura',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/Conakry',
 'Africa/Dakar',
 'Africa/Dar_es_Salaam',
 'Africa/Djibouti',
 'Africa/Douala',
 'Africa/El_Aaiun',
 'Africa/Freetown',
 'Africa/Gaborone',
 'Africa/Harare',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Kampala',
 'Africa/Khartoum',
 'Africa/Kigali',
 'Africa/Kinshasa',
 'Africa/Lagos',
 'Africa/Libreville',
 'Africa/Lome',
 'Africa/Luanda',
 'Africa/Lubumbashi',
 'Africa/Lusaka',
 'Africa/Malabo',
 'Africa/Maputo',
 'Africa/Maseru',
 'Africa/Mbabane',
 'Africa/Mogadishu',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Niamey',
 'Africa/Nouakchott',
 'Africa/Ouagadougou',
 'Africa/Porto-Novo',
 'Africa/Sao_Tome',
 'Africa/Timbuktu',
 'Africa/

If it's midday in Adelaide, what time is it in Buenos Aires?

In [83]:
# define the midday time in Adelaide and 'localize' it to the Adelaide timezone
time_adelaide_midday = pd.Timestamp(year=2019, month=3, day=23, hour=12)
time_adelaide_midday = time_adelaide_midday.tz_localize('Australia/Adelaide')

In [85]:
# convert from one timezone to another
time_adelaide_midday_BA = time_adelaide_midday.tz_convert('America/Argentina/Buenos_Aires')
print(time_adelaide_midday_BA)

2019-03-22 22:30:00-03:00
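
The same two steps work on a whole series through the dt accessor: tz_localize attaches a timezone to naive values, and tz_convert then translates them to another timezone. A sketch with two made-up midday times:

```python
import pandas as pd

# naive datetimes (no timezone attached yet)
naive = pd.Series(pd.to_datetime(['2019-03-23 12:00', '2019-03-24 12:00']))

# say that these wall-clock times are in Adelaide
adelaide = naive.dt.tz_localize('Australia/Adelaide')

# express the same moments in UTC
utc = adelaide.dt.tz_convert('UTC')
print(utc)

# a single Timestamp can be given its timezone at construction
midday = pd.Timestamp('2019-03-23 12:00', tz='Australia/Adelaide')
print(midday.tz_convert('UTC'))
```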


In [87]:
# Now suppose we have a series in UTC
times = pd.date_range('2019/02/15', '2019/03/17', tz='UTC')

# What are these values in Adelaide time?
times.tz_convert('Australia/Adelaide')

DatetimeIndex(['2019-02-15 10:30:00+10:30', '2019-02-16 10:30:00+10:30',
               '2019-02-17 10:30:00+10:30', '2019-02-18 10:30:00+10:30',
               '2019-02-19 10:30:00+10:30', '2019-02-20 10:30:00+10:30',
               '2019-02-21 10:30:00+10:30', '2019-02-22 10:30:00+10:30',
               '2019-02-23 10:30:00+10:30', '2019-02-24 10:30:00+10:30',
               '2019-02-25 10:30:00+10:30', '2019-02-26 10:30:00+10:30',
               '2019-02-27 10:30:00+10:30', '2019-02-28 10:30:00+10:30',
               '2019-03-01 10:30:00+10:30', '2019-03-02 10:30:00+10:30',
               '2019-03-03 10:30:00+10:30', '2019-03-04 10:30:00+10:30',
               '2019-03-05 10:30:00+10:30', '2019-03-06 10:30:00+10:30',
               '2019-03-07 10:30:00+10:30', '2019-03-08 10:30:00+10:30',
               '2019-03-09 10:30:00+10:30', '2019-03-10 10:30:00+10:30',
               '2019-03-11 10:30:00+10:30', '2019-03-12 10:30:00+10:30',
               '2019-03-13 10:30:00+10:30', '2019-0