<div align="center"> <h1>Dates and times in Python and Pandas</h1>
    <h2><a href="...">Richard Leibrandt</a></h2>
</div>

The standard-library modules 'datetime' and 'time' let us work with dates and times in Python. The Pandas package also provides classes and methods dedicated to dates and times.

In [1]:
import datetime as dt
import time as tm
import pandas as pd
import numpy as np

now = dt.datetime.now()  # current local date and time

In [2]:
print(now)
print(now.year)

2021-12-02 21:01:17.294541
2021


```timedelta``` objects make it easy to add a duration to a date or subtract one from it.

In [3]:
fivedays = dt.timedelta(days=5)

In [4]:
now-fivedays

datetime.datetime(2021, 11, 27, 21, 1, 17, 294541)
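
Subtraction also works between two datetime objects, in which case the result is itself a ```timedelta```. The following is a minimal sketch with two made-up, fixed dates (so the result does not depend on when it is run):

```python
import datetime as dt

# Two fixed example dates (hypothetical values chosen for illustration)
start = dt.datetime(2021, 11, 27, 21, 0, 0)
end = dt.datetime(2021, 12, 2, 9, 30, 0)

# Subtracting two datetimes gives a timedelta
elapsed = end - start
print(elapsed)                  # 4 days, 12:30:00
print(elapsed.days)             # whole days only: 4
print(elapsed.total_seconds())  # full duration in seconds: 390600.0

# Adding a timedelta shifts a datetime forward
shifted = start + dt.timedelta(days=5, hours=3)
print(shifted)                  # 2021-12-03 00:00:00
```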

Dates are easily converted to strings and strings with dates are easily parsed.

The following symbols are used in formatting dates and times:

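- Year: `%Y` (4 digits), `%y` (2 digits; when parsing, 69-99 -> 1969-1999 and 00-68 -> 2000-2068).
- Month: `%m` (2 digits), `%b` (abbreviated name in current locale), `%B` (full name in current locale).
- Day: `%d` (2 digits), `%e` (leading space instead of leading zero; platform-dependent)
- Hour: `%H` (24-hour clock, 2 digits)
- Minutes: `%M`
- Seconds: `%S` (integer seconds), `%f` (microseconds, 6 digits)
- Time zone: `%Z` (as name, e.g. UTC or CST), `%z` (as offset from UTC, e.g. +0800); both are empty for datetimes without time zone information
- Non-digits: literal characters in the format string (e.g. `-`, `:` or `,`) are copied to the output by `strftime` and must match the input when parsing; Python has no directives for skipping arbitrary characters.
- Shortcuts (handed to the platform's C library by `strftime`, so availability may vary, e.g. on Windows):
       - `%D` = `%m/%d/%y`
       - `%F` = `%Y-%m-%d`
       - `%R` = `%H:%M`
       - `%T` = `%H:%M:%S`
       - `%x` = the locale's date representation (e.g. `12/02/21` in the default C locale)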
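
A few of these codes can be tried on a fixed timestamp. The sketch below uses a made-up value and only directives documented for Python (including `%f` for microseconds), so it should not depend on the platform or locale:

```python
import datetime as dt

stamp = dt.datetime(2021, 12, 2, 21, 1, 17, 294541)  # fixed example value

iso_like = stamp.strftime('%Y-%m-%d %H:%M:%S')
compact = stamp.strftime('%y%m%d')
with_micro = stamp.strftime('%H:%M:%S.%f')
print(iso_like)    # 2021-12-02 21:01:17
print(compact)     # 211202
print(with_micro)  # 21:01:17.294541

# %Z and %z produce empty strings unless the datetime carries a time zone
aware = stamp.replace(tzinfo=dt.timezone.utc)
with_zone = aware.strftime('%Y-%m-%d %H:%M %Z %z')
print(with_zone)   # 2021-12-02 21:01 UTC +0000
```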


In [5]:
dt.datetime.strftime(now, '%F %T')

'2021-12-02 21:01:17'

In [6]:
now.strftime('%b, %d. %Y')

'Dec, 02. 2021'

The inverse operation (the shortcuts do not work with ```strptime```):

In [7]:
onedate = dt.datetime.strptime('2016-07-04 14:45:31', '%Y-%m-%d %H:%M:%S')
onedate

datetime.datetime(2016, 7, 4, 14, 45, 31)

In [8]:
type(onedate)

datetime.datetime
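
Formatting and parsing with the same format string are inverse operations, and a string that does not match the format is rejected with a ```ValueError```. A small sketch, with a format and input chosen only for this example:

```python
import datetime as dt

fmt = '%d/%m/%Y %H:%M'      # format chosen for this example
text = '04/07/2016 14:45'   # made-up input in that format

parsed = dt.datetime.strptime(text, fmt)
print(parsed)                        # 2016-07-04 14:45:00
print(parsed.strftime(fmt) == text)  # True: formatting restores the input

# A string that does not match the format raises ValueError
try:
    dt.datetime.strptime('2016-07-04', fmt)
    failed = False
except ValueError as err:
    failed = True
    print('could not parse:', err)
```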

In Pandas, it's also easy to parse dates into datetime objects.

In [9]:
import urllib.request
urllib.request.urlretrieve("https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/Garch.csv", "Exchange_rates.csv")
rates=pd.read_csv("Exchange_rates.csv")
rates.head()

|   | Unnamed: 0 | date | day | dm | ddm | bp | cd | dy | sf |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 800102 | wednesday | 0.5861 | NaN | 2.249 | 0.8547 | 0.004206 | 0.6365 |
| 1 | 2 | 800103 | thursday | 0.5837 | -0.004103 | 2.2365 | 0.8552 | 0.004187 | 0.6357 |
| 2 | 3 | 800104 | friday | 0.5842 | 0.000856 | 2.241 | 0.8566 | 0.004269 | 0.6355 |
| 3 | 4 | 800107 | monday | 0.5853 | 0.001881 | 2.2645 | 0.8538 | 0.004315 | 0.6373 |
| 4 | 5 | 800108 | tuesday | 0.5824 | -0.004967 | 2.256 | 0.8553 | 0.004257 | 0.6329 |


In [10]:
rates = rates[["date", "dm", "bp"]].copy()  # keep only these three columns; copy() avoids SettingWithCopyWarning
rates["date"] = pd.to_datetime(rates["date"], format="%y%m%d")  # parse yymmdd values such as 800102
rates.head()

|   | date | dm | bp |
|---|---|---|---|
| 0 | 1980-01-02 | 0.5861 | 2.249 |
| 1 | 1980-01-03 | 0.5837 | 2.2365 |
| 2 | 1980-01-04 | 0.5842 | 2.241 |
| 3 | 1980-01-07 | 0.5853 | 2.2645 |
| 4 | 1980-01-08 | 0.5824 | 2.256 |


The 'date' column now has the ```datetime64[ns]``` dtype.

In [11]:
rates.dtypes

date    datetime64[ns]
dm             float64
bp             float64
dtype: object
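
Real data often contains entries that do not match the expected format. The sketch below uses a small made-up Series in the same yymmdd layout and shows one way to handle this with ```errors='coerce'```, followed by the ```.dt``` accessor for extracting date components:

```python
import pandas as pd

# Small made-up sample in the same yymmdd layout as the 'date' column
raw = pd.Series(['800102', '800103', 'not a date'])

# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(raw, format='%y%m%d', errors='coerce')
print(parsed.isna().sum())   # 1 entry could not be parsed

# The .dt accessor exposes date components for the whole column
valid = parsed.dropna()
print(valid.dt.year.tolist())        # [1980, 1980]
print(valid.dt.day_name().tolist())  # ['Wednesday', 'Thursday']
```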

To use Pandas' time-based features, such as selecting by partial date strings and resampling, we set the 'date' column as the index.

In [12]:
rates=rates.set_index("date")

In [13]:
rates.index

DatetimeIndex(['1980-01-02', '1980-01-03', '1980-01-04', '1980-01-07',
               '1980-01-08', '1980-01-09', '1980-01-10', '1980-01-11',
               '1980-01-14', '1980-01-15',
               ...
               '1987-05-08', '1987-05-11', '1987-05-12', '1987-05-13',
               '1987-05-14', '1987-05-15', '1987-05-18', '1987-05-19',
               '1987-05-20', '1987-05-21'],
              dtype='datetime64[ns]', name='date', length=1867, freq=None)

### Time slicing and subsetting

Once we have a DataFrame with a 'DatetimeIndex', we can easily select periods of time using partial date strings. Slicing also works with datetime objects.

We can select one year:

In [14]:
rates.loc["1980"].head()

| date | dm | bp |
|---|---|---|
| 1980-01-02 | 0.5861 | 2.249 |
| 1980-01-03 | 0.5837 | 2.2365 |
| 1980-01-04 | 0.5842 | 2.241 |
| 1980-01-07 | 0.5853 | 2.2645 |
| 1980-01-08 | 0.5824 | 2.256 |


In [15]:
rates.loc["1980"].tail()

| date | dm | bp |
|---|---|---|
| 1980-12-24 | 0.5144 | 2.37 |
| 1980-12-26 | 0.5128 | 2.379 |
| 1980-12-29 | 0.5121 | 2.373 |
| 1980-12-30 | 0.5095 | 2.387 |
| 1980-12-31 | 0.5062 | 2.3875 |


We can select two months; both ends of the slice are included:

In [16]:
rates["1980-01":"1980-02"].head()

| date | dm | bp |
|---|---|---|
| 1980-01-02 | 0.5861 | 2.249 |
| 1980-01-03 | 0.5837 | 2.2365 |
| 1980-01-04 | 0.5842 | 2.241 |
| 1980-01-07 | 0.5853 | 2.2645 |
| 1980-01-08 | 0.5824 | 2.256 |


In [17]:
rates["1980-01":"1980-02"].tail()

| date | dm | bp |
|---|---|---|
| 1980-02-25 | 0.567 | 2.27 |
| 1980-02-26 | 0.5668 | 2.279 |
| 1980-02-27 | 0.5683 | 2.284 |
| 1980-02-28 | 0.567 | 2.283 |
| 1980-02-29 | 0.5614 | 2.258 |
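
Slicing with datetime objects, mentioned at the start of this section, can be sketched as follows. The example uses a made-up daily series (the names `demo`, `window` and `february` are only for illustration) rather than the exchange-rate data:

```python
import datetime as dt
import pandas as pd

# Made-up daily series standing in for the exchange-rate table
idx = pd.date_range('1980-01-01', '1980-03-31', freq='D')
demo = pd.DataFrame({'value': range(len(idx))}, index=idx)

# Slice with datetime objects; both endpoints are included
start = dt.datetime(1980, 1, 15)
end = dt.datetime(1980, 2, 15)
window = demo.loc[start:end]
print(window.index.min(), window.index.max(), len(window))

# A partial string selects a whole period, here all of February 1980
february = demo.loc['1980-02']
print(len(february))  # 29 (1980 was a leap year)
```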


### Resampling

```resample``` is a very useful method in Pandas. It works like a time-based ```groupby```: rows are grouped into bins of a given frequency and each bin is then aggregated.
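
The relationship to ```groupby``` can be made concrete with a small made-up series: resampling by week and grouping with a weekly ```pd.Grouper``` should produce the same bins. This is only an illustrative sketch, not part of the exchange-rate analysis:

```python
import pandas as pd

# Made-up values on ten consecutive days starting Tuesday 1980-01-01
idx = pd.date_range('1980-01-01', periods=10, freq='D')
demo = pd.DataFrame({'value': range(10)}, index=idx)

# Weekly bins via resample (weeks end on Sunday by default) ...
weekly = demo.resample('W').mean()

# ... and the same bins via groupby with a time-based Grouper
grouped = demo.groupby(pd.Grouper(freq='W')).mean()

print(weekly)
print(grouped)
```

Each bin is labelled with the Sunday that ends the week, which is why the weekly results for the exchange rates start at 1980-01-06.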

The mean values by week can be calculated like this:

In [18]:
rates.resample('W').mean().head()

| date | dm | bp |
|---|---|---|
| 1980-01-06 | 0.584667 | 2.242167 |
| 1980-01-13 | 0.58274 | 2.26 |
| 1980-01-20 | 0.58014 | 2.2789 |
| 1980-01-27 | 0.57686 | 2.2737 |
| 1980-02-03 | 0.57512 | 2.2659 |


The sum over 2-day periods:

In [19]:
rates.resample('2D').sum().head()

| date | dm | bp |
|---|---|---|
| 1980-01-02 | 1.1698 | 4.4855 |
| 1980-01-04 | 0.5842 | 2.241 |
| 1980-01-06 | 0.5853 | 2.2645 |
| 1980-01-08 | 1.1658 | 4.521 |
| 1980-01-10 | 1.1626 | 4.5145 |
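
One thing to watch with ```sum```: a bin that contains no observations (for example a 2-day bin that falls entirely on a weekend) sums to 0 by default, which can be mistaken for real data. The sketch below, using made-up values, shows how the ```min_count``` argument can mark such bins as missing instead:

```python
import pandas as pd

# Made-up values with a gap: nothing recorded on 5 and 6 January
idx = pd.to_datetime(['1980-01-03', '1980-01-04', '1980-01-07'])
demo = pd.Series([1.0, 2.0, 4.0], index=idx)

plain = demo.resample('2D').sum()              # empty bin becomes 0.0
strict = demo.resample('2D').sum(min_count=1)  # empty bin becomes NaN

print(plain)
print(strict)
```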


As with ordinary aggregation, several statistics can be computed at once with ```agg```:

In [20]:
rates.resample('2W').agg(['mean', 'min', 'max']).head()  # string names avoid the deprecation warning for NumPy functions

| date | dm mean | dm min | dm max | bp mean | bp min | bp max |
|---|---|---|---|---|---|---|
| 1980-01-06 | 0.584667 | 0.5837 | 0.5861 | 2.242167 | 2.2365 | 2.249 |
| 1980-01-20 | 0.58144 | 0.5787 | 0.5853 | 2.26945 | 2.2505 | 2.2835 |
| 1980-02-03 | 0.57599 | 0.5733 | 0.5778 | 2.2698 | 2.253 | 2.278 |
| 1980-02-17 | 0.57498 | 0.5735 | 0.5774 | 2.302 | 2.2895 | 2.315 |
| 1980-03-02 | 0.568333 | 0.5614 | 0.5729 | 2.276222 | 2.258 | 2.286 |


In [21]:
rates.resample('6M').agg({'dm': 'min', 'bp': 'max'}).head()  # 'M' = month end; pandas 2.2+ prefers the alias '6ME'

| date | dm | bp |
|---|---|---|
| 1980-01-31 | 0.574 | 2.2835 |
| 1980-07-31 | 0.506 | 2.396 |
| 1981-01-31 | 0.4681 | 2.4545 |
| 1981-07-31 | 0.4035 | 2.353 |
| 1982-01-31 | 0.3883 | 1.966 |
