In [1]:
import pandas as pd
import numpy as np
import datetime

# Overview

`pandas` captures four general time-related concepts:

- **Date times:** A specific date and time with timezone support. Similar to `datetime.datetime` from the standard library.
- **Time deltas:** An absolute time duration. Similar to `datetime.timedelta` from the standard library.
- **Time spans:** A span of time defined by a point in time and its associated frequency.
- **Date offsets:** A relative time duration that respects calendar arithmetic. Similar to `dateutil.relativedelta.relativedelta` from the `dateutil` package.
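
As a rough illustration of the four concepts listed above, the sketch below builds one object of each kind. The dates and durations are arbitrary choices made for this example and do not come from the notebook.

```python
import pandas as pd

# Date time: a specific point in time (timezone-aware here).
ts = pd.Timestamp('2020-01-31 09:30', tz='UTC')

# Time delta: an absolute duration.
delta = pd.Timedelta(days=1, hours=12)

# Time span: a whole interval, here the month of January 2020.
span = pd.Period('2020-01')

# Date offset: a relative duration that follows calendar rules.
offset = pd.DateOffset(months=1)

print(ts + delta)    # 2020-02-01 21:30:00+00:00 (exactly 36 hours later)
print(ts + offset)   # 2020-02-29 09:30:00+00:00 (one calendar month, clipped to month end)
print(span)          # 2020-01
```

The difference between the last two additions is the point: a `Timedelta` always adds a fixed amount of elapsed time, while a `DateOffset` moves along the calendar.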

### Parsing time series information from various sources and formats

In [2]:
dti = pd.to_datetime(['1/1/2020', 
                      np.datetime64('2020-01-01'),
                      datetime.datetime(2020, 1, 1)])
   
dti

DatetimeIndex(['2020-01-01', '2020-01-01', '2020-01-01'], dtype='datetime64[ns]', freq=None)

### Generate sequences of fixed-frequency dates and time spans

In [3]:
dti = pd.date_range('2020-01-01', periods=5, freq='H')
dti

DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:00:00',
               '2020-01-01 02:00:00', '2020-01-01 03:00:00',
               '2020-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')

### Manipulating and converting date times with timezone information

In [4]:
dti = dti.tz_localize('UTC')
dti

DatetimeIndex(['2020-01-01 00:00:00+00:00', '2020-01-01 01:00:00+00:00',
               '2020-01-01 02:00:00+00:00', '2020-01-01 03:00:00+00:00',
               '2020-01-01 04:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [5]:
dti.tz_convert('US/Pacific')

DatetimeIndex(['2019-12-31 16:00:00-08:00', '2019-12-31 17:00:00-08:00',
               '2019-12-31 18:00:00-08:00', '2019-12-31 19:00:00-08:00',
               '2019-12-31 20:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq='H')

### Resampling or converting a time series to a particular frequency

In [6]:
idx = pd.date_range('2020-01-01', periods=8, freq='H')

ts = pd.Series(range(len(idx)), index=idx)
ts

2020-01-01 00:00:00    0
2020-01-01 01:00:00    1
2020-01-01 02:00:00    2
2020-01-01 03:00:00    3
2020-01-01 04:00:00    4
2020-01-01 05:00:00    5
2020-01-01 06:00:00    6
2020-01-01 07:00:00    7
Freq: H, dtype: int64

In [7]:
ts.resample('2H').mean()

2020-01-01 00:00:00    0.5
2020-01-01 02:00:00    2.5
2020-01-01 04:00:00    4.5
2020-01-01 06:00:00    6.5
Freq: 2H, dtype: float64

### Performing date and time arithmetic with absolute or relative time increments

In [8]:
friday = pd.Timestamp('2020-01-03')
friday.day_name()

'Friday'

In [9]:
saturday = friday + pd.Timedelta('1 day') # Adding 1 day
saturday.day_name()

'Saturday'

In [10]:
monday = friday + pd.offsets.BDay() # Adding 1 business-day / weekday
monday.day_name()

'Monday'

>pandas represents null date times, time deltas, and time spans as `NaT`, which is useful for representing missing or null date-like values and behaves much as `np.nan` does for float data.

In [11]:
pd.Timestamp(pd.NaT)

NaT
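
To make the comparison with `np.nan` more concrete, here is a small sketch of how `NaT` behaves in comparisons, arithmetic, and missing-value checks. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Like NaN, NaT does not compare equal to itself.
print(pd.NaT == pd.NaT)   # False
print(np.nan == np.nan)   # False

# Arithmetic involving NaT propagates NaT.
print(pd.NaT + pd.Timedelta('1 day'))   # NaT

# Missing entries are detected with the usual helpers.
s = pd.Series(pd.to_datetime(['2020-01-01', None]))
print(s.isna().tolist())  # [False, True]
```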

# 1. Timestamps vs. Time Spans

## Timestamps
Timestamped data is the most basic type of time series data: it associates values with points in time. In pandas, these points in time are represented by `Timestamp` objects.

In [12]:
pd.Timestamp('2020-05-01')

Timestamp('2020-05-01 00:00:00')

In [13]:
pd.Timestamp(2020,3,31,12,59,59)

Timestamp('2020-03-31 12:59:59')

## Timespans
Time spans are handy when a value, such as a change variable, belongs to an interval rather than to a single instant. The span represented by `Period` can be specified explicitly with `freq`, or inferred from the format of the datetime string.

In [14]:
pd.Period('2020-01')

Period('2020-01', 'M')

In [15]:
pd.Period('2020-01', freq='H')

Period('2020-01-01 00:00', 'H')
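
The following sketch tries to show what "a span of time" means in practice: a `Period` has a start and an end, and arithmetic on it moves in whole units of its frequency. The month used is just an example value.

```python
import pandas as pd

month = pd.Period('2020-01')   # frequency inferred as monthly

print(month.start_time)        # first instant covered by the span
print(month.end_time)          # last instant covered by the span

# Adding an integer moves the period by whole units of its frequency.
print(month + 1)               # 2020-02

# A timestamp can be mapped to the period that contains it.
inside = pd.Timestamp('2020-01-15 12:00').to_period('M')
print(inside == month)         # True
```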

>`Timestamp` and `Period` objects can serve as an index. Lists of `Timestamp` and `Period` objects are automatically coerced to `DatetimeIndex` and `PeriodIndex`, respectively.
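
A minimal sketch of that coercion, using made-up values: passing plain Python lists of `Timestamp` or `Period` objects as an index yields the specialised index types.

```python
import pandas as pd

stamps = [pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-02')]
periods = [pd.Period('2020-01'), pd.Period('2020-02')]

by_stamp = pd.Series([10, 20], index=stamps)
by_period = pd.Series([10, 20], index=periods)

print(type(by_stamp.index).__name__)    # DatetimeIndex
print(type(by_period.index).__name__)   # PeriodIndex
```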

# 2. Converting to timestamps
To convert a Series or list-like object of date-like objects (e.g. strings, epochs, or a mixture), use the `to_datetime` function.

When passed a Series, this returns a Series (with the same index), while a list-like is converted to a DatetimeIndex:

In [16]:
pd.to_datetime(pd.Series(['Dec 31, 2019', '2020-01-01', None]))

0   2019-12-31
1   2020-01-01
2          NaT
dtype: datetime64[ns]

In [17]:
pd.to_datetime(['2020/03/30', '2020.3.31', '04-01-2020'])

DatetimeIndex(['2020-03-30', '2020-03-31', '2020-04-01'], dtype='datetime64[ns]', freq=None)

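>If you use dates which start with the day first (i.e. European style), you can pass the `dayfirst` flag. The flag is a preference rather than a strict rule, so strings that are unambiguous in another order are still parsed: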

In [18]:
pd.to_datetime(['2020/03/30', '2020.3.31', '04-01-2020'], dayfirst=True)

DatetimeIndex(['2020-03-30', '2020-03-31', '2020-01-04'], dtype='datetime64[ns]', freq=None)

>You can also use the `DatetimeIndex` constructor directly:

In [19]:
pd.DatetimeIndex(['2020-01-01', '2020-01-03', '2020-01-05'])

DatetimeIndex(['2020-01-01', '2020-01-03', '2020-01-05'], dtype='datetime64[ns]', freq=None)

>The string `'infer'` can be passed as `freq` to set the frequency of the index to the frequency inferred upon creation:

In [20]:
pd.DatetimeIndex(['2020-01-01', '2020-01-03', '2020-01-05'], freq='infer')

DatetimeIndex(['2020-01-01', '2020-01-03', '2020-01-05'], dtype='datetime64[ns]', freq='2D')

## Providing a format argument

In [21]:
pd.to_datetime('2020/03/31', format='%Y/%m/%d')

Timestamp('2020-03-31 00:00:00')

In [22]:
pd.to_datetime('31-03-2020 12:59', format='%d-%m-%Y %H:%M')

Timestamp('2020-03-31 12:59:00')

## Assembling datetime from multiple DataFrame columns

In [23]:
df = pd.DataFrame({'year': [2019, 2020],
                   'month': [3, 4],
                   'day': [31, 1],
                   'hour': [23, 0]})
df

   year  month  day  hour
0  2019      3   31    23
1  2020      4    1     0


In [24]:
pd.to_datetime(df)

0   2019-03-31 23:00:00
1   2020-04-01 00:00:00
dtype: datetime64[ns]

In [25]:
pd.to_datetime(df[['year', 'month', 'day']])

0   2019-03-31
1   2020-04-01
dtype: datetime64[ns]

## Invalid data
The default behavior, `errors='raise'`, is to raise an exception when the input is unparseable:

In [26]:
pd.to_datetime(['2020/03/31', 'asd'])

ParserError: Unknown string format: asd

>Pass `errors='ignore'` to return the original input when unparseable (note that this option is deprecated as of pandas 2.2):

In [27]:
pd.to_datetime(['2020/03/31', 'asd'], errors='ignore')

Index(['2020/03/31', 'asd'], dtype='object')

>Pass `errors='coerce'` to convert unparseable data to `NaT` (not a time):

In [28]:
pd.to_datetime(['2020/03/31', 'asd'], errors='coerce')

DatetimeIndex(['2020-03-31', 'NaT'], dtype='datetime64[ns]', freq=None)
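
One common use of `errors='coerce'` is to locate the inputs that failed to parse. The sketch below assumes the raw strings live in a Series (here named `raw`, a name chosen for this example) and that it contains no genuinely missing values, so every `NaT` in the result marks a parse failure.

```python
import pandas as pd

raw = pd.Series(['2020/03/31', 'asd', '2020/04/01'])

# Parse with an explicit format; anything that fails becomes NaT.
parsed = pd.to_datetime(raw, format='%Y/%m/%d', errors='coerce')

# Rows that failed to parse are the ones that are now NaT.
bad_rows = raw[parsed.isna()]
print(bad_rows.tolist())   # ['asd']
```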

## Epoch timestamps

*pandas* supports converting integer or float epoch times to `Timestamp` and `DatetimeIndex`. The default unit is nanoseconds, since that is how Timestamp objects are stored internally. However, epochs are often stored in another `unit` which can be specified. These are computed from the starting point specified by the `origin` parameter.

In [29]:
pd.Timestamp(1585659540000000000)

Timestamp('2020-03-31 12:59:00')

In [30]:
pd.to_datetime(1585659540000000000)

Timestamp('2020-03-31 12:59:00')

In [31]:
pd.to_datetime(1585659540000, unit='ms')

Timestamp('2020-03-31 12:59:00')

In [32]:
pd.to_datetime(1585659540, unit='s')

Timestamp('2020-03-31 12:59:00')

## From timestamps to epoch
To convert a timestamp back to an epoch value, subtract the epoch (midnight on January 1, 1970 UTC) and then floor-divide by the unit (for example, 1 s):

In [33]:
(pd.to_datetime(1585659540, unit='s') - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

1585659540

In [34]:
(pd.to_datetime('2020-03-31 12:59') - pd.Timestamp('1970-01-01')) // pd.Timedelta('1ns')

1585659540000000000
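
The same subtract-then-floor-divide recipe also works element-wise on a whole `DatetimeIndex`. The sketch below uses two example timestamps one minute apart; the variable names are chosen for this illustration.

```python
import pandas as pd

stamps = pd.to_datetime(['2020-03-31 12:59:00', '2020-03-31 13:00:00'])

# Subtract the epoch, then floor-divide by the target unit.
epoch = pd.Timestamp('1970-01-01')
seconds = (stamps - epoch) // pd.Timedelta('1s')

print(seconds.tolist())   # [1585659540, 1585659600]
```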

## Using the origin Parameter
Using the `origin` parameter, one can specify an alternative starting point for creation of a `DatetimeIndex`. 

In [35]:
pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1900-01-01'))

DatetimeIndex(['1900-01-02', '1900-01-03', '1900-01-04'], dtype='datetime64[ns]', freq=None)

>The default is `origin='unix'`, which corresponds to `1970-01-01 00:00:00`, commonly called the **Unix epoch** or **POSIX time**.
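
As a quick check of how `origin` shifts the starting point, the sketch below converts the same day counts with the default origin and with an explicit one. The year-2000 origin is an arbitrary choice for this example.

```python
import pandas as pd

# With the default origin, day counts are measured from 1970-01-01.
default_origin = pd.to_datetime(1, unit='D')
print(default_origin)    # 1970-01-02 00:00:00

# An explicit origin moves the starting point.
custom = pd.to_datetime([0, 1], unit='D', origin=pd.Timestamp('2000-01-01'))
print(custom[0], custom[1])   # 2000-01-01 00:00:00 2000-01-02 00:00:00
```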

# 3. Generating ranges of timestamps
We can use the `date_range()` and `bdate_range()` functions to create a `DatetimeIndex`. The default frequency for date_range is a *calendar day* while the default for bdate_range is a *business day*:

In [36]:
start = datetime.datetime(2019, 1, 1)
end = datetime.datetime(2020, 12, 31)

index = pd.date_range(start, end)
index

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10',
               ...
               '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
               '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=731, freq='D')

In [37]:
index = pd.bdate_range(start, end)
index

DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
               '2019-01-11', '2019-01-14',
               ...
               '2020-12-18', '2020-12-21', '2020-12-22', '2020-12-23',
               '2020-12-24', '2020-12-25', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=523, freq='B')

>Convenience functions like `date_range` and `bdate_range` can utilize a variety of **frequency aliases**:

In [38]:
pd.date_range(start, periods=20, freq='Q')

DatetimeIndex(['2019-03-31', '2019-06-30', '2019-09-30', '2019-12-31',
               '2020-03-31', '2020-06-30', '2020-09-30', '2020-12-31',
               '2021-03-31', '2021-06-30', '2021-09-30', '2021-12-31',
               '2022-03-31', '2022-06-30', '2022-09-30', '2022-12-31',
               '2023-03-31', '2023-06-30', '2023-09-30', '2023-12-31'],
              dtype='datetime64[ns]', freq='Q-DEC')

In [39]:
pd.date_range(start, periods=1000, freq='SMS')

DatetimeIndex(['2019-01-01', '2019-01-15', '2019-02-01', '2019-02-15',
               '2019-03-01', '2019-03-15', '2019-04-01', '2019-04-15',
               '2019-05-01', '2019-05-15',
               ...
               '2060-04-01', '2060-04-15', '2060-05-01', '2060-05-15',
               '2060-06-01', '2060-06-15', '2060-07-01', '2060-07-15',
               '2060-08-01', '2060-08-15'],
              dtype='datetime64[ns]', length=1000, freq='SMS-15')

>Specifying `start`, `end`, and `periods` generates a range of evenly spaced dates from *start* to *end*, inclusive, with *periods* elements in the resulting `DatetimeIndex`:

In [40]:
pd.date_range('2019-01-01', '2019-01-05', periods=10)

DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 10:40:00',
               '2019-01-01 21:20:00', '2019-01-02 08:00:00',
               '2019-01-02 18:40:00', '2019-01-03 05:20:00',
               '2019-01-03 16:00:00', '2019-01-04 02:40:00',
               '2019-01-04 13:20:00', '2019-01-05 00:00:00'],
              dtype='datetime64[ns]', freq=None)
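
The spacing in the output above can be reproduced by hand: because both endpoints are included, `periods` points leave `periods - 1` equal gaps. A short sketch of that arithmetic, with variable names chosen for this example:

```python
import pandas as pd

start = pd.Timestamp('2019-01-01')
end = pd.Timestamp('2019-01-05')
periods = 10

idx = pd.date_range(start, end, periods=periods)

# Both endpoints are included, so there are periods - 1 equal gaps.
step = (end - start) / (periods - 1)
print(step)               # 0 days 10:40:00
print(idx[1] - idx[0])    # 0 days 10:40:00
```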

## Custom frequency ranges
`bdate_range` can also generate a range of custom frequency dates by using the `weekmask` and `holidays` parameters. These parameters will only be used if a custom frequency string is passed.

In [41]:
weekmask = 'Mon Wed Fri'
# Note: these 2011 holidays fall outside the 2019-2020 range, so they exclude nothing here.
holidays = [datetime.datetime(2011, 1, 5), datetime.datetime(2011, 3, 14)]

pd.bdate_range(start, end, freq='C', weekmask=weekmask, holidays=holidays)

DatetimeIndex(['2019-01-02', '2019-01-04', '2019-01-07', '2019-01-09',
               '2019-01-11', '2019-01-14', '2019-01-16', '2019-01-18',
               '2019-01-21', '2019-01-23',
               ...
               '2020-12-09', '2020-12-11', '2020-12-14', '2020-12-16',
               '2020-12-18', '2020-12-21', '2020-12-23', '2020-12-25',
               '2020-12-28', '2020-12-30'],
              dtype='datetime64[ns]', length=313, freq='C')

In [42]:
pd.bdate_range(start, end, freq='CBMS', weekmask=weekmask)

DatetimeIndex(['2019-01-02', '2019-02-01', '2019-03-01', '2019-04-01',
               '2019-05-01', '2019-06-03', '2019-07-01', '2019-08-02',
               '2019-09-02', '2019-10-02', '2019-11-01', '2019-12-02',
               '2020-01-01', '2020-02-03', '2020-03-02', '2020-04-01',
               '2020-05-01', '2020-06-01', '2020-07-01', '2020-08-03',
               '2020-09-02', '2020-10-02', '2020-11-02', '2020-12-02'],
              dtype='datetime64[ns]', freq='CBMS')
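
To see the `holidays` parameter actually remove a date, the holiday has to fall inside the requested range. The sketch below uses a hypothetical holiday on Wednesday 2019-01-02 (chosen only for illustration) and compares the result with and without it.

```python
import pandas as pd

weekmask = 'Mon Wed Fri'
# Hypothetical holiday chosen to fall on a Wednesday inside the range.
holidays = [pd.Timestamp('2019-01-02')]

without = pd.bdate_range('2019-01-01', '2019-01-11', freq='C',
                         weekmask=weekmask)
with_holiday = pd.bdate_range('2019-01-01', '2019-01-11', freq='C',
                              weekmask=weekmask, holidays=holidays)

print(without.strftime('%Y-%m-%d').tolist())
# ['2019-01-02', '2019-01-04', '2019-01-07', '2019-01-09', '2019-01-11']
print(with_holiday.strftime('%Y-%m-%d').tolist())
# ['2019-01-04', '2019-01-07', '2019-01-09', '2019-01-11']
```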

# 4. Timestamp limitations
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:

In [43]:
pd.Timestamp.min

Timestamp('1677-09-21 00:12:43.145225')

In [44]:
pd.Timestamp.max

Timestamp('2262-04-11 23:47:16.854775807')
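
The figure of roughly 584 years follows from simple arithmetic on the number of nanoseconds a 64-bit integer can count. The sketch below works this out, assuming a year of 365.25 days, and shows one possible workaround for dates outside the window: representing them as `Period` objects instead of nanosecond timestamps.

```python
import pandas as pd

# 2**64 distinct nanosecond values, expressed in years of 365.25 days
# (the year length is an assumption made for this estimate).
NS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1e9
years = 2**64 / NS_PER_YEAR
print(round(years, 1))     # 584.5

# Dates outside that window can still be represented as periods.
far_future = pd.Period('3000-01-01', freq='D')
print(far_future.year)     # 3000
```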

# 5. Indexing
One of the main uses for `DatetimeIndex` is as an index for pandas objects. The `DatetimeIndex` class contains many time series related optimizations:

- A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice).

- Fast shifting using the `shift` and `tshift` methods on pandas objects (`tshift` is deprecated in newer pandas releases in favor of `shift` with a `freq` argument).

- Unioning of overlapping `DatetimeIndex` objects with the same frequency is very fast (important for fast data alignment).

- Quick access to date fields via properties such as year, month, etc.

- Regularization functions like `snap` and very fast `asof` logic.
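
As a small illustration of the "quick access to date fields" item in the list above, the sketch below reads a few fields from a short `DatetimeIndex`. The dates are example values chosen to straddle a month boundary.

```python
import pandas as pd

idx = pd.date_range('2020-01-30', periods=4, freq='D')

print(idx.year.tolist())        # [2020, 2020, 2020, 2020]
print(idx.month.tolist())       # [1, 1, 2, 2]
print(idx.day_name().tolist())  # ['Thursday', 'Friday', 'Saturday', 'Sunday']

# The same fields are reachable on a Series through the .dt accessor.
s = pd.Series(idx)
print(s.dt.day.tolist())        # [30, 31, 1, 2]
```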

## Partial string indexing
Dates and strings that parse to timestamps can be passed as indexing parameters:

In [45]:
rng = pd.date_range(start, end, freq='BM')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2019-01-31    0.095329
2019-02-28    0.365400
2019-03-29    0.651379
2019-04-30   -1.135346
2019-05-31   -0.827473
2019-06-28    0.478391
2019-07-31   -0.543468
2019-08-30   -0.228812
2019-09-30   -0.088337
2019-10-31   -0.917577
2019-11-29    0.055996
2019-12-31   -1.899579
2020-01-31    0.552926
2020-02-28    0.774582
2020-03-31   -0.411508
2020-04-30    0.447038
2020-05-29    1.417899
2020-06-30    0.385781
2020-07-31   -0.154286
2020-08-31   -0.260893
2020-09-30   -0.675569
2020-10-30   -0.562471
2020-11-30   -1.229840
2020-12-31   -0.629252
Freq: BM, dtype: float64

In [46]:
ts['1/31/2019']

0.09532866302675405

In [47]:
ts['12/01/2019':'6/30/2020']

2019-12-31   -1.899579
2020-01-31    0.552926
2020-02-28    0.774582
2020-03-31   -0.411508
2020-04-30    0.447038
2020-05-29    1.417899
2020-06-30    0.385781
Freq: BM, dtype: float64

In [48]:
ts['2020']

2020-01-31    0.552926
2020-02-28    0.774582
2020-03-31   -0.411508
2020-04-30    0.447038
2020-05-29    1.417899
2020-06-30    0.385781
2020-07-31   -0.154286
2020-08-31   -0.260893
2020-09-30   -0.675569
2020-10-30   -0.562471
2020-11-30   -1.229840
2020-12-31   -0.629252
Freq: BM, dtype: float64

In [49]:
ts['2019-7']

2019-07-31   -0.543468
Freq: BM, dtype: float64

In [50]:
dft = pd.DataFrame(np.random.randn(100000, 1), columns=['A'],
                   index=pd.date_range('20200101', periods=100000, freq='T'))
    
dft

                            A
2020-01-01 00:00:00  1.861497
2020-01-01 00:01:00  0.136712
2020-01-01 00:02:00  0.541990
2020-01-01 00:03:00 -0.676198
2020-01-01 00:04:00 -0.799406
...                       ...
2020-03-10 10:35:00 -0.914937
2020-03-10 10:36:00 -0.687969
2020-03-10 10:37:00 -0.387948
2020-03-10 10:38:00  0.555134


In [51]:
dft['2020-1':'2020-2']

                            A
2020-01-01 00:00:00  1.861497
2020-01-01 00:01:00  0.136712
2020-01-01 00:02:00  0.541990
2020-01-01 00:03:00 -0.676198
2020-01-01 00:04:00 -0.799406
...                       ...
2020-02-29 23:55:00  2.194334
2020-02-29 23:56:00  0.983598
2020-02-29 23:57:00  0.834255
2020-02-29 23:58:00  0.353407


In [52]:
dft['2020-2':'2020-2-29 23:01']

                            A
2020-02-01 00:00:00  0.111047
2020-02-01 00:01:00  0.242512
2020-02-01 00:02:00  0.701159
2020-02-01 00:03:00 -0.933691
2020-02-01 00:04:00 -0.207295
...                       ...
2020-02-29 22:57:00 -0.109458
2020-02-29 22:58:00 -0.950517
2020-02-29 22:59:00  0.702719
2020-02-29 23:00:00 -0.716263


## Slice vs. exact match
The same string used as an indexing parameter can be treated either as a slice or as an exact match, depending on the resolution of the index. If the string is less precise than the index, it is treated as a slice; otherwise it is treated as an exact match.

In [53]:
series_minute = pd.Series([1, 2, 3],
                           pd.DatetimeIndex(['2011-12-31 23:59:00',
                                             '2012-01-01 00:00:00',
                                             '2012-01-01 00:02:00']))
   
series_minute.index.resolution

'minute'

In [54]:
# A timestamp string less precise than a minute gives a Series object.
series_minute['2011-12-31 23']

2011-12-31 23:59:00    1
dtype: int64

In [55]:
# A timestamp string with minute resolution (or finer) gives a scalar instead; it is not cast to a slice.
series_minute['2011-12-31 23:59:00']

1

>Note that `DatetimeIndex` resolution cannot be less precise than day:

In [56]:
series_monthly = pd.Series([1, 2, 3], pd.DatetimeIndex(['2011-12', '2012-01', '2012-02'])) 

series_monthly.index.resolution

'day'

## Exact indexing
Indexing with `Timestamp` or `datetime` objects is exact, because the objects have exact meaning. 
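
A small sketch of the difference, using a made-up Series with two entries per day: a day-resolution string used as a slice bound covers the whole day, whereas a `Timestamp` for the same date means exactly midnight.

```python
import pandas as pd

s = pd.Series(range(4),
              index=pd.date_range('2020-01-01', periods=4, freq='12h'))

# A day-resolution string covers the whole of 2020-01-02.
by_string = s.loc[:'2020-01-02']
print(len(by_string))      # 4

# A Timestamp means exactly midnight, so the 12:00 entry is excluded.
by_timestamp = s.loc[:pd.Timestamp('2020-01-02')]
print(len(by_timestamp))   # 3
```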