# Based on: [Time series / date functionality](https://pandas.pydata.org/docs/user_guide/timeseries.html)

In [1]:
import pandas as pd
import numpy as np
import datetime

# 1. Overview

`pandas` captures four general time-related concepts:

- **Date times:** 
  - *specific date and time* with *timezone* support
  - similar to `datetime.datetime`.
- **Time deltas:**
  - *absolute time duration*
  - similar to `datetime.timedelta`.
- **Time spans:**
  - *span of time*, defined by a point in time and its associated frequency.
- **Date offsets:**
  - *relative time duration* that respects *calendar arithmetic*
  - similar to `dateutil.relativedelta.relativedelta` from the `dateutil` package.

*Null* date times, time deltas, and time spans are represented as `NaT`.
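
As a minimal sketch of the four concepts above, the snippet below builds one object of each kind. The variable names and dates are illustrative only; the point is the contrast between an absolute `Timedelta` and a calendar-aware `DateOffset`.

```python
import pandas as pd

# Date time: a specific instant (illustrative value).
ts = pd.Timestamp("2022-01-31 09:30")

# Time delta: an absolute duration of exactly 24 hours.
one_day = pd.Timedelta(days=1)

# Time span: the whole of January 2022, not a single instant.
january = pd.Period("2022-01", freq="M")

# Date offset: a calendar-aware step of one month.
one_month = pd.DateOffset(months=1)

print(ts + one_day)    # 2022-02-01 09:30:00
print(ts + one_month)  # 2022-02-28 09:30:00 (clamped to the last day of February)
print(january.start_time, january.end_time)
print(pd.isna(pd.NaT))  # True
```

Adding the offset to 31 January lands on 28 February because the offset follows the calendar, whereas the time delta always moves by a fixed amount of time.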


## Timestamps vs. Time Spans

Timestamps **associate values with points in time**. Single values are instances of `Timestamp`, and sequences are instances of `DatetimeIndex`.

Time spans **represent periods**, and are better suited to values that describe a change over an interval rather than a state at an instant. Scalars are instances of `Period`, and sequences are instances of `PeriodIndex`.
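
The distinction can be sketched with two small Series. The numbers are made up for illustration; only the index types matter here.

```python
import pandas as pd

# Point-in-time observations (hypothetical readings), indexed by timestamps.
readings = pd.Series(
    [20.1, 21.4, 19.8],
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
)

# Span-based values (hypothetical monthly totals), indexed by the month itself.
monthly_totals = pd.Series(
    [310.0, 295.5],
    index=pd.PeriodIndex(["2022-01", "2022-02"], freq="M"),
)

print(type(readings.index).__name__)           # DatetimeIndex
print(type(readings.index[0]).__name__)        # Timestamp
print(type(monthly_totals.index).__name__)     # PeriodIndex
print(type(monthly_totals.index[0]).__name__)  # Period
```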


# 2. Converting to Timestamps

In most cases, use [`pd.to_datetime`][dt]. It offers more parsing options; for example, set `dayfirst=True` for dates written with the day first.

`pd.Timestamp` can be used for scalars, and `pd.DatetimeIndex` for indices.

[dt]: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas.to_datetime

In [2]:
pd.to_datetime(["20220101", "2022-01-02", "2022/01/03"])

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03'], dtype='datetime64[ns]', freq=None)

In [3]:
pd.to_datetime(["01-01-22", "02/01/22"], dayfirst=True)

DatetimeIndex(['2022-01-01', '2022-01-02'], dtype='datetime64[ns]', freq=None)

In [4]:
pd.Timestamp("2022-01-01")

Timestamp('2022-01-01 00:00:00')

In [5]:
pd.DatetimeIndex(["20220101", "2022-01-02", "2022/01/03"], freq="infer")

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03'], dtype='datetime64[ns]', freq='D')

## 2.1 Specifying a format

Passing `format` makes parsing explicit instead of inferred, and can speed up conversion.

In [6]:
pd.to_datetime(["01012022", "02012022", "03012022"], format="%d%m%Y")

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03'], dtype='datetime64[ns]', freq=None)
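
To illustrate why an explicit `format` matters, the sketch below parses the same ambiguous string with two different formats, then shows that a value not matching the format is rejected. The input strings are illustrative.

```python
import pandas as pd

raw = "01022022"  # ambiguous without a format

day_first = pd.to_datetime(raw, format="%d%m%Y")    # 1 February 2022
month_first = pd.to_datetime(raw, format="%m%d%Y")  # 2 January 2022

# A value that does not match the format raises instead of being guessed at.
try:
    pd.to_datetime("2022-01-02", format="%d%m%Y")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(day_first, month_first, mismatch_rejected)
```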

## 2.2 Assembling datetime from multiple DataFrame columns

A DataFrame of `int` or `str` values can be assembled into a datetime Series.

The columns must be named `year`, `month`, and `day`; `hour`, `minute`, `second`, `millisecond`, `microsecond`, and `nanosecond` are optional.

In [7]:
df = pd.DataFrame(
    [[1, 1, 2022, 7, 1], [2, 1, 2022, 7, 0], [3, 1, 2022, 7, 2]],
    columns=["day", "month", "year", "hour", "minute"],
)
df

   day  month  year  hour  minute
0    1      1  2022     7       1
1    2      1  2022     7       0
2    3      1  2022     7       2


In [8]:
pd.to_datetime(df)

0   2022-01-01 07:01:00
1   2022-01-02 07:00:00
2   2022-01-03 07:02:00
dtype: datetime64[ns]
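
A small sketch of the same assembly with `str` values and only the three required columns. The frame `parts` is a hypothetical example; the second half shows that leaving out a required column fails.

```python
import pandas as pd

# Hypothetical input: date parts stored as strings.
parts = pd.DataFrame(
    {"year": ["2022", "2022"], "month": ["1", "2"], "day": ["15", "28"]}
)
dates = pd.to_datetime(parts)
print(dates)

# Without the required `day` column, assembly raises a ValueError.
try:
    pd.to_datetime(parts[["year", "month"]])
    missing_day_rejected = False
except ValueError:
    missing_day_rejected = True
```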

## 2.3 Invalid data

- `errors="raise"` (default) throws an error when input can't be parsed.
- `errors="ignore"` returns the original input (deprecated since pandas 2.2; catch the exception explicitly instead).
- `errors="coerce"` converts unparseable input to `NaT`.

In [9]:
try:
    pd.to_datetime(["01-01-2022", "abcd"])
except Exception as error:
    print(repr(error))

ParserError('Unknown string format: %s', 'abcd')


In [10]:
pd.to_datetime(["01-01-2022", "abcd"], errors="ignore")

Index(['01-01-2022', 'abcd'], dtype='object')

In [11]:
pd.to_datetime(["01-01-2022", "abcd"], errors="coerce")

DatetimeIndex(['2022-01-01', 'NaT'], dtype='datetime64[ns]', freq=None)
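
One common use of `errors="coerce"` is to locate the unparseable rows afterwards. The following is a sketch with made-up input and an explicit `format`, not the only way to do it.

```python
import pandas as pd

# Hypothetical raw column with one bad entry.
raw = pd.Series(["01-01-2022", "abcd", "03-01-2022"])
parsed = pd.to_datetime(raw, errors="coerce", format="%d-%m-%Y")

# Rows that failed to parse but were not missing in the input.
bad_rows = raw[parsed.isna() & raw.notna()]
print(bad_rows.tolist())  # ['abcd']
```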

## 2.4 Epoch timestamps

`int` or `float` epoch times can be converted to `Timestamp` and `DatetimeIndex`. The `unit` parameter states the resolution of the input; the default is nanoseconds, matching how timestamps are stored internally.

The starting point is set by the `origin` parameter. The default, `origin='unix'`, is `1970-01-01 00:00:00`, commonly called the *Unix epoch*, the reference point of *POSIX time*.

Setting `tz` in epoch timestamps raises an error. Localize after creation.

> **NOTE:** Converting `float` epoch times can give inaccurate results. Python floats carry about 15 significant decimal digits, so rounding is unavoidable when converting a float to a nanosecond-precision `Timestamp`.
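
The rounding described in the note can be seen by passing a nanosecond epoch value through a float. The value below is illustrative; the exact size of the error depends on the number chosen.

```python
import pandas as pd

# Illustrative value: nanoseconds since the Unix epoch, with a 123456789 ns fraction.
exact_ns = 1_655_000_000_123_456_789

# A round trip through a 64-bit float cannot keep all 19 digits.
via_float = int(float(exact_ns))

ts_exact = pd.Timestamp(exact_ns)
ts_float = pd.Timestamp(via_float)

print(exact_ns - via_float)  # non-zero: low-order digits were lost
print(ts_exact)
print(ts_float)
```

Keeping epoch values as integers avoids this loss.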

In [12]:
pd.to_datetime([1655000000, 1656000000, 1657000000], unit='s')

DatetimeIndex(['2022-06-12 02:13:20', '2022-06-23 16:00:00',
               '2022-07-05 05:46:40'],
              dtype='datetime64[ns]', freq=None)

In [13]:
pd.to_datetime([1, 2, 3], origin=pd.Timestamp('01-01-2022'))

DatetimeIndex(['2022-01-01 00:00:00.000000001',
               '2022-01-01 00:00:00.000000002',
               '2022-01-01 00:00:00.000000003'],
              dtype='datetime64[ns]', freq=None)
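
Combining `origin` with a coarser `unit` is often more useful than the nanosecond default. As a sketch, the hypothetical `day_counts` below are interpreted as days elapsed since 1 January 2022.

```python
import pandas as pd

# Hypothetical input: days elapsed since a custom origin.
day_counts = [0, 1, 30]
dates = pd.to_datetime(day_counts, unit="D", origin=pd.Timestamp("2022-01-01"))
print(dates)
```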

In [14]:
pd.Timestamp(1655000000000000000).tz_localize("Africa/Nairobi")

Timestamp('2022-06-12 02:13:20+0300', tz='Africa/Nairobi')
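
Note that `tz_localize` treats the naive timestamp as wall-clock time in the given zone. If the epoch value is instead meant as a UTC instant, one alternative (sketched here) is to parse with `utc=True` and then convert with `tz_convert`.

```python
import pandas as pd

epoch_seconds = 1655000000

# Interpret the value as wall-clock time in Nairobi.
as_wall_time = pd.to_datetime(epoch_seconds, unit="s").tz_localize("Africa/Nairobi")

# Interpret the value as a UTC instant, then express it in Nairobi time.
as_utc_instant = pd.to_datetime(epoch_seconds, unit="s", utc=True).tz_convert(
    "Africa/Nairobi"
)

print(as_wall_time)    # 2022-06-12 02:13:20+03:00
print(as_utc_instant)  # 2022-06-12 05:13:20+03:00
```

The two results are three hours apart, which is Nairobi's offset from UTC.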

## 2.5 From Timestamp to epoch

Subtract the *epoch* (usually midnight on January 1, 1970 UTC), then floor divide by the `unit`.

In [15]:
stamp = pd.Timestamp(1655000000, unit='s')
stamp

Timestamp('2022-06-12 02:13:20')

In [16]:
(stamp - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

1655000000
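
The same subtraction works on a whole `DatetimeIndex`, and the divisor selects the output unit. A short sketch with illustrative timestamps:

```python
import pandas as pd

stamps = pd.to_datetime(["2022-06-12 02:13:20", "2022-06-23 16:00:00"])
unix_epoch = pd.Timestamp("1970-01-01")

# Floor division by a Timedelta yields integer counts in that unit.
epoch_seconds = (stamps - unix_epoch) // pd.Timedelta("1s")
epoch_millis = (stamps - unix_epoch) // pd.Timedelta("1ms")

print(epoch_seconds.tolist())  # [1655000000, 1656000000]
print(epoch_millis.tolist())   # [1655000000000, 1656000000000]
```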