#### <img src=images/logo.png align='right' width=250>

# Time Data with Pandas

## Goal

There are various ways to work with time series data. Notably, the Pandas library, which is the core data manipulation and analysis library for Python, has some *amazing* utilities for dealing with time data. The goal of this notebook is to familiarise ourselves with how Pandas can be used to work with time data.

## Program
- [Time Stamps](#timestamps)
    - [<mark>Exercise: Time Stamps</mark>](#ex)
- [Formatting Timestamps](#f)
    - [<mark>Exercise: Convert a string to a timestamp</mark>](#ex-conv)
- [Timezones](#tz)
- [Summary](#con)



In [1]:
import pandas as pd
import numpy as np

<a id='timestamps'></a>

## Timestamps in Pandas
![footer_logo](images/clock.jpeg)

In pandas, specific times are represented as **timestamps**.

Pandas can create datetime data from strings formatted as `'yyyy-mm-ddThh:mm:ss.fff'` (date, then `'T'`, then time with optional fractional seconds) using `pd.Timestamp()`.

In [3]:
type('2022-09-30T18:34:56')

str

In [5]:
print(pd.Timestamp('2022-09-30T18:34:56'))

2022-09-30 18:34:56


The date units are years (`'Y'`), months (`'M'`), weeks (`'W'`), and days (`'D'`), while the time units are hours (`'h'`) in 24-hour format, minutes (`'m'`), seconds (`'s'`), and milliseconds (`'ms'`). Note that the date part and the time part of the string are joined with `'T'`. A Timestamp is the pandas equivalent of Python's `datetime.datetime` and is interchangeable with it in most cases.
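As a small illustration of that interchangeability, the sketch below checks that a Timestamp is itself a `datetime.datetime` and converts between the two. The variable names are only for illustration.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp('2022-09-30T18:34:56')

# pd.Timestamp is a subclass of datetime.datetime
print(isinstance(ts, datetime))  # True

# Convert explicitly to a plain datetime when another library needs one
py_dt = ts.to_pydatetime()
print(py_dt)  # 2022-09-30 18:34:56

# A Timestamp can also be built from a datetime object
round_trip = pd.Timestamp(datetime(2022, 9, 30, 18, 34, 56))
print(round_trip == ts)  # True
```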

Pandas Timestamps support a wide range of [operations](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html).

We can access various attributes of the date...

In [6]:
pd.Timestamp('2022-02-20T18:34:56').weekofyear

7

In [7]:
pd.Timestamp('2022-02-20T18:34:56').quarter

1

In [8]:
pd.Timestamp('2022-02-20T18:34:56').month

2

Use specific time-related methods...

In [9]:
pd.Timestamp('2023-02-20T18:34:56').month_name()

'February'

In [10]:
pd.Timestamp('2023-02-20T18:34:56').day_name()

'Monday'

In [11]:
pd.Timestamp('2023-02-20T18:34:56').replace(year=2020)

Timestamp('2020-02-20 18:34:56')

Find differences in time between timestamps...

In [14]:
(pd.Timestamp('2023-02-20T18:34:56') - pd.Timestamp('2020-02-18T18:24:32')).days

1098
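Subtracting two Timestamps returns a `pd.Timedelta`, and `.days` is only one of its attributes. The sketch below, reusing the same two dates, shows a few other ways to read the difference.

```python
import pandas as pd

delta = pd.Timestamp('2023-02-20T18:34:56') - pd.Timestamp('2020-02-18T18:24:32')

print(type(delta).__name__)      # Timedelta
print(delta.days)                # 1098

# .components splits the difference into days, hours, minutes, seconds, ...
print(delta.components.minutes)  # 10
print(delta.components.seconds)  # 24

# total_seconds() gives the whole difference as a single number
print(delta.total_seconds())     # 94867824.0

# A Timedelta can also be added to a Timestamp
one_week_later = pd.Timestamp('2023-02-20T18:34:56') + pd.Timedelta(days=7)
print(one_week_later)            # 2023-02-27 18:34:56
```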

<a id='ex'></a>
### <mark>Exercise: Investigate the timestamp features and methods</mark>

We've seen a few examples, but let's investigate further.
- What day of the year is it today?
- Are we in a leap year?
- How long is it until a Public Holiday (e.g. Christmas)?


In [15]:
current_date = pd.Timestamp('today')
current_date

Timestamp('2022-12-12 08:49:08.270905')

In [16]:
current_date.dayofyear

346

In [17]:
current_date.day_name()

'Monday'

In [19]:
current_date.is_leap_year

False

In [20]:
(pd.Timestamp('2022-12-25')- current_date).days

12

<a id='f'></a>
## Formatting timestamps with Pandas

Pandas can recognize timestamps written in various common formats with `pd.to_datetime`, even if the data contains a mix of formats (in pandas 2.0 and later, mixed formats in one call require passing `format='mixed'`):

In [21]:
pd.to_datetime(['2022-03-27','April 1, 2022', '25/05/2022'])


DatetimeIndex(['2022-03-27', '2022-04-01', '2022-05-25'], dtype='datetime64[ns]', freq=None)

*A `DatetimeIndex` is an array of Timestamps that also carries metadata about them, such as the dtype and frequency.*
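To make this concrete, here is a minimal sketch showing that each element of a `DatetimeIndex` is a Timestamp and that the attributes and methods seen earlier apply to the whole index at once. All the strings use the same ISO format so that the example does not depend on format inference.

```python
import pandas as pd

idx = pd.to_datetime(['2022-03-27', '2022-04-01', '2022-05-25'])

# Each element is a Timestamp
print(type(idx[0]).__name__)    # Timestamp

# Attributes and methods are applied element-wise
print(idx.month.tolist())       # [3, 4, 5]
print(idx.day_name().tolist())  # ['Sunday', 'Friday', 'Wednesday']

# Metadata carried by the index
print(idx.freq)                 # None
```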

Pandas will try to recognize the right format in a sequence of timestamps, though in some cases it may not read the data as you intended. Below, the same strings are read month-first by default and day-first with `dayfirst=True`:

In [22]:
pd.to_datetime(['02/01/2022','03/01/2022'])

DatetimeIndex(['2022-02-01', '2022-03-01'], dtype='datetime64[ns]', freq=None)

In [27]:
pd.to_datetime(['02/01/2022','03/01/2022'], dayfirst=True)

DatetimeIndex(['2022-01-02', '2022-01-03'], dtype='datetime64[ns]', freq=None)

To avoid such confusion, the expected format can be given explicitly with the `format` argument. We recommend always doing this rather than relying on Pandas to guess the format for you. This way you can easily deal with even quite unconventional formats:

In [24]:
pd.to_datetime('02__2023__01', format='%m__%Y__%d')

Timestamp('2023-02-01 00:00:00')

In [25]:
pd.to_datetime('02__2023__01-01-0203', format='%m__%Y__%d-%H-%M%S')

Timestamp('2023-02-01 01:02:03')
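The same `%` directives also work in the other direction. As a brief sketch, `strftime` turns a Timestamp back into a string in whatever layout is needed; the output layouts below are arbitrary choices for illustration.

```python
import pandas as pd

ts = pd.to_datetime('02__2023__01-01-0203', format='%m__%Y__%d-%H-%M%S')

# Write the Timestamp out in a more conventional layout
print(ts.strftime('%Y/%m/%d %H:%M:%S'))   # 2023/02/01 01:02:03

# Using the original format string reproduces the input
print(ts.strftime('%m__%Y__%d-%H-%M%S'))  # 02__2023__01-01-0203
```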

<a id='ex-conv'></a>

### <mark>Exercise: Convert a string to a timestamp</mark>

Use formatting to get a Timestamp from the string below.

In [28]:
weird_date = 'YEAR2023MONTH02DAY21'

In [29]:
pd.to_datetime(weird_date, format='YEAR%YMONTH%mDAY%d')

Timestamp('2023-02-21 00:00:00')

Another benefit of a fixed format: we want to know when individual dates are unexpectedly in the wrong format. With an explicit `format`, Pandas raises an error for each such case instead of silently guessing.
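The sketch below illustrates this with a made-up list in which the second entry does not match the format. It also shows the `errors='coerce'` option of `pd.to_datetime`, which marks unparseable entries as `NaT` instead of raising, for cases where failing outright is not desirable.

```python
import pandas as pd

# Hypothetical input: the second entry does not match the expected format
dates = ['21/02/2023', '2023-02-21']

try:
    pd.to_datetime(dates, format='%d/%m/%Y')
except ValueError as err:
    print('Parsing failed:', err)

# errors='coerce' replaces entries that do not match with NaT
parsed = pd.to_datetime(dates, format='%d/%m/%Y', errors='coerce')
print(parsed)
```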

<a id='tz'></a>
## Timezones

We can include a [time zone](https://cheat.readthedocs.io/en/latest/python/timezones.html) with our timestamps.

In [30]:
current_date = pd.Timestamp('today')
current_date

Timestamp('2022-12-12 09:11:21.204907')

A time zone can be attached with the `tz` argument, and a timezone-aware Timestamp can be converted to another time zone using `.tz_convert('tz_name')`:

In [31]:
current_date = pd.Timestamp('today', tz='Europe/Amsterdam')
current_date

Timestamp('2022-12-12 10:11:40.033739+0100', tz='Europe/Amsterdam')

In [32]:
current_date.tz_convert('Asia/Tokyo')

Timestamp('2022-12-12 18:11:40.033739+0900', tz='Asia/Tokyo')
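A point worth checking for yourself: `tz_convert` changes how a moment is displayed, not the moment itself. The sketch below uses a fixed time instead of `'today'` so that the result is reproducible, and also shows that a Timestamp without a time zone must first be localized with `tz_localize`.

```python
import pandas as pd

amsterdam = pd.Timestamp('2022-12-12 10:11:40', tz='Europe/Amsterdam')
tokyo = amsterdam.tz_convert('Asia/Tokyo')

print(tokyo)                        # 2022-12-12 18:11:40+09:00
print(amsterdam == tokyo)           # True: same instant, different wall-clock time
print(amsterdam.tz_convert('UTC'))  # 2022-12-12 09:11:40+00:00

# A naive Timestamp (no time zone) cannot be converted directly
naive = pd.Timestamp('2022-12-12 10:11:40')
try:
    naive.tz_convert('Asia/Tokyo')
except TypeError as err:
    print('tz_convert failed:', err)

# Localize it first, then convert
print(naive.tz_localize('Europe/Amsterdam').tz_convert('Asia/Tokyo'))
```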

Pandas also handles daylight saving time.

For example, notice that London time in winter has the same zero offset as GMT…

In [33]:
pd.Timestamp('2023-1-1 00:00').tz_localize('Europe/London')

Timestamp('2023-01-01 00:00:00+0000', tz='Europe/London')

In [34]:
pd.Timestamp('2023-1-1 00:00').tz_localize('Etc/GMT')

Timestamp('2023-01-01 00:00:00+0000', tz='Etc/GMT')

but London moves one hour ahead of GMT during daylight saving time in summer…

In [36]:
pd.Timestamp('2023-6-1 00:00').tz_localize('Europe/London')

Timestamp('2023-06-01 00:00:00+0100', tz='Europe/London')

In [37]:
pd.Timestamp('2023-6-1 00:00').tz_localize('Etc/GMT')

Timestamp('2023-06-01 00:00:00+0000', tz='Etc/GMT')
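The offsets can also be read programmatically rather than from the printed representation. A minimal sketch using `utcoffset()` and `dst()`, which Timestamps inherit from `datetime.datetime`:

```python
import pandas as pd

winter = pd.Timestamp('2023-01-01 00:00').tz_localize('Europe/London')
summer = pd.Timestamp('2023-06-01 00:00').tz_localize('Europe/London')

# Offset from UTC at each moment
print(winter.utcoffset())  # 0:00:00
print(summer.utcoffset())  # 1:00:00

# The part of the offset that is due to daylight saving time
print(summer.dst())        # 1:00:00

# Midnight in London in summer is 23:00 the previous day in UTC
print(summer.tz_convert('UTC'))  # 2023-05-31 23:00:00+00:00
```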

<a id='con'></a>
## Conclusion

The pandas library has some great ways of working with datetime fields. So far we have looked at `pandas.Timestamp`:
- How Pandas can work with specific times (Timestamps).
- How to format Timestamps and convert between timezones.