# Time Zones

Why are time zones important?
* Daylight saving time
* Human behaviour due to business hours, circadian rhythm, etc.
* Daily aggregations need time zones, especially when comparing across time zones or aggregating over business hours.
* Comparing data from different time zones
  * Have you ever missed a meeting (or a train) because of a time zone mix-up? At least one of us has 😅

Luckily, both Pandas and Pandera offer a lot of functionality for dealing with time zones.

## Time zone recap

* Timestamps (datetimes) may be timezone-aware or timezone-naive.
* `UTC` is a widely accepted timezone standard. It's almost the same as GMT.
* Timezones are specified as `UTC+/-<offset>` or by name, e.g. `Europe/Prague`.
* Named timezones generally do not have a fixed offset, because:
  * daylight saving time shifts it twice a year (the EU proposed ending the clock changes in 2021, but they are still in place),
  * the offset may change over time, not always by a full hour.
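
To make the last point concrete, here is a small sketch comparing a fixed-offset timezone with a named one. It uses only pandas and the standard library; the dates are arbitrary illustrations.

```python
from datetime import timedelta, timezone

import pandas as pd

# A fixed-offset timezone: always UTC+01:00, regardless of the date.
fixed = timezone(timedelta(hours=1))
fixed_winter = pd.Timestamp("2018-01-01 12:00", tz=fixed)
fixed_summer = pd.Timestamp("2018-07-01 12:00", tz=fixed)

# A named timezone: the offset depends on the date because of DST.
prague_winter = pd.Timestamp("2018-01-01 12:00", tz="Europe/Prague")
prague_summer = pd.Timestamp("2018-07-01 12:00", tz="Europe/Prague")

print(fixed_winter.utcoffset(), fixed_summer.utcoffset())    # 1:00:00 1:00:00
print(prague_winter.utcoffset(), prague_summer.utcoffset())  # 1:00:00 2:00:00
```

The same wall-clock time in `Europe/Prague` is one hour ahead of UTC in January and two hours ahead in July, which is why a name carries more information than a bare offset.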

## Pandas timezone basics

* `tz_localize` - convert timezone-naive to timezone-aware.
* `tz_convert` - convert timezone-aware to another timezone (or, with `None`, to timezone-naive UTC).
* `.dt` accessor for `Series` based operations (i.e. on columns too) - access datetime properties, e.g. `.dt.tz` to access timezone.
  * `.dt` is not needed for operations on datetime indexes.

See [Pandas timezone handling section](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-zone-handling) for more details.


In [None]:
import pandas as pd

In [None]:
# lowercase "h": the uppercase "H" alias is deprecated since pandas 2.2
dti = pd.date_range("2018-01-01", periods=3, freq="h")
dti


In [None]:
dti_utc = dti.tz_localize("UTC")
dti_utc

In [None]:
dti_us = dti_utc.tz_convert("US/Pacific")
dti_us
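
The cells above operate on a `DatetimeIndex` directly. For a `Series` (and therefore a dataframe column), the same operations go through the `.dt` accessor. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

# A timezone-naive Series of timestamps (e.g. a dataframe column).
times = pd.Series(pd.date_range("2018-01-01", periods=3, freq="h"))

# Same two steps as for the index, but via the .dt accessor.
times_utc = times.dt.tz_localize("UTC")
times_prague = times_utc.dt.tz_convert("Europe/Prague")

print(times.dt.tz)                     # None
print(times_prague.dt.tz)              # Europe/Prague
print(times_prague.dt.hour.tolist())   # [1, 2, 3]
```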

You can also use `tz_convert` and `tz_localize` on `DataFrame`s, but you may need to specify the axis and / or index level.
This is particularly useful when working with multi-indexes, as you will see later.

In [None]:
pd.DataFrame({"A": range(len(dti_us))}, index=dti_us).tz_convert("Asia/Tokyo", axis="index")
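
With a multi-index, `tz_convert` needs to know which level holds the timestamps. The sketch below assumes a hypothetical two-level index (`station`, `time`); the level names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: a station label and a UTC timestamp.
times = pd.date_range("2018-01-01", periods=2, freq="h", tz="UTC")
index = pd.MultiIndex.from_product([["A", "B"], times], names=["station", "time"])
df = pd.DataFrame({"value": range(len(index))}, index=index)

# Convert only the "time" level; the "station" level is left untouched.
df_tokyo = df.tz_convert("Asia/Tokyo", level="time")
print(df_tokyo.index.get_level_values("time").tz)  # Asia/Tokyo
```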

**Exercise:** 

*Important❗:* Before starting this exercise, check out specific versions of some of the files in the `weatherlyser` package
as the current version contains the solution to this exercise already 😏.

To do this, run the following command:

In [None]:
!git checkout origin/remove-solution-05 -- weatherlyser/loader.py weatherlyser/pa_models.py



1. Add time zone information to the output of the `load_chmi_data` function in the [`loader`](weatherlyser/loader.py) module.
    - You can assume the time zone is always `Europe/Prague`.
2. Modify the `CHMIDailyDataFrame` model in the [`pa_models`](weatherlyser/pa_models.py) module to check that the right time zone is used.
3. Make sure the tests pass by running `pytest tests/test_chmi_loader.py`.


After the exercise, *restore* the original version of the files by running:


In [5]:
!git checkout main -- weatherlyser/loader.py weatherlyser/pa_models.py

### Pandera's coerce behaviour 

It's worth spending a bit of time to understand Pandera's coerce behaviour when dealing with time zones.

Let's assume a model like this:

In [None]:
import pandera as pa
import pandas as pd
from pandera.typing import Index
from typing import Annotated

class TimeIndexedDF(pa.DataFrameModel):
    A: int
    # timezone-naive index
    time: Index[pd.Timestamp] = pa.Field(coerce=True, nullable=False)
    # timezone-aware index
    # time: Index[Annotated[pd.DatetimeTZDtype, "ns", "Europe/Prague"]] = pa.Field(coerce=True, nullable=False)

In [None]:
# select a time index that includes a DST transition from summer to winter time
time_index = pd.date_range("2018-10-28", periods=5, freq="h", tz="Europe/Prague")
# this will make the time index timezone-naive
# time_index = time_index.tz_localize(None)

time_df = pd.DataFrame({"A": range(len(time_index))}, index=time_index)
time_df

In [None]:
TimeIndexedDF.validate(time_df)

What happened? Pandera coerced the index to a *timezone-naive* one, and converted the times to UTC first!
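
To see why the conversion to UTC matters, the following pandas-only sketch contrasts the two ways of dropping the timezone from the same index. It is not Pandera's implementation, only an illustration of the two possible tz-naive results.

```python
import pandas as pd

# Same index as above: it spans the switch from summer to winter time.
time_index = pd.date_range("2018-10-28", periods=5, freq="h", tz="Europe/Prague")

# Convert to UTC, then drop the timezone: timestamps stay unique.
as_utc_naive = time_index.tz_convert(None)

# Drop the timezone but keep local wall-clock time: 02:00 appears twice.
as_local_naive = time_index.tz_localize(None)

print(as_utc_naive[0])           # 2018-10-27 22:00:00
print(as_utc_naive.is_unique)    # True
print(as_local_naive.is_unique)  # False
```

Going through UTC keeps every timestamp distinct, whereas keeping the local wall-clock time produces a duplicate at the DST transition.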

**Exercise:**
1. Try different combinations of the `coerce` value in the `TimeIndexedDF` model, a tz-naive or tz-aware target index type, and a tz-aware or tz-naive `time_index` in the validated dataframe (by either changing the values or commenting / uncommenting the relevant lines in the notebook).
2. For the `time: Index[Annotated[pd.DatetimeTZDtype, "ns", "Europe/Prague"]] = pa.Field(coerce=True, nullable=False)` version of the model and a tz-naive `time_index` input, verify that the validation works correctly.
3. For the same combination as in 2., try to find an input that is ambiguous for the coercion, so that the validation fails. (Hint: The DST change is important here.)

**Exercise:**
1. Use the `processors.get_seasons` function to create a function that *adds* a `season` column to a dataframe with a time index.
2. Create a Pandera dataframe model for the input and output of this new function. Enforce the "Europe/Prague" timezone for both the input and the output.
3. Add the `@pa.check_types` decorator to the function.
4. Try to interactively use the new function on the CHMI Ruzyně dataset.

*Optional:*

1. Create some test(s) for the new function.