## Working with time zones

In [3]:
from datetime import date, datetime

import polars as pl

## Creating a simple `DataFrame`

We create a `DataFrame` with a single value in the `datetime` column: `1970-01-01 00:00:00`.

This date is the origin point for Unix timestamps.

In [4]:
microseconds_per_hour = 3600 * 1e6

In [8]:
df = pl.DataFrame({
    "datetime": [datetime(1970,1,1)]
}).with_columns(
    (pl.col("datetime").to_physical() / microseconds_per_hour).alias("hours")
)

df

datetime,hours
datetime[μs],f64
1970-01-01 00:00:00,0.0


By default a `pl.Datetime` is **time zone-naive** - it has no time zone attached. 

However, a time zone-naive value is implicitly treated as UTC: `1970-01-01 00:00:00` corresponds to a timestamp of 0.

## Specify a time zone for a given datetime

If we know that the datetimes are not UTC but instead record local time in a particular time zone, we can specify that time zone with `dt.replace_time_zone`.

In [12]:
df.with_columns(
    [
        pl.col("datetime").dt.replace_time_zone("America/New_York").alias("tz_local"),
        (
            pl.col("datetime")
            .dt.replace_time_zone("America/New_York")
            .to_physical()
            / microseconds_per_hour
        ).alias("tz_local_hours"),
    ]
)

datetime,hours,tz_local,tz_local_hours
datetime[μs],f64,"datetime[μs, America/New_York]",f64
1970-01-01 00:00:00,0.0,1970-01-01 00:00:00 EST,5.0


`replace_time_zone` does not change the wall-clock time; it only attaches a time zone label to it. The underlying timestamp shifts as a result, here by 5 hours.
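
As a cross-check outside Polars, the standard library reproduces the `5.0` in `tz_local_hours`. This is only a sketch: attaching `tzinfo` in the constructor is the plain-Python analogue of labelling a naive value with a zone, not a description of how Polars implements `dt.replace_time_zone`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Midnight on 1970-01-01 read as a New York wall-clock time.
ny_midnight = datetime(1970, 1, 1, tzinfo=ZoneInfo("America/New_York"))

# The wall-clock fields are untouched ...
print(ny_midnight.isoformat())  # 1970-01-01T00:00:00-05:00

# ... but the instant lies 5 hours after the Unix epoch, because EST is UTC-5.
hours_since_epoch = ny_midnight.timestamp() / 3600
print(hours_since_epoch)  # 5.0
```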

## Change the time zone for a given Unix timestamp 

Now suppose we know that the original data was recorded as Unix timestamps and so is in the UTC time zone.

We want to know what local time that UTC timestamp corresponds to in New York.

In [14]:
df.with_columns(
    pl.col("datetime").dt.replace_time_zone("UTC")
).with_columns(
    [
        pl.col("datetime").dt.convert_time_zone("America/New_York").alias("tz_local"),
        (
            pl.col("datetime")
            .dt.convert_time_zone("America/New_York")
            .to_physical()
            / microseconds_per_hour
        ).alias("tz_local_hours"),
    ]
)

datetime,hours,tz_local,tz_local_hours
"datetime[μs, UTC]",f64,"datetime[μs, America/New_York]",f64
1970-01-01 00:00:00 UTC,0.0,1969-12-31 19:00:00 EST,0.0


`convert_time_zone` is like taking a flight from one country to another and crossing time zones along the way.

The wall-clock time changes to the local time at the destination, but the underlying timestamp stays the same.
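
The same behaviour can be reproduced with the standard library, where `astimezone` plays a role comparable to `dt.convert_time_zone`. This is an analogy for illustration, not a statement about Polars internals.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The Unix epoch as a time zone-aware UTC datetime.
utc_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# View the same instant on New York clocks.
ny_view = utc_epoch.astimezone(ZoneInfo("America/New_York"))

print(ny_view.isoformat())  # 1969-12-31T19:00:00-05:00
print(ny_view.timestamp())  # 0.0, the instant itself has not moved
```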

## Remove the time zone

In [17]:
df.with_columns(pl.col("datetime").dt.replace_time_zone("UTC")).with_columns(
    [
        pl.col("datetime").dt.replace_time_zone(None).alias("no_tz"),
        (
            pl.col("datetime").dt.replace_time_zone(None).to_physical()
            / microseconds_per_hour
        ).alias("no_tz_hours"),
    ]
)

datetime,hours,no_tz,no_tz_hours
"datetime[μs, UTC]",f64,datetime[μs],f64
1970-01-01 00:00:00 UTC,0.0,1970-01-01 00:00:00,0.0


### Summary of the methods

| Method | Datetime | Timestamp |
|---|---|---|
| `dt.replace_time_zone` | No change | Changes timestamp |
| `dt.convert_time_zone` | Changes by offset | No change |

For example:

Suppose it is `2026/01/28 21:00:00` now.

When I call `dt.replace_time_zone` and specify any other time zone, the datetime still reads `2026/01/28 21:00:00`, only with a different time zone attached.

However, when I call `dt.convert_time_zone`, the datetime itself changes whenever the chosen time zone differs from my own.

## Filtering time zone datetimes

To filter a datetime with a time zone we need to specify the time zone in the `filter`.

Use the `zoneinfo` library that is built into Python to specify the time zone.

In [21]:
from zoneinfo import ZoneInfo

start = datetime(1970,1,1)
stop = datetime(1970,1,1,7)

pl.DataFrame(
    {
        "date": pl.datetime_range(
            start,
            stop,
            interval="1h",
            eager=True
        )
    }
).with_columns(
    pl.col("date").dt.replace_time_zone("America/New_York").alias("nyc")
).filter(
    pl.col("nyc") < datetime(1970,1,1,6, tzinfo=ZoneInfo("America/New_York"))
)

date,nyc
datetime[μs],"datetime[μs, America/New_York]"
1970-01-01 00:00:00,1970-01-01 00:00:00 EST
1970-01-01 01:00:00,1970-01-01 01:00:00 EST
1970-01-01 02:00:00,1970-01-01 02:00:00 EST
1970-01-01 03:00:00,1970-01-01 03:00:00 EST
1970-01-01 04:00:00,1970-01-01 04:00:00 EST
1970-01-01 05:00:00,1970-01-01 05:00:00 EST
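
Plain Python enforces a comparable rule, which helps explain why the literal in the `filter` needs a `tzinfo`. The sketch below uses only the standard library and is an analogy, not a claim about how Polars validates the comparison.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

naive = datetime(1970, 1, 1, 6)
aware = datetime(1970, 1, 1, 6, tzinfo=ZoneInfo("America/New_York"))

# Ordering a naive datetime against an aware one is rejected by Python,
# because the naive value does not identify a unique instant.
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False

print(comparable)  # False
```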


## Exercises


### Exercise 1
Create a `DataFrame` with a `date` column at monthly intervals from 1st September 2025 to 1st December 2025.

In [27]:
start = datetime(2025, 9, 1)
stop = datetime(2025, 12, 1)

pl.DataFrame({"date": pl.datetime_range(start, stop, interval="1mo", eager=True)})

date
datetime[μs]
2025-09-01 00:00:00
2025-10-01 00:00:00
2025-11-01 00:00:00
2025-12-01 00:00:00


Transform the `date` column so that the datetimes are local to Johannesburg.

In [25]:
pl.DataFrame(
    {
        "date": pl.datetime_range(start, stop, interval="1mo", eager=True)
    }
).with_columns(
    pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
)

date
"datetime[μs, Africa/Johannesburg]"
2025-09-01 00:00:00 SAST
2025-10-01 00:00:00 SAST
2025-11-01 00:00:00 SAST
2025-12-01 00:00:00 SAST


Add a column called `date_p` with the integer representation.

In [28]:
pl.DataFrame(
    {
        "date": pl.datetime_range(start, stop, interval="1mo", eager=True)
    }
).with_columns(
    pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
).with_columns(
    pl.col("date").to_physical().alias("date_p")
)

date,date_p
"datetime[μs, Africa/Johannesburg]",i64
2025-09-01 00:00:00 SAST,1756677600000000
2025-10-01 00:00:00 SAST,1759269600000000
2025-11-01 00:00:00 SAST,1761948000000000
2025-12-01 00:00:00 SAST,1764540000000000
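
The first `date_p` value can be verified by hand with the standard library: midnight in Johannesburg (UTC+2) is 22:00 UTC on the previous day. This is a cross-check sketch, with the microsecond scaling chosen to match the `datetime[μs]` dtype shown above.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Midnight on 2025-09-01 as a Johannesburg wall-clock time.
local_midnight = datetime(2025, 9, 1, tzinfo=ZoneInfo("Africa/Johannesburg"))

# Seconds since the Unix epoch, scaled to microseconds.
date_p = int(local_midnight.timestamp()) * 1_000_000

print(date_p)  # 1756677600000000
```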


Add a column called `date_dublin` with the local time in Dublin.

In [29]:
pl.DataFrame(
    {
        "date": pl.datetime_range(start, stop, interval="1mo", eager=True)
    }
).with_columns(
    pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
).with_columns(
    pl.col("date").to_physical().alias("date_p")
).with_columns(
    pl.col("date").dt.convert_time_zone("Europe/Dublin").alias("date_dublin")
)

date,date_p,date_dublin
"datetime[μs, Africa/Johannesburg]",i64,"datetime[μs, Europe/Dublin]"
2025-09-01 00:00:00 SAST,1756677600000000,2025-08-31 23:00:00 IST
2025-10-01 00:00:00 SAST,1759269600000000,2025-09-30 23:00:00 IST
2025-11-01 00:00:00 SAST,1761948000000000,2025-10-31 22:00:00 GMT
2025-12-01 00:00:00 SAST,1764540000000000,2025-11-30 22:00:00 GMT


Add a column called `offset` that shows the offset between Johannesburg and Dublin.

In [30]:
pl.DataFrame(
    {
        "date": pl.datetime_range(start, stop, interval="1mo", eager=True)
    }
).with_columns(
    pl.col("date").dt.replace_time_zone("Africa/Johannesburg")
).with_columns(
    pl.col("date").to_physical().alias("date_p")
).with_columns(
    pl.col("date").dt.convert_time_zone("Europe/Dublin").alias("date_dublin")
).with_columns(
    (pl.col("date") - pl.col("date_dublin").dt.replace_time_zone("Africa/Johannesburg")).alias("offset")
)

date,date_p,date_dublin,offset
"datetime[μs, Africa/Johannesburg]",i64,"datetime[μs, Europe/Dublin]",duration[μs]
2025-09-01 00:00:00 SAST,1756677600000000,2025-08-31 23:00:00 IST,1h
2025-10-01 00:00:00 SAST,1759269600000000,2025-09-30 23:00:00 IST,1h
2025-11-01 00:00:00 SAST,1761948000000000,2025-10-31 22:00:00 GMT,2h
2025-12-01 00:00:00 SAST,1764540000000000,2025-11-30 22:00:00 GMT,2h


Why does the offset change over the months?

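A: Daylight saving time. Dublin moves its clocks back by an hour at the end of October (IST to GMT), while Johannesburg stays on SAST all year, so the gap grows from 1 hour to 2 hours.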
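
The answer can be confirmed by comparing each zone's UTC offset before and after the clock change. The helper `utc_offset` below is a hypothetical name introduced for this sketch, which relies only on the standard library.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def utc_offset(zone: str, when: datetime) -> timedelta:
    """Return the UTC offset that `zone` uses at the local time `when`."""
    return when.replace(tzinfo=ZoneInfo(zone)).utcoffset()


gaps = []
for when in (datetime(2025, 9, 1), datetime(2025, 11, 1)):
    johannesburg = utc_offset("Africa/Johannesburg", when)
    dublin = utc_offset("Europe/Dublin", when)
    gaps.append(johannesburg - dublin)
    print(when.date(), johannesburg, dublin, johannesburg - dublin)
```

Johannesburg's offset is the same on both dates, so the change in the gap comes entirely from Dublin.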

### Exercise 2
You have a weather station that records temperature at hourly intervals. 

The device records data in UTC.

In [32]:
import numpy as np

start = datetime(2020, 9, 1)
stop = datetime(2020, 9, 2)

pl.DataFrame({"date": pl.datetime_range(start, stop, "1h", eager=True)}).with_columns(
    # We use a cosine function with a period of 24 hours to generate a fake temperature cycle
    (
        25
        + 4 * ((2 * np.pi * pl.col("date").to_physical() / (24 * 60 * 60 * 1e6))).cos()
    ).alias("temperature")
)

date,temperature
datetime[μs],f64
2020-09-01 00:00:00,29.0
2020-09-01 01:00:00,28.863703
2020-09-01 02:00:00,28.464102
2020-09-01 03:00:00,27.828427
2020-09-01 04:00:00,27.0
2020-09-01 05:00:00,26.035276
2020-09-01 06:00:00,25.0
2020-09-01 07:00:00,23.964724
2020-09-01 08:00:00,23.0
2020-09-01 09:00:00,22.171573


From the output we can see that the device is probably not located in the UTC time zone: the highest temperature is at midnight, and the cosine cycle puts the lowest around midday.
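
The shape of the fake cycle can be checked without Polars. The sketch below evaluates the same cosine formula at a few hours of the day; it relies on the fact that `2020-09-01 00:00` UTC is a whole number of days after the Unix epoch, so the hour of day alone determines the phase. The function name `temperature` is illustrative.

```python
import math


def temperature(hour_of_day: float) -> float:
    """Cosine cycle with a 24 hour period, a mean of 25 and an amplitude of 4."""
    return 25 + 4 * math.cos(2 * math.pi * hour_of_day / 24)


readings = {hour: temperature(hour) for hour in (0, 6, 12, 18)}
for hour, value in readings.items():
    print(hour, round(value, 6))
```

The peak of 29 falls at hour 0 and the trough of 21 at hour 12, both in UTC.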

Change the time zone to correspond with a location that has higher temperatures in the late afternoon and lower temperatures in the early morning (<a href="https://docs.rs/chrono-tz/latest/chrono_tz/enum.Tz.html" target="_blank">there are obviously many such locations, you mainly need to figure out whether to go east or west!</a>).

In [43]:
pl.DataFrame(
    {
        "date": pl.datetime_range(start, stop, "1h", eager=True)
    }
).with_columns(
    (
        25 + 4 * ((2 * np.pi * pl.col("date").to_physical() / (24 * 60 * 60 * 1e6))).cos()
    ).alias("temperature")
).with_columns(
    pl.col("date").dt.replace_time_zone("UTC").dt.convert_time_zone("Brazil/West")
)

date,temperature
"datetime[μs, Brazil/West]",f64
2020-08-31 20:00:00 -04,29.0
2020-08-31 21:00:00 -04,28.863703
2020-08-31 22:00:00 -04,28.464102
2020-08-31 23:00:00 -04,27.828427
2020-09-01 00:00:00 -04,27.0
2020-09-01 01:00:00 -04,26.035276
2020-09-01 02:00:00 -04,25.0
2020-09-01 03:00:00 -04,23.964724
2020-09-01 04:00:00 -04,23.0
2020-09-01 05:00:00 -04,22.171573
