# Pandas Timezones

See Chris Albon's [post](https://chrisalbon.com/machine_learning/preprocessing_dates_and_times/convert_pandas_column_timezone/) for reference.

In [1]:
from IPython.display import display # neat trick to display dataframes
import pandas as pd

### Convert naive DatetimeIndex to timezone

`tz_localize` works for both `pd.date_range` and `pd.DatetimeIndex`.

In [2]:
date_range = pd.date_range(start='2015', end='2016', freq='h')[:-1]

display(date_range)

display(date_range.tz_localize('UTC'))

DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00',
               '2015-01-01 02:00:00', '2015-01-01 03:00:00',
               '2015-01-01 04:00:00', '2015-01-01 05:00:00',
               '2015-01-01 06:00:00', '2015-01-01 07:00:00',
               '2015-01-01 08:00:00', '2015-01-01 09:00:00',
               ...
               '2015-12-31 14:00:00', '2015-12-31 15:00:00',
               '2015-12-31 16:00:00', '2015-12-31 17:00:00',
               '2015-12-31 18:00:00', '2015-12-31 19:00:00',
               '2015-12-31 20:00:00', '2015-12-31 21:00:00',
               '2015-12-31 22:00:00', '2015-12-31 23:00:00'],
              dtype='datetime64[ns]', length=8760, freq='H')

DatetimeIndex(['2015-01-01 00:00:00+00:00', '2015-01-01 01:00:00+00:00',
               '2015-01-01 02:00:00+00:00', '2015-01-01 03:00:00+00:00',
               '2015-01-01 04:00:00+00:00', '2015-01-01 05:00:00+00:00',
               '2015-01-01 06:00:00+00:00', '2015-01-01 07:00:00+00:00',
               '2015-01-01 08:00:00+00:00', '2015-01-01 09:00:00+00:00',
               ...
               '2015-12-31 14:00:00+00:00', '2015-12-31 15:00:00+00:00',
               '2015-12-31 16:00:00+00:00', '2015-12-31 17:00:00+00:00',
               '2015-12-31 18:00:00+00:00', '2015-12-31 19:00:00+00:00',
               '2015-12-31 20:00:00+00:00', '2015-12-31 21:00:00+00:00',
               '2015-12-31 22:00:00+00:00', '2015-12-31 23:00:00+00:00'],
              dtype='datetime64[ns, UTC]', length=8760, freq='H')
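
As a small sanity check of what `tz_localize` does, the sketch below builds a naive `pd.DatetimeIndex` by hand (the two timestamps are made up for illustration) and localizes it. The wall-clock values should stay the same and only the timezone information is attached.

```python
import pandas as pd

# Hypothetical naive index, built directly instead of via pd.date_range
naive = pd.DatetimeIndex(['2015-01-01 00:00', '2015-07-01 12:00'])
print(naive.tz)  # a naive index has no timezone attached

# Attach UTC without shifting the wall-clock values
aware = naive.tz_localize('UTC')
print(aware.tz)

# Passing None removes the timezone again and gives back the naive values
roundtrip = aware.tz_localize(None)
print(roundtrip.equals(naive))
```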

### Convert index from UTC to specific timezone

Here the index is already in UTC and we convert it to a specific timezone with `tz_convert`. (The case with a column of UTC datetimes and a column of timezones, where each datetime is converted to its own timezone, is covered in the next section.)

* `start` and `end` can also be `datetime.datetime` or `pd.Timestamp` objects
* a `DatetimeIndex` can be created directly in a specific timezone by passing the `tz` argument
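
A minimal sketch of both points, assuming standard-library `datetime` objects for the bounds; the one-day interval is chosen only to keep the output short.

```python
from datetime import datetime

import pandas as pd

# start/end given as datetime objects, index created directly in Europe/London
idx = pd.date_range(
    start=datetime(2015, 1, 1),
    end=datetime(2015, 1, 2),
    freq='h',
    tz='Europe/London',
)

print(idx.tz)
print(len(idx))  # both endpoints are included
```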

In [3]:
date_range = pd.date_range(start='2015', end='2016', freq='h', tz='UTC')[:-1]

display(date_range)

display(date_range.tz_convert('Europe/London'))

DatetimeIndex(['2015-01-01 00:00:00+00:00', '2015-01-01 01:00:00+00:00',
               '2015-01-01 02:00:00+00:00', '2015-01-01 03:00:00+00:00',
               '2015-01-01 04:00:00+00:00', '2015-01-01 05:00:00+00:00',
               '2015-01-01 06:00:00+00:00', '2015-01-01 07:00:00+00:00',
               '2015-01-01 08:00:00+00:00', '2015-01-01 09:00:00+00:00',
               ...
               '2015-12-31 14:00:00+00:00', '2015-12-31 15:00:00+00:00',
               '2015-12-31 16:00:00+00:00', '2015-12-31 17:00:00+00:00',
               '2015-12-31 18:00:00+00:00', '2015-12-31 19:00:00+00:00',
               '2015-12-31 20:00:00+00:00', '2015-12-31 21:00:00+00:00',
               '2015-12-31 22:00:00+00:00', '2015-12-31 23:00:00+00:00'],
              dtype='datetime64[ns, UTC]', length=8760, freq='H')

DatetimeIndex(['2015-01-01 00:00:00+00:00', '2015-01-01 01:00:00+00:00',
               '2015-01-01 02:00:00+00:00', '2015-01-01 03:00:00+00:00',
               '2015-01-01 04:00:00+00:00', '2015-01-01 05:00:00+00:00',
               '2015-01-01 06:00:00+00:00', '2015-01-01 07:00:00+00:00',
               '2015-01-01 08:00:00+00:00', '2015-01-01 09:00:00+00:00',
               ...
               '2015-12-31 14:00:00+00:00', '2015-12-31 15:00:00+00:00',
               '2015-12-31 16:00:00+00:00', '2015-12-31 17:00:00+00:00',
               '2015-12-31 18:00:00+00:00', '2015-12-31 19:00:00+00:00',
               '2015-12-31 20:00:00+00:00', '2015-12-31 21:00:00+00:00',
               '2015-12-31 22:00:00+00:00', '2015-12-31 23:00:00+00:00'],
              dtype='datetime64[ns, Europe/London]', length=8760, freq='H')
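
The two outputs above look identical because the displayed head and tail fall in winter, when London is on GMT (UTC+00:00). A small sketch with two hand-picked timestamps, one in January and one in July, makes the conversion visible:

```python
import pandas as pd

# One winter and one summer instant, both given in UTC
utc = pd.DatetimeIndex(['2015-01-01 12:00', '2015-07-01 12:00'], tz='UTC')

london = utc.tz_convert('Europe/London')
print(london)

# The instants are unchanged; only the wall-clock representation differs
print((london == utc).all())
```

In July the London wall clock is one hour ahead of UTC (BST), so the second value is shown as 13:00 with a `+01:00` offset.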

### Convert df column to respective timezones

* `np.random.choice` returns a NumPy array rather than a `DatetimeIndex`, so the timezone awareness is lost. That is why we sample from a naive range and localize afterwards.
* `tz_localize` can be used on a df column through the `.dt` accessor.
* `tz_convert` accepts a single timezone, not a list or array of timezones, so we need a row-wise `apply`. This is slow on large frames.

In [4]:
import numpy as np # only needed to choose random timezones
import pytz # only needed to get a list of available timezones

In [5]:
date_range = pd.date_range(start='2015', end='2016', freq='h')[:-1]

size = len(date_range)

df = pd.DataFrame({
    'datetime': np.random.choice(date_range, size), # random datetimes from the interval
    'timezone': np.random.choice(pytz.all_timezones, size) # random timezones
})

# tz localize works on columns if we use .dt
df['datetime'] = df['datetime'].dt.tz_localize('UTC')

display(df.head())

# convert each datetime to the corresponding timezone
df['local_datetime'] = df.apply(
    lambda row: row['datetime'].tz_convert(row['timezone']), 
    axis=1
)

display(df.head())

| | datetime | timezone |
|---|---|---|
| 0 | 2015-06-30 04:00:00+00:00 | America/Santo_Domingo |
| 1 | 2015-09-20 22:00:00+00:00 | Africa/Porto-Novo |
| 2 | 2015-03-19 05:00:00+00:00 | America/Swift_Current |
| 3 | 2015-04-30 13:00:00+00:00 | Navajo |
| 4 | 2015-09-14 06:00:00+00:00 | Asia/Yerevan |

| | datetime | timezone | local_datetime |
|---|---|---|---|
| 0 | 2015-06-30 04:00:00+00:00 | America/Santo_Domingo | 2015-06-30 00:00:00-04:00 |
| 1 | 2015-09-20 22:00:00+00:00 | Africa/Porto-Novo | 2015-09-20 23:00:00+01:00 |
| 2 | 2015-03-19 05:00:00+00:00 | America/Swift_Current | 2015-03-18 23:00:00-06:00 |
| 3 | 2015-04-30 13:00:00+00:00 | Navajo | 2015-04-30 07:00:00-06:00 |
| 4 | 2015-09-14 06:00:00+00:00 | Asia/Yerevan | 2015-09-14 10:00:00+04:00 |
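
If the row-wise `apply` becomes a bottleneck, one possible alternative is to convert one timezone group at a time, so that `tz_convert` runs vectorized within each group. The sketch below is not from the original post. It uses a tiny hand-made frame and a hypothetical `local_naive` column, and it stores the local wall-clock time as naive datetimes, dropping the offset, so that the column keeps a single datetime dtype.

```python
import pandas as pd

# Small hand-made frame: UTC datetimes plus a timezone name per row
df = pd.DataFrame({
    'datetime': pd.to_datetime(
        ['2015-06-30 04:00', '2015-09-20 22:00', '2015-03-19 05:00']
    ).tz_localize('UTC'),
    'timezone': ['America/Santo_Domingo', 'Africa/Porto-Novo',
                 'America/Santo_Domingo'],
})

parts = []
for tz, group in df.groupby('timezone'):
    # Convert the whole group at once, then drop the timezone
    # to keep only the local wall-clock time
    local = group['datetime'].dt.tz_convert(tz).dt.tz_localize(None)
    parts.append(local)

# pd.concat keeps the original row labels, so assignment aligns by index
df['local_naive'] = pd.concat(parts)

print(df)
```

Whether this is actually faster depends on the data: with few distinct timezones and many rows it should help, while with a different timezone on almost every row it degenerates to roughly the same work as `apply`.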
