# Giant Mess - Dealing with Timezones and Daylight Saving Time in Python
## - I went to sleep at 2 am on November 1... - Um, which one?
<img src='images/time.jpg'></img>
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/@andrey-grushnikov-223358?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Andrey Grushnikov</a>
        on 
        <a href='https://www.pexels.com/photo/black-and-white-photo-of-clocks-707676/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Pexels</a>
    </strong>
</figcaption>

### Introduction <small id='intro'></small>

Mechanical clocks have been around since the Middle Ages. For centuries, every town set its clocks so that noon was the moment the sun stood directly overhead, and this was not a big problem because everyone traveled on foot or by horse. Later, telegraphs and trains were invented, and you could suddenly travel and communicate with people hundreds of miles away in a very short amount of time. People realized they needed a standard time, so in 1884 delegates at the International Meridian Conference agreed on Greenwich as the prime meridian, which became the reference for dividing the world into 24 time zones.

Greenwich, London became the zero point because its time was standardized first: countries to the west of the UK set their clocks earlier than London and countries to the east set them later. The modern version of this standard is known as UTC (Coordinated Universal Time). In other words, if you Google 'UTC time now', it will show the time in London during the winter months (in summer, London itself runs one hour ahead of UTC).

Today, data moves around the world in the blink of an eye and if date and time information is not timezone aware, it will create a whole host of problems for programmers. In this post, we will talk about how to address such problems, including Daylight Saving Time.

### Timezones <small id='tz'></small>

To start working with timezones, let's import the necessary classes from `datetime`:

> Before moving on, I highly recommend reading my [previous](https://towardsdatascience.com/date-and-time-objects-in-python-everything-you-need-to-know-10aa3bf121be?source=your_stories_page-------------------------------------) article on datetimes and timedeltas if you haven't already.

In [1]:
from datetime import datetime, timedelta, timezone

We will need `datetime` to work with datetime objects, `timedelta` to work with time durations and `timezone` to describe a fixed offset from UTC. To practice, I will load a subset of the Ford GoBike data from the San Francisco Bay Area, which contains the start time, end time and duration of each ride:

In [2]:
import pandas as pd

rides = pd.read_csv('data/tripdata.csv',
                    usecols=['duration_sec', 'start_time', 'end_time'],
                    parse_dates=['start_time', 'end_time'])
rides.head()

|   | duration_sec | start_time | end_time |
|---|---|---|---|
| 0 | 80110 | 2017-12-31 16:57:39.654 | 2018-01-01 15:12:50.245 |
| 1 | 78800 | 2017-12-31 15:56:34.842 | 2018-01-01 13:49:55.617 |
| 2 | 45768 | 2017-12-31 22:45:48.411 | 2018-01-01 11:28:36.883 |
| 3 | 62172 | 2017-12-31 17:31:10.636 | 2018-01-01 10:47:23.531 |
| 4 | 43603 | 2017-12-31 14:23:14.001 | 2018-01-01 02:29:57.571 |


In [3]:
rides.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 519700 entries, 0 to 519699
Data columns (total 3 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   duration_sec  519700 non-null  int64         
 1   start_time    519700 non-null  datetime64[ns]
 2   end_time      519700 non-null  datetime64[ns]
dtypes: datetime64[ns](2), int64(1)
memory usage: 11.9 MB


To keep the examples short and easy to understand, let's isolate the earliest recorded bike rental from the dataset:

In [4]:
rides.nsmallest(1, 'start_time')['start_time']

519697   2017-06-28 09:47:36.347
Name: start_time, dtype: datetime64[ns]

We will now store it as a `datetime` object, ignoring the milliseconds:

In [5]:
dt = datetime(2017, 6, 28, 9, 47, 36)
print(dt)

2017-06-28 09:47:36
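
Rather than retyping the fields by hand, one option is to parse the printed timestamp and drop the fractional seconds. This is only a minimal sketch: the string is copied from the output above and the variable name `raw` is illustrative.

```python
from datetime import datetime

# Timestamp string copied from the earliest rental shown above
raw = '2017-06-28 09:47:36.347'

# fromisoformat() parses the string; replace() zeroes out the fractional part
dt = datetime.fromisoformat(raw).replace(microsecond=0)
print(dt)  # 2017-06-28 09:47:36
```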


As I said, the data was recorded in San Francisco. In late June, San Francisco is on Pacific Daylight Time, which has a UTC offset of -7 hours, meaning it is 7 hours behind UTC. To show this in code, we need to create a `timezone` object:

In [6]:
sf_tz = timezone(timedelta(hours=-7))
print(type(sf_tz))

<class 'datetime.timezone'>


The `timezone` class accepts a `timedelta` object, the UTC offset, which is what lets Python translate your datetime into UTC. We can specify what time zone the clock was in when it recorded the earliest bike rental by using the `tzinfo` argument:

In [7]:
dt = datetime(2017, 6, 28, 9, 47, 36, tzinfo=sf_tz)

Now if we print it, the datetime includes the UTC offset:

In [8]:
print(dt)

2017-06-28 09:47:36-07:00


> Note that `tzinfo` does not change the time itself but shows the UTC offset only.
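
A quick way to check whether a datetime is 'aware' is to look at its `tzinfo` attribute. The sketch below (variable names are illustrative) also shows that Python refuses to mix naive and aware datetimes in arithmetic:

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2017, 6, 28, 9, 47, 36)
aware = datetime(2017, 6, 28, 9, 47, 36, tzinfo=timezone(timedelta(hours=-7)))

print(naive.tzinfo)       # None
print(aware.utcoffset())  # -1 day, 17:00:00 (that is, -7 hours)

# Subtracting a naive datetime from an aware one is an error
try:
    aware - naive
except TypeError as err:
    print('TypeError:', err)
```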

Making `datetime` 'aware' of its time zone enables us to ask new questions. Since `dt` now knows the difference between its own time and UTC, it can calculate the time difference between other time zones as well. For example, let's find out what the time was in India when the first bike rental started. 

We first store the UTC offset of India, which is 5 hours and 30 minutes:

In [9]:
# Store Indian standard time offset
ist = timezone(timedelta(hours=5, minutes=30))

To change the time zone of a datetime, we can use the `astimezone()` method:

In [10]:
print('Printing the same moment in 2 different timezones...\n')
print('In San Francisco:', dt, '\n')
print('        In India:', dt.astimezone(ist))

Printing the same moment in 2 different timezones...

In San Francisco: 2017-06-28 09:47:36-07:00 

        In India: 2017-06-28 22:17:36+05:30


This time `.astimezone` changes the time itself to show the time in India. The new time is the same moment (the start of the earliest rental) in India time with UTC offset of +5 hours and 30 minutes.
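
Because both objects describe the same moment, Python treats them as equal even though their clock readings differ. A small check, re-creating the two fixed-offset timezones used above:

```python
from datetime import datetime, timedelta, timezone

sf_tz = timezone(timedelta(hours=-7))
ist = timezone(timedelta(hours=5, minutes=30))

dt = datetime(2017, 6, 28, 9, 47, 36, tzinfo=sf_tz)
dt_india = dt.astimezone(ist)

print(dt == dt_india)  # True: same instant
print(dt_india - dt)   # 0:00:00: no time has elapsed between them
print(dt_india.utcoffset() - dt.utcoffset())  # 12:30:00: gap between the two clocks
```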

Finally, there is an important difference between adjusting the timezone and changing the UTC offset. You can change the UTC offset directly with `.replace()` method:

In [11]:
new_tz = timezone(timedelta(hours=-3))
print('With old UTC offset:', dt)
print('With new UTC offset:', dt.replace(tzinfo=new_tz))

With old UTC offset: 2017-06-28 09:47:36-07:00
With new UTC offset: 2017-06-28 09:47:36-03:00


With `.replace()`, the clock reading stays the same but the offset changes, so the result points to a different moment in time. `.astimezone()`, by contrast, changes both the UTC offset and the clock reading so that the moment stays the same:

In [12]:
print(dt.astimezone(timezone.utc))

2017-06-28 16:47:36+00:00


`timezone.utc` is a convenient attribute for converting any datetime to UTC. The difference between converting to a timezone and replacing the offset can be confusing at first, but taking a random date and playing around with both will give you a nice intuition.
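
To make the difference concrete, the sketch below applies both methods with the same -3 hour offset and compares the results against the original datetime (`relabelled` and `converted` are illustrative names):

```python
from datetime import datetime, timedelta, timezone

dt = datetime(2017, 6, 28, 9, 47, 36, tzinfo=timezone(timedelta(hours=-7)))
new_tz = timezone(timedelta(hours=-3))

relabelled = dt.replace(tzinfo=new_tz)  # same clock reading, different moment
converted = dt.astimezone(new_tz)       # same moment, different clock reading

print(relabelled)       # 2017-06-28 09:47:36-03:00
print(converted)        # 2017-06-28 13:47:36-03:00
print(dt - relabelled)  # 4:00:00
print(dt - converted)   # 0:00:00
```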

### Timezones database

Now that you know how UTC offsets and timezones work, let's talk about how to use timezones in practice. Here is a picture of all the timezones as of 2017:
<img src='images/timezones.png'></img>
<figcaption style="text-align: center;">
    <strong>
        Image by 
        <a href='https://upload.wikimedia.org/wikipedia/commons/c/cb/2017a.png'>Wikipedia</a>
    </strong>
</figcaption>

The zones cut across and within countries. You cannot possibly know all the UTC offsets. Thankfully, there is a database called `tz` which is updated 3-4 times a year as timezone rules change. 

This database is accessible in Python through the `dateutil` package (a third-party package, published as `python-dateutil`, which is installed along with pandas):

In [13]:
from dateutil import tz

To illustrate it in practice, let's create a timezone object that corresponds to where our bike data comes from:

In [14]:
sf_tz = tz.gettz('America/Los_Angeles')

Within `tz`, timezones are named after the continent or ocean they are in, followed by a major city in the zone (separated by a slash). San Francisco does not have its own entry; it belongs to the US Pacific timezone, whose official name is `America/Los_Angeles`. You can look up the official `tz` name for any location on [this](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) Wikipedia page. Do not pick a name by its UTC offset alone: several zones share the same offset for part of the year but follow different Daylight Saving Time rules.
Sometimes a city has an entry of its own, but there are times when you have to refer to the above link to find the official `tz` name that covers it.

Here are some more examples:
- Asia/Tashkent
- America/New_York
- Europe/London
- America/Mexico_City
- Pacific/Kiritimati
- America/Chicago

Now, instead of specifying the UTC offset by hand, we pass the object we got from `America/Los_Angeles` to the `datetime`:

In [15]:
sf_tz = tz.gettz('America/Los_Angeles')
dt = datetime(2017, 6, 28, 9, 47, 36, tzinfo=sf_tz) # Earliest rental
print(dt)

2017-06-28 09:47:36-07:00
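
Since Python 3.9, the standard library's `zoneinfo` module reads the same database, so a similar lookup is possible without `dateutil`. This is a sketch that uses `zoneinfo` in place of the `dateutil` calls above; it assumes the tz data is available on the system (or through the `tzdata` package). It also hints at why a named zone is more useful than a fixed offset: the offset follows the calendar.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

pacific = ZoneInfo('America/Los_Angeles')

# Two timestamps from the rides data: one in summer, one in winter
summer = datetime(2017, 6, 28, 9, 47, 36, tzinfo=pacific)
winter = datetime(2017, 12, 31, 16, 57, 39, tzinfo=pacific)

print(summer)  # 2017-06-28 09:47:36-07:00
print(winter)  # 2017-12-31 16:57:39-08:00
print(summer.tzname(), winter.tzname())  # PDT PST
```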


### Daylight Saving Time <small id='dst'></small>

As if 24 timezones were not enough, some countries change their clocks twice a year to create longer summer evenings. 
> You know what is worse than jumbling up all the clocks in the world? It is jumbling up some of them! - Joe Hanson

Some countries have a yearly practice of shifting their clocks one hour forward in the spring and an hour back in the fall. This practice is called Daylight Saving Time because it 'supposedly' creates longer summer evenings. Dealing with DST is one of the most challenging tasks in time-series analysis. Please watch [this](https://www.youtube.com/watch?v=bMrb56dDpic) YouTube video to learn more about the topic. To keep it simple, let's look at an example:

On March 8 this year (2020), one second after 01:59:59, the clocks in the parts of the US that observe DST jumped to 03:00:00, causing time to 'spring ahead' by an hour (not going into the details of why). This is a big headache for programmers and everyone else alike.

To understand what happens, consider this visual:
<img src='images/spring.png'></img>

The green dashed lines represent the 7 hour difference from UTC before the 'spring forward'. After the jump, the black dashed line represents the new 6 hour difference. Let's see this in code:

In [16]:
# 1:59:59 AM, just before the spring forward
sa_159_am = datetime(2020, 3, 8, 1, 59, 59)
print(sa_159_am.isoformat())
# 3:00 AM after spring forward
sa_300_am = datetime(2020, 3, 8, 3, 0, 0)
print(sa_300_am.isoformat())
# Find their difference in seconds
(sa_300_am - sa_159_am).total_seconds()

2020-03-08T01:59:59
2020-03-08T03:00:00


3601.0

Since we did not set the UTC offsets, we get a difference of 1 hour and 1 second when in fact only 1 second passed. Let's add the UTC offsets and try the computation again:

In [17]:
# 1:59:59 AM before spring forward, with the 7 hour offset from before the jump
standard = timezone(timedelta(hours=-7))
sa_159_am_tz = datetime(2020, 3, 8, 1, 59, 59, tzinfo=standard)
print(sa_159_am_tz.isoformat())
# 3:00 AM after spring forward, with the 6 hour offset from after the jump
daylight = timezone(timedelta(hours=-6))
sa_300_am_tz = datetime(2020, 3, 8, 3, 0, 0, tzinfo=daylight)
print(sa_300_am_tz.isoformat())
# Find their difference in seconds
(sa_300_am_tz - sa_159_am_tz).total_seconds()

2020-03-08T01:59:59-07:00
2020-03-08T03:00:00-06:00


1.0

As expected, it now returns a 1 second difference. Adding the UTC offsets lets Python compare the two times as actual moments rather than as clock readings.

But we cannot always know when the cutoff is since the dates change every year. Thankfully, `tz` knows the DST rules and picks the right offset under the hood, yay!

Let's perform the above operation with `tz`:

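In [18]:
# Store the timezone for San Francisco (US Pacific Time)
pacific = tz.gettz('America/Los_Angeles')
# 1:59:59 AM before spring forward with tz
sa_159_dst = datetime(2020, 3, 8, 1, 59, 59, tzinfo=pacific)
print(sa_159_dst.isoformat())
# 3:00 AM after spring forward with tz
sa_300_dst = datetime(2020, 3, 8, 3, 0, 0, tzinfo=pacific)
print(sa_300_dst.isoformat())
# Same tzinfo on both sides: Python subtracts the clock readings
print((sa_300_dst - sa_159_dst).total_seconds())
# Convert both to UTC first to get the elapsed time
print((sa_300_dst.astimezone(timezone.utc) - sa_159_dst.astimezone(timezone.utc)).total_seconds())

2020-03-08T01:59:59-08:00
2020-03-08T03:00:00-07:00
3601.0
1.0

`tz` picked the right offset for each datetime on its own. Note the values: San Francisco is on Pacific Time, which is 8 hours behind UTC in winter and 7 hours behind in summer. The direct subtraction still returns 3601 seconds, because when two datetimes share the same `tzinfo` object Python ignores the offsets and subtracts the clock readings. Converting both datetimes to UTC first gives the true elapsed time of 1 second.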
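
The spring jump also means that some clock readings never happen: on March 8, 2020 there was no 2:30 am in San Francisco. One way to detect such a time is to convert it to UTC and back and see whether the clock reading survives the round trip. This is a sketch that uses the standard library's `zoneinfo` (Python 3.9+, with tz data available) rather than `dateutil`, and `exists` is a hypothetical helper name:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo('America/Los_Angeles')

def exists(wall_time, zone):
    """Return True if the naive clock reading actually occurs in the zone."""
    local = wall_time.replace(tzinfo=zone)
    round_trip = local.astimezone(timezone.utc).astimezone(zone)
    return round_trip.replace(tzinfo=None) == wall_time

print(exists(datetime(2020, 3, 8, 1, 59, 59), pacific))  # True
print(exists(datetime(2020, 3, 8, 2, 30, 0), pacific))   # False
print(exists(datetime(2020, 3, 8, 3, 0, 0), pacific))    # True
```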

The same process applies to DST ending in the fall. This year, DST in the US ended on November 1 at 2 am. On that night, the clocks run through the 1 am hour twice. The first time, the clock goes from 1:00 to 1:59 in the normal way and, instead of hitting 2:00, it goes back to 1:00 and then continues. The UTC offset in San Francisco also changes back to -8 hours from -7 hours.
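
A repeated hour means that a clock reading alone is ambiguous: in San Francisco, 1:30 am occurred twice on November 1, 2020. Python's `datetime` has a `fold` attribute for this case, where `fold=0` picks the first occurrence and `fold=1` the second. A sketch with the standard library's `zoneinfo` (Python 3.9+, used here in place of `dateutil` and assuming tz data is available):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo('America/Los_Angeles')

# The same clock reading, before and after the clocks went back
first = datetime(2020, 11, 1, 1, 30, tzinfo=pacific)           # fold=0, still on DST
second = datetime(2020, 11, 1, 1, 30, fold=1, tzinfo=pacific)  # fold=1, back on standard time

print(first.isoformat())   # 2020-11-01T01:30:00-07:00
print(second.isoformat())  # 2020-11-01T01:30:00-08:00

# Compare in UTC to measure the real time between the two readings
elapsed = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)
print(elapsed)  # 1:00:00
```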

Daylight Saving Time, not easy!