# Chapter 12: Date and Time Manipulation

When working with timezones, we need three pieces of information:
- time
- timezone
- offset

If we only care about durations, we can use UTC time or seconds since the UNIX epoch.

A general recommendation is to store dates in UTC and convert them to local time as needed.
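
As a minimal sketch of this recommendation, the snippet below stores two made-up event times in UTC, measures the duration between them, and converts to local time only for display. The timestamps and the variable names (`start`, `end`, `local_start`) are illustrative assumptions, not part of the dataset used in this chapter.

```python
import pandas as pd

# Hypothetical event times, stored as timezone-aware UTC timestamps.
start = pd.Timestamp("2015-03-08 08:00:00", tz="UTC")
end = pd.Timestamp("2015-03-08 09:30:00", tz="UTC")

# Durations need no timezone information: subtract the UTC values...
duration = end - start

# ...or compare seconds since the UNIX epoch.
elapsed_seconds = end.timestamp() - start.timestamp()

# Convert to local time only when the value has to be shown to a person.
local_start = start.tz_convert("America/Denver")

print(duration)
print(elapsed_seconds)
print(local_start)
```

Keeping the stored value in UTC means the duration is unaffected by daylight saving changes in whatever timezone is used for display.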

In [1]:
import pandas as pd
import numpy as np

url = "http://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip"
df = pd.read_csv(url)
city_mpg = df.city08
highway_mpg = df.highway08


## 12.2 Loading UTC Time Data

In [4]:
col = pd.Series(['2015-03-08 08:00:00+00:00',
                 '2015-03-08 09:00:00+00:00',
                 '2015-03-08 09:30:00+00:00',
                 '2015-03-08 06:30:00+00:00',
                 '2015-11-01 07:00:00+00:00',
                 '2015-11-01 07:30:00+00:00',
                 '2015-11-01 08:00:00+00:00',
                 '2015-11-01 08:30:00+00:00',
                 '2015-11-01 08:00:00+00:00',
                 '2015-11-01 08:30:00+00:00',
                 '2015-11-01 09:00:00+00:00',
                 '2015-11-01 09:30:00+00:00',
                 '2015-11-01 10:30:00+00:00'])

In [5]:
col

0     2015-03-08 08:00:00+00:00
1     2015-03-08 09:00:00+00:00
2     2015-03-08 09:30:00+00:00
3     2015-03-08 06:30:00+00:00
4     2015-11-01 07:00:00+00:00
5     2015-11-01 07:30:00+00:00
6     2015-11-01 08:00:00+00:00
7     2015-11-01 08:30:00+00:00
8     2015-11-01 08:00:00+00:00
9     2015-11-01 08:30:00+00:00
10    2015-11-01 09:00:00+00:00
11    2015-11-01 09:30:00+00:00
12    2015-11-01 10:30:00+00:00
dtype: object

In [7]:
# parse the strings into timezone-aware UTC datetimes
utc_s = pd.to_datetime(col, utc=True)
utc_s

0    2015-03-08 08:00:00+00:00
1    2015-03-08 09:00:00+00:00
2    2015-03-08 09:30:00+00:00
3    2015-03-08 06:30:00+00:00
4    2015-11-01 07:00:00+00:00
5    2015-11-01 07:30:00+00:00
6    2015-11-01 08:00:00+00:00
7    2015-11-01 08:30:00+00:00
8    2015-11-01 08:00:00+00:00
9    2015-11-01 08:30:00+00:00
10   2015-11-01 09:00:00+00:00
11   2015-11-01 09:30:00+00:00
12   2015-11-01 10:30:00+00:00
dtype: datetime64[ns, UTC]
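
The strings above already carry a `+00:00` offset. As a hedged illustration of what `utc=True` does when the offsets differ, the sketch below parses two made-up strings recorded with local offsets; the `mixed` series is a hypothetical example, not data from this chapter.

```python
import pandas as pd

# Hypothetical strings: the same kind of timestamps, but written with
# local offsets (-07:00 and -06:00) instead of +00:00.
mixed = pd.Series(["2015-03-08 01:00:00-07:00",
                   "2015-03-08 03:00:00-06:00"])

# utc=True applies each offset and returns one UTC-aware series.
mixed_utc = pd.to_datetime(mixed, utc=True)

print(mixed_utc)
```

Each value is shifted by its own offset, so the result lines up with the first two UTC entries shown earlier (08:00 and 09:00).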

- Once the dates are stored as UTC, we can use the ``.dt`` accessor to convert them to other timezones.

In [8]:
# convert series to America/Denver timezone
utc_s.dt.tz_convert('America/Denver')

0    2015-03-08 01:00:00-07:00
1    2015-03-08 03:00:00-06:00
2    2015-03-08 03:30:00-06:00
3    2015-03-07 23:30:00-07:00
4    2015-11-01 01:00:00-06:00
5    2015-11-01 01:30:00-06:00
6    2015-11-01 01:00:00-07:00
7    2015-11-01 01:30:00-07:00
8    2015-11-01 01:00:00-07:00
9    2015-11-01 01:30:00-07:00
10   2015-11-01 02:00:00-07:00
11   2015-11-01 02:30:00-07:00
12   2015-11-01 03:30:00-07:00
dtype: datetime64[ns, America/Denver]
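
Note how the wall-clock time jumps from 01:00 to 03:00 in the first two rows while the offset changes from -07:00 to -06:00. The sketch below, using a two-row sample with assumed names (`utc_sample`, `denver`), suggests that `tz_convert` only changes how an instant is displayed, not the instant itself.

```python
import pandas as pd

utc_sample = pd.to_datetime(
    pd.Series(["2015-03-08 08:00:00+00:00",
               "2015-03-08 09:00:00+00:00"]),
    utc=True,
)
denver = utc_sample.dt.tz_convert("America/Denver")

# The local wall clock skips an hour across the DST change...
wall_clock_hours = denver.dt.hour.tolist()

# ...but the real elapsed time between the two rows is still one hour.
elapsed = denver.iloc[1] - denver.iloc[0]

# Converting back to UTC recovers the original values.
round_trip_ok = (denver.dt.tz_convert("UTC") == utc_sample).all()

print(wall_clock_hours, elapsed, round_trip_ok)
```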

## 12.3 Loading Local Time Data

To load local date information, we need:
- date
- offset
- timezone

In [9]:
time = pd.Series(['2015-03-08 01:00:00',
                  '2015-03-08 01:30:00',
                  '2015-03-08 02:00:00',
                  '2015-03-08 02:30:00',
                  '2015-03-08 03:00:00',
                  '2015-03-08 03:30:00',
                  '2015-11-01 00:30:00',
                  '2015-11-01 01:00:00',
                  '2015-11-01 01:30:00',
                  '2015-11-01 02:00:00',
                  '2015-11-01 02:30:00',
                  '2015-11-01 03:00:00'])

In [10]:
offset = pd.Series([-7, -7, -7, -7, -7, -7, -6, -6, -6, -6, -6, -6])
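
One possible way to combine these three pieces is sketched below. This is an assumption about how the pieces fit together, not necessarily the approach the chapter goes on to use: it subtracts the offset to get UTC, then converts to the named timezone. The two-row `local` and `offset_hours` series are a small hypothetical sample rather than the full series above.

```python
import pandas as pd

# Small illustrative sample: naive local wall-clock times and the
# UTC offset, in hours, that was in effect for each one.
local = pd.Series(["2015-03-08 01:30:00", "2015-03-08 03:00:00"])
offset_hours = pd.Series([-7, -6])

naive = pd.to_datetime(local)

# local time = UTC + offset, so UTC = local time - offset.
as_utc = (naive - pd.to_timedelta(offset_hours, unit="h")).dt.tz_localize("UTC")

# Attach the timezone so the values display as local time again.
denver_local = as_utc.dt.tz_convert("America/Denver")

print(denver_local)
```

Going through UTC avoids having to decide how to interpret ambiguous or nonexistent wall-clock times, because the offset already pins down the exact instant.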

## 12.6 Manipulating Dates

In [13]:
url = 'https://github.com/mattharrison/datasets/raw/master/data/alta-noaa-1980-2019.csv'
alta_df = pd.read_csv(url)

In [15]:
alta_df.head()

Unnamed: 0,STATION,NAME,LATITUDE,LONGITUDE,ELEVATION,DATE,DAPR,DASF,MDPR,MDSF,...,SNWD,TMAX,TMIN,TOBS,WT01,WT03,WT04,WT05,WT06,WT11
0,USC00420072,"ALTA, UT US",40.5905,-111.6369,2660.9,1980-01-01,,,,,...,29.0,38.0,25.0,25.0,,,,,,
1,USC00420072,"ALTA, UT US",40.5905,-111.6369,2660.9,1980-01-02,,,,,...,34.0,27.0,18.0,18.0,,,,,,
2,USC00420072,"ALTA, UT US",40.5905,-111.6369,2660.9,1980-01-03,,,,,...,30.0,27.0,12.0,18.0,,,,,,
3,USC00420072,"ALTA, UT US",40.5905,-111.6369,2660.9,1980-01-04,,,,,...,30.0,31.0,18.0,27.0,,,,,,
4,USC00420072,"ALTA, UT US",40.5905,-111.6369,2660.9,1980-01-05,,,,,...,30.0,34.0,26.0,34.0,,,,,,


In [17]:
# convert to datetime
dates = pd.to_datetime(alta_df.DATE)
dates

0       1980-01-01
1       1980-01-02
2       1980-01-03
3       1980-01-04
4       1980-01-05
           ...    
14155   2019-09-03
14156   2019-09-04
14157   2019-09-05
14158   2019-09-06
14159   2019-09-07
Name: DATE, Length: 14160, dtype: datetime64[ns]

- A ``datetime64[ns]`` dtype gives us access to the ``.dt`` accessor.

In [19]:
dates.dt.day_name()

0          Tuesday
1        Wednesday
2         Thursday
3           Friday
4         Saturday
           ...    
14155      Tuesday
14156    Wednesday
14157     Thursday
14158       Friday
14159     Saturday
Name: DATE, Length: 14160, dtype: object

In [20]:
dates.dt.strftime('%d/%m/%y')

0        01/01/80
1        02/01/80
2        03/01/80
3        04/01/80
4        05/01/80
           ...   
14155    03/09/19
14156    04/09/19
14157    05/09/19
14158    06/09/19
14159    07/09/19
Name: DATE, Length: 14160, dtype: object
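
The `.dt` accessor exposes more than day names and formatting. The sketch below uses a two-row stand-in for the `DATE` column (the `sample` series is an assumption for illustration) to pull out a few date components and to parse the `%d/%m/%y` strings back into datetimes.

```python
import pandas as pd

# Two dates taken from the first and last rows shown above.
sample = pd.to_datetime(pd.Series(["1980-01-01", "2019-09-07"]))

# Individual components are available as integer series.
years = sample.dt.year
months = sample.dt.month
days_of_year = sample.dt.dayofyear

# Format as day/month/two-digit-year strings...
formatted = sample.dt.strftime("%d/%m/%y")

# ...and parse them back. Passing the format explicitly avoids
# guessing whether 07/09/19 means 7 September or 9 July.
parsed = pd.to_datetime(formatted, format="%d/%m/%y")

print(years.tolist(), months.tolist(), days_of_year.tolist())
print(formatted.tolist())
```

Because `strftime` returns plain strings, the result no longer supports `.dt` operations until it is parsed back with `pd.to_datetime`.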