Time series data is an important form of structured data in many different fields, such
as finance, economics, ecology, neuroscience, and physics. Anything that is observed
or measured at many points in time forms a time series. Many time series are fixed
frequency, which is to say that data points occur at regular intervals according to some
rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can
also be irregular without a fixed unit of time or offset between units.
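To make the distinction between fixed-frequency and irregular series concrete, here is a minimal sketch that is not part of the original notebook. It asks pandas to infer a frequency from two small, made-up indexes using `pd.infer_freq`; the names `regular` and `irregular` are illustrative only.

```python
import pandas as pd

# Regular index: one observation per day.
regular = pd.date_range('2020-01-01', periods=5, freq='D')

# Irregular index: gaps of 1, 3, and 6 days between observations.
irregular = pd.DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-05', '2020-01-11'])

regular_freq = pd.infer_freq(regular)      # a frequency string for the daily index
irregular_freq = pd.infer_freq(irregular)  # None, since no single rule fits the gaps

print(regular_freq, irregular_freq)
```

When no single rule describes the spacing, `pd.infer_freq` returns `None`, which is a convenient check before applying operations that assume a fixed frequency.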

How you mark and refer to time series data depends on the application, and you may have one of the following:
1. Timestamps: specific instants in time.
2. Fixed periods, such as the month January 2007 or the full year 2010.
3. Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals.
4. Experiment or elapsed time: each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven).
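
The four kinds of time markers listed above map fairly naturally onto pandas scalar types. The following sketch is an illustration rather than code from the original notebook; the specific dates and variable names are assumptions chosen for the example.

```python
import pandas as pd

# 1. Timestamp: a specific instant.
instant = pd.Timestamp('2007-01-15 09:30')

# 2. Fixed period: the whole month of January 2007.
month = pd.Period('2007-01', freq='M')

# 3. Interval: an arbitrary span given by a start and an end timestamp.
span = pd.Interval(pd.Timestamp('2007-01-10'), pd.Timestamp('2007-01-20'), closed='both')

# 4. Elapsed time: an offset relative to some start time.
elapsed = pd.Timedelta(seconds=90)

print(instant in span)                                # is the instant inside the interval?
print(month.start_time <= instant <= month.end_time)  # is it inside the period?
print(instant + elapsed)                              # the instant moved forward by 90 seconds
```

The rest of this chapter works mostly with the first two kinds, timestamps and periods.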


In [1]:
import numpy as np
import pandas as pd

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# 11.1 Date and Time Data Types and Tools

In [3]:
from datetime import datetime

In [4]:
now = datetime.now()

In [5]:
print(now)

2020-07-24 11:19:23.472291


In [6]:
now.year

2020

In [7]:
now.day

24

In [8]:
now.month

7

In [9]:
now.year, now.day, now.month

(2020, 24, 7)

In [10]:
delta = datetime(2020, 7, 20) - datetime(2015, 2, 15, 7, 18)
delta

datetime.timedelta(days=1981, seconds=60120)

In [11]:
delta.days

1981

In [12]:
delta.seconds

60120

In [13]:
from datetime import timedelta

In [14]:
start = datetime(1998, 8, 24)

In [15]:
start + timedelta(-2)

datetime.datetime(1998, 8, 22, 0, 0)

In [16]:
start - 2 * timedelta(-2)

datetime.datetime(1998, 8, 28, 0, 0)

*See Table 11-1. Types in datetime module*

![Types in datetime module](Img/11.1.png)

## Converting Between String and Datetime

In [17]:
stamp = datetime(2020, 8, 24)

In [18]:
str(stamp)

'2020-08-24 00:00:00'

In [19]:
# Format the datetime object as a string
stamp.strftime('%Y-%m-%d')

'2020-08-24'

*See Table 11-2 for a complete list of the format codes (also covered in Chapter 2).*

In [20]:
value = '2020-08-24'

In [21]:
stamp = datetime.strptime(value, '%Y-%m-%d')

In [22]:
stamp

datetime.datetime(2020, 8, 24, 0, 0)

In [23]:
datelists = ['2020-02-24', '2019-08-15', '2018-04-18']

In [24]:
[datetime.strptime(x, '%Y-%m-%d') for x in datelists]   

[datetime.datetime(2020, 2, 24, 0, 0),
 datetime.datetime(2019, 8, 15, 0, 0),
 datetime.datetime(2018, 4, 18, 0, 0)]

**datetime.strptime** is a good way to parse a date with a known format. However, it
can be a bit annoying to have to write a format spec each time, especially for common
date formats. In this case, you can use the **parser.parse** method in the third-party
dateutil package (this is installed automatically when you install pandas):

In [25]:
from dateutil.parser import parse

In [26]:
parse('1998-08-24')

datetime.datetime(1998, 8, 24, 0, 0)

In [27]:
parse('August 24 1998 4:03 AM')

datetime.datetime(1998, 8, 24, 4, 3)

In [28]:
parse('26/12/2011')

datetime.datetime(2011, 12, 26, 0, 0)

In [29]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)

In [30]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [31]:
stamp = pd.to_datetime(datestrs)
stamp

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [32]:
stamp = pd.to_datetime(datestrs + [None])
stamp

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [33]:
stamp[2]

NaT

**NaT** (Not a Time) is pandas’s null value for timestamp data.
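
Because NaT behaves like other missing values in pandas, it can be detected with `pd.isna`. The snippet below is a small sketch (not from the original notebook) that rebuilds the index from the cell above under the hypothetical name `idx` and filters out the missing entry.

```python
import pandas as pd

idx = pd.to_datetime(['2011-07-06 12:00:00', '2011-08-06 00:00:00', None])

missing = pd.isna(idx)   # boolean array, True where the entry is NaT
clean = idx[~missing]    # keep only the valid timestamps

print(missing)
print(clean)
```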

*See Table 11-3. Locale-specific date formatting*

![Locale-specific date formatting](Img/11.3.png)

# 11.2 Time Series Basics

A basic kind of time series object in pandas is a Series indexed by timestamps, which
are often represented outside pandas as Python strings or datetime objects:

In [34]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [35]:
ts = pd.Series(np.random.randn(6), index=dates)

In [36]:
ts

2011-01-02   -0.699464
2011-01-05   -0.244297
2011-01-07   -0.090951
2011-01-08    0.470042
2011-01-10   -0.793374
2011-01-12    1.393132
dtype: float64

In [37]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [38]:
ts[::2]

2011-01-02   -0.699464
2011-01-07   -0.090951
2011-01-10   -0.793374
dtype: float64

In [39]:
ts + ts[::2]

2011-01-02   -1.398929
2011-01-05         NaN
2011-01-07   -0.181902
2011-01-08         NaN
2011-01-10   -1.586747
2011-01-12         NaN
dtype: float64

In [40]:
ts.index.dtype

dtype('<M8[ns]')

In [41]:
stamp = ts.index[0]

In [42]:
stamp

Timestamp('2011-01-02 00:00:00')

## Indexing, Selection, Subsetting

In [43]:
stamp = ts.index[2]

In [44]:
ts[stamp]

-0.0909509204561012

In [45]:
ts

2011-01-02   -0.699464
2011-01-05   -0.244297
2011-01-07   -0.090951
2011-01-08    0.470042
2011-01-10   -0.793374
2011-01-12    1.393132
dtype: float64

In [46]:
ts['1/10/2011']  # you can also pass a string that is interpretable as a date

-0.7933736746179036

In [47]:
ts['20110110']  # a compact YYYYMMDD string is interpreted as a date too

-0.7933736746179036

In [48]:
lts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2010', periods=1000))

In [49]:
lts

2010-01-01   -0.704709
2010-01-02    1.837383
2010-01-03    0.341071
2010-01-04    0.097742
2010-01-05   -0.407447
                ...   
2012-09-22   -0.059790
2012-09-23   -0.798842
2012-09-24    0.753077
2012-09-25    1.906760
2012-09-26    0.259137
Freq: D, Length: 1000, dtype: float64

In [50]:
lts['2011']

2011-01-01   -0.645778
2011-01-02   -0.418975
2011-01-03    1.311760
2011-01-04   -0.008814
2011-01-05   -1.828030
                ...   
2011-12-27   -1.085650
2011-12-28   -1.127711
2011-12-29   -0.755337
2011-12-30   -0.928072
2011-12-31   -0.132062
Freq: D, Length: 365, dtype: float64

In [51]:
lts['2011-08']

2011-08-01    1.335937
2011-08-02   -0.780176
2011-08-03    0.484633
2011-08-04   -1.480517
2011-08-05   -1.021160
2011-08-06    1.671527
2011-08-07    0.062445
2011-08-08   -2.455676
2011-08-09    0.895652
2011-08-10   -0.455883
2011-08-11    0.227093
2011-08-12   -1.926669
2011-08-13   -1.114323
2011-08-14    0.177597
2011-08-15   -0.830138
2011-08-16    0.319179
2011-08-17    1.270361
2011-08-18    1.022314
2011-08-19   -0.933780
2011-08-20   -0.019454
2011-08-21    0.649744
2011-08-22   -1.018291
2011-08-23   -1.439373
2011-08-24   -0.437573
2011-08-25   -0.020791
2011-08-26   -0.775818
2011-08-27   -0.326353
2011-08-28   -0.903611
2011-08-29   -0.757056
2011-08-30    0.289583
2011-08-31   -1.465563
Freq: D, dtype: float64

In [52]:
lts[datetime(2011, 6, 1):]

2011-06-01    0.668646
2011-06-02    1.938513
2011-06-03    0.089917
2011-06-04    0.120984
2011-06-05    1.295813
                ...   
2012-09-22   -0.059790
2012-09-23   -0.798842
2012-09-24    0.753077
2012-09-25    1.906760
2012-09-26    0.259137
Freq: D, Length: 484, dtype: float64

In [53]:
lts['2011-06-01':]

2011-06-01    0.668646
2011-06-02    1.938513
2011-06-03    0.089917
2011-06-04    0.120984
2011-06-05    1.295813
                ...   
2012-09-22   -0.059790
2012-09-23   -0.798842
2012-09-24    0.753077
2012-09-25    1.906760
2012-09-26    0.259137
Freq: D, Length: 484, dtype: float64

In [54]:
lts['2011-08-25':'2012-08-24']

2011-08-25   -0.020791
2011-08-26   -0.775818
2011-08-27   -0.326353
2011-08-28   -0.903611
2011-08-29   -0.757056
                ...   
2012-08-20    0.136220
2012-08-21   -0.106766
2012-08-22    1.510062
2012-08-23   -1.600990
2012-08-24    0.816536
Freq: D, Length: 366, dtype: float64

As before, you can pass a string date, a **datetime**, or a Timestamp. Remember that
slicing in this manner produces views on the source time series, like slicing NumPy
arrays: no data is copied, and modifications to the slice are reflected in the
original data. (This describes pandas without Copy-on-Write; with Copy-on-Write
enabled, a modified slice no longer writes through to the original.)
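
As a quick check that the three kinds of slice bounds are interchangeable, here is a minimal sketch on a small made-up series `s` (an assumption for illustration, not data from the notebook). It uses `.loc` for explicit label-based slicing.

```python
from datetime import datetime

import numpy as np
import pandas as pd

s = pd.Series(np.arange(10), index=pd.date_range('2011-05-28', periods=10))

by_string = s.loc['2011-06-01':]
by_datetime = s.loc[datetime(2011, 6, 1):]
by_timestamp = s.loc[pd.Timestamp('2011-06-01'):]

# All three bounds select the same rows.
print(by_string.equals(by_datetime) and by_string.equals(by_timestamp))
```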

There is an equivalent instance method, **truncate**, that slices a Series between two
dates:

In [55]:
lts.truncate(before='2012-08-01')

2012-08-01   -0.792636
2012-08-02   -0.559781
2012-08-03    1.586659
2012-08-04   -2.100676
2012-08-05    1.013318
2012-08-06    1.060719
2012-08-07   -1.027838
2012-08-08    0.492660
2012-08-09    0.098847
2012-08-10    0.890107
2012-08-11    1.734957
2012-08-12    0.809867
2012-08-13   -0.772731
2012-08-14    1.258249
2012-08-15    0.046875
2012-08-16   -2.215258
2012-08-17   -0.235623
2012-08-18   -0.984774
2012-08-19   -0.737829
2012-08-20    0.136220
2012-08-21   -0.106766
2012-08-22    1.510062
2012-08-23   -1.600990
2012-08-24    0.816536
2012-08-25   -0.615603
2012-08-26   -0.157884
2012-08-27    0.361055
2012-08-28   -1.629209
2012-08-29    0.210977
2012-08-30    1.522766
2012-08-31    1.374927
2012-09-01   -1.155710
2012-09-02    1.538894
2012-09-03    1.231100
2012-09-04   -0.560104
2012-09-05   -0.403327
2012-09-06   -0.647365
2012-09-07   -0.074291
2012-09-08   -0.207328
2012-09-09    0.474394
2012-09-10   -0.007620
2012-09-11   -2.181787
2012-09-12   -0.605994
2012-09-13 

In [56]:
dates = pd.date_range('2012-08-05', periods=100, freq='W-WED')

In [57]:
long_df = pd.DataFrame(np.random.randn(100,4), index=dates, columns=['NY', 'KHI', 'DEL', 'ANK'])

In [58]:
long_df

|  | NY | KHI | DEL | ANK |
|---|---|---|---|---|
| 2012-08-08 | -1.249639 | -1.525235 | 0.420306 | -0.728155 |
| 2012-08-15 | -0.330611 | -0.141644 | -0.317756 | 1.424945 |
| 2012-08-22 | -1.223875 | 0.397424 | -0.339153 | 1.490641 |
| 2012-08-29 | -0.023177 | 0.424403 | -0.341877 | -0.361665 |
| 2012-09-05 | -0.311759 | 2.038592 | 0.194954 | 0.228798 |
| ... | ... | ... | ... | ... |
| 2014-06-04 | -1.798256 | 0.487355 | -0.679435 | 0.206497 |
| 2014-06-11 | -1.565414 | -1.148157 | -0.020673 | -0.329483 |
| 2014-06-18 | 0.919673 | 0.451790 | 1.553959 | -1.005787 |
| 2014-06-25 | 0.204322 | -1.017883 | -0.727417 | 1.310982 |


In [59]:
long_df.loc['2013-08']

|  | NY | KHI | DEL | ANK |
|---|---|---|---|---|
| 2013-08-07 | -0.035862 | -0.226761 | -0.160398 | 1.368164 |
| 2013-08-14 | -0.246879 | 1.30964 | -0.163837 | 0.141942 |
| 2013-08-21 | -0.532962 | 0.423667 | -0.550816 | 2.442256 |
| 2013-08-28 | 0.000713 | -2.008464 | -2.25786 | -0.019136 |


## Time Series with Duplicate Indices

In some applications, there may be multiple data observations falling on a particular
timestamp:

In [60]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])

In [61]:
dates

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02', '2000-01-02',
               '2000-01-03'],
              dtype='datetime64[ns]', freq=None)

In [62]:
dup_ts = pd.Series(np.random.randn(len(dates)), index=dates)
dup_ts

2000-01-01   -1.991019
2000-01-02    0.901127
2000-01-02    0.070804
2000-01-02    1.429537
2000-01-03    0.057691
dtype: float64

In [63]:
dup_ts.index.is_unique  # False means the index contains duplicate values

False

In [64]:
dup_ts['2000-01-02']

2000-01-02    0.901127
2000-01-02    0.070804
2000-01-02    1.429537
dtype: float64

In [65]:
dup_ts.groupby(level=0).count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64

# 11.3 Date Ranges, Frequencies, and Shifting

In [66]:
ts = long_df['NY']

In [67]:
ts

2012-08-08   -1.249639
2012-08-15   -0.330611
2012-08-22   -1.223875
2012-08-29   -0.023177
2012-09-05   -0.311759
                ...   
2014-06-04   -1.798256
2014-06-11   -1.565414
2014-06-18    0.919673
2014-06-25    0.204322
2014-07-02    0.234815
Freq: W-WED, Name: NY, Length: 100, dtype: float64

In [68]:
resampler = ts.resample('D') #The string 'D' is interpreted as daily frequency.

## Generating Date Ranges

In [69]:
index = pd.date_range(start='1998-08-24', end='1998-12-24')

In [70]:
index

DatetimeIndex(['1998-08-24', '1998-08-25', '1998-08-26', '1998-08-27',
               '1998-08-28', '1998-08-29', '1998-08-30', '1998-08-31',
               '1998-09-01', '1998-09-02',
               ...
               '1998-12-15', '1998-12-16', '1998-12-17', '1998-12-18',
               '1998-12-19', '1998-12-20', '1998-12-21', '1998-12-22',
               '1998-12-23', '1998-12-24'],
              dtype='datetime64[ns]', length=123, freq='D')

In [71]:
pd.date_range(start='1998-08-24', periods=20)

DatetimeIndex(['1998-08-24', '1998-08-25', '1998-08-26', '1998-08-27',
               '1998-08-28', '1998-08-29', '1998-08-30', '1998-08-31',
               '1998-09-01', '1998-09-02', '1998-09-03', '1998-09-04',
               '1998-09-05', '1998-09-06', '1998-09-07', '1998-09-08',
               '1998-09-09', '1998-09-10', '1998-09-11', '1998-09-12'],
              dtype='datetime64[ns]', freq='D')

In [72]:
pd.date_range(end='2020-07-20', periods=366)

DatetimeIndex(['2019-07-21', '2019-07-22', '2019-07-23', '2019-07-24',
               '2019-07-25', '2019-07-26', '2019-07-27', '2019-07-28',
               '2019-07-29', '2019-07-30',
               ...
               '2020-07-11', '2020-07-12', '2020-07-13', '2020-07-14',
               '2020-07-15', '2020-07-16', '2020-07-17', '2020-07-18',
               '2020-07-19', '2020-07-20'],
              dtype='datetime64[ns]', length=366, freq='D')

In [73]:
# 'BM' frequency is business end of month (newer pandas versions spell this 'BME')
pd.date_range(end='2020-07-20', periods=15, freq='BM')

DatetimeIndex(['2019-04-30', '2019-05-31', '2019-06-28', '2019-07-31',
               '2019-08-30', '2019-09-30', '2019-10-31', '2019-11-29',
               '2019-12-31', '2020-01-31', '2020-02-28', '2020-03-31',
               '2020-04-30', '2020-05-29', '2020-06-30'],
              dtype='datetime64[ns]', freq='BM')

*See Table 11-4. Base time series frequencies (not comprehensive)*

![Base time series frequencies](Img/11.4.png)

In [74]:
pd.date_range(start='20/07/2020 13:02:54', periods=10)

DatetimeIndex(['2020-07-20 13:02:54', '2020-07-21 13:02:54',
               '2020-07-22 13:02:54', '2020-07-23 13:02:54',
               '2020-07-24 13:02:54', '2020-07-25 13:02:54',
               '2020-07-26 13:02:54', '2020-07-27 13:02:54',
               '2020-07-28 13:02:54', '2020-07-29 13:02:54'],
              dtype='datetime64[ns]', freq='D')

In [75]:
pd.date_range(start='20/07/2020 13:02:54', periods=10, normalize=True)

DatetimeIndex(['2020-07-20', '2020-07-21', '2020-07-22', '2020-07-23',
               '2020-07-24', '2020-07-25', '2020-07-26', '2020-07-27',
               '2020-07-28', '2020-07-29'],
              dtype='datetime64[ns]', freq='D')

## Frequencies and Date Offsets

In [76]:
pd.date_range('20/07/2020', '23/07/2020', freq='4H')

DatetimeIndex(['2020-07-20 00:00:00', '2020-07-20 04:00:00',
               '2020-07-20 08:00:00', '2020-07-20 12:00:00',
               '2020-07-20 16:00:00', '2020-07-20 20:00:00',
               '2020-07-21 00:00:00', '2020-07-21 04:00:00',
               '2020-07-21 08:00:00', '2020-07-21 12:00:00',
               '2020-07-21 16:00:00', '2020-07-21 20:00:00',
               '2020-07-22 00:00:00', '2020-07-22 04:00:00',
               '2020-07-22 08:00:00', '2020-07-22 12:00:00',
               '2020-07-22 16:00:00', '2020-07-22 20:00:00',
               '2020-07-23 00:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [77]:
pd.date_range('20/07/2020', '23/07/2020', freq='1H30Min')

DatetimeIndex(['2020-07-20 00:00:00', '2020-07-20 01:30:00',
               '2020-07-20 03:00:00', '2020-07-20 04:30:00',
               '2020-07-20 06:00:00', '2020-07-20 07:30:00',
               '2020-07-20 09:00:00', '2020-07-20 10:30:00',
               '2020-07-20 12:00:00', '2020-07-20 13:30:00',
               '2020-07-20 15:00:00', '2020-07-20 16:30:00',
               '2020-07-20 18:00:00', '2020-07-20 19:30:00',
               '2020-07-20 21:00:00', '2020-07-20 22:30:00',
               '2020-07-21 00:00:00', '2020-07-21 01:30:00',
               '2020-07-21 03:00:00', '2020-07-21 04:30:00',
               '2020-07-21 06:00:00', '2020-07-21 07:30:00',
               '2020-07-21 09:00:00', '2020-07-21 10:30:00',
               '2020-07-21 12:00:00', '2020-07-21 13:30:00',
               '2020-07-21 15:00:00', '2020-07-21 16:30:00',
               '2020-07-21 18:00:00', '2020-07-21 19:30:00',
               '2020-07-21 21:00:00', '2020-07-21 22:30:00',
               '2020-07-

### Week of month dates

One useful frequency class is “week of month,” starting with WOM. This enables you to
get dates like the third Friday of each month (**WOM-3FRI**). The example below uses
**WOM-1FRI**, the first Friday of each month:

In [78]:
pd.date_range('2020-01-21', '2020-12-21', freq='WOM-1FRI')

DatetimeIndex(['2020-02-07', '2020-03-06', '2020-04-03', '2020-05-01',
               '2020-06-05', '2020-07-03', '2020-08-07', '2020-09-04',
               '2020-10-02', '2020-11-06', '2020-12-04'],
              dtype='datetime64[ns]', freq='WOM-1FRI')
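
For the third Friday of each month mentioned in the text, the same frequency family can be used with a different week number. This is a small sketch added for illustration; the date range and the name `third_fridays` are assumptions.

```python
import pandas as pd

third_fridays = pd.date_range('2020-01-01', '2020-06-30', freq='WOM-3FRI')
print(third_fridays)

# Every date is a Friday (weekday 4) that falls on day 15 through 21 of its month.
print(all(d.weekday() == 4 and 15 <= d.day <= 21 for d in third_fridays))
```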

## Shifting (Leading and Lagging) Data

“Shifting” refers to moving data backward and forward through time. Both Series and
DataFrame have a **shift** method for doing naive shifts forward or backward, leaving
the index unmodified:

In [79]:
ts = pd.Series(np.random.randn(6), index=pd.date_range('1/1/2000', periods=6, freq='M'))

In [80]:
ts

2000-01-31   -0.142454
2000-02-29   -0.376542
2000-03-31    0.155485
2000-04-30    0.738031
2000-05-31    0.543161
2000-06-30    0.520999
Freq: M, dtype: float64

In [81]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -0.142454
2000-04-30   -0.376542
2000-05-31    0.155485
2000-06-30    0.738031
Freq: M, dtype: float64

In [82]:
ts.shift(-2)

2000-01-31    0.155485
2000-02-29    0.738031
2000-03-31    0.543161
2000-04-30    0.520999
2000-05-31         NaN
2000-06-30         NaN
Freq: M, dtype: float64

In [83]:
ts / ts.shift(1) - 1  # percent change relative to the previous month

2000-01-31         NaN
2000-02-29    1.643256
2000-03-31   -1.412930
2000-04-30    3.746625
2000-05-31   -0.264041
2000-06-30   -0.040800
Freq: M, dtype: float64
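
The expression in the cell above is a common use of shift: computing the percent change from one observation to the next. As a hedged sketch, the snippet below repeats that calculation on made-up prices (the values and the name `prices` are assumptions) and compares it with the built-in `pct_change` method. The index is built from a `MonthEnd` offset object rather than a frequency string, purely to sidestep differences in alias spelling between pandas versions.

```python
import numpy as np
import pandas as pd

# Month-end index built from an offset object rather than a frequency string.
idx = pd.date_range('2000-01-31', periods=6, freq=pd.offsets.MonthEnd())
prices = pd.Series([100.0, 110.0, 99.0, 99.0, 120.0, 90.0], index=idx)

manual = prices / prices.shift(1) - 1   # same form as the expression above
builtin = prices.pct_change()           # pandas helper for the same calculation

print(manual)
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```

The first value is NaN in both cases because there is no previous observation to compare against.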

In [84]:
# If the frequency is known, it can be passed to shift to advance the timestamps instead of the data
ts.shift(2, freq='M')

2000-03-31   -0.142454
2000-04-30   -0.376542
2000-05-31    0.155485
2000-06-30    0.738031
2000-07-31    0.543161
2000-08-31    0.520999
Freq: M, dtype: float64

In [85]:
ts.shift(3, freq='D')

2000-02-03   -0.142454
2000-03-03   -0.376542
2000-04-03    0.155485
2000-05-03    0.738031
2000-06-03    0.543161
2000-07-03    0.520999
dtype: float64

In [86]:
ts.shift(1, freq='90T')  # 'T' stands for minutes, so '90T' means 90 minutes

2000-01-31 01:30:00   -0.142454
2000-02-29 01:30:00   -0.376542
2000-03-31 01:30:00    0.155485
2000-04-30 01:30:00    0.738031
2000-05-31 01:30:00    0.543161
2000-06-30 01:30:00    0.520999
Freq: M, dtype: float64

### Shifting dates with offsets

In [87]:
from pandas.tseries.offsets import MonthEnd, Day

In [88]:
now = datetime(2020, 11, 17)
now

datetime.datetime(2020, 11, 17, 0, 0)

In [89]:
now + 3 * Day()

Timestamp('2020-11-20 00:00:00')

In [90]:
now + MonthEnd()

Timestamp('2020-11-30 00:00:00')

In [91]:
now + MonthEnd(2)

Timestamp('2020-12-31 00:00:00')

In [92]:
now + MonthEnd(3)

Timestamp('2021-01-31 00:00:00')

In [93]:
offset = MonthEnd()

In [94]:
offset.rollforward(now)

Timestamp('2020-11-30 00:00:00')

In [95]:
offset.rollback(now)

Timestamp('2020-10-31 00:00:00')

In [96]:
ts = pd.Series(np.random.randn(10), index=pd.date_range('22/07/2020', periods=10, freq='4d'))

In [97]:
ts

2020-07-22    1.571859
2020-07-26   -0.195805
2020-07-30    0.941375
2020-08-03    1.298535
2020-08-07    0.265367
2020-08-11   -0.917245
2020-08-15    1.606037
2020-08-19    1.178282
2020-08-23   -1.313130
2020-08-27   -0.344982
Freq: 4D, dtype: float64

In [98]:
ts.groupby(offset.rollforward).count()

2020-07-31    3
2020-08-31    7
dtype: int64

In [99]:
ts.groupby(offset.rollforward).mean()

2020-07-31    0.772476
2020-08-31    0.253266
dtype: float64

In [100]:
ts.resample('M').mean()

2020-07-31    0.772476
2020-08-31    0.253266
Freq: M, dtype: float64

In [101]:
ts.resample('M').count()

2020-07-31    3
2020-08-31    7
Freq: M, dtype: int64

# 11.4 Time Zone Handling

In [102]:
import pytz

In [103]:
pytz.common_timezones[-5:]

['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [104]:
tz = pytz.timezone('America/New_York')
tz

<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

## Time Zone Localization and Conversion

In [105]:
ts = pd.Series(np.random.randn(10), index=pd.date_range('22/07/2020 11:45', periods=10, freq='D'))

In [106]:
ts

2020-07-22 11:45:00    0.465961
2020-07-23 11:45:00   -0.863147
2020-07-24 11:45:00   -2.243858
2020-07-25 11:45:00    0.137871
2020-07-26 11:45:00   -0.684332
2020-07-27 11:45:00   -1.353112
2020-07-28 11:45:00   -0.332444
2020-07-29 11:45:00   -0.665797
2020-07-30 11:45:00    0.538120
2020-07-31 11:45:00    0.212053
Freq: D, dtype: float64

In [107]:
print(ts.index.tz)

None


In [108]:
ts_UTC = pd.Series(np.random.randn(10), index=pd.date_range('22/07/2020 11:45', periods=10, freq='D', tz='UTC'))

In [109]:
ts_UTC

2020-07-22 11:45:00+00:00   -1.420882
2020-07-23 11:45:00+00:00    0.533981
2020-07-24 11:45:00+00:00   -0.455333
2020-07-25 11:45:00+00:00   -1.568052
2020-07-26 11:45:00+00:00   -1.130101
2020-07-27 11:45:00+00:00   -0.295514
2020-07-28 11:45:00+00:00    1.155441
2020-07-29 11:45:00+00:00    0.354106
2020-07-30 11:45:00+00:00   -1.002802
2020-07-31 11:45:00+00:00   -0.530954
Freq: D, dtype: float64

Conversion from naive to localized is handled by the **tz_localize** method:

In [110]:
ts_UTC = ts.tz_localize('UTC')

In [111]:
ts_UTC

2020-07-22 11:45:00+00:00    0.465961
2020-07-23 11:45:00+00:00   -0.863147
2020-07-24 11:45:00+00:00   -2.243858
2020-07-25 11:45:00+00:00    0.137871
2020-07-26 11:45:00+00:00   -0.684332
2020-07-27 11:45:00+00:00   -1.353112
2020-07-28 11:45:00+00:00   -0.332444
2020-07-29 11:45:00+00:00   -0.665797
2020-07-30 11:45:00+00:00    0.538120
2020-07-31 11:45:00+00:00    0.212053
Freq: D, dtype: float64

In [112]:
ts_UTC.index.tz

<UTC>

Once a time series has been localized to a particular time zone, it can be converted to
another time zone with tz_convert:

In [113]:
ts_UTC.tz_convert('America/New_York')

2020-07-22 07:45:00-04:00    0.465961
2020-07-23 07:45:00-04:00   -0.863147
2020-07-24 07:45:00-04:00   -2.243858
2020-07-25 07:45:00-04:00    0.137871
2020-07-26 07:45:00-04:00   -0.684332
2020-07-27 07:45:00-04:00   -1.353112
2020-07-28 07:45:00-04:00   -0.332444
2020-07-29 07:45:00-04:00   -0.665797
2020-07-30 07:45:00-04:00    0.538120
2020-07-31 07:45:00-04:00    0.212053
Freq: D, dtype: float64

In [114]:
ts_eastern = ts.tz_localize('America/New_York')

In [115]:
ts_eastern

2020-07-22 11:45:00-04:00    0.465961
2020-07-23 11:45:00-04:00   -0.863147
2020-07-24 11:45:00-04:00   -2.243858
2020-07-25 11:45:00-04:00    0.137871
2020-07-26 11:45:00-04:00   -0.684332
2020-07-27 11:45:00-04:00   -1.353112
2020-07-28 11:45:00-04:00   -0.332444
2020-07-29 11:45:00-04:00   -0.665797
2020-07-30 11:45:00-04:00    0.538120
2020-07-31 11:45:00-04:00    0.212053
Freq: D, dtype: float64

In [116]:
ts_eastern.tz_convert('UTC')

2020-07-22 15:45:00+00:00    0.465961
2020-07-23 15:45:00+00:00   -0.863147
2020-07-24 15:45:00+00:00   -2.243858
2020-07-25 15:45:00+00:00    0.137871
2020-07-26 15:45:00+00:00   -0.684332
2020-07-27 15:45:00+00:00   -1.353112
2020-07-28 15:45:00+00:00   -0.332444
2020-07-29 15:45:00+00:00   -0.665797
2020-07-30 15:45:00+00:00    0.538120
2020-07-31 15:45:00+00:00    0.212053
Freq: D, dtype: float64

In [117]:
ts_eastern.tz_convert('Europe/Berlin')

2020-07-22 17:45:00+02:00    0.465961
2020-07-23 17:45:00+02:00   -0.863147
2020-07-24 17:45:00+02:00   -2.243858
2020-07-25 17:45:00+02:00    0.137871
2020-07-26 17:45:00+02:00   -0.684332
2020-07-27 17:45:00+02:00   -1.353112
2020-07-28 17:45:00+02:00   -0.332444
2020-07-29 17:45:00+02:00   -0.665797
2020-07-30 17:45:00+02:00    0.538120
2020-07-31 17:45:00+02:00    0.212053
Freq: D, dtype: float64

In [118]:
ts_eastern.tz_convert('Asia/Karachi')

2020-07-22 20:45:00+05:00    0.465961
2020-07-23 20:45:00+05:00   -0.863147
2020-07-24 20:45:00+05:00   -2.243858
2020-07-25 20:45:00+05:00    0.137871
2020-07-26 20:45:00+05:00   -0.684332
2020-07-27 20:45:00+05:00   -1.353112
2020-07-28 20:45:00+05:00   -0.332444
2020-07-29 20:45:00+05:00   -0.665797
2020-07-30 20:45:00+05:00    0.538120
2020-07-31 20:45:00+05:00    0.212053
Freq: D, dtype: float64

In [119]:
ts_eastern.tz_convert('Asia/Shanghai')

2020-07-22 23:45:00+08:00    0.465961
2020-07-23 23:45:00+08:00   -0.863147
2020-07-24 23:45:00+08:00   -2.243858
2020-07-25 23:45:00+08:00    0.137871
2020-07-26 23:45:00+08:00   -0.684332
2020-07-27 23:45:00+08:00   -1.353112
2020-07-28 23:45:00+08:00   -0.332444
2020-07-29 23:45:00+08:00   -0.665797
2020-07-30 23:45:00+08:00    0.538120
2020-07-31 23:45:00+08:00    0.212053
Freq: D, dtype: float64

## Operations with Time Zone-Aware Timestamp Objects

In [120]:
stamp = pd.Timestamp('2020-08-24 04:00')

In [121]:
stamp_utc = stamp.tz_localize('UTC')
stamp_utc

Timestamp('2020-08-24 04:00:00+0000', tz='UTC')

In [122]:
stamp_utc.tz_convert('Asia/Karachi')

Timestamp('2020-08-24 09:00:00+0500', tz='Asia/Karachi')

In [123]:
stamp_KHI = pd.Timestamp('2020-08-24 04:00', tz='Asia/Karachi')
stamp_KHI

Timestamp('2020-08-24 04:00:00+0500', tz='Asia/Karachi')

Time zone–aware Timestamp objects internally store a UTC timestamp value as
nanoseconds since the Unix epoch (January 1, 1970); this UTC value is invariant between
time zone conversions:

In [124]:
stamp_utc.value

1598241600000000000

In [125]:
stamp_utc.tz_convert('America/New_York').value

1598241600000000000

In [126]:
from pandas.tseries.offsets import Hour

In [127]:
t_stamp = pd.Timestamp('2020-08-24 09:00', tz='Asia/Karachi')

In [128]:
t_stamp

Timestamp('2020-08-24 09:00:00+0500', tz='Asia/Karachi')

In [129]:
t_stamp + Hour()

Timestamp('2020-08-24 10:00:00+0500', tz='Asia/Karachi')

In [130]:
t_stamp + Hour(15)

Timestamp('2020-08-25 00:00:00+0500', tz='Asia/Karachi')

In [131]:
stamp = pd.Timestamp('2012-11-04 00:30', tz='US/Eastern')

In [132]:
stamp

Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')

In [133]:
stamp + 2 * Hour()

Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')

## Operations Between Different Time Zones

In [134]:
ts = pd.Series(np.random.randn(10), index=pd.date_range('22/07/2020 9:30', freq='B', periods=10))

In [135]:
ts

2020-07-22 09:30:00   -0.087394
2020-07-23 09:30:00   -1.668032
2020-07-24 09:30:00    2.751867
2020-07-27 09:30:00    0.761674
2020-07-28 09:30:00   -0.862198
2020-07-29 09:30:00    1.421057
2020-07-30 09:30:00   -0.991912
2020-07-31 09:30:00   -0.879898
2020-08-03 09:30:00    1.212179
2020-08-04 09:30:00    0.065789
Freq: B, dtype: float64

In [136]:
ts1 = ts[:7].tz_localize('US/Eastern')
ts1

2020-07-22 09:30:00-04:00   -0.087394
2020-07-23 09:30:00-04:00   -1.668032
2020-07-24 09:30:00-04:00    2.751867
2020-07-27 09:30:00-04:00    0.761674
2020-07-28 09:30:00-04:00   -0.862198
2020-07-29 09:30:00-04:00    1.421057
2020-07-30 09:30:00-04:00   -0.991912
Freq: B, dtype: float64

In [137]:
ts2 = ts[2:].tz_localize('US/Eastern')
ts2

2020-07-24 09:30:00-04:00    2.751867
2020-07-27 09:30:00-04:00    0.761674
2020-07-28 09:30:00-04:00   -0.862198
2020-07-29 09:30:00-04:00    1.421057
2020-07-30 09:30:00-04:00   -0.991912
2020-07-31 09:30:00-04:00   -0.879898
2020-08-03 09:30:00-04:00    1.212179
2020-08-04 09:30:00-04:00    0.065789
Freq: B, dtype: float64

In [138]:
result = ts1 + ts2
result.index

DatetimeIndex(['2020-07-22 09:30:00-04:00', '2020-07-23 09:30:00-04:00',
               '2020-07-24 09:30:00-04:00', '2020-07-27 09:30:00-04:00',
               '2020-07-28 09:30:00-04:00', '2020-07-29 09:30:00-04:00',
               '2020-07-30 09:30:00-04:00', '2020-07-31 09:30:00-04:00',
               '2020-08-03 09:30:00-04:00', '2020-08-04 09:30:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='B')
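
Note that both series in the cells above are localized to US/Eastern, so the result keeps that time zone. When the two operands really are in different time zones, pandas aligns them on their underlying UTC instants and the result is in UTC. The sketch below illustrates this with made-up data; the names `eastern` and `berlin` and the choice of Europe/Berlin are assumptions for the example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2020-07-22 09:30', periods=10, freq='B')
s = pd.Series(np.arange(10.0), index=idx)

# Two overlapping pieces of the same series, expressed in different time zones.
eastern = s.iloc[:7].tz_localize('America/New_York')
berlin = s.iloc[2:].tz_localize('America/New_York').tz_convert('Europe/Berlin')

combined = eastern + berlin   # aligned on the underlying UTC instants
print(combined.index.tz)      # time zone of the combined index
print(combined.dropna())      # only the overlapping timestamps have values
```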

# 11.5 Periods and Period Arithmetic

*Periods* represent timespans, like days, months, quarters, or years. 

In [139]:
p = pd.Period(2018, freq='A-DEC')

In [140]:
p

Period('2018', 'A-DEC')

In this case, the Period object represents the full timespan from January 1, 2018, to
December 31, 2018, inclusive.
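
The span a period covers can be inspected directly through its `start_time` and `end_time` attributes. The following is a minimal sketch, not taken from the original notebook. It uses the `'Y'` alias, which is assumed here to be equivalent to the `'A-DEC'` spelling used in the cells above (newer pandas versions prefer `'Y'`).

```python
import pandas as pd

year = pd.Period(2018, freq='Y')   # annual period ending in December

print(year.start_time)   # first instant covered by the period
print(year.end_time)     # last instant covered by the period

# Any timestamp inside 2018 maps back to the same period.
print(pd.Timestamp('2018-07-04').to_period('Y') == year)
```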

In [142]:
p + 2 

Period('2020', 'A-DEC')

In [144]:
p - 7

Period('2011', 'A-DEC')

In [145]:
pd.Period(2020, freq='A-DEC') - p

<2 * YearEnds: month=12>

In [147]:
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')

In [148]:
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')

In [158]:
pd.Series(data=np.random.random(6), index=rng)

2000-01    0.629161
2000-02    0.443245
2000-03    0.372908
2000-04    0.504169
2000-05    0.253219
2000-06    0.441228
Freq: M, dtype: float64

In [159]:
values = ['2001Q3', '2002Q2', '2003Q3']

In [161]:
pd.PeriodIndex(values, freq='Q-DEC')

PeriodIndex(['2001Q3', '2002Q2', '2003Q3'], dtype='period[Q-DEC]', freq='Q-DEC')

## Period Frequency Conversion

Periods and PeriodIndex objects can be converted to another frequency with their **asfreq** method.

In [163]:
p = pd.Period('2008', freq='A-DEC')

In [164]:
p

Period('2008', 'A-DEC')

In [165]:
p.asfreq('M', how='start')

Period('2008-01', 'M')

In [167]:
p.asfreq('M', how='end')

Period('2008-12', 'M')

In [168]:
p = pd.Period(2010, freq='A-JUN')

In [169]:
p

Period('2010', 'A-JUN')

In [170]:
p.asfreq('M', how='start')

Period('2009-07', 'M')

In [171]:
p.asfreq('M', how='end')

Period('2010-06', 'M')

In [172]:
p = pd.Period('Aug-2020', freq='M')

In [173]:
p

Period('2020-08', 'M')

In [182]:
p.asfreq('A-JUN')

Period('2021', 'A-JUN')

In [183]:
rng = pd.period_range('2010', '2020', freq='A-DEC')

In [184]:
ts = pd.Series(data=np.random.randn(len(rng)), index=rng)

In [185]:
ts

2010   -0.295352
2011    1.860815
2012    0.015717
2013   -0.375502
2014    0.416370
2015    0.274343
2016    0.511785
2017    1.918298
2018   -0.398792
2019   -0.434031
2020   -0.046855
Freq: A-DEC, dtype: float64

In [192]:
ts.asfreq('M', how='start').asfreq('D', how='start')

2010-01-01   -0.295352
2011-01-01    1.860815
2012-01-01    0.015717
2013-01-01   -0.375502
2014-01-01    0.416370
2015-01-01    0.274343
2016-01-01    0.511785
2017-01-01    1.918298
2018-01-01   -0.398792
2019-01-01   -0.434031
2020-01-01   -0.046855
Freq: D, dtype: float64

In [195]:
ts.asfreq('B')

2010-12-31   -0.295352
2011-12-30    1.860815
2012-12-31    0.015717
2013-12-31   -0.375502
2014-12-31    0.416370
2015-12-31    0.274343
2016-12-30    0.511785
2017-12-29    1.918298
2018-12-31   -0.398792
2019-12-31   -0.434031
2020-12-31   -0.046855
Freq: B, dtype: float64

## Quarterly Period Frequencies

Quarterly data is standard in accounting, finance, and other fields. Much quarterly
data is reported relative to a fiscal year end, typically the last calendar or business day
of one of the 12 months of the year. As a result, a period such as 2012Q4 has a
different meaning depending on the fiscal year end:

In [206]:
p = pd.Period('2012Q4', freq='Q-JAN')

In [207]:
p

Period('2012Q4', 'Q-JAN')

In [208]:
p.asfreq('D', 'start')

Period('2011-11-01', 'D')

In [209]:
p.asfreq('D', 'end')

Period('2012-01-31', 'D')
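
Because the fiscal year end determines which months a quarter covers, the same label can refer to different spans. As an illustrative sketch (the variable names are assumptions), the snippet below compares 2012Q4 under a December year end with 2012Q4 under a January year end.

```python
import pandas as pd

calendar_q4 = pd.Period('2012Q4', freq='Q-DEC')   # fiscal year ends in December
fiscal_q4 = pd.Period('2012Q4', freq='Q-JAN')     # fiscal year ends in January

# Print the first and last day covered by each quarter.
for p in (calendar_q4, fiscal_q4):
    print(p.freqstr, p.asfreq('D', 'start'), p.asfreq('D', 'end'))
```

With a January year end, the fourth quarter of fiscal 2012 runs from November 2011 through January 2012, which matches the results shown above.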

In [218]:
p4pm = (p.asfreq('B', 'e')-1).asfreq('T', 's') + 16*60

In [219]:
p4pm

Period('2012-01-30 16:00', 'T')

In [220]:
p4pm.to_timestamp()

Timestamp('2012-01-30 16:00:00')

In [221]:
rng = pd.period_range('2018Q3', '2020Q4', freq='Q-JAN')

In [223]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [224]:
ts

2018Q3   -0.494310
2018Q4    0.116757
2019Q1    0.335931
2019Q2    1.045650
2019Q3   -0.267613
2019Q4   -0.582756
2020Q1   -0.011221
2020Q2   -0.887878
2020Q3    0.079652
2020Q4    1.568490
Freq: Q-JAN, dtype: float64

In [230]:
new_rng = (rng.asfreq('B', 'e') - 1).asfreq('T', 's') + 16*60

In [231]:
new_rng

PeriodIndex(['2017-10-30 16:00', '2018-01-30 16:00', '2018-04-27 16:00',
             '2018-07-30 16:00', '2018-10-30 16:00', '2019-01-30 16:00',
             '2019-04-29 16:00', '2019-07-30 16:00', '2019-10-30 16:00',
             '2020-01-30 16:00'],
            dtype='period[T]', freq='T')

In [234]:
ts.index = new_rng.to_timestamp()

In [235]:
ts

2017-10-30 16:00:00   -0.494310
2018-01-30 16:00:00    0.116757
2018-04-27 16:00:00    0.335931
2018-07-30 16:00:00    1.045650
2018-10-30 16:00:00   -0.267613
2019-01-30 16:00:00   -0.582756
2019-04-29 16:00:00   -0.011221
2019-07-30 16:00:00   -0.887878
2019-10-30 16:00:00    0.079652
2020-01-30 16:00:00    1.568490
dtype: float64

## Converting Timestamps to Periods (and Back)

In [244]:
rng = pd.date_range('2000-01-01', periods=5, freq='M')

In [245]:
ts = pd.Series(data=np.arange(len(rng)), index=rng)

In [246]:
ts

2000-01-31    0
2000-02-29    1
2000-03-31    2
2000-04-30    3
2000-05-31    4
Freq: M, dtype: int32

In [248]:
pts = ts.to_period()

In [249]:
pts

2000-01    0
2000-02    1
2000-03    2
2000-04    3
2000-05    4
Freq: M, dtype: int32

In [257]:
pts.to_timestamp(how='end')

2000-01-31 23:59:59.999999999    0
2000-02-29 23:59:59.999999999    1
2000-03-31 23:59:59.999999999    2
2000-04-30 23:59:59.999999999    3
2000-05-31 23:59:59.999999999    4
Freq: M, dtype: int32

## Creating a PeriodIndex from Arrays

In [337]:
data = pd.read_csv('macrodata.csv')

In [338]:
data.head()

|  | year | quarter | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1959.0 | 1.0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 |
| 1 | 1959.0 | 2.0 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 |
| 2 | 1959.0 | 3.0 | 2775.488 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 |
| 3 | 1959.0 | 4.0 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 |
| 4 | 1960.0 | 1.0 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 |


In [339]:
data['year']

0      1959.0
1      1959.0
2      1959.0
3      1959.0
4      1960.0
        ...  
198    2008.0
199    2008.0
200    2009.0
201    2009.0
202    2009.0
Name: year, Length: 203, dtype: float64

In [340]:
data['quarter']

0      1.0
1      2.0
2      3.0
3      4.0
4      1.0
      ... 
198    3.0
199    4.0
200    1.0
201    2.0
202    3.0
Name: quarter, Length: 203, dtype: float64

In [341]:
index = pd.PeriodIndex(year=data['year'], quarter=data['quarter'], freq='Q-DEC')

In [342]:
data.index = index

In [343]:
data.head()

|  | year | quarter | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1959Q1 | 1959.0 | 1.0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 |
| 1959Q2 | 1959.0 | 2.0 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 |
| 1959Q3 | 1959.0 | 3.0 | 2775.488 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 |
| 1959Q4 | 1959.0 | 4.0 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 |
| 1960Q1 | 1960.0 | 1.0 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 |


In [364]:
data['year'] = pd.to_datetime(data['year'], format='%Y')

In [370]:
data.head()

|  | year | quarter | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1959Q1 | 1959-01-01 | 1.0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 |
| 1959Q2 | 1959-01-01 | 2.0 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 |
| 1959Q3 | 1959-01-01 | 3.0 | 2775.488 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 |
| 1959Q4 | 1959-01-01 | 4.0 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 |
| 1960Q1 | 1960-01-01 | 1.0 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 |


# 11.6 Resampling and Frequency Conversion