Time series data is an important form of structured data in many different fields, such
as finance, economics, ecology, neuroscience, and physics. Anything that is observed
or measured at many points in time forms a time series. Many time series are fixed
frequency, which is to say that data points occur at regular intervals according to some
rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can
also be irregular without a fixed unit of time or offset between units.

How you mark and refer to time series data depends on the application, and you may have one of the following:
1. Timestamps, specific instants in time
2. Fixed periods, such as the month January 2007 or the full year 2010
3. Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
4. Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
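
The four kinds of time markers listed above map roughly onto pandas objects. The following is a minimal sketch, not the only possible mapping; the variable names and sample dates are illustrative assumptions.

```python
import pandas as pd

# 1. Timestamp: a specific instant in time
stamp = pd.Timestamp("2007-01-15 09:30")

# 2. Period: a fixed span such as the month January 2007
period = pd.Period("2007-01", freq="M")

# 3. Interval: an arbitrary span given by a start and an end timestamp
interval = pd.Interval(
    pd.Timestamp("2007-01-01"), pd.Timestamp("2007-02-01"), closed="left"
)

# 4. Elapsed time: a duration measured from some start, here 90 seconds
elapsed = pd.Timedelta(seconds=90)

print(stamp in interval)                    # membership test on the interval
print(period.start_time, period.end_time)   # a period is a special interval
print(elapsed.total_seconds())
```

Here the period for January 2007 covers the same span as the left-closed interval, which illustrates why periods can be treated as special cases of intervals.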


In [1]:
import numpy as np
import pandas as pd

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# 11.1 Date and Time Data Types and Tools

In [3]:
from datetime import datetime

In [17]:
now = datetime.now()

In [18]:
print(now)

2020-07-20 11:24:55.911926


In [23]:
now.year

2020

In [24]:
now.day

20

In [25]:
now.month

7

In [26]:
now.year, now.day, now.month

(2020, 20, 7)

In [32]:
delta = datetime(2020, 7, 20) - datetime(2015, 2, 15, 7, 18)
delta

datetime.timedelta(days=1981, seconds=60120)

In [35]:
delta.days

1981

In [36]:
delta.seconds

60120

In [37]:
from datetime import timedelta

In [38]:
start = datetime(1998, 8, 24)

In [39]:
start + timedelta(-2)

datetime.datetime(1998, 8, 22, 0, 0)

In [40]:
start - 2 * timedelta(-2)

datetime.datetime(1998, 8, 28, 0, 0)

*See Table 11-1. Types in datetime module*

![Types in datetime module](Img/11.1.png)

## Converting Between String and Datetime

In [42]:
stamp = datetime(2020, 8, 24)

In [44]:
str(stamp)

'2020-08-24 00:00:00'

In [45]:
# Format the datetime object as a string using a format specification
stamp.strftime('%Y-%m-%d')

'2020-08-24'

*See Table 11-2 for a complete list of the format codes (also covered in Chapter 2)*
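
As a quick illustration of a few common format codes, here is a small sketch. The sample value and variable names are assumptions chosen for the example, and only a handful of the codes from the table are shown.

```python
from datetime import datetime

stamp = datetime(2020, 8, 24, 14, 5, 9)

iso_date = stamp.strftime("%Y-%m-%d")   # four-digit year, month, day
clock = stamp.strftime("%H:%M:%S")      # 24-hour clock, minutes, seconds
us_short = stamp.strftime("%m/%d/%y")   # month/day/two-digit year

# The same format string parses the text back into a datetime
fmt = "%Y-%m-%d %H:%M:%S"
round_trip = datetime.strptime(stamp.strftime(fmt), fmt)

print(iso_date, clock, us_short)
print(round_trip == stamp)
```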

In [46]:
value = '2020-08-24'

In [49]:
stamp = datetime.strptime(value, '%Y-%m-%d')

In [50]:
stamp

datetime.datetime(2020, 8, 24, 0, 0)

In [53]:
datelists = ['2020-02-24', '2019-08-15', '2018-04-18']

In [55]:
[datetime.strptime(x, '%Y-%m-%d') for x in datelists]   

[datetime.datetime(2020, 2, 24, 0, 0),
 datetime.datetime(2019, 8, 15, 0, 0),
 datetime.datetime(2018, 4, 18, 0, 0)]

**datetime.strptime** is a good way to parse a date with a known format. However, it
can be tedious to write a format spec each time, especially for common
date formats. In this case, you can use the **parser.parse** function from the third-party
dateutil package (installed automatically when you install pandas):

In [56]:
from dateutil.parser import parse

In [57]:
parse('1998-08-24')

datetime.datetime(1998, 8, 24, 0, 0)

In [63]:
parse('August 24 1998 4:03 AM')

datetime.datetime(1998, 8, 24, 4, 3)

In [71]:
parse('26/12/2011')

datetime.datetime(2011, 12, 26, 0, 0)

In [72]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)
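
The effect of `dayfirst` is easiest to see on an ambiguous string. This is a small sketch with an assumed sample string; without the flag, dateutil reads the first number as the month.

```python
from datetime import datetime
from dateutil.parser import parse

ambiguous = "6/12/2011"

month_first = parse(ambiguous)               # default: month comes first
day_first = parse(ambiguous, dayfirst=True)  # international style: day first

print(month_first)  # June 12
print(day_first)    # December 6
```

A string such as `'26/12/2011'` is not ambiguous, because 26 cannot be a month, so dateutil treats it as day first even without the flag.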

In [73]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [77]:
stamp = pd.to_datetime(datestrs)
stamp

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [78]:
stamp = pd.to_datetime(datestrs + [None])
stamp

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [79]:
stamp[2]

NaT

**NaT** (Not a Time) is pandas’s null value for timestamp data.
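
A short sketch of how NaT can be detected and filtered. The input list is made up for illustration, and `errors="coerce"` is used here so that an unparseable string also becomes NaT instead of raising an error.

```python
import pandas as pd

raw = ["2011-07-06 12:00:00", None, "not a date"]

# None and the unparseable string both become NaT
idx = pd.to_datetime(raw, errors="coerce")

mask = idx.isna()    # boolean array marking the NaT entries
clean = idx[~mask]   # keep only the valid timestamps

print(idx)
print(mask)
print(clean)
```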

*See Table 11-3. Locale-specific date formatting*

![Locale-specific date formatting](Img/11.3.png)

# 11.2 Time Series Basics

A basic kind of time series object in pandas is a Series indexed by timestamps, which
are often represented outside of pandas as Python strings or datetime objects:

In [84]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [89]:
ts = pd.Series(np.random.randn(6), index=dates)

In [90]:
ts

2011-01-02   -0.115575
2011-01-05    1.649951
2011-01-07   -2.462054
2011-01-08   -1.649965
2011-01-10   -1.292433
2011-01-12    0.244163
dtype: float64

In [91]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [92]:
ts[::2]

2011-01-02   -0.115575
2011-01-07   -2.462054
2011-01-10   -1.292433
dtype: float64

In [93]:
ts + ts[::2]

2011-01-02   -0.231150
2011-01-05         NaN
2011-01-07   -4.924108
2011-01-08         NaN
2011-01-10   -2.584866
2011-01-12         NaN
dtype: float64

In [95]:
ts.index.dtype

dtype('<M8[ns]')

In [97]:
stamp = ts.index[0]

In [98]:
stamp

Timestamp('2011-01-02 00:00:00')

## Indexing, Selection, Subsetting

In [103]:
stamp = ts.index[2]

In [104]:
ts[stamp]

-2.462053885890917

In [105]:
ts

2011-01-02   -0.115575
2011-01-05    1.649951
2011-01-07   -2.462054
2011-01-08   -1.649965
2011-01-10   -1.292433
2011-01-12    0.244163
dtype: float64

In [106]:
ts['1/10/2011']  # you can also pass a string that is interpretable as a date

-1.2924328091862354

In [109]:
ts['20110110']  # a compact YYYYMMDD string is interpreted the same way

-1.2924328091862354

In [110]:
lts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2010', periods=1000))

In [111]:
lts

2010-01-01   -0.887836
2010-01-02    1.358469
2010-01-03   -0.594992
2010-01-04    1.745589
2010-01-05   -0.574138
                ...   
2012-09-22    0.038806
2012-09-23   -1.689092
2012-09-24    0.263320
2012-09-25    0.637088
2012-09-26   -0.136430
Freq: D, Length: 1000, dtype: float64

In [112]:
lts['2011']

2011-01-01    0.233362
2011-01-02   -1.638986
2011-01-03    0.105603
2011-01-04    0.686798
2011-01-05    0.414685
                ...   
2011-12-27   -1.194920
2011-12-28    1.304034
2011-12-29    0.921509
2011-12-30   -1.219101
2011-12-31   -0.850703
Freq: D, Length: 365, dtype: float64

In [116]:
lts['2011-08']

2011-08-01   -0.488186
2011-08-02   -0.704139
2011-08-03   -0.997885
2011-08-04    0.985268
2011-08-05    1.269453
2011-08-06   -0.786984
2011-08-07    0.536157
2011-08-08    0.627609
2011-08-09   -0.566383
2011-08-10    0.721205
2011-08-11   -0.238393
2011-08-12    0.063682
2011-08-13   -1.667524
2011-08-14    0.978698
2011-08-15   -0.836981
2011-08-16   -1.731711
2011-08-17    0.939709
2011-08-18   -0.575420
2011-08-19   -0.375245
2011-08-20    0.386578
2011-08-21    0.344467
2011-08-22   -0.853468
2011-08-23    1.201829
2011-08-24   -0.089071
2011-08-25    0.017278
2011-08-26   -2.005134
2011-08-27    1.443558
2011-08-28    0.034308
2011-08-29   -1.483223
2011-08-30    1.419453
2011-08-31   -0.201764
Freq: D, dtype: float64

In [120]:
lts[datetime(2011, 6, 1):]

2011-06-01    0.722423
2011-06-02    0.977205
2011-06-03   -1.189621
2011-06-04    0.343135
2011-06-05    2.132609
                ...   
2012-09-22    0.038806
2012-09-23   -1.689092
2012-09-24    0.263320
2012-09-25    0.637088
2012-09-26   -0.136430
Freq: D, Length: 484, dtype: float64

In [121]:
lts['2011-06-01':]

2011-06-01    0.722423
2011-06-02    0.977205
2011-06-03   -1.189621
2011-06-04    0.343135
2011-06-05    2.132609
                ...   
2012-09-22    0.038806
2012-09-23   -1.689092
2012-09-24    0.263320
2012-09-25    0.637088
2012-09-26   -0.136430
Freq: D, Length: 484, dtype: float64

In [126]:
lts['2011-08-25':'2012-08-24']

2011-08-25    0.017278
2011-08-26   -2.005134
2011-08-27    1.443558
2011-08-28    0.034308
2011-08-29   -1.483223
                ...   
2012-08-20    2.183054
2012-08-21    0.036474
2012-08-22    0.743651
2012-08-23   -1.626381
2012-08-24   -0.250551
Freq: D, Length: 366, dtype: float64

As before, you can pass a string date, a **datetime**, or a Timestamp. Remember that
slicing in this manner produces views on the source time series, like slicing NumPy
arrays. This means that no data is copied and modifications on the slice will be reflected
in the original data (unless pandas is running with Copy-on-Write enabled, in which case
the slice behaves like a copy).

There is an equivalent instance method, **truncate**, that slices a Series between two
dates:

In [128]:
lts.truncate(before='2012-08-01')

2012-08-01    1.167278
2012-08-02    0.068496
2012-08-03    1.765299
2012-08-04   -2.099291
2012-08-05    0.291564
2012-08-06    1.392684
2012-08-07   -0.201388
2012-08-08   -2.995515
2012-08-09   -0.509542
2012-08-10    0.131930
2012-08-11   -0.923551
2012-08-12    2.313246
2012-08-13    0.191619
2012-08-14   -0.305479
2012-08-15    0.768651
2012-08-16   -1.153979
2012-08-17    1.694728
2012-08-18    0.094561
2012-08-19    1.726559
2012-08-20    2.183054
2012-08-21    0.036474
2012-08-22    0.743651
2012-08-23   -1.626381
2012-08-24   -0.250551
2012-08-25   -0.585316
2012-08-26   -0.016837
2012-08-27    1.073179
2012-08-28    0.796280
2012-08-29   -1.229766
2012-08-30    0.693535
2012-08-31   -0.932928
2012-09-01    0.643632
2012-09-02   -0.224287
2012-09-03   -0.415292
2012-09-04    0.083396
2012-09-05    0.075680
2012-09-06    0.459317
2012-09-07   -2.204199
2012-09-08    0.049365
2012-09-09   -0.895895
2012-09-10    1.197555
2012-09-11    1.197341
2012-09-12   -0.452108
2012-09-13 
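
The call above passes only `before`. As a hedged sketch on a small made-up series, passing both `before` and `after` gives the same result as a label slice, with both endpoints included.

```python
import numpy as np
import pandas as pd

# Ten daily observations with predictable values 0.0 .. 9.0
s = pd.Series(np.arange(10.0), index=pd.date_range("2012-08-01", periods=10))

window = s.truncate(before="2012-08-03", after="2012-08-05")
sliced = s["2012-08-03":"2012-08-05"]

print(window)
print(window.equals(sliced))
```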

In [133]:
dates = pd.date_range('2012-08-05', periods=100, freq='W-WED')

In [135]:
long_df = pd.DataFrame(np.random.randn(100,4), index=dates, columns=['NY', 'KHI', 'DEL', 'ANK'])

In [136]:
long_df

                  NY       KHI       DEL       ANK
2012-08-08  1.405694  0.118228  2.712276  1.918814
2012-08-15 -1.559010 -0.571948 -0.036434 -1.705900
2012-08-22  2.124025  0.329116  0.805366 -0.562864
2012-08-29 -0.283042  1.389906 -0.677782  0.169700
2012-09-05  0.808404 -1.028570  0.489753  0.337929
...              ...       ...       ...       ...
2014-06-04 -0.570335  0.315681 -0.004869 -0.548640
2014-06-11  0.423229 -0.754539 -0.905362 -0.621869
2014-06-18 -0.736397  1.252538 -0.839161 -1.201307
2014-06-25 -0.550854 -0.355272 -1.205977 -0.282236


In [139]:
long_df.loc['2013-08']

                  NY       KHI       DEL       ANK
2013-08-07  0.492894 -0.621687  0.755777 -1.929348
2013-08-14  0.040749 -0.752769 -0.004053 -0.918692
2013-08-21  0.807032 -0.530670  1.045536 -1.443252
2013-08-28  0.047445  2.433065 -2.084200 -0.558682


## Time Series with Duplicate Indices

In some applications, there may be multiple data observations falling on a particular
timestamp:

In [140]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])

In [141]:
dates

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02', '2000-01-02',
               '2000-01-03'],
              dtype='datetime64[ns]', freq=None)

In [144]:
dup_ts = pd.Series(np.random.randn(len(dates)), index=dates)
dup_ts

2000-01-01   -0.519137
2000-01-02   -0.958715
2000-01-02    0.936128
2000-01-02   -0.552168
2000-01-03    0.333420
dtype: float64

In [146]:
dup_ts.index.is_unique  # False means the index contains duplicate values

False

In [149]:
dup_ts['2000-01-02']

2000-01-02   -0.958715
2000-01-02    0.936128
2000-01-02   -0.552168
dtype: float64

In [154]:
dup_ts.groupby(level=0).count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
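
Counting is only one option. If the goal is a series with a unique index, one possible approach (a sketch on made-up values, not the only strategy) is to aggregate each group, or to keep only the first observation per timestamp.

```python
import pandas as pd

stamps = pd.DatetimeIndex(
    ["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-02", "2000-01-03"]
)
dup = pd.Series([1.0, 2.0, 4.0, 6.0, 5.0], index=stamps)

# Option 1: collapse duplicates by averaging per timestamp
means = dup.groupby(level=0).mean()

# Option 2: keep only the first observation for each timestamp
first = dup[~dup.index.duplicated(keep="first")]

print(means)
print(first)
```

Which option is appropriate depends on what the repeated observations represent; averaging suits repeated measurements, while keeping the first value suits accidental duplicates.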

# 11.3 Date Ranges, Frequencies, and Shifting

In [164]:
ts = long_df['NY']

In [165]:
ts

2012-08-08    1.405694
2012-08-15   -1.559010
2012-08-22    2.124025
2012-08-29   -0.283042
2012-09-05    0.808404
                ...   
2014-06-04   -0.570335
2014-06-11    0.423229
2014-06-18   -0.736397
2014-06-25   -0.550854
2014-07-02    1.242100
Freq: W-WED, Name: NY, Length: 100, dtype: float64

In [166]:
resampler = ts.resample('D') #The string 'D' is interpreted as daily frequency.
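
Calling `resample` on its own does not compute anything; it returns a resampler object that waits for a follow-up method. The sketch below uses a tiny made-up weekly series (not the `long_df` data above) to show two such methods when converting weekly data to daily frequency.

```python
import pandas as pd

# Three weekly observations, one per Wednesday
weekly = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2012-08-08", periods=3, freq="W-WED"),
)

resampler = weekly.resample("D")

daily_gaps = resampler.asfreq()   # new days are left as NaN
daily_filled = resampler.ffill()  # new days carry the last known value forward

print(daily_gaps.head(8))
print(daily_filled.head(8))
```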

## Generating Date Ranges

In [177]:
index = pd.date_range(start='1998-08-24', end='1998-12-24')

In [178]:
index

DatetimeIndex(['1998-08-24', '1998-08-25', '1998-08-26', '1998-08-27',
               '1998-08-28', '1998-08-29', '1998-08-30', '1998-08-31',
               '1998-09-01', '1998-09-02',
               ...
               '1998-12-15', '1998-12-16', '1998-12-17', '1998-12-18',
               '1998-12-19', '1998-12-20', '1998-12-21', '1998-12-22',
               '1998-12-23', '1998-12-24'],
              dtype='datetime64[ns]', length=123, freq='D')

In [179]:
pd.date_range(start='1998-08-24', periods=20)

DatetimeIndex(['1998-08-24', '1998-08-25', '1998-08-26', '1998-08-27',
               '1998-08-28', '1998-08-29', '1998-08-30', '1998-08-31',
               '1998-09-01', '1998-09-02', '1998-09-03', '1998-09-04',
               '1998-09-05', '1998-09-06', '1998-09-07', '1998-09-08',
               '1998-09-09', '1998-09-10', '1998-09-11', '1998-09-12'],
              dtype='datetime64[ns]', freq='D')

In [182]:
pd.date_range(end='2020-07-20', periods=366)

DatetimeIndex(['2019-07-21', '2019-07-22', '2019-07-23', '2019-07-24',
               '2019-07-25', '2019-07-26', '2019-07-27', '2019-07-28',
               '2019-07-29', '2019-07-30',
               ...
               '2020-07-11', '2020-07-12', '2020-07-13', '2020-07-14',
               '2020-07-15', '2020-07-16', '2020-07-17', '2020-07-18',
               '2020-07-19', '2020-07-20'],
              dtype='datetime64[ns]', length=366, freq='D')

In [184]:
# 'BM' is the business month end frequency: the last business day of each month
# (newer pandas releases spell this alias 'BME')
pd.date_range(end='2020-07-20', periods=15, freq='BM')

DatetimeIndex(['2019-04-30', '2019-05-31', '2019-06-28', '2019-07-31',
               '2019-08-30', '2019-09-30', '2019-10-31', '2019-11-29',
               '2019-12-31', '2020-01-31', '2020-02-28', '2020-03-31',
               '2020-04-30', '2020-05-29', '2020-06-30'],
              dtype='datetime64[ns]', freq='BM')

*See Table 11-4. Base time series frequencies (not comprehensive)*

![Base time series frequencies](Img/11.4.png)

In [187]:
pd.date_range(start='20/07/2020 13:02:54', periods=10)

DatetimeIndex(['2020-07-20 13:02:54', '2020-07-21 13:02:54',
               '2020-07-22 13:02:54', '2020-07-23 13:02:54',
               '2020-07-24 13:02:54', '2020-07-25 13:02:54',
               '2020-07-26 13:02:54', '2020-07-27 13:02:54',
               '2020-07-28 13:02:54', '2020-07-29 13:02:54'],
              dtype='datetime64[ns]', freq='D')

In [188]:
pd.date_range(start='20/07/2020 13:02:54', periods=10, normalize=True)

DatetimeIndex(['2020-07-20', '2020-07-21', '2020-07-22', '2020-07-23',
               '2020-07-24', '2020-07-25', '2020-07-26', '2020-07-27',
               '2020-07-28', '2020-07-29'],
              dtype='datetime64[ns]', freq='D')

## Frequencies and Date Offsets

In [194]:
pd.date_range('20/07/2020', '23/07/2020', freq='4H')

DatetimeIndex(['2020-07-20 00:00:00', '2020-07-20 04:00:00',
               '2020-07-20 08:00:00', '2020-07-20 12:00:00',
               '2020-07-20 16:00:00', '2020-07-20 20:00:00',
               '2020-07-21 00:00:00', '2020-07-21 04:00:00',
               '2020-07-21 08:00:00', '2020-07-21 12:00:00',
               '2020-07-21 16:00:00', '2020-07-21 20:00:00',
               '2020-07-22 00:00:00', '2020-07-22 04:00:00',
               '2020-07-22 08:00:00', '2020-07-22 12:00:00',
               '2020-07-22 16:00:00', '2020-07-22 20:00:00',
               '2020-07-23 00:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [197]:
pd.date_range('20/07/2020', '23/07/2020', freq='1H30Min')

DatetimeIndex(['2020-07-20 00:00:00', '2020-07-20 01:30:00',
               '2020-07-20 03:00:00', '2020-07-20 04:30:00',
               '2020-07-20 06:00:00', '2020-07-20 07:30:00',
               '2020-07-20 09:00:00', '2020-07-20 10:30:00',
               '2020-07-20 12:00:00', '2020-07-20 13:30:00',
               '2020-07-20 15:00:00', '2020-07-20 16:30:00',
               '2020-07-20 18:00:00', '2020-07-20 19:30:00',
               '2020-07-20 21:00:00', '2020-07-20 22:30:00',
               '2020-07-21 00:00:00', '2020-07-21 01:30:00',
               '2020-07-21 03:00:00', '2020-07-21 04:30:00',
               '2020-07-21 06:00:00', '2020-07-21 07:30:00',
               '2020-07-21 09:00:00', '2020-07-21 10:30:00',
               '2020-07-21 12:00:00', '2020-07-21 13:30:00',
               '2020-07-21 15:00:00', '2020-07-21 16:30:00',
               '2020-07-21 18:00:00', '2020-07-21 19:30:00',
               '2020-07-21 21:00:00', '2020-07-21 22:30:00',
               '2020-07-

### Week of month dates

One useful frequency class is “week of month,” whose aliases start with WOM. This enables you to
get dates like the third Friday of each month (WOM-3FRI). The example here uses WOM-1FRI,
the first Friday of each month:

In [200]:
pd.date_range('2020-01-21', '2020-12-21', freq='WOM-1FRI')

DatetimeIndex(['2020-02-07', '2020-03-06', '2020-04-03', '2020-05-01',
               '2020-06-05', '2020-07-03', '2020-08-07', '2020-09-04',
               '2020-10-02', '2020-11-06', '2020-12-04'],
              dtype='datetime64[ns]', freq='WOM-1FRI')
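
For the third Friday mentioned in the text, the alias changes to `WOM-3FRI`. This is a small sketch over an assumed six-month range, with a check that every generated date is a Friday.

```python
import pandas as pd

third_fridays = pd.date_range("2020-01-01", "2020-06-30", freq="WOM-3FRI")

print(third_fridays)

# weekday == 4 means Friday (Monday is 0)
print((third_fridays.weekday == 4).all())
```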

## Shifting (Leading and Lagging) Data