# Time Series

__Time series__ data is an important form of structured data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series. Many time series are _fixed frequency_, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be _irregular_, without a fixed unit of time or offset between the units. How you mark and refer to time series data depends on the application, and you may have one of the following:
- __Timestamps__, specific instants in time
- Fixed __periods__, such as the month January 2007 or the full year 2010
- __Intervals__ of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
- Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)

In this chapter, we are mainly concerned with time series in the first three categories, though many of the techniques can be applied to experimental time series where the index may be an integer or floating-point number indicating elapsed time from the start of the experiment. The simplest and most widely used time series are those indexed by timestamp.
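
As a rough illustration of the four categories listed above, the sketch below builds one object of each kind. The specific dates and the cookie diameters are made-up values chosen only for the example.

```python
import pandas as pd

# A timestamp: one specific instant in time
stamp = pd.Timestamp('2007-01-15 09:30')

# A fixed period: the whole month of January 2007
period = pd.Period('2007-01', freq='M')

# An interval: explicit start and end timestamps (closed on the left here)
interval = pd.Interval(pd.Timestamp('2007-01-01'),
                       pd.Timestamp('2007-02-01'),
                       closed='left')

# Elapsed time: a float index counting seconds since the start
# (hypothetical cookie diameters, in centimeters)
elapsed = pd.Series([5.0, 5.2, 5.5], index=[0.0, 1.0, 2.0])

# A period knows its own boundaries, so it behaves like a special interval
print(period.start_time, period.end_time)
print(stamp in interval)
```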

__pandas__ provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are especially useful for financial and economics applications, but you could certainly use them to analyze server log data, too.

## Date and Time Data Types and Tools

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
now = datetime.now()

In [3]:
now

datetime.datetime(2019, 3, 14, 10, 48, 24, 670417)

In [4]:
now.year, now.month, now.day

(2019, 3, 14)

In [5]:
delta = datetime(2011,1,7)-datetime(2008,6,24,8,15)

In [6]:
delta

datetime.timedelta(days=926, seconds=56700)

In [7]:
delta.days

926

In [8]:
delta.seconds

56700

In [9]:
from datetime import timedelta

In [10]:
start = datetime(2011,1,7)

In [11]:
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [12]:
start - 2*timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)
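
The cells above show that a `timedelta` reports its length as whole days plus leftover seconds. The sketch below reuses the same two dates to show how those two fields relate to the total duration; the variable names `total`, `hours`, and `minutes` are introduced here only for illustration.

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are separate fields; seconds is always less than one day
total = delta.total_seconds()
assert total == delta.days * 86400 + delta.seconds

# Break the leftover seconds into hours and minutes
hours, rem = divmod(delta.seconds, 3600)
minutes = rem // 60
print(delta.days, hours, minutes)  # 926 days, 15 hours, 45 minutes

# A single positional argument to timedelta is a number of days
assert timedelta(12) == timedelta(days=12)
```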

### Converting Between String and Datetime

In [13]:
stamp = datetime(2011,1,3)

In [14]:
str(stamp)

'2011-01-03 00:00:00'

In [15]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

In [16]:
value = '2011-01-03'

In [17]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
datestrs = ['7/6/2011', '8/6/2011']

In [19]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

In [20]:
from dateutil.parser import parse

In [21]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [22]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [23]:
parse('6/12/2011', dayfirst = True)

datetime.datetime(2011, 12, 6, 0, 0)

In [24]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [25]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [26]:
idx = pd.to_datetime(datestrs + [None])

In [27]:
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [28]:
idx[2]

NaT

In [29]:
pd.isnull(idx)

array([False, False,  True])
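
`NaT` (Not a Time) is how pandas marks a missing timestamp. The sketch below rebuilds the same index and uses the null mask to keep only the valid entries; the names `mask` and `clean` are assumptions introduced for the example.

```python
import pandas as pd

datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
idx = pd.to_datetime(datestrs + [None])

# Boolean mask that is True wherever the timestamp is missing
mask = pd.isnull(idx)

# Keep only the entries that parsed to a real timestamp
clean = idx[~mask]
print(clean)
```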

## Time Series Basics

In [30]:
from datetime import datetime

In [31]:
dates = [datetime(2011,1,2), datetime(2011,1,5),
        datetime(2011,1,7), datetime(2011,1,8),
        datetime(2011,1,10),datetime(2011,1,12)]

In [32]:
ts = pd.Series(np.random.randn(6),index=dates)

In [33]:
ts

2011-01-02   -0.787158
2011-01-05    0.771740
2011-01-07    1.313703
2011-01-08   -0.442174
2011-01-10   -0.524393
2011-01-12   -0.015609
dtype: float64

In [34]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [35]:
ts + ts[::2]

2011-01-02   -1.574316
2011-01-05         NaN
2011-01-07    2.627405
2011-01-08         NaN
2011-01-10   -1.048786
2011-01-12         NaN
dtype: float64

In [36]:
ts.index.dtype

dtype('<M8[ns]')

In [37]:
stamp = ts.index[0]

In [38]:
stamp

Timestamp('2011-01-02 00:00:00')
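
The `ts + ts[::2]` cell above works because arithmetic between two time series aligns on the dates, and dates missing from either operand produce `NaN`. A minimal sketch with fixed values, so the result is reproducible, might look like this:

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05',
                        '2011-01-07', '2011-01-08'])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# Every second observation: 2011-01-02 and 2011-01-07
every_other = ts[::2]

# Addition aligns on the index; unmatched dates become NaN
total = ts + every_other
print(total)
```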

### Indexing, Selection, Subsetting

In [39]:
stamp = ts.index[2]

In [40]:
ts[stamp]

1.313702543063706

In [41]:
ts['1/10/2011']

-0.5243930053946363

In [42]:
ts['20110110']

-0.5243930053946363

In [43]:
longer_ts = pd.Series(np.random.randn(1000),
                     index=pd.date_range('1/1/2000',periods=1000))

In [44]:
longer_ts

2000-01-01   -0.681287
2000-01-02   -0.336899
2000-01-03   -1.251992
2000-01-04   -0.261979
2000-01-05    0.313501
2000-01-06   -0.486776
2000-01-07   -1.597715
2000-01-08    0.608604
2000-01-09    0.480928
2000-01-10    1.164814
2000-01-11    0.539854
2000-01-12    0.476257
2000-01-13   -0.841299
2000-01-14    0.144397
2000-01-15   -0.379186
2000-01-16   -0.146685
2000-01-17    0.835795
2000-01-18    2.672593
2000-01-19    1.117163
2000-01-20    1.285928
2000-01-21   -1.998992
2000-01-22    2.129501
2000-01-23   -0.675257
2000-01-24   -1.299804
2000-01-25    0.079599
2000-01-26   -0.308168
2000-01-27    0.757572
2000-01-28   -0.262957
2000-01-29    0.927925
2000-01-30   -1.982229
                ...   
2002-08-28    0.846354
2002-08-29   -0.110727
2002-08-30    2.123540
2002-08-31    0.968090
2002-09-01   -0.469636
2002-09-02    1.002549
2002-09-03   -1.780497
2002-09-04   -0.286825
2002-09-05   -1.973089
2002-09-06    0.738586
2002-09-07   -0.342340
2002-09-08   -1.046445
2002-09-09 

In [45]:
longer_ts['2001']

2001-01-01    0.043194
2001-01-02   -1.964050
2001-01-03   -0.535598
2001-01-04   -0.877326
2001-01-05   -1.393307
2001-01-06   -1.230758
2001-01-07   -0.550593
2001-01-08    0.432926
2001-01-09    1.653182
2001-01-10   -0.879634
2001-01-11    1.266799
2001-01-12   -1.229866
2001-01-13   -0.485148
2001-01-14    0.886132
2001-01-15   -1.013127
2001-01-16    0.583068
2001-01-17    1.327860
2001-01-18    0.649800
2001-01-19    0.479794
2001-01-20   -0.497157
2001-01-21    0.064763
2001-01-22    0.439525
2001-01-23   -1.065011
2001-01-24   -1.407821
2001-01-25   -0.031860
2001-01-26   -0.401878
2001-01-27    1.837667
2001-01-28    0.856678
2001-01-29   -0.785448
2001-01-30    0.892654
                ...   
2001-12-02    0.538393
2001-12-03    0.457431
2001-12-04   -0.684353
2001-12-05    1.577848
2001-12-06   -1.597052
2001-12-07    1.090769
2001-12-08   -0.439826
2001-12-09    0.407772
2001-12-10   -0.366970
2001-12-11    0.461338
2001-12-12   -1.139061
2001-12-13   -0.978721
2001-12-14 

In [46]:
longer_ts['2001-05']

2001-05-01    0.411043
2001-05-02   -0.651293
2001-05-03    0.767914
2001-05-04    1.982674
2001-05-05   -0.776771
2001-05-06    0.236972
2001-05-07    0.953803
2001-05-08    1.788658
2001-05-09   -1.504606
2001-05-10   -0.237862
2001-05-11   -0.387560
2001-05-12    0.597591
2001-05-13    0.697430
2001-05-14    0.436563
2001-05-15   -0.954409
2001-05-16    1.751339
2001-05-17    1.037096
2001-05-18   -1.659601
2001-05-19   -1.650265
2001-05-20   -0.863021
2001-05-21   -1.196871
2001-05-22   -0.231639
2001-05-23   -0.473601
2001-05-24    1.145300
2001-05-25    0.568848
2001-05-26   -0.379305
2001-05-27    0.684614
2001-05-28   -1.472452
2001-05-29    0.314945
2001-05-30    0.285756
2001-05-31    0.020037
Freq: D, dtype: float64

In [47]:
ts[datetime(2011,1,7):]

2011-01-07    1.313703
2011-01-08   -0.442174
2011-01-10   -0.524393
2011-01-12   -0.015609
dtype: float64

In [48]:
ts

2011-01-02   -0.787158
2011-01-05    0.771740
2011-01-07    1.313703
2011-01-08   -0.442174
2011-01-10   -0.524393
2011-01-12   -0.015609
dtype: float64

In [49]:
ts['1/6/2011':'1/11/2011']

2011-01-07    1.313703
2011-01-08   -0.442174
2011-01-10   -0.524393
dtype: float64

In [50]:
ts.truncate(after='1/9/2011')

2011-01-02   -0.787158
2011-01-05    0.771740
2011-01-07    1.313703
2011-01-08   -0.442174
dtype: float64

In [51]:
dates = pd.date_range('1/1/2000',periods=100,freq='W-WED')

In [52]:
long_df = pd.DataFrame(np.random.randn(100,4),
                      index = dates,
                      columns = ['Colorado','Texas',
                                'New York', 'Ohio'])

In [53]:
long_df.loc['5-2001']

            Colorado     Texas   New York       Ohio
2001-05-02 -2.099893 -0.949003   -0.47989  -0.628382
2001-05-09 -0.812352  0.173144  -1.336675   0.263134
2001-05-16 -0.651724  2.165416  -0.633549  -0.719036
2001-05-23  0.934558 -0.113774    0.42962   0.209886
2001-05-30  0.319725  -0.41993    1.14537   0.040338
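
The selections above rely on pandas interpreting a partial date string as a whole year or month. The sketch below uses sequential integers instead of random values, so the row counts can be checked. It uses `.loc`, which is an assumption about style rather than a requirement of the examples above.

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.arange(1000),
                      index=pd.date_range('2000-01-01', periods=1000))

# A year string selects every day in that year
year = longer_ts.loc['2001']

# A year-month string selects every day in that month
month = longer_ts.loc['2001-05']

# String slices include both endpoints
window = longer_ts.loc['2001-05-30':'2001-06-02']

print(len(year), len(month), len(window))
```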


### Time Series with Duplicate Indices

In [54]:
dates = pd.DatetimeIndex(['1/1/2000','1/2/2000','1/2/2000',
                          '1/2/2000','1/3/2000'])

In [55]:
dup_ts = pd.Series(np.arange(5),index=dates)

In [56]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [57]:
dup_ts.index.is_unique

False

In [58]:
dup_ts['1/3/2000'] # not duplicated

4

In [59]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

In [60]:
grouped = dup_ts.groupby(level=0)

In [61]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [62]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64

## Date Ranges, Frequencies, and Shifting

In [63]:
ts

2011-01-02   -0.787158
2011-01-05    0.771740
2011-01-07    1.313703
2011-01-08   -0.442174
2011-01-10   -0.524393
2011-01-12   -0.015609
dtype: float64

In [64]:
resampler = ts.resample('D') # 'D' stands for daily frequency
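
Calling `resample` alone does not produce data; it returns an object that waits for an aggregation. The sketch below, with made-up values, applies `mean` to convert an irregular series to fixed daily frequency, leaving `NaN` on days with no observation.

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07'])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# One row per calendar day from the first to the last observation
daily = ts.resample('D').mean()
print(daily)
```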

### Generating Date Ranges

In [65]:
index = pd.date_range('2012-04-01', '2012-06-01')

In [66]:
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
      

In [67]:
pd.date_range(start='2012-04-01', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [68]:
pd.date_range(start='2012-06-01', periods=20)

DatetimeIndex(['2012-06-01', '2012-06-02', '2012-06-03', '2012-06-04',
               '2012-06-05', '2012-06-06', '2012-06-07', '2012-06-08',
               '2012-06-09', '2012-06-10', '2012-06-11', '2012-06-12',
               '2012-06-13', '2012-06-14', '2012-06-15', '2012-06-16',
               '2012-06-17', '2012-06-18', '2012-06-19', '2012-06-20'],
              dtype='datetime64[ns]', freq='D')

In [69]:
pd.date_range('2000-01-01','2000-12-01',freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')

In [70]:
pd.date_range('2012-05-02 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [71]:
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')

### Frequencies and Date Offsets

In [72]:
from pandas.tseries.offsets import Hour, Minute

In [73]:
hour = Hour()

In [74]:
hour

<Hour>

In [75]:
four_hours = Hour(4)

In [76]:
four_hours

<4 * Hours>

In [79]:
pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [84]:
Hour(2) + Minute(30)

<150 * Minutes>

In [85]:
pd.date_range('2000-01-01',periods=10,freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
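
A frequency string such as `'4h'` and an offset object such as `Hour(4)` describe the same thing. The sketch below checks that the two forms generate identical ranges and that a combined offset can be added directly to a timestamp; the variable names are introduced only for the example.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# The same range, specified by string alias and by offset object
by_string = pd.date_range('2000-01-01', periods=4, freq='4h')
by_object = pd.date_range('2000-01-01', periods=4, freq=Hour(4))
print(by_string.equals(by_object))

# Offsets can be combined by addition and applied to a timestamp
combined = Hour(1) + Minute(30)
stamp = pd.Timestamp('2000-01-01') + combined
print(stamp)
```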

#### Week of month dates

In [86]:
rng = pd.date_range('2012-01-01','2012-09-01',freq='WOM-3FRI')

In [87]:
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-06-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-07-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-08-17 00:00:00', freq='WOM-3FRI')]

### Shifting (Leading and Lagging) Data

In [88]:
ts = pd.Series(np.random.randn(4),
              index=pd.date_range('1/1/2000',periods=4,freq='M'))

In [89]:
ts

2000-01-31   -0.274146
2000-02-29   -0.666427
2000-03-31   -0.191362
2000-04-30    1.182281
Freq: M, dtype: float64

In [90]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -0.274146
2000-04-30   -0.666427
Freq: M, dtype: float64

In [91]:
ts.shift(-2)

2000-01-31   -0.191362
2000-02-29    1.182281
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

In [92]:
ts.shift(2,freq='M')

2000-03-31   -0.274146
2000-04-30   -0.666427
2000-05-31   -0.191362
2000-06-30    1.182281
Freq: M, dtype: float64

In [93]:
ts.shift(3,freq='D')

2000-02-03   -0.274146
2000-03-03   -0.666427
2000-04-03   -0.191362
2000-05-03    1.182281
dtype: float64

In [94]:
ts.shift(1,freq='90T')

2000-01-31 01:30:00   -0.274146
2000-02-29 01:30:00   -0.666427
2000-03-31 01:30:00   -0.191362
2000-04-30 01:30:00    1.182281
Freq: M, dtype: float64
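
A common use of `shift` is computing the percent change between consecutive observations. The sketch below uses made-up monthly values and a `MonthEnd` offset object rather than a frequency string. It also shows that passing a frequency moves the dates rather than the data.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series([100.0, 110.0, 99.0, 108.9],
               index=pd.date_range('2000-01-31', periods=4,
                                   freq=MonthEnd()))

# Plain shift moves the values and leaves the index alone,
# so the first entry has nothing to compare against
pct = ts / ts.shift(1) - 1
print(pct)

# Shifting with a frequency advances the timestamps instead
shifted = ts.shift(2, freq=MonthEnd())
print(shifted.index[0])
```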

## Time Zone Handling

### Time Zone Localization and Conversion

### Operations with Time Zone-Aware Timestamp Objects

### Operations Between Different Time Zones

## Periods and Period Arithmetic

### Period Frequency Conversion

### Quarterly Period Frequencies

### Converting Timestamps to Periods (and Back)

### Creating a PeriodIndex from Arrays

## Resampling and Frequency Conversion

### Downsampling

### Upsampling and Interpolation

### Resampling with Periods

## Moving Window Functions

### Exponentially Weighted Functions

### Binary Moving Window Functions

### User-Defined Moving Window Functions