# Time Series

__Time series__ data is an important form of structured data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series. Many time series are _fixed frequency_, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be _irregular_, without a fixed unit of time or offset between units. How you mark and refer to time series data depends on the application, and you may have one of the following:
- __Timestamps__, specific instants in time
- Fixed __periods__, such as the month January 2007 or the full year 2010
- __Intervals__ of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
- Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
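As a rough illustration, each of these four representations has a counterpart object in pandas. The sketch below assumes a recent pandas version; the specific dates are arbitrary, and the cookie diameters are made-up numbers standing in for real measurements.

```python
import pandas as pd

# Timestamp: a specific instant in time
stamp = pd.Timestamp('2011-01-03 09:30')

# Period: a fixed span of time, here the month January 2007
jan_2007 = pd.Period('2007-01', freq='M')

# Interval: an explicit start and end; a Period is a special case of this
jan_interval = pd.Interval(jan_2007.start_time, jan_2007.end_time, closed='both')

# Elapsed time: offsets relative to a start (hypothetical seconds in the oven)
elapsed = pd.to_timedelta([0, 1, 2, 3], unit='s')
cookie_diameter_cm = pd.Series([5.0, 5.1, 5.3, 5.6], index=elapsed)

print(stamp)
print(jan_2007, jan_2007.start_time, jan_2007.end_time)
print(pd.Timestamp('2007-01-15') in jan_interval)   # True
print(cookie_diameter_cm)
```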

In this chapter, we are mainly concerned with time series in the first three categories, though many of the techniques can be applied to experimental time series where the index may be an integer or floating-point number indicating elapsed time from the start of the experiment. The simplest and most widely used kind of time series is indexed by timestamp.

__pandas__ provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are especially useful for financial and economics applications, but you could certainly use them to analyze server log data, too.

## Date and Time Data Types and Tools

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
now = datetime.now()

In [3]:
now

datetime.datetime(2019, 3, 1, 19, 52, 12, 8674)

In [4]:
now.year, now.month, now.day

(2019, 3, 1)

In [5]:
delta = datetime(2011,1,7)-datetime(2008,6,24,8,15)

In [6]:
delta

datetime.timedelta(days=926, seconds=56700)

In [7]:
delta.days

926

In [8]:
delta.seconds

56700

In [9]:
from datetime import timedelta

In [10]:
start = datetime(2011,1,7)

In [11]:
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [12]:
start - 2*timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)
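Note that `delta.seconds` holds only the sub-day remainder (15 hours 45 minutes), not the full length of the interval. This short standard-library sketch shows how `total_seconds()` combines the components and how multiplying a `timedelta` scales it:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are separate components; .seconds is always less than one day
print(delta.days, delta.seconds)   # 926 56700
print(delta.seconds / 3600)        # 15.75 hours left over after 926 whole days

# total_seconds() folds both components into a single float
print(delta.total_seconds())       # 80063100.0

# timedelta(12) means 12 days; multiplying by 2 gives 24 days
start = datetime(2011, 1, 7)
print(start - 2 * timedelta(12))   # 2010-12-14 00:00:00
```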

### Converting Between String and Datetime

In [13]:
stamp = datetime(2011,1,3)

In [14]:
str(stamp)

'2011-01-03 00:00:00'

In [15]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

In [16]:
value = '2011-01-03'

In [17]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
datestrs = ['7/6/2011', '8/6/2011']

In [19]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

In [20]:
from dateutil.parser import parse

In [21]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [22]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [23]:
parse('6/12/2011', dayfirst = True)

datetime.datetime(2011, 12, 6, 0, 0)
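Because `dateutil` guesses at the layout, an ambiguous string like `'6/12/2011'` can parse two ways. When the layout of your data is known, it is generally safer to pass an explicit format, either to `datetime.strptime` or through the `format` parameter of `pd.to_datetime`. A small sketch:

```python
from datetime import datetime
from dateutil.parser import parse
import pandas as pd

ambiguous = '6/12/2011'

# Default: month first (US convention) -> June 12
us_style = parse(ambiguous)
# dayfirst=True: day first -> 6 December
eu_style = parse(ambiguous, dayfirst=True)

# An explicit format string removes the guesswork entirely
explicit = datetime.strptime(ambiguous, '%d/%m/%Y')
explicit_pd = pd.to_datetime(ambiguous, format='%d/%m/%Y')

print(us_style, eu_style, explicit, explicit_pd)
```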

In [24]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [25]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [26]:
idx = pd.to_datetime(datestrs + [None])

In [27]:
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [28]:
idx[2]

NaT

In [29]:
pd.isnull(idx)

array([False, False,  True])
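`NaT` ("Not a Time") is the pandas missing-value marker for timestamps. If the input may contain malformed strings rather than just `None`, the `errors='coerce'` option of `pd.to_datetime` maps those to `NaT` as well instead of raising. A sketch with made-up input:

```python
import pandas as pd

raw = ['2011-07-06 12:00:00', 'not a date', None]

# errors='coerce' turns unparseable entries into NaT instead of raising an error
idx = pd.to_datetime(raw, errors='coerce')
print(idx)
print(pd.isnull(idx))   # [False  True  True]
```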

## Time Series Basics

In [30]:
from datetime import datetime

In [31]:
dates = [datetime(2011,1,2), datetime(2011,1,5),
        datetime(2011,1,7), datetime(2011,1,8),
        datetime(2011,1,10),datetime(2011,1,12)]

In [32]:
ts = pd.Series(np.random.randn(6),index=dates)

In [33]:
ts

2011-01-02    1.042713
2011-01-05   -1.927515
2011-01-07    0.870192
2011-01-08    0.505934
2011-01-10   -0.920950
2011-01-12   -0.325004
dtype: float64

In [34]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [35]:
ts + ts[::2]

2011-01-02    2.085426
2011-01-05         NaN
2011-01-07    1.740384
2011-01-08         NaN
2011-01-10   -1.841900
2011-01-12         NaN
dtype: float64

In [36]:
ts.index.dtype

dtype('<M8[ns]')

In [37]:
stamp = ts.index[0]

In [38]:
stamp

Timestamp('2011-01-02 00:00:00')
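Scalar values taken from a `DatetimeIndex` are pandas `Timestamp` objects. As a quick check, the sketch below shows that `Timestamp` subclasses the standard-library `datetime` (so it can be used wherever a `datetime` is expected) while also storing nanosecond precision that `datetime` cannot represent:

```python
from datetime import datetime
import pandas as pd

stamp = pd.Timestamp('2011-01-02')

# Timestamp is a subclass of datetime and compares equal to the matching datetime
print(isinstance(stamp, datetime))     # True
print(stamp == datetime(2011, 1, 2))   # True

# Extra precision: nanoseconds are kept on Timestamp
ns_stamp = pd.Timestamp('2011-01-02 00:00:00.000000001')
print(ns_stamp.nanosecond)             # 1

# Convert back to a plain datetime.datetime when a library requires one
print(stamp.to_pydatetime())           # 2011-01-02 00:00:00
```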

### Indexing, Selection, Subsetting

In [39]:
stamp = ts.index[2]

In [40]:
ts[stamp]

0.8701917507857299

In [41]:
ts['1/10/2011']

-0.9209499080428349

In [42]:
ts['20110110']

-0.9209499080428349

In [43]:
longer_ts = pd.Series(np.random.randn(1000),
                     index=pd.date_range('1/1/2000',periods=1000))

In [44]:
longer_ts

2000-01-01   -0.587942
2000-01-02   -0.080187
2000-01-03   -0.167161
2000-01-04    1.986039
2000-01-05   -0.550993
2000-01-06    0.259449
2000-01-07   -1.293137
2000-01-08    0.123616
2000-01-09    0.325366
2000-01-10   -0.083922
2000-01-11   -0.373335
2000-01-12   -0.204482
2000-01-13   -0.952185
2000-01-14   -1.806144
2000-01-15    0.494247
2000-01-16   -0.806407
2000-01-17    0.289800
2000-01-18    0.059239
2000-01-19    1.041886
2000-01-20   -0.978989
2000-01-21   -1.520501
2000-01-22    1.439379
2000-01-23    0.159106
2000-01-24    0.356575
2000-01-25    2.256634
2000-01-26    0.278874
2000-01-27    0.178646
2000-01-28   -0.968139
2000-01-29    0.843049
2000-01-30   -0.605716
                ...   
2002-08-28   -0.746613
2002-08-29   -1.193238
2002-08-30    0.541408
2002-08-31    0.724293
2002-09-01   -1.149845
2002-09-02    0.426148
2002-09-03   -0.843385
2002-09-04    0.536386
2002-09-05   -0.839678
2002-09-06    0.289774
2002-09-07   -1.318428
2002-09-08    0.133471
2002-09-09 

In [45]:
longer_ts['2001']

2001-01-01   -0.243990
2001-01-02   -0.016835
2001-01-03    1.387961
2001-01-04   -0.216779
2001-01-05   -0.699629
2001-01-06   -0.547750
2001-01-07   -0.541751
2001-01-08   -0.393888
2001-01-09    0.364247
2001-01-10    0.665509
2001-01-11    0.609364
2001-01-12   -1.123701
2001-01-13    1.452918
2001-01-14    0.895996
2001-01-15   -0.053948
2001-01-16   -1.151484
2001-01-17    1.430368
2001-01-18   -0.484620
2001-01-19    0.495061
2001-01-20   -0.281044
2001-01-21   -0.358461
2001-01-22    0.091027
2001-01-23   -1.207346
2001-01-24    0.910044
2001-01-25    1.110754
2001-01-26    0.189259
2001-01-27   -0.363218
2001-01-28   -2.352811
2001-01-29   -1.090038
2001-01-30   -0.359185
                ...   
2001-12-02    0.702351
2001-12-03    0.069681
2001-12-04    1.237456
2001-12-05    0.178249
2001-12-06    0.945764
2001-12-07    1.817563
2001-12-08    1.279529
2001-12-09    1.726680
2001-12-10   -0.956679
2001-12-11    0.227928
2001-12-12    0.206689
2001-12-13    0.724194
2001-12-14 

In [46]:
longer_ts['2001-05']

2001-05-01    0.066128
2001-05-02   -0.255005
2001-05-03    0.104448
2001-05-04    0.700222
2001-05-05    0.715257
2001-05-06   -0.094077
2001-05-07   -0.627358
2001-05-08   -0.976409
2001-05-09    0.902168
2001-05-10   -0.385561
2001-05-11   -0.353236
2001-05-12    1.044096
2001-05-13   -0.160526
2001-05-14    0.116164
2001-05-15   -0.450665
2001-05-16    1.772062
2001-05-17   -0.193591
2001-05-18    0.206078
2001-05-19   -0.902546
2001-05-20   -0.590870
2001-05-21   -0.375577
2001-05-22   -0.054822
2001-05-23   -0.806033
2001-05-24   -1.041807
2001-05-25    1.205813
2001-05-26    1.779539
2001-05-27    0.210037
2001-05-28   -1.142534
2001-05-29   -0.769068
2001-05-30   -0.428615
2001-05-31   -0.808028
Freq: D, dtype: float64
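Passing a year or a year-month string selects every row that falls within that period (partial string indexing). The sketch below repeats the example with deterministic values (`0..999`) instead of random ones, and uses `.loc`, the explicit label-based accessor, to count how many rows each selection returns:

```python
import numpy as np
import pandas as pd

# Deterministic values instead of random ones, so the results are checkable
longer_ts = pd.Series(np.arange(1000),
                      index=pd.date_range('1/1/2000', periods=1000))

year_2001 = longer_ts.loc['2001']     # every day in 2001
may_2001 = longer_ts.loc['2001-05']   # every day in May 2001

print(len(year_2001), len(may_2001))  # 365 31
print(year_2001.index[0], year_2001.index[-1])
```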

In [47]:
ts[datetime(2011,1,7):]

2011-01-07    0.870192
2011-01-08    0.505934
2011-01-10   -0.920950
2011-01-12   -0.325004
dtype: float64

In [48]:
ts

2011-01-02    1.042713
2011-01-05   -1.927515
2011-01-07    0.870192
2011-01-08    0.505934
2011-01-10   -0.920950
2011-01-12   -0.325004
dtype: float64

In [49]:
ts['1/6/2011':'1/11/2011']

2011-01-07    0.870192
2011-01-08    0.505934
2011-01-10   -0.920950
dtype: float64

In [50]:
ts.truncate(after='1/9/2011')

2011-01-02    1.042713
2011-01-05   -1.927515
2011-01-07    0.870192
2011-01-08    0.505934
dtype: float64
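Two details of the slices above are easy to miss: label-based slices include both endpoints, and the endpoints do not need to exist in the index as long as it is sorted. The sketch below checks this with integer values in place of the random data, and compares `truncate(after=...)` with an open-ended slice:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(range(6), index=dates)

# Neither 1/6 nor 1/11 is in the index; the slice still selects what lies between
window = ts.loc['1/6/2011':'1/11/2011']
print(window.index.strftime('%Y-%m-%d').tolist())
# ['2011-01-07', '2011-01-08', '2011-01-10']

# truncate(after=...) keeps everything up to and including that label
print(ts.truncate(after='1/9/2011').equals(ts.loc[:'1/9/2011']))   # True
```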

In [51]:
dates = pd.date_range('1/1/2000',periods=100,freq='W-WED')

In [52]:
long_df = pd.DataFrame(np.random.randn(100,4),
                      index = dates,
                      columns = ['Colorado','Texas',
                                'New York', 'Ohio'])

In [53]:
long_df.loc['5-2001']

             Colorado      Texas   New York       Ohio
2001-05-02   1.637074   0.339951   0.998392   0.170458
2001-05-09  -1.043174   1.026423   0.684250   1.199608
2001-05-16  -0.600783   1.753619  -2.372273  -0.965644
2001-05-23   0.349020   0.256409  -0.869648   0.402656
2001-05-30   0.355898  -0.810077  -1.171870  -0.001407
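The DataFrame above uses a weekly frequency anchored on Wednesdays. This sanity check is not part of the original example. It confirms that the `'W-WED'` alias produces only Wednesdays, starting from the first Wednesday on or after the start date. It also confirms that an ISO-style `'2001-05'` label selects the same five rows. Zeros stand in for the random data:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')

# 'W-WED' anchors every generated date on a Wednesday (dayofweek == 2)
print(dates[0])                       # 2000-01-05 00:00:00
print((dates.dayofweek == 2).all())   # True

long_df = pd.DataFrame(np.zeros((100, 4)), index=dates,
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])
print(long_df.loc['2001-05'].index.day.tolist())   # [2, 9, 16, 23, 30]
```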


### Time Series with Duplicate Indices

In [54]:
dates = pd.DatetimeIndex(['1/1/2000','1/2/2000','1/2/2000',
                          '1/2/2000','1/3/2000'])

In [55]:
dup_ts = pd.Series(np.arange(5),index=dates)

In [56]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [57]:
dup_ts.index.is_unique

False

In [58]:
dup_ts['1/3/2000'] # not duplicated

4

In [59]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

In [60]:
grouped = dup_ts.groupby(level=0)

In [61]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [62]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
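Aggregating with `groupby(level=0)` is one way to collapse duplicate timestamps. If you instead want to keep exactly one observation per timestamp, `Index.duplicated` returns a boolean mask you can invert and use for selection. A sketch that keeps the last value for each date:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# Option 1: aggregate the duplicates (mean of 1, 2, 3 is 2 for 2000-01-02)
means = dup_ts.groupby(level=0).mean()

# Option 2: keep a single observation per timestamp, here the last one
last_only = dup_ts[~dup_ts.index.duplicated(keep='last')]

print(means)
print(last_only.tolist())          # [0, 3, 4]
print(last_only.index.is_unique)   # True
```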

## Date Ranges, Frequencies, and Shifting

In [63]:
ts

2011-01-02    1.042713
2011-01-05   -1.927515
2011-01-07    0.870192
2011-01-08    0.505934
2011-01-10   -0.920950
2011-01-12   -0.325004
dtype: float64

In [64]:
resampler = ts.resample('D') # 'D' stands for daily frequency
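`ts.resample('D')` returns a resampler object, and no data has been converted yet. This sketch uses made-up values in place of the random ones. It calls `asfreq()` on the resampler to conform the irregular series to a fixed daily frequency, which inserts missing values on days with no observation:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# The resampler is lazy; calling a method on it produces the new series
resampler = ts.resample('D')

daily = resampler.asfreq()   # one row per calendar day, NaN where ts had no value
print(len(daily))            # 11 (2011-01-02 through 2011-01-12)
print(daily.isna().sum())    # 5
```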

### Generating Date Ranges

In [65]:
index = pd.date_range('2012-04-01', '2012-06-01')

In [66]:
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
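               '2012-05-27', '2012-05-28', '2012-05-29', '2012-05-30',
               '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')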

In [67]:
pd.date_range(start='2012-04-01', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [68]:
pd.date_range(start='2012-06-01', periods=20)

DatetimeIndex(['2012-06-01', '2012-06-02', '2012-06-03', '2012-06-04',
               '2012-06-05', '2012-06-06', '2012-06-07', '2012-06-08',
               '2012-06-09', '2012-06-10', '2012-06-11', '2012-06-12',
               '2012-06-13', '2012-06-14', '2012-06-15', '2012-06-16',
               '2012-06-17', '2012-06-18', '2012-06-19', '2012-06-20'],
              dtype='datetime64[ns]', freq='D')

In [69]:
pd.date_range('2000-01-01','2000-12-01',freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')

In [70]:
pd.date_range('2012-05-02 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [71]:
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')
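`pd.date_range` needs two of `start`, `end`, and `periods`. Supplying one endpoint plus `periods` counts forward from `start` or backward from `end`. A sketch checking both directions and the effect of `normalize=True`:

```python
import pandas as pd

# start + periods: count forward from the start date
forward = pd.date_range(start='2012-04-01', periods=20)
# end + periods: count backward so the range finishes on the end date
backward = pd.date_range(end='2012-06-01', periods=20)

print(forward[-1], backward[0])   # 2012-04-20 00:00:00 2012-05-13 00:00:00

# normalize=True snaps every timestamp to midnight while keeping the daily step
raw = pd.date_range('2012-05-02 12:56:31', periods=5)
norm = pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)
print((norm == raw.normalize()).all())   # True
```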

### Frequencies and Date Offsets
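Frequencies in pandas combine a base unit with an optional integer multiple, and string aliases such as `'D'` or `'W-WED'` correspond to date offset objects in `pandas.tseries.offsets`. Here is a brief sketch, with dates chosen only for illustration. It uses offset objects directly and the week-of-month alias `'WOM-3FRI'`:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

start = pd.Timestamp('2000-01-01')

# Offsets take an integer multiple and can be added to timestamps in sequence
print(start + Hour(2) + Minute(30))   # 2000-01-01 02:30:00

# An offset object can be passed directly as the frequency
every_4h = pd.date_range(start, periods=3, freq=Hour(4))
print(every_4h.hour.tolist())         # [0, 4, 8]

# 'WOM-3FRI' (week of month): the third Friday of each month
third_fridays = pd.date_range('2012-01-01', '2012-09-01', freq='WOM-3FRI')
print(third_fridays.day.tolist())     # [20, 17, 16, 20, 18, 15, 20, 17]
```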

In [73]:
b = 10

In [75]:
c = 20

In [77]:
e = 200

In [80]:
g = 404

In [84]:
v = 100

### Shifting (Leading and Lagging) Data

## Time Zone Handling

### Time Zone Localization and Conversion

### Operations with Time Zone-Aware Timestamp Objects

### Operations Between Different Time Zones

## Periods and Period Arithmetic

### Period Frequency Conversion

### Quarterly Period Frequencies

### Converting Timestamps to Periods (and Back)

### Creating a PeriodIndex from Arrays

## Resampling and Frequency Conversion

### Downsampling 

### Upsampling and Interpolation

### Resampling with Periods

## Moving Window Functions

### Exponentially Weighted Functions

### Binary Moving Window Functions

### User-Defined Moving Window Functions