# Time Series

__Time series__ data is an important form of structured data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series. Many time series are _fixed frequency_, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be _irregular_, without a fixed unit of time or offset between units. How you mark and refer to time series data depends on the application, and you may have one of the following:
- __Timestamps__, specific instants in time
- Fixed __periods__, such as the month January 2007 or the full year 2010
- __Intervals__ of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
- Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
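
The four kinds of time marker listed above can be sketched with pandas objects. This is a minimal illustration, not the only way to represent them: the specific dates and the choice of `pd.Interval` and `pd.Timedelta` are assumptions made for the example.

```python
import pandas as pd

# A timestamp: one specific instant in time.
stamp = pd.Timestamp("2007-01-15 09:30")

# A fixed period: the whole month of January 2007.
month = pd.Period("2007-01", freq="M")

# An interval: an arbitrary span given by a start and an end timestamp.
span = pd.Interval(pd.Timestamp("2007-01-01"),
                   pd.Timestamp("2007-02-01"),
                   closed="left")

# Elapsed time: an offset relative to some chosen start, here 90 seconds.
elapsed = pd.Timedelta(seconds=90)

# A period carries its own start and end, like a special interval.
print(month.start_time, month.end_time)

# The timestamp falls inside both the period and the interval.
print(month.start_time <= stamp <= month.end_time)
print(stamp in span)
```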

In this chapter, we are mainly concerned with time series in the first three categories, though many of the techniques can be applied to experimental time series where the index may be an integer or floating-point number indicating elapsed time from the start of the experiment. The simplest and most widely used time series are those indexed by timestamp.
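
As a small sketch of the elapsed-time case, the index below is a plain float giving seconds since the start of the experiment. The cookie diameters are made-up values used only for illustration.

```python
import pandas as pd

# Hypothetical measurements: cookie diameter (cm) at each second in the oven.
elapsed_seconds = [0.0, 1.0, 2.0, 3.0, 4.0]
diameter_cm = [5.0, 5.1, 5.3, 5.6, 6.0]
cookie = pd.Series(diameter_cm, index=elapsed_seconds)

# Label-based slicing works on a float index just as on timestamps;
# both endpoints are included.
window = cookie.loc[1.0:3.0]

# Change in diameter from one second to the next.
growth = cookie.diff()

print(window)
print(growth)
```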

__pandas__ provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are especially useful for financial and economics applications, but you could certainly use them to analyze server log data, too.
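
A minimal sketch of slicing and resampling an irregular series follows. The timestamps and values are invented for the example; `resample("D")` is used here to aggregate to a fixed daily frequency.

```python
import pandas as pd

# Irregular observations: timestamps with no fixed spacing (made-up data).
times = pd.to_datetime(["2011-01-01 09:00", "2011-01-01 17:30",
                        "2011-01-02 08:15", "2011-01-04 12:00"])
obs = pd.Series([1.0, 3.0, 5.0, 7.0], index=times)

# Slice: every observation that falls on 1 January 2011.
first_day = obs.loc["2011-01-01"]

# Aggregate and resample: mean per calendar day.
# Days with no observations (3 January here) come out as NaN.
daily_mean = obs.resample("D").mean()

print(first_day)
print(daily_mean)
```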

## Date and Time Data Types and Tools

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
now = datetime.now()

In [3]:
now

datetime.datetime(2019, 3, 1, 19, 34, 48, 408581)

In [4]:
now.year, now.month, now.day

(2019, 3, 1)

In [5]:
delta = datetime(2011,1,7)-datetime(2008,6,24,8,15)

In [6]:
delta

datetime.timedelta(days=926, seconds=56700)

In [7]:
delta.days

926

In [8]:
delta.seconds

56700

In [9]:
from datetime import timedelta

In [10]:
start = datetime(2011,1,7)

In [11]:
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [12]:
start - 2*timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)

### Converting Between String and Datetime

In [13]:
stamp = datetime(2011,1,3)

In [14]:
str(stamp)

'2011-01-03 00:00:00'

In [15]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

In [16]:
value = '2011-01-03'

In [17]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
datestrs = ['7/6/2011', '8/6/2011']

In [19]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

In [20]:
from dateutil.parser import parse

In [21]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [22]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [23]:
parse('6/12/2011', dayfirst = True)

datetime.datetime(2011, 12, 6, 0, 0)

In [24]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [25]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [26]:
idx = pd.to_datetime(datestrs + [None])

In [27]:
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [28]:
idx[2]

NaT

In [29]:
pd.isnull(idx)

array([False, False,  True])

## Time Series Basics

In [30]:
from datetime import datetime

In [31]:
dates = [datetime(2011,1,2), datetime(2011,1,5),
        datetime(2011,1,7), datetime(2011,1,8),
        datetime(2011,1,10),datetime(2011,1,12)]

In [32]:
ts = pd.Series(np.random.randn(6),index=dates)

In [33]:
ts

2011-01-02   -0.263592
2011-01-05   -0.412651
2011-01-07    1.029463
2011-01-08    0.093216
2011-01-10    0.012043
2011-01-12    0.503967
dtype: float64

In [34]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [35]:
ts + ts[::2]

2011-01-02   -0.527184
2011-01-05         NaN
2011-01-07    2.058925
2011-01-08         NaN
2011-01-10    0.024086
2011-01-12         NaN
dtype: float64

In [36]:
ts.index.dtype

dtype('<M8[ns]')

In [37]:
stamp = ts.index[0]

In [38]:
stamp

Timestamp('2011-01-02 00:00:00')

### Indexing, Selection, Subsetting

In [39]:
stamp = ts.index[2]

In [40]:
ts[stamp]

1.0294626948508836

In [41]:
ts['1/10/2011']

0.012043069988531883

In [42]:
ts['20110110']

0.012043069988531883

In [43]:
longer_ts = pd.Series(np.random.randn(1000),
                     index=pd.date_range('1/1/2000',periods=1000))

In [44]:
longer_ts

2000-01-01    0.199073
2000-01-02    1.541654
2000-01-03    0.270372
2000-01-04    1.606253
2000-01-05   -0.801031
2000-01-06   -1.650939
2000-01-07    1.524198
2000-01-08    0.497214
2000-01-09    2.162037
2000-01-10    1.507577
2000-01-11   -0.119016
2000-01-12   -0.624292
2000-01-13   -0.725993
2000-01-14   -0.308813
2000-01-15   -0.453193
2000-01-16    1.247035
2000-01-17    3.118662
2000-01-18   -0.981148
2000-01-19    0.704079
2000-01-20   -0.278298
2000-01-21   -0.096319
2000-01-22   -0.152645
2000-01-23    1.980533
2000-01-24   -0.628723
2000-01-25    1.080900
2000-01-26   -0.864998
2000-01-27    1.188557
2000-01-28   -0.957215
2000-01-29    1.322494
2000-01-30    0.396919
                ...   
2002-08-28    0.802664
2002-08-29    1.062716
2002-08-30   -0.432806
2002-08-31    0.379211
2002-09-01   -0.199776
2002-09-02   -0.447683
2002-09-03   -0.915710
2002-09-04   -0.555587
2002-09-05   -0.226994
2002-09-06    1.117431
2002-09-07    0.381202
2002-09-08   -0.665285
2002-09-09 

In [45]:
longer_ts['2001']

2001-01-01   -1.372470
2001-01-02   -1.509638
2001-01-03    0.960312
2001-01-04   -1.568060
2001-01-05    0.528985
2001-01-06   -1.019593
2001-01-07    1.073343
2001-01-08    0.117891
2001-01-09    1.803249
2001-01-10    1.286701
2001-01-11    0.576537
2001-01-12    0.446353
2001-01-13    0.050706
2001-01-14   -0.681204
2001-01-15   -0.996671
2001-01-16   -0.652486
2001-01-17   -0.709377
2001-01-18   -0.801832
2001-01-19    1.235291
2001-01-20    0.413256
2001-01-21   -0.321319
2001-01-22   -1.004816
2001-01-23    1.236760
2001-01-24   -1.245319
2001-01-25   -1.938672
2001-01-26   -0.270596
2001-01-27   -0.878425
2001-01-28    0.807693
2001-01-29   -0.539466
2001-01-30   -0.836142
                ...   
2001-12-02    1.029689
2001-12-03    1.441300
2001-12-04    0.823270
2001-12-05    2.341499
2001-12-06   -1.519987
2001-12-07    0.669108
2001-12-08    0.000131
2001-12-09   -0.287560
2001-12-10    1.268954
2001-12-11   -0.233950
2001-12-12   -0.714639
2001-12-13   -1.274379
2001-12-14 

In [46]:
longer_ts['2001-05']

2001-05-01   -0.627937
2001-05-02   -0.233530
2001-05-03   -1.092812
2001-05-04   -0.172810
2001-05-05   -0.023692
2001-05-06    0.457299
2001-05-07   -0.420721
2001-05-08    0.031248
2001-05-09    0.396053
2001-05-10    1.175583
2001-05-11   -0.410668
2001-05-12    1.987958
2001-05-13    1.330990
2001-05-14    0.571462
2001-05-15    1.050691
2001-05-16   -0.900724
2001-05-17    0.954252
2001-05-18   -1.099720
2001-05-19    0.805147
2001-05-20   -0.104270
2001-05-21    1.324309
2001-05-22    1.803031
2001-05-23   -1.144040
2001-05-24    0.848235
2001-05-25    0.188925
2001-05-26   -1.958653
2001-05-27    0.867416
2001-05-28    0.118966
2001-05-29    1.798508
2001-05-30   -0.051289
2001-05-31    1.080544
Freq: D, dtype: float64

In [47]:
ts[datetime(2011,1,7):]

2011-01-07    1.029463
2011-01-08    0.093216
2011-01-10    0.012043
2011-01-12    0.503967
dtype: float64

In [48]:
ts

2011-01-02   -0.263592
2011-01-05   -0.412651
2011-01-07    1.029463
2011-01-08    0.093216
2011-01-10    0.012043
2011-01-12    0.503967
dtype: float64

In [49]:
ts['1/6/2011':'1/11/2011']

2011-01-07    1.029463
2011-01-08    0.093216
2011-01-10    0.012043
dtype: float64

In [50]:
ts.truncate(after='1/9/2011')

2011-01-02   -0.263592
2011-01-05   -0.412651
2011-01-07    1.029463
2011-01-08    0.093216
dtype: float64

In [51]:
dates = pd.date_range('1/1/2000',periods=100,freq='W-WED')

In [52]:
long_df = pd.DataFrame(np.random.randn(100,4),
                      index = dates,
                      columns = ['Colorado','Texas',
                                'New York', 'Ohio'])

In [53]:
long_df.loc['5-2001']

            Colorado     Texas  New York      Ohio
2001-05-02  1.162498  1.100713  0.970175 -0.520717
2001-05-09 -1.224785 -0.400813  0.887109 -0.648828
2001-05-16  0.718742  0.492655  0.192561 -0.824008
2001-05-23  1.948554  1.485494  0.265991  0.239793
2001-05-30  0.737143 -0.427833 -0.648937 -0.899527


### Time Series with Duplicate Indices

In [54]:
dates = pd.DatetimeIndex(['1/1/2000','1/2/2000','1/2/2000',
                          '1/2/2000','1/3/2000'])

In [55]:
dup_ts = pd.Series(np.arange(5),index=dates)

In [56]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [57]:
dup_ts.index.is_unique

False

In [58]:
dup_ts['1/3/2000'] # not duplicated

4

In [59]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

In [60]:
grouped = dup_ts.groupby(level=0)

In [61]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [62]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64

## Date Ranges, Frequencies, and Shifting

### Generating Date Ranges
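
This section has no cells yet. As a minimal sketch of what it covers, `pd.date_range` (already used earlier in this chapter) can build an index from explicit endpoints or from a start plus a number of periods. The dates below are arbitrary.

```python
import pandas as pd

# Daily range defined by explicit start and end dates (both inclusive).
by_endpoints = pd.date_range("2012-04-01", "2012-04-10")

# The same range, defined by a start date and a number of periods.
by_periods = pd.date_range(start="2012-04-01", periods=10)

# A non-daily frequency: every Wednesday, as in the W-WED example earlier.
wednesdays = pd.date_range("2000-01-01", periods=4, freq="W-WED")

print(by_endpoints)
print(wednesdays)
```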

### Frequencies and Date Offsets

### Shifting (Leading and Lagging) Data
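
This section has no cells yet. A minimal sketch of leading and lagging with `Series.shift`, using made-up prices: the values move while the index stays fixed, so missing values appear at one end.

```python
import pandas as pd

# Hypothetical daily prices.
prices = pd.Series([100.0, 102.0, 101.0, 105.0],
                   index=pd.date_range("2000-01-01", periods=4))

# Lag: each value moves one step forward in time; the first slot becomes NaN.
lagged = prices.shift(1)

# Lead: each value moves one step backward in time; the last slot becomes NaN.
led = prices.shift(-1)

# A common use of shift: period-over-period percent change.
returns = prices / prices.shift(1) - 1

print(lagged)
print(returns)
```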

## Time Zone Handling

### Time Zone Localization and Conversion

### Operations with Time Zone-Aware Timestamp Objects

### Operations Between Different Time Zones

## Periods and Period Arithmetic

### Period Frequency Conversion

### Quarterly Period Frequencies

### Converting Timestamps to Periods (and Back)

### Creating a PeriodIndex from Arrays

## Resampling and Frequency Conversion

### Downsampling 
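
This section has no cells yet. A minimal sketch of downsampling, assuming one-minute data aggregated into five-minute bins with `resample`; the series is synthetic.

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations with values 0..11 (synthetic data).
minutes = pd.date_range("2000-01-01", periods=12, freq="min")
ts_min = pd.Series(np.arange(12), index=minutes)

# Downsample to five-minute bins, summing the values in each bin.
# By default each bin is labeled by, and includes, its left edge.
five_min = ts_min.resample("5min").sum()

print(five_min)
```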

### Upsampling and Interpolation

### Resampling with Periods

## Moving Window Functions
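
This section has no cells yet. A minimal sketch of a moving window using `Series.rolling` on a small synthetic series; the window size of 3 is an arbitrary choice for the example.

```python
import pandas as pd

# Small synthetic daily series.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2000-01-01", periods=5))

# Mean over a sliding window of 3 observations.
# The first two results are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

# Allowing partial windows at the start avoids the leading NaNs.
partial_mean = s.rolling(window=3, min_periods=1).mean()

print(rolling_mean)
print(partial_mean)
```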

### Exponentially Weighted Functions

### Binary Moving Window Functions

### User-Defined Moving Window Functions