# Chapter 11: Time Series

In [1]:
import numpy as np
import pandas as pd

Anything that is observed or measured at many points in time forms a time series.

## 11.1 Date and Time Data Types and Tools

Python's standard library includes data types for date and time data, as well as calendar functionality.

The `datetime`, `time`, and `calendar` modules are the main places to start.

In [2]:
from datetime import datetime

In [3]:
now = datetime.now()
now

datetime.datetime(2021, 2, 8, 23, 7, 46, 243732)

In [4]:
now.year, now.month, now.day

(2021, 2, 8)

`datetime` stores both date and time down to the microsecond.
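As a small illustration of that resolution, the sketch below splits a `datetime` into its date and time parts. The value is hard-coded (copied from the output above) so the result is reproducible, and the variable names are only illustrative.

```python
from datetime import datetime

# Fixed value copied from the earlier output so the example is reproducible
dt = datetime(2021, 2, 8, 23, 7, 46, 243732)

date_part = dt.date()   # calendar date only
time_part = dt.time()   # time of day, including microseconds

print(date_part)        # 2021-02-08
print(time_part)        # 23:07:46.243732
print(dt.microsecond)   # 243732
```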

`timedelta` represents the temporal difference between two `datetime` objects.

In [5]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(days=926, seconds=56700)

In [6]:
delta.days

926

In [7]:
delta.seconds

56700
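Note that `days` and `seconds` are separate components of the same difference, so `delta.seconds` alone is not the full length of the interval. A minimal sketch, reusing the same two dates, that contrasts the components with `total_seconds()`:

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are separate components; .seconds is always below 86400
print(delta.days, delta.seconds)   # 926 56700

# total_seconds() folds every component into a single float
total = delta.total_seconds()
print(total)                       # 80063100.0
```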

In [8]:
from datetime import timedelta

In [9]:
start = datetime(2011, 1, 7)
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [10]:
start - 2*timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)

### 11.1.1 Converting Between String and Datetime

You can format `datetime` objects and pandas `Timestamp` objects as strings using `str` or the `strftime` method.

In [11]:
stamp = datetime(2011, 1, 3)
str(stamp)

'2011-01-03 00:00:00'

In [12]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

You can use many of the same format codes to convert strings to dates with `datetime.strptime`.

In [13]:
value = '2011-02-03'
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 2, 3, 0, 0)

In [14]:
datestrs = ['7/6/2011', '8/6/2011']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
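Because `strftime` and `strptime` share format codes, a value can be written out and read back with the same format string. The sketch below assumes an arbitrary example timestamp and a format that also includes the time of day:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'   # 4-digit year, month, day, 24-hour time
stamp = datetime(2011, 1, 3, 14, 30, 5)

text = stamp.strftime(fmt)              # datetime -> string
parsed = datetime.strptime(text, fmt)   # string -> datetime

print(text)             # 2011-01-03 14:30:05
print(parsed == stamp)  # True
```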

`datetime.strptime` is a good way to parse a date with a known format. When the format is not known in advance, use `parser.parse` from the `dateutil` package, which infers the format for you.

In [15]:
from dateutil.parser import parse

In [16]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [19]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)

The pandas `to_datetime` function parses many different kinds of date representations.

In [21]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [22]:
idx = pd.to_datetime(datestrs + [None])
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [23]:
idx[2]

NaT

In [24]:
pd.isnull(idx)

array([False, False,  True])

> Note: `dateutil.parser` is a useful but imperfect tool. It will recognize some strings as dates that you might prefer it didn't; for example, `'42'` is parsed as the year 2042 with today's month and day.
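When the expected format is known, one way to avoid such surprises is to parse strictly. The sketch below is one possible approach, not something the chapter prescribes: it passes an explicit `format` to `pd.to_datetime` together with `errors='coerce'`, so values that do not match become `NaT`. The input list is made up for illustration.

```python
import pandas as pd

raw = ['7/6/2011', '42', 'not a date']   # hypothetical input

# With an explicit format, anything that does not match becomes NaT
strict = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

print(strict)
print(strict.isna().tolist())   # [False, True, True]
```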

## 11.2 Time Series Basics

In [3]:
from datetime import datetime

In [7]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
          datetime(2011, 1, 7), datetime(2011, 1, 8),
          datetime(2011, 1, 10), datetime(2011, 1, 12)]

ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02   -0.842383
2011-01-05   -0.224334
2011-01-07    0.545853
2011-01-08   -0.533138
2011-01-10   -0.848272
2011-01-12    0.525371
dtype: float64

In [8]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [9]:
ts + ts[::2]

2011-01-02   -1.684766
2011-01-05         NaN
2011-01-07    1.091705
2011-01-08         NaN
2011-01-10   -1.696544
2011-01-12         NaN
dtype: float64

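> Recall: `ts[::2]` selects every second element in `ts`. Arithmetic between the two Series aligns on the dates, so the dates missing from `ts[::2]` produce `NaN`.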
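The same alignment can be seen with fixed values instead of random ones. This is a minimal sketch with a made-up three-element Series:

```python
import pandas as pd

idx = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07'])
s = pd.Series([1.0, 2.0, 3.0], index=idx)

every_other = s[::2]      # keeps 2011-01-02 and 2011-01-07
total = s + every_other   # aligned on the dates, not on position

print(total)
# 2011-01-05 is absent from every_other, so that entry is NaN
print(total.isna().tolist())   # [False, True, False]
```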

In [10]:
ts.index.dtype

dtype('<M8[ns]')

In [12]:
stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

### 11.2.1 Indexing, Selection, Subsetting

A time series behaves like any other `pandas.Series` when you index it or select data by label.

In [13]:
stamp = ts.index[2]
ts[stamp]

0.5458526310764886

In [14]:
ts['1/10/2011']

-0.8482719585121435

In [15]:
ts['20110110']

-0.8482719585121435

In [16]:
longer_ts = pd.Series(np.random.randn(1000),
                      index=pd.date_range('1/1/2000', periods=1000))
longer_ts

2000-01-01    0.653364
2000-01-02    1.964019
2000-01-03   -1.621066
2000-01-04    0.058644
2000-01-05   -1.530261
                ...   
2002-09-22   -0.067601
2002-09-23    1.646673
2002-09-24    1.873234
2002-09-25   -1.000796
2002-09-26   -1.512838
Freq: D, Length: 1000, dtype: float64

In [17]:
longer_ts['2001']

2001-01-01   -0.459880
2001-01-02    0.451819
2001-01-03   -0.063274
2001-01-04   -1.524734
2001-01-05    1.448305
                ...   
2001-12-27   -0.474549
2001-12-28    0.071386
2001-12-29    0.857609
2001-12-30   -0.113901
2001-12-31    0.803113
Freq: D, Length: 365, dtype: float64

The string `'2001'` is interpreted as a year and selects that time period. This also works if you specify the month.

In [18]:
longer_ts['2001-05']

2001-05-01    1.506914
2001-05-02    1.498434
2001-05-03    0.139040
2001-05-04    0.634580
2001-05-05   -1.537884
2001-05-06    1.058511
2001-05-07    1.352701
2001-05-08    1.382270
2001-05-09    0.756594
2001-05-10    1.023554
2001-05-11   -0.459984
2001-05-12   -0.187871
2001-05-13    1.155889
2001-05-14    0.924846
2001-05-15   -0.458487
2001-05-16    0.611716
2001-05-17   -1.352900
2001-05-18    0.013168
2001-05-19    0.448105
2001-05-20   -1.030279
2001-05-21   -0.299670
2001-05-22   -1.067690
2001-05-23    1.251145
2001-05-24    0.169525
2001-05-25    1.661231
2001-05-26   -1.414122
2001-05-27    1.882981
2001-05-28   -0.356259
2001-05-29   -0.230481
2001-05-30    0.955956
2001-05-31   -0.660353
Freq: D, dtype: float64
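A quick way to check what a partial date string selects is to look at the size and endpoints of the result. The sketch below rebuilds a daily series like `longer_ts`, using `np.arange` instead of random data (an assumption made so the example is deterministic), and selects with `.loc`:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000),
              index=pd.date_range('2000-01-01', periods=1000))

year = s.loc['2001']       # every day in 2001
month = s.loc['2001-05']   # May 2001 only

print(len(year), len(month))   # 365 31
print(month.index[0], month.index[-1])
```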

In [20]:
ts[datetime(2011, 1, 7):]

2011-01-07    0.545853
2011-01-08   -0.533138
2011-01-10   -0.848272
2011-01-12    0.525371
dtype: float64

In [21]:
ts

2011-01-02   -0.842383
2011-01-05   -0.224334
2011-01-07    0.545853
2011-01-08   -0.533138
2011-01-10   -0.848272
2011-01-12    0.525371
dtype: float64

In [22]:
ts['1/6/2011': '1/11/2011']

2011-01-07    0.545853
2011-01-08   -0.533138
2011-01-10   -0.848272
dtype: float64

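> Note: Slicing in this manner produces views on the source time series, like slicing NumPy arrays. No data is copied, and modifications to the slice are reflected in the original data. This does not hold when pandas' Copy-on-Write mode is enabled: there, modifying the slice leaves the original unchanged.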

`truncate` also slices a Series between two dates.

In [23]:
ts.truncate(after='1/9/2011')

2011-01-02   -0.842383
2011-01-05   -0.224334
2011-01-07    0.545853
2011-01-08   -0.533138
dtype: float64
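`truncate` also accepts a `before` argument, and with both bounds it selects the same rows as a label slice. A minimal sketch, assuming a sorted index and using integer values in place of the random data above:

```python
import pandas as pd

idx = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                      '2011-01-08', '2011-01-10', '2011-01-12'])
s = pd.Series(range(6), index=idx)

# Both endpoints are inclusive, and neither needs to be in the index
by_truncate = s.truncate(before='2011-01-06', after='2011-01-11')
by_slice = s.loc['2011-01-06':'2011-01-11']

print(by_truncate.equals(by_slice))   # True
print(by_truncate.tolist())           # [2, 3, 4]
```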

In [24]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100, 4),
                       index=dates,
                       columns=['Colorado', 'Texas',
                                'New York', 'Ohio'])

long_df.loc['5-2001']

            Colorado     Texas  New York      Ohio
2001-05-02  0.862239  0.822899 -0.402528  0.171566
2001-05-09  0.290595 -1.457295 -0.228501 -0.703747
2001-05-16 -0.279027 -0.602307  0.831306  0.009300
2001-05-23 -1.295360  0.432710  0.540808  0.246159
2001-05-30 -0.149678  0.209044 -0.512423 -0.329186


### 11.2.2 Time Series with Duplicate Indices

In [25]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

We can check whether the index is unique with the `is_unique` property.

In [26]:
dup_ts.index.is_unique

False

In [27]:
dup_ts['1/3/2000'] # not duplicated

4

In [28]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

To aggregate data with non-unique timestamps, use `groupby` and pass `level=0`.

In [29]:
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [30]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
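Aggregating is one option. If instead a single observation per timestamp should be kept, the index's `duplicated` method can be used to build a mask. The sketch below keeps the first occurrence; that choice is an assumption for illustration and may not suit every dataset.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# True for every repeat of a timestamp after its first occurrence
mask = dup_ts.index.duplicated(keep='first')
print(mask.tolist())                # [False, False, True, True, False]

first_only = dup_ts[~mask]          # one row per timestamp
print(first_only.tolist())          # [0, 1, 4]
print(first_only.index.is_unique)   # True
```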

## 11.3 Date Ranges, Frequencies, and Shifting

### 11.3.1 Generating Date Ranges

### 11.3.2 Frequencies and Date Offsets

### 11.3.3 Shifting (Leading and Lagging) Data

## 11.4 Time Zone Handling

### 11.4.1 Time Zone Localization and Conversion

### 11.4.2 Operations with Time Zone-Aware Timestamp Objects

### 11.4.3 Operations Between Different Time Zones

## 11.5 Periods and Period Arithmetic

### 11.5.1 Period Frequency Conversion

### 11.5.2 Quarterly Period Frequencies

### 11.5.3 Converting Timestamps to Periods (and Back)

### 11.5.4 Creating a PeriodIndex from Arrays

## 11.6 Resampling and Frequency Conversion

### 11.6.1 Downsampling

### 11.6.2 Upsampling and Interpolation

### 11.6.3 Resampling with Periods

## 11.7 Moving Window Functions

### 11.7.1 Exponentially Weighted Functions

### 11.7.2 Binary Moving Window Functions

### 11.7.3 User-Defined Moving Window Functions

## 11.8 Conclusion