# Chapter 11 - Time Series

## 11.2 - Time Series Basics

In [1]:
from datetime import datetime as dt
from dateutil.parser import parse
import pandas as pd

The basic time series object in `pandas` is a `Series` indexed by timestamps.

In [2]:
qs1 = []
for i in range(1,13, 3):
    t = dt(2019, i, 1)
    qs1.append(t)
display(qs1)
s1 = pd.Series([10,20,30,40], index=qs1)
display(s1)

[datetime.datetime(2019, 1, 1, 0, 0),
 datetime.datetime(2019, 4, 1, 0, 0),
 datetime.datetime(2019, 7, 1, 0, 0),
 datetime.datetime(2019, 10, 1, 0, 0)]

2019-01-01    10
2019-04-01    20
2019-07-01    30
2019-10-01    40
dtype: int64

Under the hood, the `datetime` objects are stored in a `DatetimeIndex`.

In [3]:
display(s1.index)
print(type(s1.index))

DatetimeIndex(['2019-01-01', '2019-04-01', '2019-07-01', '2019-10-01'], dtype='datetime64[ns]', freq=None)

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
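As a small illustration of that conversion, the sketch below reads one element back from the index and checks its type. The variable names (`dates`, `s`, `first`, `idx`) are made up for this example; `pd.to_datetime` is shown only as one other way to obtain a `DatetimeIndex`, here from strings.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2019, 1, 1), datetime(2019, 4, 1)]
s = pd.Series([10, 20], index=dates)

# Each element read back from the index is a pandas Timestamp,
# which is a subclass of datetime.datetime.
first = s.index[0]
print(type(first))
print(isinstance(first, datetime))

# pd.to_datetime builds a DatetimeIndex from date strings.
idx = pd.to_datetime(["2019-01-01", "2019-04-01"])
print(isinstance(idx, pd.DatetimeIndex))
```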


As with any other `Series`, arithmetic between two time series is aligned on the index, which here means on the dates.

In [4]:
qs2 = [parse('2019-01-01'), parse('2019-04-01'), parse('2019-07-01')]
s2 = pd.Series([25,45,65], index=qs2)
display(s2)
display(s1 + s2) # s2 has no value for 2019-10-01, so the sum for that date is NaN

2019-01-01    25
2019-04-01    45
2019-07-01    65
dtype: int64

2019-01-01    35.0
2019-04-01    65.0
2019-07-01    95.0
2019-10-01     NaN
dtype: float64
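If the `NaN` for the unmatched date is not wanted, one possible approach is `Series.add` with `fill_value`, which treats a date missing from one side as the given value. The sketch below rebuilds `s1` and `s2` so that it runs on its own; `total` is a name chosen for this example.

```python
import pandas as pd

s1 = pd.Series(
    [10, 20, 30, 40],
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2019-07-01", "2019-10-01"]),
)
s2 = pd.Series(
    [25, 45, 65],
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2019-07-01"]),
)

# Treat a date that is missing from one Series as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
```

Whether 0 is a sensible stand-in depends on the data; for some series a missing observation should stay missing.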

You can use a date as an index label to pull out its value. Several date string formats are accepted. Note that `'07/01/2019'` is read month first, so it means 1 July 2019.

In [5]:
print(s2['20190101'])
print(s2['07/01/2019'])

25
65
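The lookup also accepts `datetime` and `pd.Timestamp` objects. The following sketch, which rebuilds `s2` so it is self-contained, checks that the different spellings of the same date return the same value, and shows one way to test whether a date is present before indexing. The names `by_iso`, `by_compact`, `by_datetime` and `by_timestamp` are illustrative.

```python
from datetime import datetime
import pandas as pd

s2 = pd.Series(
    [25, 45, 65],
    index=pd.to_datetime(["2019-01-01", "2019-04-01", "2019-07-01"]),
)

# Four ways to name 1 January 2019.
by_iso = s2.loc["2019-01-01"]
by_compact = s2.loc["20190101"]
by_datetime = s2.loc[datetime(2019, 1, 1)]
by_timestamp = s2.loc[pd.Timestamp(2019, 1, 1)]
print(by_iso, by_compact, by_datetime, by_timestamp)

# Looking up a date that is not in the index raises KeyError,
# so a membership test can be used first.
has_october = pd.Timestamp("2019-10-01") in s2.index
print(has_october)
```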


Use `pd.date_range` to generate dates automatically from a start date and a number of periods. The default frequency is daily.

In [6]:
d1 = pd.Series(range(0,20), pd.date_range('2018/07/01', periods=20))
display(d1.head(10))

d2 = pd.Series(range(0,400), pd.date_range('20190701', periods=400))
display(d2.head(10))

2018-07-01    0
2018-07-02    1
2018-07-03    2
2018-07-04    3
2018-07-05    4
2018-07-06    5
2018-07-07    6
2018-07-08    7
2018-07-09    8
2018-07-10    9
Freq: D, dtype: int64

2019-07-01    0
2019-07-02    1
2019-07-03    2
2019-07-04    3
2019-07-05    4
2019-07-06    5
2019-07-07    6
2019-07-08    7
2019-07-09    8
2019-07-10    9
Freq: D, dtype: int64
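`pd.date_range` can also take an end date instead of a number of periods, and a `freq` argument to change the spacing. A minimal sketch, with `july_week` and `business_days` as illustrative names:

```python
import pandas as pd

# Daily dates between two endpoints; both endpoints are included.
july_week = pd.date_range(start="2019-07-01", end="2019-07-07")
print(len(july_week))

# Business days only: freq="B" skips Saturdays and Sundays.
# 2019-07-01 is a Monday, so the sixth business day is the next Monday.
business_days = pd.date_range("2019-07-01", periods=6, freq="B")
print(business_days)
```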

The `DatetimeIndex` can now be used for filtering. A partial date string such as a year or a year and month selects every entry in that period.

In [7]:
display(d2['2019'].iloc[:5]) # Filter by year
display(d2['2019-09'].iloc[:5]) # Filter by year & month
display(d2[dt(2019,10, 3):].iloc[:5]) # Filter by datetime

2019-07-01    0
2019-07-02    1
2019-07-03    2
2019-07-04    3
2019-07-05    4
Freq: D, dtype: int64

2019-09-01    62
2019-09-02    63
2019-09-03    64
2019-09-04    65
2019-09-05    66
Freq: D, dtype: int64

2019-10-03    94
2019-10-04    95
2019-10-05    96
2019-10-06    97
2019-10-07    98
Freq: D, dtype: int64
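Date strings can also be used as both ends of a slice. Unlike positional slicing, a label-based slice includes its end point. The sketch below rebuilds `d2` so that it runs on its own; `first_week_sep` and `in_2020` are names chosen for this example.

```python
import pandas as pd

d2 = pd.Series(range(400), index=pd.date_range("2019-07-01", periods=400))

# Label-based slices on dates include both endpoints.
first_week_sep = d2.loc["2019-09-01":"2019-09-07"]
print(len(first_week_sep))

# A partial string selects the whole period: here, every 2020 date in d2.
in_2020 = d2.loc["2020"]
print(in_2020.index.min(), in_2020.index.max())
```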

Filtering can also be done with the `truncate` method.

In [8]:
# Truncate (remove) all values before 1 Aug 2019 and show the first 10 that remain
d2.truncate(before='2019-08-01').iloc[:10]

2019-08-01    31
2019-08-02    32
2019-08-03    33
2019-08-04    34
2019-08-05    35
2019-08-06    36
2019-08-07    37
2019-08-08    38
2019-08-09    39
2019-08-10    40
Freq: D, dtype: int64

In [9]:
# Truncate (remove) all values after 5 Jul 2019
d2.truncate(after='2019-07-05')

2019-07-01    0
2019-07-02    1
2019-07-03    2
2019-07-04    3
2019-07-05    4
Freq: D, dtype: int64
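The `before` and `after` arguments can be combined to keep a window of dates. As a sketch, the example below rebuilds `d2` and compares the result with the equivalent label-based slice; `window` and `same_window` are illustrative names. Note that `truncate` expects a sorted index.

```python
import pandas as pd

d2 = pd.Series(range(400), index=pd.date_range("2019-07-01", periods=400))

# Keep only 1 Aug 2019 through 5 Aug 2019; both bounds are included.
window = d2.truncate(before="2019-08-01", after="2019-08-05")
print(window)

# The same window expressed as a label-based slice.
same_window = d2.loc["2019-08-01":"2019-08-05"]
print(window.equals(same_window))
```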

**References:**

Python for Data Analysis, 2nd Edition, McKinney (2017)