Types of time series data:
- *Timestamps*, specific instants in time
- Fixed *periods*, such as the month January 2007 or the full year 2010
- *Intervals* of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
- Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time.
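
The four kinds of time data listed above map roughly onto pandas scalar types. The sketch below is only illustrative: the variable names and the dates are made up for the example, and `pd.Interval` is used here as one possible way to represent an arbitrary span.

```python
import pandas as pd

# A timestamp: one specific instant in time.
ts = pd.Timestamp('2007-01-15 09:30')

# A fixed period: the whole month of January 2007.
month = pd.Period('2007-01', freq='M')

# An interval: an arbitrary span bounded by a start and end timestamp.
span = pd.Interval(pd.Timestamp('2007-01-10'), pd.Timestamp('2007-01-20'))

# Elapsed time: a duration measured relative to a chosen start time.
elapsed = ts - pd.Timestamp('2007-01-15 09:00')

print(month.start_time)   # first instant covered by the period
print(ts in span)         # True: the timestamp falls inside the interval
print(elapsed)            # a 30-minute Timedelta
```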

## 1 Date and Time Data Types and Tools

The Python standard library provides date and time tools in the `datetime`, `time`, and `calendar` modules.

In [39]:
import numpy as np

In [31]:
import pandas as pd

In [2]:
from datetime import datetime

In [3]:
now = datetime.now()

In [4]:
now

datetime.datetime(2017, 11, 12, 20, 56, 51, 590041)

In [5]:
now.year, now.month, now.day

(2017, 11, 12)

`datetime` stores both the date and the time, down to the microsecond. `timedelta` represents the temporal difference between two `datetime` objects.

In [6]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

In [7]:
delta

datetime.timedelta(926, 56700)

In [8]:
delta.days

926

In [9]:
delta.seconds

56700
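
Note that `days` and `seconds` are separate components of the same difference, not two views of the total. A small sketch, reusing the same two dates, shows how `total_seconds()` combines them:

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are stored as separate components.
print(delta.days, delta.seconds)   # 926 56700

# total_seconds() folds both components into a single duration.
total = delta.total_seconds()
print(total)                       # 80063100.0
```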

In [11]:
from datetime import timedelta

In [12]:
start = datetime(2011, 1, 7)

In [13]:
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [15]:
start - 2 * timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)

<img src='img/11_1_1.png'>

### 1.1 Converting Between String and Datetime

`datetime` objects can be formatted as strings using `str` or the `strftime` method, which takes a format specification.

In [16]:
stamp = datetime(2011, 1, 3)

In [17]:
str(stamp)

'2011-01-03 00:00:00'

In [18]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

<img src='img/11_1_2.png'>

These same format codes can be used to convert strings to dates with `datetime.strptime`

In [19]:
value = '2011-01-03'

In [20]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [23]:
datestrs = ['7/6/2011', '8/6/2011']

In [24]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
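
`strptime` is strict: the string has to match the format exactly. The following sketch (the names `fmt`, `round_trip`, and `matched` are just for illustration) round-trips one of the dates above and shows what happens with a mismatched string:

```python
from datetime import datetime

fmt = '%m/%d/%Y'
parsed = datetime.strptime('7/6/2011', fmt)

# Formatting with the same codes zero-pads the month and the day.
round_trip = parsed.strftime(fmt)
print(round_trip)   # 07/06/2011

# A string that does not match the format raises ValueError.
try:
    datetime.strptime('2011-07-06', fmt)
    matched = True
except ValueError:
    matched = False
print(matched)      # False
```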

Writing a format spec each time can be tedious. For common date formats, the `parser.parse` function of the `dateutil` package converts a string to a `datetime` without one.

In [25]:
from dateutil.parser import parse

In [26]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [27]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

Pass `dayfirst=True` when the day appears before the month.

In [28]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)
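
To see why the flag matters, the same ambiguous string can be parsed both ways. This is a minimal sketch; by default `dateutil` reads the first number as the month.

```python
from datetime import datetime
from dateutil.parser import parse

ambiguous = '6/12/2011'

# Default: month first, so this is June 12.
month_first = parse(ambiguous)

# dayfirst=True: day first, so this is December 6.
day_first = parse(ambiguous, dayfirst=True)

print(month_first)  # 2011-06-12 00:00:00
print(day_first)    # 2011-12-06 00:00:00
```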

pandas is generally oriented toward working with arrays of dates. The `to_datetime` function parses many different kinds of date representations.

In [29]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [32]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

It also handles missing values

In [33]:
idx = pd.to_datetime(datestrs + [None])

In [34]:
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [35]:
pd.isnull(idx)

array([False, False,  True], dtype=bool)
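
`NaT` (Not a Time) is the pandas null value for timestamp data. As a hedged sketch, `errors='coerce'` can be used to turn unparseable strings into `NaT` as well, so that they can be filtered out afterwards; the sample strings are made up for the example.

```python
import pandas as pd

raw = ['2011-07-06 12:00:00', 'not a date', None]

# errors='coerce' converts entries that cannot be parsed into NaT
# instead of raising an exception.
idx = pd.to_datetime(raw, errors='coerce')

mask = pd.isnull(idx)
print(mask.tolist())   # [False, True, True]

# Keep only the entries that parsed successfully.
valid = idx[~mask]
print(len(valid))      # 1
```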

In [36]:
?pd.to_datetime

## 2 Time Series Basics

A basic kind of time series object in pandas is a Series indexed by timestamps

In [37]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [40]:
ts = pd.Series(np.random.randn(6), index=dates)

In [41]:
ts

2011-01-02    0.617746
2011-01-05    0.371125
2011-01-07   -1.211203
2011-01-08   -0.608775
2011-01-10   -1.701659
2011-01-12   -0.720548
dtype: float64

These `datetime` objects have been put in a `DatetimeIndex`.

In [42]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

Like other Series, arithmetic operations automatically align on the dates

In [43]:
ts + ts[::2]

2011-01-02    1.235493
2011-01-05         NaN
2011-01-07   -2.422405
2011-01-08         NaN
2011-01-10   -3.403317
2011-01-12         NaN
dtype: float64
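
The `NaN` rows come from alignment: `ts[::2]` holds only every second date, so the remaining dates have no partner to add. A small deterministic sketch (values chosen for readability) shows this, along with `Series.add` and its `fill_value` argument as one way to avoid the missing values:

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07'])
s = pd.Series([1.0, 2.0, 3.0], index=dates)

# s[::2] keeps only the first and third dates; the middle date
# exists in one operand only, so the sum there is NaN.
total = s + s[::2]
print(total.isna().tolist())   # [False, True, False]

# fill_value=0 treats a date missing from one side as 0.
filled = s.add(s[::2], fill_value=0)
print(filled.tolist())         # [2.0, 2.0, 6.0]
```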

pandas stores timestamps using NumPy's `datetime64` data type at nanosecond resolution.

In [44]:
ts.index.dtype

dtype('<M8[ns]')

Scalar values from a `DatetimeIndex` are pandas `Timestamp` objects.

In [45]:
ts.index[0]

Timestamp('2011-01-02 00:00:00')

A `Timestamp` can be substituted anywhere you would use a `datetime` object. In addition, it can store frequency information and handle time zone conversions.
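
The substitution works because `Timestamp` is a subclass of `datetime`. The sketch below checks this and localizes a timestamp to UTC as a simple example of the time zone support; the variable names are illustrative.

```python
from datetime import datetime
import pandas as pd

stamp = pd.Timestamp('2011-01-02')

# Timestamp subclasses datetime, so it passes isinstance checks
# and compares equal to the corresponding datetime.
print(isinstance(stamp, datetime))     # True
print(stamp == datetime(2011, 1, 2))   # True

# Attach a time zone to a naive timestamp.
utc_stamp = stamp.tz_localize('UTC')

# Convert back to a plain datetime when a library insists on one.
plain = stamp.to_pydatetime()
```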

### 2.1 Indexing, Selection, Subsetting

In [46]:
stamp = ts.index[2]

In [47]:
ts[stamp]

-1.2112025628444267

As a convenience, you can also pass a string that is interpretable as a date.

In [48]:
ts['1/10/2011']

-1.7016586485065794

In [49]:
ts['20110110']

-1.7016586485065794

For longer time series, a year or a year with month can be passed to select slices of data

In [50]:
longer_ts = pd.Series(np.random.randn(1000),
                      index=pd.date_range('1/1/2000', periods=1000))

In [51]:
longer_ts.head()

2000-01-01   -2.975876
2000-01-02    0.016668
2000-01-03    0.763588
2000-01-04    1.394561
2000-01-05    1.564448
Freq: D, dtype: float64

In [53]:
longer_ts['2001'].head()

2001-01-01    0.299106
2001-01-02    0.853992
2001-01-03    1.612081
2001-01-04    0.952879
2001-01-05   -1.561792
Freq: D, dtype: float64

Here, the string '2001' is interpreted as a year.

In [54]:
longer_ts['2001-05']

2001-05-01    0.065944
2001-05-02   -1.023276
2001-05-03   -0.051203
2001-05-04   -0.608367
2001-05-05    1.407096
2001-05-06    0.002727
2001-05-07    0.612694
2001-05-08   -1.345624
2001-05-09    0.452837
2001-05-10    0.501222
2001-05-11    2.181655
2001-05-12    1.065940
2001-05-13   -0.632370
2001-05-14   -1.577974
2001-05-15   -0.081655
2001-05-16    0.264279
2001-05-17    0.607758
2001-05-18   -0.004284
2001-05-19    0.311442
2001-05-20   -1.423109
2001-05-21    0.856808
2001-05-22    0.490765
2001-05-23   -0.400072
2001-05-24   -1.790300
2001-05-25    0.896075
2001-05-26    0.496587
2001-05-27    1.864134
2001-05-28   -1.820402
2001-05-29    0.928935
2001-05-30    0.004858
2001-05-31    0.676150
Freq: D, dtype: float64
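
Because the random values above differ on every run, a deterministic version of the same selection makes the result easier to check. This sketch uses `.loc` for the partial-string lookup, which is assumed here to be the more explicit spelling of the same operation:

```python
import numpy as np
import pandas as pd

index = pd.date_range('2000-01-01', periods=1000)
s = pd.Series(np.arange(1000), index=index)

# A year string selects every day in that year.
year_2001 = s.loc['2001']

# A year-month string selects every day in that month.
may_2001 = s.loc['2001-05']

print(len(year_2001), len(may_2001))   # 365 31
```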

Slicing with `datetime` objects works as well

In [56]:
ts[datetime(2011, 1, 7):]

2011-01-07   -1.211203
2011-01-08   -0.608775
2011-01-10   -1.701659
2011-01-12   -0.720548
dtype: float64

Because most time series data is ordered chronologically, you can slice with timestamps not contained in a time series to perform a range query

In [57]:
ts

2011-01-02    0.617746
2011-01-05    0.371125
2011-01-07   -1.211203
2011-01-08   -0.608775
2011-01-10   -1.701659
2011-01-12   -0.720548
dtype: float64

In [58]:
ts['1/6/2011':'1/11/2011']

2011-01-07   -1.211203
2011-01-08   -0.608775
2011-01-10   -1.701659
dtype: float64
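
Unlike ordinary Python slices, label-based date slices include both endpoints when those dates are present. A short sketch on the same six dates, with integer values substituted so the result is predictable:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
s = pd.Series(range(6), index=dates)

# Neither endpoint is in the index: a pure range query.
window = s.loc['2011-01-06':'2011-01-11']
print(window.tolist())   # [2, 3, 4]

# Both endpoints are in the index, and both are included.
closed = s.loc['2011-01-07':'2011-01-10']
print(closed.tolist())   # [2, 3, 4]
```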

The `truncate` method slices a Series between two dates

In [59]:
ts.truncate(after='1/9/2011')

2011-01-02    0.617746
2011-01-05    0.371125
2011-01-07   -1.211203
2011-01-08   -0.608775
dtype: float64
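
`truncate` also accepts a `before` argument, so both ends can be cut in one call. A minimal sketch with the same dates and made-up integer values:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
s = pd.Series(range(6), index=dates)

# Drop everything before January 6 and everything after January 9.
trimmed = s.truncate(before='2011-01-06', after='2011-01-09')
print(trimmed.tolist())   # [2, 3]
```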

All of this holds true for DataFrame as well

In [60]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')

In [63]:
long_df = pd.DataFrame(np.random.randn(100, 4),
                       index=dates,
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [64]:
long_df.loc['5-2001']

            Colorado     Texas  New York      Ohio
2001-05-02 -0.588016  1.993915  1.526385  1.336923
2001-05-09 -1.350219  1.886024  0.018154  1.135339
2001-05-16 -0.611997  1.853333 -0.296126  0.660620
2001-05-23 -1.099791  0.266143  0.105998 -1.292795
2001-05-30  0.584104  0.929688 -1.240518  0.976999
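
With a DataFrame, the date string selects rows, and a column label can be supplied alongside it. The following sketch rebuilds the same weekly index with predictable integer values; the names `df`, `may`, and `texas` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-01', periods=100, freq='W-WED')
df = pd.DataFrame(np.arange(400).reshape(100, 4),
                  index=dates,
                  columns=['Colorado', 'Texas', 'New York', 'Ohio'])

# All rows whose date falls in May 2001 (five Wednesdays).
may = df.loc['2001-05']
print(may.shape)      # (5, 4)

# The same rows, restricted to a single column.
texas = df.loc['2001-05', 'Texas']
print(len(texas))     # 5
```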


### 2.2 Time Series with Duplicate Indices

In [65]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])

In [66]:
dup_ts = pd.Series(np.arange(5), index=dates)

In [67]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int64

In [69]:
dup_ts.index.is_unique

False

Indexing into this time series will now either produce scalar values or slices depending on whether a timestamp is duplicated

In [70]:
dup_ts['1/3/2000']

4

In [71]:
dup_ts['1/2/2000']

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int64

To aggregate data that has non-unique timestamps, group on the index by passing `level=0` to `groupby`.

In [74]:
grouped = dup_ts.groupby(level=0)

In [75]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int64

In [76]:
dup_ts.mean(level=0)

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int64
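
In recent pandas releases the `level` argument to `Series.mean` may no longer be accepted (this depends on the installed version), so `groupby(level=0)` is the safer spelling. The sketch below repeats the aggregation and, as an alternative that is not part of the original notes, keeps only the first observation per timestamp using `Index.duplicated`:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                          '2000-01-02', '2000-01-03'])
dup = pd.Series(np.arange(5), index=dates)

grouped = dup.groupby(level=0)

# One value per unique timestamp.
means = grouped.mean()
counts = grouped.count()
print(counts.tolist())   # [1, 3, 1]

# Alternative: drop repeated timestamps, keeping the first observation.
first = dup[~dup.index.duplicated(keep='first')]
print(first.tolist())    # [0, 1, 4]
```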

## 3 Date Range, Frequencies, and Shifting