# Chapter 11. Time Series

Time series data is an important form of structured data in many fields, such as finance, economics, ecology, and neuroscience.

Many time series are *fixed frequency*, meaning that the data points occur at regular intervals.

They can also be *irregular*, which means that the intervals don't follow a fixed pattern. There are different classes of time series:

* Timestamps: specific instants in time
* Fixed periods, such as the month January 2007 or the full year of 2010
* Intervals of time, indicated by a start and end timestamp
* Experiment or elapsed time, where each timestamp is a measure of time relative to a particular start time
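
As a rough illustration, each class in the list above can be matched to a pandas object. The pairing below is an assumption made for illustration (in particular, using `pd.Interval` for start/end intervals), and the variable names are hypothetical:

```python
import pandas as pd

# Timestamp: a specific instant in time
instant = pd.Timestamp('2007-01-15 09:30')

# Fixed period: the month January 2007
month = pd.Period('2007-01', freq='M')

# Interval: a span defined by a start and an end timestamp
# (closed on the right by default)
span = pd.Interval(pd.Timestamp('2007-01-01'), pd.Timestamp('2007-02-01'))

# Elapsed time: a duration measured relative to a start time
elapsed = instant - pd.Timestamp('2007-01-15 09:00')

print(month.start_time, month.end_time)
print(instant in span)
print(elapsed)
```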

## Date and Time Data Types and Tools

Python's standard library includes data types for date and time data, as well as calendar-related functionality. The datetime module is the one used most often:

In [None]:
from datetime import datetime
now = datetime.now()
now

In [None]:
now.year

In [None]:
now.month

In [None]:
now.day

datetime objects store both the date and the time down to microsecond precision.

You can even do arithmetic on datetime objects. Adding a timedelta shifts a datetime by that duration; timedelta(2) below means two days:

In [None]:
from datetime import timedelta
start = datetime(2011, 1, 7)
start + timedelta(2)
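
Subtraction works too. As a small sketch with arbitrarily chosen dates, the difference between two datetime objects is a timedelta, which stores whole days and leftover seconds separately:

```python
from datetime import datetime, timedelta

# Subtracting two datetimes yields a timedelta
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
print(delta.days, delta.seconds)

# timedelta objects can be scaled, then added to or subtracted from a datetime
start = datetime(2011, 1, 7)
later = start + timedelta(days=12)
earlier = start - 2 * timedelta(days=12)
print(later, earlier)
```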

### Converting between string and datetime

In [None]:
stamp = datetime(2011, 1, 3)
str(stamp)

In [None]:
stamp.strftime('%Y-%m-%d')

In [None]:
value = '2011-01-03'
datetime.strptime(value, '%Y-%m-%d')

If you don't want to write a format string each time, the parser in dateutil can infer the format:

In [None]:
from dateutil.parser import parse
parse('2011-01-03')

In [None]:
parse('Jan 31, 1997 10:45 PM')

In [None]:
parse('6/12/2011 10:45 PM', dayfirst=True)
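
To make the effect of dayfirst concrete, here is a short comparison of the same string parsed both ways. By default dateutil reads an ambiguous date as month first; with dayfirst=True the first number is taken as the day:

```python
from datetime import datetime
from dateutil.parser import parse

text = '6/12/2011 10:45 PM'

month_first = parse(text)               # read as June 12
day_first = parse(text, dayfirst=True)  # read as 6 December

print(month_first)
print(day_first)
```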

pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame. The *to_datetime* function parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed very quickly.

In [None]:
import pandas as pd
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)
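
to_datetime also copes with missing values. In this minimal sketch, a None entry becomes NaT (Not a Time), the pandas null value for timestamp data:

```python
import pandas as pd

datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

# A None entry is converted to NaT rather than raising an error
idx = pd.to_datetime(datestrs + [None])
missing = pd.isna(idx)

print(idx)
print(missing)
```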

## Time Series Basics

A basic kind of time series object in pandas is a Series indexed by timestamps, which are often represented outside of pandas as Python strings or datetime objects:

In [None]:
from datetime import datetime
import numpy as np
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

Under the hood, these datetime objects have been put in a DatetimeIndex:

In [None]:
ts.index

Like other Series, arithmetic operations between differently indexed time series automatically align on these dates:

In [None]:
ts + ts[::2]

Recall that [::2] selects every second element in ts.
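
Because the values in ts are random, the alignment is easier to see with fixed numbers. The sketch below uses a small hypothetical series, ts_demo; dates that are missing from one operand come out as NaN in the sum:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8)]
ts_demo = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# ts_demo[::2] keeps only the 1st and 3rd dates; the sum aligns on the
# index, so the 2nd and 4th dates have no partner and become NaN
total = ts_demo + ts_demo[::2]
print(total)
```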

pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution:

In [None]:
ts.index.dtype

Scalar values from a DatetimeIndex are pandas Timestamp objects:

In [None]:
stamp = ts.index[0]
stamp

### Indexing, Selection, Subsetting

A time series behaves like any other pandas.Series when you are indexing and selecting data based on label:

In [None]:
stamp = ts.index[2]
stamp

In [None]:
ts[stamp]

As a convenience, you can also pass a string that is interpretable as a date:

In [None]:
ts['1/10/2011']

In [None]:
ts['20110110']

For longer time series, a year or only a year and a month can be passed to easily select slices of data:

In [None]:
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts

In [None]:
longer_ts['2001']

In [None]:
longer_ts['2001-05']

Slicing with datetime objects works as well:

In [None]:
ts[datetime(2011, 1, 7):]

Because most time series data is ordered chronologically, you can slice with timestamps not contained in a time series to perform a range query:

In [None]:
ts['1/6/2011':'1/11/2011']
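
The following sketch repeats the range query on a hypothetical series with fixed values, ts_demo, so the selected rows are easy to check. It also shows the truncate method, which performs a similar cut between two dates:

```python
import pandas as pd

index = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts_demo = pd.Series(range(6), index=index)

# Neither endpoint is in the index; rows between them are returned
window = ts_demo['2011-01-06':'2011-01-11']

# truncate drops everything after (or before) a given date
head = ts_demo.truncate(after='2011-01-09')

print(window)
print(head)
```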

### Time Series with Duplicate Indices

In [None]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dates.is_unique
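
When the index is not unique, label-based selection returns either a scalar or a slice, depending on whether the timestamp is duplicated. As a sketch (dup_ts is a hypothetical series built on the index above), grouping on the index level is one way to aggregate the duplicates:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(4), index=dates)

unique_day = dup_ts['2000-01-03']    # not duplicated: a scalar
repeated_day = dup_ts['2000-01-02']  # duplicated: a Series with two rows

# level=0 groups on the (only) index level, one group per timestamp
daily_mean = dup_ts.groupby(level=0).mean()

print(unique_day)
print(repeated_day)
print(daily_mean)
```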

## Date Ranges, Frequencies and Shifting

In [None]:
ts

In [None]:
resampler = ts.resample('D')
resampler

The string 'D' is interpreted as daily frequency.
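
The resampler itself does no work until an aggregation is applied. In this minimal sketch, a hypothetical irregular series ts_demo is converted to fixed daily frequency with mean; days that have no observation become NaN:

```python
import pandas as pd

index = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07'])
ts_demo = pd.Series([1.0, 2.0, 3.0], index=index)

# One row per calendar day between the first and last timestamp
daily = ts_demo.resample('D').mean()
print(daily)
```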

### Generating Date Ranges

Although I used it earlier without explanation, pandas.date_range is responsible for generating a *DatetimeIndex* with an indicated length according to a particular frequency:

In [None]:
index = pd.date_range('2011-04-01', '2012-06-01')
index

By default, date_range generates daily timestamps. If you pass only a start or end date, you must pass a number of periods to generate:

In [None]:
pd.date_range(start='2012-04-01', periods=20)

In [None]:
pd.date_range(end='2012-06-01', periods=20)

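The start and end dates define strict boundaries for the generated date index: only dates that match the frequency and fall on or inside the interval are included. For example, if you wanted a date index containing the last business day of each month, you would pass the 'BM' frequency (business end of month). Many other frequencies are available. Note that pandas 2.2 and later spell this alias 'BME' and emit a deprecation warning for 'BM'.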

In [None]:
pd.date_range('2000-01-01', '2000-12-01', freq='BM')
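
To see the strict boundaries at work, the sketch below builds the same range with the offset object pd.offsets.BMonthEnd instead of a string alias. Using the object is a choice made here to sidestep alias spelling differences between pandas versions. December's last business day falls after the end date, so only eleven dates are produced:

```python
import pandas as pd

# Business-month-end offset object, equivalent to the string alias above
rng = pd.date_range('2000-01-01', '2000-12-01', freq=pd.offsets.BMonthEnd())

print(len(rng))
print(rng[0], rng[-1])
```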

### Frequencies and Date Offsets

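Frequencies in pandas are composed of a *base frequency* and a multiplier. Base frequencies are typically referred to by a string alias, such as 'h' for hourly, and each one has a corresponding date offset object: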

In [None]:
from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour

In [None]:
four_hours = Hour(4)
four_hours

In [None]:
pd.date_range('2000-01-01', '2000-01-03', freq='4h')
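
Offsets can also be combined by addition, which is where the Minute import above comes in. A brief sketch, with variable names chosen for illustration:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Two hours plus thirty minutes collapses into a single 150-minute offset
combined = Hour(2) + Minute(30)

# The combined offset can be passed directly as a frequency
rng = pd.date_range('2000-01-01', periods=4, freq=combined)

print(combined)
print(rng)
```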


## Time Zone Handling

Working with time zones is generally considered one of the most unpleasant parts of time series manipulation. As a result, many users choose to work with time series in Coordinated Universal Time (UTC), the current international standard. Time zone names can be listed with the pytz library:

In [None]:
import pytz
pytz.common_timezones[-5:]

To get a time zone object from pytz, use pytz.timezone:

In [None]:
tz = pytz.timezone('America/New_York')
tz

### Time Zone Localization and Conversion

In [None]:
rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [None]:
print(ts.index.tz)

In [None]:
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')
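
A series created without a time zone is *naive*. As a minimal sketch of the two steps this section is named after, tz_localize attaches a time zone to naive timestamps and tz_convert translates localized timestamps into another zone. The series ts_demo is a hypothetical stand-in with fixed values:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
ts_demo = pd.Series(np.arange(len(rng)), index=rng)

# Naive -> localized: the clock times are now interpreted as UTC
ts_utc = ts_demo.tz_localize('UTC')

# Localized -> another zone: same instants, different wall-clock times
ts_ny = ts_utc.tz_convert('America/New_York')

print(ts_utc.index[0], ts_ny.index[0])
print(ts_utc.index[-1], ts_ny.index[-1])
```

The range spans the start of daylight saving time in New York (March 11, 2012), so the converted timestamps move from a five-hour to a four-hour difference from UTC.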