#Chapter 10 - Time Series

+ `datetime()`
+ `timedelta()`
+ `dt.strftime('%Y-%m-%d')` - format datetime as string; `datetime.strptime(x, '%d/%m/%Y')` - parse string with known format to datetime
+ `dateutil.parser.parse(str)` - parse string with unknown format
+ `pd.to_datetime(list of strs)` - parse standard formats, handles None as NaT
+ `pd.Series(values, index=DatetimeIndex)` - create time series
+ `ts.truncate(before='2003-01-01')` - truncate series
+ `pd.date_range('start', periods=, freq=)` - generate index
    + pd.date_range('2001-01-01', periods=30, freq='WOM-3FRI')
+ `dup_ts.groupby(level=0)` - aggregate duplicate index data
+ `pd.date_range(start=, end=, periods=20, freq=, normalize=True)` - pass only three of start/end/periods/freq; `normalize=True` sets every timestamp to midnight
+ `ts.shift(number, freq=)` - shift time series
    + calculate percent changes in data: `ts/ts.shift(1)-1`
+ `offset=MonthEnd(); ts.groupby(offset.rollforward).mean()` - to get average by month
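    + same as `ts.resample('M', how='mean')` (`how=` was removed in later pandas versions, which chain the aggregation instead: `ts.resample(...).mean()`)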

In [78]:
import pandas as pd
from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse
import numpy as np
from pandas.tseries.offsets import Hour, Minute, Day, MonthEnd

##Convert to datetime

In [4]:
now = datetime.now()
now

datetime.datetime(2015, 6, 23, 13, 56, 37, 876000)

In [5]:
now.year, now.month, now.day

(2015, 6, 23)

In [7]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8,15)
delta

datetime.timedelta(926, 56700)

In [10]:
delta.days, delta.seconds

(926, 56700)

In [14]:
start =  datetime(2011, 1, 7)
start + timedelta(12), start - 2 * timedelta(12)

(datetime.datetime(2011, 1, 19, 0, 0), datetime.datetime(2010, 12, 14, 0, 0))
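A `timedelta` keeps days and seconds as separate fields, which is easy to misread. A minimal sketch using the same dates as above; the variable names `parts`, `total` and `rebuilt` are only for illustration.

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are stored separately; seconds is the remainder within a day
parts = (delta.days, delta.seconds)

# total_seconds() folds both fields into a single number
total = delta.total_seconds()

# rebuilding the timedelta from its parts gives back the same value
rebuilt = timedelta(days=parts[0], seconds=parts[1])
```

So `delta.seconds` alone is not the full duration; use `total_seconds()` when one number is needed.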

In [15]:
datestrs = ['7/6/2011', '8/6/2011']
[datetime.strptime(x, '%d/%m/%Y') for x in datestrs]

[datetime.datetime(2011, 6, 7, 0, 0), datetime.datetime(2011, 6, 8, 0, 0)]
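The other conversion helpers from the summary list can be sketched the same way. This assumes `dateutil` and `pandas` are installed; the sample dates are made up for illustration.

```python
from datetime import datetime
from dateutil.parser import parse
import pandas as pd

stamp = datetime(2011, 1, 3)

# strftime: datetime -> string
as_text = stamp.strftime('%Y-%m-%d')

# strptime: string -> datetime, the format has to match exactly
round_trip = datetime.strptime(as_text, '%Y-%m-%d')

# dateutil guesses the format; dayfirst=True reads 7/6/2011 as 7 June
guessed = parse('7/6/2011', dayfirst=True)

# pd.to_datetime turns missing values into NaT
idx = pd.to_datetime(['2011-07-06', None])
```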

##Basics:

In [22]:
dates = [datetime(2001,1,2), datetime(2001,1,5), datetime(2001,1,7), datetime(2001,1,8),datetime(2001,1,10), datetime(2001,1,12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

2001-01-02   -0.043065
2001-01-05    0.795079
2001-01-07   -0.260652
2001-01-08    0.112792
2001-01-10    1.122197
2001-01-12   -1.011343
dtype: float64

In [24]:
ts + ts[::2]

2001-01-02   -0.086130
2001-01-05         NaN
2001-01-07   -0.521305
2001-01-08         NaN
2001-01-10    2.244394
2001-01-12         NaN
dtype: float64

The index of a time series is a `DatetimeIndex`, which stores its timestamps as `datetime64` values.  
Each element of a `DatetimeIndex` is a `Timestamp` object.
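A quick way to check these types, as a sketch with a small made-up series. The exact `datetime64` unit in the dtype string may differ between pandas versions, so only the prefix is relied on here.

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2001, 1, 2), datetime(2001, 1, 5), datetime(2001, 1, 7)]
ts = pd.Series(np.arange(3.0), index=dates)

index_type = type(ts.index).__name__   # name of the index class
index_dtype = str(ts.index.dtype)      # a datetime64 dtype
first = ts.index[0]                    # a single element of the index

# Timestamp subclasses datetime, so it can be used wherever a datetime is expected
is_datetime = isinstance(first, datetime)
```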

###Indexing, slicing time series
+ index with a `datetime`, a date string, or a `Timestamp`; pandas recognizes each automatically

In [32]:
stamp = ts.index[2]
ts[stamp]

-0.26065248671688612

In [28]:
ts['1/10/2001'], ts['20010110']

(1.1221972398446791, 1.1221972398446791)

In [31]:
longer_ts = pd.Series(np.random.randn(1000),
                  index=pd.date_range('2001-01-01', periods=1000))

In [35]:
longer_ts['2001-12-25':'2001-12-31']

2001-12-25   -0.504424
2001-12-26   -0.541032
2001-12-27    1.302668
2001-12-28    0.872633
2001-12-29   -1.435331
2001-12-30    0.316514
2001-12-31   -0.948176
Freq: D, dtype: float64
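String indexing also accepts partial dates such as a year or a year and month. A small sketch, using `.loc` and `np.arange` values (instead of random data) so the results are reproducible:

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.arange(1000.0),
                      index=pd.date_range('2001-01-01', periods=1000))

year_2001 = longer_ts.loc['2001']                  # every day in 2001
may_2001 = longer_ts.loc['2001-05']                # every day in May 2001
week = longer_ts.loc['2001-12-25':'2001-12-31']    # both endpoints are included
```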

In [47]:
longer_ts.truncate(before='2003-05-01')

2003-05-01    0.754168
2003-05-02   -1.712847
2003-05-03    0.510372
2003-05-04    2.047642
2003-05-05   -1.533124
2003-05-06    0.299380
2003-05-07   -0.108829
2003-05-08   -0.620412
2003-05-09    0.820070
2003-05-10    1.321178
2003-05-11   -0.474975
2003-05-12    0.808327
2003-05-13    1.465529
2003-05-14    1.824638
2003-05-15   -0.970271
...
2003-09-13    0.699672
2003-09-14    0.370905
2003-09-15   -2.954452
2003-09-16    0.018892
2003-09-17   -0.569614
2003-09-18   -1.659805
2003-09-19   -1.607561
2003-09-20    0.326318
2003-09-21    1.644668
2003-09-22    0.190087
2003-09-23    2.035348
2003-09-24    1.521530
2003-09-25    0.948924
2003-09-26    1.038703
2003-09-27   -1.363364
Freq: D, Length: 150

In [45]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100,4), index=dates, columns=['Colorado', 'Texas', 'New York', 'Ohio'])
long_df.loc['5-2001']  # .ix was removed in later pandas versions; .loc selects the same rows

            Colorado     Texas  New York      Ohio
2001-05-02 -1.161427  0.598292 -0.370993  0.498838
2001-05-09 -0.106573  1.036303  1.377030  2.515939
2001-05-16 -0.474813 -0.026353  1.128873  0.938924
2001-05-23 -0.428640  0.446975  0.983607  0.335281
2001-05-30  0.406225  0.115490  1.175637  0.619559


###Duplicate indices
+ check with `ts.index.is_unique`
+ aggregate with `groupby(level=0)`

In [53]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/2/2000'])
dup_tx = pd.Series(range(5), index=dates)

In [54]:
dup_tx

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-02    4
dtype: int64

In [58]:
dup_tx.index.is_unique

False

In [61]:
grouped = dup_tx.groupby(level=0) # aggregate data for non-unique timestamps
grouped.count()

2000-01-01    1
2000-01-02    4
dtype: int64
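Any aggregation works on the grouped object, not only `count()`. A sketch with the same shape of data as above; the name `dup_ts` and the ISO date strings are chosen just for this example.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                          '2000-01-02', '2000-01-02'])
dup_ts = pd.Series(range(5), index=dates)

# level=0 groups on the index itself, so repeated timestamps collapse into one row
means = dup_ts.groupby(level=0).mean()
counts = dup_ts.groupby(level=0).count()
```

After aggregation the index is unique again, so label lookups return a single value.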

##Range, freq, shifting
+ `ts.resample('freq', how=)`

In [63]:
ts, ts.resample('D')

(2001-01-02   -0.043065
 2001-01-05    0.795079
 2001-01-07   -0.260652
 2001-01-08    0.112792
 2001-01-10    1.122197
 2001-01-12   -1.011343
 dtype: float64, 2001-01-02   -0.043065
 2001-01-03         NaN
 2001-01-04         NaN
 2001-01-05    0.795079
 2001-01-06         NaN
 2001-01-07   -0.260652
 2001-01-08    0.112792
 2001-01-09         NaN
 2001-01-10    1.122197
 2001-01-11         NaN
 2001-01-12   -1.011343
 Freq: D, dtype: float64)
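The output above comes from an older pandas, where `resample` returned the converted series directly. A hedged sketch of the equivalent in later versions (assuming pandas 0.18 or newer, where `resample` returns an intermediate object that needs an explicit method call):

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2001, 1, 2), datetime(2001, 1, 5), datetime(2001, 1, 7)]
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# asfreq() only reindexes to the daily grid; days without data become NaN
daily = ts.resample('D').asfreq()

# ffill() carries the last observation forward instead
daily_filled = ts.resample('D').ffill()
```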

###generate date ranges

In [64]:
# generate time stamps
pd.date_range(start='20120104', periods=20)

<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-04, ..., 2012-01-23]
Length: 20, Freq: D, Timezone: None

In [66]:
# last business day of each month
pd.date_range('20120104', '20121231', freq='BM')

<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-31, ..., 2012-12-31]
Length: 12, Freq: BM, Timezone: None
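`date_range` needs enough of `start`, `end`, `periods` and `freq` to pin the range down, and `normalize=True` drops the time of day. A small sketch; the timestamp with a time component is made up for illustration.

```python
import pandas as pd

# start + periods (freq defaults to daily)
by_periods = pd.date_range(start='2012-01-04', periods=20)

# start + end describes the same index
by_end = pd.date_range(start='2012-01-04', end='2012-01-23')

# normalize=True snaps every stamp to midnight
normalized = pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)
```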

###Frequencies and date offsets
+ see page 300 for a full list of frequencies

In [70]:
hour = Hour(4)
hour

<4 * Hours>

In [72]:
Hour(5) + Minute(30)

<330 * Minutes>

In [71]:
pd.date_range('2001-01-01', periods=30, freq='4H')

<class 'pandas.tseries.index.DatetimeIndex'>
[2001-01-01 00:00:00, ..., 2001-01-05 20:00:00]
Length: 30, Freq: 4H, Timezone: None

In [73]:
pd.date_range('2001-01-01', periods=30, freq='1h30min')

<class 'pandas.tseries.index.DatetimeIndex'>
[2001-01-01 00:00:00, ..., 2001-01-02 19:30:00]
Length: 30, Freq: 90T, Timezone: None
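Offset objects can be used directly, both in arithmetic and as the `freq` argument. A sketch that passes `Minute(90)` instead of a string alias, since alias spellings have changed between pandas versions:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

start = pd.Timestamp('2001-01-01')

# offsets add to timestamps much like timedeltas
later = start + Hour(5) + Minute(30)

# an offset object can be passed as freq instead of a string
idx = pd.date_range(start, periods=30, freq=Minute(90))
```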

In [74]:
# third Friday of each month
pd.date_range('2001-01-01', periods=30, freq='WOM-3FRI')

<class 'pandas.tseries.index.DatetimeIndex'>
[2001-01-19, ..., 2003-06-20]
Length: 30, Freq: WOM-3FRI, Timezone: None
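To confirm what `WOM-3FRI` produces, the generated dates can be checked directly. A minimal sketch; the helper names `all_fridays` and `in_third_week` are only for illustration.

```python
import pandas as pd

third_fridays = pd.date_range('2001-01-01', periods=30, freq='WOM-3FRI')

# Monday is 0, so Friday is 4
all_fridays = all(d.weekday() == 4 for d in third_fridays)

# the third occurrence of a weekday always falls on day 15 to 21
in_third_week = all(15 <= d.day <= 21 for d in third_fridays)
```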

###Shifting/lagging data

In [75]:
ts

2001-01-02   -0.043065
2001-01-05    0.795079
2001-01-07   -0.260652
2001-01-08    0.112792
2001-01-10    1.122197
2001-01-12   -1.011343
dtype: float64

In [76]:
ts.shift(1)

2001-01-02         NaN
2001-01-05   -0.043065
2001-01-07    0.795079
2001-01-08   -0.260652
2001-01-10    0.112792
2001-01-12    1.122197
dtype: float64

In [77]:
ts.shift(-2)

2001-01-02   -0.260652
2001-01-05    0.112792
2001-01-07    1.122197
2001-01-08   -1.011343
2001-01-10         NaN
2001-01-12         NaN
dtype: float64
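Shifting without `freq` moves the values, while shifting with `freq` moves the timestamps. A sketch with made-up prices that also shows the percent-change idiom from the summary list:

```python
import pandas as pd

ts = pd.Series([100.0, 110.0, 99.0],
               index=pd.date_range('2001-01-01', periods=3))

# plain shift moves the values; the index stays put and a NaN opens up
lagged = ts.shift(1)

# percent change relative to the previous observation
pct = ts / ts.shift(1) - 1

# with freq=, the timestamps move and the values are untouched
moved = ts.shift(1, freq='D')
```

Because `moved` keeps all values, no data is lost at the edges, unlike the plain shift.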

In [81]:
now = datetime.now()
now + 3*Day()

Timestamp('2015-06-26 15:39:59.934000')

In [82]:
now + MonthEnd() # roll forward

Timestamp('2015-06-30 15:39:59.934000')

In [83]:
now + MonthEnd(2)

Timestamp('2015-07-31 15:39:59.934000')

In [87]:
offset = MonthEnd()
offset.rollforward(now), offset.rollback(now)

(Timestamp('2015-06-30 15:39:59.934000'),
 Timestamp('2015-05-31 15:39:59.934000'))
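`rollforward` and adding the offset differ when the date already sits on a month end. A sketch with fixed dates (rather than `datetime.now()`) so the result is reproducible:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = pd.Timestamp('2015-06-23')
month_end = pd.Timestamp('2015-06-30')

forward = offset.rollforward(mid_month)   # next month end
back = offset.rollback(mid_month)         # previous month end

# a date already on a month end is left alone by rollforward ...
unchanged = offset.rollforward(month_end)
# ... but adding the offset always advances to the following month end
advanced = month_end + offset
```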

In [90]:
# example of getting monthly average
ts = pd.Series(np.random.randn(20), index=pd.date_range('20000115', periods=20, freq='4d'))
ts.groupby(offset.rollforward).mean()

2000-01-31    0.103575
2000-02-29    0.237210
2000-03-31    0.314219
dtype: float64

In [91]:
ts.resample('M', how='mean')

2000-01-31    0.103575
2000-02-29    0.237210
2000-03-31    0.314219
Freq: M, dtype: float64
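The `how=` keyword used above no longer exists in later pandas versions. A hedged sketch of the same comparison, assuming a pandas version where `resample` returns an object with a `.mean()` method; it passes the `MonthEnd()` offset itself rather than a string alias, and uses `np.arange` values so the means are predictable.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range('2000-01-15', periods=20, freq='4D'))

offset = MonthEnd()

# group each timestamp under the month end it rolls forward to
by_group = ts.groupby(offset.rollforward).mean()

# resample on the same offset, then aggregate explicitly
by_resample = ts.resample(offset).mean()
```

Both approaches give one row per month end; `resample` would also emit a row for any month without data, which `groupby` would skip.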

##Time zone 
