# Time Series

__Time series__ data is an important form of structured data in many different fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series. Many time series are _fixed frequency_, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be _irregular_, without a fixed unit of time or offset between units. How you mark and refer to time series data depends on the application, and you may have one of the following:
- __Timestamps__, specific instants in time
- Fixed __periods__, such as the month January 2007 or the full year 2010
- __Intervals__ of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
- Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)

In this chapter, we are mainly concerned with time series in the first three categories, though many of the techniques can be applied to experimental time series where the index may be an integer or floating-point number indicating elapsed time from the start of the experiment. The simplest and most widely used kind of time series are those indexed by timestamp.
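As a rough illustration of the four ways of marking time listed above, the sketch below builds one object of each kind with pandas. The specific dates and the cookie diameters are made up for the example; `pd.Timestamp`, `pd.Period`, and `pd.Interval` are the pandas types assumed to correspond to the first three categories.

```python
import pandas as pd

# Timestamp: one specific instant.
stamp = pd.Timestamp("2007-01-15 09:30")

# Period: the whole month of January 2007.
period = pd.Period("2007-01", freq="M")

# Interval: an explicit start and end timestamp (closed on the left here).
interval = pd.Interval(pd.Timestamp("2007-01-01"),
                       pd.Timestamp("2007-02-01"),
                       closed="left")

# Elapsed time: seconds since the start of an experiment as a plain float index.
# The diameter values are invented for illustration.
diameter = pd.Series([5.0, 5.4, 6.1], index=[0.0, 1.0, 2.0])

# The instant falls inside both the period and the equivalent interval.
print(period.start_time <= stamp <= period.end_time)
print(stamp in interval)
```

Both checks should print `True`, which is one way to see why a period can be treated as a special case of an interval.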

__pandas__ provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are especially useful for financial and economics applications, but you could certainly use them to analyze server log data, too.

## Date and Time Data Types and Tools

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
now = datetime.now()

In [3]:
now

datetime.datetime(2019, 5, 30, 23, 48, 3, 291705)

In [4]:
now.year, now.month, now.day

(2019, 5, 30)

In [5]:
delta = datetime(2011,1,7)-datetime(2008,6,24,8,15)

In [6]:
delta

datetime.timedelta(days=926, seconds=56700)

In [7]:
delta.days

926

In [8]:
delta.seconds

56700

In [9]:
from datetime import timedelta

In [10]:
start = datetime(2011,1,7)

In [11]:
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [12]:
start - 2*timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)
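One point that is easy to miss in the cells above: `delta.seconds` is only the part of the duration left over after whole days are removed, not the full duration. A minimal sketch using only the standard library, with the same two dates as above:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are separate fields; .seconds is the sub-day remainder.
rebuilt = timedelta(days=delta.days, seconds=delta.seconds)

# total_seconds() expresses the whole duration as a single number.
total = delta.total_seconds()

print(rebuilt == delta)
print(total == delta.days * 86400 + delta.seconds)
```

When you need the length of a span in one unit, `total_seconds()` is usually the safer choice.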

### Converting Between String and Datetime

In [13]:
stamp = datetime(2011,1,3)

In [14]:
str(stamp)

'2011-01-03 00:00:00'

In [15]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

In [16]:
value = '2011-01-03'

In [17]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
datestrs = ['7/6/2011', '8/6/2011']

In [19]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

In [20]:
from dateutil.parser import parse

In [21]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [22]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [23]:
parse('6/12/2011', dayfirst = True)

datetime.datetime(2011, 12, 6, 0, 0)

In [24]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']

In [25]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [26]:
idx = pd.to_datetime(datestrs + [None])

In [27]:
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [28]:
idx[2]

NaT

In [29]:
pd.isnull(idx)

array([False, False,  True])
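The `NaT` handling shown above also helps when some inputs cannot be parsed at all. The following sketch assumes you would rather get `NaT` than an exception for bad strings, which is what `errors="coerce"` requests; the input list is invented for the example.

```python
import pandas as pd

# One valid date string, one unparseable string, and one missing value.
raw = ["2011-07-06 12:00:00", "not a date", None]

# errors="coerce" turns anything that cannot be parsed into NaT.
idx = pd.to_datetime(raw, errors="coerce")

# Keep only the entries that were parsed successfully.
good = idx[~pd.isnull(idx)]

print(idx)
print(good)
```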

## Time Series Basics

In [30]:
from datetime import datetime

In [31]:
dates = [datetime(2011,1,2), datetime(2011,1,5),
        datetime(2011,1,7), datetime(2011,1,8),
        datetime(2011,1,10),datetime(2011,1,12)]

In [32]:
ts = pd.Series(np.random.randn(6),index=dates)

In [33]:
ts

2011-01-02    0.083201
2011-01-05    2.015480
2011-01-07    1.207525
2011-01-08    1.380777
2011-01-10    0.051261
2011-01-12    0.420388
dtype: float64

In [34]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [35]:
ts + ts[::2]

2011-01-02    0.166401
2011-01-05         NaN
2011-01-07    2.415050
2011-01-08         NaN
2011-01-10    0.102523
2011-01-12         NaN
dtype: float64

In [36]:
ts.index.dtype

dtype('<M8[ns]')

In [37]:
stamp = ts.index[0]

In [38]:
stamp

Timestamp('2011-01-02 00:00:00')

### Indexing, Selection, Subsetting

In [39]:
stamp = ts.index[2]

In [40]:
ts[stamp]

1.2075251390607997

In [41]:
ts['1/10/2011']

0.05126126101841497

In [42]:
ts['20110110']

0.05126126101841497

In [43]:
longer_ts = pd.Series(np.random.randn(1000),
                     index=pd.date_range('1/1/2000',periods=1000))

In [44]:
longer_ts

2000-01-01    0.380901
2000-01-02    0.815354
2000-01-03    0.518786
2000-01-04    0.057145
2000-01-05   -1.900680
2000-01-06    0.190879
2000-01-07    0.204887
2000-01-08   -1.184638
2000-01-09    0.641840
2000-01-10    0.198893
2000-01-11   -1.626600
2000-01-12   -0.261942
2000-01-13    0.443657
2000-01-14   -0.276638
2000-01-15   -0.715496
2000-01-16   -0.663466
2000-01-17    0.378531
2000-01-18   -0.764111
2000-01-19   -1.573363
2000-01-20   -1.044214
2000-01-21   -1.089825
2000-01-22    0.865118
2000-01-23   -0.291791
2000-01-24   -0.164569
2000-01-25    0.032648
2000-01-26    0.879288
2000-01-27   -0.770257
2000-01-28   -0.261910
2000-01-29   -0.721101
2000-01-30    0.145310
                ...   
2002-08-28    0.171116
2002-08-29    0.466277
2002-08-30   -0.151500
2002-08-31   -1.230951
2002-09-01    0.144857
2002-09-02   -0.748481
2002-09-03    2.085608
2002-09-04   -1.659756
2002-09-05    0.843657
2002-09-06   -3.253305
2002-09-07   -0.571433
2002-09-08   -1.256772
2002-09-09 

In [45]:
longer_ts['2001']

2001-01-01    0.506503
2001-01-02   -0.744509
2001-01-03    1.984584
2001-01-04   -0.526758
2001-01-05    1.709327
2001-01-06    0.377297
2001-01-07    1.144638
2001-01-08   -0.154548
2001-01-09    0.686966
2001-01-10    0.414021
2001-01-11    0.265489
2001-01-12   -1.263688
2001-01-13    1.190106
2001-01-14    0.452605
2001-01-15   -0.358630
2001-01-16   -1.490408
2001-01-17   -0.181962
2001-01-18   -0.530429
2001-01-19   -1.518637
2001-01-20   -0.930847
2001-01-21    1.069738
2001-01-22   -0.357395
2001-01-23    2.153402
2001-01-24    0.291204
2001-01-25    0.935571
2001-01-26   -0.536475
2001-01-27   -0.579702
2001-01-28    0.192151
2001-01-29   -1.166822
2001-01-30   -0.212321
                ...   
2001-12-02    0.293326
2001-12-03   -0.423171
2001-12-04   -0.268857
2001-12-05   -0.538233
2001-12-06    0.703652
2001-12-07    1.383348
2001-12-08    0.451323
2001-12-09    1.195023
2001-12-10   -0.438846
2001-12-11   -0.926899
2001-12-12   -0.571068
2001-12-13   -1.359516
2001-12-14 

In [46]:
longer_ts['2001-05']

2001-05-01   -0.282573
2001-05-02   -1.575455
2001-05-03   -1.324329
2001-05-04   -0.419613
2001-05-05    0.079803
2001-05-06    1.835884
2001-05-07    0.248188
2001-05-08   -0.020197
2001-05-09    0.911450
2001-05-10    2.059626
2001-05-11    0.702898
2001-05-12   -1.120745
2001-05-13    0.199103
2001-05-14    0.540757
2001-05-15   -0.105552
2001-05-16   -0.013306
2001-05-17   -2.755031
2001-05-18   -0.061007
2001-05-19   -2.044479
2001-05-20   -1.158419
2001-05-21    2.327863
2001-05-22   -0.139606
2001-05-23    0.004856
2001-05-24    1.672599
2001-05-25    0.343540
2001-05-26   -1.014236
2001-05-27    0.274679
2001-05-28   -0.445080
2001-05-29    0.819416
2001-05-30    1.218907
2001-05-31   -0.259445
Freq: D, dtype: float64

In [47]:
ts[datetime(2011,1,7):]

2011-01-07    1.207525
2011-01-08    1.380777
2011-01-10    0.051261
2011-01-12    0.420388
dtype: float64

In [48]:
ts

2011-01-02    0.083201
2011-01-05    2.015480
2011-01-07    1.207525
2011-01-08    1.380777
2011-01-10    0.051261
2011-01-12    0.420388
dtype: float64

In [49]:
ts['1/6/2011':'1/11/2011']

2011-01-07    1.207525
2011-01-08    1.380777
2011-01-10    0.051261
dtype: float64

In [50]:
ts.truncate(after='1/9/2011')

2011-01-02    0.083201
2011-01-05    2.015480
2011-01-07    1.207525
2011-01-08    1.380777
dtype: float64

In [51]:
dates = pd.date_range('1/1/2000',periods=100,freq='W-WED')

In [52]:
long_df = pd.DataFrame(np.random.randn(100,4),
                      index = dates,
                      columns = ['Colorado','Texas',
                                'New York', 'Ohio'])

In [53]:
long_df.loc['5-2001']

             Colorado      Texas   New York       Ohio
2001-05-02  -1.110028  -0.329523  -0.076378  -0.080568
2001-05-09  -0.434483  -0.142737   0.065441   1.288886
2001-05-16  -0.003578  -0.356801   0.233889   0.556696
2001-05-23   1.627757   0.771354   -0.56586  -0.220472
2001-05-30  -0.816415  -0.392643   2.292168  -1.105914


### Time Series with Duplicate Indices

In [54]:
dates = pd.DatetimeIndex(['1/1/2000','1/2/2000','1/2/2000',
                          '1/2/2000','1/3/2000'])

In [55]:
dup_ts = pd.Series(np.arange(5),index=dates)

In [56]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [57]:
dup_ts.index.is_unique

False

In [58]:
dup_ts['1/3/2000'] # not duplicated

4

In [59]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

In [60]:
grouped = dup_ts.groupby(level=0)

In [61]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [62]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
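Aggregating with `groupby(level=0)` is one way to deal with duplicate timestamps. If you would rather keep a single observation per timestamp, a boolean mask built from `Index.duplicated` is another option. This is a sketch on the same data as above; choosing `keep="first"` is an assumption about which observation you want to retain.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

# duplicated() flags every repeat after the first occurrence of a timestamp.
is_repeat = dup_ts.index.duplicated(keep="first")

# Invert the mask to keep only the first row for each timestamp.
first_only = dup_ts[~is_repeat]

print(first_only)
```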

## Date Ranges, Frequencies, and Shifting

In [63]:
ts

2011-01-02    0.083201
2011-01-05    2.015480
2011-01-07    1.207525
2011-01-08    1.380777
2011-01-10    0.051261
2011-01-12    0.420388
dtype: float64

In [64]:
resampler = ts.resample('D') # 'D' stands for daily frequency
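Calling `resample` by itself does not compute anything; it returns a resampler object that waits for a method call. As a small sketch (with made-up values on three of the dates used above), `asfreq()` on the resampler yields a fixed daily series with `NaN` on the days that had no observation:

```python
import pandas as pd

dates = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07"])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# The resampler is lazy; asfreq() materializes the daily series.
daily = ts.resample("D").asfreq()

print(daily)
print(daily.index.freqstr)
```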

### Generating Date Ranges

In [65]:
index = pd.date_range('2012-04-01', '2012-06-01')

In [66]:
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
      

In [67]:
pd.date_range(start='2012-04-01', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [68]:
pd.date_range(start='2012-06-01', periods=20)

DatetimeIndex(['2012-06-01', '2012-06-02', '2012-06-03', '2012-06-04',
               '2012-06-05', '2012-06-06', '2012-06-07', '2012-06-08',
               '2012-06-09', '2012-06-10', '2012-06-11', '2012-06-12',
               '2012-06-13', '2012-06-14', '2012-06-15', '2012-06-16',
               '2012-06-17', '2012-06-18', '2012-06-19', '2012-06-20'],
              dtype='datetime64[ns]', freq='D')

In [69]:
pd.date_range('2000-01-01','2000-12-01',freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')

In [70]:
pd.date_range('2012-05-02 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [71]:
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')

### Frequencies and Date Offsets

In [72]:
from pandas.tseries.offsets import Hour, Minute

In [73]:
hour = Hour()

In [74]:
hour

<Hour>

In [75]:
four_hours = Hour(4)

In [76]:
four_hours

<4 * Hours>

In [77]:
pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [78]:
Hour(2) + Minute(30)

<150 * Minutes>

In [79]:
pd.date_range('2000-01-01',periods=10,freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')

#### Week of month dates

In [80]:
rng = pd.date_range('2012-01-01','2012-09-01',freq='WOM-3FRI')

In [81]:
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-06-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-07-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-08-17 00:00:00', freq='WOM-3FRI')]

### Shifting (Leading and Lagging) Data

In [82]:
ts = pd.Series(np.random.randn(4),
              index=pd.date_range('1/1/2000',periods=4,freq='M'))

In [83]:
ts

2000-01-31    0.336999
2000-02-29   -0.705623
2000-03-31   -1.733513
2000-04-30   -0.831563
Freq: M, dtype: float64

In [84]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31    0.336999
2000-04-30   -0.705623
Freq: M, dtype: float64

In [85]:
ts.shift(-2)

2000-01-31   -1.733513
2000-02-29   -0.831563
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

In [86]:
ts.shift(2,freq='M')

2000-03-31    0.336999
2000-04-30   -0.705623
2000-05-31   -1.733513
2000-06-30   -0.831563
Freq: M, dtype: float64

In [87]:
ts.shift(3,freq='D')

2000-02-03    0.336999
2000-03-03   -0.705623
2000-04-03   -1.733513
2000-05-03   -0.831563
dtype: float64

In [88]:
ts.shift(1,freq='90T')

2000-01-31 01:30:00    0.336999
2000-02-29 01:30:00   -0.705623
2000-03-31 01:30:00   -1.733513
2000-04-30 01:30:00   -0.831563
Freq: M, dtype: float64
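A common use of `shift` is computing the change from one observation to the next. The sketch below uses invented daily values; it contrasts a plain shift, which moves the data and introduces `NaN`, with a shift that passes `freq`, which moves the timestamps and keeps every value.

```python
import pandas as pd

ts = pd.Series([100.0, 110.0, 99.0],
               index=pd.date_range("2000-01-01", periods=3, freq="D"))

# Plain shift: data moves down one row, so the first change is undefined (NaN).
pct = ts / ts.shift(1) - 1

# Shift with freq: the index advances by one day and no data is discarded.
moved = ts.shift(1, freq="D")

print(pct)
print(moved)
```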

#### Shifting dates with offsets

In [89]:
from pandas.tseries.offsets import Day, MonthEnd

In [90]:
now = datetime(2011,11,17)

In [91]:
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')

In [92]:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [93]:
now + MonthEnd(2)

Timestamp('2011-12-31 00:00:00')

In [94]:
offset = MonthEnd()

In [95]:
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')

In [96]:
offset.rollback(now)

Timestamp('2011-10-31 00:00:00')
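Adding an anchored offset and calling `rollforward` give the same answer for a mid-month date, but they differ when the date already sits on the anchor. A short sketch with `MonthEnd` (the two dates are chosen only to show the contrast):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = pd.Timestamp("2011-11-17")
month_end = pd.Timestamp("2011-11-30")

# From a mid-month date, both approaches land on the end of the same month.
added_mid = mid_month + offset
rolled_mid = offset.rollforward(mid_month)

# From a date already at month end, addition moves on to the next month end,
# while rollforward leaves the date where it is.
added_end = month_end + offset
rolled_end = offset.rollforward(month_end)

print(added_mid, rolled_mid)
print(added_end, rolled_end)
```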

In [97]:
ts = pd.Series(np.random.randn(20),
              index=pd.date_range('1/15/2000',periods=20,freq='4d'))

In [98]:
ts

2000-01-15    0.032958
2000-01-19   -0.220756
2000-01-23   -1.410712
2000-01-27    1.097661
2000-01-31   -1.217798
2000-02-04    0.157188
2000-02-08    0.339819
2000-02-12   -1.400527
2000-02-16   -0.028781
2000-02-20    0.777474
2000-02-24   -1.704652
2000-02-28    0.025442
2000-03-03    0.785687
2000-03-07    0.182650
2000-03-11    0.022245
2000-03-15   -0.844895
2000-03-19   -0.793873
2000-03-23    1.417004
2000-03-27   -1.507022
2000-03-31    0.421528
Freq: 4D, dtype: float64

In [99]:
ts.groupby(offset.rollforward).mean()

2000-01-31   -0.343730
2000-02-29   -0.262005
2000-03-31   -0.039584
dtype: float64

In [100]:
ts.resample('M').mean()

2000-01-31   -0.343730
2000-02-29   -0.262005
2000-03-31   -0.039584
Freq: M, dtype: float64

## Time Zone Handling

In [101]:
import pytz

In [102]:
pytz.common_timezones[-5:]

['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [103]:
tz = pytz.timezone('America/New_York')

In [104]:
tz

<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

### Time Zone Localization and Conversion

By default, time series in pandas are __time zone naive__. For example, consider the following time series:

In [105]:
rng = pd.date_range('3/9/2012 9:30',periods=6,freq='D')

In [106]:
ts = pd.Series(np.random.randn(len(rng)),index=rng)

In [107]:
ts

2012-03-09 09:30:00    0.285138
2012-03-10 09:30:00   -0.457801
2012-03-11 09:30:00   -1.256647
2012-03-12 09:30:00   -0.576732
2012-03-13 09:30:00    0.676238
2012-03-14 09:30:00    0.995456
Freq: D, dtype: float64

In [108]:
print(ts.index.tz)

None


In [109]:
pd.date_range('3/9/2012 9:30',periods=10,freq='D',tz='UTC')

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
               '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

Conversion from naive to __localized__ is handled by the __tz_localize__ method:

In [110]:
ts

2012-03-09 09:30:00    0.285138
2012-03-10 09:30:00   -0.457801
2012-03-11 09:30:00   -1.256647
2012-03-12 09:30:00   -0.576732
2012-03-13 09:30:00    0.676238
2012-03-14 09:30:00    0.995456
Freq: D, dtype: float64

In [111]:
ts_utc = ts.tz_localize('UTC')

In [112]:
ts_utc

2012-03-09 09:30:00+00:00    0.285138
2012-03-10 09:30:00+00:00   -0.457801
2012-03-11 09:30:00+00:00   -1.256647
2012-03-12 09:30:00+00:00   -0.576732
2012-03-13 09:30:00+00:00    0.676238
2012-03-14 09:30:00+00:00    0.995456
Freq: D, dtype: float64

In [113]:
ts_utc.index

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [114]:
ts_utc.tz_convert('America/New_York')

2012-03-09 04:30:00-05:00    0.285138
2012-03-10 04:30:00-05:00   -0.457801
2012-03-11 05:30:00-04:00   -1.256647
2012-03-12 05:30:00-04:00   -0.576732
2012-03-13 05:30:00-04:00    0.676238
2012-03-14 05:30:00-04:00    0.995456
Freq: D, dtype: float64

In [115]:
ts_eastern = ts.tz_localize('America/New_York')

In [116]:
ts_eastern.tz_convert('UTC')

2012-03-09 14:30:00+00:00    0.285138
2012-03-10 14:30:00+00:00   -0.457801
2012-03-11 13:30:00+00:00   -1.256647
2012-03-12 13:30:00+00:00   -0.576732
2012-03-13 13:30:00+00:00    0.676238
2012-03-14 13:30:00+00:00    0.995456
Freq: D, dtype: float64

In [117]:
ts_eastern.tz_convert('Europe/Berlin')

2012-03-09 15:30:00+01:00    0.285138
2012-03-10 15:30:00+01:00   -0.457801
2012-03-11 14:30:00+01:00   -1.256647
2012-03-12 14:30:00+01:00   -0.576732
2012-03-13 14:30:00+01:00    0.676238
2012-03-14 14:30:00+01:00    0.995456
Freq: D, dtype: float64

In [118]:
ts.index.tz_localize('Asia/Shanghai')

DatetimeIndex(['2012-03-09 09:30:00+08:00', '2012-03-10 09:30:00+08:00',
               '2012-03-11 09:30:00+08:00', '2012-03-12 09:30:00+08:00',
               '2012-03-13 09:30:00+08:00', '2012-03-14 09:30:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq='D')

### Operations with Time Zone-Aware Timestamp Objects

In [119]:
stamp = pd.Timestamp('2011-03-12 04:00')

In [120]:
stamp_utc = stamp.tz_localize('utc')

In [121]:
stamp_utc.tz_convert('America/New_York')

Timestamp('2011-03-11 23:00:00-0500', tz='America/New_York')

In [122]:
stamp_moscow = pd.Timestamp('2011-03-12 04:00',tz='Europe/Moscow')

In [123]:
stamp_moscow

Timestamp('2011-03-12 04:00:00+0300', tz='Europe/Moscow')

In [124]:
stamp_utc.value

1299902400000000000

In [125]:
stamp_utc.tz_convert('America/New_York').value

1299902400000000000

In [126]:
from pandas.tseries.offsets import Hour

In [127]:
stamp = pd.Timestamp('2012-03-12 01:30',tz='US/Eastern')

In [128]:
stamp

Timestamp('2012-03-12 01:30:00-0400', tz='US/Eastern')

In [129]:
stamp + Hour()

Timestamp('2012-03-12 02:30:00-0400', tz='US/Eastern')

In [130]:
stamp = pd.Timestamp('2012-11-04 00:30',tz='US/Eastern')

In [131]:
stamp

Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')

In [132]:
stamp + 2*Hour()

Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')

### Operations Between Different Time Zones

In [133]:
rng = pd.date_range('3/7/2012 9:30',periods=10,freq='B')

In [134]:
ts = pd.Series(np.random.randn(len(rng)),index=rng)

In [135]:
ts

2012-03-07 09:30:00   -1.952914
2012-03-08 09:30:00   -0.064687
2012-03-09 09:30:00   -0.978994
2012-03-12 09:30:00    0.625459
2012-03-13 09:30:00   -0.647132
2012-03-14 09:30:00    0.311179
2012-03-15 09:30:00   -0.057439
2012-03-16 09:30:00    0.484679
2012-03-19 09:30:00   -0.035220
2012-03-20 09:30:00   -0.265997
Freq: B, dtype: float64

In [136]:
ts1 = ts[:7].tz_localize('Europe/London')

In [137]:
ts2 = ts[2:].tz_localize('Europe/Moscow')

In [138]:
result = ts1 + ts2

In [139]:
result.index

DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 05:30:00+00:00', '2012-03-09 09:30:00+00:00',
               '2012-03-12 05:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 05:30:00+00:00', '2012-03-13 09:30:00+00:00',
               '2012-03-14 05:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 05:30:00+00:00', '2012-03-15 09:30:00+00:00',
               '2012-03-16 05:30:00+00:00', '2012-03-19 05:30:00+00:00',
               '2012-03-20 05:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)
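In the result above the timestamps from the two series interleave because 9:30 in London and 9:30 in Moscow are different instants. As a complementary sketch with invented values, two series that describe the same instants in different zones do line up when combined, and the result is again expressed in UTC:

```python
import pandas as pd

rng = pd.date_range("2012-03-07 09:30", periods=3, freq="D")
ts_utc = pd.Series([1.0, 2.0, 3.0], index=rng).tz_localize("UTC")

# The same three instants, displayed in two different zones.
london = ts_utc.tz_convert("Europe/London")
moscow = ts_utc.tz_convert("Europe/Moscow")

# Alignment happens on the underlying instants, so every row matches.
result = london + moscow

print(result.index.tz)
print(result.tolist())
```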

## Periods and Period Arithmetic

In [140]:
p = pd.Period(2007,freq='A-DEC')

In [141]:
p

Period('2007', 'A-DEC')

In [142]:
p + 5

Period('2012', 'A-DEC')

In [143]:
p - 2

Period('2005', 'A-DEC')

In [144]:
pd.Period('2014',freq='A-DEC')-p

7
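The integer shown above reflects the pandas version used to run this notebook. Depending on your installed version, subtracting two periods may instead return an offset object whose count is stored in its `n` attribute. The sketch below is written to cope with either behavior; the use of `getattr` is a defensive assumption rather than a requirement of pandas.

```python
import pandas as pd

diff = pd.Period("2014-08", freq="M") - pd.Period("2014-01", freq="M")

# Newer pandas returns an offset such as <7 * MonthEnds>; older versions
# return a plain integer. Read .n when it exists, else use the value itself.
months = getattr(diff, "n", diff)

print(months)
```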

In [145]:
rng = pd.period_range('2000-01-01','2000-06-30',freq='M')

In [146]:
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')

In [147]:
pd.Series(np.random.randn(6),index=rng)

2000-01   -0.824633
2000-02   -0.344649
2000-03    0.831591
2000-04    1.571929
2000-05    1.183715
2000-06    0.505246
Freq: M, dtype: float64

In [148]:
values = ['2001Q3','2002Q2','2003Q1']

In [149]:
index = pd.PeriodIndex(values,freq='Q-DEC')

In [150]:
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')

### Period Frequency Conversion

In [151]:
p = pd.Period('2007',freq='A-DEC')

In [152]:
p

Period('2007', 'A-DEC')

In [153]:
p.asfreq('M',how='start')

Period('2007-01', 'M')

In [154]:
p.asfreq('M',how='end')

Period('2007-12', 'M')

In [155]:
p = pd.Period('2007',freq='A-JUN')

In [156]:
p

Period('2007', 'A-JUN')

In [157]:
p.asfreq('M','start')

Period('2006-07', 'M')

In [158]:
p.asfreq('M','end')

Period('2007-06', 'M')

In [159]:
p = pd.Period('Aug-2007','M')

In [160]:
p.asfreq('A-JUN')

Period('2008', 'A-JUN')

In [161]:
rng = pd.period_range('2006','2009',freq='A-DEC')

In [162]:
ts = pd.Series(np.random.randn(len(rng)),index=rng)

In [163]:
ts

2006    0.984048
2007   -0.455595
2008    0.089291
2009    0.023097
Freq: A-DEC, dtype: float64

In [164]:
ts.asfreq('M',how='start')

2006-01    0.984048
2007-01   -0.455595
2008-01    0.089291
2009-01    0.023097
Freq: M, dtype: float64

In [165]:
ts.asfreq('B',how='end')

2006-12-29    0.984048
2007-12-31   -0.455595
2008-12-31    0.089291
2009-12-31    0.023097
Freq: B, dtype: float64

### Quarterly Period Frequencies

In [166]:
p = pd.Period('2012Q4',freq='Q-JAN')

In [167]:
p

Period('2012Q4', 'Q-JAN')

In [168]:
p.asfreq('D','start')

Period('2011-11-01', 'D')

In [169]:
p.asfreq('D','end')

Period('2012-01-31', 'D')

In [170]:
p4pm = (p.asfreq('B','e')-1).asfreq('T','s')+16*60

In [171]:
p4pm

Period('2012-01-30 16:00', 'T')

In [172]:
p4pm.to_timestamp()

Timestamp('2012-01-30 16:00:00')

In [173]:
rng = pd.period_range('2011Q3','2012Q4',freq='Q-JAN')

In [174]:
ts = pd.Series(np.arange(len(rng)),index=rng)

In [175]:
ts

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int32

In [176]:
new_rng = (rng.asfreq('B','e')-1).asfreq('T','s')+16*60

In [177]:
ts.index = new_rng.to_timestamp()

In [178]:
ts

2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int32

### Converting Timestamps to Periods (and Back)

In [179]:
rng = pd.date_range('2000-01-01', periods=3, freq='M')

In [180]:
ts = pd.Series(np.random.randn(3),index=rng)

In [181]:
ts

2000-01-31    0.693318
2000-02-29   -0.388158
2000-03-31    0.432816
Freq: M, dtype: float64

In [182]:
pts = ts.to_period()

In [183]:
pts

2000-01    0.693318
2000-02   -0.388158
2000-03    0.432816
Freq: M, dtype: float64

In [184]:
rng = pd.date_range('1/29/2000',periods=6,freq='D')

In [185]:
ts2 = pd.Series(np.random.randn(6),index=rng)

In [186]:
ts2

2000-01-29    0.868811
2000-01-30   -0.925155
2000-01-31    1.233427
2000-02-01   -1.110299
2000-02-02    2.105565
2000-02-03   -1.561579
Freq: D, dtype: float64

In [187]:
ts2.to_period('M')

2000-01    0.868811
2000-01   -0.925155
2000-01    1.233427
2000-02   -1.110299
2000-02    2.105565
2000-02   -1.561579
Freq: M, dtype: float64

In [188]:
pts = ts2.to_period()

In [189]:
pts

2000-01-29    0.868811
2000-01-30   -0.925155
2000-01-31    1.233427
2000-02-01   -1.110299
2000-02-02    2.105565
2000-02-03   -1.561579
Freq: D, dtype: float64

In [190]:
pts.to_timestamp(how='end')

2000-01-29    0.868811
2000-01-30   -0.925155
2000-01-31    1.233427
2000-02-01   -1.110299
2000-02-02    2.105565
2000-02-03   -1.561579
Freq: D, dtype: float64

### Creating a PeriodIndex from Arrays

In [191]:
data = pd.read_csv('../examples/macrodata.csv')

In [192]:
data.head(5)

     year  quarter   realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  tbilrate  unemp      pop  infl  realint
0  1959.0      1.0  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7      2.82    5.8  177.146   0.0      0.0
1  1959.0      2.0  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7      3.08    5.1   177.83  2.34     0.74
2  1959.0      3.0  2775.488    1751.8  289.226    491.26   1916.4  29.35  140.5      3.82    5.3  178.657  2.74     1.09
3  1959.0      4.0  2785.204    1753.7  299.356   484.052   1931.3  29.37  140.0      4.33    5.6  179.386  0.27     4.06
4  1960.0      1.0  2847.699    1770.5  331.722   462.199   1955.5  29.54  139.6       3.5    5.2  180.007  2.31     1.19


In [193]:
data.year

0      1959.0
1      1959.0
2      1959.0
3      1959.0
4      1960.0
5      1960.0
6      1960.0
7      1960.0
8      1961.0
9      1961.0
10     1961.0
11     1961.0
12     1962.0
13     1962.0
14     1962.0
15     1962.0
16     1963.0
17     1963.0
18     1963.0
19     1963.0
20     1964.0
21     1964.0
22     1964.0
23     1964.0
24     1965.0
25     1965.0
26     1965.0
27     1965.0
28     1966.0
29     1966.0
        ...  
173    2002.0
174    2002.0
175    2002.0
176    2003.0
177    2003.0
178    2003.0
179    2003.0
180    2004.0
181    2004.0
182    2004.0
183    2004.0
184    2005.0
185    2005.0
186    2005.0
187    2005.0
188    2006.0
189    2006.0
190    2006.0
191    2006.0
192    2007.0
193    2007.0
194    2007.0
195    2007.0
196    2008.0
197    2008.0
198    2008.0
199    2008.0
200    2009.0
201    2009.0
202    2009.0
Name: year, Length: 203, dtype: float64

In [194]:
data.quarter

0      1.0
1      2.0
2      3.0
3      4.0
4      1.0
5      2.0
6      3.0
7      4.0
8      1.0
9      2.0
10     3.0
11     4.0
12     1.0
13     2.0
14     3.0
15     4.0
16     1.0
17     2.0
18     3.0
19     4.0
20     1.0
21     2.0
22     3.0
23     4.0
24     1.0
25     2.0
26     3.0
27     4.0
28     1.0
29     2.0
      ... 
173    2.0
174    3.0
175    4.0
176    1.0
177    2.0
178    3.0
179    4.0
180    1.0
181    2.0
182    3.0
183    4.0
184    1.0
185    2.0
186    3.0
187    4.0
188    1.0
189    2.0
190    3.0
191    4.0
192    1.0
193    2.0
194    3.0
195    4.0
196    1.0
197    2.0
198    3.0
199    4.0
200    1.0
201    2.0
202    3.0
Name: quarter, Length: 203, dtype: float64

In [195]:
index = pd.PeriodIndex(year=data.year,quarter=data.quarter, freq = 'Q-DEC')

In [196]:
index

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
             '1960Q3', '1960Q4', '1961Q1', '1961Q2',
             ...
             '2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
             '2008Q4', '2009Q1', '2009Q2', '2009Q3'],
            dtype='period[Q-DEC]', length=203, freq='Q-DEC')
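Newer pandas releases may emit a deprecation warning for the `year=` and `quarter=` constructor keywords used above. One alternative that relies only on the string form already shown earlier in this section is to build labels such as `'1959Q1'` and pass them to `pd.PeriodIndex`. In this sketch the two short Series are stand-ins for the `year` and `quarter` columns of the CSV file.

```python
import pandas as pd

# Stand-ins for data.year and data.quarter (floats, as in the file).
years = pd.Series([1959.0, 1959.0, 1959.0, 1959.0, 1960.0])
quarters = pd.Series([1.0, 2.0, 3.0, 4.0, 1.0])

# Build labels like '1959Q1' and let PeriodIndex parse them.
labels = [f"{int(y)}Q{int(q)}" for y, q in zip(years, quarters)]
index = pd.PeriodIndex(labels, freq="Q-DEC")

print(index)
```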

In [197]:
data.index = index

In [198]:
data.infl

1959Q1    0.00
1959Q2    2.34
1959Q3    2.74
1959Q4    0.27
1960Q1    2.31
1960Q2    0.14
1960Q3    2.70
1960Q4    1.21
1961Q1   -0.40
1961Q2    1.47
1961Q3    0.80
1961Q4    0.80
1962Q1    2.26
1962Q2    0.13
1962Q3    2.11
1962Q4    0.79
1963Q1    0.53
1963Q2    2.75
1963Q3    0.78
1963Q4    2.46
1964Q1    0.13
1964Q2    0.90
1964Q3    1.29
1964Q4    2.05
1965Q1    1.28
1965Q2    2.54
1965Q3    0.89
1965Q4    2.90
1966Q1    4.99
1966Q2    2.10
          ... 
2002Q2    1.56
2002Q3    2.66
2002Q4    3.08
2003Q1    1.31
2003Q2    1.09
2003Q3    2.60
2003Q4    3.02
2004Q1    2.35
2004Q2    3.61
2004Q3    3.58
2004Q4    2.09
2005Q1    4.15
2005Q2    1.85
2005Q3    9.14
2005Q4    0.40
2006Q1    2.60
2006Q2    3.97
2006Q3   -1.58
2006Q4    3.30
2007Q1    4.58
2007Q2    2.75
2007Q3    3.45
2007Q4    6.38
2008Q1    2.82
2008Q2    8.53
2008Q3   -3.16
2008Q4   -8.79
2009Q1    0.94
2009Q2    3.37
2009Q3    3.56
Freq: Q-DEC, Name: infl, Length: 203, dtype: float64

## Resampling and Frequency Conversion

In [200]:
rng = pd.date_range('2000-01-01', periods=100, freq='D')

In [201]:
ts = pd.Series(np.random.randn(len(rng)),index=rng)

In [202]:
ts

2000-01-01   -0.503281
2000-01-02    0.098724
2000-01-03    0.112531
2000-01-04   -0.095180
2000-01-05    1.219020
2000-01-06    0.100433
2000-01-07    0.360853
2000-01-08   -0.788157
2000-01-09   -0.595734
2000-01-10    1.257720
2000-01-11   -0.329813
2000-01-12    0.174794
2000-01-13   -1.019999
2000-01-14   -0.567199
2000-01-15   -0.405634
2000-01-16    0.613153
2000-01-17   -1.387715
2000-01-18    0.711966
2000-01-19   -0.404624
2000-01-20   -0.243163
2000-01-21    0.106480
2000-01-22    0.105211
2000-01-23   -1.969025
2000-01-24   -1.006891
2000-01-25    0.638498
2000-01-26    0.532305
2000-01-27   -1.244464
2000-01-28   -0.231877
2000-01-29    0.166862
2000-01-30   -0.023118
                ...   
2000-03-11    0.143202
2000-03-12   -0.822074
2000-03-13   -0.957975
2000-03-14   -0.488004
2000-03-15    0.660807
2000-03-16   -1.550157
2000-03-17   -0.672695
2000-03-18    0.533763
2000-03-19    1.529204
2000-03-20    0.765724
2000-03-21    0.496677
2000-03-22   -0.428952
2000-03-23 

In [203]:
ts.resample('M').mean()

2000-01-31   -0.126830
2000-02-29   -0.276664
2000-03-31   -0.145260
2000-04-30   -0.363392
Freq: M, dtype: float64

In [204]:
ts.resample('M', kind='period').mean()

2000-01   -0.126830
2000-02   -0.276664
2000-03   -0.145260
2000-04   -0.363392
Freq: M, dtype: float64
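
The monthly means above can also be reproduced without the `kind` argument by grouping each timestamp under the calendar month it falls in. The following is a minimal sketch under stated assumptions: the seeded random data is hypothetical and will not match the values printed above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series, seeded so the run is reproducible.
rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.random.default_rng(0).standard_normal(len(rng)), index=rng)

# Map every timestamp to its monthly period, then average within each period.
monthly = ts.groupby(ts.index.to_period('M')).mean()
print(monthly)
```

The result is indexed by monthly periods (`2000-01` through `2000-04`), the same shape as the period-indexed output above.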

### Downsampling 

In [205]:
rng = pd.date_range('2000-01-01', periods=12, freq='T')

In [207]:
ts = pd.Series(np.arange(12), index=rng)

In [208]:
ts

2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

In [209]:
ts.resample('5min', closed='right').sum()

1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5T, dtype: int32

In [211]:
ts.resample('5min', closed='right', label='right').sum()

2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5T, dtype: int32
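
To see what `closed` and `label` each control, it may help to compare the default binning with the two calls above on the same twelve-minute series. This sketch uses the `'min'` alias instead of `'T'`, which is an assumption made for compatibility with newer pandas releases.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Default for '5min': left edge inclusive, bins [00:00, 00:05), [00:05, 00:10), ...
left = ts.resample('5min').sum()

# closed='right': right edge inclusive, bins (23:55, 00:00], (00:00, 00:05], ...
right = ts.resample('5min', closed='right').sum()

# Same bins as `right`, but each bin is labeled by its right edge.
right_labeled = ts.resample('5min', closed='right', label='right').sum()

print(left.tolist())           # [10, 35, 21]
print(right.tolist())          # [0, 15, 40, 11]
print(right_labeled.index[0])  # 2000-01-01 00:00:00
```

`closed` decides which bin a boundary observation such as `00:05` belongs to, while `label` only changes the timestamp shown for each bin.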

In [214]:
ts.resample('5min', closed='right', label='right', loffset='-1s').sum()

1999-12-31 23:59:59     0
2000-01-01 00:04:59    15
2000-01-01 00:09:59    40
2000-01-01 00:14:59    11
Freq: 5T, dtype: int32
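
Newer pandas releases no longer accept `loffset` (it was deprecated and later removed). The same one-second shift can be applied to the result index after aggregating, as in this sketch, which again assumes the `'min'` alias in place of `'T'`.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

result = ts.resample('5min', closed='right', label='right').sum()

# Shift every bin label back by one second after the aggregation.
result.index = result.index - pd.Timedelta(seconds=1)
print(result)
```

The labels become `23:59:59`, `00:04:59`, and so on, matching the output above.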

#### Open-High-Low-Close (OHLC) resampling

In [216]:
ts.resample('5min').ohlc()

| | open | high | low | close |
|---|---|---|---|---|
| 2000-01-01 00:00:00 | 0 | 4 | 0 | 4 |
| 2000-01-01 00:05:00 | 5 | 9 | 5 | 9 |
| 2000-01-01 00:10:00 | 10 | 11 | 10 | 11 |
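
The four OHLC columns are the first, maximum, minimum, and last value in each bin. A sketch that spells this out with `agg`, assuming the same twelve-minute series built with the `'min'` alias:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

ohlc = ts.resample('5min').ohlc()

# Compute the same four statistics explicitly, then rename the columns.
manual = ts.resample('5min').agg(['first', 'max', 'min', 'last'])
manual.columns = ['open', 'high', 'low', 'close']

print((ohlc.values == manual.values).all())  # True
```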


### Upsampling and Interpolation

In [217]:
frame = pd.DataFrame(np.random.randn(2, 4),
                     index=pd.date_range('1/1/2000', periods=2,
                                         freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [218]:
frame

| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-05 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-12 | -0.055272 | 1.123251 | -0.292876 | 1.107457 |


In [219]:
df_daily = frame.resample('D').asfreq()

In [221]:
df_daily

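| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-05 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-06 | NaN | NaN | NaN | NaN |
| 2000-01-07 | NaN | NaN | NaN | NaN |
| 2000-01-08 | NaN | NaN | NaN | NaN |
| 2000-01-09 | NaN | NaN | NaN | NaN |
| 2000-01-10 | NaN | NaN | NaN | NaN |
| 2000-01-11 | NaN | NaN | NaN | NaN |
| 2000-01-12 | -0.055272 | 1.123251 | -0.292876 | 1.107457 |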


In [222]:
frame.resample('D').ffill()

| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-05 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-06 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-07 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-08 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-09 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-10 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-11 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-12 | -0.055272 | 1.123251 | -0.292876 | 1.107457 |


In [224]:
frame.resample('D').ffill(limit=2)

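| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-05 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-06 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-07 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-08 | NaN | NaN | NaN | NaN |
| 2000-01-09 | NaN | NaN | NaN | NaN |
| 2000-01-10 | NaN | NaN | NaN | NaN |
| 2000-01-11 | NaN | NaN | NaN | NaN |
| 2000-01-12 | -0.055272 | 1.123251 | -0.292876 | 1.107457 |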


In [225]:
frame.resample('W-THU').ffill()

| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-06 | 0.859012 | -0.593867 | -1.475458 | 1.349093 |
| 2000-01-13 | -0.055272 | 1.123251 | -0.292876 | 1.107457 |
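
The cells above fill the gaps created by upsampling by carrying values forward. Interpolation is the other option the section title names. A minimal sketch on a hypothetical two-row weekly frame, with round numbers chosen so the linear steps are easy to check:

```python
import pandas as pd

# Hypothetical weekly data on two consecutive Wednesdays.
frame = pd.DataFrame({'Colorado': [0.0, 7.0], 'Texas': [14.0, 0.0]},
                     index=pd.date_range('2000-01-05', periods=2,
                                         freq='W-WED'))

# Upsample to daily frequency and fill the new rows by linear interpolation.
daily = frame.resample('D').interpolate()
print(daily)
```

Each intermediate day moves one seventh of the way between the two weekly observations, so `Colorado` runs 0, 1, ..., 7 and `Texas` runs 14, 12, ..., 0.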


### Resampling with Periods

In [226]:
frame = pd.DataFrame(np.random.randn(24, 4),
                     index=pd.period_range('1-2000', '12-2001',
                                           freq='M'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [227]:
frame[:5]

| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01 | -0.266894 | -0.458003 | -0.558211 | -0.301921 |
| 2000-02 | 0.101667 | 0.49935 | -0.772249 | -1.716228 |
| 2000-03 | -1.793567 | -0.1892 | 0.920568 | 0.207212 |
| 2000-04 | -0.459112 | 0.291274 | -1.101462 | -0.611035 |
| 2000-05 | -0.593033 | 0.029091 | 0.226391 | 0.380223 |


In [228]:
annual_frame = frame.resample('A-DEC').mean()

In [229]:
annual_frame

| | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000 | -0.166318 | 0.113916 | -0.148863 | -0.132102 |
| 2001 | -0.355021 | -0.384082 | -0.120598 | 0.552153 |
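
For monthly periods aggregated to calendar years, each annual value is the mean of the twelve monthly rows that share a year. The sketch below reproduces that aggregation with a plain `groupby` on hypothetical, deterministic data. It is an equivalent illustration, not the mechanism `resample` uses internally.

```python
import numpy as np
import pandas as pd

# Hypothetical period-indexed frame: 24 months holding the values 0..23.
idx = pd.period_range('2000-01', '2001-12', freq='M')
frame = pd.DataFrame({'Colorado': np.arange(24, dtype=float)}, index=idx)

# Group the monthly periods by calendar year and average within each year.
annual = frame.groupby(frame.index.year).mean()
print(annual)
```

Here the year 2000 averages the values 0 through 11 (5.5) and 2001 averages 12 through 23 (17.5).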


## Moving Window Functions
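
A moving window function computes a statistic over a sliding, fixed-size slice of the series. The sketch below is a minimal illustration with `rolling` on a hypothetical ten-day series. The window size of 3 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(10, dtype=float),
               index=pd.date_range('2000-01-01', periods=10, freq='D'))

# Mean over the current and two preceding observations; the first two
# results are NaN because a full window of 3 is not yet available.
rolled = ts.rolling(3).mean()

# min_periods=1 lets the window start producing values immediately.
rolled_partial = ts.rolling(3, min_periods=1).mean()

print(rolled.head())
print(rolled_partial.head())
```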

### Exponentially Weighted Functions
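
Instead of giving every observation in a window equal weight, an exponentially weighted statistic gives more weight to recent observations. A small sketch with `ewm`, on hypothetical values, that checks the last result against the weights written out by hand:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# span=3 corresponds to a decay factor alpha = 2 / (span + 1) = 0.5.
ewma = s.ewm(span=3).mean()

# With the default adjust=True, each output is a weighted average whose
# weights (1 - alpha) ** k shrink with the age k of the observation.
alpha = 2 / (3 + 1)
weights = [(1 - alpha) ** k for k in (2, 1, 0)]  # oldest to newest
manual_last = sum(w * x for w, x in zip(weights, s)) / sum(weights)

print(ewma.iloc[-1], manual_last)
```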

### Binary Moving Window Functions
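
Some statistics, such as correlation, operate on two series at once. A minimal sketch of a rolling correlation on hypothetical data, where the second series is an exact linear function of the first, so the correlation in every full window should be 1 up to floating-point error:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
b = 2 * a + 1  # perfectly linearly related to `a`

# Correlation between `a` and `b` over each window of 3 observations.
corr = a.rolling(3).corr(b)
print(corr)
```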

### User-Defined Moving Window Functions
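
The `apply` method on a rolling object accepts any function that reduces a window to a single value. In this sketch, `value_range` is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

def value_range(window):
    """Spread between the largest and smallest value in the window."""
    return window.max() - window.min()

ts = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])

# raw=True passes each window to the function as a NumPy array.
spread = ts.rolling(3).apply(value_range, raw=True)
print(spread.tolist())  # [nan, nan, 3.0, 6.0, 6.0]
```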