# Indexing, Selection, Subsetting

TimeSeries is a subclass of Series and thus behaves in the same way with regard to indexing and selecting data based on label:

In [1]:
from datetime import datetime
import numpy as np
import pandas as pd
from pandas import DataFrame, Series

In [2]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
        datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [5]:
ts = Series(np.arange(6), index = dates)

In [7]:
stamp = ts.index[2]

ts[stamp]

2

As a convenience, you can also pass a string that is interpretable as a date:

In [12]:
ts['1/10/2011'], ts['20110110']

(4, 4)
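
As a rough check of the string convenience described above, the sketch below rebuilds the same `ts` and looks up one label three ways. The variable names `by_string`, `by_datetime`, and `by_timestamp` are illustrative only and not part of the original notebook.

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6), index=dates)

# A date string, a datetime, and a pandas Timestamp all resolve
# to the same index label (2011-01-10) and return the same value.
by_string = ts['1/10/2011']
by_datetime = ts[datetime(2011, 1, 10)]
by_timestamp = ts[pd.Timestamp('2011-01-10')]

print(by_string, by_datetime, by_timestamp)
```

The string form is convenient at the prompt; in library code an explicit `pd.Timestamp` avoids any ambiguity about day-first versus month-first parsing.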

For longer time series, you can pass just a year, or a year and month, to easily select slices of data:

In [15]:
longer_ts = Series(np.random.randn(1000),
                index = pd.date_range('1/1/2000', periods = 1000))

In [16]:
longer_ts

2000-01-01    0.402672
2000-01-02    1.410485
2000-01-03    0.953895
2000-01-04   -0.576692
2000-01-05    0.104537
                ...   
2002-09-22   -0.471728
2002-09-23   -0.417972
2002-09-24    0.113356
2002-09-25    0.377913
2002-09-26   -0.138039
Freq: D, Length: 1000, dtype: float64

In [17]:
longer_ts['2001'], longer_ts['2001/05'], longer_ts['2001-05-01']

(2001-01-01    0.269884
 2001-01-02    0.217753
 2001-01-03   -0.996793
 2001-01-04   -2.995544
 2001-01-05   -0.238005
                 ...   
 2001-12-27   -1.038909
 2001-12-28   -1.723215
 2001-12-29    0.259803
 2001-12-30    0.017401
 2001-12-31    0.445740
 Freq: D, Length: 365, dtype: float64,
 2001-05-01    0.582064
 2001-05-02    1.115155
 2001-05-03   -1.285084
 2001-05-04   -1.141618
 2001-05-05    0.761272
 2001-05-06   -0.137796
 2001-05-07    0.195033
 2001-05-08   -0.679612
 2001-05-09    0.233599
 2001-05-10    0.472539
 2001-05-11    0.917074
 2001-05-12   -0.521201
 2001-05-13    0.489207
 2001-05-14   -1.517220
 2001-05-15   -1.254942
 2001-05-16    0.493229
 2001-05-17   -0.167290
 2001-05-18   -1.795093
 2001-05-19   -0.174348
 2001-05-20   -2.618196
 2001-05-21   -1.324433
 2001-05-22   -0.970721
 2001-05-23   -0.703807
 2001-05-24   -0.523259
 2001-05-25   -1.503593
 2001-05-26    0.902467
 2001-05-27    0.693617
 2001-05-28   -2.685334
 2001-05-29   -0.045765
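
One way to sanity-check the year and month selections is to compare them with explicit date-range slices. The following is a minimal sketch that assumes the same daily `longer_ts` as above; it uses `.loc` to make the label-based lookup explicit, and the names `year_2001`, `may_2001`, and `may_explicit` are hypothetical.

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.random.randn(1000),
                      index=pd.date_range('1/1/2000', periods=1000))

# A bare year selects every observation in that year.
year_2001 = longer_ts.loc['2001']

# A year and month selects every observation in that month.
may_2001 = longer_ts.loc['2001-05']

# The same month written as an explicit, inclusive label slice.
may_explicit = longer_ts.loc['2001-05-01':'2001-05-31']

print(len(year_2001), len(may_2001), may_2001.equals(may_explicit))
```

Because the index is daily, the year selection should hold 365 rows (2001 is not a leap year) and the month selection 31.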
 

Slicing with dates works just like with a regular Series:

In [19]:
ts[datetime(2011, 1, 3):]

2011-01-05    1
2011-01-07    2
2011-01-08    3
2011-01-10    4
2011-01-12    5
dtype: int32

Because most time series data is ordered chronologically, you can slice with timestamps not contained in a time series to perform a range query:

In [21]:
ts, ts['1/06/2011': '1/11/2011']

(2011-01-02    0
 2011-01-05    1
 2011-01-07    2
 2011-01-08    3
 2011-01-10    4
 2011-01-12    5
 dtype: int32,
 2011-01-07    2
 2011-01-08    3
 2011-01-10    4
 dtype: int32)
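
The range query above relies on the index being sorted. As a hedged illustration, the sketch below reproduces the slice and then derives the same rows by hand with `searchsorted`; the names `window`, `lo`, `hi`, and `manual` are introduced here for illustration and do not describe how pandas implements slicing internally.

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6), index=dates)

# Range queries with labels that are absent from the index
# depend on the index being in chronological order.
print(ts.index.is_monotonic_increasing)

# Neither endpoint exists in the index; both bounds are inclusive.
window = ts['1/06/2011':'1/11/2011']

# The same rows located by position in the sorted index.
lo = ts.index.searchsorted(pd.Timestamp('2011-01-06'), side='left')
hi = ts.index.searchsorted(pd.Timestamp('2011-01-11'), side='right')
manual = ts.iloc[lo:hi]

print(window.equals(manual))
```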

As before, you can pass a string date, a datetime, or a Timestamp. Remember that slicing in this manner produces views on the source time series, just like slicing NumPy arrays. There is an equivalent instance method, truncate, which slices a TimeSeries between two dates:

In [22]:
ts.truncate(after = '1/9/2011')

2011-01-02    0
2011-01-05    1
2011-01-07    2
2011-01-08    3
dtype: int32
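
To show how truncate relates to label slicing, here is a small sketch using both the `before` and `after` arguments on the same `ts`. The names `head` and `middle` are illustrative. Whether a slice shares memory with its source can depend on the pandas version and its copy-on-write setting, so the sketch compares values only and does not test view semantics.

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.arange(6), index=dates)

# Drop everything after 2011-01-09.
head = ts.truncate(after='1/9/2011')

# Keep only observations between the two dates (inclusive).
middle = ts.truncate(before='1/6/2011', after='1/11/2011')

# Both results match the equivalent label slices.
print(head.equals(ts[:'1/9/2011']))
print(middle.equals(ts['1/6/2011':'1/11/2011']))
```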

All of the above holds true for DataFrame as well, indexing on its rows:

In [23]:
dates = pd.date_range('1/1/2000', periods= 200, freq='W-WED')

In [26]:
long_df = DataFrame(np.random.randn(200, 4),
            index = dates,
            columns = ['A', 'B', 'C', 'D'])

In [28]:
long_df.loc['05-2001']

                   A         B         C         D
2001-05-02 -1.012030  1.072159  1.013872  0.700652
2001-05-09 -0.425584 -1.280605 -1.158004 -0.659768
2001-05-16  0.693483  0.111303  1.409487 -0.833265
2001-05-23  0.120371 -0.762161  0.061648  1.193289
2001-05-30 -0.247413 -0.741180 -0.564556 -0.775398
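
The same label-based selection can be combined with column selection and date-range slices on a DataFrame. The sketch below assumes the weekly `long_df` defined above and writes the month as `'2001-05'`; the names `may_2001`, `may_col_a`, and `mid_may` are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=200, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(200, 4),
                       index=dates,
                       columns=['A', 'B', 'C', 'D'])

# All rows whose index label falls in May 2001.
may_2001 = long_df.loc['2001-05']

# The same rows, restricted to a single column (returns a Series).
may_col_a = long_df.loc['2001-05', 'A']

# A range query on the rows; neither endpoint is a Wednesday,
# so neither is present in the index.
mid_may = long_df.loc['2001-05-10':'2001-05-25']

print(may_2001.shape, may_col_a.shape, mid_may.shape)
```

Using `.loc` keeps it explicit that the string refers to row labels rather than a column name.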
