In [1]:
import pandas as pd
import numpy as np

In [2]:
(pd.__version__, np.__version__)

('0.20.2', '1.11.3')

# String Manipulation in Pandas

## Basic String Object Methods

`split()` splits a string on a delimiter and returns a list.

In [3]:
val = 'a,b, guido'
val.split(',')

['a', 'b', ' guido']

Combining `split()` with `strip()` trims the whitespace from each piece.

In [4]:
pieces = [x.strip() for x in val.split(',')]
pieces

['a', 'b', 'guido']

Use `''.join(str_list)` to concatenate the strings in a list.

In [5]:
''.join(pieces)

'abguido'
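
Passing a non-empty separator to `join()` rebuilds a delimited string. A minimal sketch that reuses `val` and `pieces` from above; the `'::'` separator is chosen only for illustration:

```python
val = 'a,b, guido'

# Split on commas, trim whitespace from each piece, then rejoin with '::'.
pieces = [x.strip() for x in val.split(',')]
joined = '::'.join(pieces)
print(joined)  # a::b::guido
```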

Use the `in` keyword to check whether a substring occurs in a string.

In [6]:
'guido' in val

True

`replace()` substitutes every occurrence of a substring with a new string.

In [7]:
val.replace(',', '::')

'a::b:: guido'

## Regular Expressions

Regular expressions process strings flexibly by defining a search pattern. Python's regular expression support is implemented in the `re` module.

In [8]:
import re

In the pattern `'\s+'`, `\s` matches any whitespace character (`[ \t\n\r\f\v]`) and `+` means one or more.

In [9]:
text = "foo bar\t baz \tqux"
re.split(r'\s+', text)

['foo', 'bar', 'baz', 'qux']

Use `re.compile()` to define the pattern to search for.

In [10]:
regex = re.compile(r'\s+')
regex.split(text)

['foo', 'bar', 'baz', 'qux']

`findall()` returns every substring that matches the pattern.  
With the regex above, it finds the `' '` between foo and bar, the `'\t '` between bar and baz, and the `' \t'` between baz and qux.

In [11]:
regex.findall(text)

[' ', '\t ', ' \t']
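
When the positions of the matches matter as well, `finditer()` yields match objects instead of plain strings. A small sketch, assuming the same `text` and pattern as above (the `spans` name is only illustrative):

```python
import re

text = "foo bar\t baz \tqux"
regex = re.compile(r'\s+')

# Each match object carries the start/end offsets of one whitespace run.
spans = [(m.start(), m.end(), m.group()) for m in regex.finditer(text)]
print(spans)  # [(3, 4, ' '), (7, 9, '\t '), (12, 14, ' \t')]
```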

If a pattern will be used repeatedly, compile it once with `re.compile()` and reuse the compiled object. For example, to search for email addresses:

In [12]:
text = """Dave dave@google.com Steve steve@gmail.com
Rob rob@gmail.com
Ryan ryan@yahoo.com
"""
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

In [13]:
regex = re.compile(pattern, flags=re.IGNORECASE)
regex.findall(text)

['dave@google.com', 'steve@gmail.com', 'rob@gmail.com', 'ryan@yahoo.com']

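`search()` returns a match object for the first substring that matches the pattern. Its `start()` and `end()` positions are used to slice the matched text out of the original string.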

In [14]:
m = regex.search(text)
text[m.start():m.end()]

'dave@google.com'

`match()` only matches at the beginning of the string. The example below therefore returns `None`, because the text starts with the name Dave rather than an email address.

In [15]:
print(regex.match(text))

None
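
The difference between `search()` and `match()` can be seen side by side. This is a minimal sketch using the same email pattern; the short sample strings are made up for illustration:

```python
import re

pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex = re.compile(pattern, flags=re.IGNORECASE)

text = 'Dave dave@google.com'
found = regex.search(text)                   # scans the whole string
anchored = regex.match(text)                 # only tries position 0
anchored_ok = regex.match('dave@google.com') # the address is at position 0

print(found.group())        # dave@google.com
print(anchored)             # None
print(anchored_ok.group())  # dave@google.com
```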


`sub()` replaces every substring that matches the pattern with a new value.

In [16]:
print(regex.sub('REDACTED', text))

Dave REDACTED Steve REDACTED
Rob REDACTED
Ryan REDACTED



Wrapping parts of the pattern in parentheses splits each match into groups, which are returned as a tuple.

In [17]:
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex = re.compile(pattern, flags=re.IGNORECASE)

In [18]:
m = regex.match('wesm@bright.net')
m.groups()

('wesm', 'bright', 'net')

In [19]:
regex.findall(text)

[('dave', 'google', 'com'),
 ('steve', 'gmail', 'com'),
 ('rob', 'gmail', 'com'),
 ('ryan', 'yahoo', 'com')]

In `sub()`, the special symbols `\1`, `\2`, and so on refer to the captured groups in the replacement string.

In [20]:
print(regex.sub(r'Username: \1, Domain: \2, Suffix: \3', text))

Dave Username: dave, Domain: google, Suffix: com Steve Username: steve, Domain: gmail, Suffix: com
Rob Username: rob, Domain: gmail, Suffix: com
Ryan Username: ryan, Domain: yahoo, Suffix: com



In [21]:
regex = re.compile(r"""
(?P<username>[A-Z0-9._%+-]+)
@
(?P<domain>[A-Z0-9.-]+)
\.
(?P<suffix>[A-Z]{2,4})""", flags=re.IGNORECASE|re.VERBOSE)

m = regex.match('wesm@bright.net')
m.groupdict()

{'domain': 'bright', 'suffix': 'net', 'username': 'wesm'}
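
Named groups can also be referenced in a `sub()` replacement with `\g<name>`. The sketch below masks the domain; the masking format and the sample string are illustrative choices, not part of the original notebook:

```python
import re

regex = re.compile(r"""
    (?P<username>[A-Z0-9._%+-]+)
    @
    (?P<domain>[A-Z0-9.-]+)
    \.
    (?P<suffix>[A-Z]{2,4})""", flags=re.IGNORECASE | re.VERBOSE)

# Keep the username and suffix, hide the domain.
masked = regex.sub(r'\g<username>@***.\g<suffix>', 'Rob rob@gmail.com')
print(masked)  # Rob rob@***.com

# A single named group can be read directly from a match object.
domain = regex.search('Rob rob@gmail.com').group('domain')
print(domain)  # gmail
```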

## Vectorized String Functions in Pandas

The `str` attribute of a Series provides vectorized string methods. Unlike `map()` with a lambda, these methods skip NaN values instead of raising errors.

In [22]:
data = {'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com', 'Rob': 'rob@gmail.com', 'Wes': np.nan}
data = pd.Series(data)
data

Dave     dave@google.com
Rob        rob@gmail.com
Steve    steve@gmail.com
Wes                  NaN
dtype: object

In [23]:
data.str.contains('gmail')

Dave     False
Rob       True
Steve     True
Wes        NaN
dtype: object

In [24]:
data.str.findall(pattern, flags=re.IGNORECASE)

Dave     [(dave, google, com)]
Rob        [(rob, gmail, com)]
Steve    [(steve, gmail, com)]
Wes                        NaN
dtype: object
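
The NaN handling can be checked directly. A minimal sketch, assuming a two-element Series built for illustration; `dtype=object` is set explicitly to mirror the `dtype: object` shown in the outputs above:

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Wes': np.nan}, dtype=object)

# map() calls the lambda on NaN (a float), which has no upper() method.
try:
    data.map(lambda s: s.upper())
    map_failed = False
except AttributeError:
    map_failed = True

# The vectorized str accessor passes NaN through untouched.
upper = data.str.upper()
print(map_failed)     # True
print(upper['Dave'])  # DAVE@GOOGLE.COM
```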

# Date and Time Data Types and Tools

The `datetime` module implements functions for working with dates and times.

In [25]:
from datetime import datetime
now = datetime.now()
now.year, now.month, now.day

(2017, 6, 23)

In [26]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta.days

926
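
Subtracting two datetime objects yields a `timedelta`, which stores whole days plus the remaining seconds and can itself be added to or subtracted from a datetime. A short sketch; the `start` date is an arbitrary example value:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
print(delta.days, delta.seconds)  # 926 56700

# A timedelta shifts a datetime forward or backward.
start = datetime(2011, 1, 7)
later = start + timedelta(days=12)
earlier = start - 2 * timedelta(days=12)
print(later)    # 2011-01-19 00:00:00
print(earlier)  # 2010-12-14 00:00:00
```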

## Converting Between String and Datetime

`str()` or `strftime()` converts a datetime object to a string.

In [27]:
stamp = datetime(2011, 1, 3)
str(stamp)

'2011-01-03 00:00:00'

In [28]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

`datetime.strptime()` converts a string to a datetime object (the date format must be specified).

In [29]:
value = '2011-01-03'
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)
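
`strftime()` and `strptime()` are inverses as long as both use the same format string. A minimal round-trip sketch; the day-first format and the timestamp are chosen only for illustration:

```python
from datetime import datetime

fmt = '%d/%m/%Y %H:%M'  # day/month/year hour:minute
stamp = datetime(2011, 1, 3, 22, 45)

text = stamp.strftime(fmt)             # datetime -> string
parsed = datetime.strptime(text, fmt)  # string -> datetime

print(text)             # 03/01/2011 22:45
print(parsed == stamp)  # True
```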

`parser.parse()` from the dateutil package converts a string to a datetime object without a format specification.

In [30]:
from dateutil.parser import parse
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [31]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

pandas provides `to_datetime()`, which converts a list of date strings into a DatetimeIndex.

In [32]:
datestrs = ['7/6/2011', '8/6/2011']
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)
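
Strings such as `'7/6/2011'` are ambiguous: the result above reads them as month/day/year. If the data is actually day/month/year, one option is to pass an explicit `format`. A sketch under that assumption:

```python
import pandas as pd

datestrs = ['7/6/2011', '8/6/2011']

# Interpret the strings as day/month/year instead of month/day/year.
day_first = pd.to_datetime(datestrs, format='%d/%m/%Y')
print(day_first[0])  # 2011-06-07 00:00:00
```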

NaT (Not a Time) is the NA value for pandas timestamps.

In [33]:
idx = pd.to_datetime(datestrs + [None])
idx

DatetimeIndex(['2011-07-06', '2011-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)

In [34]:
pd.isnull(idx)

array([False, False,  True], dtype=bool)

## Time Series Basics

In [35]:
from datetime import datetime

In [36]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), 
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]

In [37]:
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02    0.038658
2011-01-05   -0.225977
2011-01-07    0.037566
2011-01-08   -0.058270
2011-01-10    1.675804
2011-01-12    0.556646
dtype: float64

In [38]:
ts + ts[::2]

2011-01-02    0.077317
2011-01-05         NaN
2011-01-07    0.075131
2011-01-08         NaN
2011-01-10    3.351609
2011-01-12         NaN
dtype: float64

### Indexing, Selection and Subsetting Time Series

Select values using either (1) an index label or (2) a date string.

In [39]:
stamp = ts.index[2]
ts[stamp]

0.037565564225794344

In [40]:
ts['1/10/2011']

1.6758042526850931

In [41]:
ts['20110110']

1.6758042526850931

Date-string selection also accepts partial dates: a year, a year and month, or a full date.

In [42]:
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

In [43]:
# year
longer_ts['2000']

2000-01-01    0.777235
2000-01-02   -0.391175
2000-01-03   -0.056078
2000-01-04    0.595772
2000-01-05   -2.270080
2000-01-06    0.952383
2000-01-07   -1.052315
2000-01-08   -0.098199
2000-01-09   -0.813653
2000-01-10    2.497385
2000-01-11    1.352641
2000-01-12    0.948818
2000-01-13    0.461425
2000-01-14    0.610727
2000-01-15    0.963790
2000-01-16   -1.725343
2000-01-17    0.679043
2000-01-18   -0.107182
2000-01-19    0.196511
2000-01-20   -0.632251
2000-01-21    1.032987
2000-01-22   -0.075529
2000-01-23   -0.050130
2000-01-24   -0.300171
2000-01-25   -0.591597
2000-01-26    0.467255
2000-01-27    1.188971
2000-01-28    0.840311
2000-01-29    0.957291
2000-01-30   -0.349045
                ...   
2000-12-02    0.733718
2000-12-03    1.090790
2000-12-04   -1.026515
2000-12-05    0.485456
2000-12-06    0.476567
2000-12-07   -0.049838
2000-12-08   -1.097233
2000-12-09    0.356104
2000-12-10   -0.638575
2000-12-11    0.768695
2000-12-12   -0.025444
2000-12-13   -0.443147
2000-12-14 

In [44]:
# year and month
longer_ts['2001-03']

2001-03-01   -1.491051
2001-03-02   -0.382234
2001-03-03    0.727387
2001-03-04    0.571757
2001-03-05    0.797736
2001-03-06    0.974772
2001-03-07   -0.687610
2001-03-08    0.602546
2001-03-09   -0.727339
2001-03-10   -1.840302
2001-03-11   -0.785657
2001-03-12    1.577549
2001-03-13    0.226279
2001-03-14   -0.686674
2001-03-15   -0.404360
2001-03-16   -0.702388
2001-03-17    0.371027
2001-03-18   -1.537937
2001-03-19    0.598941
2001-03-20   -0.171682
2001-03-21   -1.201228
2001-03-22   -0.655286
2001-03-23   -0.419676
2001-03-24    0.047703
2001-03-25   -0.485088
2001-03-26    0.616913
2001-03-27    0.720621
2001-03-28   -0.864739
2001-03-29   -0.949985
2001-03-30   -0.595773
2001-03-31   -0.410694
Freq: D, dtype: float64

In [45]:
# year, month, and day
longer_ts['2001-03-23']

-0.4196755485853978

Use `:` for slicing.

In [46]:
ts[datetime(2011, 1, 7):]

2011-01-07    0.037566
2011-01-08   -0.058270
2011-01-10    1.675804
2011-01-12    0.556646
dtype: float64
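
Slice endpoints do not have to be present in the index; date strings work as well. A minimal sketch with integer values in place of the random ones, using `.loc` for the label-based slice (an illustrative choice):

```python
import pandas as pd

dates = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07',
                        '2011-01-08', '2011-01-10', '2011-01-12'])
ts = pd.Series(range(6), index=dates)

# Neither endpoint exists in the index; both act as inclusive bounds.
window = ts.loc['2011-01-06':'2011-01-11']
print(window.tolist())  # [2, 3, 4]
```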

## Date Ranges, Frequencies, and Shifting
This section shows how to generate date ranges for time series, adjust their frequency, and shift them.

### Generating Date Ranges
`pd.date_range()` generates a date range. The `periods` parameter sets how many timestamps to generate, and the default frequency is daily (`freq='D'`).

In [47]:
pd.date_range(start='4/1/2012', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [48]:
pd.date_range(end='6/1/2012', periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')

### Setting Frequencies

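Combining a base frequency with the `freq` parameter produces DatetimeIndex objects at flexible intervals. See the [documentation](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) for the available aliases.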

In [49]:
pd.date_range('1/1/2000', '1/3/2000 23:59', freq='4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [50]:
pd.date_range('1/1/2000', periods=10, freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')

In [51]:
pd.date_range('1/1/2012', '9/1/2012', freq='WOM-3FRI')

DatetimeIndex(['2012-01-20', '2012-02-17', '2012-03-16', '2012-04-20',
               '2012-05-18', '2012-06-15', '2012-07-20', '2012-08-17'],
              dtype='datetime64[ns]', freq='WOM-3FRI')
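
Another alias from the same documentation page is `'B'` (business day), which skips weekends. A small sketch; the date range is an arbitrary example:

```python
import pandas as pd

# 2012-04-01 is a Sunday, so the range starts on Monday 2012-04-02.
bdays = pd.date_range('2012-04-01', '2012-04-10', freq='B')
print(len(bdays))  # 7
print(bdays[0])    # 2012-04-02 00:00:00
```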

### Shifting (Leading and Lagging) Data

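`shift()` moves data forward or backward by n periods. The `freq` parameter sets the time unit of the shift; when it is given, the index is shifted instead of the data, so no values are dropped.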

In [52]:
ts = pd.Series(np.random.randn(4), index=pd.date_range('1/1/2000', periods=4, freq='M'))
ts

2000-01-31   -0.778178
2000-02-29    1.168037
2000-03-31    1.405917
2000-04-30   -1.876510
Freq: M, dtype: float64

In [53]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -0.778178
2000-04-30    1.168037
Freq: M, dtype: float64

In [54]:
ts.shift(-2)

2000-01-31    1.405917
2000-02-29   -1.876510
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

In [55]:
ts.shift(2, freq='M')

2000-03-31   -0.778178
2000-04-30    1.168037
2000-05-31    1.405917
2000-06-30   -1.876510
Freq: M, dtype: float64

In [56]:
ts.shift(1, freq='90T')

2000-01-31 01:30:00   -0.778178
2000-02-29 01:30:00    1.168037
2000-03-31 01:30:00    1.405917
2000-04-30 01:30:00   -1.876510
Freq: M, dtype: float64
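
A common use of `shift()` is computing period-over-period changes. A minimal sketch with made-up values and a daily index (chosen to keep the example simple):

```python
import pandas as pd

ts = pd.Series([100.0, 110.0, 121.0],
               index=pd.date_range('2000-01-01', periods=3, freq='D'))

# Divide each value by the previous one; the first entry has no predecessor.
returns = ts / ts.shift(1) - 1

# With freq, the timestamps move and every value is kept.
shifted = ts.shift(1, freq='D')
print(shifted.index[0])  # 2000-01-02 00:00:00
```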

### Shifting Dates with Offsets

In [57]:
from pandas.tseries.offsets import Day, MonthEnd
now = datetime(2011, 11, 17)
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')

In [58]:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [59]:
offset = MonthEnd()
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')

In [60]:
offset.rollback(now)

Timestamp('2011-10-31 00:00:00')

## Time Zone Handling

When two time series in different time zones are added, the result is in [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time).

In [61]:
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-07 09:30:00    0.190879
2012-03-08 09:30:00    0.991060
2012-03-09 09:30:00    1.002618
2012-03-12 09:30:00    0.787331
2012-03-13 09:30:00    2.392298
2012-03-14 09:30:00   -0.662529
2012-03-15 09:30:00    0.776186
2012-03-16 09:30:00   -1.783516
2012-03-19 09:30:00   -0.235125
2012-03-20 09:30:00   -0.307027
Freq: B, dtype: float64

In [62]:
ts1 = ts[:7].tz_localize('Europe/London')
ts1

2012-03-07 09:30:00+00:00    0.190879
2012-03-08 09:30:00+00:00    0.991060
2012-03-09 09:30:00+00:00    1.002618
2012-03-12 09:30:00+00:00    0.787331
2012-03-13 09:30:00+00:00    2.392298
2012-03-14 09:30:00+00:00   -0.662529
2012-03-15 09:30:00+00:00    0.776186
Freq: B, dtype: float64

In [63]:
ts2 = ts1[2:].tz_convert('Europe/Moscow')
ts2

2012-03-09 13:30:00+04:00    1.002618
2012-03-12 13:30:00+04:00    0.787331
2012-03-13 13:30:00+04:00    2.392298
2012-03-14 13:30:00+04:00   -0.662529
2012-03-15 13:30:00+04:00    0.776186
Freq: B, dtype: float64

In [64]:
result = ts1 + ts2
result.index

DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='B')
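
The same behavior can be reproduced with fixed values, which makes the alignment easier to follow. A sketch assuming three daily timestamps; the series names are illustrative:

```python
import pandas as pd

rng = pd.date_range('2012-03-07 09:30', periods=3, freq='D')
ts = pd.Series([1.0, 2.0, 3.0], index=rng)

london = ts.tz_localize('Europe/London')      # attach a time zone
moscow = london.tz_convert('Europe/Moscow')   # same instants, other zone

# The timestamps refer to the same instants, so the values align one to one.
total = london + moscow
print(total.index.tz)  # UTC
print(total.tolist())  # [2.0, 4.0, 6.0]
```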

## Periods and Period Arithmetic

`pd.Period` creates an object that represents a time span. The example below creates an annual Period.

In [65]:
p = pd.Period(2007, freq='A-DEC')
p

Period('2007', 'A-DEC')

In [66]:
p + 5

Period('2012', 'A-DEC')

### Period Frequency Conversion

`asfreq()` converts a Period to another frequency. In the example, `p` is the year 2007, a year ending in December. `p.asfreq('M', how='start')` converts it to January 2007.

In [67]:
p = pd.Period('2007', freq='A-DEC')

In [68]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [69]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

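`A-JUN` denotes a year that ends in June, so July of the previous year through June of the current year belongs to the current year. August 2007 therefore falls in the year 2008.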

In [70]:
p = pd.Period('2007-08', 'M')
p

Period('2007-08', 'M')

In [71]:
p.asfreq('A-JUN')

Period('2008', 'A-JUN')
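
The same year-end logic applies to quarterly frequencies. As a sketch, `Q-JAN` denotes quarters of a fiscal year ending in January, so the fourth quarter of 2012 runs from November 2011 through January 2012:

```python
import pandas as pd

p = pd.Period('2012Q4', freq='Q-JAN')

# First and last day of the quarter.
first_day = p.asfreq('D', how='start')
last_day = p.asfreq('D', how='end')
print(first_day)  # 2011-11-01
print(last_day)   # 2012-01-31
```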

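The example below builds a Period for 4:00 PM on the second-to-last business day of the month (8/30). `p.asfreq('B', 'e')` is the last business day of the month (`'e'` means 'end'), and subtracting one gives 8/30. `.asfreq('T', 's')` is the first minute of that day (`'T'` is minutes, `'s'` means 'start'), that is 00:00. Adding 16 * 60 minutes then gives the target time.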

In [72]:
p4pm = (p.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60
p4pm

Period('2007-08-30 16:00', 'T')

In [73]:
p4pm.to_timestamp()

Timestamp('2007-08-30 16:00:00')

The same calculation applied to several quarters at once.

In [74]:
rng = pd.period_range('2011Q3', '2012Q4', freq='Q-JAN')
ts = pd.Series(np.arange(len(rng)), index=rng)
ts

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int64

In [75]:
new_rng = (rng.asfreq('B', 'e') - 1).asfreq('T', 's') + 16 * 60
ts.index = new_rng.to_timestamp()
ts

2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int64

### Converting Timestamps to Periods (and Back)

`to_period()` and `to_timestamp()` convert between Timestamps and Periods.

In [76]:
rng = pd.date_range('1/1/2000', periods=3, freq='M')
ts = pd.Series(len(rng), index=rng)
pts = ts.to_period()
pts

2000-01    3
2000-02    3
2000-03    3
Freq: M, dtype: int64

In [77]:
rng = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(len(rng), index=rng)
ts2.to_period('M')
pts = ts.to_period()
pts

2000-01    3
2000-02    3
2000-03    3
Freq: M, dtype: int64

In [78]:
pts.to_timestamp(how='end')

2000-01-31    3
2000-02-29    3
2000-03-31    3
Freq: M, dtype: int64

## Resampling and Frequency Conversion

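Resampling means converting a time series from one frequency to another (for example, daily to yearly or yearly to monthly). It covers high-to-low frequency conversion (downsampling), low-to-high frequency conversion (upsampling), and conversion between equal frequencies (for example, W-WED to W-MON).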

### Resampling

In [79]:
rng = pd.date_range('1/1/2000', periods=100, freq='D')
ts = pd.Series(len(rng), index=rng)
ts.resample('M').sum()

2000-01-31    3100
2000-02-29    2900
2000-03-31    3100
2000-04-30     900
Freq: M, dtype: int64

In [80]:
ts.resample('M', kind='period').mean()

2000-01    100
2000-02    100
2000-03    100
2000-04    100
Freq: M, dtype: int64

### Downsampling

In [81]:
rng = pd.date_range('1/1/2000', periods=12, freq='T')
ts = pd.Series(np.arange(12), index=rng)
ts.resample('5min').sum()

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int64

The `closed` parameter determines whether each interval is closed on the left or on the right.

In [82]:
ts.resample('5min', closed='right').sum()

1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5T, dtype: int64

In [83]:
ts.resample('5min', closed='left', label='left').sum()

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int64

In [84]:
ts.resample('5min', loffset='-1s').sum()

1999-12-31 23:59:59    10
2000-01-01 00:04:59    35
2000-01-01 00:09:59    21
Freq: 5T, dtype: int64

In [85]:
ts.resample('5min').ohlc()

|  | open | high | low | close |
| --- | --- | --- | --- | --- |
| 2000-01-01 00:00:00 | 0 | 4 | 0 | 4 |
| 2000-01-01 00:05:00 | 5 | 9 | 5 | 9 |
| 2000-01-01 00:10:00 | 10 | 11 | 10 | 11 |


### Upsampling and Interpolation

In [86]:
frame = pd.DataFrame(np.random.randn(2, 4),
                     index=pd.date_range('1/1/2000', periods=2, freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame[:5]

|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000-01-05 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-12 | -0.773318 | -0.389485 | -0.414758 | 0.594776 |


In [87]:
df_daily = frame.resample('D').mean()
df_daily

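|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000-01-05 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-06 | NaN | NaN | NaN | NaN |
| 2000-01-07 | NaN | NaN | NaN | NaN |
| 2000-01-08 | NaN | NaN | NaN | NaN |
| 2000-01-09 | NaN | NaN | NaN | NaN |
| 2000-01-10 | NaN | NaN | NaN | NaN |
| 2000-01-11 | NaN | NaN | NaN | NaN |
| 2000-01-12 | -0.773318 | -0.389485 | -0.414758 | 0.594776 |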


In [88]:
# forward-fill the NaN values with ffill()
frame.resample('D').ffill()

|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000-01-05 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-06 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-07 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-08 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-09 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-10 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-11 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-12 | -0.773318 | -0.389485 | -0.414758 | 0.594776 |


In [89]:
# use the limit parameter to cap how many consecutive periods are filled
frame.resample('D').ffill(limit=2)

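|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000-01-05 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-06 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-07 | -1.707509 | -0.128425 | -0.511688 | 1.155253 |
| 2000-01-08 | NaN | NaN | NaN | NaN |
| 2000-01-09 | NaN | NaN | NaN | NaN |
| 2000-01-10 | NaN | NaN | NaN | NaN |
| 2000-01-11 | NaN | NaN | NaN | NaN |
| 2000-01-12 | -0.773318 | -0.389485 | -0.414758 | 0.594776 |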
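
Forward filling repeats the last known value. As an alternative, the gaps can be filled by linear interpolation. A minimal sketch on a two-point weekly Series with made-up values, assuming `asfreq()` followed by `interpolate()` is an acceptable way to do this:

```python
import pandas as pd

weekly = pd.Series([0.0, 7.0],
                   index=pd.date_range('2000-01-05', periods=2, freq='W-WED'))

# Upsample to daily frequency; the new rows start out as NaN.
daily = weekly.resample('D').asfreq()

# Fill the NaN rows on a straight line between the two known values.
filled = daily.interpolate()
print(len(filled))  # 8
```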


### Resampling with Periods

Resampling also works on data indexed by Periods.

In [90]:
frame = pd.DataFrame(np.random.randn(24, 4),
                     index=pd.period_range('1-2000', '12-2001', freq='M'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame.head(5)

|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000-01 | 0.525083 | 0.100503 | -0.976826 | -0.036054 |
| 2000-02 | 0.530083 | 1.940982 | 0.150244 | -0.071071 |
| 2000-03 | -0.263763 | -0.119273 | -0.193876 | 1.924442 |
| 2000-04 | 1.253598 | 0.687396 | 0.665929 | 1.004739 |
| 2000-05 | -2.01788 | -1.097395 | 0.296826 | 0.9623 |


In [91]:
annual_frame = frame.resample('A-DEC').mean()
annual_frame

|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000 | -0.536321 | 0.463805 | 0.038944 | 0.16917 |
| 2001 | 0.305868 | 0.035899 | 0.069055 | -0.049398 |


In [92]:
annual_frame.resample('Q-DEC', convention='start').ffill()

|  | Colorado | Texas | New York | Ohio |
| --- | --- | --- | --- | --- |
| 2000Q1 | -0.536321 | 0.463805 | 0.038944 | 0.16917 |
| 2000Q2 | -0.536321 | 0.463805 | 0.038944 | 0.16917 |
| 2000Q3 | -0.536321 | 0.463805 | 0.038944 | 0.16917 |
| 2000Q4 | -0.536321 | 0.463805 | 0.038944 | 0.16917 |
| 2001Q1 | 0.305868 | 0.035899 | 0.069055 | -0.049398 |
| 2001Q2 | 0.305868 | 0.035899 | 0.069055 | -0.049398 |
| 2001Q3 | 0.305868 | 0.035899 | 0.069055 | -0.049398 |
| 2001Q4 | 0.305868 | 0.035899 | 0.069055 | -0.049398 |
