# Working with Time Series

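In financial data analysis, the data we encounter most often are financial time series.

Compared with ordinary cross-sectional data, time series have special requirements of their own, such as converting between data frequencies and handling trading versus non-trading days.

To meet these needs of financial time series analysis, pandas provides a dedicated set of operations for working with time.

Before learning these operations, however, we need to understand how Python stores time.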

## Python's time module: the datetime module

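Time is an important component of our database systems. The most convenient everyday representation of a date is an 8-character string or an 8-digit number.

For example, January 1, 2008 can be written as the string "20080101" or as the number 20080101.

However, a date stored as a plain number or string only lets us __distinguish__ dates. It does not let us __compute__ with them.

That is, different strings or numbers can stand for different dates, but we cannot perform date operations on them, such as shifting a date or converting between frequencies.

For example, to get the day after 20080131, simply adding 1 gives 20080132, which is not a valid date.

Writing our own functions to handle such cases would be tedious and unnecessary, because Python's datetime module already solves this problem well.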
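A minimal illustration of the problem above, using only the standard library: integer arithmetic on an 8-digit date produces an invalid value, while a datetime object rolls over into the next month correctly.

```python
import datetime

# An 8-digit integer only distinguishes dates; adding 1 ignores calendar rules
int_date = 20080131
next_int = int_date + 1  # 20080132, not a valid date

# A datetime object knows the calendar, so adding one day rolls into February
dt = datetime.datetime(2008, 1, 31)
next_dt = dt + datetime.timedelta(days=1)

print(next_int)                    # 20080132
print(next_dt.strftime("%Y%m%d"))  # 20080201
```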

First, import the datetime module:

In [62]:
import datetime

The datetime class inside the datetime module represents a point in time:

In [63]:
dt = datetime.datetime(2008, 1, 1)

#### Note: this is the datetime class inside the datetime module, accessed as datetime.datetime. Many beginners mistakenly type just datetime

#### datetime is the module
#### datetime.datetime is the class
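A short sketch of the two common import styles. Both are standard Python. The second binds the class directly, here under the alias `DT` (a name chosen only for this example), which is why the word "datetime" can refer to either the module or the class depending on how it was imported.

```python
import datetime                      # binds the module to the name datetime
from datetime import datetime as DT  # binds the class to the name DT

a = datetime.datetime(2008, 1, 1)    # module.class
b = DT(2008, 1, 1)                   # the same class, accessed directly

print(type(datetime).__name__)       # module
print(a == b)                        # True
```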

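The simplest way to construct a datetime is to call its constructor. (The datetime.datetime constructor also accepts hours, minutes, seconds and so on, but we rarely deal with high-frequency data, so the rest of this analysis never goes below daily frequency.)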

### Building datetime objects and converting to and from date strings

A datetime object can be built in the following ways:

1. Call the datetime.datetime constructor directly

In [64]:
dt = datetime.datetime(2008, 1, 1)
dt

datetime.datetime(2008, 1, 1, 0, 0)

In [65]:
dt.year

2008

In [66]:
dt.month

1

In [67]:
dt.day

1

2. Parse a date string according to a format

In [68]:
dt_str = "20090101"
dt = datetime.datetime.strptime(dt_str, "%Y%m%d")
dt

datetime.datetime(2009, 1, 1, 0, 0)

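A % followed by a letter is a format directive: %Y is a 4-digit year, %m a 2-digit month, and %d a 2-digit day of the month.

With the format %Y%m%d, the first 4 digits are read as the year, the next 2 as the month, and the next 2 as the day.  
If the string does not match the format, a ValueError is raised.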

In [69]:
dt_str = "20092-01-01"
dt = datetime.datetime.strptime(dt_str, "%Y-%m-%d")
dt

ValueError: time data '20092-01-01' does not match format '%Y-%m-%d'

In [70]:
dt_str = "2009-01-31"
dt = datetime.datetime.strptime(dt_str, "%Y-%m-%d")
dt

datetime.datetime(2009, 1, 31, 0, 0)
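When the input may contain malformed strings, one option is to catch the ValueError. The helper `parse_date` below is a hypothetical name used only for illustration; it returns None instead of raising.

```python
import datetime
from typing import Optional


def parse_date(s: str, fmt: str = "%Y%m%d") -> Optional[datetime.datetime]:
    """Parse s with fmt; return None if the string does not match the format."""
    try:
        return datetime.datetime.strptime(s, fmt)
    except ValueError:
        return None


print(parse_date("20090101"))                 # 2009-01-01 00:00:00
print(parse_date("20092-01-01", "%Y-%m-%d"))  # None (same mismatch as above)
print(parse_date("20090132"))                 # None (no 32nd day)
```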

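Besides these three, there are many other format directives:

%y Two-digit year (00-99)

%Y Four-digit year (0001-9999)

%m Month (01-12)

%d Day of the month (01-31)

%H Hour, 24-hour clock (00-23)

%I Hour, 12-hour clock (01-12)

%M Minute (00-59)

%S Second (00-59)

%a Locale's abbreviated weekday name (Thu)

%A Locale's full weekday name (Thursday)

%b Locale's abbreviated month name (Jan)

%B Locale's full month name (January)

%c Locale's date and time representation (Thu Jan  1 00:00:00 2009)

%j Day of the year (001-366)

%p Locale's equivalent of AM or PM

%U Week number of the year (00-53), with Sunday as the first day of the week

%w Weekday as a number (0-6), with 0 = Sunday

%W Week number of the year (00-53), with Monday as the first day of the week

%x Locale's date representation (01/31/09)

%X Locale's time representation (00:00:00)

%Z Time zone name

Most of these we will rarely need.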
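A few of the less common directives in action. The locale-dependent ones (%a, %A, %b, %B, %c, %x, %X) are left out because their output varies with the system locale.

```python
import datetime

dt = datetime.datetime(2009, 1, 31)

print(dt.strftime("%y"))              # 09  (two-digit year)
print(dt.strftime("%j"))              # 031 (day of the year)
print(dt.strftime("%w"))              # 6   (Saturday; 0 = Sunday)
print(dt.strftime("%Y/%m/%d %H:%M"))  # 2009/01/31 00:00
```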

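datetime.datetime.strptime converts a string into a datetime.datetime object.  
Conversely, the strftime method of a datetime.datetime object converts it into a string in a given format.  
Converting back and forth between strings and datetime.datetime objects like this is very common.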

In [71]:
dt

datetime.datetime(2009, 1, 31, 0, 0)

In [72]:
dt.strftime("%Y%m%d")

'20090131'

In [73]:
dt.strftime("%Y-%m-%d")

'2009-01-31'

In [74]:
dt.strftime("%Y-%m-%d %H:%M:%S")

'2009-01-31 00:00:00'

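For example, suppose we have a list of 8-character date strings (such as "20090101") and want to convert them into strings with the year, month and day separated by hyphens ("2009-01-01"):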

In [75]:
dt_str_list1 = ["20090101", "20090102", "20090103", "20090104", "20090105"]
dt_str_list2 = [datetime.datetime.strptime(dt_str, "%Y%m%d").strftime("%Y-%m-%d") for dt_str in dt_str_list1]
dt_str_list2

['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04', '2009-01-05']
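The list comprehension above can be wrapped in a small reusable function. `convert_format` is a hypothetical name, and its default formats are assumptions matching this example.

```python
import datetime


def convert_format(dates, src="%Y%m%d", dst="%Y-%m-%d"):
    """Re-format each date string from the src format to the dst format."""
    return [datetime.datetime.strptime(s, src).strftime(dst) for s in dates]


print(convert_format(["20090101", "20090102"]))              # ['2009-01-01', '2009-01-02']
print(convert_format(["2009-01-31"], "%Y-%m-%d", "%Y%m%d"))  # ['20090131']
```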

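### Arithmetic on datetime objects

To shift a date, use a datetime.timedelta object. datetime.timedelta(1) means one day, because the first positional argument is the number of days.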

In [76]:
dt

datetime.datetime(2009, 1, 31, 0, 0)

In [77]:
dt + datetime.timedelta(1)

datetime.datetime(2009, 2, 1, 0, 0)

In [78]:
dt + datetime.timedelta(5)

datetime.datetime(2009, 2, 5, 0, 0)

In [79]:
dt + datetime.timedelta(1)*2

datetime.datetime(2009, 2, 2, 0, 0)
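timedelta also accepts keyword arguments, and subtracting two datetimes yields a timedelta. Note that timedelta has no months argument, since months have different lengths.

```python
import datetime

dt = datetime.datetime(2009, 1, 31)

# Keyword arguments such as weeks, days and hours are supported
print(dt + datetime.timedelta(weeks=1))  # 2009-02-07 00:00:00
print(dt - datetime.timedelta(days=31))  # 2008-12-31 00:00:00

# The difference between two datetimes is a timedelta
gap = datetime.datetime(2009, 3, 1) - dt
print(gap.days)                          # 29
```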

## Time handling in pandas

In [80]:
import pandas as pd
import numpy as np

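When a collection made up entirely of datetime.datetime objects is converted into a pandas data structure (an Index or a Series), pandas recognizes the dates automatically and converts them to a dedicated type.

For example: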

In [81]:
dt_list = [datetime.datetime(2008, 1, 1), datetime.datetime(2008, 1, 2), datetime.datetime(2008, 1, 3), datetime.datetime(2008, 1, 4)]

In [82]:
dt_list

[datetime.datetime(2008, 1, 1, 0, 0),
 datetime.datetime(2008, 1, 2, 0, 0),
 datetime.datetime(2008, 1, 3, 0, 0),
 datetime.datetime(2008, 1, 4, 0, 0)]

In [83]:
pd.Series(dt_list)

0   2008-01-01
1   2008-01-02
2   2008-01-03
3   2008-01-04
dtype: datetime64[ns]

Note that the dtype of this Series is not one of the usual int64, float64 or object, but datetime64[ns], a dtype dedicated to storing timestamps.
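A Series with a datetime64 dtype exposes date components through the `.dt` accessor. A minimal sketch follows. The exact time unit in the dtype (for example `[ns]`) may differ between pandas versions, so the check only looks at the `datetime64` prefix.

```python
import datetime

import pandas as pd

ser = pd.Series([datetime.datetime(2008, 1, 1), datetime.datetime(2008, 2, 15)])

print(ser.dtype)                # a datetime64 dtype, e.g. datetime64[ns]
print(ser.dt.year.tolist())     # [2008, 2008]
print(ser.dt.month.tolist())    # [1, 2]
```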

In [84]:
pd.Index(dt_list)

DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03', '2008-01-04'], dtype='datetime64[ns]', freq=None)

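Converting to an Index automatically recognizes the dates and produces a dedicated DatetimeIndex. A DatetimeIndex is a kind of Index: it has all of Index's attributes and methods, plus extra attributes and methods specifically for handling time. (Technically, the DatetimeIndex class inherits from the Index class. If you are curious, look up class inheritance in Python; otherwise feel free to skip this.)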
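The inheritance relationship can be checked directly. The time-specific attributes used here (`year`, `is_month_start`) are covered in more detail later in this section.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2008-01-01", "2008-01-02"])

print(issubclass(pd.DatetimeIndex, pd.Index))  # True
print(isinstance(idx, pd.Index))               # True

# Time-specific attributes added on top of the generic Index
print(idx.year.tolist())                       # [2008, 2008]
print(idx.is_month_start.tolist())             # [True, False]
```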


Often, though, we start from a collection of date strings instead. For this case pandas provides the to_datetime function.

In [85]:
dt_str_list1 = ["20090101", "20090102", "20090103", "20090104", "20090105"]
dt_str_list2 = ["2009-01-01", "2009-01-02", "2009-01-03", "2009-01-04", "2009-01-05"]
dt_str_ser1 = pd.Series(dt_str_list1)
dt_str_ser2 = pd.Series(dt_str_list2)

In [86]:
pd.to_datetime(dt_str_list1)

DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04',
               '2009-01-05'],
              dtype='datetime64[ns]', freq=None)

In [87]:
pd.to_datetime(dt_str_list2)

DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04',
               '2009-01-05'],
              dtype='datetime64[ns]', freq=None)

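Passing a list returns a DatetimeIndex. (In most cases we convert dates into a pandas structure in order to use them as an index, so pd.to_datetime returns a DatetimeIndex for list input by default.)

Passing a Series, on the other hand, returns a Series: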

In [89]:
pd.to_datetime(dt_str_ser1)

0   2009-01-01
1   2009-01-02
2   2009-01-03
3   2009-01-04
4   2009-01-05
dtype: datetime64[ns]

In [90]:
pd.to_datetime(dt_str_ser2)

0   2009-01-01
1   2009-01-02
2   2009-01-03
3   2009-01-04
4   2009-01-05
dtype: datetime64[ns]

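The examples above show that date strings in the common "%Y%m%d" and "%Y-%m-%d" formats are recognized automatically.

To parse strings in some other format, pass the format argument.  
In the example below, the month and day are swapped, so the default parsing fails: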

In [91]:
dt_str_list3 = ["2009-01-01", "2009-02-01", "2009-03-01", "2009-04-01", "2009-05-01", "2009-31-01"]
pd.to_datetime(dt_str_list3)

ValueError: month must be in 1..12

Once format is specified, the strings are parsed correctly:

In [92]:
pd.to_datetime(dt_str_list3, format="%Y-%d-%m")

DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04',
               '2009-01-05', '2009-01-31'],
              dtype='datetime64[ns]', freq=None)
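When some strings may not match the format at all, the `errors="coerce"` argument of pd.to_datetime turns them into NaT (pandas' missing-time marker) instead of raising. The input list below is made up for illustration.

```python
import pandas as pd

raw = ["2009-01-01", "2009-13-01", "not a date"]

# Strings that cannot be parsed with the given format become NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed)
print(parsed.isna().sum())  # 2
```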

Note that the elements of a DatetimeIndex, or of a Series with dtype datetime64[ns], are no longer datetime.datetime objects but pandas Timestamp objects:

In [93]:
dt_list = [datetime.datetime(2008, 1, 1), datetime.datetime(2008, 1, 2), datetime.datetime(2008, 1, 3), datetime.datetime(2008, 1, 4)]
dt_ser = pd.Series(dt_list)

In [94]:
dt_list[0]

datetime.datetime(2008, 1, 1, 0, 0)

In [95]:
type(dt_list[0])

datetime.datetime

In [96]:
dt_ser[0]

Timestamp('2008-01-01 00:00:00')

In [97]:
type(dt_ser[0])

pandas._libs.tslibs.timestamps.Timestamp

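pandas.Timestamp is built on top of datetime.datetime (it is in fact a subclass of datetime.datetime), so every attribute and method of datetime.datetime, such as the year attribute and the strftime method, is also available on a Timestamp.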

In [98]:
time_stamp = dt_ser[0]
time_stamp

Timestamp('2008-01-01 00:00:00')

In [99]:
time_stamp.year

2008

In [100]:
time_stamp.strftime("%Y%m%d")

'20080101'

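For everyday use you can treat Timestamp and datetime.datetime as essentially the same thing, so don't let the distinction confuse you. pandas probably defines its own Timestamp class instead of using datetime.datetime directly for internal implementation reasons (for example, Timestamp supports nanosecond precision, whereas datetime.datetime stops at microseconds). We don't need the details.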
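The subclass relationship, and the conversion back to a plain datetime.datetime via `to_pydatetime`, can be checked directly:

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2008-01-01")

print(isinstance(ts, datetime.datetime))       # True: Timestamp subclasses datetime
py_dt = ts.to_pydatetime()                     # convert to a plain datetime.datetime
print(type(py_dt))                             # <class 'datetime.datetime'>
print(py_dt == datetime.datetime(2008, 1, 1))  # True
```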

pd.to_datetime is very versatile. Its argument does not have to be a list or a Series; a single date string works too:

In [101]:
pd.to_datetime("20080101")

Timestamp('2008-01-01 00:00:00')
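A Timestamp returned this way supports timedelta-style shifts, and pandas also offers calendar-aware offsets that handle month lengths, something a plain timedelta (which has no months argument) cannot do. This sketch uses `pd.Timedelta`, `pd.DateOffset` and `pd.offsets.MonthEnd`.

```python
import pandas as pd

ts = pd.to_datetime("20080131")

print(ts + pd.Timedelta(days=1))     # 2008-02-01 00:00:00
print(ts + pd.DateOffset(months=1))  # 2008-02-29 00:00:00 (clipped to month end)

# Roll forward to the end of the current month
print(pd.Timestamp("2008-02-10") + pd.offsets.MonthEnd())  # 2008-02-29 00:00:00
```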

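### An introduction to DatetimeIndex

Financial time series often use time as the index, and pandas provides a dedicated index whose elements are all timestamps (Timestamp objects): the DatetimeIndex.

There are two main ways to build a DatetimeIndex:  
1. Call the DatetimeIndex constructor directly, or use the to_datetime function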

In [102]:
dt_str_list = ["2009-01-01", "2009-01-02", "2009-01-03", "2009-01-04", "2009-01-05"]

In [103]:
pd.DatetimeIndex(dt_str_list)

DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04',
               '2009-01-05'],
              dtype='datetime64[ns]', freq=None)

In [104]:
pd.to_datetime(dt_str_list)

DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04',
               '2009-01-05'],
              dtype='datetime64[ns]', freq=None)

2. Generate it with the pd.date_range function

Given a start date and an end date:

In [105]:
pd.date_range("2008-01-01", "2008-01-05")

DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03', '2008-01-04',
               '2008-01-05'],
              dtype='datetime64[ns]', freq='D')

Given a start date and the number of periods (the length of the DatetimeIndex) to generate:

In [106]:
pd.date_range("2008-01-01", periods=6)

DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03', '2008-01-04',
               '2008-01-05', '2008-01-06'],
              dtype='datetime64[ns]', freq='D')
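date_range can also be anchored at its end date, and the related `pd.bdate_range` generates business days only. Note that bdate_range by default skips only weekends, not public holidays (January 1 is kept below), so it is not a trading calendar.

```python
import pandas as pd

# Anchor the range at its end date instead of its start
idx_end = pd.date_range(end="2008-01-05", periods=3)
print(idx_end.strftime("%Y-%m-%d").tolist())  # ['2008-01-03', '2008-01-04', '2008-01-05']

# Business days only: Monday to Friday, no holiday calendar
bidx = pd.bdate_range("2008-01-01", "2008-01-10")
print(len(bidx))                   # 8
print((bidx.dayofweek < 5).all())  # True
```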

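The freq argument changes the frequency. Common values include "m" (month end) and "w-mon" (weekly, on Mondays). In pandas 2.2 and later, the month-end alias is spelled "ME", and the older "M" spelling emits a deprecation warning.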

In [107]:
pd.date_range("2008-01-01", "2008-12-31", freq="m")

DatetimeIndex(['2008-01-31', '2008-02-29', '2008-03-31', '2008-04-30',
               '2008-05-31', '2008-06-30', '2008-07-31', '2008-08-31',
               '2008-09-30', '2008-10-31', '2008-11-30', '2008-12-31'],
              dtype='datetime64[ns]', freq='M')

In [108]:
pd.date_range("2008-01-01", "2008-12-31", freq="w-mon")

DatetimeIndex(['2008-01-07', '2008-01-14', '2008-01-21', '2008-01-28',
               '2008-02-04', '2008-02-11', '2008-02-18', '2008-02-25',
               '2008-03-03', '2008-03-10', '2008-03-17', '2008-03-24',
               '2008-03-31', '2008-04-07', '2008-04-14', '2008-04-21',
               '2008-04-28', '2008-05-05', '2008-05-12', '2008-05-19',
               '2008-05-26', '2008-06-02', '2008-06-09', '2008-06-16',
               '2008-06-23', '2008-06-30', '2008-07-07', '2008-07-14',
               '2008-07-21', '2008-07-28', '2008-08-04', '2008-08-11',
               '2008-08-18', '2008-08-25', '2008-09-01', '2008-09-08',
               '2008-09-15', '2008-09-22', '2008-09-29', '2008-10-06',
               '2008-10-13', '2008-10-20', '2008-10-27', '2008-11-03',
               '2008-11-10', '2008-11-17', '2008-11-24', '2008-12-01',
               '2008-12-08', '2008-12-15', '2008-12-22', '2008-12-29'],
              dtype='datetime64[ns]', freq='W-MON')

In [109]:
pd.date_range("2008-01-01", "2008-12-31", freq="w-thu")

DatetimeIndex(['2008-01-03', '2008-01-10', '2008-01-17', '2008-01-24',
               '2008-01-31', '2008-02-07', '2008-02-14', '2008-02-21',
               '2008-02-28', '2008-03-06', '2008-03-13', '2008-03-20',
               '2008-03-27', '2008-04-03', '2008-04-10', '2008-04-17',
               '2008-04-24', '2008-05-01', '2008-05-08', '2008-05-15',
               '2008-05-22', '2008-05-29', '2008-06-05', '2008-06-12',
               '2008-06-19', '2008-06-26', '2008-07-03', '2008-07-10',
               '2008-07-17', '2008-07-24', '2008-07-31', '2008-08-07',
               '2008-08-14', '2008-08-21', '2008-08-28', '2008-09-04',
               '2008-09-11', '2008-09-18', '2008-09-25', '2008-10-02',
               '2008-10-09', '2008-10-16', '2008-10-23', '2008-10-30',
               '2008-11-06', '2008-11-13', '2008-11-20', '2008-11-27',
               '2008-12-04', '2008-12-11', '2008-12-18', '2008-12-25'],
              dtype='datetime64[ns]', freq='W-THU')

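The available freq values are listed below and cover most needs. More specialized schedules (for example, the second Friday that is a trading day in January, April, July and October of each year) require a custom definition, which is more involved; ask me if you are interested.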

![image.png](attachment:image.png)

![image.png](attachment:image.png)
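Two more aliases from the table, as a quick sketch: "MS" (month start) and "W-FRI" (weekly, anchored on Fridays). These two were picked partly on the assumption that their spelling is unaffected by the month-end alias change mentioned above.

```python
import pandas as pd

month_starts = pd.date_range("2008-01-01", "2008-06-30", freq="MS")
fridays = pd.date_range("2008-01-01", "2008-01-31", freq="W-FRI")

print(month_starts.strftime("%Y-%m-%d").tolist())
# ['2008-01-01', '2008-02-01', '2008-03-01', '2008-04-01', '2008-05-01', '2008-06-01']
print(fridays.strftime("%Y-%m-%d").tolist())
# ['2008-01-04', '2008-01-11', '2008-01-18', '2008-01-25']
```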

### Indexing with a DatetimeIndex

In [44]:
df = pd.DataFrame(np.random.randn(60, 5),
             index=pd.date_range("2010-01-01", periods=60),
             columns=["000001.SZ", "000002.SZ", "000003.SZ", "000004.SZ", "000005.SZ"])

In [45]:
df

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-01-01 | -0.655183 | -0.321354 | -0.307227 | -1.269616 | 0.424147 |
| 2010-01-02 | 1.876814 | -1.823527 | 0.504216 | -1.301826 | -0.271545 |
| 2010-01-03 | 0.387241 | -0.575357 | 0.149791 | -0.978833 | -1.574962 |
| 2010-01-04 | 0.472071 | 0.482131 | 1.501535 | 0.052723 | -1.268342 |
| 2010-01-05 | 1.817518 | -0.129989 | -0.076084 | 0.577732 | 0.162989 |
| 2010-01-06 | -0.363124 | -0.914696 | -0.447371 | -2.105871 | -0.533368 |
| 2010-01-07 | -0.004841 | -0.36309 | -0.888242 | -0.240832 | -1.215483 |
| 2010-01-08 | 0.607764 | -0.612366 | 0.534068 | -0.338073 | 2.307136 |
| 2010-01-09 | 0.533676 | -0.233975 | -0.62375 | 0.613589 | -1.032501 |
| 2010-01-10 | -0.745895 | -0.126659 | 0.381362 | 1.577201 | -1.758944 |


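Indexing an object that has a DatetimeIndex is somewhat special, because pandas adds conveniences to the indexing methods.

For example, suppose we want to select the row for February 28, 2010.

Although the elements of our DatetimeIndex are actually Timestamps, there is no need to construct a Timestamp first and then pass it in for indexing (although doing so works):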

In [46]:
dt_str = "2010-02-28"
dt_ts = pd.to_datetime(dt_str)
dt_ts

Timestamp('2010-02-28 00:00:00')

In [47]:
df.loc[dt_ts, :]

000001.SZ   -0.388485
000002.SZ   -0.824086
000003.SZ   -0.188841
000004.SZ   -0.606532
000005.SZ   -0.302887
Name: 2010-02-28 00:00:00, dtype: float64

In fact, passing the date string directly is enough: when indexing with a DatetimeIndex, pandas automatically converts the string to a Timestamp before looking it up.

In [48]:
df.loc["2010-02-28", :]

000001.SZ   -0.388485
000002.SZ   -0.824086
000003.SZ   -0.188841
000004.SZ   -0.606532
000005.SZ   -0.302887
Name: 2010-02-28 00:00:00, dtype: float64

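Time series also support partial-string indexing. For example, all the data for February 2010 can be selected as follows, and selecting a whole year works the same way: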

In [49]:
df.loc["2010-02", :]

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-02-01 | -0.57414 | 0.500905 | 0.917342 | -0.205741 | 0.847811 |
| 2010-02-02 | -0.966259 | 1.308249 | 1.973468 | 0.807602 | 1.790949 |
| 2010-02-03 | 0.742764 | -1.135275 | 0.817245 | 0.847522 | 0.817661 |
| 2010-02-04 | 0.264431 | 0.426858 | -0.20721 | -0.357112 | 0.009452 |
| 2010-02-05 | -0.113237 | 0.515417 | -0.3539 | -0.250245 | 0.715823 |
| 2010-02-06 | -0.027496 | 0.957665 | 0.287068 | -0.608993 | 1.60315 |
| 2010-02-07 | 0.444012 | 0.357529 | 1.106892 | -0.192699 | -0.408744 |
| 2010-02-08 | 0.126679 | -0.208966 | -0.44589 | -0.04653 | -0.868133 |
| 2010-02-09 | 0.457656 | -2.216723 | 0.214021 | -0.626066 | 0.710364 |
| 2010-02-10 | -0.474248 | 1.392622 | 0.625108 | 0.219291 | 0.47089 |
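Date strings can also be used as slice endpoints, and both endpoints are included. This sketch rebuilds a smaller random frame with the same 60-day index (two columns instead of five) to show the row counts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(60, 2),
                  index=pd.date_range("2010-01-01", periods=60),
                  columns=["000001.SZ", "000002.SZ"])

# Slice between two dates; both endpoints are included
window = df.loc["2010-01-05":"2010-01-08"]
print(window.shape)   # (4, 2)

# Partial strings: one month, or a range of months
feb = df.loc["2010-02"]
print(feb.shape)      # (28, 2)
jan_feb = df.loc["2010-01":"2010-02"]
print(jan_feb.shape)  # (59, 2)
```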


### Useful attributes and methods of DatetimeIndex

In [50]:
dt_index = pd.date_range("2008-01-01", "2008-01-10")

In [51]:
dt_index

DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03', '2008-01-04',
               '2008-01-05', '2008-01-06', '2008-01-07', '2008-01-08',
               '2008-01-09', '2008-01-10'],
              dtype='datetime64[ns]', freq='D')

In [52]:
dt_index.year

Int64Index([2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008], dtype='int64')

In [53]:
dt_index.month

Int64Index([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype='int64')

In [54]:
dt_index.day

Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64')

In [55]:
dt_index.is_month_start

array([ True, False, False, False, False, False, False, False, False,
       False])

We won't list them all here; the commonly used attributes are summarized in the table below:

![image.png](attachment:image.png)
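These attributes are array-like, so they can serve as boolean masks. A minimal sketch using `dayofweek` (Monday = 0, Sunday = 6) to keep only weekdays:

```python
import pandas as pd

dt_index = pd.date_range("2008-01-01", "2008-01-10")

print(dt_index.dayofweek.tolist())  # [1, 2, 3, 4, 5, 6, 0, 1, 2, 3]

# Keep Monday to Friday only
weekdays = dt_index[dt_index.dayofweek < 5]
print(len(weekdays))                # 8
```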

### Converting time series frequency with the resample method

First, let's generate a series of random daily returns:

In [56]:
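# The main difference between bdate_range and date_range is that bdate_range excludes weekends
# Since these are simulated returns, we ignore the actual trading calendar (e.g. holidays) for now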
return_df = pd.DataFrame(np.random.randn(700, 5),
             index=pd.bdate_range("2010-01-01", periods=700),
             columns=["000001.SZ", "000002.SZ", "000003.SZ", "000004.SZ", "000005.SZ"])/100

In [57]:
return_df

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-01-01 | 0.011923 | -0.006798 | 0.004778 | 0.003055 | 0.001436 |
| 2010-01-04 | -0.015966 | 0.018609 | 0.017955 | 0.000695 | -0.012176 |
| 2010-01-05 | 0.005992 | -0.006899 | -0.019676 | -0.013499 | -0.001734 |
| 2010-01-06 | -0.006502 | -0.004100 | 0.027375 | -0.001137 | -0.007803 |
| 2010-01-07 | -0.006654 | 0.000966 | -0.001974 | 0.008638 | 0.005917 |
| 2010-01-08 | -0.010284 | -0.009244 | -0.006362 | -0.003045 | -0.001738 |
| 2010-01-11 | -0.008316 | 0.003304 | -0.009015 | -0.001449 | -0.000391 |
| 2010-01-12 | 0.001681 | 0.001795 | 0.005839 | -0.005130 | 0.003725 |
| 2010-01-13 | -0.006363 | 0.008805 | -0.003120 | 0.002292 | -0.000788 |
| 2010-01-14 | 0.002740 | -0.012595 | 0.000643 | -0.009280 | 0.025203 |


Applying cumprod to (1 + return_df) gives the net asset value (NAV), with the NAV on the trading day before January 1, 2010 set to 1:

In [58]:
nv_df = (1 + return_df).cumprod()
nv_df

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-01-01 | 1.011923 | 0.993202 | 1.004778 | 1.003055 | 1.001436 |
| 2010-01-04 | 0.995767 | 1.011685 | 1.022818 | 1.003753 | 0.989243 |
| 2010-01-05 | 1.001734 | 1.004706 | 1.002693 | 0.990203 | 0.987528 |
| 2010-01-06 | 0.995220 | 1.000587 | 1.030142 | 0.989078 | 0.979822 |
| 2010-01-07 | 0.988598 | 1.001553 | 1.028109 | 0.997621 | 0.985620 |
| 2010-01-08 | 0.978432 | 0.992295 | 1.021568 | 0.994583 | 0.983908 |
| 2010-01-11 | 0.970295 | 0.995573 | 1.012359 | 0.993142 | 0.983522 |
| 2010-01-12 | 0.971926 | 0.997360 | 1.018270 | 0.988047 | 0.987186 |
| 2010-01-13 | 0.965742 | 1.006141 | 1.015093 | 0.990312 | 0.986408 |
| 2010-01-14 | 0.968389 | 0.993469 | 1.015746 | 0.981121 | 1.011269 |
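A tiny hand-checkable version of the same computation, with made-up returns of +10%, -10% and +5%. It also shows that `pct_change` recovers the daily returns from the NAV. The first row has no prior value, so it comes back as NaN.

```python
import pandas as pd

returns = pd.Series([0.10, -0.10, 0.05],
                    index=pd.bdate_range("2010-01-04", periods=3))

nav = (1 + returns).cumprod()       # NAV relative to a starting value of 1
print(nav.round(6).tolist())        # [1.1, 0.99, 1.0395]

# pct_change undoes cumprod, except for the first row (NaN)
back = nav.pct_change()
print(back.round(6).tolist()[1:])   # [-0.1, 0.05]
```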


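To convert NAV data from daily to monthly frequency, we usually just take the NAV on the last trading day of each month as that month's NAV. Note that resample labels each row with the calendar month end (e.g. 2010-01-31), even when the last observation in the month falls earlier (here 2010-01-29).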

In [59]:
nv_df.resample("m").last()

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-01-31 | 0.96828 | 1.013085 | 1.023747 | 0.965107 | 0.995436 |
| 2010-02-28 | 0.899993 | 1.014233 | 0.969755 | 0.96013 | 1.007542 |
| 2010-03-31 | 0.969803 | 0.971505 | 1.018302 | 0.901023 | 1.039641 |
| 2010-04-30 | 0.973192 | 0.902567 | 1.096964 | 0.910085 | 1.106918 |
| 2010-05-31 | 0.939599 | 0.841835 | 1.061787 | 0.993191 | 1.002989 |
| 2010-06-30 | 0.908706 | 0.872803 | 0.934434 | 0.972775 | 0.970913 |
| 2010-07-31 | 0.862549 | 0.920502 | 0.892108 | 0.994319 | 0.879233 |
| 2010-08-31 | 0.862082 | 0.909 | 0.982592 | 0.996697 | 0.907708 |
| 2010-09-30 | 0.901177 | 0.894078 | 0.953046 | 1.036889 | 0.875995 |
| 2010-10-31 | 0.874573 | 0.901366 | 0.980267 | 0.960864 | 0.849154 |
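An alternative sketch of the same month-end snapshot uses groupby over monthly periods. It labels rows as periods (2010-01) instead of calendar month-end dates, and it avoids the month-end resample alias that changed in newer pandas. The small NAV series is hypothetical.

```python
import pandas as pd

nav = pd.Series([1.00, 1.02, 0.99, 1.01, 1.03],
                index=pd.to_datetime(["2010-01-28", "2010-01-29",
                                      "2010-02-01", "2010-02-25", "2010-02-26"]))

# Group daily NAV by calendar month and keep each month's last observation
monthly_nav = nav.groupby(nav.index.to_period("M")).last()
print(monthly_nav.tolist())                    # [1.02, 1.03]
print([str(p) for p in monthly_nav.index])     # ['2010-01', '2010-02']
```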


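To convert returns from daily to monthly frequency, we need each month's compounded return rather than the sum of the daily returns, i.e. (1+r1)(1+r2)...(1+rn)-1, where n is the number of trading days in that month: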

In [60]:
(1+return_df).resample("m").prod()-1

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-01-31 | -0.03172 | 0.013085 | 0.023747 | -0.034893 | -0.004564 |
| 2010-02-28 | -0.070524 | 0.001132 | -0.05274 | -0.005157 | 0.012162 |
| 2010-03-31 | 0.077568 | -0.042128 | 0.050061 | -0.061561 | 0.031859 |
| 2010-04-30 | 0.003494 | -0.07096 | 0.077249 | 0.010057 | 0.064713 |
| 2010-05-31 | -0.034518 | -0.067288 | -0.032068 | 0.091317 | -0.093891 |
| 2010-06-30 | -0.032879 | 0.036787 | -0.119941 | -0.020556 | -0.031981 |
| 2010-07-31 | -0.050794 | 0.05465 | -0.045296 | 0.022147 | -0.094427 |
| 2010-08-31 | -0.000541 | -0.012496 | 0.101427 | 0.002392 | 0.032386 |
| 2010-09-30 | 0.04535 | -0.016416 | -0.030069 | 0.040325 | -0.034937 |
| 2010-10-31 | -0.029522 | 0.008152 | 0.028562 | -0.07332 | -0.030641 |
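A hand-checkable comparison of compounding versus simple summation, on hypothetical daily returns (two days in January, two in February). Monthly grouping is done with `to_period("M")` rather than a resample alias.

```python
import pandas as pd

ret = pd.Series([0.10, 0.10, -0.05, 0.02],
                index=pd.to_datetime(["2010-01-28", "2010-01-29",
                                      "2010-02-01", "2010-02-02"]))

months = ret.index.to_period("M")
compound = (1 + ret).groupby(months).prod() - 1  # (1+r1)(1+r2)...(1+rn) - 1
simple_sum = ret.groupby(months).sum()           # ignores compounding

print(compound.round(6).tolist())    # [0.21, -0.031]
print(simple_sum.round(6).tolist())  # [0.2, -0.03]
```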


You may notice that this works much like groupby. Besides last and prod, which we just used, many other aggregation methods are available:

![image.png](attachment:image.png)
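A few of those aggregations on one resampler, as a sketch. It uses a made-up NAV of 0 to 9 over two weeks of business days, and weekly "W-FRI" bins to keep the example small. `ohlc` returns open/high/low/close for each bin.

```python
import pandas as pd

# Two weeks of business days: Mon 2010-01-04 to Fri 2010-01-15
nav = pd.Series(range(10), index=pd.bdate_range("2010-01-04", periods=10), dtype=float)

weekly = nav.resample("W-FRI")
print(weekly.last().tolist())  # [4.0, 9.0]
print(weekly.mean().tolist())  # [2.0, 7.0]
print(weekly.ohlc())           # columns: open, high, low, close
```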

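resample also supports apply, which accepts a custom function.

As usual, a handy trick is to add print() inside the function passed to apply, so we can see exactly what resample passes in: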

In [111]:
def myfun(x):
    print(type(x))
    print(x, "\n\n")
    return 1
nv_df.resample("m").apply(myfun)

<class 'pandas.core.series.Series'>
2010-01-01    1.011923
2010-01-04    0.995767
2010-01-05    1.001734
2010-01-06    0.995220
2010-01-07    0.988598
2010-01-08    0.978432
2010-01-11    0.970295
2010-01-12    0.971926
2010-01-13    0.965742
2010-01-14    0.968389
2010-01-15    0.959405
2010-01-18    0.963959
2010-01-19    0.967649
2010-01-20    0.967309
2010-01-21    0.968756
2010-01-22    0.969487
2010-01-25    0.974736
2010-01-26    0.976396
2010-01-27    0.966264
2010-01-28    0.965948
2010-01-29    0.968280
Name: 000001.SZ, dtype: float64 


<class 'pandas.core.series.Series'>
2010-02-01    0.959901
2010-02-02    0.967313
2010-02-03    0.969284
2010-02-04    0.950945
2010-02-05    0.951262
2010-02-08    0.931572
2010-02-09    0.943808
2010-02-10    0.942223
2010-02-11    0.944740
2010-02-12    0.925727
2010-02-15    0.922469
2010-02-16    0.920297
2010-02-17    0.913297
2010-02-18    0.910871
2010-02-19    0.913013
2010-02-22    0.904233
2010-02-23    0.903352
2010-02-24    0.894

| | 000001.SZ | 000002.SZ | 000003.SZ | 000004.SZ | 000005.SZ |
|---|---|---|---|---|---|
| 2010-01-31 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-02-28 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-03-31 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-04-30 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-05-31 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-06-30 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-07-31 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-08-31 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-09-30 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2010-10-31 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


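As the output shows, resample splits each column into chunks by the time frequency and passes each chunk to the function as a Series. Writing a suitable function lets you implement any custom aggregation.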
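For example, here is a custom function that counts the number of up days in each period. `count_up_days` is a hypothetical name, and the returns are made up. Weekly "W-FRI" bins keep the example small; the same pattern applies to monthly resampling.

```python
import pandas as pd


def count_up_days(x: pd.Series) -> int:
    """Receives one column's returns for one period; returns how many were positive."""
    return int((x > 0).sum())


ret = pd.DataFrame(
    {"000001.SZ": [0.01, -0.02, 0.03, 0.01, -0.01, 0.02, 0.02, -0.03, 0.01, 0.00],
     "000002.SZ": [-0.01] * 10},
    index=pd.bdate_range("2010-01-04", periods=10),
)

# apply passes each column, one weekly chunk at a time, as a Series
up_days = ret.resample("W-FRI").apply(count_up_days)
print(up_days)
```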