# Window calculations

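The rolling method of a DataFrame object places the data in a sliding window so that a function can be applied to the values inside each window. For example, given the recent prices of a stock, producing its 5-day and 10-day moving averages means setting up a window first and then aggregating over it. The third-party library pandas-datareader can fetch the data of a given stock over a time period, as shown below.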

In [1]:
# Install the third-party library pandas-datareader
%pip install pandas-datareader

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

In [5]:
# Use get_data_stooq from pandas-datareader to fetch recent Baidu (ticker: BIDU) stock data from Stooq.
import pandas_datareader as pdr
baidu_df = pdr.get_data_stooq('BIDU', start='2021-11-22', end='2022-11-27')

In [6]:
baidu_df

Date,Open,High,Low,Close,Volume
2022-11-25,93.51,95.410,93.11,93.36,1991505
2022-11-23,96.16,98.680,95.27,97.00,3442465
2022-11-22,93.04,97.945,91.88,95.08,4005018
2022-11-21,94.00,95.565,92.94,94.56,2669885
2022-11-18,95.61,97.960,95.00,95.97,2954754
...,...,...,...,...,...
2021-11-29,153.00,153.000,148.80,150.29,3745624
2021-11-26,148.52,154.450,147.89,153.06,3267102
2021-11-24,148.93,151.580,146.89,151.39,2663238
2021-11-23,147.90,151.620,147.76,150.49,3376812


In [7]:
baidu_df.sort_index(inplace=True)
baidu_df

Date,Open,High,Low,Close,Volume
2021-11-22,152.00,152.470,146.89,147.81,4423414
2021-11-23,147.90,151.620,147.76,150.49,3376812
2021-11-24,148.93,151.580,146.89,151.39,2663238
2021-11-26,148.52,154.450,147.89,153.06,3267102
2021-11-29,153.00,153.000,148.80,150.29,3745624
...,...,...,...,...,...
2022-11-18,95.61,97.960,95.00,95.97,2954754
2022-11-21,94.00,95.565,92.94,94.56,2669885
2022-11-22,93.04,97.945,91.88,95.08,4005018
2022-11-23,96.16,98.680,95.27,97.00,3442465


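The DataFrame above has five columns, Open, High, Low, Close and Volume, which hold the stock's opening price, highest price, lowest price, closing price and trading volume respectively. Next we run window calculations on the Baidu stock data.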

In [8]:
baidu_df['Open']

Date
2021-11-22    152.00
2021-11-23    147.90
2021-11-24    148.93
2021-11-26    148.52
2021-11-29    153.00
               ...  
2022-11-18     95.61
2022-11-21     94.00
2022-11-22     93.04
2022-11-23     96.16
2022-11-25     93.51
Name: Open, Length: 255, dtype: float64

In [9]:
baidu_df.Close.rolling(5).mean()

Date
2021-11-22        NaN
2021-11-23        NaN
2021-11-24        NaN
2021-11-26        NaN
2021-11-29    150.608
               ...   
2022-11-18     96.290
2022-11-21     96.912
2022-11-22     95.988
2022-11-23     96.438
2022-11-25     95.194
Name: Close, Length: 255, dtype: float64

In [8]:
baidu_df.rolling(5).mean()

Date,Open,High,Low,Close,Volume
2021-11-22,,,,,
2021-11-23,,,,,
2021-11-24,,,,,
2021-11-26,,,,,
2021-11-29,150.070,152.62400,147.6460,150.608,3495238.0
...,...,...,...,...,...
2022-11-18,94.926,98.26478,93.2205,96.290,3697115.8
2022-11-21,95.490,98.81378,93.9145,96.912,3668509.8
2022-11-22,94.572,98.04000,93.0545,95.988,3244204.4
2022-11-23,94.066,98.00800,93.3005,96.438,3339233.2


baidu_df[['Open','High','Low','Close']].plot(logy=True)
baidu_df[['Open','High','Low','Close']].rolling(5).mean().plot(logy=True)
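
As a minimal sketch of how the window behaves, the example below uses a short made-up price series (not the Baidu data) with a window of 3 rows. The `min_periods` argument does not appear in the cells above; it is shown here only as one way to avoid the leading NaN values.

```python
import pandas as pd

# Made-up closing prices, for illustration only
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0],
                  index=pd.date_range('2022-01-03', periods=5, freq='D'))

# A window of 3 rows: the first two results are NaN because the window is not full yet
ma3 = close.rolling(3).mean()

# min_periods=1 lets a partially filled window produce a value
ma3_partial = close.rolling(3, min_periods=1).mean()

print(ma3.tolist())          # [nan, nan, 11.0, 12.0, 13.0]
print(ma3_partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0]
```

The same pattern with windows of 5 and 10 rows gives the 5-day and 10-day moving averages mentioned at the start of this section.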

### Assessing correlation

The Pearson correlation coefficient $\rho$ of two variables $X$ and $Y$ is computed as follows.



$$
\rho = \frac {\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})} {\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}
$$

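When we use the value of $\rho$ to judge how two indicators are correlated, we follow two steps.

1. Decide whether the indicators are positively correlated, negatively correlated, or uncorrelated.
    - When $\rho \gt 0$, the variables are positively correlated, i.e. they trend in the same direction.
    - When $\rho \lt 0$, the variables are negatively correlated, i.e. they trend in opposite directions.
    - When $\rho = 0$, the variables are uncorrelated, which does not mean that the two indicators are statistically independent.
2. Decide how strong the correlation is.
    - When the absolute value of $\rho$ is in $[0.6, 1]$, the variables are strongly correlated.
    - When the absolute value of $\rho$ is in $[0.1, 0.6)$, the variables are weakly correlated.
    - When the absolute value of $\rho$ is in $[0, 0.1)$, the variables are considered uncorrelated.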
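
The formula can be checked numerically. The sketch below evaluates it term by term on a few made-up paired observations (illustrative values, not from any dataset in this section) and compares the result with `Series.corr`, which computes the Pearson coefficient by default.

```python
import numpy as np
import pandas as pd

# Made-up paired observations, for illustration only
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.8])

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Numerator: sum of products of deviations; denominator: product of the two root sums of squares
rho_manual = (dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum()))

# pandas computes the same quantity
rho_pandas = x.corr(y)

print(round(rho_manual, 6), round(rho_pandas, 6))
```

Both values agree and are close to 1, so under the thresholds above these two made-up variables would count as strongly positively correlated.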

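The Pearson correlation coefficient is appropriate when:

 1. The two variables are linearly related and both are continuous.
 2. The populations of both variables are normally distributed, or unimodal and close to normal.
 3. The observations come in pairs, and the pairs are independent of one another.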

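The `cov` and `corr` methods of a `DataFrame` compute covariance and correlation coefficients respectively. The first parameter of `corr`, `method`, defaults to `pearson`, which computes the Pearson correlation coefficient; it also accepts `kendall` or `spearman` for the Kendall coefficient or the Spearman rank correlation coefficient.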
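
To illustrate why the choice of `method` matters, here is a small sketch on made-up data in which `y` rises monotonically with `x` but not linearly. The column names and values are assumptions for illustration only; `kendall` is left out to keep the example short.

```python
import pandas as pd

# y grows monotonically with x, but the relationship is cubic rather than linear
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6]})
df['y'] = df['x'] ** 3

pearson = df.corr().loc['x', 'y']                    # measures linear association
spearman = df.corr(method='spearman').loc['x', 'y']  # measures agreement of ranks

print(pearson, spearman)
print(df.cov())  # covariance matrix of the two columns
```

The Spearman coefficient comes out as 1 because the ranks of `x` and `y` match exactly, while the Pearson coefficient is below 1 because the points do not lie on a straight line.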

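Next, we load the well-known [Boston housing dataset](https://www.heywhale.com/mw/dataset/590bd595812ede32b73f55f2) from a file named `boston_house_price.csv` into a `DataFrame`, and use the `corr` method to find which of the `13` factors that may affect house prices are positively or negatively correlated with the price, as shown below.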

In [15]:
boston_df = pd.read_csv('res/boston_house_price.csv')

In [16]:
boston_df.corr()

,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,PRICE
CRIM,1.0,-0.200469,0.406583,-0.055892,0.420972,-0.219247,0.352734,-0.37967,0.625505,0.582764,0.289946,-0.385064,0.455621,-0.388305
ZN,-0.200469,1.0,-0.533828,-0.042697,-0.516604,0.311991,-0.569537,0.664408,-0.311948,-0.314563,-0.391679,0.17552,-0.412995,0.360445
INDUS,0.406583,-0.533828,1.0,0.062938,0.763651,-0.391676,0.644779,-0.708027,0.595129,0.72076,0.383248,-0.356977,0.6038,-0.483725
CHAS,-0.055892,-0.042697,0.062938,1.0,0.091203,0.091251,0.086518,-0.099176,-0.007368,-0.035587,-0.121515,0.048788,-0.053929,0.17526
NOX,0.420972,-0.516604,0.763651,0.091203,1.0,-0.302188,0.73147,-0.76923,0.611441,0.668023,0.188933,-0.380051,0.590879,-0.427321
RM,-0.219247,0.311991,-0.391676,0.091251,-0.302188,1.0,-0.240265,0.205246,-0.209847,-0.292048,-0.355501,0.128069,-0.613808,0.69536
AGE,0.352734,-0.569537,0.644779,0.086518,0.73147,-0.240265,1.0,-0.747881,0.456022,0.506456,0.261515,-0.273534,0.602339,-0.376955
DIS,-0.37967,0.664408,-0.708027,-0.099176,-0.76923,0.205246,-0.747881,1.0,-0.494588,-0.534432,-0.232471,0.291512,-0.496996,0.249929
RAD,0.625505,-0.311948,0.595129,-0.007368,0.611441,-0.209847,0.456022,-0.494588,1.0,0.910228,0.464741,-0.444413,0.488676,-0.381626
TAX,0.582764,-0.314563,0.72076,-0.035587,0.668023,-0.292048,0.506456,-0.534432,0.910228,1.0,0.460853,-0.441808,0.543993,-0.468536


In [40]:
boston_df.corr('spearman')

,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,PRICE
CRIM,1.0,-0.57166,0.735524,0.041537,0.821465,-0.309116,0.70414,-0.744986,0.727807,0.729045,0.465283,-0.360555,0.63476,-0.558891
ZN,-0.57166,1.0,-0.642811,-0.041937,-0.634828,0.361074,-0.544423,0.614627,-0.278767,-0.371394,-0.448475,0.163135,-0.490074,0.438179
INDUS,0.735524,-0.642811,1.0,0.089841,0.791189,-0.415301,0.679487,-0.75708,0.455507,0.664361,0.43371,-0.28584,0.638747,-0.578255
CHAS,0.041537,-0.041937,0.089841,1.0,0.068426,0.058813,0.067792,-0.080248,0.024579,-0.044486,-0.136065,-0.03981,-0.050575,0.140612
NOX,0.821465,-0.634828,0.791189,0.068426,1.0,-0.310344,0.795153,-0.880015,0.586429,0.649527,0.391309,-0.296662,0.636828,-0.562609
RM,-0.309116,0.361074,-0.415301,0.058813,-0.310344,1.0,-0.278082,0.263168,-0.107492,-0.271898,-0.312923,0.05366,-0.640832,0.633576
AGE,0.70414,-0.544423,0.679487,0.067792,0.795153,-0.278082,1.0,-0.80161,0.417983,0.526366,0.355384,-0.228022,0.657071,-0.547562
DIS,-0.744986,0.614627,-0.75708,-0.080248,-0.880015,0.263168,-0.80161,1.0,-0.495806,-0.574336,-0.322041,0.249595,-0.564262,0.445857
RAD,0.727807,-0.278767,0.455507,0.024579,0.586429,-0.107492,0.417983,-0.495806,1.0,0.704876,0.31833,-0.282533,0.394322,-0.346776
TAX,0.729045,-0.371394,0.664361,-0.044486,0.649527,-0.271898,0.526366,-0.574336,0.704876,1.0,0.453345,-0.329843,0.534423,-0.562411


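In Notebook or JupyterLab, we can add a color gradient to the PRICE column so that the colors show at a glance which columns are negatively correlated, positively correlated, or uncorrelated with the house price. The background_gradient method of the DataFrame's style attribute does this, as shown below.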

In [17]:
boston_df.corr('spearman').style.background_gradient('RdYlBu', subset=['PRICE'])

,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,PRICE
CRIM,1.0,-0.57166,0.735524,0.041537,0.821465,-0.309116,0.70414,-0.744986,0.727807,0.729045,0.465283,-0.360555,0.63476,-0.558891
ZN,-0.57166,1.0,-0.642811,-0.041937,-0.634828,0.361074,-0.544423,0.614627,-0.278767,-0.371394,-0.448475,0.163135,-0.490074,0.438179
INDUS,0.735524,-0.642811,1.0,0.089841,0.791189,-0.415301,0.679487,-0.75708,0.455507,0.664361,0.43371,-0.28584,0.638747,-0.578255
CHAS,0.041537,-0.041937,0.089841,1.0,0.068426,0.058813,0.067792,-0.080248,0.024579,-0.044486,-0.136065,-0.03981,-0.050575,0.140612
NOX,0.821465,-0.634828,0.791189,0.068426,1.0,-0.310344,0.795153,-0.880015,0.586429,0.649527,0.391309,-0.296662,0.636828,-0.562609
RM,-0.309116,0.361074,-0.415301,0.058813,-0.310344,1.0,-0.278082,0.263168,-0.107492,-0.271898,-0.312923,0.05366,-0.640832,0.633576
AGE,0.70414,-0.544423,0.679487,0.067792,0.795153,-0.278082,1.0,-0.80161,0.417983,0.526366,0.355384,-0.228022,0.657071,-0.547562
DIS,-0.744986,0.614627,-0.75708,-0.080248,-0.880015,0.263168,-0.80161,1.0,-0.495806,-0.574336,-0.322041,0.249595,-0.564262,0.445857
RAD,0.727807,-0.278767,0.455507,0.024579,0.586429,-0.107492,0.417983,-0.495806,1.0,0.704876,0.31833,-0.282533,0.394322,-0.346776
TAX,0.729045,-0.371394,0.664361,-0.044486,0.649527,-0.271898,0.526366,-0.574336,0.704876,1.0,0.453345,-0.329843,0.534423,-0.562411


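The colors of the RdYlBu colormap used above are shown below. It runs from red through yellow to blue: the smaller a correlation coefficient (toward -1), the closer its color is to red; the larger it is (toward 1), the closer to blue; values in the middle of the range are yellow. By default background_gradient scales the colors to the minimum and maximum of the selected column, so yellow marks the middle of the observed values, which is only roughly 0 here.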

In [18]:
import matplotlib.pyplot as plt
plt.get_cmap('RdYlBu')
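
Which end of RdYlBu is red can be confirmed by sampling the colormap. The sketch below assumes we want the colors pinned to the full correlation range of -1 to 1 and uses `Normalize` from matplotlib for that mapping; it prints the RGB components at -1, 0 and 1.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap('RdYlBu')
norm = Normalize(vmin=-1, vmax=1)  # map correlation values in [-1, 1] onto [0, 1]

for rho in (-1.0, 0.0, 1.0):
    r, g, b, _ = cmap(norm(rho))
    # Red dominates at -1, red and green (yellow) at 0, blue at 1
    print(rho, round(r, 2), round(g, 2), round(b, 2))
```

Passing `vmin=-1` and `vmax=1` to `background_gradient` should pin the table colors to the same range, so that yellow sits at 0 regardless of the values in the column.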

## Using Index

Now let's look at the Index type, which provides indexing for Series and DataFrame objects. The commonly used kinds of Index are as follows.

### Range index (RangeIndex)

In [20]:
import numpy as np

In [21]:
sales_data = np.random.randint(400, 1000, 12)
month_index = pd.RangeIndex(1, 13, name='月份')
ser = pd.Series(data=sales_data, index=month_index)
ser

月份
1     590
2     723
3     759
4     612
5     639
6     778
7     827
8     742
9     847
10    433
11    561
12    438
dtype: int32

In [22]:
month_index

RangeIndex(start=1, stop=13, step=1, name='月份')

### Categorical index (CategoricalIndex)

In [23]:
cate_index = pd.CategoricalIndex(['苹果', '香蕉', '苹果', '苹果', '桃子', '香蕉'],
                                 ordered=True,
                                 categories=['苹果', '香蕉', '桃子'])
amount = np.random.randint(6, 11, 6)
ser = pd.Series(data=amount, index=cate_index)
ser

苹果    9
香蕉    7
苹果    8
苹果    8
桃子    8
香蕉    7
dtype: int32

In [32]:
cate_index

CategoricalIndex(['苹果', '香蕉', '苹果', '苹果', '桃子', '香蕉'], categories=['苹果', '香蕉', '桃子'], ordered=True, dtype='category')

In [35]:
ser.groupby(level=0).sum()

苹果    25
香蕉    14
桃子     8
dtype: int32

### Multi-level index (MultiIndex)

In [27]:
np.arange(1001, 1006)

array([1001, 1002, 1003, 1004, 1005])

In [28]:
ids = np.arange(1001, 1006)
sms = ['期中', '期末']
index = pd.MultiIndex.from_product((ids, sms), names=['学号', '学期'])
courses = ['语文', '数学', '英语']
scores = np.random.randint(60, 101, (10, 3))
df = pd.DataFrame(data=scores, columns=courses, index=index)

In [29]:
index

MultiIndex([(1001, '期中'),
            (1001, '期末'),
            (1002, '期中'),
            (1002, '期末'),
            (1003, '期中'),
            (1003, '期末'),
            (1004, '期中'),
            (1004, '期末'),
            (1005, '期中'),
            (1005, '期末')],
           names=['学号', '学期'])

In [30]:
df

学号,学期,语文,数学,英语
1001,期中,75,80,86
1001,期末,68,70,63
1002,期中,100,65,60
1002,期末,62,76,66
1003,期中,77,89,66
1003,期末,94,68,93
1004,期中,92,60,85
1004,期末,84,65,65
1005,期中,91,77,85
1005,期末,99,84,86


Note: the code above uses the class method from_product of MultiIndex, which builds the multi-level index from the Cartesian product of ids and sms.
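
The Cartesian-product behaviour can be seen on a smaller scale. The sketch below uses two ids and English level names (`id`, `term`), which are assumptions for illustration, and builds the same index a second time from explicit tuples produced by `itertools.product`.

```python
from itertools import product

import pandas as pd

ids = [1001, 1002]
terms = ['midterm', 'final']

# from_product builds every (id, term) combination, in the order itertools.product yields them
index = pd.MultiIndex.from_product((ids, terms), names=['id', 'term'])

# The same index built explicitly from a list of tuples
same_index = pd.MultiIndex.from_tuples(list(product(ids, terms)), names=['id', 'term'])

print(index.tolist())
print(index.equals(same_index))
```

The index has `len(ids) * len(terms)` entries, which is why the score array in the cell above needs 10 rows for 5 ids and 2 terms.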

In [32]:
for i in df.groupby(level=0):
    print(i)

(1001,          语文  数学  英语
学号   学期            
1001 期中  75  80  86
     期末  68  70  63)
(1002,           语文  数学  英语
学号   学期             
1002 期中  100  65  60
     期末   62  76  66)
(1003,          语文  数学  英语
学号   学期            
1003 期中  77  89  66
     期末  94  68  93)
(1004,          语文  数学  英语
学号   学期            
1004 期中  92  60  85
     期末  84  65  65)
(1005,          语文  数学  英语
学号   学期            
1005 期中  91  77  85
     期末  99  84  86)


In [48]:
# Compute each student's overall score as a weighted average: midterm 25%, final 75%.
df.groupby(level=0).agg(lambda x: x.values[0] * 0.25 + x.values[1] * 0.75)

学号,语文,数学,英语
1001,69.75,72.5,68.75
1002,71.5,73.25,64.5
1003,89.75,73.25,86.25
1004,86.0,63.75,70.0
1005,97.0,82.25,85.75
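
The lambda above depends on the midterm row coming before the final row within each group. As an alternative sketch, each term can be selected by label with `xs` and the weights applied explicitly. The frame below is made up for illustration, with English labels standing in for the Chinese ones.

```python
import pandas as pd

index = pd.MultiIndex.from_product(([1001, 1002], ['midterm', 'final']),
                                   names=['id', 'term'])
scores = pd.DataFrame({'math': [80, 70, 65, 76], 'english': [86, 63, 60, 66]},
                      index=index)

# Select each term by label (this drops the 'term' level), then combine with explicit weights
weighted = (scores.xs('midterm', level='term') * 0.25
            + scores.xs('final', level='term') * 0.75)
print(weighted)
```

Because the two selections are aligned on the remaining `id` level, the result does not change if the rows are stored in a different order.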


### Datetime index (DatetimeIndex)

The date_range() function creates a datetime index, as shown below.

In [33]:
pd.date_range('2021-1-1', '2021-6-1', periods=10)

DatetimeIndex(['2021-01-01 00:00:00', '2021-01-17 18:40:00',
               '2021-02-03 13:20:00', '2021-02-20 08:00:00',
               '2021-03-09 02:40:00', '2021-03-25 21:20:00',
               '2021-04-11 16:00:00', '2021-04-28 10:40:00',
               '2021-05-15 05:20:00', '2021-06-01 00:00:00'],
              dtype='datetime64[ns]', freq=None)

In [34]:
pd.date_range('2021-1-1', '2021-6-1', freq='W')

DatetimeIndex(['2021-01-03', '2021-01-10', '2021-01-17', '2021-01-24',
               '2021-01-31', '2021-02-07', '2021-02-14', '2021-02-21',
               '2021-02-28', '2021-03-07', '2021-03-14', '2021-03-21',
               '2021-03-28', '2021-04-04', '2021-04-11', '2021-04-18',
               '2021-04-25', '2021-05-02', '2021-05-09', '2021-05-16',
               '2021-05-23', '2021-05-30'],
              dtype='datetime64[ns]', freq='W-SUN')

With the DateOffset type we can define a time offset and use it in arithmetic with a DatetimeIndex, as shown below.

In [37]:
pd.DateOffset(days=2)

<DateOffset: days=2>

In [40]:
index = pd.date_range('2021-1-1', '2021-6-1', freq='W')
index - pd.DateOffset(days=2)

DatetimeIndex(['2021-01-01', '2021-01-08', '2021-01-15', '2021-01-22',
               '2021-01-29', '2021-02-05', '2021-02-12', '2021-02-19',
               '2021-02-26', '2021-03-05', '2021-03-12', '2021-03-19',
               '2021-03-26', '2021-04-02', '2021-04-09', '2021-04-16',
               '2021-04-23', '2021-04-30', '2021-05-07', '2021-05-14',
               '2021-05-21', '2021-05-28'],
              dtype='datetime64[ns]', freq=None)

In [41]:
index

DatetimeIndex(['2021-01-03', '2021-01-10', '2021-01-17', '2021-01-24',
               '2021-01-31', '2021-02-07', '2021-02-14', '2021-02-21',
               '2021-02-28', '2021-03-07', '2021-03-14', '2021-03-21',
               '2021-03-28', '2021-04-04', '2021-04-11', '2021-04-18',
               '2021-04-25', '2021-05-02', '2021-05-09', '2021-05-16',
               '2021-05-23', '2021-05-30'],
              dtype='datetime64[ns]', freq='W-SUN')

In [42]:
index + pd.DateOffset(days=2)

DatetimeIndex(['2021-01-05', '2021-01-12', '2021-01-19', '2021-01-26',
               '2021-02-02', '2021-02-09', '2021-02-16', '2021-02-23',
               '2021-03-02', '2021-03-09', '2021-03-16', '2021-03-23',
               '2021-03-30', '2021-04-06', '2021-04-13', '2021-04-20',
               '2021-04-27', '2021-05-04', '2021-05-11', '2021-05-18',
               '2021-05-25', '2021-06-01'],
              dtype='datetime64[ns]', freq=None)
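
A DateOffset is calendar-aware, which matters for units such as months. The sketch below, on three made-up month-end dates, contrasts adding a fixed 30-day `Timedelta` with adding `DateOffset(months=1)`; neither call appears in the cells above.

```python
import pandas as pd

month_ends = pd.DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31'])

# A fixed-length Timedelta always adds exactly 30 days
plus_30_days = month_ends + pd.Timedelta(days=30)

# A calendar-aware DateOffset adds one month and clips to the last valid day of that month
plus_1_month = month_ends + pd.DateOffset(months=1)

print(plus_30_days.strftime('%Y-%m-%d').tolist())  # ['2021-03-02', '2021-03-30', '2021-04-30']
print(plus_1_month.strftime('%Y-%m-%d').tolist())  # ['2021-02-28', '2021-03-28', '2021-04-30']
```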

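When the index is a DatetimeIndex, the following methods can be used to process the data:

shift() method: moves the data forward or backward along the time axis. We again use the Baidu stock data from above, as shown below.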

In [43]:
baidu_df.shift(3, fill_value=9)

Date,Open,High,Low,Close,Volume
2021-11-22,9.00,9.0000,9.0000,9.00,9
2021-11-23,9.00,9.0000,9.0000,9.00,9
2021-11-24,9.00,9.0000,9.0000,9.00,9
2021-11-26,152.00,152.4700,146.8900,147.81,4423414
2021-11-29,147.90,151.6200,147.7600,150.49,3376812
...,...,...,...,...,...
2022-11-18,97.63,101.8139,96.1800,99.70,6126545
2022-11-21,98.69,98.8400,94.0400,94.75,2967321
2022-11-22,91.52,99.8900,91.4125,99.58,3624044
2022-11-23,95.61,97.9600,95.0000,95.97,2954754


In [44]:
baidu_df.shift(-1, fill_value=0)

Date,Open,High,Low,Close,Volume
2021-11-22,147.90,151.620,147.76,150.49,3376812
2021-11-23,148.93,151.580,146.89,151.39,2663238
2021-11-24,148.52,154.450,147.89,153.06,3267102
2021-11-26,153.00,153.000,148.80,150.29,3745624
2021-11-29,149.12,151.450,147.01,149.84,4815568
...,...,...,...,...,...
2022-11-18,94.00,95.565,92.94,94.56,2669885
2022-11-21,93.04,97.945,91.88,95.08,4005018
2022-11-22,96.16,98.680,95.27,97.00,3442465
2022-11-23,93.51,95.410,93.11,93.36,1991505
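
A common use of shift() is comparing each row with the previous one. The sketch below computes day-over-day returns on a made-up closing-price series (not the Baidu data) and checks them against `pct_change`, the built-in equivalent.

```python
import numpy as np
import pandas as pd

# Made-up closing prices, for illustration only
close = pd.Series([100.0, 102.0, 99.96, 104.958],
                  index=pd.date_range('2022-01-03', periods=4, freq='D'))

# shift(1) moves every value down one row, so each date lines up with the previous close
previous_close = close.shift(1)
daily_return = close / previous_close - 1

print(daily_return.round(4).tolist())  # [nan, 0.02, -0.02, 0.05]
print(np.allclose(daily_return, close.pct_change(), equal_nan=True))  # True
```

The first value is NaN because there is no earlier row to compare with, which is the gap that `fill_value` fills in the cells above.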


asfreq() method: samples the data at a given time frequency, as shown below.

In [45]:
baidu_df.asfreq('5D')  # take one row every five days

Date,Open,High,Low,Close,Volume
2021-11-22,152.00,152.470,146.8900,147.81,4423414.0
2021-11-27,,,,,
2021-12-02,148.40,151.550,145.2000,148.96,4979653.0
2021-12-07,150.18,151.610,147.1000,149.89,4327345.0
2021-12-12,,,,,
...,...,...,...,...,...
2022-11-02,80.27,81.250,77.8400,78.13,4342411.0
2022-11-07,87.70,88.700,84.1450,85.02,3040988.0
2022-11-12,,,,,
2022-11-17,91.52,99.890,91.4125,99.58,3624044.0


In [78]:
baidu_df.asfreq('5D', method='ffill')

Date,Open,High,Low,Close,Volume
2021-11-22,152.00,152.470,146.8900,147.81,4423414
2021-11-27,148.52,154.450,147.8900,153.06,3267102
2021-12-02,148.40,151.550,145.2000,148.96,4979653
2021-12-07,150.18,151.610,147.1000,149.89,4327345
2021-12-12,148.00,150.560,147.3100,149.34,2932520
...,...,...,...,...,...
2022-11-02,80.27,81.250,77.8400,78.13,4342411
2022-11-07,87.70,88.700,84.1450,85.02,3040988
2022-11-12,88.76,89.920,86.5000,89.46,3587437
2022-11-17,91.52,99.890,91.4125,99.58,3624044
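
The NaN rows and the effect of `method='ffill'` are easier to follow on a small series. The sketch below uses made-up values on ten business days, so weekends are missing just as in the stock data, and samples every two days.

```python
import pandas as pd

# Made-up values on business days only (2022-01-03 is a Monday), so weekends are absent
prices = pd.Series(range(10), index=pd.bdate_range('2022-01-03', periods=10), dtype='float64')

every_2d = prices.asfreq('2D')                        # dates with no observation become NaN
every_2d_ffill = prices.asfreq('2D', method='ffill')  # use the last earlier observation instead

print(every_2d)
print(every_2d_ffill)
```

Here 2022-01-09 is a Sunday, so it is NaN in the first result and takes Friday's value (4.0) in the second.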


resample() method: resamples the data by time, which amounts to grouping the data by time period, as shown below.

In [80]:
baidu_df.resample('1M').mean()

Date,Open,High,Low,Close,Volume
2021-11-30,149.911667,152.428333,147.54,150.48,3715293.0
2021-12-31,143.122955,146.941814,140.987927,144.595455,3856177.0
2022-01-31,151.83425,155.50147,148.75563,152.183,3498542.0
2022-02-28,157.680263,161.643947,155.390863,158.938947,2688915.0
2022-03-31,143.395217,148.291517,138.510143,142.973043,6411250.0
2022-04-30,130.03525,132.49225,126.30183,128.803,3579267.0
2022-05-31,121.388571,124.888419,118.335552,121.821429,3322147.0
2022-06-30,145.988095,148.762329,143.06691,145.682857,3442716.0
2022-07-31,143.9165,146.410655,140.96503,144.106,2078316.0
2022-08-31,137.376087,140.525,134.869565,137.872174,2556926.0


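In the code above, W stands for one week, 5D for five days, and 1M for one month. In pandas 2.2 and later the month-end alias is ME, and M is deprecated.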
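
Since resampling behaves like a group-by over time periods, different columns can be aggregated differently within each period. The sketch below works on a made-up ten-day frame (the column names mirror the stock data, the values do not) and averages Close while summing Volume per week.

```python
import pandas as pd

idx = pd.date_range('2022-01-03', periods=10, freq='D')  # starts on a Monday
df = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 20.0, 21.0, 22.0],
                   'Volume': [100] * 10}, index=idx)

# Each weekly bin ends on Sunday ('W' is an alias for 'W-SUN');
# Close is averaged and Volume is summed within each bin
weekly = df.resample('W').agg({'Close': 'mean', 'Volume': 'sum'})
print(weekly)
```

The first bin covers the seven days up to Sunday 2022-01-09 and the second holds the remaining three days, labelled by the Sunday that closes it.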

#### Time zone conversion

In [83]:
import pytz
pytz.common_timezones

['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', 'Africa/Asmara', ...]

tz_localize() method: localizes the datetimes to a time zone.

In [85]:
baidu_df = baidu_df.tz_localize('Asia/Chongqing')
baidu_df

Date,Open,High,Low,Close,Volume
2021-11-22 00:00:00+08:00,152.00,152.470,146.89,147.81,4423414
2021-11-23 00:00:00+08:00,147.90,151.620,147.76,150.49,3376812
2021-11-24 00:00:00+08:00,148.93,151.580,146.89,151.39,2663238
2021-11-26 00:00:00+08:00,148.52,154.450,147.89,153.06,3267102
2021-11-29 00:00:00+08:00,153.00,153.000,148.80,150.29,3745624
...,...,...,...,...,...
2022-11-18 00:00:00+08:00,95.61,97.960,95.00,95.97,2954754
2022-11-21 00:00:00+08:00,94.00,95.565,92.94,94.56,2669885
2022-11-22 00:00:00+08:00,93.04,97.945,91.88,95.08,4005018
2022-11-23 00:00:00+08:00,96.16,98.680,95.27,97.00,3442465


tz_convert() method: converts to another time zone.

In [None]:
baidu_df.tz_convert('America/New_York')
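
The difference between the two methods can be sketched on a tiny series. The zone names `Asia/Shanghai` and `America/New_York` are chosen here for illustration; any name from `pytz.common_timezones` should work the same way.

```python
import pandas as pd

naive = pd.date_range('2021-11-22', periods=3, freq='D')
ser = pd.Series([1, 2, 3], index=naive)

# tz_localize attaches a time zone to naive timestamps without changing the wall-clock time
shanghai = ser.tz_localize('Asia/Shanghai')

# tz_convert keeps the same instants but expresses them in another zone
new_york = shanghai.tz_convert('America/New_York')

print(shanghai.index[0])  # 2021-11-22 00:00:00+08:00
print(new_york.index[0])  # 2021-11-21 11:00:00-05:00
```

tz_convert only works on an index that already carries a time zone, which is why the localized result is assigned back to `baidu_df` before converting.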