# Tick to Bar

## References

高国乐 (Gao Guole) explains this well in a video on Bilibili: [CTP K线合成推送1](https://www.bilibili.com/video/BV1Mt4y1Q7pe) (CTP bar synthesis and push, part 1).

## Conversion

Converting ticks to bars can be quite simple: pandas' `resample` does most of the work. The prerequisite is that the DataFrame's index is a `DatetimeIndex`.

For example:

```python
from pathlib import Path

import pandas as pd

from src.utility import DATA_PATH  # project-specific module providing the data directory

# Data file to load.
data_file: str = 'SHFE.al2111_Tick.csv'

# Build the full path as a Path object.
data_path: Path = DATA_PATH.joinpath(data_file)

# Load the tick data into a DataFrame.
# parse_dates=['datetime'] parses the datetime column as datetime64 while loading.
# index_col=['datetime'] makes that column the index, so it becomes a DatetimeIndex.
df: pd.DataFrame = pd.read_csv(data_path, parse_dates=['datetime'], index_col=['datetime'])

# Show df.
print(df)
```

```text
                                  datetime_nano  last_price  highest   lowest  \
datetime                                                                        
2021-06-30 18:51:42.300000  1625050302300000000         NaN      NaN      NaN   
2021-06-30 20:59:00.500000  1625057940500000000     18780.0  18780.0  18780.0   
2021-06-30 21:00:00.500000  1625058000500000000     18780.0  18780.0  18780.0   
2021-06-30 21:00:01.000000  1625058001000000000     18780.0  18780.0  18780.0   
2021-06-30 21:00:01.500000  1625058001500000000     18780.0  18785.0  18780.0   
...                                         ...         ...      ...      ...   
2021-09-25 00:59:58.000000  1632502798000000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:58.500000  1632502798500000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:59.000000  1632502799000000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:59.500000  1632502799500000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:59.500001  
```
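To try the loading step without the project's `DATA_PATH` or the real tick file, here is a minimal sketch that reads a small in-memory CSV instead. The three rows are made up; only the column names `datetime`, `last_price` and `volume` mirror the real file. It shows that `parse_dates` plus `index_col` on the same column yields a `DatetimeIndex`.

```python
import io

import pandas as pd

# Made-up ticks standing in for SHFE.al2111_Tick.csv (values are illustrative only).
csv_text = """datetime,last_price,volume
2021-06-30 21:00:00.500,18780.0,10
2021-06-30 21:00:01.000,18780.0,12
2021-06-30 21:00:01.500,18785.0,15
"""

# parse_dates converts the column to datetime64; index_col moves it into the index.
ticks = pd.read_csv(io.StringIO(csv_text), parse_dates=['datetime'], index_col=['datetime'])

print(type(ticks.index).__name__)  # DatetimeIndex
print(ticks)
```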

If the `DatetimeIndex` was not set at load time, the column can be moved into the index afterwards:

```python
df: pd.DataFrame = pd.read_csv(data_path, parse_dates=['datetime'])
print(df)
```

```text
                          datetime        datetime_nano  last_price  highest  \
0       2021-06-30 18:51:42.300000  1625050302300000000         NaN      NaN   
1       2021-06-30 20:59:00.500000  1625057940500000000     18780.0  18780.0   
2       2021-06-30 21:00:00.500000  1625058000500000000     18780.0  18780.0   
3       2021-06-30 21:00:01.000000  1625058001000000000     18780.0  18780.0   
4       2021-06-30 21:00:01.500000  1625058001500000000     18780.0  18785.0   
...                            ...                  ...         ...      ...   
1468857 2021-09-25 00:59:58.000000  1632502798000000000     22945.0  23150.0   
1468858 2021-09-25 00:59:58.500000  1632502798500000000     22945.0  23150.0   
1468859 2021-09-25 00:59:59.000000  1632502799000000000     22945.0  23150.0   
1468860 2021-09-25 00:59:59.500000  1632502799500000000     22945.0  23150.0   
1468861 2021-09-25 00:59:59.500001  1632502799500001000     22945.0  23150.0   

          lowest  volume        amount 
```

```python
df.set_index('datetime', inplace=True)
print(df)
```

```text
                                  datetime_nano  last_price  highest   lowest  \
datetime                                                                        
2021-06-30 18:51:42.300000  1625050302300000000         NaN      NaN      NaN   
2021-06-30 20:59:00.500000  1625057940500000000     18780.0  18780.0  18780.0   
2021-06-30 21:00:00.500000  1625058000500000000     18780.0  18780.0  18780.0   
2021-06-30 21:00:01.000000  1625058001000000000     18780.0  18780.0  18780.0   
2021-06-30 21:00:01.500000  1625058001500000000     18780.0  18785.0  18780.0   
...                                         ...         ...      ...      ...   
2021-09-25 00:59:58.000000  1632502798000000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:58.500000  1632502798500000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:59.000000  1632502799000000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:59.500000  1632502799500000000     22945.0  23150.0  22855.0   
2021-09-25 00:59:59.500001  
```
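The following sketch, on a few made-up ticks, illustrates why the index matters: with the default `RangeIndex`, `resample` raises a `TypeError`. It also covers the case where the timestamp column was read as plain strings (no `parse_dates`), which needs `pd.to_datetime` before `set_index`. The variable names `ticks`, `resample_ok` and `bars` are illustrative only.

```python
import pandas as pd

# Made-up ticks with the timestamp as an ordinary string column (RangeIndex).
ticks = pd.DataFrame({
    'datetime': ['2021-06-30 21:00:00.5', '2021-06-30 21:00:01.0', '2021-06-30 21:01:00.5'],
    'last_price': [18780.0, 18785.0, 18790.0],
})

# Without a DatetimeIndex, resample refuses to run.
try:
    ticks['last_price'].resample('1min')
    resample_ok = True
except TypeError as exc:
    resample_ok = False
    print('resample failed:', exc)

# Parse the strings into datetime64, then move the column into the index.
ticks['datetime'] = pd.to_datetime(ticks['datetime'])
ticks = ticks.set_index('datetime')

# Now resampling works: one OHLC row per minute.
bars = ticks['last_price'].resample('1min').ohlc()
print(bars)
```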

Convert to bars:

```python
# Resample the tick prices into 1-minute OHLC bars ('1min' is the pandas alias for one minute).
df_1min = df['last_price'].resample('1min').ohlc()
print(df_1min)
```

```text
                        open     high      low    close
datetime                                               
2021-06-30 18:51:00      NaN      NaN      NaN      NaN
2021-06-30 18:52:00      NaN      NaN      NaN      NaN
2021-06-30 18:53:00      NaN      NaN      NaN      NaN
2021-06-30 18:54:00      NaN      NaN      NaN      NaN
2021-06-30 18:55:00      NaN      NaN      NaN      NaN
...                      ...      ...      ...      ...
2021-09-25 00:55:00  22955.0  22960.0  22945.0  22950.0
2021-09-25 00:56:00  22950.0  22960.0  22945.0  22950.0
2021-09-25 00:57:00  22950.0  22955.0  22950.0  22950.0
2021-09-25 00:58:00  22955.0  22960.0  22945.0  22955.0
2021-09-25 00:59:00  22950.0  22960.0  22945.0  22945.0

[124209 rows x 4 columns]
```
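To make explicit what `.ohlc()` computes, this sketch rebuilds the same bars with `agg`: open, high, low and close are the first, maximum, minimum and last tick price inside each 1-minute bin. The five ticks are made up for illustration.

```python
import pandas as pd

# A few made-up ticks spanning two minutes.
index = pd.to_datetime([
    '2021-06-30 21:00:00.5', '2021-06-30 21:00:30.0', '2021-06-30 21:00:59.5',
    '2021-06-30 21:01:10.0', '2021-06-30 21:01:45.5',
])
price = pd.Series([18780.0, 18795.0, 18785.0, 18790.0, 18775.0], index=index, name='last_price')

# Built-in OHLC aggregation.
bars_ohlc = price.resample('1min').ohlc()

# The same bars spelled out: first / max / min / last price in each bin.
bars_agg = price.resample('1min').agg(['first', 'max', 'min', 'last'])
bars_agg.columns = ['open', 'high', 'low', 'close']

same = bool((bars_ohlc.to_numpy() == bars_agg.to_numpy()).all())
print(bars_ohlc)
print(same)  # True
```

Spelling the aggregation out this way is mainly useful when a bar needs extra columns that `.ohlc()` does not produce.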


The bar data has been generated.

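As the output shows, after a plain `resample` the DataFrame contains many rows that fall outside trading hours, such as the all-NaN bars starting at 18:51.

This is because `resample` creates one bin for every interval of the sampling frequency (one minute in this example), from the first tick's timestamp through the last, whether or not any tick falls into it. Bins that receive no priced tick come out as NaN.

These rows must be filtered out.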
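As a first step toward that filtering, here is a hedged sketch on a few made-up ticks modelled on the head of the real file: a pre-open tick with no price, an auction tick at 20:59, and ticks in the night session. It shows the empty bins that `resample` creates and two simple filters: `dropna(how='all')` for bins without any priced tick, and `between_time` with a window that wraps past midnight. The window 21:00 to 00:59 is an assumption that only fits this sample; a real filter needs the contract's complete trading sessions, including the day session.

```python
import numpy as np
import pandas as pd

# Made-up ticks: pre-open (no price), auction at 20:59, then night-session ticks.
index = pd.to_datetime([
    '2021-06-30 18:51:42.3',
    '2021-06-30 20:59:00.5',
    '2021-06-30 21:00:00.5',
    '2021-06-30 21:00:30.0',
    '2021-06-30 21:01:10.0',
])
price = pd.Series([np.nan, 18780.0, 18780.0, 18785.0, 18790.0], index=index)

# One row per minute from 18:51 to 21:01, mostly NaN.
bars = price.resample('1min').ohlc()
print(len(bars))

# Simplest filter: drop bins in which no priced tick arrived.
non_empty = bars.dropna(how='all')
print(non_empty.index.strftime('%H:%M').tolist())  # ['20:59', '21:00', '21:01']

# dropna keeps the 20:59 auction bar, so a session filter is still needed.
# Hypothetical night-session window; start > end makes between_time wrap past midnight.
in_session = non_empty.between_time('21:00', '00:59')
print(in_session.index.strftime('%H:%M').tolist())  # ['21:00', '21:01']
```

Whether the 20:59 auction tick should be dropped or merged into the 21:00 bar is a design decision this sketch does not settle.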