# A Look at quantos Data

quantos provides a one-stop solution for quantitative research. In this article, we look at the data services quantos provides.

## Overall Architecture

![](https://raw.githubusercontent.com/PKUJohnson/LearnJaqsByExample/master/image/quantos_data.png)

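quantos currently provides the following data:

+ Basic data: fundamental descriptive information, including security information, industry codes, index information and the trading calendar.
+ Market data: data generated by market trading, including real-time quotes, real-time minute bars, historical ticks, historical daily bars and historical minute bars.
+ Reference data: including stock adjustment factors, dividends, trading suspensions and resumptions, industry classification, index constituents and mutual fund net asset values (NAVs).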


## Data Access

DataApi gives access to all of the research data quantos provides. You need to log in before using it:


```python
import os
from jaqs.data import DataApi

api   = DataApi(addr="tcp://data.quantos.org:8910")
phone = os.environ.get("QUANTOS_USER")
token = os.environ.get("QUANTOS_TOKEN")
df, msg = api.login(phone, token)
print(df, msg)
```

```text
username: 18612562791 0,
```


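### Market Data

Market data is divided into real-time data and historical data, which are retrieved through several different interfaces.

#### Real-Time Quote Snapshots

The quote function returns real-time quote snapshots for multiple securities in a single call.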


```python
df, msg = api.quote(
    symbol="000001.SH, cu1709.SHF",
    fields="open,high,low,last,volume"
)
df.head(10)
```

|   | date | high | last | low | open | symbol | time | volume |
|---|---|---|---|---|---|---|---|---|
| 000001.SH | 20180222 | 3269.9156 | 3268.5589 | 3234.1152 | 3237.5692 | 000001.SH | 150052000 | 138730445 |
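The snapshot encodes date and time as plain integers. Judging only from the single row above (`date=20180222`, `time=150052000`), `time` appears to use an HHMMSSmmm layout, i.e. 15:00:52.000. That layout is an inference, not documented behaviour. Note that the minute bars below use a shorter HHMMSS value such as `93500`. A minimal sketch of a hypothetical `to_datetime` helper that combines the pair into a `datetime`:

```python
from datetime import datetime


def to_datetime(date: int, time: int) -> datetime:
    """Combine a YYYYMMDD date and an HHMMSSmmm time (assumed layout) into a datetime."""
    year, rest = divmod(date, 10000)
    month, day = divmod(rest, 100)
    hhmmss, millis = divmod(time, 1000)      # split off the milliseconds
    hour, rest = divmod(hhmmss, 10000)
    minute, second = divmod(rest, 100)
    return datetime(year, month, day, hour, minute, second, millis * 1000)


print(to_datetime(20180222, 150052000))  # 2018-02-22 15:00:52
```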


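#### Real-Time Quote Subscription

The subscribe function subscribes to real-time quotes for multiple securities. Each new update is delivered to the user through the on_quote callback.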


```python
def on_quote(k, v):
    print(v['symbol'])  # security code
    print(v['last'])    # last traded price
    print(v['time'])    # time of the last trade

subs_list, msg = api.subscribe("000001.SH, cu1709.SHF", func=on_quote, fields="symbol,last,time,volume")
print(subs_list, msg)
```

```text
['000001.SH', 'cu1709.SHF'] 0,
```
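The callback above only prints each update. In practice you usually want to keep the most recent quote for each symbol so that other code can read it. Below is a minimal sketch with a hypothetical `QuoteCache` class, which is not part of jaqs. It assumes, as in the example, that `v` is a dict carrying at least `symbol`, and it ignores `k`. The lock guards against the callback running on a different thread from the reader, which is also an assumption about how updates are delivered.

```python
import threading


class QuoteCache:
    """Keep the latest quote dict for each symbol (hypothetical helper, not part of jaqs)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}

    def on_quote(self, k, v):
        # Same (k, v) signature as the on_quote callback passed to api.subscribe
        with self._lock:
            self._latest[v['symbol']] = dict(v)

    def get(self, symbol):
        # Return a copy so callers cannot mutate the cached entry
        with self._lock:
            quote = self._latest.get(symbol)
            return dict(quote) if quote is not None else None


cache = QuoteCache()
# Simulated updates; with a live connection you would pass func=cache.on_quote to api.subscribe
cache.on_quote("quote", {"symbol": "000001.SH", "last": 3268.5589, "time": 150052000})
cache.on_quote("quote", {"symbol": "000001.SH", "last": 3269.0, "time": 150055000})
print(cache.get("000001.SH")["last"])  # 3269.0
```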


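quantos real-time quotes include:

+ Stock Level-1 quotes, refreshed every 3 seconds
+ Futures Level-1 quotes, refreshed every 0.5 seconds

Real-time quotes mainly contain:

+ Time information (date, time, trade_date)
+ The latest OHLC (open, high, low, close)
+ The latest order book (ask1-ask5, bid1-bid5)
+ Trading statistics (volume, turnover, vwap)
+ Static information: the limit-up price, limit-down price, previous close and previous settlement price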

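#### Minute Bar Queries

Minute bars are built by aggregating real-time quote data over fixed minute-based intervals.

The bar function queries minute bars. quantos supports three bar frequencies: 1M, 5M and 15M. When trade_date=0, the query returns the current day's bars. Otherwise it returns the historical bars for the given trade_date.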


```python
df, msg = api.bar(
    symbol="600030.SH",
    trade_date=20170928,
    freq="5M",
    start_time=0,
    end_time=160000,
    fields=""
)
df.head(10)
```

|   | close | code | date | freq | high | low | oi | open | settle | symbol | time | trade_date | turnover | volume | vwap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.05 | 600030 | 20170928 | 5M | 18.08 | 18.0 |  | 18.01 |  | 600030.SH | 93500 | 20170928 | 13576973.0 | 752900.0 | 18.032903 |
| 1 | 18.03 | 600030 | 20170928 | 5M | 18.06 | 18.01 |  | 18.04 |  | 600030.SH | 94000 | 20170928 | 16145566.0 | 895110.0 | 18.037522 |
| 2 | 18.04 | 600030 | 20170928 | 5M | 18.05 | 18.02 |  | 18.03 |  | 600030.SH | 94500 | 20170928 | 11024829.0 | 611400.0 | 18.032105 |
| 3 | 17.99 | 600030 | 20170928 | 5M | 18.05 | 17.97 |  | 18.04 |  | 600030.SH | 95000 | 20170928 | 30021003.0 | 1667190.0 | 18.006948 |
| 4 | 18.02 | 600030 | 20170928 | 5M | 18.03 | 17.97 |  | 17.98 |  | 600030.SH | 95500 | 20170928 | 13691203.0 | 761161.0 | 17.987263 |
| 5 | 18.0 | 600030 | 20170928 | 5M | 18.03 | 17.98 |  | 18.01 |  | 600030.SH | 100000 | 20170928 | 17562219.0 | 975400.0 | 18.005146 |
| 6 | 17.98 | 600030 | 20170928 | 5M | 18.0 | 17.96 |  | 18.0 |  | 600030.SH | 100500 | 20170928 | 29442839.0 | 1637650.0 | 17.978713 |
| 7 | 17.99 | 600030 | 20170928 | 5M | 18.0 | 17.98 |  | 17.99 |  | 600030.SH | 101000 | 20170928 | 8453291.0 | 469911.0 | 17.989132 |
| 8 | 18.0 | 600030 | 20170928 | 5M | 18.01 | 17.99 |  | 18.0 |  | 600030.SH | 101500 | 20170928 | 9820498.0 | 545600.0 | 17.999446 |
| 9 | 17.98 | 600030 | 20170928 | 5M | 18.01 | 17.95 |  | 18.0 |  | 600030.SH | 102000 | 20170928 | 30884646.0 | 1719000.0 | 17.966635 |


Each day, quantos builds minute bars from the tick data it receives and stores them in files.
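The following sketch only illustrates the idea of turning ticks into bars. It is not quantos's actual pipeline. It makes three assumptions:

+ Each tick is a `(seconds_since_midnight, price, volume)` tuple, where `volume` is the size of that individual trade.
+ Each bar covers the half-open interval `[start, start + freq)`.
+ Each bar is labelled by its end time. This follows the sample output above, where the first 5M bar of the day is stamped `93500`.

The lunch break, call auctions and empty intervals, which a real implementation must handle, are ignored. The function name `ticks_to_bars` is hypothetical.

```python
def ticks_to_bars(ticks, freq_seconds=60):
    """Aggregate (seconds_since_midnight, price, volume) ticks into OHLCV bars.

    Each bar covers [start, start + freq_seconds) and is labelled by its end
    time as an HHMMSS integer, e.g. 93500 for the 09:30-09:35 bar.
    """
    bars = {}
    # Stable sort on time only, so ticks sharing a timestamp keep their order
    for sec, price, volume in sorted(ticks, key=lambda t: t[0]):
        end = (sec // freq_seconds + 1) * freq_seconds
        bar = bars.get(end)
        if bar is None:
            bars[end] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += volume

    result = []
    for end in sorted(bars):
        # Convert the label from seconds since midnight to HHMMSS
        hours, rest = divmod(end, 3600)
        minutes, seconds = divmod(rest, 60)
        result.append({"time": hours * 10000 + minutes * 100 + seconds, **bars[end]})
    return result


# Synthetic ticks: (seconds since midnight, trade price, traded volume)
ticks = [
    (9 * 3600 + 30 * 60 + 1, 18.01, 1000),   # 09:30:01
    (9 * 3600 + 32 * 60, 18.08, 500),        # 09:32:00
    (9 * 3600 + 34 * 60 + 59, 18.05, 200),   # 09:34:59
    (9 * 3600 + 35 * 60 + 3, 18.04, 300),    # 09:35:03
]
for bar in ticks_to_bars(ticks, freq_seconds=300):
    print(bar)
```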

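Minute bars mainly contain:

+ Time information (date, time, trade_date)
+ The OHLC within the bar (open, high, low, close)
+ The last order book snapshot within the bar (ask1-ask5, bid1-bid5)
+ Trading statistics (volume, turnover, vwap)

Many trading strategies are researched on minute bars, because their statistical properties are more stable than those of raw tick data.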
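In the 5M sample above, the `vwap` column matches `turnover / volume` for each bar. For example, 13576973.0 / 752900.0 ≈ 18.032903 for the 09:35 bar. The snippet below checks this on the first three rows copied from that output. That the relationship holds for every bar and every frequency is an inference from these rows, not something stated by quantos.

```python
# (turnover, volume, vwap) for the first three 5M bars of 600030.SH on 20170928,
# copied from the bar() output above
rows = [
    (13576973.0, 752900.0, 18.032903),
    (16145566.0, 895110.0, 18.037522),
    (11024829.0, 611400.0, 18.032105),
]

for turnover, volume, vwap in rows:
    computed = turnover / volume
    # The reported vwap is shown with 6 decimal places, so compare with a tolerance
    print(f"{computed:.6f} vs {vwap}")
    assert abs(computed - vwap) < 1e-5
```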

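#### Daily Bar Queries

Daily bars, as the name suggests, are end-of-day data. They are low-frequency data at daily granularity, and many stock alpha strategies are researched on daily bars.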


```python
df, msg = api.daily(
    symbol="600832.SH, 600030.SH",
    start_date=20121026,
    end_date=20121130,
    fields="",
    adjust_mode="post"
)
df.head(10)
```

|   | close | code | freq | high | low | oi | open | presettle | settle | symbol | trade_date | trade_status | turnover | volume | vwap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 84.55789 | 600832 | 1d | 87.049772 | 84.391764 |  | 86.883647 |  |  | 600832.SH | 20121026 | 交易 | 27790568.0 | 5381800.0 | 85.78 |
| 1 | 84.724015 | 600832 | 1d | 85.554643 | 84.391764 |  | 84.890141 |  |  | 600832.SH | 20121029 | 交易 | 13203328.0 | 2582557.0 | 84.93 |
| 2 | 84.890141 | 600832 | 1d | 86.053019 | 84.391764 |  | 85.056266 |  |  | 600832.SH | 20121030 | 交易 | 16227051.0 | 3170615.0 | 85.02 |
| 3 | 84.890141 | 600832 | 1d | 85.388517 | 84.55789 |  | 85.056266 |  |  | 600832.SH | 20121031 | 交易 | 10720069.0 | 2097770.0 | 84.89 |
| 4 | 86.053019 | 600832 | 1d | 86.38527 | 85.056266 |  | 85.056266 |  |  | 600832.SH | 20121101 | 交易 | 19721000.0 | 3814712.0 | 85.88 |
| 5 | 88.212651 | 600832 | 1d | 88.877153 | 85.222392 |  | 86.38527 |  |  | 600832.SH | 20121102 | 交易 | 57502794.0 | 10927010.0 | 87.42 |
| 6 | 87.8804 | 600832 | 1d | 89.541655 | 87.215898 |  | 88.378777 |  |  | 600832.SH | 20121105 | 交易 | 62725248.0 | 11807741.0 | 88.25 |
| 7 | 88.378777 | 600832 | 1d | 88.711028 | 86.38527 |  | 87.714275 |  |  | 600832.SH | 20121106 | 交易 | 55743439.0 | 10595902.0 | 87.4 |
| 8 | 88.046526 | 600832 | 1d | 88.544902 | 87.548149 |  | 87.8804 |  |  | 600832.SH | 20121107 | 交易 | 26376465.0 | 4975333.0 | 88.07 |
| 9 | 86.883647 | 600832 | 1d | 88.046526 | 86.551396 |  | 87.382024 |  |  | 600832.SH | 20121108 | 交易 | 31516248.0 | 6006363.0 | 87.17 |


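The daily function returns end-of-day data for multiple securities over a date range. The adjust_mode parameter selects the price adjustment method, which raises the question of stock price adjustment.

So what is price adjustment?

Many stocks pay dividends, distribute bonus shares or carry out rights issues every year, and each of these makes the share price jump. For example:

+ If a stock pays a cash dividend of 1 yuan per share, its reference price on the ex-dividend date is 1 yuan lower than the previous close.
+ If a stock distributes 2 bonus shares for every share held, its reference price on the ex-rights date becomes 1/3 of the previous close. A rights issue works in a similar way, except that shareholders pay a subscription price for the new shares, so the price falls by less.

Some stocks pay a dividend and distribute shares at the same time.

Price adjustment restores a continuous price series on the principle of dividend reinvestment: each raw price is multiplied by an adjustment factor that accumulates the effect of past dividends and share distributions.

Adjustment removes these artificial price jumps, which is what makes long-horizon stock backtests meaningful.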
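Both examples are special cases of the commonly used ex-rights reference-price formula. Take the previous close, subtract the cash dividend, add the subscription money paid for any rights shares, and divide by the number of shares each old share has become. The sketch below (function names are hypothetical) computes that reference price. It also computes the ratio by which a cumulative adjustment factor would be multiplied on the ex-date. Taxes, price rounding and other exchange-specific details are ignored.

```python
def ex_rights_price(prev_close, cash=0.0, bonus=0.0, rights=0.0, rights_price=0.0):
    """Ex-dividend / ex-rights reference price (simplified).

    cash:         cash dividend per share
    bonus:        bonus and transferred shares received per share held
    rights:       rights-issue shares offered per share held
    rights_price: subscription price paid for each rights share
    """
    return (prev_close - cash + rights_price * rights) / (1 + bonus + rights)


def factor_step(prev_close, ref_price):
    # Multiplier applied to the cumulative adjustment factor on the ex-date,
    # so that the adjusted series shows no artificial jump
    return prev_close / ref_price


print(ex_rights_price(30.0, cash=1.0))     # 29.0
print(ex_rights_price(30.0, bonus=2.0))    # 10.0
print(round(ex_rights_price(30.0, rights=0.3, rights_price=10.0), 4))  # 25.3846
print(factor_step(30.0, ex_rights_price(30.0, bonus=2.0)))  # 3.0
```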

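### General-Purpose Query Interface


Apart from market data, both basic data and reference data are retrieved through a general-purpose query interface called query. Example: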

```python
# Example of the general-purpose query interface
df, msg = api.query(
    view="jz.instrumentInfo",
    fields="status,list_date, fullname_en, market",
    filter="inst_type=1&status=1&symbol="
)
df.head(10)
```

|   | list_date | market | name | status | symbol |
|---|---|---|---|---|---|
| 0 | 19991110 | SH | 浦发银行 | 1 | 600000.SH |
| 1 | 20001219 | SH | 民生银行 | 1 | 600016.SH |
| 2 | 20030106 | SH | 中信证券 | 1 | 600030.SH |
| 3 | 20021009 | SH | 中国联通 | 1 | 600050.SH |
| 4 | 19970807 | SH | 国金证券 | 1 | 600109.SH |
| 5 | 20000526 | SH | 广汇能源 | 1 | 600256.SH |
| 6 | 20010827 | SH | 贵州茅台 | 1 | 600519.SH |
| 7 | 19930316 | SH | 东方明珠 | 1 | 600637.SH |
| 8 | 19960312 | SH | 伊利股份 | 1 | 600887.SH |
| 9 | 20091117 | SH | 招商证券 | 1 | 600999.SH |


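The call takes three parameters:
+ view: the name of the interface to query
+ filter: the input parameters, joined with '&'
+ fields: the output fields, separated by ','

In other words, to use query you supply three pieces of information: the interface name, the filter conditions and the output fields.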
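When filters are assembled programmatically, it can help to build the two strings from Python data structures instead of concatenating by hand. A minimal sketch with two hypothetical helpers, `build_filter` and `build_fields`. They simply apply the '&' and ',' conventions described above and do not escape values that themselves contain '&' or '='.

```python
def build_filter(conditions):
    """Join input parameters into a query filter string, e.g. 'a=1&b=2'."""
    return "&".join(f"{key}={value}" for key, value in conditions.items())


def build_fields(fields):
    """Join output field names into a comma-separated fields string."""
    return ",".join(fields)


flt = build_filter({"inst_type": 1, "status": 1, "symbol": ""})
fields = build_fields(["status", "list_date", "market"])
print(flt)     # inst_type=1&status=1&symbol=
print(fields)  # status,list_date,market
# With a logged-in DataApi instance:
# df, msg = api.query(view="jz.instrumentInfo", fields=fields, filter=flt)
```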

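Which raises the question: how do you find out which interfaces exist, and which input and output parameters each one accepts?

Here's a bonus: two interfaces describe the interfaces themselves, help.apiList and help.apiParam.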


```python
# List the interfaces that can be called
df, msg = api.query(
    view="help.apiList",
    fields="",
    filter=""
)
df.head(100)
```

|   | api | comment | name |
|---|---|---|---|
| 0 | jz.instrumentInfo | 证券基本信息 | 证券基础信息 |
| 1 | jz.secTradeCal | 交易日历 | 交易日历 |
| 2 | lb.indexCons | 指数成份股 | 指数成份股 |
| 3 | lb.indexInfo | 指数基本信息 | 指数基本信息 |
| 4 | lb.industryType | 行业代码表 | 行业代码表 |
| 5 | lb.mfNav | 公募基金净值 | 公募基金净值 |
| 6 | lb.secAdjFactor | 复权因子 | 复权因子 |
| 7 | lb.secDividend | 分红送股 | 分红送股表 |
| 8 | lb.secIndustry | 行业分类信息 | 行业分类 |
| 9 | lb.secSusp | 停复牌数据 | 停复牌 |


```python
# Input and output parameters of the jz.instrumentInfo interface
df, msg = api.query(
    view="help.apiParam",
    fields="",
    filter="api=jz.instrumentInfo"
)
df.head(100)
```

|   | api | comment | dtype | must | param | pname | ptype |
|---|---|---|---|---|---|---|---|
| 0 | jz.instrumentInfo |  | Int | N | inst_type | 证券类型 | OUT |
| 1 | jz.instrumentInfo |  | Int | N | inst_type | 证券类型 | IN |
| 2 | jz.instrumentInfo |  | Int | N | status | 上市状态 | IN |
| 3 | jz.instrumentInfo |  | String | N | symbol | 证券代码 | IN |
| 4 | jz.instrumentInfo |  | String | Y | symbol | 证券代码 | OUT |
| 5 | jz.instrumentInfo |  | String | Y | name | 证券名称 | OUT |
| 6 | jz.instrumentInfo |  | String | Y | list_date | 上市日期 | OUT |
| 7 | jz.instrumentInfo |  | String | N | delist_date | 退市日期 | OUT |
| 8 | jz.instrumentInfo |  | Int | N | status | 上市状态 | OUT |
| 9 | jz.instrumentInfo |  | String | N | currency | 货币 | OUT |
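In the help.apiParam output, `ptype` marks whether a parameter can be used in `filter` (IN) or requested in `fields` (OUT). The OUT rows with `must=Y` (`symbol`, `name`, `list_date`) appear to be the fields that are always returned. This reading is an inference, based on two observations. With `fields=""`, the basic-security query in the next section returns exactly those three columns. The earlier query example also returns `name` and `symbol` even though they were not requested. The sketch below splits the ten rows shown above in pure Python. On the DataFrame itself, the pandas equivalent is `df.loc[df["ptype"] == "IN", "param"].tolist()`.

```python
# (param, ptype, must) for the ten jz.instrumentInfo rows shown above
rows = [
    ("inst_type", "OUT", "N"), ("inst_type", "IN", "N"), ("status", "IN", "N"),
    ("symbol", "IN", "N"), ("symbol", "OUT", "Y"), ("name", "OUT", "Y"),
    ("list_date", "OUT", "Y"), ("delist_date", "OUT", "N"), ("status", "OUT", "N"),
    ("currency", "OUT", "N"),
]

# Parameters usable in filter=...
filter_params = [p for p, ptype, _ in rows if ptype == "IN"]
# Fields that may be requested in fields=...
output_fields = [p for p, ptype, _ in rows if ptype == "OUT"]
# OUT fields flagged must=Y (apparently always returned; an inference)
default_fields = [p for p, ptype, must in rows if ptype == "OUT" and must == "Y"]

print(filter_params)   # ['inst_type', 'status', 'symbol']
print(default_fields)  # ['symbol', 'name', 'list_date']
```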


### Basic Data

+ Security information
+ Trading calendar
+ Index information
+ Industry information

```python
# Basic security information

df, msg = api.query(
    view="jz.instrumentInfo",
    fields="",
    filter="inst_type=1&status=1&symbol="
)
df.head(10)
```

|   | list_date | name | symbol |
|---|---|---|---|
| 0 | 19991110 | 浦发银行 | 600000.SH |
| 1 | 20001219 | 民生银行 | 600016.SH |
| 2 | 20030106 | 中信证券 | 600030.SH |
| 3 | 20021009 | 中国联通 | 600050.SH |
| 4 | 19970807 | 国金证券 | 600109.SH |
| 5 | 20000526 | 广汇能源 | 600256.SH |
| 6 | 20010827 | 贵州茅台 | 600519.SH |
| 7 | 19930316 | 东方明珠 | 600637.SH |
| 8 | 19960312 | 伊利股份 | 600887.SH |
| 9 | 20091117 | 招商证券 | 600999.SH |


```python
# Index information

df, msg = api.query(
    view="lb.indexInfo",
    fields="",
    filter=""
)
df.head(10)
```

|   | compname | exchmarket | symbol |
|---|---|---|---|
| 0 | 深圳证券交易所农林牧渔指数 | SZ | 399110.SZ |
| 1 | 深圳证券交易所采掘业指数 | SZ | 399120.SZ |
| 2 | 深圳证券交易所制造业指数 | SZ | 399130.SZ |
| 3 | 深圳证券交易所食品饮料指数 | SZ | 399131.SZ |
| 4 | 深圳证券交易所纺织服装指数 | SZ | 399132.SZ |
| 5 | 深圳证券交易所木材家具指数 | SZ | 399133.SZ |
| 6 | 深圳证券交易所造纸印刷指数 | SZ | 399134.SZ |
| 7 | 深圳证券交易所石化塑胶指数 | SZ | 399135.SZ |
| 8 | 深圳证券交易所电子指数 | SZ | 399136.SZ |
| 9 | 深圳证券交易所金属非金属指数 | SZ | 399137.SZ |


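```python
# Trading calendar (only the Chinese trading calendar is supported)

df, msg = api.query(
    view="jz.secTradeCal",
    fields="",
    filter=""
)
df.head(10)
```

|   | istradeday | trade_date |
|---|---|---|
| 0 | T | 19901219 |
| 1 | T | 19901220 |
| 2 | T | 19901221 |
| 3 | T | 19901224 |
| 4 | T | 19901225 |
| 5 | T | 19901226 |
| 6 | T | 19901227 |
| 7 | T | 19901228 |
| 8 | T | 19901231 |
| 9 | T | 19910102 |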
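A common use of the calendar is stepping between trading days, for example to find the next trading day after a given date or to count trading days in a range. All sample rows have `istradeday = T`. The sketch below assumes you have already kept only such rows and sorted the `trade_date` values. The three helper functions are hypothetical. They use the standard `bisect` module, so each lookup is a binary search.

```python
import bisect

# Trading days as sorted YYYYMMDD integers (the first rows of the output above)
trade_days = [19901219, 19901220, 19901221, 19901224, 19901225,
              19901226, 19901227, 19901228, 19901231, 19910102]


def next_trade_day(days, date):
    """First trading day strictly after date (None if past the end of the calendar)."""
    i = bisect.bisect_right(days, date)
    return days[i] if i < len(days) else None


def prev_trade_day(days, date):
    """Last trading day strictly before date (None if before the start of the calendar)."""
    i = bisect.bisect_left(days, date)
    return days[i - 1] if i > 0 else None


def count_trade_days(days, start, end):
    """Number of trading days in the closed interval [start, end]."""
    return bisect.bisect_right(days, end) - bisect.bisect_left(days, start)


print(next_trade_day(trade_days, 19901221))  # 19901224 (skips the weekend)
print(prev_trade_day(trade_days, 19910102))  # 19901231
print(count_trade_days(trade_days, 19901220, 19901226))  # 5
```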


```python
# Industry code table

df, msg = api.query(
    view="lb.industryType",
    fields="",
    filter="industry_src=SW&level=1"
)
df.head(100)
```

|   | industry1_code | industry2_code | industry3_code | industry_name | industry_src | level |
|---|---|---|---|---|---|---|
| 0 | 110000 |  |  | 农林牧渔 | sw | 1 |
| 1 | 210000 |  |  | 采掘 | sw | 1 |
| 2 | 220000 |  |  | 化工 | sw | 1 |
| 3 | 230000 |  |  | 钢铁 | sw | 1 |
| 4 | 240000 |  |  | 有色金属 | sw | 1 |
| 5 | 270000 |  |  | 电子 | sw | 1 |
| 6 | 280000 |  |  | 汽车 | sw | 1 |
| 7 | 330000 |  |  | 家用电器 | sw | 1 |
| 8 | 340000 |  |  | 食品饮料 | sw | 1 |
| 9 | 350000 |  |  | 纺织服装 | sw | 1 |


The example above retrieves the Shenwan (SW) level-1 industry codes. There are 28 level-1 industries in total.

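### Reference Data

Stocks

+ Dividends and share distributions
+ Trading suspensions and resumptions
+ Adjustment factors
+ Industry classification

Indices

+ Index constituents

Funds

+ Mutual fund NAVs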

```python
# Stock dividends and share distributions

df, msg = api.query(
    view="lb.secDividend",
    fields="",
    filter="symbol=600036.SH"
)
df.head(100)
```

|   | ann_date | bonus_list_date | cash | cash_tax | cashpay_date | div_enddate | exdiv_date | publish_date | record_date | share_ratio | share_trans_ratio | symbol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20020604 |  | 0.166 | 0.0 | 20020604.0 | 20011231 |  |  |  | 0.0 | 0.0 | 600036.SH |
| 1 | 20030418 |  | 0.12 | 0.096 | 20030723.0 | 20021231 | 20030716.0 | 20030708.0 | 20030715.0 | 0.0 | 0.0 | 600036.SH |
| 2 | 20040218 | 20040512.0 | 0.092 | 0.074 | 20040517.0 | 20031231 | 20040511.0 | 20040428.0 | 20040510.0 | 0.0 | 0.2 | 600036.SH |
| 3 | 20050331 | 20050621.0 | 0.11 | 0.099 | 20050624.0 | 20041231 | 20050620.0 | 20050614.0 | 20050617.0 | 0.0 | 0.5 | 600036.SH |
| 4 | 20060412 |  | 0.08 | 0.072 | 20060621.0 | 20051231 | 20060616.0 | 20060612.0 | 20060615.0 | 0.0 | 0.0 | 600036.SH |
| 5 | 20051230 | 20060227.0 | 0.0 | 0.0 |  | 20060223 | 20060224.0 | 20060222.0 | 20060223.0 | 0.0 | 0.08589 | 600036.SH |
| 6 | 20060720 |  | 0.18 | 0.162 | 20060927.0 | 20060719 | 20060921.0 | 20060915.0 | 20060920.0 | 0.0 | 0.0 | 600036.SH |
| 7 | 20070417 |  | 0.12 | 0.108 | 20070710.0 | 20061231 | 20070704.0 | 20070628.0 | 20070703.0 | 0.0 | 0.0 | 600036.SH |
| 8 | 20080319 |  | 0.28 | 0.252 | 20080801.0 | 20071231 | 20080728.0 | 20080722.0 | 20080725.0 | 0.0 | 0.0 | 600036.SH |
| 9 | 20090425 | 20090706.0 | 0.1 | 0.06 | 20090709.0 | 20081231 | 20090703.0 | 20090629.0 | 20090702.0 | 0.3 | 0.0 | 600036.SH |


```python
# Stock suspensions and resumptions

df, msg = api.query(
    view="lb.secSusp",
    fields="",
    filter="symbol=600036.SH"
)
df.head(100)
```

|   | ann_date | resu_date | susp_date | susp_reason | symbol |
|---|---|---|---|---|---|
| 0 | 20080602 | 20080603 | 20080602 | 重要事项未公告 | 600036.SH |
| 1 | 20080603 | 20080603 | 20080603 | 刊登重要公告 | 600036.SH |
| 2 | 20080627 | 20080630 | 20080627 | 召开股东大会 | 600036.SH |
| 3 | 20090227 | 20090302 | 20090227 | 召开股东大会 | 600036.SH |
| 4 | 20090619 | 20090622 | 20090619 | 召开股东大会 | 600036.SH |
| 5 | 20091019 | 20091020 | 20091019 | 召开股东大会 | 600036.SH |
| 6 | 20100623 | 20100624 | 20100623 | 召开股东大会 | 600036.SH |
| 7 | 20110530 | 20110531 | 20110530 | 召开股东大会 | 600036.SH |
| 8 | 20110909 | 20110913 | 20110909 | 召开股东大会 | 600036.SH |
| 9 | 20120530 | 20120531 | 20120530 | 召开股东大会 | 600036.SH |


```python
# Stock adjustment factors

df, msg = api.query(
    view="lb.secAdjFactor",
    fields="",
    filter="symbol=600036.SH"
)
df.tail(100)
```

|   | adjust_factor | symbol | trade_date |
|---|---|---|---|
| 3756 | 4.62913 | 600036.SH | 20170921 |
| 3757 | 4.62913 | 600036.SH | 20170922 |
| 3758 | 4.62913 | 600036.SH | 20170925 |
| 3759 | 4.62913 | 600036.SH | 20170926 |
| 3760 | 4.62913 | 600036.SH | 20170927 |
| 3761 | 4.62913 | 600036.SH | 20170928 |
| 3762 | 4.62913 | 600036.SH | 20170929 |
| 3763 | 4.62913 | 600036.SH | 20171009 |
| 3764 | 4.62913 | 600036.SH | 20171010 |
| 3765 | 4.62913 | 600036.SH | 20171011 |
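Adjusted prices follow directly from a per-day `adjust_factor` series like this one. The sketch below assumes the usual definitions of the two modes:

+ Post-adjusted prices are raw prices multiplied by that day's factor.
+ Pre-adjusted prices are additionally divided by the latest factor, so that the most recent price equals the raw market price.

That `adjust_mode="post"` in `daily` uses exactly this definition is an assumption. The raw closes and factors below are synthetic, chosen to mirror the 2-bonus-shares example earlier. With real data you would first align raw prices and factors by `trade_date`.

```python
def post_adjust(closes, factors):
    """Post-adjusted prices: raw price times that day's cumulative factor."""
    return [c * f for c, f in zip(closes, factors)]


def pre_adjust(closes, factors):
    """Pre-adjusted prices: rescaled so that the latest price equals the raw price."""
    latest = factors[-1]
    return [c * f / latest for c, f in zip(closes, factors)]


# Synthetic example: 2 bonus shares per share, so the factor jumps 1.0 -> 3.0
raw_close = [30.0, 10.5, 10.25]
adj_factor = [1.0, 3.0, 3.0]

print(post_adjust(raw_close, adj_factor))  # [30.0, 31.5, 30.75]
print(pre_adjust(raw_close, adj_factor))   # [10.0, 10.5, 10.25]
```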


```python
# Stock industry classification

df, msg = api.query(
    view="lb.secIndustry",
    fields="",
    filter="symbol=600030.SH,600031.SH&industry_src=SW"
)
df.tail(100)
```

|   | in_date | industry1_code | industry1_name | industry2_code | industry2_name | industry3_code | industry3_name | industry4_code | industry4_name | industry_src | out_date | symbol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20140101 | 490000 | 非银金融 | 490100 | 证券 | 490101 | 证券 | 490101 | 证券 | sw |  | 600030.SH |
| 1 | 20140101 | 640000 | 机械设备 | 640200 | 专用设备 | 640201 | 工程机械 | 640201 | 工程机械 | sw |  | 600031.SH |
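In both samples, the SW codes look positional. For example, 600030.SH has level-3 code 490101, level-2 code 490100 and level-1 code 490000, and every level-1 code in the industry table ends in 0000. If that pattern holds generally, which is an assumption drawn only from these rows, the parent codes can be derived from a finer code. A sketch with a hypothetical helper:

```python
def sw_parent_codes(code):
    """Return (level-1, level-2) codes for a 6-digit SW industry code.

    Assumes the hierarchy is encoded positionally, as in the sample output:
    the first two digits identify level 1 and the first four identify level 2.
    """
    code = str(code)
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    return code[:2] + "0000", code[:4] + "00"


print(sw_parent_codes("490101"))  # ('490000', '490100')
print(sw_parent_codes(640201))    # ('640000', '640200')
```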


```python
# Index constituents

df, msg = api.query(
    view="lb.indexCons",
    fields="",
    filter="index_code=000016.SH&start_date=20170101&end_date=20171229"
)
df.tail(100)
```

|   | in_date | index_code | out_date | symbol |
|---|---|---|---|---|
| 0 | 20040102 | 000016.SH |  | 600000.SH |
| 1 | 20040102 | 000016.SH |  | 600016.SH |
| 2 | 20171211 | 000016.SH |  | 600019.SH |
| 3 | 20040102 | 000016.SH |  | 600028.SH |
| 4 | 20160613 | 000016.SH |  | 600029.SH |
| 5 | 20040102 | 000016.SH |  | 600030.SH |
| 6 | 20040102 | 000016.SH |  | 600036.SH |
| 7 | 20040102 | 000016.SH |  | 600050.SH |
| 8 | 20161212 | 000016.SH | 20171208.0 | 600100.SH |
| 9 | 20040102 | 000016.SH |  | 600104.SH |


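Note how the index constituents interface behaves: it returns every constituent record that was in effect on any day between start_date and end_date. For example, 600100.SH appears above even though it left the index on 20171208.

To get the constituents on a single day, set start_date and end_date to the same date.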
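When you already hold a multi-day result, you can also filter it locally for any particular day. Below is a minimal sketch with a hypothetical `members_on` helper. It treats `out_date` as inclusive. That is an assumption based on the sample, where 600100.SH has `out_date` 20171208 and its apparent replacement 600019.SH has `in_date` 20171211, the next trading day.

```python
def members_on(records, date):
    """Symbols whose membership interval [in_date, out_date] contains date.

    An empty out_date (None) means the stock is still in the index.
    """
    result = []
    for rec in records:
        out_date = rec["out_date"]
        if rec["in_date"] <= date and (out_date is None or date <= out_date):
            result.append(rec["symbol"])
    return result


# A few rows copied from the 2017 query above
records = [
    {"symbol": "600000.SH", "in_date": 20040102, "out_date": None},
    {"symbol": "600019.SH", "in_date": 20171211, "out_date": None},
    {"symbol": "600100.SH", "in_date": 20161212, "out_date": 20171208},
]

print(members_on(records, 20171208))  # ['600000.SH', '600100.SH']
print(members_on(records, 20171211))  # ['600000.SH', '600019.SH']
```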


```python
# Index constituents on a single day

df, msg = api.query(
    view="lb.indexCons",
    fields="",
    filter="index_code=000016.SH&start_date=20180221&end_date=20180221"
)
df.tail(100)
```

|   | in_date | index_code | out_date | symbol |
|---|---|---|---|---|
| 0 | 20040102 | 000016.SH |  | 600000.SH |
| 1 | 20040102 | 000016.SH |  | 600016.SH |
| 2 | 20171211 | 000016.SH |  | 600019.SH |
| 3 | 20040102 | 000016.SH |  | 600028.SH |
| 4 | 20160613 | 000016.SH |  | 600029.SH |
| 5 | 20040102 | 000016.SH |  | 600030.SH |
| 6 | 20040102 | 000016.SH |  | 600036.SH |
| 7 | 20040102 | 000016.SH |  | 600050.SH |
| 8 | 20040102 | 000016.SH |  | 600104.SH |
| 9 | 20110104 | 000016.SH |  | 600111.SH |


```python
# Mutual fund NAV

df, msg = api.query(
    view="lb.mfNav",
    fields="",
    filter="symbol=510050.SH&start_pdate=20170101&end_pdate=20180101"
)
df
```

|   | ann_date | nav | nav_accumulated | price_date | symbol |
|---|---|---|---|---|---|
| 0 | 20170104 | 2.310 | 3.142 | 20170103 | 510050.SH |
| 1 | 20170105 | 2.324 | 3.158 | 20170104 | 510050.SH |
| 2 | 20170106 | 2.325 | 3.160 | 20170105 | 510050.SH |
| 3 | 20170107 | 2.312 | 3.144 | 20170106 | 510050.SH |
| 4 | 20170110 | 2.321 | 3.155 | 20170109 | 510050.SH |
| 5 | 20170111 | 2.315 | 3.148 | 20170110 | 510050.SH |
| 6 | 20170112 | 2.303 | 3.134 | 20170111 | 510050.SH |
| 7 | 20170113 | 2.298 | 3.128 | 20170112 | 510050.SH |
| 8 | 20170114 | 2.311 | 3.143 | 20170113 | 510050.SH |
| 9 | 20170117 | 2.340 | 3.177 | 20170116 | 510050.SH |
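Each NAV row carries two dates: `price_date`, the day the NAV refers to, and `ann_date`, which in this sample is always the following calendar day and is presumably when the NAV was published. A backtest that must avoid look-ahead bias should therefore only use the latest NAV whose `ann_date` is on or before the simulated date. A minimal sketch with a hypothetical `nav_known_on` helper, using rows copied from the output above and assuming they are sorted by `ann_date`:

```python
import bisect

# (ann_date, nav, price_date) rows copied from the 510050.SH output above
rows = [
    (20170104, 2.310, 20170103),
    (20170105, 2.324, 20170104),
    (20170106, 2.325, 20170105),
    (20170107, 2.312, 20170106),
    (20170110, 2.321, 20170109),
]


def nav_known_on(rows, date):
    """Latest (price_date, nav) already announced on or before date, or None."""
    ann_dates = [r[0] for r in rows]
    i = bisect.bisect_right(ann_dates, date)
    if i == 0:
        return None
    _, nav, price_date = rows[i - 1]
    return price_date, nav


print(nav_known_on(rows, 20170104))  # (20170103, 2.31)
print(nav_known_on(rows, 20170109))  # (20170106, 2.312)
```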
