# Data Acquisition Modules

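Financial data can be obtained through several channels:
* Commercial databases from data vendors such as Wind and Gildata (恒生聚源), which many companies purchase. These are the most accurate.
* Third-party Python modules such as tushare, pandas-datareader, and yfinance. These are less accurate and contain many missing values.
* Free data from third-party platforms such as Uqer (优矿), JoinQuant (聚宽), and RiceQuant (米筐), accessed through each platform's own data API. These sources are comparatively rich: historical market data, tick-level data, and factor data such as fundamentals.
* Web scraping, for example collecting the net asset values of certain private funds from websites.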

Python modules for A-share data mainly draw on Sina, NetEase, Tencent, and East Money.  
Python modules for US stock data mainly draw on Yahoo and Google.


* This lecture: obtaining data through third-party Python modules
* The data obtained here supports the pandas applications in later lectures

# tushare

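tushare makes it very convenient to obtain data.  
However, because of the cost of maintaining its database  
and the legal risks involved in selling data,  
tushare has moved to tushare pro,  
which replaces the original free access with a points-based system.  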


Websites: tushare http://tushare.org/, tushare pro https://tushare.pro/
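
For readers who have moved to tushare pro, the following is a minimal sketch of the equivalent daily-bar request. It assumes a registered account whose token has enough points for the daily interface. The helper names `to_pro_date` and `fetch_daily_pro` are hypothetical; `ts.pro_api` and `pro.daily` are the tushare pro entry points, which expect dates as YYYYMMDD and codes such as 600519.SH.

```python
from datetime import datetime


def to_pro_date(date_str):
    """Convert 'YYYY-MM-DD' (legacy tushare style) to 'YYYYMMDD' (tushare pro style)."""
    return datetime.strptime(date_str, "%Y-%m-%d").strftime("%Y%m%d")


def fetch_daily_pro(token, ts_code, start, end):
    """Fetch daily bars through tushare pro; needs a valid token and network access."""
    import tushare as ts  # imported lazily so the date helper works without tushare

    pro = ts.pro_api(token)
    return pro.daily(ts_code=ts_code,
                     start_date=to_pro_date(start),
                     end_date=to_pro_date(end))


print(to_pro_date("2017-01-01"))  # 20170101
# With a real token (placeholder shown here):
# df = fetch_daily_pro("your-token", "600519.SH", "2017-01-01", "2020-08-30")
```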

### Getting historical market data

* get_hist_data()   
* get_k_data()  

In [4]:
import tushare as ts
df = ts.get_hist_data('600519','2017-01-01','2020-08-30')
df.head(5)

date,open,high,close,low,volume,price_change,p_change,ma5,ma10,ma20,v_ma5,v_ma10,v_ma20,turnover
2020-08-28,1731.0,1764.0,1757.0,1721.5,42299.62,26.0,1.5,1727.67,1706.035,1676.738,32791.36,31233.14,30845.78,0.34
2020-08-27,1731.99,1733.98,1731.0,1705.0,22635.53,4.0,0.23,1711.47,1696.435,1672.797,29347.92,30768.49,30765.04,0.18
2020-08-26,1734.0,1745.0,1727.0,1715.02,26215.73,0.0,0.0,1698.07,1686.835,1670.247,29297.28,30559.07,31551.92,0.21
2020-08-25,1704.8,1746.78,1727.0,1698.63,41872.49,30.65,1.81,1690.07,1676.83,1667.497,29437.13,30730.89,32778.55,0.33
2020-08-24,1700.0,1708.0,1696.35,1691.01,30933.44,20.35,1.21,1685.67,1668.381,1664.647,27147.19,30125.46,33430.5,0.25


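Parameters:

> code: stock code, i.e. the 6-digit code, or an index code (sh = SSE Composite Index, sz = SZSE Component Index, hs300 = CSI 300 Index, sz50 = SSE 50 Index, zxb = SME Board, cyb = ChiNext)  
> start: start date, in YYYY-MM-DD format  
> end: end date, in YYYY-MM-DD format  
> ktype: data type, D = daily K-line, W = weekly, M = monthly, 5 = 5-minute, 15 = 15-minute, 30 = 30-minute, 60 = 60-minute; defaults to D  
> retry_count: number of retries after a network error; defaults to 3  
> pause: seconds to wait between retries; defaults to 0  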
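
The retry_count and pause parameters describe a simple retry loop. The sketch below illustrates that idea only; it is not tushare's actual implementation. `fetch_with_retry` and `flaky_fetch` are hypothetical names, and the sketch assumes retry_count means the total number of attempts.

```python
import time


def fetch_with_retry(fetch, retry_count=3, pause=0.0):
    """Call fetch() up to retry_count times, sleeping `pause` seconds between attempts.

    Hypothetical helper illustrating the retry_count / pause parameters;
    it is not part of tushare.
    """
    last_error = None
    for attempt in range(retry_count):
        try:
            return fetch()
        except IOError as err:  # stand-in for a network failure
            last_error = err
            if attempt < retry_count - 1:
                time.sleep(pause)
    raise IOError(f"all {retry_count} attempts failed") from last_error


# Simulated data source that fails twice before succeeding.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("simulated network error")
    return "ok"


result = fetch_with_retry(flaky_fetch, retry_count=3, pause=0.0)
print(result, calls["n"])  # ok 3
```

In practice a non-zero pause is usually kinder to the data source than retrying immediately.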

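Return values:

> date: date  
> open: opening price  
> high: highest price  
> close: closing price  
> low: lowest price  
> volume: trading volume  
> price_change: price change  
> p_change: percentage change  
> ma5: 5-day average price  
> ma10: 10-day average price  
> ma20: 20-day average price  
> v_ma5: 5-day average volume  
> v_ma10: 10-day average volume  
> v_ma20: 20-day average volume  
> turnover: turnover rate [note: not available for indexes]  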
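
To make the derived columns concrete, the sketch below recomputes price_change, p_change, and ma5 for 2020-08-28 from the closing prices in the get_hist_data output above. The function names are illustrative and the formulas are inferred from the column descriptions, so treat this as a consistency check rather than tushare's own code.

```python
from statistics import mean

# Closing prices copied from the get_hist_data output above, newest first.
closes = [
    ("2020-08-28", 1757.0),
    ("2020-08-27", 1731.0),
    ("2020-08-26", 1727.0),
    ("2020-08-25", 1727.0),
    ("2020-08-24", 1696.35),
]


def price_change(today, previous):
    """Absolute change versus the previous close."""
    return round(today - previous, 2)


def p_change(today, previous):
    """Percentage change versus the previous close."""
    return round((today - previous) / previous * 100, 2)


def moving_average(values):
    """Simple average of the given closing prices."""
    return round(mean(values), 2)


latest, prior = closes[0][1], closes[1][1]
print(price_change(latest, prior))               # 26.0
print(p_change(latest, prior))                   # 1.5
print(moving_average([c for _, c in closes]))    # 1727.67
```

All three values match the price_change, p_change, and ma5 columns of the 2020-08-28 row.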

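Turnover rate: the frequency with which shares change hands in the market over a given period; it is one of the indicators of a stock's liquidity.  
Turnover rate = volume / float shares × 100%  
Float shares: the portion of a company's issued share capital that is outstanding and has not been bought back by the company, i.e. the shares that can be traded on the secondary market.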
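
The turnover formula can be sketched as a small function. The numbers below are hypothetical, and `turnover_rate` is an illustrative name. Both inputs must be in the same unit, so if a data source reports volume in lots (100 shares each), convert it to shares first.

```python
def turnover_rate(volume_shares, float_shares):
    """Turnover rate in percent: traded shares / float shares * 100."""
    if float_shares <= 0:
        raise ValueError("float_shares must be positive")
    return volume_shares / float_shares * 100


# Hypothetical example: 4,000,000 shares traded out of 1,250,000,000 float shares.
print(round(turnover_rate(4_000_000, 1_250_000_000), 2))  # 0.32
```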

In [6]:
import tushare as ts
df = ts.get_k_data('600519','2017-01-01','2020-08-30')
df.head(5)

,date,open,close,high,low,volume,code
0,2017-01-03,324.689,324.961,327.331,323.261,20763.0,600519
1,2017-01-04,325.019,341.813,342.066,325.0,65257.0,600519
2,2017-01-05,339.958,336.792,341.366,335.529,41704.0,600519
3,2017-01-06,336.694,340.696,349.457,336.17,68095.0,600519
4,2017-01-09,337.821,338.511,342.755,336.597,35405.0,600519


# pandas-datareader

GitHub: https://github.com/pydata/pandas-datareader

The functions in pandas_datareader.data and pandas_datareader.wb pull data from various Internet sources into a pandas DataFrame. The following sources are currently supported:

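> * Tiingo: a tracking platform that provides a data API with historical end-of-day prices for equities, mutual funds and ETFs. Free registration is required to get an API key. Free accounts are rate-limited and can access a limited number of symbols (500 at the time of writing).  
> * IEX: the Investors Exchange (IEX) provides a wide range of data through an API. Historical stock prices are available for up to 15 years. Using these readers requires a publishable API key from the IEX Cloud Console, which can be stored in the IEX_API_KEY environment variable.  
> * Alpha Vantage: provides real-time equities and forex data. Free registration is required to get an API key.  
> * Quandl: daily financial data (prices of stocks, ETFs, etc.) from Quandl. A symbol has two parts: a database name and a symbol name. The database name can be any of the free ones listed on the Quandl website. The symbol name varies by database; for WIKI (US stocks) it is the common ticker symbol.  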

Yahoo! Finance  
Google Finance  
Enigma: a provider of public data search  
St. Louis FED (FRED): Federal Reserve Bank of St. Louis  
Kenneth French's data library  
World Bank  
OECD: Organisation for Economic Co-operation and Development  
Eurostat: statistical office of the European Union  
Thrift Savings Plan: retirement savings plan for US federal employees  
Oanda currency historical rate: a foreign exchange broker  
Nasdaq Trader symbol definitions  

In [8]:
from pandas_datareader import data as web
help(web)  # use dir(web) instead to list only the module's names

Help on module pandas_datareader.data in pandas_datareader:

NAME
    pandas_datareader.data - Module contains tools for collecting data from various remote sources

FUNCTIONS
    DataReader(name, data_source=None, start=None, end=None, retry_count=3, pause=0.1, session=None, api_key=None)
        Imports data from a number of online sources.
        
        Currently supports Google Finance, St. Louis FED (FRED),
        and Kenneth French's data library, among others.
        
        Parameters
        ----------
        name : str or list of strs
            the name of the dataset. Some data sources (IEX, fred) will
            accept a list of names.
        data_source: {str, None}
            the data source ("iex", "fred", "ff")
        start : string, int, date, datetime, Timestamp
            left boundary for range (defaults to 1/1/2010)
        end : string, int, date, datetime, Timestamp
            right boundary for range (defaults to today)
        retry_count : {int,

In [3]:
from pandas_datareader import data as web
df = web.DataReader('SPY',data_source='yahoo',start='1/1/2018')
df.head(5)

Date,High,Low,Open,Close,Volume,Adj Close
2018-01-02,268.809998,267.399994,267.839996,268.769989,86655700.0,256.217468
2018-01-03,270.640015,268.959991,268.959991,270.470001,90070400.0,257.838104
2018-01-04,272.160004,270.540009,271.200012,271.609985,80636400.0,258.924835
2018-01-05,273.559998,271.950012,272.51001,273.420013,83524000.0,260.650299
2018-01-08,274.100006,272.980011,273.309998,273.920013,57319200.0,261.126984


Adj Close: the adjusted closing price, i.e. the close adjusted for corporate actions such as dividends and splits
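
As a rough illustration of how Close and Adj Close relate, the sketch below uses the first two SPY rows from the output above. The function names are illustrative. The ratio Adj Close / Close acts as a cumulative adjustment factor, and on days with no dividend or split in between, both series give almost the same daily return.

```python
# (date, Close, Adj Close) for SPY, copied from the output above.
rows = [
    ("2018-01-02", 268.769989, 256.217468),
    ("2018-01-03", 270.470001, 257.838104),
]


def adjustment_factor(close, adj_close):
    """Ratio that scales the raw close down to the adjusted close."""
    return adj_close / close


def simple_return(previous, current):
    """One-period simple return."""
    return current / previous - 1


raw_return = simple_return(rows[0][1], rows[1][1])
adj_return = simple_return(rows[0][2], rows[1][2])

print(adjustment_factor(rows[0][1], rows[0][2]))  # below 1: later payouts are backed out
print(raw_return, adj_return)                     # nearly identical
```

Returns computed over longer windows should generally use Adj Close, since raw closes show artificial drops on ex-dividend and split dates.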

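Getting real-time data

> regularMarketOpen: opening price  
> regularMarketDayLow: lowest price of the day  
> regularMarketDayHigh: highest price of the day  
> regularMarketPreviousClose: previous day's closing price  
> regularMarketDayRange: price range of the day  
> regularMarketChangePercent: percentage change  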

In [13]:
df = web.get_quote_yahoo('AAPL')
df

,language,region,quoteType,quoteSourceName,triggerable,currency,marketState,firstTradeDateMilliseconds,priceHint,preMarketChange,...,shortName,longName,messageBoardId,exchangeTimezoneName,exchangeTimezoneShortName,gmtOffSetMilliseconds,market,esgPopulated,displayName,price
AAPL,en-US,US,EQUITY,Nasdaq Real Time Price,True,USD,PRE,345479400000,2,1.852501,...,Apple Inc.,Apple Inc.,finmb_24937,America/New_York,EDT,-14400000,us_market,False,Apple,124.8075


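Getting A-share data with pandas_datareader  
Chinese stocks are specified as the stock code followed by an exchange suffix:  
Shanghai-listed stocks take the suffix ".SS" after the stock code  
Shenzhen-listed stocks take the suffix ".SZ" after the stock code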
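
The suffix rule can be wrapped in a small helper. This is a sketch under a simplifying assumption: codes starting with 6 are treated as Shanghai stocks, and codes starting with 0 or 3 as Shenzhen stocks. `to_yahoo_symbol` is a hypothetical name, and the rule does not cover indexes or other security types.

```python
def to_yahoo_symbol(code):
    """Append the Yahoo Finance exchange suffix to a 6-digit A-share code.

    Assumed simplified rule: codes starting with 6 are Shanghai (.SS),
    codes starting with 0 or 3 are Shenzhen (.SZ).
    """
    code = str(code).strip()
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit stock code, got {code!r}")
    if code.startswith("6"):
        return code + ".SS"
    if code.startswith(("0", "3")):
        return code + ".SZ"
    raise ValueError(f"unrecognized exchange prefix for code {code!r}")


print(to_yahoo_symbol("600519"))  # 600519.SS
print(to_yahoo_symbol("000001"))  # 000001.SZ
```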

In [18]:
from pandas_datareader import data as web
df = web.DataReader('600519.SS',data_source='yahoo',start='1/1/2020')
df.head(5)

Date,High,Low,Open,Close,Volume,Adj Close
2020-01-02,1145.060059,1116.0,1128.0,1130.0,14809916.0,1116.952637
2020-01-03,1117.0,1076.900024,1117.0,1078.560059,13031878.0,1066.106689
2020-01-06,1092.900024,1067.300049,1070.859985,1077.98999,6341478.0,1065.543213
2020-01-07,1099.0,1076.400024,1077.5,1094.530029,4785359.0,1081.892212
2020-01-08,1088.140015,1088.140015,1088.140015,1088.140015,2500825.0,1075.57605


# jqdata 

Install: pip install jqdatasdk

In [None]:
from jqdatasdk import *
auth("username","password")

In [19]:
import jqdatasdk
jqdatasdk.auth("18xxxxxx","xxxxxxxx")

auth success 


In [25]:
df = jqdatasdk.get_price('600519.XSHG',count=10,end_date = '2020-08-01',frequency='daily')
df

,open,close,high,low,volume,money
2020-07-20,1652.5,1636.96,1663.99,1588.88,4528033.0,7363823000.0
2020-07-21,1646.64,1668.0,1702.8,1630.99,4196950.0,7026353000.0
2020-07-22,1658.0,1678.08,1710.0,1658.0,3920121.0,6637265000.0
2020-07-23,1672.0,1676.0,1691.0,1650.0,3432052.0,5739050000.0
2020-07-24,1654.85,1595.3,1666.0,1585.0,7875201.0,12705940000.0
2020-07-27,1616.01,1622.55,1629.8,1600.0,3878817.0,6279664000.0
2020-07-28,1630.0,1670.0,1680.0,1630.0,5491152.0,9133722000.0
2020-07-29,1690.0,1672.0,1690.0,1633.0,5074843.0,8411790000.0
2020-07-30,1671.2,1680.0,1700.59,1670.5,3837304.0,6486855000.0
2020-07-31,1685.0,1678.18,1705.0,1646.0,4068474.0,6846747000.0


# yfinance

In [5]:
import yfinance as yf
help(yf)

Help on package yfinance:

NAME
    yfinance

DESCRIPTION
    # -*- coding: utf-8 -*-
    #
    # Yahoo! Finance market data downloader (+fix for Pandas Datareader)
    # https://github.com/ranaroussi/yfinance
    #
    # Copyright 2017-2019 Ran Aroussi
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #

PACKAGE CONTENTS
    base
    multi
    shared
    ticker
    tickers
    utils

CLASSES
    builtins.object
        yfinance.tickers.Tickers
   

In [9]:
stock = yf.Ticker("AAPL")
# stock.info

In [11]:
data = stock.history(start='2019-01-01', interval='1d')  # an explicit start date defines the range, so period is not needed
data.head(5)

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2018-12-31,38.78,38.98,38.28,38.59,140014000.0,0.0,0.0
2019-01-02,37.89,38.86,37.73,38.63,148158800.0,0.0,0.0
2019-01-03,35.22,35.64,34.73,34.78,365248800.0,0.0,0.0
2019-01-04,35.35,36.34,35.18,36.27,234428400.0,0.0,0.0
2019-01-07,36.37,36.41,35.69,36.19,219111200.0,0.0,0.0


Exercises:
1. Use JoinQuant to get the list of all A-share stock codes as of 2020-09-30
2. Use JoinQuant to get all trading days from 2012-01-01 to 2020-01-01