# Processing and Applying Financial Data with Python

> Hands-on: Retrieving Financial Data from the Web

[郭耀仁](https://hahow.in/@tonykuoyj?tr=tonykuoyj) | yaojenkuo@ntu.edu.tw | April 2024

In [1]:
import datetime
import time
import io
import requests
import pandas as pd

## Yahoo! Finance

## About Yahoo! Finance

- Home page: <https://finance.yahoo.com>
- Basic trading information page: `f"https://finance.yahoo.com/quote/{stock_ticker}/history"`
- Basic trading information API: `f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"`

In [2]:
# US stock ticker
stock_ticker = "NVDA"
base_api_url = f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"
request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
response = requests.get(base_api_url, headers=request_headers)
response_json = response.json()
print(response.status_code)
print(type(response_json))

200
<class 'dict'>


In [3]:
# Taiwan stock ticker
stock_ticker = "2330.TW"
base_api_url = f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"
request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
response = requests.get(base_api_url, headers=request_headers)
response_json = response.json()
print(response.status_code)
print(type(response_json))

200
<class 'dict'>


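## Retrieving historical trading data for individual stocks from Yahoo! Finance

- Append query strings to the basic trading information API URL.
    - `period1`: start date and time, expressed in Unix time (seconds).
    - `period2`: end date and time, expressed in Unix time (seconds).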
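
As a rough illustration of what the URL looks like once the query string is appended, the sketch below uses the standard library's `urllib.parse.urlencode`. `requests` performs an equivalent encoding when it is given `params=`. The `interval` key and the two timestamps are taken from the NVDA request cells in this notebook; the variable name `full_url` is only for illustration.

```python
from urllib.parse import urlencode

stock_ticker = "NVDA"
base_api_url = f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"
query_str_params = {
    "interval": "1d",
    "period1": 1704072600,  # start, Unix time in seconds
    "period2": 1704614400,  # end, Unix time in seconds
}
# Join the base URL and the encoded key=value pairs with "?".
full_url = f"{base_api_url}?{urlencode(query_str_params)}"
print(full_url)
```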

In [4]:
# US stock ticker
stock_ticker = "NVDA"
start_date_str = "2024-01-01"
stop_date_str = "2024-01-07"
start_year, start_month, start_day = [int(e) for e in start_date_str.split("-")]
stop_year, stop_month, stop_day = [int(e) for e in stop_date_str.split("-")]
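# Naive datetimes: the 09:30 and 16:00 here are not tied to New York time.
start_datetime = datetime.datetime(start_year, start_month, start_day, 9, 30)
stop_datetime = datetime.datetime(stop_year, stop_month, stop_day, 16, 0)
# time.mktime() interprets the time tuple in this machine's local timezone.
start_date_unixtime = time.mktime(start_datetime.timetuple())
stop_date_unixtime = time.mktime(stop_datetime.timetuple())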
period1 = int(start_date_unixtime)
period2 = int(stop_date_unixtime)
print(period1)
print(period2)

1704072600
1704614400
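
The value `1704072600` printed above corresponds to 09:30 at UTC+8, which suggests the notebook ran on a machine set to Taipei time: `time.mktime()` reads a naive datetime in the local timezone, so the result changes with the machine. If the intent is to express the time in the exchange's own timezone, one possible approach is the standard library's `zoneinfo` module (Python 3.9+). This is a minimal sketch, not the notebook's method; the helper name `to_unix_time` is hypothetical.

```python
import datetime
from zoneinfo import ZoneInfo


def to_unix_time(date_str: str, hour: int, minute: int, tz_name: str) -> int:
    """Convert a YYYY-MM-DD date plus a wall-clock time in tz_name to Unix seconds."""
    year, month, day = (int(e) for e in date_str.split("-"))
    # Attaching tzinfo makes the result independent of the machine's local timezone.
    local_dt = datetime.datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(tz_name))
    return int(local_dt.timestamp())


print(to_unix_time("2024-01-01", 9, 30, "America/New_York"))  # 1704119400
print(to_unix_time("2024-01-01", 9, 0, "Asia/Taipei"))        # 1704070800
```

The Taipei result matches the `period1` printed for `2330.TW` later in this notebook, while the New York result differs from the NVDA value above by 13 hours.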


In [5]:
base_api_url = f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"
request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
query_str_params = {
    "interval": "1d",
    "period1": period1,
    "period2": period2
}
response = requests.get(base_api_url, headers=request_headers, params=query_str_params)
response_json = response.json()
print(response.status_code)
print(type(response_json))

200
<class 'dict'>


In [6]:
print(response_json)

{'chart': {'result': [{'meta': {'currency': 'USD', 'symbol': 'NVDA', 'exchangeName': 'NMS', 'instrumentType': 'EQUITY', 'firstTradeDate': 917015400, 'regularMarketTime': 1711137600, 'hasPrePostMarketData': True, 'gmtoffset': -14400, 'timezone': 'EDT', 'exchangeTimezoneName': 'America/New_York', 'regularMarketPrice': 942.89, 'chartPreviousClose': 495.22, 'priceHint': 2, 'currentTradingPeriod': {'pre': {'timezone': 'EDT', 'start': 1711094400, 'end': 1711114200, 'gmtoffset': -14400}, 'regular': {'timezone': 'EDT', 'start': 1711114200, 'end': 1711137600, 'gmtoffset': -14400}, 'post': {'timezone': 'EDT', 'start': 1711137600, 'end': 1711152000, 'gmtoffset': -14400}}, 'dataGranularity': '1d', 'range': '', 'validRanges': ['1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']}, 'timestamp': [1704205800, 1704292200, 1704378600, 1704465000], 'indicators': {'quote': [{'open': [492.44000244140625, 474.8500061035156, 477.6700134277344, 484.6199951171875], 'low': [475.95001220703125

In [7]:
# Taiwan stock ticker
stock_ticker = "2330.TW"
start_date_str = "2024-01-01"
stop_date_str = "2024-01-07"
start_year, start_month, start_day = [int(e) for e in start_date_str.split("-")]
stop_year, stop_month, stop_day = [int(e) for e in stop_date_str.split("-")]
start_datetime = datetime.datetime(start_year, start_month, start_day, 9, 0)
stop_datetime = datetime.datetime(stop_year, stop_month, stop_day, 13, 30)
start_date_unixtime = time.mktime(start_datetime.timetuple())
stop_date_unixtime = time.mktime(stop_datetime.timetuple())
period1 = int(start_date_unixtime)
period2 = int(stop_date_unixtime)
print(period1)
print(period2)

1704070800
1704605400


In [8]:
base_api_url = f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"
request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
query_str_params = {
    "interval": "1d",
    "period1": period1,
    "period2": period2
}
response = requests.get(base_api_url, headers=request_headers, params=query_str_params)
response_json = response.json()
print(response.status_code)
print(type(response_json))

200
<class 'dict'>


In [9]:
print(response_json)

{'chart': {'result': [{'meta': {'currency': 'TWD', 'symbol': '2330.TW', 'exchangeName': 'TAI', 'instrumentType': 'EQUITY', 'firstTradeDate': 946947600, 'regularMarketTime': 1711085403, 'hasPrePostMarketData': False, 'gmtoffset': 28800, 'timezone': 'CST', 'exchangeTimezoneName': 'Asia/Taipei', 'regularMarketPrice': 785.0, 'chartPreviousClose': 593.0, 'priceHint': 2, 'currentTradingPeriod': {'pre': {'timezone': 'CST', 'start': 1711069200, 'end': 1711069200, 'gmtoffset': 28800}, 'regular': {'timezone': 'CST', 'start': 1711069200, 'end': 1711085400, 'gmtoffset': 28800}, 'post': {'timezone': 'CST', 'start': 1711085400, 'end': 1711085400, 'gmtoffset': 28800}}, 'dataGranularity': '1d', 'range': '', 'validRanges': ['1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']}, 'timestamp': [1704157200, 1704243600, 1704330000, 1704416400], 'indicators': {'quote': [{'open': [590.0, 584.0, 580.0, 578.0], 'low': [589.0, 576.0, 577.0, 574.0], 'close': [593.0, 578.0, 580.0, 576.0], 'high'
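
The nested dictionary above is easier to work with once it is flattened into one record per trading day. The sketch below is one way to do that with the standard library only. The helper name `chart_to_records` is hypothetical, and `sample` is a hand-built dictionary limited to the keys and values that are visible in the printed output; the real response contains more fields.

```python
import datetime


def chart_to_records(response_json: dict) -> list:
    """Flatten a chart response into one dict per trading day."""
    result = response_json["chart"]["result"][0]
    # Use the exchange's UTC offset so each timestamp maps to the local trading date.
    exchange_tz = datetime.timezone(datetime.timedelta(seconds=result["meta"]["gmtoffset"]))
    quote = result["indicators"]["quote"][0]
    records = []
    for i, ts in enumerate(result["timestamp"]):
        record = {"date": datetime.datetime.fromtimestamp(ts, tz=exchange_tz).date()}
        # Each quote field is a list aligned with the timestamp list.
        for field, values in quote.items():
            record[field] = values[i]
        records.append(record)
    return records


# Hand-built sample: only fields visible in the printed output above.
sample = {
    "chart": {
        "result": [
            {
                "meta": {"symbol": "2330.TW", "gmtoffset": 28800},
                "timestamp": [1704157200, 1704243600, 1704330000, 1704416400],
                "indicators": {
                    "quote": [
                        {
                            "open": [590.0, 584.0, 580.0, 578.0],
                            "low": [589.0, 576.0, 577.0, 574.0],
                            "close": [593.0, 578.0, 580.0, 576.0],
                        }
                    ]
                },
            }
        ]
    }
}
records = chart_to_records(sample)
print(records[0])
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame(records)` if a DataFrame is preferred.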

## Defining the function `get_yfinance_data()`

In [10]:
def get_yfinance_data(stock_ticker: str, start_date: str, stop_date: str, listed_in_tw: bool=False) -> dict:
    if listed_in_tw:
        stock_ticker = f"{stock_ticker}.TW"
        start_hour, start_min = 9, 0
        stop_hour, stop_min = 13, 30
    else:
        start_hour, start_min = 9, 30
        stop_hour, stop_min = 16, 0
    start_year, start_month, start_day = [int(e) for e in start_date.split("-")]
    stop_year, stop_month, stop_day = [int(e) for e in stop_date.split("-")]
    start_datetime = datetime.datetime(start_year, start_month, start_day, start_hour, start_min)
    stop_datetime = datetime.datetime(stop_year, stop_month, stop_day, stop_hour, stop_min)
    start_date_unixtime = time.mktime(start_datetime.timetuple())
    stop_date_unixtime = time.mktime(stop_datetime.timetuple())
    period1 = int(start_date_unixtime)
    period2 = int(stop_date_unixtime)
    base_api_url = f"https://query1.finance.yahoo.com/v7/finance/chart/{stock_ticker}"
    request_headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    }
    query_str_params = {
        "interval": "1d",
        "period1": period1,
        "period2": period2
    }
    response = requests.get(base_api_url, headers=request_headers, params=query_str_params)
    response_json = response.json()
    print(f"Request status code: {response.status_code}")
    return response_json

In [11]:
nvda = get_yfinance_data("NVDA", "2024-01-01", "2024-01-07")
print(type(nvda))

Request status code: 200
<class 'dict'>


In [12]:
tsmc = get_yfinance_data("2330", "2024-01-01", "2024-01-07", True)
print(type(tsmc))

Request status code: 200
<class 'dict'>


## Taiwan Stock Exchange

## About the Taiwan Stock Exchange

- Home page: <https://www.twse.com.tw>
- Trading information page: <https://www.twse.com.tw/zh/trading/historical/stock-day.html>
- Trading information API: `f"https://www.twse.com.tw/exchangeReport/STOCK_DAY?response=json&date={query_date}&stockNo={stock_ticker}"`

In [13]:
base_api_url = "https://www.twse.com.tw/exchangeReport/STOCK_DAY"
request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
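query_str_params = {
    # The output below shows March 2024 (113年03月) rather than January, which suggests
    # this YYYYMM value was not honored; the endpoint appears to expect YYYYMMDD, e.g. "20240101".
    "date": "202401",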
    "stockNo": "2330",
    "response": "json"
}
response = requests.get(base_api_url, headers=request_headers, params=query_str_params)
response_json = response.json()
print(response.status_code)
print(type(response_json))

200
<class 'dict'>


In [14]:
print(response_json)

{'stat': 'OK', 'date': '20240324', 'title': '113年03月 2330 台積電           各日成交資訊', 'fields': ['日期', '成交股數', '成交金額', '開盤價', '最高價', '最低價', '收盤價', '漲跌價差', '成交筆數'], 'data': [['113/03/01', '24,167,721', '16,699,995,060', '697.00', '697.00', '688.00', '689.00', '-1.00', '26,282'], ['113/03/04', '97,210,112', '69,868,348,694', '714.00', '725.00', '711.00', '725.00', '+36.00', '125,799'], ['113/03/05', '73,299,411', '53,751,887,376', '735.00', '738.00', '728.00', '730.00', '+5.00', '69,851'], ['113/03/06', '52,464,833', '38,203,868,985', '718.00', '738.00', '717.00', '735.00', '+5.00', '49,897'], ['113/03/07', '80,382,406', '61,221,034,146', '755.00', '769.00', '754.00', '760.00', '+25.00', '96,348'], ['113/03/08', '98,069,174', '77,295,575,097', '795.00', '796.00', '772.00', '784.00', '+24.00', '110,758'], ['113/03/11', '73,436,931', '56,348,050,108', '768.00', '778.00', '761.00', '766.00', '-18.00', '107,368'], ['113/03/12', '63,336,798', '48,288,411,581', '757.00', '771.00', '754.00', '770.00
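
The response stores every value as a string: dates use the ROC calendar (year 113 is 2024, that is, ROC year + 1911), and numbers carry thousands separators and explicit signs. A minimal sketch of converting one row into typed values follows. The helper name `parse_twse_row` is hypothetical, the two sample rows are copied from the output above, and rows containing non-numeric markers are not handled here.

```python
import datetime


def parse_twse_row(fields: list, row: list) -> dict:
    """Convert one row of the `data` list into typed Python values."""
    record = {}
    for field, raw in zip(fields, row):
        if field == "日期":
            # ROC calendar: ROC year + 1911 = Gregorian year.
            roc_year, month, day = (int(e) for e in raw.split("/"))
            record[field] = datetime.date(roc_year + 1911, month, day)
        else:
            # Strip thousands separators; float() accepts a leading "+" or "-".
            record[field] = float(raw.replace(",", ""))
    return record


fields = ["日期", "成交股數", "成交金額", "開盤價", "最高價", "最低價", "收盤價", "漲跌價差", "成交筆數"]
rows = [
    ["113/03/01", "24,167,721", "16,699,995,060", "697.00", "697.00", "688.00", "689.00", "-1.00", "26,282"],
    ["113/03/04", "97,210,112", "69,868,348,694", "714.00", "725.00", "711.00", "725.00", "+36.00", "125,799"],
]
parsed = [parse_twse_row(fields, row) for row in rows]
print(parsed[0]["日期"], parsed[0]["收盤價"], parsed[1]["漲跌價差"])
```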

## Defining the function `get_twse_data()`

In [15]:
def get_twse_data(stock_ticker: str, data_year: int, data_month: int) -> dict:
    base_api_url = "https://www.twse.com.tw/exchangeReport/STOCK_DAY"
    # The endpoint appears to expect a YYYYMMDD date; the first day of the month selects that month.
    data_year_month = datetime.date(data_year, data_month, 1).strftime("%Y%m%d")
    request_headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    }
    query_str_params = {
        "date": data_year_month,
        "stockNo": stock_ticker,
        "response": "json"
    }
    response = requests.get(base_api_url, headers=request_headers, params=query_str_params)
    print(f"Request status code: {response.status_code}")
    response_json = response.json()
    return response_json
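
Because `get_twse_data()` takes a single year and month, covering a longer period means one call per month. The sketch below only generates the `(year, month)` pairs; the generator name `iter_year_months` is hypothetical, and the commented-out loop with its one-second pause is an assumption about polite request pacing, not a documented requirement.

```python
def iter_year_months(start_year: int, start_month: int, stop_year: int, stop_month: int):
    """Yield (year, month) pairs from the start month through the stop month, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (stop_year, stop_month):
        yield year, month
        month += 1
        if month > 12:
            # Roll over to January of the next year.
            year, month = year + 1, 1


year_months = list(iter_year_months(2023, 11, 2024, 2))
print(year_months)  # [(2023, 11), (2023, 12), (2024, 1), (2024, 2)]

# Possible usage with the function defined above (not executed here):
# for year, month in year_months:
#     monthly = get_twse_data("2330", year, month)
#     time.sleep(1)  # assumed pause between requests
```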

In [16]:
tsmc = get_twse_data("2330", 2024, 3)
print(type(tsmc))

Request status code: 200
<class 'dict'>


## Taiwan Futures Exchange

## About the Taiwan Futures Exchange

- Home page: <https://www.taifex.com.tw/>
- Trading information page: <https://www.taifex.com.tw/cht/3/futAndOptDate>

In [17]:
html_tables = pd.read_html("https://www.taifex.com.tw/cht/3/futAndOptDate")
taifex = html_tables[2]
taifex.iloc[6:, :]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
6,身份別,期貨,選擇權,期貨,選擇權,期貨,選擇權,期貨,選擇權,期貨,選擇權,期貨,選擇權
7,自營商,58556,235407,62737598,839068,73927,241996,66006750,868966,-15371,-6589,-3269152,-29898
8,投信,296,0,1197238,0,107,0,246877,0,189,0,950361,0
9,外資,336141,179245,424746148,902797,359413,181857,473163906,926249,-23272,-2612,-48417758,-23452
10,合計,394993,414652,488680984,1741865,433447,423853,539417533,1795215,-38454,-9201,-50736549,-53350
11,,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額,未平倉餘額
12,,多方,多方,多方,多方,空方,空方,空方,空方,多空淨額,多空淨額,多空淨額,多空淨額
13,,口數,口數,契約金額,契約金額,口數,口數,契約金額,契約金額,口數,口數,契約金額,契約金額
14,身份別,期貨,選擇權,期貨,選擇權,期貨,選擇權,期貨,選擇權,期貨,選擇權,期貨,選擇權
15,自營商,36991,97475,25434300,863266,317312,105853,132758320,578870,-280321,-8378,-107324020,284396


## Government Open Data Platform

## About the Government Open Data Platform

- Home page: <https://data.gov.tw>
- Dataset page: <https://data.gov.tw/datasets/search?p=1&size=10&s=_score_desc&cgl-3=712&rct=254>
- Dataset APIs
    - `f"https://www.twse.com.tw/exchangeReport/{data_name}?response=open_data"`
    - `f"https://www.taifex.com.tw/data_gov/taifex_open_data.asp?data_name={data_name}"`

In [18]:
data_name = "STOCK_DAY_ALL"
api_url = f"https://www.twse.com.tw/exchangeReport/{data_name}?response=open_data"
response = requests.get(api_url)
string_io_response_text = io.StringIO(response.text)
twse = pd.read_csv(string_io_response_text)
print(type(twse))

<class 'pandas.core.frame.DataFrame'>


In [19]:
twse

Unnamed: 0,證券代號,證券名稱,成交股數,成交金額,開盤價,最高價,最低價,收盤價,漲跌價差,成交筆數
0,0050,元大台灣50,7163593.0,1.126524e+09,157.50,158.25,156.50,157.20,-0.20,8945.0
1,0051,元大中型100,44071.0,3.463451e+06,78.65,78.80,78.30,78.40,-0.25,209.0
2,0052,富邦科技,415163.0,6.626355e+07,160.60,160.60,158.40,160.15,0.15,604.0
3,0053,元大電子,13454.0,1.152698e+06,85.80,86.10,85.10,85.90,0.35,111.0
4,0055,元大MSCI金融,886594.0,2.242133e+07,25.31,25.45,25.12,25.30,0.04,516.0
...,...,...,...,...,...,...,...,...,...,...
1225,9944,新麗,153230.0,3.070965e+06,20.10,20.15,20.00,20.05,0.05,108.0
1226,9945,潤泰新,6505834.0,2.282793e+08,35.10,35.35,34.90,35.10,-0.25,4629.0
1227,9946,三發地產,307990.0,7.467439e+06,24.20,24.55,24.00,24.30,-0.20,247.0
1228,9955,佳龍,388365.0,9.266355e+06,23.85,24.15,23.65,23.85,-0.05,291.0


In [20]:
data_name = "MarketDataOfMajorInstitutionalTradersDividedByFuturesAndOptionsBytheDate"
api_url = f"https://www.taifex.com.tw/data_gov/taifex_open_data.asp?data_name={data_name}"
response = requests.get(api_url)
string_io_response_text = io.StringIO(response.text)
taifex = pd.read_csv(string_io_response_text)
print(type(taifex))

<class 'pandas.core.frame.DataFrame'>


In [21]:
taifex

Unnamed: 0,日期,身份別,期貨多方交易口數,選擇權多方交易口數,期貨多方交易契約金額(千元),選擇權多方交易契約金額(千元),期貨空方交易口數,選擇權空方交易口數,期貨空方交易契約金額(千元),選擇權空方交易契約金額(千元),...,期貨多方未平倉契約金額(千元),選擇權多方未平倉契約金額(千元),期貨空方未平倉口數,選擇權空方未平倉口數,期貨空方未平倉契約金額(千元),選擇權空方未平倉契約金額(千元),期貨多空未平倉口數淨額,選擇權多空未平倉口數淨額,期貨多空未平倉契約金額淨額(千元),選擇權多空未平倉契約金額淨額(千元)
0,20240322,自營商,58556,235407,62737598,839068,73927,241996,66006750,868966,...,25434300,863266,317312,105853,132758320,578870,-280321,-8378,-107324020,284396
1,20240322,投信,296,0,1197238,0,107,0,246877,0,...,55654956,877,10299,0,34748783,0,5726,172,20906173,877
2,20240322,外資及陸資,336141,179245,424746148,902797,359413,181857,473163906,926249,...,181490422,287178,228556,43855,215673082,206025,-85179,-12035,-34182660,81153
