Trend Micro: Taiwan ETF Price Prediction Competition
---
Kenny Hsieh, 2018/4/30

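- [Official competition website](https://tbrain.trendmicro.com.tw/Competitions/Details/2)
- `ETF_Modeling.ipynb`: data loading, data preprocessing, model building, and prediction output
- `ETF_Price_Performance.ipynb`: evaluates the predictions and computes the score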

## Brief Introduction
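- Using the organizers' data for eighteen Taiwan ETFs composed of listed and OTC securities (through 4/27), predict each ETF's direction of movement and price for the following Monday (4/30).
- Lacking a background in stocks and finance, I chose to implement the prediction strategy with Long Short-Term Memory (LSTM), a neural network model designed for time series.
- Because training LSTM networks consumes a large amount of compute, training runs in the cloud GPU environment provided by Google Colaboratory.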

In [204]:
# Upload the price data to the Google cloud environment (Colab)
from google.colab import files

uploaded = files.upload()

Saving tetfp.csv to tetfp (2).csv


## Data Preprocessing

In [338]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
%matplotlib inline

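# Parse the date column into datetime format
# (pd.datetime was removed in pandas 2.0; use the datetime class imported above)
dateparse = lambda x: datetime.strptime(x, '%Y%m%d')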

etf_price = pd.read_csv('tetfp.csv', encoding = 'Big5', sep = '","', header = 0, parse_dates=['日期'], date_parser = dateparse)
etf_price.head()

Unnamed: 0,"""代碼",日期,中文簡稱,開盤價(元),最高價(元),最低價(元),收盤價(元),"成交張數(張)"""
0,"""0050",2013-01-02,元大台灣50,54.0,54.65,53.9,54.4,"16,487"""
1,"""0050",2013-01-03,元大台灣50,54.9,55.05,54.65,54.85,"29,020"""
2,"""0050",2013-01-04,元大台灣50,54.85,54.85,54.4,54.5,"9,837"""
3,"""0050",2013-01-07,元大台灣50,54.55,54.55,53.9,54.25,"8,910"""
4,"""0050",2013-01-08,元大台灣50,54.0,54.2,53.65,53.9,"12,507"""


In [339]:
# Strip the stray punctuation and spaces left in the columns after parsing
etf_price.columns = ['Code', 'Date', 'Name', 'Open', 'High', 'Low', 'Close', 'Volume']
etf_price['Code'] = etf_price['Code'].map(lambda x: x.lstrip('"').replace(" ", ""))
etf_price['Volume'] = etf_price['Volume'].map(lambda x: x.rstrip('"').rstrip('",'))
#etf_price = etf_price.drop(['Name'], axis = 1)
etf_price.head()

Unnamed: 0,Code,Date,Name,Open,High,Low,Close,Volume
0,50,2013-01-02,元大台灣50,54.0,54.65,53.9,54.4,16487
1,50,2013-01-03,元大台灣50,54.9,55.05,54.65,54.85,29020
2,50,2013-01-04,元大台灣50,54.85,54.85,54.4,54.5,9837
3,50,2013-01-07,元大台灣50,54.55,54.55,53.9,54.25,8910
4,50,2013-01-08,元大台灣50,54.0,54.2,53.65,53.9,12507
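
The cleanup above leaves `Volume` as text. As a minimal sketch, assuming the raw values look like `16,487"` as in the first preview, a helper could also drop the thousands separator and return an integer. The name `parse_volume` is hypothetical and not part of the notebook.

```python
def parse_volume(raw):
    """Convert a raw volume string such as '16,487"' to an int.

    Hypothetical helper: strips surrounding whitespace and quotes,
    then removes the thousands separator before converting.
    """
    cleaned = raw.strip().strip('"').replace(",", "")
    return int(cleaned)


print(parse_volume('16,487"'))
```

With pandas, the same function could be applied through `etf_price['Volume'].map(parse_volume)` if a numeric volume column is needed later.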


## Auxiliary Function
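The goal is to predict 18 ETFs, each of which needs its own model, so the logic is wrapped in functions for reuse:
- Evaluate the predicted price for the last available day: `evaluate_lastday_result()`
- Predict the next day's trend and price: `predict_nextday_result()`
- LSTM model architecture: `Pipeline_LSTM()`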

In [None]:
def evaluate_lastday_result(code, date, today, predict):
  
  # `date` is the second-to-last trading day; one day later is the day being evaluated
  lastday_date = date + timedelta(days = 1)

  result = pd.DataFrame(columns = ["Code", "Date", "Actual", "Evaluate"])
  result.loc[0] = [code, lastday_date.date(), today, predict]

  return result

In [None]:
def predict_nextday_result(code, date, today, nextday):
  
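  # The data ends on Friday 4/27, so adding 3 days lands on Monday 4/30
  nextday_date = date + timedelta(days = 3)
  # Trend: 1 = up, -1 = down, 0 = unchanged
  trend = 1 if nextday > today else -1 if nextday < today else 0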
  
  result = pd.DataFrame(columns = ["Code", "Date", "Trend", "Predict"])
  result.loc[0] = [code, nextday_date.date(), trend, nextday]
  return result
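
The fixed three-day offset in `predict_nextday_result()` only works because the last data point is a Friday. A more general sketch, using a hypothetical helper `next_weekday` that is not part of the notebook, skips Saturdays and Sundays. It does not know about Taiwan market holidays, so it is only an approximation of the next trading day.

```python
from datetime import date, timedelta


def next_weekday(d):
    """Return the first Monday-to-Friday date strictly after d.

    Hypothetical helper: ignores exchange holidays.
    """
    nxt = d + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


print(next_weekday(date(2018, 4, 27)))
```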

In [None]:
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM, GRU, Dropout

def Pipeline_LSTM(code, dataset):

  # The time window is currently set to 5; it can be tuned later to observe model performance
  sliding_windows = 5
  dataset = dataset.loc[dataset['Code'] == code]
  
  # Normalize the data to the range 0 to 1 with MinMaxScaler
  sc = MinMaxScaler(feature_range = (0, 1))
  train_sc = sc.fit_transform(np.array(dataset['Close'].values).reshape(-1, 1))
  
  # Build the training samples: each input is a window of past closes, the target is the next close
  X_train = []
  y_train = []

  for i in range(sliding_windows, train_sc.shape[0]):
    X_train.append(train_sc[i - sliding_windows:i, 0])
    y_train.append(train_sc[i, 0])
  X_train, y_train = np.array(X_train), np.array(y_train)

  X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
  
  # Build the LSTM model: three stacked LSTM layers, with Dropout to reduce overfitting
  model = Sequential()
  model.add(LSTM(units = 128,
                      input_shape = (X_train.shape[1], 1),
                      return_sequences = True))
  model.add(Dropout(0.3))

  model.add(LSTM(units = 128,
                      return_sequences = True))
  model.add(Dropout(0.3))
  
  model.add(LSTM(units = 64))
  model.add(Dropout(0.3))

  model.add(Dense(units = 1))

  print("Handling %s ETF" %code)
  
  # Set the optimizer and loss function, then train
  model.compile(optimizer = 'adam', loss = 'mean_squared_error')
  model.fit(X_train, y_train, epochs = 70, batch_size = 32, validation_split = 0.1)
  
  actual_lastday = dataset["Close"][-1:].values[0]
  
  
  # Evaluate the predicted price for the last day
  lastday_date = dataset['Date'].iloc[-2]
  X_validate = sc.transform(np.reshape(dataset['Close'][-6:-1].values, (-1, 1)))
  X_validate = np.reshape(X_validate, (X_validate.shape[1], X_validate.shape[0], 1))
  
  predict_lastday = model.predict(X_validate)
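  # Index the single element explicitly; float() on a 1-element array is deprecated in recent NumPy
  predict_lastday = float(sc.inverse_transform(predict_lastday)[0, 0])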
  evaluate_lastday_df = evaluate_lastday_result(code, lastday_date, actual_lastday, predict_lastday)
  
  # Predict the next day's trend and price
  nextday_date = dataset['Date'].iloc[-1]
  X_test = sc.transform(np.reshape(dataset['Close'][-5:].values, (-1, 1)))
  X_test = np.reshape(X_test, (X_test.shape[1], X_test.shape[0], 1))
  
  predict_nextday = model.predict(X_test)
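  # Index the single element explicitly; float() on a 1-element array is deprecated in recent NumPy
  predict_nextday = float(sc.inverse_transform(predict_nextday)[0, 0])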
  predict_nextday_df = predict_nextday_result(code, nextday_date, actual_lastday, predict_nextday)
  
  return evaluate_lastday_df, predict_nextday_df
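
The scaling and sliding-window steps inside `Pipeline_LSTM()` can be seen in isolation in the sketch below. It uses plain NumPy rather than `MinMaxScaler`, computing `(x - min) / (max - min)`, which is what the scaler does for `feature_range = (0, 1)`. The helper name `make_windows` and the window size of 3 are assumptions chosen to keep the example small; the five closes are the 0050 values from the preview table.

```python
import numpy as np


def make_windows(series, window):
    """Scale a 1-D series to [0, 1] and cut it into (X, y) samples.

    Each row of X holds `window` consecutive scaled values and the
    matching entry of y is the value that follows them.
    """
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    scaled = (series - lo) / (hi - lo)
    X = np.array([scaled[i - window:i] for i in range(window, len(scaled))])
    y = scaled[window:]
    # LSTM layers expect input shaped (samples, timesteps, features)
    return X.reshape(-1, window, 1), y, (lo, hi)


closes = [54.4, 54.85, 54.5, 54.25, 53.9]
X, y, (lo, hi) = make_windows(closes, window=3)
print(X.shape, y.shape)

# Undo the scaling to get prices back, as sc.inverse_transform does
restored = y * (hi - lo) + lo
print(restored)
```

Five closes with a window of 3 give two samples: the first three closes predict the fourth, and closes two through four predict the fifth.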

## Predicting Prices for the Eighteen ETFs

In [348]:
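# Feed each of the 18 ETFs in the dataset through the model in turn, then collect the evaluation and prediction results
etf_code = pd.unique(pd.Series(etf_price["Code"]))

evaluate_frames = []
predict_frames = []

# This loop is very time-consuming
for etf in etf_code:
  evaluate, predict = Pipeline_LSTM(etf, etf_price)
  evaluate_frames.append(evaluate)
  predict_frames.append(predict)

# DataFrame.append was removed in pandas 2.0; concatenate the per-ETF frames once instead
evaluate_result = pd.concat(evaluate_frames, ignore_index = True)
predict_result = pd.concat(predict_frames, ignore_index = True)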

Handling 0050 ETF
Train on 1168 samples, validate on 130 samples
Epoch 1/70
...
Epoch 70/70
2018-04-27 00:00:00
2018-04-30 00:00:00
Handling 0051 ETF
Train on 1168 samples, validate on 130 samples

In [349]:
# Evaluation of the last-day price predictions
evaluate_result

Unnamed: 0,Code,Date,Actual,Evaluate
0,50,2018-04-27,79.2,79.684006
1,51,2018-04-27,32.11,31.961449
2,52,2018-04-27,53.2,52.51384
3,53,2018-04-27,34.2,34.336716
4,54,2018-04-27,23.09,23.096197
5,55,2018-04-27,17.04,16.86327
6,56,2018-04-27,25.15,25.144646
7,57,2018-04-27,48.74,48.668575
8,58,2018-04-27,45.06,45.990833
9,59,2018-04-27,41.97,40.994087
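
One simple way to read this table is the absolute error as a fraction of the actual price. The sketch below is only an illustration and is not the competition's scoring rule, which is handled in `ETF_Price_Performance.ipynb`. The helper name `abs_pct_error` is hypothetical, and the two rows are copied from the table above.

```python
def abs_pct_error(actual, predicted):
    """Absolute error as a fraction of the actual price."""
    return abs(predicted - actual) / actual


# (code, actual close, model estimate) copied from the first two rows above
rows = [("0050", 79.2, 79.684006), ("0051", 32.11, 31.961449)]

errors = {code: abs_pct_error(actual, est) for code, actual, est in rows}
for code, err in errors.items():
    print(f"{code}: {err:.2%}")
```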


In [350]:
# Next-day trend and price predictions
predict_result

Unnamed: 0,Code,Date,Trend,Predict
0,50,2018-04-30,1,79.486275
1,51,2018-04-30,-1,31.813707
2,52,2018-04-30,-1,52.205582
3,53,2018-04-30,-1,34.138184
4,54,2018-04-30,-1,23.049755
5,55,2018-04-30,-1,16.896467
6,56,2018-04-30,-1,25.056149
7,57,2018-04-30,-1,48.434689
8,58,2018-04-30,1,45.758862
9,59,2018-04-30,-1,41.104889


## Save Result to CSV

In [None]:
evaluate_result.to_csv("evaluate_result.csv", sep=',', index = False, encoding='utf-8')
predict_result.to_csv("predict_result.csv", sep=',', index = False, encoding='utf-8')
files.download("evaluate_result.csv")
files.download("predict_result.csv")

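## End of the Modeling Process
- This notebook covers only data processing, model building, and the final predictions. For the follow-up performance evaluation, see `ETF_Price_Performance.ipynb`.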