Model Benchmarks

1. Setting

The sequential models make predictions from 14 days of historical data, together with the non-sequential stock_basics data obtained from tushare.
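
The 14-day input window can be sketched as follows. This is a minimal illustration only: it assumes each stock's history is a chronologically ordered list of per-day feature rows and that the label belongs to the day right after the window. `make_windows` and the toy one-feature history are hypothetical and not taken from the project code.

```python
from typing import List, Sequence, Tuple

WINDOW = 14  # days of history per sample, as stated above


def make_windows(
    daily_rows: Sequence[Sequence[float]],
    window: int = WINDOW,
) -> List[Tuple[List[Sequence[float]], int]]:
    """Return (window_rows, target_index) pairs.

    target_index is the position of the day right after the window,
    i.e. the day a label would be attached to (an assumption here).
    """
    samples = []
    for end in range(window, len(daily_rows)):
        samples.append((list(daily_rows[end - window:end]), end))
    return samples


# Toy history: 20 days, one made-up feature per day.
history = [[float(day)] for day in range(20)]
samples = make_windows(history)
```

With 20 days of history and a 14-day window, this yields 6 samples; a stock with fewer than 15 days of data yields none.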

In the current version, stock data before March 1st, 2020 is used as the training set, and the data after that date as the validation set. This might not be a good split point, since the American market crashed during this specific period. The fact that val_acc > train_acc supports this concern.
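
A date-based split like the one described above could look roughly like this. The row layout and the `trade_date` key are hypothetical, and sending rows dated exactly on the split date to the validation set is an assumption, since the text does not say which side that day belongs to.

```python
from datetime import date
from typing import Dict, List, Tuple

SPLIT_DATE = date(2020, 3, 1)  # split point stated above


def split_by_date(
    rows: List[Dict],
    split: date = SPLIT_DATE,
) -> Tuple[List[Dict], List[Dict]]:
    """Rows dated strictly before `split` go to training, the rest to validation."""
    train = [r for r in rows if r["trade_date"] < split]
    val = [r for r in rows if r["trade_date"] >= split]
    return train, val


# Toy rows with a hypothetical layout: one dict per stock per trading day.
rows = [
    {"trade_date": date(2020, 2, 27), "close": 10.0},
    {"trade_date": date(2020, 2, 28), "close": 9.5},
    {"trade_date": date(2020, 3, 2), "close": 9.8},
]
train_rows, val_rows = split_by_date(rows)
```

Making the split date a parameter keeps it easy to try a different boundary if the March 2020 one turns out to be unrepresentative.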

Early stopping is applied because the models overfit quite easily.
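
The idea behind early stopping can be sketched in a framework-independent way. The monitored quantity (validation loss), the `patience` value, and the loss numbers below are all assumptions for illustration; the text does not state which settings the project uses.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float],
    patience: int = 3,
) -> Tuple[int, float]:
    """Return (best_epoch, best_loss).

    Iteration stops once `patience` consecutive epochs pass
    without a new best validation loss.
    """
    best_epoch, best_loss, waited = -1, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss


# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.70, 0.62, 0.58, 0.59, 0.61, 0.60, 0.40]
best_epoch, best_loss = early_stopping_epoch(losses, patience=3)
```

In this toy run, training stops after epoch 5, so the later value of 0.40 is never reached. That is the usual trade-off when choosing a small patience.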

2. Model update logs

| Model | How to process sequential data | Feature engineering |
| --- | --- | --- |
| SimpleSequential V1 | concatenating LSTM final hidden states with stock basic embeddings | baseline |
| LuongAttentionLSTM v2 | seq2seq, where the decoder outputs only once | SSE Composition incorporated, drop all *ST stocks |
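
For readers unfamiliar with Luong-style attention, a single decoder step can be sketched in plain Python. This uses the dot-product scoring variant, which is an assumption: the table names the model but does not say which scoring function it uses. `luong_dot_attention` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def luong_dot_attention(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """One attention step: returns (weights, context).

    weights: softmax over dot products of the decoder state with each
             encoder hidden state.
    context: weighted sum of the encoder hidden states.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(decoder_state)
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Two toy encoder states; the decoder state aligns with the first one.
weights, context = luong_dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because the decoder outputs only once, a step like this would run a single time per sample, over the 14 encoder states.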

3. Model results

The columns 2018 and 2019 represent the percentage of profit obtained via trading simulation in the corresponding year. By default, each simulation runs from December of the previous year to December of that year.
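
The simulation window and the profit percentage can be made concrete with a short sketch. Starting and ending on the first day of December is an assumption, since the text only gives the months, and both function names are hypothetical.

```python
from datetime import date
from typing import Tuple


def simulation_window(year: int) -> Tuple[date, date]:
    """December of the previous year to December of `year`.

    Using the 1st of the month is an assumption, not stated in the text.
    """
    return date(year - 1, 12, 1), date(year, 12, 1)


def profit_percent(start_value: float, end_value: float) -> float:
    """Profit as a percentage of the starting portfolio value."""
    return (end_value - start_value) / start_value * 100.0


start, end = simulation_window(2018)
example = profit_percent(100.0, 174.33)
```

Under this definition, a portfolio that grows from 100 to 174.33 over the window would be reported as 74.33%.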

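| Model | Version | Accuracy | Precision | Recall | 2018 | 2019 | Remarks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SimpleSequentialModel | V1 | 86.31% | 92.37% | 90.55% | 74.33% | 52.59% | val_acc > train_acc |
| LuongAttentionLSTM | V1 | 85.52% | 92.52% | 89.33% | | | |
| seq2seqAttentionLSTM | V2 | 82.67% | 90.90% | 87.12% | | | |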
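
For reference, the three classification metrics in the table follow the standard definitions below. The sketch assumes a binary classification task, which the text does not state explicitly, and the counts are made up for illustration.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up counts, not results from the models above.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Precision and recall both ignore true negatives, so they can stay high on a class-imbalanced validation set even when accuracy is less informative.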

4. Embedding quality

Embedding quality is poor for all models: the learned embeddings do not make sense.