Model Benchmarks

1. Setting

The sequential models make predictions using 14 days historical data, together with the non-sequential data stock_basics obtained by tushare.

In current version, stock data before March 1st, 2020 are regarded as training set, and the data after are validation set. This might not be a good split point since the American market has been crashed during this specific perioid of time. The fact that val_acc > train_acc also confirms this point.

Early stopping is applied as the model get overfitting quite easily.

2. Model update logs

Model	How to process sequential data	Feature engineering
SimpleSequential V1	concatenating LSTM final hidden states with stock basic embeddings	baseline
LuongAttentionLSTM v2	seq2seq, where decoder only output once	SSE Composition incorporated, drop all `*ST` stocks

3. Model results

The columns 2018 and 2019 represents the percentage of profit obtained via trading simulation in the corresponding year. By defaults, simulation time starts from Dec last year to Dec this year.

Model	Version	Accuracy	Precision	Recall	2018	2019	Remarks
SimpleSequentialModel	V1	86.31%	92.37%	90.55%	74.33%	52.59%	val_acc > train_acc
LuongAttentionLSTM	V1	85.52%	92.52%	89.33%
seq2seqAttentionLSTM	V2	82.67%	90.90%	87.12%

4. Embedding quality

All models are shit. Their embedding don't even make a fucking sense.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

model_benchmark.md

model_benchmark.md

Model Benchmarks

1. Setting

2. Model update logs

3. Model results

4. Embedding quality

Files

model_benchmark.md

Latest commit

History

model_benchmark.md

File metadata and controls

Model Benchmarks

1. Setting

2. Model update logs

3. Model results

4. Embedding quality