Predicting financial time series by use of shapelets and trend lines while exploring the notion of concept drift within

Aim

This was an honours thesis project. I worked on the part of the project that aimed to extract recurring patterns/shapes from time series data for prediction. Robby worked on the trend line portion of the project. Andre explored the notion of concept drift and methods for detecting and avoiding drift.

Shapelets:

Intro

The inspiration for this project came from the work of Lines et al. on the shapelet transform. Ye et al. introduced the notion of shapelets in 2009 as a new data mining primitive.

Process

Extract variable-length shapelets from the datasets of 35 stocks. See the images below for examples of the extracted shapelet classes.
Train an LSTM-based classifier on the extracted shapelet classes.
Predict prices by feeding "future" sequences into the LSTM so that it can classify which pattern/shapelet each sequence most resembles. Take the standardized version of the classified shape and unstandardize it using the mean and standard deviation of the "future" sequence to obtain predicted prices. See the prediction images below.
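The unstandardizing step in the last item can be illustrated with a minimal sketch. It assumes z-normalization (subtract the mean, divide by the standard deviation), and the function names `standardize` and `unstandardize` are hypothetical, not taken from this repository. The LSTM classification itself is not shown; here the "classified shape" is simply the standardized window, so the round trip can be checked.

```python
import numpy as np

def standardize(seq):
    """Z-normalize a sequence to zero mean and unit standard deviation."""
    seq = np.asarray(seq, dtype=float)
    mean, std = seq.mean(), seq.std()
    std = std if std > 0 else 1.0  # guard against flat sequences
    return (seq - mean) / std

def unstandardize(shape_z, reference):
    """Map a standardized shape back to price scale using the
    mean and standard deviation of a reference price sequence."""
    reference = np.asarray(reference, dtype=float)
    mean, std = reference.mean(), reference.std()
    std = std if std > 0 else 1.0
    return np.asarray(shape_z, dtype=float) * std + mean

# Illustrative prices only (not from the thesis data).
window = [100.0, 102.0, 101.0, 105.0]
shape_z = standardize(window)               # stand-in for a classified shapelet
restored = unstandardize(shape_z, window)   # back on the price scale
```

In the actual pipeline, `shape_z` would be the standardized shapelet of the class chosen by the LSTM rather than the window itself.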

Plot of shapelet classes

Extracted classes on the source dataset	Extracted classes (standardized shapes)

Predicting 1 day ahead

Predicting 7 days ahead

Concept Drift:

Intro

Concept drift occurs when the underlying data distribution changes due to some unknown or unforeseen cause, so that a machine learning model's predictions become less accurate. Models must therefore be able to identify and deal with multiple changing concepts as time goes on.

Process

Two concept drift detection methods were implemented: the Page-Hinkley (PH) detector and the Early Drift Detection Method (EDDM). These were paired with three mitigation strategies:

The fixed sliding window, which uses a fixed number of data points to make a prediction.
The adjusting sliding window, which uses a varying number of data points to make a prediction, depending on whether drift has been detected.
An ensemble approach, which trains four models, each on its own set of data, and takes an averaged prediction.
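To make the detection side concrete, here is a minimal sketch of a Page-Hinkley test that watches a stream of prediction errors for an upward shift in their mean. It follows the standard textbook formulation; the class name, parameter names, and default values are assumptions for illustration and are not taken from this repository.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm level (often called lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_samples and self.cum - self.cum_min > self.threshold:
            self.reset()  # start fresh after signalling a drift
            return True
        return False

# Synthetic error stream: the error level jumps at index 100.
errors = [0.0] * 100 + [5.0] * 100
detector = PageHinkley(threshold=20.0)
drifts = [i for i, e in enumerate(errors) if detector.update(e)]
```

A detection like this could serve as the trigger for the adjusting sliding window, for example by shrinking the window when `update` returns `True`, though the exact coupling used in the thesis is not described here.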

Final results:

Trend Lines:

Intro

Trend lines are used as a feature extraction technique: machine learning models make their predictions from the extracted trends rather than from raw points. A trend line smooths the series and represents its local change. Each trend is represented by a slope and a duration.

Process

Extract trend lines from the financial time series data. Four segmentation algorithms were used to extract the trend lines.
Train two types of machine learning models, using these trend lines as input, to predict the next trend lines in the time series.
Evaluate the predicted trend lines and compare the results with those obtained using only point data.
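As an illustration of the extraction step, the sketch below shows one common form of sliding window segmentation: a segment is grown point by point until a least-squares line no longer fits within an error bound, and each finished segment is recorded as a (slope, duration) pair. The error criterion (maximum absolute residual), the `max_error` parameter, and the function names are assumptions; the thesis implementation may differ.

```python
import numpy as np

def fit_line(y):
    """Least-squares line through y; returns (slope, max abs residual)."""
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    residual = np.abs(y - (slope * x + intercept)).max()
    return slope, residual

def sliding_window_segments(series, max_error):
    """Split a series into linear trends, returned as (slope, duration)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    trends = []
    start = 0
    while start < n - 1:
        end = start + 2  # exclusive end; a segment has at least two points
        slope, _ = fit_line(series[start:end])
        # Grow the segment while the line still fits within max_error.
        while end < n:
            cand_slope, err = fit_line(series[start:end + 1])
            if err > max_error:
                break
            slope, end = cand_slope, end + 1
        trends.append((slope, end - start - 1))  # duration in time steps
        start = end - 1  # the next segment starts at this segment's last point
    return trends

# Synthetic series: four steps up, then four steps down.
trends = sliding_window_segments([0, 1, 2, 3, 4, 3, 2, 1, 0], max_error=0.1)
```

On this synthetic input the procedure yields one rising and one falling trend; the resulting (slope, duration) pairs are the kind of features a model would consume in the training step.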

Final results:

Trend lines performed worse than the baseline, which was an MLP trained on the point data. The best segmentation algorithm was the Sliding Window approach. The graph below shows the best predicted trends against the actual trend lines.

Oliverdeb/time-series-analysis-ml-honours-thesis