Time Series Machine Learning Analysis and Demand Forecasting with H2O & TSstudio
Traditional approaches to time series analysis and forecasting, like Linear Regression, Holt-Winters Exponential Smoothing, ARMA/ARIMA/SARIMA and ARCH/GARCH, have been well established for decades and are applied in fields as varied as business and finance (e.g. predicting stock prices and analysing trends in financial markets), the energy sector (e.g. forecasting electricity consumption) and academia (e.g. measuring socio-political phenomena).
More recently, the popularisation and wider availability of open source frameworks like Keras, TensorFlow and scikit-learn have helped machine learning approaches like Random Forest, Extreme Gradient Boosting, Time Delay Neural Network and Recurrent Neural Network gain momentum in time series applications. These techniques allow historical information to be fed into the model as input through a set of time delays, i.e. lagged copies of the series.
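To make the idea of time delays concrete, here is a minimal plain-Python sketch of turning a single series into lagged predictors. It is an illustration only, not the code used in this project (which relies on TSstudio and H2O). The function names `make_lag_features` and `drop_incomplete` and the revenue figures are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def make_lag_features(series: Sequence[float], lags: Sequence[int]) -> List[List[Optional[float]]]:
    """Return one row per time step holding the series value `lag` steps back.

    Positions where a lag reaches before the start of the series are None,
    mirroring the missing values a lagged column has at the beginning.
    """
    rows = []
    for t in range(len(series)):
        rows.append([series[t - lag] if t - lag >= 0 else None for lag in lags])
    return rows


def drop_incomplete(rows: List[List[Optional[float]]], target: Sequence[float]) -> Tuple[list, list]:
    """Keep only time steps where every lag is available, paired with the target."""
    X, y = [], []
    for row, value in zip(rows, target):
        if all(v is not None for v in row):
            X.append(row)
            y.append(value)
    return X, y


# Hypothetical weekly revenue figures (illustration only, not project data)
weekly_revenue = [100.0, 110.0, 105.0, 120.0, 130.0, 125.0]
features = make_lag_features(weekly_revenue, lags=[1, 2])
X, y = drop_incomplete(features, weekly_revenue)
print(X)
print(y)
```

Each row of `X` holds the revenue one and two weeks earlier, and `y` holds the week being predicted. The first two weeks are dropped because their lags are not available yet, which is why lagged models lose a few observations at the start of the series.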
In this project I go through the various steps needed to build a time series machine learning pipeline and generate a weekly revenue forecast. I carry out a more “traditional” exploratory time series analysis with TSstudio and use the insights I gather to create a number of predictors. I then train and validate an array of machine learning models with the open source library H2O, and compare the models’ accuracy using performance metrics and actual vs predicted plots.
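The validation and comparison step can be sketched in plain Python as follows. This is not the H2O workflow used in the project. It assumes a chronological hold-out (the last weeks are reserved for validation) and three common error metrics (MAE, RMSE, MAPE). The two "models" are hypothetical naive baselines standing in for trained ones, and the data is made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def time_split(values: Sequence[float], n_valid: int) -> Tuple[List[float], List[float]]:
    """Split chronologically: the last `n_valid` observations form the validation set.

    Unlike a random split, the model never trains on weeks that come after
    the ones it is validated on.
    """
    cut = len(values) - n_valid
    return list(values[:cut]), list(values[cut:])


def accuracy_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and MAPE (in percent). MAPE assumes no actual value is zero."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical weekly revenue (illustration only, not project data)
revenue = [100.0, 110.0, 105.0, 120.0, 130.0, 125.0, 140.0, 150.0]
train, valid = time_split(revenue, n_valid=2)

# Hypothetical baselines: repeat the last training value, or use the training mean
naive = [train[-1]] * len(valid)
mean_model = [sum(train) / len(train)] * len(valid)

for name, preds in [("naive", naive), ("mean", mean_model)]:
    metrics = accuracy_metrics(valid, preds)
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Lower values are better for all three metrics. Comparing several candidate models on the same hold-out weeks keeps the comparison fair, and plotting `valid` against each set of predictions gives the actual vs predicted view.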
In this post I simply load the compiled dataset, but I’ve also written a post called Loading, Merging and Joining Datasets, where I show how I assembled the various data feeds and sorted out things like variable naming, new feature creation and some general housekeeping tasks. You can find the full code on this GitHub repository.
You can find the final article on my website.
I've also published the article on The Startup.