
bank-of-england/MachineLearningCrisisPrediction


Code for the Bank of England Staff Working Paper 848

This repository includes the code used in the Bank of England Staff Working Paper 848 "Credit Growth, the Yield Curve and Financial Crisis Prediction: Evidence from a Machine Learning Approach" by Kristina Bluwstein, Marcus Buckmann, Andreas Joseph, Miao Kang, Sujit Kapadia, and Özgür Şimşek.

In the paper, we develop early warning models for financial crisis prediction using machine learning techniques applied to macrofinancial data for 17 countries over 1870-2016. Machine learning models typically outperform logistic regression in out-of-sample prediction and forecasting experiments. We identify economic drivers of our machine learning models using a novel framework based on Shapley values, uncovering nonlinear relationships between the predictors and the risk of crisis. Across all models, the most important predictors are credit growth and the slope of the yield curve, both domestically and globally.

The dataset we use is the Jordà-Schularick-Taylor Macrohistory Database. It is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. We accessed Version 3 from the dataset's website. This version is contained in the data folder of this repository.

The code is not intended as a stand-alone package. It can be used to reproduce the results of the paper, and parts of it may be transferred to other applications. No warranty is given. Please consult the licence file.

This repository does not include all results of the experiments. Rather, it contains a small subset of the results to illustrate the empirical methodology and the implementation.

Should you have any questions or spot a bug in the code, please send an email to marcus.buckmann@bankofengland.co.uk or raise an issue within the repository.

Prerequisites

The code has been developed and used under Python 3.6.5 (Anaconda distribution) and R 3.5.1.

The R script R_installer.R in the setup folder installs all necessary R packages. The file python_env.yml in the setup folder specifies the Anaconda virtual environment in which the experiments were run.
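Assuming a standard conda and R installation and the repository root as the working directory, the setup could look like the following (the environment name is whatever `python_env.yml` declares; substitute it in the activate step):

```shell
# Create the Anaconda virtual environment specified by the repository.
conda env create -f setup/python_env.yml

# Activate it (replace the placeholder with the name given in python_env.yml).
conda activate <environment-name-from-yml>

# Install the required R packages.
Rscript setup/R_installer.R
```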

Structure of the code

Estimating the prediction models

The paper is based on two main empirical experiments: cross-validation and forecasting. These experiments are run using the respective Python scripts in the experiments folder. In these scripts, the user can specify the models to be trained, the variables to be included, and how the variables should be transformed. The results of the experiments are written to the results folder. To obtain stable performance estimates, we repeat the experiments many times. For this repository, we repeated the 5-fold cross-validation 10 times. Each pickle file in the results folder contains the results of one iteration. Each iteration uses a different random seed and therefore partitions the data into training and test sets differently.
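The repetition scheme can be sketched as follows. This is an illustrative sketch, not the repository's actual code: scikit-learn, the stand-in data, and the file naming are all assumptions.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, random_state=0)  # stand-in data

for iteration in range(10):        # 10 repetitions of 5-fold cross-validation
    # The iteration index seeds the fold assignment, so every iteration
    # partitions the observations into training and test sets differently.
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=iteration)
    records = []
    for fold, (train_idx, test_idx) in enumerate(folds.split(X, y)):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        pred = model.predict_proba(X[test_idx])[:, 1]
        records.append({"fold": fold, "test_idx": test_idx,
                        "pred": pred, "auc": roc_auc_score(y[test_idx], pred)})
    # One pickle file per iteration (hypothetical naming scheme).
    with open(f"cv_iteration_{iteration}.pkl", "wb") as f:
        pickle.dump(records, f)
```

Because each iteration is written to its own file, the collection of pickle files grows one file at a time, which is what makes the incremental workflow described below possible.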

The experiments do not need to be run in one go. The user can terminate the experiments after a certain number of iterations and run further iterations at a later point in time; the new pickle files will then be added to the folder. The .txt files in the results folder are written based on the information contained in all the pickle files and are updated after each iteration.
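This incremental update can be mimicked with a small aggregator that rebuilds the summary from whatever pickle files currently exist. The file layout and field names here are hypothetical, not the repository's own:

```python
import glob
import os
import pickle
import statistics
import tempfile

def summarise(folder):
    """Rebuild summary statistics from all per-iteration pickle files present."""
    paths = sorted(glob.glob(os.path.join(folder, "results_iter_*.pkl")))
    aucs = []
    for path in paths:
        with open(path, "rb") as f:
            aucs.extend(record["auc"] for record in pickle.load(f))
    return {"n_files": len(paths), "mean_auc": statistics.mean(aucs)}

# Demo: two iterations finished so far. Running more iterations later and
# calling summarise again would simply pick up the additional files.
folder = tempfile.mkdtemp()
for i, fold_aucs in enumerate([[0.70, 0.80], [0.60, 0.90]]):
    with open(os.path.join(folder, f"results_iter_{i}.pkl"), "wb") as f:
        pickle.dump([{"auc": a} for a in fold_aucs], f)

summary = summarise(folder)
```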

The key files in the results folder are the following:

  • The data[...].txt file contains the dataset used in the experiment. This is not the raw dataset: all transformations and exclusions of data points have been applied.
  • The all_pred[...].txt contains the predictions for each observation, algorithm and iteration.
  • The shapley_append[...].txt files show the Shapley values for each observation, predictor and iteration. For each algorithm tested, an individual file is created.
  • The shapley_mean[...].txt files show the Shapley values of each predictor, averaged across all observations. For each algorithm tested, an individual file is created.
  • The mean_fold[...].txt shows the mean performance achieved in the individual folds. The files mean_iter[...].txt and mean_append[...].txt are similar but average the results differently: the former measures the performance in each iteration and averages these performance measures across iterations, whereas the latter first averages each observation's predicted values across iterations and then computes the performance once on these averaged predictions.
  • The files se_fold[...].txt and se_iter[...].txt show the standard errors of the respective performance results.
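The difference between the two averaging orders can be illustrated numerically with a toy example (made-up predictions; AUC as the performance measure, computed here by hand rather than with the repository's code):

```python
import numpy as np

def auc(y_true, score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = score[y_true == 1][:, None]
    neg = score[y_true == 0][None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()

y = np.array([0, 0, 1, 1, 1, 0])
preds = np.array([[0.6, 0.4, 0.7, 0.9, 0.3, 0.2],   # iteration 1
                  [0.1, 0.5, 0.8, 0.7, 0.9, 0.2]])  # iteration 2

# "mean_iter" style: performance computed per iteration, then averaged.
mean_iter = np.mean([auc(y, p) for p in preds])

# "mean_append" style: predictions averaged per observation, then scored once.
mean_append = auc(y, preds.mean(axis=0))
```

The two numbers generally differ: in this toy case the first iteration misranks one pair while the averaged predictions rank every crisis observation above every non-crisis one, so mean_append exceeds mean_iter.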

Analysing the results

The analyses of the files in the results folder are conducted in R. In the analysis folder, the files analysis_cross_validation.R and analysis_forecasting.R produce charts and regression models for the two types of experiments.

The Excel sheet visual_params.xlsx in the analysis folder specifies visual characteristics of the plots. The user can alter the name, colour, and symbol of algorithms and variables shown in the charts.

Disclaimer

This package is an outcome of a research project. All errors are those of the authors. All views expressed are personal views, not those of any employer.

Data Classification

Bank of England Data Classification: OFFICIAL BLUE
