MACHINE LEARNING WRAPPER for projection analyses
Simple Python (2.7) wrapper for machine learning models in the context of lead-lag projection modelling
based on Bank of England SWP 674: Machine learning at central banks (September 2017).
Authors: Chiranjit Chakraborty & Andreas Joseph.
Disclaimer: licence.txt and SWP 674 disclaimer apply.
Questions, comments and bug reports can be sent to firstname.lastname@example.org.
Please see the "Issues" tab for current and closed issues.
General package features:
- simplified use of sophisticated ML models (wrapper based on the scikit-learn package, http://scikit-learn.org/stable/)
- automated bootstrapped training-testing framework (bagging)
- time series training, test and projection framework
- cross-sectional training & testing
- conditional model predictions
- set of diagnostic tools, including several specialised plots such as prediction interval (fan) charts
- general data handling capabilities
- I/O options for results, model instances and plots
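The bootstrapped training-testing idea can be sketched in a few lines. This is a minimal illustration in pure Python, not the wrapper's own code: the function names `bootstrap_split`, `fit_line` and `bagged_test_errors`, the toy data, and the least-squares line standing in for a scikit-learn model are all assumptions made for the example.

```python
import random

def bootstrap_split(n_obs, rng):
    """Draw indices with replacement; indices never drawn (out-of-bag) form the test set."""
    train = [rng.randrange(n_obs) for _ in range(n_obs)]
    chosen = set(train)
    test = [i for i in range(n_obs) if i not in chosen]
    return train, test

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (hypothetical stand-in for any ML model)."""
    n = float(len(x))
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def bagged_test_errors(x, y, n_boot=50, seed=0):
    """Train on each bootstrap sample and record the mean absolute error out-of-bag."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_boot):
        train, test = bootstrap_split(len(x), rng)
        if not test:
            continue  # extremely unlikely: every observation was drawn
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        mae = sum(abs(y[i] - (a + b * x[i])) for i in test) / float(len(test))
        errors.append(mae)
    return errors

# Toy data lying exactly on a line, so the out-of-bag errors should be close to zero.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi for xi in x]
errors = bagged_test_errors(x, y)
print("mean out-of-bag MAE over %d bootstraps: %.6f" % (len(errors), sum(errors) / len(errors)))
```

The spread of the per-bootstrap errors is what makes interval-type diagnostics possible; with real data one would keep the individual predictions as well as the errors.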
1. download the zipped repository (green "Clone or download" button)
2. unpack it into the desired project directory
3. set this directory as **main_path** in **config_XXX.py**
4. make sure that the **code** sub-directory is on your Python path
5. optional: customisation (model selection, time horizon, bootstraps, etc.)
6. run **__A__ML_main.py**: this triggers the sequence of other scripts and shows or saves the resulting outputs according to the configuration
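Putting the **code** sub-directory on the Python path (step 4) can be done for a single session as sketched below. The contents of the config file are not shown here, so this assumes `main_path` is a plain directory string; the directory name `ml_projections` is hypothetical.

```python
import os
import sys

# Hypothetical project directory; replace with wherever the repository was unpacked.
main_path = os.path.join(os.path.expanduser("~"), "ml_projections")

# The 'code' sub-directory holds the wrapper scripts.
code_path = os.path.join(main_path, "code")

# Add it to the module search path for the current session, avoiding duplicates.
if code_path not in sys.path:
    sys.path.insert(0, code_path)

print(code_path in sys.path)
```

A permanent alternative is to add the same directory to the `PYTHONPATH` environment variable.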
If you do not have Python installed, a free scientific distribution that includes all necessary packages is available at https://www.anaconda.com/download/. The current wrapper only works with Python 2.7, but it can be ported to 3.x with little effort; most of the required changes concern the print function.
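The print change looks like the sketch below. The variable names and values are made up for illustration; the `__future__` import is standard Python and lets the function form run on 2.7 as well as 3.x.

```python
from __future__ import print_function  # no effect on Python 3, enables print() on 2.7

horizon = 8          # hypothetical values for illustration
test_error = 0.1234

# Python 2.7 statement form (a syntax error on 3.x):
#     print "horizon: %d, test error: %.3f" % (horizon, test_error)

# Portable function form:
message = "horizon: %d, test error: %.3f" % (horizon, test_error)
print(message)
```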
1. load packages and configuration file (kept flexible so that arbitrary new features can be added); two cases are implemented for demonstration:
   - UK inflation forecasting: macro time series (sources: BoE, ONS, World Bank, BIS)
   - BJ air quality modelling: hourly dataset (Jan 2010 - Dec 2014; source: Song Xi Chen, csx'@'gsm.pku.edu.cn, Guanghua School of Management, Center for Statistical Science, Peking University)
2. load & transform data (__A__ML_load_data.py)
3. time-series training-testing (__B__ML_projections.py)
4. lead-lag shift analysis (__C__ML_shift_analysis.py; model evaluation for different forecast horizons)
5. diagnostic plots (__D__ML_diagnostics.py; data series, projections, conditional model output (heatmap), conditional fan chart, feature importance by horizon)
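The lead-lag idea behind the time-series steps can be sketched as follows: features observed at time t are paired with the target at time t + horizon, and the split into training and test sets respects time order. The functions `lead_lag_pairs` and `time_series_split` and the toy data are hypothetical and only illustrate the principle; they are not taken from the scripts listed above.

```python
def lead_lag_pairs(features, target, horizon):
    """Pair features at time t with the target at time t + horizon.

    features: list of feature rows ordered in time
    target:   list of target values, same length and ordering
    horizon:  non-negative lead of the target, in periods
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    n_pairs = len(target) - horizon
    if n_pairs <= 0:
        return [], []
    return features[:n_pairs], target[horizon:]

def time_series_split(X, y, n_test):
    """Keep temporal order: the earliest observations train, the last n_test test."""
    cut = len(y) - n_test
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Toy series of six periods with one feature each (illustrative values only).
features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

X, y = lead_lag_pairs(features, target, horizon=2)
(train_X, train_y), (test_X, test_y) = time_series_split(X, y, n_test=1)
print("train: %s -> %s" % (train_X, train_y))
print("test:  %s -> %s" % (test_X, test_y))
```

Repeating this for several values of `horizon` and comparing test errors corresponds to evaluating a model at different forecast horizons.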
- dependencies run from bottom to top, i.e. each step relies on the steps before it, so serial execution is recommended
- everything depends on the config part
- __B__ and __C__ are independent of each other
- __D__ depends on __A__ to __C__, conditional on the options set in the config file
- also see aux__ML_import_packages.py
Directory structure (can be changed in config files):
- code includes the package
- results stores all non-graphical output
- figures stores all graphical output
- data holds the input data
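A sketch of how these paths might be derived from a single project directory is given below. It assumes the four sub-directory names listed above and uses a temporary directory in place of the real `main_path`; the variable names are illustrative and not taken from the config files.

```python
import os
import tempfile

# Stand-in for main_path from the config file; a temporary directory keeps the sketch harmless.
main_path = tempfile.mkdtemp()

# Sub-directory names as listed above.
sub_dirs = ["code", "results", "figures", "data"]
paths = dict((name, os.path.join(main_path, name)) for name in sub_dirs)

# Create the two output directories if they do not exist yet.
for name in ("results", "figures"):
    if not os.path.isdir(paths[name]):
        os.makedirs(paths[name])

print(sorted(os.listdir(main_path)))
```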
Data sources and acknowledgement
We would like to thank the persons and institutions below, who made this work possible: