Welcome to the EmbeddingPortfolio wiki!
In this repo you will find two quantlets: NMFRB and TailRiskAERB. Please read the corresponding papers. To reproduce the analysis, please refer to the corresponding quantlet.
You will also find the required package to reproduce the papers' results.
This is the `dl_portfolio` package.
- First, create a virtual environment with Python 3.8 using conda (or virtualenv):

```bash
conda create -n NAME_OF_ENV python=3.8
```

- Set the global variable paths in `dl_portfolio/pathconfig.py`.
- Install the package and its requirements from `setup.py` with:

```bash
pip install . --upgrade
```
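For a quick sanity check that the install worked, the import below should succeed (a minimal sketch; it only assumes the package exposes a top-level `dl_portfolio` module, as the paths above suggest):

```python
# Post-install sanity check: the package should be importable.
import dl_portfolio

print(dl_portfolio.__file__)  # prints where the installed package lives
```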
The data that support the findings of this study are available from Bloomberg and the Blockchain Research Center (BRC). Restrictions apply to the availability of these data, which were used under license for this study. Data are available on request from the corresponding authors with the permission of Bloomberg and BRC.
Our results are available at: https://drive.google.com/drive/folders/1oIGQeLlQi6rZ6L-dpxTj9A0TtjICTR6X?usp=drive_link
The training is done using `main.py`. Please check the arguments in the file.
The configurations for training are located in `dl_portfolio/config/`:
- `nmf_config.py` for NMF training
- `ae_config.py` for AE training
The configuration for running NMF training and experiments is `dl_portfolio/config/nmf_config.py`:

- Copy `nmf_config_dataset1.py` in `dl_portfolio/config` to `nmf_config.py`.
- Then run:

```bash
python main.py --n=N_EXPERIMENT --n_jobs=N_PARALLEL_JOBS --run=nmf
```

- Copy `nmf_config_dataset2.py` in `dl_portfolio/config` to `nmf_config.py`.
- Then run:

```bash
python main.py --n=N_EXPERIMENT --n_jobs=N_PARALLEL_JOBS --run=nmf
```
The configuration for running AE training and experiments is `dl_portfolio/config/ae_config.py`:

- Copy `ae_config_dataset1.py` in `dl_portfolio/config` to `ae_config.py`.
- Then run:

```bash
python main.py --n=N_EXPERIMENT --n_jobs=N_PARALLEL_JOBS --run=ae
```

- Copy `ae_config_dataset2.py` in `dl_portfolio/config` to `ae_config.py`.
- Then run:

```bash
python main.py --n=N_EXPERIMENT --n_jobs=N_PARALLEL_JOBS --run=ae
```
ARMA-GARCH modelling is done using R in `activationProba`.

- First, prepare the data using `create_lin_activation.py`, specifying the `base_dir` where you saved the AE result:

```bash
python create_lin_activation.py --base_dir=final_models/ae/dataset1/m_0_dataset1_nbb_resample_bl_60_seed_0_1647953383912806
```

- Repeat for dataset2.
Before running the script, define the parameters in `config/config.json`:
- Dataset1: copy `config_dataset1.json` to `config.json` and run `activationProba.R`.
- Dataset2: copy `config_dataset2.json` to `config.json` and run `activationProba.R`.
Performance evaluation is done as follows:

- For the AE and NMF models, use the `performance.py` script. Check the script arguments.
- For the ARMA-GARCH model, use the `hedge_performance.py` script after running `performance.py`. Check the script arguments. You also need to set the paths to your GARCH and AE modelling outputs directly in the code: `DATA_BASE_DIR_1`, `GARCH_BASE_DIR_1`, `PERF_DIR_1`, `DATA_BASE_DIR_2`, `GARCH_BASE_DIR_2`, `PERF_DIR_2`.
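For example (illustrative placeholders only; the constant names are the ones listed above, but the paths here are hypothetical and must point to your own run outputs):

```python
# Hypothetical placeholder paths -- replace with your own output folders.
DATA_BASE_DIR_1 = "final_models/ae/dataset1"            # AE outputs, dataset1
GARCH_BASE_DIR_1 = "activationProba/results/dataset1"   # GARCH outputs, dataset1
PERF_DIR_1 = "performance/dataset1"                     # performance.py outputs, dataset1
# ... and similarly for DATA_BASE_DIR_2, GARCH_BASE_DIR_2, PERF_DIR_2
```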
Finally, you can look at the notebooks, which produce all the figures in the paper. Modify the output folders at the beginning of the files to point to your own outputs.
- `Backtest.ipynb`: analysis of the backtest results and production of various statistics
- `Interpretation.ipynb`: interpretation of the embedding
- `activationProba.ipynb`: analysis of the hedged strategies
For autoencoder training, modify `config/ae_config.py`; for NMF training, modify `config/nmf_config.py`.
The training configuration parameters are shared between the AE and NMF configs, though the NMF config has fewer options.
- `dataset`: name of the dataset, 'dataset1' or 'dataset2'
- `show_plot`: boolean
- `save`: boolean, save the result
- `nmf_model`: path to the NMF model weights
- `resample`: dict, resample method, for example:

```python
resample = {
    'method': 'nbb',
    'where': ['train'],
    'block_length': 60,
    'when': 'each_epoch'
}
```
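For intuition, here is a minimal sketch of what such a block-bootstrap resampler does, assuming 'nbb' stands for a non-overlapping block bootstrap (an illustration only, not the package's implementation):

```python
import numpy as np

def nbb_resample(x: np.ndarray, block_length: int = 60, seed=None) -> np.ndarray:
    """Illustrative non-overlapping block bootstrap: partition the series
    into consecutive blocks of `block_length` rows, then draw whole blocks
    with replacement, preserving dependence within each block."""
    rng = np.random.default_rng(seed)
    n = len(x)
    starts = np.arange(0, n - block_length + 1, block_length)  # fixed block starts
    n_blocks = int(np.ceil(n / block_length))
    picks = rng.choice(len(starts), size=n_blocks, replace=True)
    resampled = np.concatenate([x[s:s + block_length] for s in starts[picks]])
    return resampled[:n]  # trim to the original length
```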
- `seed`: optional, random seed
- `encoding_dim`: int, encoding dimension
- `batch_normalization`: boolean, perform batch normalization after encoding
- `uncorrelated_features`: boolean, use the uncorrelated features constraint
- `weightage`: float, uncorrelated features constraint penalty
- `ortho_weightage`: float, orthogonality constraint penalty
- `l_name`: string, regularization (follows Keras names, e.g. 'l1')
- `l`: float, regularization penalty
- `activation`: string, activation function (follows Keras names, e.g. 'relu')
- `features_config`: None
- `model_name`: string
- `scaler_func`: dict, scaler method, e.g. `{'name': 'StandardScaler'}`
- `model_type`: string ('ae_model')
- `learning_rate`: float, learning rate
- `epochs`: int, number of epochs
- `batch_size`: int
- `val_size`: int, optional
- `test_size`: int, 0
- `label_param`: optional
- `rescale`: optional
- `activity_regularizer`: optional
- `kernel_initializer`: tf.keras.initializers
- `kernel_regularizer`: tf.keras.regularizers, use orthogonality:

```python
WeightsOrthogonality(
    encoding_dim,
    weightage=ortho_weightage,
    axis=0,
    regularizer={'name': l_name, 'params': {l_name: l}}
)
```
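To make the penalty concrete, here is a minimal sketch of this kind of soft orthogonality regularizer, written against the standard `tf.keras.regularizers.Regularizer` API (illustrative only; the package's `WeightsOrthogonality` also handles the nested `regularizer` argument shown above):

```python
import tensorflow as tf

class SoftOrthogonality(tf.keras.regularizers.Regularizer):
    """Illustrative: penalize weightage * ||W^T W - I||_F^2, pushing the
    encoder weight columns towards an orthonormal set."""

    def __init__(self, encoding_dim, weightage=1.0, axis=0):
        self.encoding_dim = encoding_dim
        self.weightage = weightage
        self.axis = axis  # 0: columns span the encoding directions

    def __call__(self, w):
        if self.axis == 1:  # penalize rows instead of columns
            w = tf.transpose(w)
        gram = tf.matmul(w, w, transpose_a=True)  # (encoding_dim, encoding_dim)
        eye = tf.eye(self.encoding_dim, dtype=w.dtype)
        return self.weightage * tf.reduce_sum(tf.square(gram - eye))
```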
- `callback_activity_regularizer`: boolean, use callback (False)
- `kernel_constraint`: tf.keras.constraints, use `NonNegAndUnitNorm(max_value=1., axis=0)`
- `callbacks`: dict, Keras callbacks, for example:

```python
callbacks = {
    'EarlyStopping': {
        'monitor': 'val_loss',
        'min_delta': 1e-3,
        'mode': 'min',
        'patience': 100,
        'verbose': 1,
        'restore_best_weights': True
    }
}
```
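The dict maps callback class names to keyword arguments, which lines up one-to-one with `tf.keras.callbacks`; a hypothetical helper to instantiate them could look like this (a sketch, not necessarily how the package resolves callbacks):

```python
import tensorflow as tf

def build_callbacks(config: dict) -> list:
    # Look up each class by name in tf.keras.callbacks and pass the
    # configured keyword arguments straight to its constructor.
    return [getattr(tf.keras.callbacks, name)(**kwargs)
            for name, kwargs in config.items()]

# build_callbacks(callbacks) -> [tf.keras.callbacks.EarlyStopping(...)]
```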
- `data_specs`: Dict[Dict] with keys 0, 1, 2, ..., N, one per CV fold, each with keys 'start', 'val_start', 'test_start' (optional) and 'end', for example:

```python
data_specs = {
    0: {
        'start': '2016-06-30',
        'val_start': '2019-11-13',
        'test_start': '2019-12-12',
        'end': '2020-01-11'
    },
    1: {
        'start': '2016-06-30',
        'val_start': '2019-12-13',
        'test_start': '2020-01-12',
        'end': '2020-02-11'
    }
}
```
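For illustration, one way a single fold spec could be used to slice a date-indexed DataFrame (a hypothetical helper, assuming train/validation/test sets are cut at these dates; the package's actual data loading may differ):

```python
import pandas as pd

def split_fold(df: pd.DataFrame, spec: dict):
    """Slice one CV fold out of a DataFrame with a DatetimeIndex using the
    'start'/'val_start'/'test_start'/'end' keys of a data_specs entry."""
    train = df.loc[spec['start']:spec['val_start']]
    if 'test_start' in spec:
        val = df.loc[spec['val_start']:spec['test_start']]
        test = df.loc[spec['test_start']:spec['end']]
    else:  # no test period in this fold
        val = df.loc[spec['val_start']:spec['end']]
        test = None
    return train, val, test
```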