Code for the paper Modular meta-learning
The code has been refactored to make it easy to extend with your own variations.
- (Optional) CUDA GPU support. If you don't have it, you can comment all lines with nn_device and cuda() references.
I want to create my own composition; what should I do?
- Composer: defines how to compute the output from the input and a collection of modules.
- Structure: defines a structure initialization and a structure proposal function.
The simplest example of how to do this is the sum. We will shortly add a more complex Graph Neural Network composer.
Finally, you should import your new file in modular_metalearning and add it to the possible compositions at the very beginning of BounceGrad
First you have to generate the datasets using create_functions_datasets. For instance to create the functions dataset:
python create_functions_datasets.py --mode sum --not_alone --out_file sums.hdf5
To create the sines dataset:
python create_functions_datasets.py --mode sines --out_file sines.hdf5
The file has a couple flags to customize how many elements per dataset and how many datasets to create.
Then you run modular_metalearning. For example to run BounceGrad in the sums dataset:
python modular_metalearning.py --type_modules affine-1-16-16-1,affine-1-16-1 --num_modules 10,10 --composer sum-2 --meta_lr 0.003 --plot_name BounceGrad --limit_data 80 --optimization_steps 3000 --split_by_file --meta_split 90,10,0 --data_split 20,80,0 --data_desc HDF5@sums.hdf5 --meta_batch_size 32
or to run MAML in the sines dataset:
python modular_metalearning.py --type_modules affine-1-64-64-1 --num_modules 1 --composer functionComposition-1 --meta_lr 0.001 --plot_name MAML --limit_data 80 --optimization_steps 5000 --split_by_file --meta_split 90,10,0 --data_split 20,80,0 --data_desc HDF5@sines.hdf5 --meta_batch_size 16 --max_datasets 300 --MAML --MAML_step_size 0.1
You can run either of the 4 algorithms described in the paper:
- Pooled: monolithic structure with fixed weights.
- MAML: monolithic structure with adaptable weights.
- BounceGrad: adaptable modular structructure with fixed weights.
- MoMA: adaptable modular structure with adaptable weights.
The 4 algorithms come from two independent decisions: whether to have more than one module and whether to add MAML.
All options are flags in modular_metalearning. To activate MAML use the flag
--MAML and you also have access to customizing the step size and number of steps.
To not have modularity use
--composer functionComposition-1, since a functionComposition of depth 1 is just applying that one function.
To use modularity simply specify your composer and add more than one module in the
--num_modules flag. If your composition requires more than one module type describe both the types and the numbers separated by commas, for example:
--type_modules relu-1-64-64-1,affine-1-16-1 --num_modules 5,10.
How many modules and how many optimization steps?
In our experience the number of modules is a pretty flexible hyperparameter; we recommend putting it 10-30% above the number of modules you think would be needed in an optimal split. For example, in the functions experiments there are 16 functions, thus needing 16 modules; we rounded it up to 20 modules. Experiments with little less and little more did not affect much.
Beware that adding modules makes the use of each module less frequent, and therefore you may need to adjust
Some works using this code
- Learning Quickly to Plan Quickly Using Modular Meta-Learning Chitnis et al.
- Modular meta-learning in Abstract Graph Networks for combinatorial generalization; Alet et al. ; to be presented in NIPS meta-learning workshop 2018
Please email me at alet(at)mit(dot)edu.
From the original paper:
- Add cleaned up version of backboneConcat composition
- Add cleaned up version of attention composition
From following work:
- Add Graph Neural Network composition
- Add support for CNNs