This is a basic, toy implementation of the Mixture of Experts algorithm described in the paper.
The model consists of several expert models, each specializing in a particular task, rather than a single model that has to be good at all of them. A gating network (somewhat like attention) then assigns weights to the experts, so that more weight is given to the expert that is good at the task at hand.
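The weighting described above can be illustrated with a minimal sketch. It is written in plain Python rather than PyTorch so that the arithmetic is easy to follow, and it is not the code in `model.py`. The two experts, the `gate` function, and the helper names `softmax` and `mixture_output` are all assumptions made up for this illustration.

```python
import math


def softmax(scores):
    """Turn raw gate scores into positive weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_output(x, experts, gate):
    """Combine the expert predictions for input x using the gate's weights."""
    weights = softmax(gate(x))
    predictions = [expert(x) for expert in experts]
    combined = sum(w * p for w, p in zip(weights, predictions))
    return combined, weights


# Two hypothetical experts: one suited to negative inputs, one to positive inputs.
experts = [lambda x: -x, lambda x: 2.0 * x]


# Hypothetical gate: scores the first expert higher when x < 0, the second when x > 0.
def gate(x):
    return [-5.0 * x, 5.0 * x]


output, weights = mixture_output(3.0, experts, gate)
print(output, weights)
```

For a positive input, almost all of the weight goes to the second expert, so the combined output follows that expert's prediction. In a trained model both the experts and the gate would be learned networks rather than fixed functions.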
The code has been tested with Python 3.7 and PyTorch v1.3.
For training and testing the model
- Clone the repository and change into its directory.
- Run `python main.py --training True` to train the model.
- Run `python main.py --testing True` to test the trained model.
- Apart from this, the various hyperparameter flags are listed in the `main.py` file and can be tweaked accordingly.
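As a rough illustration of how flags in the `--training True` style can be parsed, here is a small `argparse` sketch. Only `--training` and `--testing` come from the commands above. The `--lr` and `--epochs` flags, their defaults, and the `str2bool` and `build_parser` helpers are hypothetical, so check `main.py` for the flags that actually exist.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings into real booleans."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Toy Mixture of Experts")
    # Mode flags as used in the commands above.
    parser.add_argument("--training", type=str2bool, default=False)
    parser.add_argument("--testing", type=str2bool, default=False)
    # Hypothetical hyperparameter flags, for illustration only.
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


# Equivalent to: python main.py --training True --lr 0.01
args = build_parser().parse_args(["--training", "True", "--lr", "0.01"])
print(args)
```

A converter such as `str2bool` is one common choice here because `type=bool` in `argparse` would treat any non-empty string, including `False`, as true.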
- `main.py`: Specifies the various hyperparameters used during training, along with the checkpoint locations.
- `train.py`: Script for training (and validating) the model; contains the whole training procedure.
- `test.py`: Script for testing an already trained model.
- `model.py`: Contains the architecture of the model and the backbone used.
- `utils.py`: Contains the various helper functions, along with the function for getting the dataset.
- I have not yet been able to fully work out the EM algorithm specified in the paper for optimizing the weights; the reason is explained in the `utils.py` file.
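For readers unfamiliar with EM in this setting, the sketch below shows the E-step as it is commonly written for a mixture of experts: given the gate's weights and each expert's prediction, compute the posterior probability ("responsibility") that each expert produced the observed target. This is a generic textbook form that assumes Gaussian experts with a fixed standard deviation. It is not taken from this repository, and it may differ from the procedure in the paper. The function names are made up for this example.

```python
import math


def gaussian_pdf(y, mean, std=1.0):
    """Density of a normal distribution with the given mean and std at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def e_step(y, expert_means, gate_weights, std=1.0):
    """Posterior probability that each expert generated the target y."""
    joint = [g * gaussian_pdf(y, m, std) for g, m in zip(gate_weights, expert_means)]
    total = sum(joint)
    return [j / total for j in joint]


# Two experts predict 0.0 and 2.0; the gate weights them equally; the target is 1.9.
responsibilities = e_step(y=1.9, expert_means=[0.0, 2.0], gate_weights=[0.5, 0.5])
print(responsibilities)
```

In the usual formulation, the M-step then refits each expert with its examples weighted by these responsibilities and trains the gate to predict them.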