
Mixture of Experts

Introduction

This is a basic, toy implementation of the Mixture of Experts algorithm from the original paper.

The model consists of several expert networks, each of which specializes in a particular part of the task, rather than a single model handling everything. A gating network (conceptually similar to attention) then assigns a weight to each expert, so the expert best suited to the input at hand contributes the most to the final output.
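As a rough sketch of this idea (not necessarily the exact architecture used in model.py), a mixture of experts with a softmax gating network can be written in PyTorch as follows:

import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Toy sketch: several expert networks whose outputs are combined
    using softmax weights produced by a gating network."""

    def __init__(self, in_dim, out_dim, num_experts=4, hidden_dim=32):
        super().__init__()
        # Each expert is a small MLP; ideally each one specializes on part of the task
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, out_dim))
            for _ in range(num_experts)
        ])
        # Gating network produces one score per expert (attention-like)
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        # (batch, num_experts) softmax weights over the experts
        weights = torch.softmax(self.gate(x), dim=-1)
        # (batch, num_experts, out_dim) stacked expert outputs
        expert_outs = torch.stack([expert(x) for expert in self.experts], dim=1)
        # Weighted combination of the experts' outputs
        return (weights.unsqueeze(-1) * expert_outs).sum(dim=1)

# Example: y = MixtureOfExperts(in_dim=10, out_dim=2)(torch.randn(8, 10))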

Running the code

The code has been tested with Python 3.7 and PyTorch v1.3.

To train or test the model:

  • Clone the repository and change into its directory (see the example after this list).
  • Run one of the following:

python main.py --training True    ### For training

python main.py --testing True     ### For testing

  • Apart from this, the other hyperparameter flags are listed in main.py and can be tweaked as needed.
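For example, from a shell (the repository URL is derived from the repo name):

git clone https://github.com/aniket-agarwal1999/Mixture_of_Experts.git
cd Mixture_of_Experts
python main.py --training True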

Code structure

  • main.py: Specifies the various hyperparameters used during training, along with the checkpoint locations.

  • train.py: Script for training (and validating) the model; contains the whole training procedure.

  • test.py: Script for testing an already trained model.

  • model.py: Contains the model architecture and the backbone used.

  • utils.py: Contains various helper functions, along with the function for loading the dataset.
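As an illustration of how the command-line flags above might be defined (the actual argument names and defaults in main.py may differ, and the extra hyperparameters below are hypothetical placeholders), an argparse setup could look like:

import argparse

def str2bool(s):
    # Interpret 'True'/'1'/'yes' style command-line values as booleans
    return str(s).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser(description='Mixture of Experts')
parser.add_argument('--training', type=str2bool, default=False, help='run training')
parser.add_argument('--testing', type=str2bool, default=False, help='run testing')
# Hypothetical hyperparameters -- placeholders, not taken from main.py
parser.add_argument('--lr', type=float, default=1e-3, help='learning rate')
parser.add_argument('--num_experts', type=int, default=4, help='number of expert networks')
args = parser.parse_args()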

Further things to be done

  • I am still not able to get the EM algorithm specified in the paper for optimizing the weights working completely; the reason is also noted in utils.py.
