diff --git a/docs/source/features/parallelisms.rst b/docs/source/features/parallelisms.rst
index b10477e4232c..b6957a415aa9 100644
--- a/docs/source/features/parallelisms.rst
+++ b/docs/source/features/parallelisms.rst
@@ -44,8 +44,26 @@ Sequence Parallelism
 Expert Parallelism
 ^^^^^^^^^^^^^^^^^^
-Expert Paralellim (EP) distributes experts across GPUs.
+**Expert Parallelism (EP)** is a type of model parallelism that distributes the experts of a Mixture of Experts (MoE) model across GPUs.
+
+Enabling Expert Parallelism in NeMo
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To enable expert parallelism, set ``model.expert_model_parallel_size=k``, where ``k`` is an integer giving the desired
+expert parallelism degree. For example, if the model has three experts (i.e. ``model.num_moe_experts=3``), you can set
+``k=3`` (i.e. pass ``model.expert_model_parallel_size=3`` on the CLI). The number of experts must be exactly divisible by ``expert_model_parallel_size``.
+
+.. code-block:: yaml
+
+    expert_model_parallel_size: 3  # Set EP to 3
+
+For further information on configuration, refer to the following documentation: `NeMo Megatron GPT Config `_.
+
+Implementation
+~~~~~~~~~~~~~~
+
+NeMo's expert parallelism functionality is provided by the Megatron-LM repository; please consult the corresponding `Moe-layer `_ for more details on the MoE implementation.
 
 .. image:: ../nlp/nemo_megatron/images/ep.png
    :align: center
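+
+The snippet below is purely illustrative and is not NeMo's validation code: it uses ``omegaconf`` (the configuration
+library underlying NeMo's YAML configs) to apply a CLI-style override of the EP degree to a hypothetical config and to
+check the divisibility requirement described above.
+
+.. code-block:: python
+
+    # Illustrative sketch only -- hypothetical values, not NeMo's actual validation logic.
+    from omegaconf import OmegaConf
+
+    base = OmegaConf.create({"model": {"num_moe_experts": 3, "expert_model_parallel_size": 1}})
+    override = OmegaConf.from_dotlist(["model.expert_model_parallel_size=3"])  # same form as the CLI override
+    cfg = OmegaConf.merge(base, override)
+
+    assert cfg.model.num_moe_experts % cfg.model.expert_model_parallel_size == 0, \
+        "num_moe_experts must be divisible by expert_model_parallel_size"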
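+
+As a rough mental model of what expert parallelism does (a conceptual sketch, not Megatron-LM's actual code), the
+following shows a contiguous assignment of global expert indices to expert-parallel ranks:
+
+.. code-block:: python
+
+    # Conceptual sketch of expert placement under expert parallelism (not Megatron-LM code).
+    def local_expert_indices(num_experts, ep_size, ep_rank):
+        """Global indices of the experts hosted by one expert-parallel rank,
+        assuming a simple contiguous split of experts across ranks."""
+        assert num_experts % ep_size == 0, "num_experts must be divisible by ep_size"
+        per_rank = num_experts // ep_size
+        return list(range(ep_rank * per_rank, (ep_rank + 1) * per_rank))
+
+    # With num_moe_experts=3 and expert_model_parallel_size=3, each EP rank hosts one expert:
+    for rank in range(3):
+        print(f"EP rank {rank} hosts experts {local_expert_indices(3, 3, rank)}")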