NVIDIA · akoumpa · May 8, 2024 · May 8, 2024 · May 8, 2024 · May 10, 2024
diff --git a/docs/source/features/parallelisms.rst b/docs/source/features/parallelisms.rst
@@ -44,8 +44,26 @@ Sequence Parallelism
 
 Expert Parallelism
 ^^^^^^^^^^^^^^^^^^
-Expert Paralellim (EP) distributes experts across GPUs.
+**Expert Parallelism (EP)** is a type of model parallelism that distributes the experts of a Mixture-of-Experts (MoE) model across GPUs.
 
+Enabling Expert Parallelism in NeMo
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To enable EP, set ``model.expert_model_parallel_size=k``, where ``k`` is an integer giving the desired
+expert-parallel size. For example, if the model has three experts (i.e. ``model.num_moe_experts=3``), you can set
+``k=3`` (e.g. via the CLI with ``model.expert_model_parallel_size=3``). The number of experts must be exactly divisible by ``expert_model_parallel_size``.
+
+   .. code-block:: yaml
+
+       expert_model_parallel_size: 3  # Set EP to 3
+
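+As a further illustration of the divisibility rule, here is a minimal sketch assuming a hypothetical model with eight experts. Setting EP to 4 divides evenly, so each expert-parallel rank would hold two experts. The values are illustrative only, and the nesting under ``model`` is inferred from the ``model.`` prefix used in the CLI overrides.
+
+   .. code-block:: yaml
+
+       model:
+         num_moe_experts: 8              # hypothetical expert count
+         expert_model_parallel_size: 4   # 8 / 4 = 2 experts per EP rank
+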
+For further information on configuration, refer to the following documentation: `NeMo Megatron GPT Config <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L68>`_.
+
+
+Implementation
+~~~~~~~~~~~~~~
+
+NeMo's expert parallelism functionality is provided by the Megatron-LM repository. For more details on the MoE implementation, consult the corresponding `MoE layer <https://github.com/NVIDIA/Megatron-LM/blob/e2ec14ab5690fead7e33760b0f8fb20c83b4fd1f/megatron/core/transformer/moe/moe_layer.py#L29>`_.
 
 .. image:: ../nlp/nemo_megatron/images/ep.png
     :align: center