Merged
8 changes: 4 additions & 4 deletions docs/_pages/config-json.md
@@ -427,7 +427,7 @@ Enabling and configuring ZeRO memory optimizations

| Description | Default |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
-| Stage 2 optimization for CPU offloading that parallelizes gradient copying to CPU memory among ranks by fine-grained gradient partitioning. Performance benefit grows with gradient accumulation steps (more copying between optimizer steps) or GPU count (increased parallelism). | `False` |
+| Stage 1 and 2 optimization for CPU offloading that parallelizes gradient copying to CPU memory among ranks by fine-grained gradient partitioning. Performance benefit grows with gradient accumulation steps (more copying between optimizer steps) or GPU count (increased parallelism). | `False` |

***offload_param***: [dictionary]

@@ -439,7 +439,7 @@ Enabling and configuring ZeRO memory optimizations

| Description | Default |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------- |
-| Enable offloading of optimizer state to CPU or NVMe, and optimizer computation to CPU. This frees up GPU memory for larger models or batch sizes. Valid only with stage 2 and 3. See [here](#optimizer-offloading) for more details. | `False` |
+| Enable offloading of optimizer state to CPU or NVMe, and optimizer computation to CPU. This frees up GPU memory for larger models or batch sizes. Valid for ZeRO stages 1, 2, and 3. See [here](#optimizer-offloading) for more details. | `False` |

***stage3_max_live_parameters***: [integer]

@@ -481,7 +481,7 @@ Enabling and configuring ZeRO memory optimizations

| Description | Default |
| ------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
-| Enable offloading of optimizer memory and computation to CPU. This frees up GPU memory for larger models or batch sizes. Valid only with stage 2. | `False` |
+| Enable offloading of optimizer memory and computation to CPU. This frees up GPU memory for larger models or batch sizes. Valid with stages 1 and 2. | `False` |


### Parameter offloading
@@ -536,7 +536,7 @@ Note that if the value of "device" is not specified or not supported, an asserti
| Number of parameter elements to maintain in CPU memory when offloading to NVMe is enabled. | 1e9 |

### Optimizer offloading
-Enabling and configuring ZeRO optimization of offloading optimizer computation to CPU and state to CPU/NVMe. CPU offloading is available with ZeRO stage 2 or 3. NVMe offloading is available only with ZeRO stage 3.
+Enabling and configuring ZeRO optimization of offloading optimizer computation to CPU and state to CPU/NVMe. CPU offloading is available with ZeRO stages 1, 2, and 3. NVMe offloading is available only with ZeRO stage 3.
Note that if the value of "device" is not specified or not supported, an assertion will be triggered.
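
The stage and device rules stated above can be sketched as a small pre-flight check. This is only an illustration under stated assumptions: `check_offload_optimizer` is a hypothetical helper, not a DeepSpeed API, and the device strings `"cpu"` and `"nvme"` are assumed spellings for the CPU and NVMe targets named in the text. DeepSpeed performs its own validation when it parses the configuration.

```python
def check_offload_optimizer(zero_config: dict) -> None:
    """Raise ValueError if optimizer-offload settings break the documented rules.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    `zero_config` is the contents of the "zero_optimization" section.
    """
    offload = zero_config.get("offload_optimizer")
    if offload is None:
        return  # optimizer offloading was not requested

    stage = zero_config.get("stage", 0)
    device = offload.get("device")

    # Documented rule: CPU offloading works with stages 1, 2, and 3;
    # NVMe offloading works only with stage 3.
    # The device strings themselves are an assumption of this sketch.
    allowed_stages = {"cpu": {1, 2, 3}, "nvme": {3}}

    if device not in allowed_stages:
        raise ValueError(f"unsupported or missing offload device: {device!r}")
    if stage not in allowed_stages[device]:
        raise ValueError(
            f"{device} offloading is not available with ZeRO stage {stage}"
        )


# Stage 1 with CPU offloading is accepted after this change.
check_offload_optimizer({"stage": 1, "offload_optimizer": {"device": "cpu"}})
```

Raising `ValueError` instead of asserting is a choice made for this sketch; the documentation above only says that an assertion is triggered for a missing or unsupported device.
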
```json
"offload_optimizer": {
8 changes: 5 additions & 3 deletions docs/_tutorials/zero-offload.md
@@ -38,20 +38,22 @@ Second, we need to apply the following changes to ensure that only one GPU is us
```

### DeepSpeed Configuration Changes
-ZeRO-Offload leverages many ZeRO stage 2 mechanisms, and so the configuration changes to enable ZeRO-Offload are an extension of those required to enable ZeRO stage 2. The `zero_optimization` configuration to enable ZeRO-Offload is shown below:
+ZeRO-Offload leverages many ZeRO stage 1 and 2 mechanisms, and so the configuration changes to enable ZeRO-Offload are an extension of those required to enable ZeRO stage 1 or 2. The `zero_optimization` configuration to enable ZeRO-Offload is shown below:

```json
 {
     "zero_optimization": {
         "stage": 2,
-        "cpu_offload": true,
+        "offload_optimizer": {
+            "device": "cpu"
+        },
         "contiguous_gradients": true,
         "overlap_comm": true
     }
 }
```

-As seen above, in addition to setting the _stage_ field to **2** (to enable ZeRO stage 2), we also need to set _cpu_offload_ flag to **true** to enable ZeRO-Offload optimizations. In addition, we can set other ZeRO stage 2 optimization flags, such as _overlap_comm_ to tune ZeRO-Offload performance. With these changes we can now run the model. We share some screenshots of the training below.
+As seen above, in addition to setting the _stage_ field to **2** (to enable ZeRO stage 2; stage 1 also works), we also need to set the _offload\_optimizer_ device to **cpu** to enable ZeRO-Offload optimizations. In addition, we can set other ZeRO stage 2 optimization flags, such as _overlap\_comm_, to tune ZeRO-Offload performance. With these changes we can now run the model. We share some screenshots of the training below.
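
For readers who generate the configuration from Python, here is a minimal sketch that builds the same `zero_optimization` section and serializes it with the standard `json` module, which never emits trailing commas. `zero_offload_config` is a hypothetical helper name, not a DeepSpeed function. The text does not say whether the remaining stage 2 flags have any effect under stage 1, so the sketch passes them through unchanged.

```python
import json


def zero_offload_config(stage: int = 2) -> dict:
    """Return a config dict mirroring the ZeRO-Offload JSON in the tutorial.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    """
    # The tutorial text covers ZeRO-Offload on top of stage 1 or stage 2.
    if stage not in (1, 2):
        raise ValueError("this sketch only covers ZeRO stage 1 or 2")
    return {
        "zero_optimization": {
            "stage": stage,
            # Replaces the older `"cpu_offload": true` flag.
            "offload_optimizer": {"device": "cpu"},
            "contiguous_gradients": True,
            "overlap_comm": True,
        }
    }


# Serialize to JSON text that can be saved as a DeepSpeed config file.
print(json.dumps(zero_offload_config(stage=1), indent=2))
```
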

Here is a screenshot of the training log:
