
Simple examples of FP16_Optimizer functionality

To use FP16_Optimizer on a half-precision model, or a model with a mixture of half and float parameters, only two lines of your training script need to change (as sketched below):

  1. Construct an FP16_Optimizer instance from an existing optimizer.
  2. Replace loss.backward() with optimizer.backward(loss).
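
For example, a typical training loop might look like this. This is only a minimal sketch: the model, data, and optimizer settings are arbitrary placeholders.

```python
import torch
from apex.fp16_utils import FP16_Optimizer

# A tiny half-precision model; any FP16 (or mixed FP16/FP32) model works the same way.
model = torch.nn.Linear(512, 10).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Change 1: construct an FP16_Optimizer instance from the existing optimizer.
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

for _ in range(10):
    inp = torch.randn(64, 512).cuda().half()
    target = torch.randint(0, 10, (64,)).cuda()
    optimizer.zero_grad()
    # Compute the loss in FP32 for numerical stability.
    loss = torch.nn.functional.cross_entropy(model(inp).float(), target)
    # Change 2: replaces loss.backward().
    optimizer.backward(loss)
    optimizer.step()
```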

Full API Documentation

See "Other Options" at the bottom of this page for some cases that require special treatment.

Minimal Working Sample

minimal.py shows the basic usage of FP16_Optimizer with either static or dynamic loss scaling. Test via python minimal.py.
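
For reference, the two modes differ only in the constructor arguments. A rough sketch (the model and Adam settings are placeholders):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(512, 10).cuda().half()

# Static loss scaling: the loss is multiplied by a fixed factor before every backward pass.
optimizer = FP16_Optimizer(torch.optim.Adam(model.parameters(), lr=1e-3),
                           static_loss_scale=128.0)

# ...or dynamic loss scaling: the scale is adjusted automatically when gradient overflow
# is detected, so no hand-tuned factor is needed.
optimizer = FP16_Optimizer(torch.optim.Adam(model.parameters(), lr=1e-3),
                           dynamic_loss_scale=True)
```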

Closures

FP16_Optimizer supports closures with the same control flow as ordinary PyTorch optimizers.
closure.py shows an example. Test via python closure.py.
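
The closure computes the loss and calls optimizer.backward(loss) internally, just as it would call loss.backward() with an ordinary optimizer. A minimal sketch (the model, data, and loss function are placeholders):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(512, 10).cuda().half()
optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=0.1),
                           dynamic_loss_scale=True)

inp = torch.randn(64, 512).cuda().half()
target = torch.randint(0, 10, (64,)).cuda()

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inp).float(), target)
    optimizer.backward(loss)   # inside the closure, backward still goes through FP16_Optimizer
    return loss

optimizer.step(closure)        # same call pattern as torch.optim optimizers
```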

See the API documentation for more details.

Serialization/Deserialization

FP16_Optimizer supports saving and loading with the same control flow as ordinary PyTorch optimizers. save_load.py shows an example. Test via python save_load.py.
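
In other words, FP16_Optimizer exposes state_dict() and load_state_dict(), so checkpointing looks the same as with a bare torch.optim optimizer. A sketch (the checkpoint path and model are placeholders):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(512, 10).cuda().half()
optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=0.1),
                           static_loss_scale=128.0)

# Save model and optimizer state together, as with any torch.optim optimizer.
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pt')

# ...later, restore both.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```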

See the API documentation for more details.

Distributed

distributed_apex shows an example using FP16_Optimizer with Apex DistributedDataParallel. Using FP16_Optimizer with distributed training requires no changes from ordinary single-process usage. Test via

cd distributed_apex
bash run.sh

distributed_pytorch shows an example using FP16_Optimizer with PyTorch DistributedDataParallel. Again, using FP16_Optimizer with distributed training requires no changes from ordinary single-process usage. Test via

cd distributed_pytorch
bash run.sh
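
For orientation, the pieces fit together roughly as follows. This is only a sketch: it assumes the process group has already been initialized in each process (e.g. via torch.distributed.init_process_group), and the model and optimizer settings are placeholders.

```python
import torch
from apex.parallel import DistributedDataParallel   # or torch.nn.parallel.DistributedDataParallel
from apex.fp16_utils import FP16_Optimizer

# Assumes torch.distributed.init_process_group(...) has already been called in this process.
model = torch.nn.Linear(512, 10).cuda().half()
model = DistributedDataParallel(model)               # wrap the model for distributed training

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

# From here on, the training loop is identical to single-process usage:
#   optimizer.zero_grad(); optimizer.backward(loss); optimizer.step()
```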

Other Options

Gradient clipping requires that calls to torch.nn.utils.clip_grad_norm be replaced with fp16_optimizer_instance.clip_master_grads(). The word_language_model example uses this feature.
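
Continuing the minimal sketch above (same placeholder model, optimizer, and loss), the change is a one-line swap inside the training loop:

```python
max_norm = 5.0   # placeholder clipping threshold

optimizer.zero_grad()
optimizer.backward(loss)
# Instead of torch.nn.utils.clip_grad_norm(model.parameters(), max_norm):
optimizer.clip_master_grads(max_norm)   # clips the FP32 master gradients
optimizer.step()
```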

Multiple losses will work if you simply replace

loss1.backward()
loss2.backward()

with

optimizer.backward(loss1)
optimizer.backward(loss2)

but FP16_Optimizer can handle this more efficiently if you defer the master-gradient update using update_master_grads(), as sketched below.
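
A sketch of the more efficient pattern, based on the update_master_grads keyword accepted by FP16_Optimizer.backward() (reusing the placeholder optimizer and losses from above; see the API documentation for details):

```python
optimizer.zero_grad()
optimizer.backward(loss1, update_master_grads=False)   # accumulate FP16 gradients only
optimizer.backward(loss2, update_master_grads=False)
optimizer.update_master_grads()                        # copy/unscale into FP32 master grads once
optimizer.step()
```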