
Move the module to the precision dtype #12591

Closed
yuvalkirstain opened this issue Apr 3, 2022 · 9 comments · Fixed by #18113
Labels
feature (Is an improvement or enhancement) · performance · precision: amp (Automatic Mixed Precision)
Milestone

Comments

@yuvalkirstain

yuvalkirstain commented Apr 3, 2022

🐛 Bug

@SeanNaren
When I use bf16 and check the model's dtype, it appears to be fp32, and I do not see the memory gains I expect. In other frameworks that support bf16 (like fairseq), the model's dtype is torch.bfloat16. Is there a simple example that demonstrates that this feature reduces memory consumption as it should? I suspect that something might be wrong (though of course I might be mistaken).
Thank you!

To Reproduce

Launch any job with precision="bf16" and compare it against precision=32.
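
For example, peak CUDA memory can be compared between the two runs; a rough sketch (assuming a single CUDA device):

import torch

# run once with precision=32 and once with precision="bf16", then compare the numbers
torch.cuda.reset_peak_memory_stats()
# ... trainer.fit(model) ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")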

Expected behavior

This feature should save 30-50% of memory, but I do not see such gains in Lightning.

Environment

  • CUDA:
    • GPU:
      • GeForce RTX 3090
    • available: True
    • version: 11.3
  • Packages:
    • numpy: 1.21.2
    • pyTorch_debug: False
    • pyTorch_version: 1.11.0
    • pytorch-lightning: 1.6.0dev
    • tqdm: 4.62.3
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.8.12
    • version: #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019

Additional context

bf16 is a very important feature. It is usually more stable than fp16, and Lightning should support it effectively (models that are pretrained with bf16 should not be used with fp16). :)

cc @Borda @tchaton @rohitgr7 @carmocca @justusschock @awaelchli @akihironitta

@yuvalkirstain yuvalkirstain added the needs triage Waiting to be triaged by maintainers label Apr 3, 2022
@SeanNaren
Contributor

SeanNaren commented Apr 3, 2022

Thanks @yuvalkirstain!

I investigated this as it definitely seemed strange; however, I think there may be a misunderstanding of the behaviour.

When using AMP, the model remains in FP32, but operations that can run in FP16 are autocast. The same applies to BF16 (see this example):

import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        assert self.layer.weight.dtype == torch.float32 # the weights are in 32!
        x = self.layer(x) 
        assert x.dtype == torch.bfloat16 # output was bfloat16 as the operation is bf16 compatible
        return x

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    model = BoringModel()

    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        max_epochs=1,
        enable_model_summary=False,
        precision="bf16",
        gpus=1,
    )

    trainer.fit(model, train_dataloaders=train_data)


if __name__ == "__main__":
    run()
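
The same behaviour can be reproduced in plain PyTorch, independent of Lightning; a minimal sketch (assuming a CUDA device):

import torch

layer = torch.nn.Linear(32, 2).cuda()
x = torch.randn(4, 32, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = layer(x)

print(layer.weight.dtype)  # torch.float32 -- the parameters are left untouched
print(out.dtype)           # torch.bfloat16 -- the matmul ran under autocast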

I then checked FairSeq, which converts the entire model into BF16 even in AMP mode: https://github.com/pytorch/fairseq/blob/main/fairseq/trainer.py#L105-L106

This is likely why you see the memory improvements there. If this is something you'd like to do, you can simply do the same in your __init__:

class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2).bfloat16()
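
Or, closer to what fairseq does, convert the whole module after construction (a sketch, not an official Lightning API):

model = BoringModel()
model = model.to(torch.bfloat16)  # same as model.bfloat16(); casts all floating-point parameters and buffers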

Let me know if this helps! We might need to improve our documentation to explain this case.

@yuvalkirstain
Author

Thank you so much for the detailed answer! I will try it out and update :)

@SeanNaren
Contributor

SeanNaren commented Apr 3, 2022

No, thank you @yuvalkirstain! Let me know how it goes. You also bring up a great point: should we support converting the pl_module internally for users? I think it is potentially a good idea; however, I wonder what the API for this would be.

@akihironitta akihironitta added precision: amp Automatic Mixed Precision and removed needs triage Waiting to be triaged by maintainers labels Apr 4, 2022
@carmocca
Contributor

carmocca commented Apr 4, 2022

This would just be a flag on the precision plugin (name up for debate):

from pytorch_lightning import Trainer
from pytorch_lightning.plugins import NativeMixedPrecisionPlugin

Trainer(plugins=NativeMixedPrecisionPlugin(convert_modules=True))

Another question is whether this should be enabled by default.

As per the test added by @rohitgr7 in https://github.com/PyTorchLightning/pytorch-lightning/pull/12508/files#diff-3e387bfea0892d3f7033341769075f9e8b17dbe2fbe4a44a68e3d450bdba2e0eR1223, the layers are also moved when using DeepSpeed.
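
For illustration only, a rough sketch of what such a flag might look like as a custom plugin; the convert_modules name and the connect override are assumptions here, not an existing Lightning API:

import torch
from pytorch_lightning.plugins import NativeMixedPrecisionPlugin


class ModuleConvertingBf16Precision(NativeMixedPrecisionPlugin):
    """Hypothetical plugin that also casts the module to the autocast dtype."""

    def __init__(self, *args, convert_modules: bool = True, **kwargs):
        super().__init__(*args, **kwargs)
        self.convert_modules = convert_modules

    def connect(self, model, optimizers, lr_schedulers):
        if self.convert_modules:
            model = model.to(torch.bfloat16)  # move parameters and buffers to the precision dtype
        return super().connect(model, optimizers, lr_schedulers)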

@carmocca carmocca added the feature Is an improvement or enhancement label Apr 4, 2022
@carmocca carmocca added this to the 1.7 milestone Apr 4, 2022
@yuvalkirstain
Author

yuvalkirstain commented Apr 5, 2022

@SeanNaren Yes, doing so results in less memory on the GPU with identical results, thank you!

T5-XL (3B parameters), inference on the SQuAD dataset:

  • 8058MiB / 24268MiB with model = model.bfloat16()
  • 13196MiB / 24268MiB without
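
Most of that gap is explained by the parameter storage alone being roughly halved; a small sketch of the arithmetic (the Hugging Face model id is illustrative, not taken from the run above):

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-3b")  # ~3B parameters
fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
model = model.bfloat16()
bf16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"fp32 params: {fp32_bytes / 2**30:.1f} GiB, bf16 params: {bf16_bytes / 2**30:.1f} GiB")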

Regarding converting the pl_module internally for users: definitely. IMO it makes more sense for the trainer to take care of that rather than the model.

@SeanNaren
Contributor

@yuvalkirstain I'm glad it worked! Hopefully we'll get the feature in soon :)

@rohitgr7
Contributor

rohitgr7 commented Apr 5, 2022

Another question is whether this should be enabled by default.

I think yes because:

  1. less memory consumption
  2. other frameworks like DeepSpeed do it too

Issues:

  1. backward compatibility (BC)

But if there are no other side effects, I think enabling it by default should be good.

@carmocca carmocca changed the title bf16 does not seems to work as expected Move the module to the precision dtype Apr 5, 2022
@SeanNaren
Contributor

(quoting @rohitgr7's comment above on enabling this by default)

Just to be clear, are we talking only about BF16 precision? DeepSpeed's handling of AMP is very different and should be treated as such.

@carmocca
Contributor

carmocca commented Apr 7, 2022

For DeepSpeed in particular, we let it make the choice; at the moment, that means moving the module too.

But I was thinking this should apply to all precision values.
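
For illustration, the kind of mapping this would imply; a sketch only, since the exact set of supported precision values differs between Lightning versions:

import torch

# hypothetical helper, not a Lightning API: map a precision value to the dtype the module would be moved to
_PRECISION_TO_DTYPE = {
    "bf16": torch.bfloat16,
    16: torch.float16,
    32: torch.float32,
    64: torch.float64,
}


def move_module_to_precision(module: torch.nn.Module, precision) -> torch.nn.Module:
    return module.to(_PRECISION_TO_DTYPE[precision])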

@carmocca carmocca modified the milestones: pl:1.7, pl:future Jul 19, 2022
@carmocca carmocca added the priority: 1 Medium priority task label Jul 19, 2022
@carmocca carmocca removed the priority: 1 Medium priority task label Feb 6, 2023
@carmocca carmocca modified the milestones: future, 2.1 Aug 14, 2023