This repository was archived by the owner on Aug 28, 2025. It is now read-only.

Mistake in GLUE code example #133

@Valahaar

Description

Hi, I found a mistake in how `total_steps` is computed in https://pytorch-lightning.readthedocs.io/en/stable/notebooks/lightning_examples/text-transformers.html

Specifically, in the `setup()` method, we have:

```python
# Calculate total steps
tb_size = self.hparams.train_batch_size * max(1, self.trainer.gpus)
ab_size = self.trainer.accumulate_grad_batches * float(self.trainer.max_epochs)
self.total_steps = (len(train_loader.dataset) // tb_size) // ab_size
```

If I'm not mistaken, it should be something along the lines of:

```python
# Calculate total steps
tb_size = self.hparams.train_batch_size * max(1, self.trainer.gpus)
ab_size = tb_size * self.trainer.accumulate_grad_batches
self.total_steps = int((len(train_loader.dataset) / ab_size) * float(self.trainer.max_epochs))
```

In the first version, on MRPC (3668 training instances), with 30 epochs, a batch size of 32, 1 GPU, and `accumulate_grad_batches=1`, `total_steps` comes out to 3; in the second version, it comes out to 3438.
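
For reference, here is a minimal plain-Python sketch of the two computations (no Lightning objects; the variable names are my own stand-ins for the corresponding `self.hparams` / `self.trainer` attributes) that plugs in these numbers and reproduces the 3 vs. 3438 figures:

```python
# Reproduce both formulas with the numbers quoted above
# (MRPC, 30 epochs, batch size 32, 1 GPU, accumulate_grad_batches=1).
dataset_size = 3668
train_batch_size = 32
gpus = 1
accumulate_grad_batches = 1
max_epochs = 30

# Current notebook version
tb_size = train_batch_size * max(1, gpus)
ab_size = accumulate_grad_batches * float(max_epochs)
print((dataset_size // tb_size) // ab_size)               # 3.0

# Proposed version
ab_size = tb_size * accumulate_grad_batches
print(int((dataset_size / ab_size) * float(max_epochs)))  # 3438
```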

cc @Borda

Metadata

Labels

bug / fix (Something isn't working)
