Trouble reproducing results from PL 1.5.7 in PL 1.6.0 #13150

ssharpe42 · 2022-05-25T12:27:30Z


I am trying to reproduce results (for a publication) for a simple dense NN on a CPU. Training the model in PL 1.5.7 through 1.5.10 produces the same results; however, 1.6.0 produces different results.

Could this be due to the "current epoch/global step boundary" change in 1.6.0?

The val loss on the first epoch is different and the global step is off by 1, as the two checkpoint log lines below show. Any ideas on how to alter training so the results reproduce across versions?

2022-05-24 14:19:42,345 - pytorch_lightning.utilities.rank_zero - INFO - Epoch 0, global step 16: 'val_loss' reached 0.69475 (best 0.69475), saving model to

2022-05-24 14:21:36,574 - pytorch_lightning.utilities.distributed - INFO - Epoch 0, global step 15: val_loss reached 0.68147 (best 0.68147), saving model to
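To make the mismatch concrete, here is a minimal sketch in plain Python (no Lightning required) that parses the two log lines above and reports the gap in step and val_loss. The regex and the helper name `parse_checkpoint_line` are my own illustration and not part of PL. The sketch only quantifies the difference and does not explain its cause.

```python
import re

# Matches the "Epoch E, global step S: val_loss reached V" part of the message.
# The quotes around the metric name are optional because the two lines differ there.
LOG_PATTERN = re.compile(
    r"Epoch (?P<epoch>\d+), global step (?P<step>\d+): "
    r"'?(?P<metric>\w+)'? reached (?P<value>\d+\.\d+)"
)


def parse_checkpoint_line(line):
    """Hypothetical helper: extract epoch, step, metric name and value."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "epoch": int(match["epoch"]),
        "step": int(match["step"]),
        "metric": match["metric"],
        "value": float(match["value"]),
    }


# The two log lines from this post, copied verbatim (paths were already cut off).
first = parse_checkpoint_line(
    "2022-05-24 14:19:42,345 - pytorch_lightning.utilities.rank_zero - INFO - "
    "Epoch 0, global step 16: 'val_loss' reached 0.69475 (best 0.69475), saving model to"
)
second = parse_checkpoint_line(
    "2022-05-24 14:21:36,574 - pytorch_lightning.utilities.distributed - INFO - "
    "Epoch 0, global step 15: val_loss reached 0.68147 (best 0.68147), saving model to"
)

step_gap = first["step"] - second["step"]
loss_gap = first["value"] - second["value"]
print(f"step gap: {step_gap}, val_loss gap: {loss_gap:.5f}")
```

Both lines refer to epoch 0, so the comparison is between the same checkpoint event in the two runs.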

Here is my trainer config:

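from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# `path` is the checkpoint directory, defined elsewhere.
trainer = Trainer(
    deterministic=True,
    max_epochs=100,
    callbacks=[
        # Stop once val_loss has not improved for 5 validation checks.
        EarlyStopping(monitor="val_loss", patience=5),
        # Keep only the single best checkpoint by val_loss.
        ModelCheckpoint(
            monitor="val_loss",
            dirpath=path,
            filename="checkpoint",
            save_top_k=1,
            verbose=True,
            mode="min",
        ),
    ],
)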
