Create saving-and-loading-Pytorch-checkpoints.rst #1364

Linardos · 2022-08-09T08:25:54Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

A minimal example for storing a PyTorch checkpoint. It adds several steps compared to the original example on saving progress, and it also adds a way to load the latest checkpoint.
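For readers of this thread, the "load the latest checkpoint" step could be sketched roughly as below. This is not the code from the PR. The file naming scheme `model_round_<N>.pth` and the helper name `latest_checkpoint` are assumptions made for illustration, and the sketch uses only the standard library to pick the file.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: one file per round, e.g. "model_round_3.pth".
CHECKPOINT_PATTERN = re.compile(r"model_round_(\d+)\.pth$")


def latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the checkpoint with the highest round number, or None."""
    candidates = []
    for path in directory.glob("model_round_*.pth"):
        match = CHECKPOINT_PATTERN.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare round numbers numerically so that round 10 sorts after round 9.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path could then be passed to `torch.load`, and the result to the model's `load_state_dict`. Selecting by parsed round number rather than by file name avoids lexicographic ordering placing round 10 before round 9.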

Any other comments?

This was developed with Daniel's help. We thought the issue was resolved, but apparently performance drops when the checkpoint is loaded, as if training had started from scratch. We need to debug this further.

danieljanes

Thanks for the PR, @Linardos! Looks good overall; I've added a few small suggestions. Could you re-compile the docs and check if everything looks good?

doc/source/saving-and-loading-Pytorch-checkpoints.rst

Co-authored-by: Daniel J. Beutel <daniel@adap.com>

Linardos · 2022-08-31T16:32:27Z

Changes committed! I look forward to your thoughts after running the code. I hope we can resolve the issue.

danieljanes · 2022-08-31T17:28:43Z

@Linardos, lightning fast! Did you have a chance to build the docs and check if the formatting looks good?

Linardos · 2022-08-31T22:45:50Z

Yep, I used a VS Code plugin that allowed me to build them as I worked, and your changes were small, so it was easy to check quickly.

Create saving-and-loading-Pytorch-checkpoints.rst

e3ef9c8

Linardos requested review from danieljanes and tanertopal as code owners August 9, 2022 08:25

Merge branch 'main' into main

b672cb1

danieljanes requested changes Aug 31, 2022

Linardos and others added 4 commits August 31, 2022 19:30

Update doc/source/saving-and-loading-Pytorch-checkpoints.rst

2f15860

Co-authored-by: Daniel J. Beutel <daniel@adap.com>

Update doc/source/saving-and-loading-Pytorch-checkpoints.rst

6778907

Co-authored-by: Daniel J. Beutel <daniel@adap.com>

Update doc/source/saving-and-loading-Pytorch-checkpoints.rst

d81060a

Co-authored-by: Daniel J. Beutel <daniel@adap.com>

Update doc/source/saving-and-loading-Pytorch-checkpoints.rst

13f3db6

Co-authored-by: Daniel J. Beutel <daniel@adap.com>

Merge branch 'main' into main

1090af3

danieljanes approved these changes Sep 6, 2022

danieljanes merged commit 2d4e434 into adap:main Sep 6, 2022

danieljanes mentioned this pull request Sep 6, 2022

Update checkpointing docs #1409

Merged
