Add fix to resume pre-training when val outputs is empty #6281
Signed-off-by: Abhishree <abhishreetm@gmail.com>
@@ -681,6 +681,9 @@ def validation_step(self, dataloader_iter, batch_idx):
         return []

     def validation_epoch_end(self, outputs):
+        if len(outputs) == 0:
Could we change the config to set strict to false and add that to the PR?
@@ -738,6 +738,9 @@ def validation_step(self, batch, batch_idx, dataloader_idx=0):
         return loss_mean

     def validation_epoch_end(self, outputs):
+        if len(outputs) == 0:
Same as the other comment.
@@ -453,6 +453,9 @@ def validation_step(self, dataloader_iter, batch_idx):
         return loss_mean[0]

     def validation_epoch_end(self, outputs):
+        if len(outputs) == 0:
Same as the other comment.
Signed-off-by: Abhishree <abhishreetm@gmail.com>
c460b57 to 8b52ea2
for more information, see https://pre-commit.ci
LGTM!
Shouldn't this go to r1.17.0 instead of main?
@okuchaiev yes, I forgot to base it on r1.17 instead of main. Let me do that.
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
Let's merge this for 1.18.0? @ericharper, what do you think?
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do ?
Resuming pre-training sometimes fails immediately, complaining that the validation outputs are empty. This PR adds a check for empty outputs at the start of validation_epoch_end in the affected models.
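The diffs in this PR show only the first added line of each guard, so the following is a minimal sketch of the pattern, not the actual NeMo change. It assumes the guard returns early when no validation outputs were collected; `ToyModule`, the `logged` dict, and the averaging step are hypothetical stand-ins used to keep the example self-contained.

```python
from statistics import mean


class ToyModule:
    """Hypothetical stand-in for a model that defines validation_epoch_end."""

    def __init__(self):
        # Stand-in for a logger: metric name -> last logged value.
        self.logged = {}

    def validation_epoch_end(self, outputs):
        # Guard (assumed behavior): if no validation step produced an output,
        # skip the reduction instead of failing on an empty list.
        if len(outputs) == 0:
            return None

        # Normal path: reduce the per-step losses to a single value and log it.
        averaged = mean(outputs)
        self.logged["val_loss"] = averaged
        return averaged


module = ToyModule()
print(module.validation_epoch_end([]))          # empty outputs: returns early
print(module.validation_epoch_end([0.5, 1.5]))  # normal case: averages the losses
```

Without the guard, the reduction step would be called on an empty list, which matches the failure described above.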