Update label_models.py #4891
Conversation
- Removed the AUROC metric, since it is slow and costs a lot of memory to store the logits and labels.
- Fixed macro-accuracy metric usage for multiple validation manifests.
- Refactored the validation and test functions, since they share the same code and differ only in the display tag ("val_" vs. "test_").

Signed-off-by: stevehuang52 <heh@nvidia.com>
39b15e8 to 864a8a9
'val_acc_macro': macro_accuracy_score,
'val_auroc': auroc_score,
}
return {'log': tensorboard_log}
Also update this tensorboard_logs; refer to #4876.
Sure, just fixed.
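For illustration, a minimal sketch of the tag-prefixed log dict the diff above builds (the function name `build_epoch_logs` is hypothetical, and metric values are assumed to be plain floats; the AUROC entry is already removed, matching the PR):

```python
def build_epoch_logs(tag, loss_mean, acc, acc_macro):
    # Build the tensorboard log dict with a 'val'/'test' prefix,
    # mirroring the dict in the diff above (AUROC entry dropped).
    return {
        f'{tag}_loss': loss_mean,
        f'{tag}_acc': acc,
        f'{tag}_acc_macro': acc_macro,
    }

# Example: validation-epoch logs
logs = build_epoch_logs('val', 0.42, 0.91, 0.88)
```

The same function serves test epochs by passing `tag='test'`, which is the whole point of the refactor discussed below.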
'test_acc_macro': macro_accuracy_score,
'test_auroc': auroc_score,
}
return self.multi_validation_epoch_end(outputs, dataloader_idx, 'test')
Not sure if it's a good idea to merge val and test here
what do you think? @nithinraok
We can merge; why do you think it's not a good idea?
@nithinraok I was just worried it would break things such as evaluation on multiple GPUs and reduce flexibility, but I guess I'm overthinking it. Let's merge then.
I don't think so, since it's just re-use of a function, but @fayejf or @stevehuang52, can you run an experiment and verify this works as expected before merging? We have a unit test for training, but we don't check validation and test.
I can do it, but do you have any examples of how to do it quickly? I think it will work correctly, and the ASR models also do it the same way: https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/models/ctc_models.py#L650
@fayejf can you provide Steve with some language ID examples? Note the validation time before and after the change.
Let me run it really quick.
What might be cleaner is to separate it out into a function, if we're worried about using the validation hook within the test hook:
def calculate_metrics(self, outputs, dataloader_idx, tag):
    ...

def multi_test_epoch_end(self, outputs, dataloader_idx):
    return self.calculate_metrics(outputs, dataloader_idx, 'test')

def multi_validation_epoch_end(self, outputs, dataloader_idx):
    return self.calculate_metrics(outputs, dataloader_idx, 'val')
Yes, this would be neat.
@stevehuang52 Since you are removing auroc, remove it from the instantiation and also the corresponding import.
It's already removed
@nithinraok @stevehuang52 I've tested validation time, but I didn't observe a time reduction with this fix compared to main, even though my validation set is about 656,809 files, which is not small.
Mostly good. Could you update with Sean's suggestion?
logging.info("val_loss: {:.3f}".format(val_loss_mean))
self.log('val_loss', val_loss_mean)
logging.info(f'{tag}_loss: {loss_mean:.3f}')
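The tag-based logging line in the diff can be sketched in standalone form as follows (hypothetical helper name, using Python's standard `logging` module rather than NeMo's logger): one f-string covers both validation and test, replacing the duplicated hard-coded `val_loss` lines.

```python
import logging

logging.basicConfig(level=logging.INFO)

def log_epoch_loss(tag, loss_mean):
    # One format string handles both 'val' and 'test' prefixes.
    msg = f'{tag}_loss: {loss_mean:.3f}'
    logging.info(msg)
    return msg
```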
Could you remove the duplicate line here?
Will do.
Did you also test with r1.10 to compare? Can you please do that as well, to find the cause of the delay in the validation time? @fayejf
I don't think we added auroc and val_macro_acc in r1.10, right?
Yes, hence I want to compare with it and test.
Done~
@nithinraok So I compared this PR with main and r1.10 (val acc, since val acc macro is not implemented there), and I didn't reinstall anything, just switched branches. Not much difference in time per epoch for the first 5 epochs, actually. I do observe a weird 16:36 run above; let me try it again. Besides that, there is no time reduction from the fix, possibly because our validation data is not large enough. But I think it's okay to merge it, to fix things and remove auroc for now.
I think it didn't run what you intended, since r1.10 runs with PTL 1.6 and throws errors with PTL 1.7. So you might need to reinstall for each test.
What kind of error do we expect to see for r1.10 with PTL 1.7? I'm on PTL 1.7.3 with r1.10 and it works fine.
It would be from the lightningmodule.trainer attribute. But I think you ran it correctly; the reverse (latest with PTL 1.6) shouldn't work.
@nithinraok I ran with PTL 1.7.3 for all of the above tests. Though I think I should run r1.10 with PTL 1.6, the run was fine with 1.7.3.
LGTM, thanks. Some more runs need to be done to find the cause of the increase in validation time.
@nithinraok I didn't see an increase in validation time, but I also didn't see the reduction expected from this commit.
LGTM, thanks! I found that using weighted cross-entropy loss at test time in speech_to_label.py is problematic. Let me patch it in another PR.
* Update label_models.py
  - Removed AUROC metric since it's slow and costs a lot of memory to store the logits and labels.
  - Fixed macro-accuracy metric usage for multiple validation manifests.
  - Refactored validation and test function since they share the same code but just different tags in display ("val_" vs. "test_")
* update
* update
* update
* update

Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>