🚀 Feature
An option on the Trainer (or maybe the logger?) called `ignore_minor_warnings`, or, even better, the inverted form `runtime_linting_warnings` or something like that.
Then, in cases where a PL warning is more of a suggestion for something to check, it would only be shown at runtime if `ignore_minor_warnings=False` or `runtime_linting_warnings=True`.
For example, #7734 introduced a warning on `log_every_n_steps` that triggers with the Trainer's default arguments whenever you have < 50 batches. The suggestion would be to make the code at https://github.com/PyTorchLightning/pytorch-lightning/blob/44f62435c8d5860b542a1cb2cca66afd9e75cbcc/pytorch_lightning/trainer/data_loading.py#L306-L311 become
```python
if self.logger and self.logger.runtime_linting_warnings and self.num_training_batches < self.log_every_n_steps:
    rank_zero_warn(
        f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
        f" Trainer(log_every_n_steps={self.log_every_n_steps}). Set a lower value for log_every_n_steps if"
        f" you want to see logs for the training epoch."
    )
```
etc.
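More generally, here is a minimal, self-contained sketch of the same gating idea. The flag name `runtime_linting_warnings` and the `_linting_warn` helper are hypothetical, and plain `warnings.warn` stands in for PL's `rank_zero_warn` so the snippet runs on its own:

```python
import warnings


def _linting_warn(message: str, enabled: bool) -> None:
    """Emit a suggestion-style ("linting") warning only if the user opted in."""
    if enabled:
        warnings.warn(message, UserWarning)


# With the proposed flag off, e.g. Trainer(runtime_linting_warnings=False),
# suggestions like the log_every_n_steps one above would simply never print.
_linting_warn(
    "The number of training samples is smaller than the logging interval.",
    enabled=False,
)
```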
There are a number of these in PL that shouldn't be shown every time someone runs the model. The ones that cause me the most grief:
- It is perfectly fine to have < 50 batches with the default arguments, so https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/data_loading.py#L265-L270 is confusing and, in that case, incorrect.
- Having > 0 workers in the DataLoader actually slows things down considerably in many cases, so https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/data_loading.py#L106-L112 is not correct or helpful.
- Using a GPU often slows things down on my computer, so https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L1533-L1537 is not helpful.
Those are the only ones I have run into that are clearly a "linting" sort of situation, but I suspect there are many others. Just to be clear, I don't think this should be a global way to turn off warnings - just a way to turn off warnings that have a high probability of being a false positive.
Motivation
Every time I run my code I see the following:
```
C:\Users\jesse\anaconda3\lib\site-packages\pytorch_lightning\trainer\data_loading.py:110: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
C:\Users\jesse\anaconda3\lib\site-packages\pytorch_lightning\trainer\data_loading.py:393: UserWarning: The number of training samples (8) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
C:\Users\jesse\anaconda3\lib\site-packages\pytorch_lightning\trainer\data_loading.py:110: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
C:\Users\jesse\anaconda3\lib\site-packages\pytorch_lightning\trainer\data_loading.py:110: UserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
```
and even more if I have a GPU accessible on my machine but don't want to use it (as is frequently the case). I am also concerned that more of these seem to be creeping into PL, so the list may just keep getting longer. The signal-to-noise ratio of the warnings is poor here because I have no idea whether something is a legitimate problem or not.
Pitch
To me there are two types of warnings involved here: (1) a "true warning", which is something you would want to know about every time the code runs, rather than something shown once so I can decide whether it is relevant; and (2) something more like "linting" done at runtime: tips and tricks that may or may not be helpful depending on the exact model. The latter are more likely to produce false positives than true warnings, but the biggest difference is that they call for a one-time decision: "Look at this warning, decide if it is relevant to your setup, then never look at it again." That is why I am calling this more of a linting situation.
I think it is great that PL has more of this "linting" style of check, since it probably helps in some cases, but it becomes annoyance and noise if it always shows, even after I have made a decision on its relevance.
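One way the distinction could be made actionable, as a sketch only (`LightningLintWarning` is a hypothetical name, not an existing PL class): if the "linting"-style suggestions were raised under their own warning category, the one-time decision could be expressed with a single standard category filter instead of per-line hacks.

```python
import warnings


class LightningLintWarning(UserWarning):
    """Hypothetical category for runtime 'linting' suggestions."""


# User side: one decision, made once; it survives PL upgrades because it does
# not depend on file paths or line numbers.
warnings.filterwarnings("ignore", category=LightningLintWarning)

# Framework side (sketch): suggestion-style warnings use the dedicated
# category and are silenced by the filter above ...
warnings.warn("Consider increasing num_workers.", LightningLintWarning)

# ... while true warnings keep their usual category and still show up.
warnings.warn("Checkpoint directory exists and is not empty.", UserWarning)
```

The filter only matches the dedicated category (and its subclasses), so true warnings raised as plain `UserWarning` still show.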
Alternatives
Alternative one is just to ignore the warnings. This is tricky because it masks real warnings, and new users would have a lot of difficulty figuring out whether the issues are relevant or not. What generally happens is that the user tries the suggestion (e.g. `num_workers=4`) to see if it speeds things up; if it doesn't, they would rather never see the warning again.
The second alternative is for the user to manually turn off the warnings with a hack based on the `warnings` module in their code.
For example, I used to have the following code at the top of my file:
```python
import warnings

warnings.filterwarnings(
    "ignore",
    category=UserWarning,
    module="pytorch_lightning.trainer.data_loading",
    lineno=102,
)
```
to ignore one of these.
The problems with this are very simple:
- It is highly version specific: `lineno=102` may only apply to a particular point release of PL. Even if I pin the PL version in my `requirements.txt` file, any time things are upgraded I need to go through the PL source and update the line numbers; forget about supporting multiple versions of PL. And if I forget to change things, there is a chance that the lineno now points to a totally different warning (as the warnings are usually grouped together), which might be a legitimate one I do want to see.
- Users have to maintain this themselves, which is a lot of work. A message-based filter (sketched below) is somewhat less brittle, but it does not remove that burden.
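For completeness, here is what that slightly less brittle variant could look like, under the assumption that PL's warning wording stays stable across releases. The `message` argument of `warnings.filterwarnings` is a regex matched against the start of the warning text, so the filter no longer depends on line numbers:

```python
import warnings

# Filter on the message text instead of the source line; this survives
# line-number churn between releases but still has to track PL's exact wording.
warnings.filterwarnings(
    "ignore",
    message=r"The dataloader, .*, does not have many workers",
    category=UserWarning,
)
warnings.filterwarnings(
    "ignore",
    message=r"The number of training samples \(\d+\) is smaller than the logging interval",
    category=UserWarning,
)
```

Either way, the user ends up curating a list of fragile filters by hand, which is exactly the maintenance burden the proposed option would avoid.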
cc @Borda