
Incorrect number of batches when multiple test loaders are used and test_percent_check is specified #1899

Closed
binshengliu opened this issue May 20, 2020 · 5 comments · Fixed by #1920 or #2226

@binshengliu commented May 20, 2020

🐛 Bug

When there are multiple test dataloaders and test_percent_check is specified, the estimated total number of batches is incorrect and the progress bar doesn't display properly.

For example, when I specify two dataloaders of 100 batches each and test_percent_check=0.1, the expected total is 200 * 0.1 = 20 batches, but 40 batches are actually run.

At this line, num_batches is the global number of batches and will be assigned to self.num_test_batches. https://github.com/PyTorchLightning/pytorch-lightning/blob/3459a546672303204a4ae6efcc2613a90f003903/pytorch_lightning/trainer/data_loading.py#L243

In the evaluation loop, however, max_batches is treated as the number of batches for a single dataloader.
https://github.com/PyTorchLightning/pytorch-lightning/blob/3459a546672303204a4ae6efcc2613a90f003903/pytorch_lightning/trainer/evaluation_loop.py#L262
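
Here is a small standalone sketch of the mismatch (illustrative only; the variable names mirror the ones above, but this is not the actual Lightning source). With two loaders of 100 batches each and test_percent_check=0.1, a global cap applied per loader doubles the work:

```python
# Illustrative sketch of the bug, not the actual Lightning code.
num_loaders = 2
batches_per_loader = 100
test_percent_check = 0.1

# data_loading.py computes a single global cap ...
num_test_batches = int(num_loaders * batches_per_loader * test_percent_check)  # 20

# ... but evaluation_loop.py applies that cap to *each* dataloader:
total_run = 0
for dataloader_idx in range(num_loaders):
    for batch_idx in range(batches_per_loader):
        if batch_idx >= num_test_batches:  # global cap used as a per-loader cap
            break
        total_run += 1

print(total_run)  # 40, twice the expected 20
```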

To Reproduce

Steps to reproduce the behavior:

  1. Return multiple dataloaders from test_dataloaders()
  2. Specify test_percent_check.
  3. Run trainer.test()
  4. Observe expected_batches * num_loaders batches being run (see the sketch below). The progress bar also stops updating after expected_batches, since the count exceeds its specified total steps.
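
A minimal sketch of these steps, assuming the pytorch-lightning 0.7.6 API reported below (where test_percent_check is a Trainer argument); the model and data are placeholders:

```python
# Minimal reproduction sketch, assuming the pytorch-lightning 0.7.6 API
# (test_percent_check was renamed in later releases). Model/data are dummies.
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def test_step(self, batch, batch_idx, dataloader_idx=0):
        x, y = batch
        return {"test_loss": torch.nn.functional.cross_entropy(self(x), y)}

    def test_dataloader(self):
        # Two loaders with 100 batches each (batch_size defaults to 1).
        ds = TensorDataset(torch.randn(100, 32), torch.randint(0, 2, (100,)))
        return [DataLoader(ds), DataLoader(ds)]

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


trainer = pl.Trainer(test_percent_check=0.1)
trainer.test(Model())
# Expected: 200 * 0.1 = 20 batches in total; observed: 40 (the 20-batch cap
# is applied to each loader), and the progress bar stalls at 20/20.
```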

Expected behavior

The correct number of batches is run (20 in the example above).

Environment

  • CUDA:
    • GPU:
    • available: False
    • version: 10.2
  • Packages:
    • numpy: 1.18.4
    • pyTorch_debug: False
    • pyTorch_version: 1.5.0
    • pytorch-lightning: 0.7.6
    • tensorboard: 2.2.0
    • tqdm: 4.45.0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor:
    • python: 3.7.6
    • version: #1 SMP Debian 4.19.118-2 (2020-04-29)
@binshengliu added the bug (Something isn't working) and help wanted (Open to be worked on) labels on May 20, 2020
@awaelchli (Member) commented May 20, 2020

Just had a look at this. The problem is in the trainer as you say, not the progress bar.
There are two loops: the outer iterates over the dataloaders and the inner loop runs through each one.

So max_batches should be the number of batches to run in each dataloader, not in total.

We can easily fix this.
There should really be a test. There seems to be no test that checks that *_percent_check uses the correct amount of data; we should definitely add these tests.

@awaelchli self-assigned this May 20, 2020
@rohitgr7 (Contributor) commented

The same thing might be happening with val_dataloaders. I suggest max_batches should be a list.

@awaelchli (Member) commented

That's true, yes, I agree, because they could have different lengths.
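
A sketch of what that could look like (illustrative names, not the actual patch): compute one cap per dataloader and keep them as a list, so loaders of different lengths are truncated independently:

```python
# Sketch of the suggested fix (illustrative, not the actual patch):
# one cap per dataloader, kept as a list.
import torch
from torch.utils.data import DataLoader, TensorDataset

test_percent_check = 0.1
loaders = [
    DataLoader(TensorDataset(torch.randn(100, 4))),  # 100 batches (batch_size=1)
    DataLoader(TensorDataset(torch.randn(50, 4))),   # 50 batches, different length
]

# A list of per-loader caps instead of a single global number:
num_test_batches = [int(len(dl) * test_percent_check) for dl in loaders]  # [10, 5]

total_run = 0
for max_batches, dataloader in zip(num_test_batches, loaders):
    for batch_idx, batch in enumerate(dataloader):
        if batch_idx >= max_batches:
            break
        total_run += 1  # the evaluation step would run here

print(total_run)  # 15 == sum(num_test_batches)
```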

@rohitgr7 (Contributor) commented

@awaelchli Is anyone working on this, or should I submit a PR? I need this fixed for debugging and testing in a personal project.

@awaelchli (Member) commented

If you like, that would help us a lot :)
I could help with the tests if you need :)
