@kuynzereb
Contributor

Currently, the progress bar doesn't show the total number of batches in test mode. This PR fixes that.

Collaborator

@Borda Borda left a comment

Please provide more information on how to reproduce the missing total number of iterations. Thanks!

@kuynzereb
Contributor Author

Yeah, here is a dummy example:

```python
from time import sleep
import torch
from torch.utils.data import DataLoader, Dataset

import pytorch_lightning as pl


class DummyDataset(Dataset):
    def __init__(self):
        super().__init__()

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        return torch.rand(1)


class CoolSystem(pl.LightningModule):
    def __init__(self):
        super(CoolSystem, self).__init__()

    def forward(self, x):
        return 0

    def training_step(self, batch, batch_nb):
        return {}

    def test_step(self, batch, batch_nb):
        sleep(1)
        return {}

    def test_end(self, outputs):
        return {}

    def configure_optimizers(self):
        return []

    @pl.data_loader
    def train_dataloader(self):
        return []

    @pl.data_loader
    def test_dataloader(self):
        return DataLoader(DummyDataset(), batch_size=1)

model = CoolSystem()
trainer = pl.Trainer(weights_summary=None, nb_sanity_val_steps=0)
trainer.test(model)
```

If you run this code on the current master, you get the following output:
3it [00:03, 1.00s/it]

With this PR, you get:
30%|██████████████▋ | 3/10 [00:03<00:08, 1.00s/it]
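The two outputs differ only in whether the bar knows its total. A minimal sketch of that rendering rule, assuming a hypothetical `render_progress` helper (this is an illustration, not tqdm's actual code or API), might look like this:

```python
from typing import Optional


def render_progress(n: int, total: Optional[int], width: int = 10) -> str:
    """Render a one-line progress summary (hypothetical helper, not tqdm's API).

    Without a total, only a running iteration count can be shown.
    With a total, a percentage, a bar, and an n/total counter become possible.
    """
    if total is None or total <= 0:
        # Unknown length: fall back to a plain counter such as "3it".
        return f"{n}it"
    done = min(n, total)
    percent = 100 * done // total
    filled = width * done // total
    bar = "#" * filled + " " * (width - filled)
    return f"{percent:3d}%|{bar}| {n}/{total}"


print(render_progress(3, None))  # counter only, like the output on master
print(render_progress(3, 10))    # percentage and n/total, like the output with this PR
```

The point of the sketch is that the fix only has to hand the bar the number of test batches; everything else in the display follows from that.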

Collaborator

@Borda Borda left a comment

tested and it looks good to me, @williamFalcon

@kuynzereb
Contributor Author

I have just realized that it is slightly more complicated. This PR only fixes the problem when .test(model) is called without .fit(). If you call .test() after .fit(model), the behavior is still wrong. For example, run the following code:

```python
from time import sleep
import torch
from torch.utils.data import DataLoader, Dataset

import pytorch_lightning as pl


class DummyDataset(Dataset):
    def __init__(self, n):
        super().__init__()
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.rand(10)


class CoolSystem(pl.LightningModule):
    def __init__(self):
        super(CoolSystem, self).__init__()
        self.layer = torch.nn.Linear(10, 10)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_nb):
        # REQUIRED
        sleep(1)
        return {'loss': torch.mean(self.forward(batch) ** 2)}

    def test_step(self, batch, batch_nb):
        # OPTIONAL
        sleep(1)
        return {}

    def test_end(self, outputs):
        # OPTIONAL
        return {}

    def configure_optimizers(self):
        # REQUIRED
        # can return multiple optimizers and learning_rate schedulers
        # (LBFGS it is automatically supported, no need for closure function)
        return [torch.optim.Adam(self.layer.parameters())]

    @pl.data_loader
    def train_dataloader(self):
        # REQUIRED
        return DataLoader(DummyDataset(10), batch_size=1)

    @pl.data_loader
    def test_dataloader(self):
        # OPTIONAL
        return DataLoader(DummyDataset(5), batch_size=1)

model = CoolSystem()
trainer = pl.Trainer(weights_summary=None, nb_sanity_val_steps=0, early_stop_callback=False,
                     check_val_every_n_epoch=100, max_nb_epochs=1)
trainer.fit(model)
trainer.test()
```

It ends with:

15it [00:15,  1.01s/it, batch_nb=9, epoch=0, loss=0.107, v_nb=26] 

We can reset the progress bar in that case too, and it will then show the correct total number of iterations. But the testing progress bar will still show the old postfixes from training. So it seems we should actually distinguish between train_progress_bar and test_progress_bar. In that sense, this seems related to #420.
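To make the stale-postfix issue concrete, here is a minimal sketch that models a progress bar with a hypothetical `Bar` class. It is a stand-in written for illustration, not tqdm's API, and it assumes (as described above) that resetting a bar restarts the count and total but keeps whatever postfix was last set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Bar:
    """Minimal stand-in for a progress bar (hypothetical, not tqdm's API)."""
    total: Optional[int] = None
    n: int = 0
    postfix: Dict[str, object] = field(default_factory=dict)

    def reset(self, total: Optional[int] = None) -> None:
        # Assumption: a reset restarts the count and total but keeps the postfix.
        self.n = 0
        self.total = total

    def update(self, **postfix: object) -> None:
        # Advance by one iteration and merge any new postfix values.
        self.n += 1
        self.postfix.update(postfix)

    def summary(self) -> str:
        extras = ", ".join(f"{k}={v}" for k, v in self.postfix.items())
        count = f"{self.n}it" if self.total is None else f"{self.n}/{self.total}"
        return f"{count} [{extras}]" if extras else count


def run_shared_bar() -> str:
    """Train, then test, reusing one bar that is reset in between."""
    bar = Bar(total=10)
    for batch_nb in range(10):
        bar.update(batch_nb=batch_nb, epoch=0)
    bar.reset(total=5)
    for _ in range(5):
        bar.update()
    return bar.summary()


def run_separate_bars() -> str:
    """Train, then test, with a dedicated bar for each phase."""
    train_bar = Bar(total=10)
    for batch_nb in range(10):
        train_bar.update(batch_nb=batch_nb, epoch=0)
    test_bar = Bar(total=5)
    for _ in range(5):
        test_bar.update()
    return test_bar.summary()


print(run_shared_bar())     # the test summary still carries training postfixes
print(run_separate_bars())  # the test summary has no training postfixes
```

In this model the shared bar gets the right total after the reset but keeps `batch_nb` and `epoch` from training, which is the argument for separate train and test bars.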

@Borda
Collaborator

Borda commented Oct 25, 2019

Maybe think about moving from tqdm to enlighten:
https://github.com/Rockhopper-Technologies/enlighten
The advantage is that the progress bar is not affected by messages printed in the meantime.
Also see: https://pydigger.com/keyword/bar

@williamFalcon
Contributor

Let's keep tqdm for now. We can consider this in a separate PR.

@williamFalcon
Contributor

williamFalcon commented Oct 25, 2019

should we just have the following bar setup?
train bar
val bar
test bar

each stacked on top of the others, depending on what's happening?

@kuynzereb
Contributor Author

Yes, that sounds good. I like your idea that the main train bar should show the total number of batches (train + val) and that the validation bar just pops up as an additional bar. I would just point out that the test bar seems to be totally independent of the main train bar.
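As a rough sketch of how the totals for such a setup could be computed, assuming a hypothetical `plan_bar_totals` helper with made-up parameter names (none of this is taken from the Lightning code base):

```python
from typing import Dict


def plan_bar_totals(
    nb_train_batches: int,
    nb_val_batches: int,
    nb_test_batches: int,
    run_val: bool = True,
) -> Dict[str, int]:
    """Return the total for each bar (hypothetical helper for illustration).

    The main train bar counts train + val batches, the val bar counts only
    val batches, and the test bar is independent of both.
    """
    val_total = nb_val_batches if run_val else 0
    return {
        "train": nb_train_batches + val_total,
        "val": val_total,
        "test": nb_test_batches,
    }


print(plan_bar_totals(10, 3, 5))
print(plan_bar_totals(10, 3, 5, run_val=False))
```

The `run_val` flag is an assumption meant to cover epochs in which validation is skipped, for example when validation only runs every few epochs.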

@williamFalcon williamFalcon merged commit f79bdf2 into Lightning-AI:master Oct 30, 2019
@williamFalcon
Contributor

@kuynzereb thanks! want to do a PR for splitting the bars?

@kuynzereb
Contributor Author

Yeah, I can give it a try!

@kuynzereb kuynzereb mentioned this pull request Nov 1, 2019