
Validation loader flat loss #59

Closed
Dewald928 opened this issue Aug 4, 2020 · 5 comments
Dewald928 commented Aug 4, 2020

I copied your example notebook to Colab and ran the code without changing anything, but the validation loss I get goes flat, which is clearly wrong when compared to your example. I see the same thing with my other networks: the loss just goes flat.

You can see my results from colab and your example in the figures below.

EDIT: If I replace val_iter with val_loader inside loss = self._validate(...), it does seem to "work" as I'd expect. So there seems to be a mistake somewhere in how val_iter is iterated.

(Figures: my Colab run vs. your example notebook)
NaleRaphael commented Aug 4, 2020

@Dewald928 Thanks for reporting this.

After a quick run of that notebook ("examples/lrfinder_mnist.ipynb"), I found that range_test() with val_loader runs just as quickly as it does without val_loader, which is clearly abnormal. I'll keep investigating.

UPDATE: Currently, it seems there is something wrong in commit 52c189a.

NaleRaphael commented:
OK, I've figured out why range_test() with a val_loader runs as quickly as it does without one.

In range_test(), the following loop works normally only on the first iteration:

# @LRFinder.range_test()
# ...
for iteration in tqdm(range(num_iter)):
    # Train on batch and retrieve loss
    loss = self._train_batch(
        train_iter,
        accumulation_steps,
        non_blocking_transfer=non_blocking_transfer,
    )
    if val_loader:
        loss = self._validate(
            val_iter, non_blocking_transfer=non_blocking_transfer
        )
# ...
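As a side note, the pitfall boils down to plain Python iterator semantics: once an iterator is exhausted, looping over it again simply executes zero iterations. A minimal sketch, with no torch involved; `validate` below is a hypothetical stand-in for `LRFinder._validate()`, not the library's actual code:

```python
# Hypothetical minimal reproduction of the pitfall, using a plain
# Python iterator in place of a real DataLoader iterator.
def validate(batches):
    # Mirrors the shape of a validation pass: accumulate a running
    # loss over the batches, starting from a default of 0.0.
    running_loss = 0.0
    count = 0
    for loss in batches:  # yields nothing if `batches` is already exhausted
        running_loss += loss
        count += 1
    return running_loss / count if count else running_loss

val_iter = iter([0.9, 0.8, 0.7])   # stands in for val_iter._iterator

first = validate(val_iter)   # consumes all values, returns their mean
second = validate(val_iter)  # iterator exhausted, returns the default 0.0
```

Here `first` is the real mean loss, while `second` is 0.0 because the `for` loop body never runs on the exhausted iterator.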

However, val_iter._iterator has already run out of values after that first iteration, and it is never reset on the following iterations. Hence self._validate() does nothing and just returns its default output, 0.0 (the initial value of running_loss in that method). The loss returned by self._train_batch() is therefore overwritten with 0.0, and it is then re-calculated by the following code:

# Track the best loss and smooth it if smooth_f is specified
if iteration == 0:
    self.best_loss = loss
else:
    if smooth_f > 0:
        loss = smooth_f * loss + (1 - smooth_f) * self.history["loss"][-1]
    if loss < self.best_loss:
        self.best_loss = loss

And that's why the lr-loss curve goes flat, as in the result you provided.
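The flatness itself is easy to reproduce: once the reported loss is stuck at 0.0, the smoothing step just multiplies the previous history entry by `1 - smooth_f` on every iteration. A sketch with an assumed initial loss of 2.3 and the default-looking `smooth_f = 0.05` (both values hypothetical):

```python
# Sketch of why the curve looks flat: after iteration 0 the validation
# loss is always 0.0, and smoothing turns that into a slow geometric
# decay of the initial value instead of a real loss curve.
smooth_f = 0.05
history = [2.3]  # real loss from iteration 0 (assumed value)
for _ in range(1, 100):
    loss = 0.0  # _validate() returns its default on every later iteration
    loss = smooth_f * loss + (1 - smooth_f) * history[-1]
    history.append(loss)
```

Each step only multiplies the previous value by 0.95, so on a plot the curve drifts down very gradually rather than tracking the true loss.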

I'll make a patch for it later.

NaleRaphael added a commit to NaleRaphael/pytorch-lr-finder that referenced this issue Aug 4, 2020
MAINT: make `ValDataLoaderIter` able to be reset only when it is acquired by the syntax of normal `iterator`

In `LRFinder.range_test()`, `val_iter` isn't reset after it runs out of values, which makes `LRFinder._validate()` fail to work correctly after the first iteration of `range_test()`.

To fix it, we add a counter that tracks how many times a `ValDataLoaderIter` has run (i.e. how many times `__next__()` has been called), and reset it only when its `__iter__()` is called, so that it isn't reset automatically the way `TrainDataLoaderIter` is.

See also davidtvs#59 and the docstring of `ValDataLoaderIter` for further
details.
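A minimal sketch of the idea described in the commit message, using a plain list in place of a real `DataLoader`; the class and attribute names here are assumed from the commit text, not copied from the library:

```python
# Sketch: an iterator wrapper that is reset only via __iter__(), so a
# `for batch in val_iter` loop gets a fresh pass each time, while bare
# next(val_iter) calls never trigger an automatic reset.
class ValDataLoaderIter:
    def __init__(self, data_loader):
        self.data_loader = data_loader
        self._iterator = iter(data_loader)
        self.run_counter = 0  # counts __next__() calls since the last reset

    def __iter__(self):
        # Reset only when acquired by normal iterator syntax,
        # e.g. `for batch in val_iter: ...`
        if self.run_counter > 0:
            self._iterator = iter(self.data_loader)
            self.run_counter = 0
        return self

    def __next__(self):
        self.run_counter += 1
        return next(self._iterator)

val_iter = ValDataLoaderIter([1, 2, 3])
first = list(val_iter)   # consumes everything
second = list(val_iter)  # __iter__() resets it, so a full pass again
```

With this, every `_validate()`-style `for` loop over the wrapper sees the whole dataset again, instead of an exhausted iterator.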
davidtvs pushed a commit that referenced this issue Aug 20, 2020
MAINT: make `ValDataLoaderIter` able to be reset only when it is acquired by the syntax of normal `iterator` (#60)

* MAINT: make `ValDataLoaderIter` able to be reset only when it is
acquired by the syntax of normal `iterator`

In `LRFinder.range_test()`, `val_iter` isn't reset after it runs out
of values, which makes `LRFinder._validate()` fail to work correctly
after the first iteration of `range_test()`.

To fix it, we add a counter that tracks how many times a
`ValDataLoaderIter` has run (i.e. how many times `__next__()` has been
called), and reset it only when its `__iter__()` is called, so that it
isn't reset automatically the way `TrainDataLoaderIter` is.

See also #59 and the docstring of `ValDataLoaderIter` for further
details.

* TST: add tests for `TrainDataLoaderIter` and `ValDataLoaderIter`

* MAINT: remove redundant argument in `ValDataLoaderIter.__init__()`

* TST: add tests to check valid types of `train_loader` and `val_loader`
used in `range_test()`

* TST: explicitly specify the message of exception we expect to get
ivanpanshin commented:

Hey, man. Any progress on this issue?

davidtvs (Owner) commented:

@NaleRaphael fixed the issue in #60, which has been merged to the master branch.

I'm closing this issue. Thanks for reporting it, and feel free to reopen if needed.

ivanpanshin commented:

Yeah, that was my bad, I didn't see those commits. I updated the package, and now evaluation with the validation loader takes much longer to finish, so I guess it works now.
