Cannot determine batch_size from a list of string while running range_test() with val_loader #57

Closed
NaleRaphael opened this issue Jul 21, 2020 · 7 comments

@NaleRaphael (Contributor) commented on Jul 21, 2020:

Hey @davidtvs, I found this issue while writing an example of using this package with huggingface/transformers for #55.

Condition

  • Input data: list of strings (the Dataset returns strings)
  • Running range_test() with val_loader (see the sketch below)
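
A sketch that reproduces it (StringDataset and ToyTextModel are toy names made up for this illustration):

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch_lr_finder import LRFinder

class StringDataset(Dataset):
    # Returns (str, label); the default collate_fn batches the strings into a list of str
    def __init__(self):
        self.texts = ["hello world", "foo bar baz"] * 100
        self.labels = torch.randint(0, 2, (len(self.texts),))

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx], self.labels[idx]

class ToyTextModel(nn.Module):
    # Consumes raw strings in forward(), as tokenize-inside-the-model pipelines do
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1, 2)

    def forward(self, texts):
        # Encode each string by its length, just to keep the sketch tiny
        x = torch.tensor([[float(len(t))] for t in texts])
        return self.fc(x)

model = ToyTextModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
train_loader = DataLoader(StringDataset(), batch_size=8)
valid_loader = DataLoader(StringDataset(), batch_size=8)

lr_finder = LRFinder(model, optimizer, criterion)
# Fails in _validate() with: AttributeError: 'str' object has no attribute 'size'
lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5,
                     end_lr=10, num_iter=100, step_mode="linear")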

Error message

---> 10 lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear')

1 frames

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    288             if val_loader:
    289                 loss = self._validate(
--> 290                     val_iter, non_blocking_transfer=non_blocking_transfer
    291                 )
    292 

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in _validate(self, val_iter, non_blocking_transfer)
    398 
    399                 if isinstance(inputs, tuple) or isinstance(inputs, list):
--> 400                     batch_size = inputs[0].size(0)
    401                 else:
    402                     batch_size = inputs.size(0)

AttributeError: 'str' object has no attribute 'size'

Description

In the current implementation, batch_size is determined dynamically from the shape of inputs in LRFinder._validate(). (v0.2.0) L399-L402 work correctly only when the given inputs is a torch.Tensor, which is why it fails when inputs is a list of strings.

It's probably unusual for a Dataset to return non-torch.Tensor values, but I think it would be easier to read the batch size from DataLoader.batch_size, since LRFinder._validate() iterates over a val_loader anyway.

Hence I proposed a fix for this in that notebook: simply add the line batch_size = val_iter.data_loader.batch_size before entering the loop and remove the if-else statement. You can check it out here.

However, I'm still debating whether to instead add a batch_size property to DataLoaderIter, e.g.

class DataLoaderIter(object):
    # ...
    @property
    def batch_size(self):
        return self.data_loader.batch_size

With this property, the proposed fix can be simplified a little:

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # Set model to evaluation mode and disable gradient computation
        running_loss = 0
        self.model.eval()

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * val_iter.batch_size

        return running_loss / len(val_iter.dataset)

What do you think of it?

@davidtvs (Owner) commented:

The current code is written that way so that it can handle a final batch that doesn't have the same batch size. Your suggestion is cleaner and works if we force drop_last=True for the validation data loader. Ideally, we would not force drop_last=True and would still support datasets that return objects without a size method. I googled a bit and couldn't find a way that isn't overcomplicated.
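
For reference, this is the edge case (made-up sizes, just for illustration):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10, 3))    # 10 samples
loader = DataLoader(dataset, batch_size=4)     # drop_last=False by default

print([batch[0].size(0) for batch in loader])  # [4, 4, 2]: the last batch is smaller
print(loader.batch_size)                       # 4: weighting by this over-counts the last batch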

I'll think about this a bit more and come back tomorrow or in the next few days. But if we can't find a reasonable way of having both, then we should make this change and document that drop_last=True is required for the validation data loader.

@davidtvs (Owner) commented:

After some experimenting, replacing size with len works better. Like this:

if isinstance(inputs, tuple) or isinstance(inputs, list):
    batch_size = len(inputs[0])
else:
    batch_size = len(inputs)

but I'm not sure whether this would fail for some other type of dataset.
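
For illustration, with made-up shapes, len covers both branches:

import torch

inputs = (torch.randn(8, 16), torch.randn(8, 1))  # tuple/list case
print(len(inputs[0]))                             # 8, same as inputs[0].size(0)

inputs = torch.randn(8, 3)                        # plain tensor case
print(len(inputs))                                # 8, equivalent to inputs.size(0)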

@NaleRaphael (Contributor, Author) commented on Jul 23, 2020:

You are right; I forgot that drop_last is False by default in DataLoader.

I was thinking about making some changes in DataLoaderIter to get a more general solution, e.g.

class DataLoaderIter(object):
    def get_current_batch_size(self, batch):
        # Users can override this according to their dataset
        return batch.size(0)

# So that we can get batch size dynamically
class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # ...
        batch_size = val_iter.get_current_batch_size(batch)
        # ...

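For example, a user whose validation batches are lists of raw strings could then override it like this (a hypothetical subclass, just for illustration):

class StringBatchLoaderIter(DataLoaderIter):
    def get_current_batch_size(self, batch):
        # `batch` is a list of str here, so its length is the batch size
        return len(batch)
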
But I suddenly got an idea: the size of each batch is already accessible from the outputs or loss (update: it should be labels, because outputs can still be a non-tensor object and the shape of loss depends on the reduction argument of the loss function).

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # ...

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * labels.size(0)

        # ...

It seems like an easier solution for this issue. 🤔

@davidtvs (Owner) commented:

Ya, that looks like a nice solution. From what I can see, all loss functions expect the label/target to be a tensor. We could also replace size with len to be even safer.
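
That is, the accumulation line would become something like this (sketch):

running_loss += loss.item() * len(labels)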

Either way, you can create a PR with this change.

@NaleRaphael (Contributor, Author) commented:

Great, I'll create a PR for it later.

@davidtvs (Owner) commented:

Merged #58. Thanks @NaleRaphael for raising the issue and fixing it.

@NaleRaphael (Contributor, Author) commented:

It has been my pleasure to help. 😎
