batch_norm error in fit() at end of training epoch #1723

Closed
lcadalzo opened this issue May 30, 2022 · 2 comments · Fixed by #1883
Labels: bug
@lcadalzo
Collaborator

Describe the bug
At the end of this for loop, depending on batch_size and the first dimension of x_preprocessed, the final i_batch can end up containing a single sample. This happens whenever x_preprocessed.shape[0] % batch_size == 1. In my case, x_preprocessed has shape (39209, 48, 48, 3) and batch_size is 8, and 39209 % 8 == 1. When i_batch has batch size 1, PyTorch's batch normalization fails, specifically in this function, which is called here. The end result is an error that looks like this:

File "/opt/conda/lib/python3.8/site-packages/art/estimators/classification/pytorch.py", line 1115, in forward
    x = module_(x)
        │       └ tensor([[  7.4983,  33.5128,   3.2305,   0.0000,  60.0542,   0.0000,   0.0000,
        │                    0.0000,   0.0000,  11.4205,   0.000...
        └ BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
           │             │        └ {}
           │             └ (tensor([[  7.4983,  33.5128,   3.2305,   0.0000,  60.0542,   0.0000,   0.0000,
           │                          0.0000,   0.0000,  11.4205,   0.00...
           └ <bound method _BatchNorm.forward of BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 168, in forward
    return F.batch_norm(
           │ └ <function batch_norm at 0x7f1e6b6bc550>
           └ <module 'torch.nn.functional' from '/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py'>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2280, in batch_norm
    _verify_batch_size(input.size())
    │                  │     └ <method 'size' of 'torch._C._TensorBase' objects>
    │                  └ tensor([[  7.4983,  33.5128,   3.2305,   0.0000,  60.0542,   0.0000,   0.0000,
    │                               0.0000,   0.0000,  11.4205,   0.000...
    └ <function _verify_batch_size at 0x7f1e6b6bc4c0>
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2248, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
                                                                                                      └ torch.Size([1, 300])

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 300])

This doesn't appear to be strictly an ART bug, but rather an error arising from the interaction between ART and PyTorch.
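
For context, the underlying failure can be reproduced with plain PyTorch alone. Below is a minimal sketch; the feature size of 300 matches the BatchNorm1d layer in the traceback above, everything else is illustrative:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(300)   # same feature size as the layer in the traceback
bn.train()                 # fit() runs the model in training mode

out = bn(torch.randn(8, 300))  # batch of 8: works
out = bn(torch.randn(1, 300))  # batch of 1: raises
# ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 300])

In eval mode (bn.eval()) the single-sample batch would pass, since batch norm then uses the running statistics instead of per-batch statistics, which is why the problem only appears during training.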

To Reproduce
Call a PyTorch estimator's fit() method with a dataset and batch_size such that the number of elements in the dataset modulo batch_size equals 1. Below is a snippet that does so:

import numpy as np
from armory.baseline_models.pytorch import micronnet_gtsrb
from armory.scenarios.utils import to_categorical

model = micronnet_gtsrb.get_art_model({}, {})
num_batches = 5  # number of samples; 5 % 4 == 1, so the final batch holds a single sample
x = np.random.randn(num_batches, 48, 48, 3).astype(np.float32)
y = to_categorical(np.random.randint(10, size=num_batches)).astype(np.float32)
model.fit(x, y, batch_size=4, nb_epochs=1)

Notice that if you change num_batches to 6, for example, the error goes away (6 % 4 == 2, so no batch contains a single sample).

Using Armory run: set dataset "batch_size" in this config to 4 or 8 and run armory run <config>.

System information (please complete the following information):

  • OS: Ubuntu
  • Python version: 3.8.10
  • ART version: 1.10.1
  • PyTorch version: 1.10.2
@beat-buesser
Collaborator

Hi @lcadalzo, thank you very much for reporting this issue!

@beat-buesser beat-buesser removed this from Issues open in ART 1.11.0 Jun 29, 2022
@beat-buesser beat-buesser removed this from the ART 1.11.0 milestone Jun 29, 2022
@davidslater
Collaborator

One way to deal with this would be to add a drop_last kwarg to fit(), similar to what PyTorch DataLoaders do. Here is how it is defined in https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html:

        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: ``False``)
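
A minimal, self-contained sketch of that idea (the batch_indices helper and its signature are illustrative, not ART's actual API):

import numpy as np

def batch_indices(n_samples, batch_size, drop_last=False):
    # Mirrors the DataLoader semantics quoted above: with drop_last=True the
    # trailing incomplete batch is discarded, so a size-1 batch never reaches
    # BatchNorm while the model is in training mode.
    num_batch = n_samples // batch_size if drop_last else int(np.ceil(n_samples / batch_size))
    for m in range(num_batch):
        yield np.arange(m * batch_size, min((m + 1) * batch_size, n_samples))

# 39209 samples with batch_size=8 would otherwise end in a batch of size 1
print(len(list(batch_indices(39209, 8, drop_last=False))[-1]))  # 1
print(len(list(batch_indices(39209, 8, drop_last=True))[-1]))   # 8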

@beat-buesser beat-buesser added this to the ART 1.12.2 milestone Oct 18, 2022
@beat-buesser beat-buesser self-assigned this Oct 19, 2022
@beat-buesser beat-buesser linked a pull request Oct 19, 2022 that will close this issue
@beat-buesser beat-buesser added this to Issues open in ART 1.12.2 Oct 19, 2022
@beat-buesser beat-buesser moved this from Issues open to Issues in progress in ART 1.12.2 Oct 19, 2022
@beat-buesser beat-buesser moved this from Issues in progress to Issues closed in ART 1.12.2 Nov 10, 2022