BatchNorm1d throws exception during eval with batch size of 1 #500
Comments
@FusionCarcass -- I'll take a look at it. If you have a smaller repro case, that would be great, otherwise I'll try to come up with one myself. Could you check whether having a channel dimension of 1 makes any difference? BatchNorm1d is supposed to take either (N,L) or (N,C,L). |
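To spell out the shape check being asked for, here is a minimal TorchSharp sketch (the shapes are illustrative, and the module is left in its default training mode so the failure can be observed):

using static TorchSharp.torch;
using static TorchSharp.torch.nn;

var bn = BatchNorm1d(28);                // starts out in training mode
var ok2d = bn.forward(randn(16, 28));    // (N, C) with N > 1: fine
var ok3d = bn.forward(randn(1, 28, 5));  // (N, C, L): 5 values per channel, fine
var boom = bn.forward(randn(1, 28));     // (1, C): one value per channel, throws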
@FusionCarcass -- here's what I'm seeing in Python:

bn1 = torch.nn.BatchNorm1d(28)
p = bn1(torch.randn(16,28))
p = bn1(torch.randn(1,28))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Miniconda3\lib\site-packages\torch\nn\modules\batchnorm.py", line 168, in forward
    return F.batch_norm(
  File "D:\Miniconda3\lib\site-packages\torch\nn\functional.py", line 2280, in batch_norm
    _verify_batch_size(input.size())
  File "D:\Miniconda3\lib\site-packages\torch\nn\functional.py", line 2248, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 28])

That seems to be the same error you're seeing with TorchSharp. It only fails if the C dimension is missing. There's no error for this in either Python or .NET:

bn1 = torch.nn.BatchNorm1d(3)
p = bn1(torch.randn(1,3,28)) |
BatchNorm1d should work with a tensor of shape (1, L) when set to eval mode. Training would require (N, L) tensors where N > 1. This code works for me in Python with input tensors of shape (1, C, L), which get flattened to (1, L) before the batch norm, where the exception occurs.

class InceptionResNetV4(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            Stem(1),
            InceptionResNetA(384),
            InceptionResNetA(384),
            InceptionResNetA(384),
            InceptionResNetA(384),
            InceptionResNetA(384),
            nn.AdaptiveMaxPool1d(10),
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(3840, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(),
            nn.Linear(1024, 2)
        )

    def forward(self, x):
        output = self.stack(x)
        print(output.shape)
        return output

I export my model and load it into the following C# model:

public class InceptionResNetV4 : Module {
    private readonly Sequential stack;

    public InceptionResNetV4() : base(string.Empty) {
        this.stack = Sequential(
            new Stem(1),
            new InceptionResNetA(384),
            new InceptionResNetA(384),
            new InceptionResNetA(384),
            new InceptionResNetA(384),
            new InceptionResNetA(384),
            AdaptiveMaxPool1d(10),
            Flatten(),
            Dropout(probability: 0.5f),
            Linear(3840, 1024),
            BatchNorm1d(1024),
            ReLU(),
            Linear(1024, 2)
        );
        this.RegisterComponents();
    }

    public override torch.Tensor forward(torch.Tensor t) {
        return this.stack.forward(t);
    }
} |
I modified your example to make it work in Python. Calling bn1.eval() switches BatchNorm1d to its running statistics instead of per-batch statistics, so a batch size of 1 is fine:

import torch
import torch.nn as nn
import torch.nn.functional as F

bn1 = torch.nn.BatchNorm1d(28)
bn1.eval()
p = bn1(torch.randn(16,28))
p = bn1(torch.randn(1,28)) |
Thanks! |
Seems like the bug is in Sequential. This blows up in the second block, not the first:

using (var pool = BatchNorm1d(28)) {
    pool.Eval();
    pool.forward(torch.ones(1, 28));
}

using (var pool = BatchNorm1d(28))
using (var seq = Sequential(pool)) {
    seq.Eval();
    seq.forward(torch.ones(1, 28));
} |
Eval() does not seem to be propagated properly to all submodules, whether in a Sequential or a custom module. I have a fix, and I'm going to push it together with the fix for #499, which is a big one. |
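For reference, the usual shape of such a fix -- not necessarily the actual patch -- is for a module's mode switch to recurse into its registered children, assuming a writable training flag and a named_children() enumerator as in current TorchSharp:

// Sketch of the propagation pattern (not the actual TorchSharp patch).
public virtual void Eval()
{
    training = false;                        // this module now uses running stats
    foreach (var (_, child) in named_children())
        child.Eval();                        // recurse so Sequential's children follow
}

public virtual void Train()
{
    training = true;
    foreach (var (_, child) in named_children())
        child.Train();
}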
I hope we'll see the latest fixes on NuGet soon :) |
It's coming... :-) |
Allow register_parameter to take a null tensor.
@241721, @FusionCarcass -- in case you didn't see, 0.96.0 was just released on NuGet with the fix for this bug in it. |
I saw that! Thank you so much :-) Great job |
I have a model that uses BatchNorm1d after a Linear layer that throws the "Expected more than 1 value per channel when training" error shown in the traceback above when using a batch size of 1 during eval. The same operation works fine in Python with a batch size of 1 during eval. I believe the correct behavior here should be to allow batch sizes of 1 during eval mode.
Troubleshooting
I have verified that all BatchNorm1d modules are set to eval prior to prediction by running the following command:
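A check along these lines would do it (a sketch, not necessarily the original command; named_modules() and the training flag are assumed from the current TorchSharp API):

// With the bug present, the inner BatchNorm1d still reports "train" after Eval().
model.Eval();
foreach (var (name, module) in model.named_modules())
    Console.WriteLine($"{name}: {(module.training ? "train" : "eval")}");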
Theories
I am training in Python, then exporting the model to a .dat file and loading it with the module.load method. I verified that the mean and var parameters of the BatchNorm1d modules are loaded successfully. I thought maybe the mean or var would be null, which might trigger a batching requirement during eval.
I think the issue is either related to loading from a .dat file, OR to something not getting passed down to the native torch libraries correctly.
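The load path in question looks roughly like this (a sketch; the file name and input shape are placeholders):

// Load the state exported from the Python training run, then predict on a single sample.
var model = new InceptionResNetV4();
model.load("model.dat");
model.Eval();                                 // must propagate down to the inner BatchNorm1d
using var input = torch.randn(1, 1, 4096);    // (N=1, C=1, L) placeholder shape
var prediction = model.forward(input);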