Using trainer with Batch Size leads to Conv Model shape errors in Training User Guide #93
Comments
Thanks for raising this @nilsleh - if you are using your local version of …

This is for …

And this for …

The shapes for the masked objects are: …
Thanks @nilsleh! Those shapes all check out. The bug is caused by one of the context sets having >1 dimensions. Here's an MWE which produces the same error: …
@wesselb, do you remember when we found that calling … ?

N.B. Our training unit test includes batching and is passing, but it only tests a 1D context set. Once we patch this bug we should add a test with an N-D context set.
@tom-andersson Ah, I don't quite recall precisely what that problem was. :( Any chance you could post a small example of the repeated density channel issue here?
Hey @wesselb, I created an MWE in pure …

My hypothesis is that it's something to do with applying a numpy NaN mask after merging the context sets into …

I'll dig into this.
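The hypothesized failure mode can be illustrated in plain numpy (the array and names here are illustrative, not DeepSensor's actual internals): boolean NaN masking always returns a flattened 1-D result, so a multi-channel context array silently loses its shape, while a 1-D context set happens to survive unchanged — which would also explain why a 1-D-only unit test keeps passing.

```python
import numpy as np

# A 2-D context array: 3 channels x 4 observations, with one NaN.
context = np.arange(12, dtype=float).reshape(3, 4)
context[1, 2] = np.nan

# Naive NaN removal with a boolean mask flattens the array:
# the (3, 4) channel structure is lost entirely.
flat = context[~np.isnan(context)]
print(flat.shape)  # (11,) -- 1-D, no longer (3, 4)

# For a 1-D context set the same operation happens to preserve
# the rank, so a 1-D-only test would not catch the bug.
context_1d = np.array([0.0, 1.0, np.nan, 3.0])
print(context_1d[~np.isnan(context_1d)].shape)  # (3,)
```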
Yeah, found it. It was slightly esoteric, but it was an array shape bug in the way NaNs were being removed from the …

Fixed in v0.3.5 on PyPI. Thanks for catching this @nilsleh, and thanks @wesselb for helping me realise the bug was on the …
Ah, I'm glad to hear that you managed to find the bug, @tom-andersson! :)
Description
I am running the new User Guide Training Notebook to better understand the details of Conv training. I downloaded the Jupyter notebook and by default it runs fine. However, I wanted to run training with a defined batch size and therefore added a `batch_size` argument to the trainer. But if `batch_size > 1`, like 4 in this example, I get a shape error in a neuralprocess conv layer:

…

I find it quite counterintuitive that changing the batch size argument would have such an effect, because I didn't find another place where I would have to change the batch size in the code or adapt anything. So `batch_size=1` works, but `batch_size>1` leads to an error.

These are the shape outputs from before entering the decoder with `batch_size = 1`:

…

and with `batch_size = 4`:

…

Thus, it seems like the sampled `z` from the `Dirac` `pz` output of the encoder is causing the difference, but I haven't got further into debugging yet as to why changing the batch size would have that effect.
Reproduction steps
1. Download the Training Notebook
2. In the training loop, give a batch size argument to the trainer: `batch_losses = trainer(train_tasks, batch_size=4)`
3. See error
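The "works at `batch_size=1`, breaks at `batch_size=4`" behaviour reported above is typical of code that implicitly assumes a singleton batch dimension somewhere in a reshape. A minimal, library-free numpy sketch of that pattern (purely illustrative; not DeepSensor's actual code):

```python
import numpy as np

def buggy_merge(z: np.ndarray) -> np.ndarray:
    """Pretend encoder post-processing that reshapes assuming batch size 1.

    Correct for (1, C, N); silently mangles the shape for (B > 1, C, N),
    so downstream conv layers see a mismatched input. Hypothetical example.
    """
    c = z.shape[1]
    return z.reshape(c, -1)  # only valid when z.shape[0] == 1

z1 = np.zeros((1, 8, 16))  # batch size 1
z4 = np.zeros((4, 8, 16))  # batch size 4

print(buggy_merge(z1).shape)  # (8, 16) -- fine for batch size 1
print(buggy_merge(z4).shape)  # (8, 64) -- wrong: batch folded into the wrong axis
```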
Version
0.3.4
OS
Linux