Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added forced_even to BatchIterator ensure equal batch size #11

Merged
merged 1 commit into from
Dec 29, 2014

Conversation

cancan101
Copy link
Contributor

dnouri added a commit that referenced this pull request Dec 29, 2014
Added forced_even to BatchIterator ensure equal batch size
@dnouri dnouri merged commit 3221f17 into dnouri:master Dec 29, 2014
@dnouri
Copy link
Owner

dnouri commented Dec 29, 2014

Thanks!

@BenjaminBossan
Copy link
Collaborator

Nice. I just wonder whether it may cause some problems if the model tacitly ignores some samples (since they should always be the same samples) and the user not noticing it.

And the trouble with the StratifiedKFold still remains, right?

@cancan101
Copy link
Contributor Author

@BenjaminBossan what is the issue with StratifiedKFold?

@cancan101
Copy link
Contributor Author

Pylearn2 address then issue with the missing samples: "Batches of size unequal to batch_size will be discarded. Those examples will never be visited."

One way to deal with this is to randomize before each new iteration over the data set.

@dnouri
Copy link
Owner

dnouri commented Dec 29, 2014

Yes, shuffling before each new epoch would fix that.

@BenjaminBossan
Copy link
Collaborator

The issue with StratifiedKFold is mentioned here:
dnouri/kfkd-tutorial#1
Basically, for classification, StratifiedKFold is used, which generates unpredictable sample sizes.

And yes, shuffling would fix the issue above, but one would have to make sure that the samples set aside for evaluation are not used for training.

@cancan101
Copy link
Contributor Author

@BenjaminBossan see also #12

@dnouri
Copy link
Owner

dnouri commented Dec 31, 2014

@BenjaminBossan I don't think there's an issue with StratifiedKFold, since the batch iterator will only ever see those samples that the KFold yielded, and it will simply discard the last n_samples mod batch_size samples.

Shuffling is also not an issue. The batch iterator could simply shuffle X and y as the first thing in __iter__. Again, it will only ever see the training set (if self.test == False), so there's no chance it will accidentally mix validation set and training set. (It should not shuffle if self.test == True, unless you want to get predictions out in a different order. ;-)

@cancan101 I think I've nevertheless found a problem with this patch. It's that net.predict will return predictions that have a different size than len(X) unless that's divisible by batch_size without remainder. Certainly not what one would expect.

@cancan101
Copy link
Contributor Author

@dnouri What do you mean about net.predict?

@dnouri
Copy link
Owner

dnouri commented Dec 31, 2014

Take a look at the code. If forced_even is used, then it's not going to return predictions for all examples if len(X) % batch_size != 0.

@cancan101
Copy link
Contributor Author

It seems like that code will just raise an exception if forced_even is not
used and len(X) % batch_size != 0.

On Wed Dec 31 2014 at 9:25:24 AM Daniel Nouri notifications@github.com
wrote:

Take a look at the code
https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne.py#L235.
If forced_even is used, then it's not going to return predictions for all
examples if len(X) % batch_size != 0.


Reply to this email directly or view it on GitHub
#11 (comment).

@dnouri
Copy link
Owner

dnouri commented Dec 31, 2014

EDIT: Ah, if it's not used. Well, yes, but that's only a problem for some conv layer implementations, right?
@cancan101 What makes you think so?

@cancan101
Copy link
Contributor Author

Won't it hit the issue in #8? I can give it a test.

@dnouri
Copy link
Owner

dnouri commented Dec 31, 2014

I certainly wouldn't expect it to return predictions for only some of the samples that I passed in, if I set forced_even = True, regardless of anything else.

@cancan101
Copy link
Contributor Author

I suppose one option would be to pad the data out to the correct length (perhaps when test=True) rather than dropping data and then only use the correct number of rows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enhancement: convnet ValueError: 'total size of new array must be unchanged'
3 participants