Added forced_even to BatchIterator ensure equal batch size #11

cancan101 · 2014-12-29T03:30:09Z

Closes #8
Closes #1

Naming inspired by https://github.com/lisa-lab/pylearn2/blob/0e26c340d2e607dc5190c8ee68a2dc471d45e1af/pylearn2/utils/iteration.py#L177

dnouri · 2014-12-29T11:11:27Z

Thanks!

BenjaminBossan · 2014-12-29T19:54:42Z

Nice. I just wonder whether it may cause problems if the model tacitly ignores some samples (they would always be the same samples) without the user noticing.

And the trouble with the StratifiedKFold still remains, right?

cancan101 · 2014-12-29T20:16:58Z

@BenjaminBossan what is the issue with StratifiedKFold?

cancan101 · 2014-12-29T20:19:08Z

Pylearn2 addresses the issue with the missing samples: "Batches of size unequal to batch_size will be discarded. Those examples will never be visited."

One way to deal with this is to randomize before each new iteration over the data set.

dnouri · 2014-12-29T21:03:09Z

Yes, shuffling before each new epoch would fix that.

BenjaminBossan · 2014-12-30T19:42:20Z

The issue with StratifiedKFold is mentioned here:
dnouri/kfkd-tutorial#1
Basically, for classification, StratifiedKFold is used, which generates unpredictable sample sizes.

And yes, shuffling would fix the issue above, but one would have to make sure that the samples set aside for evaluation are not used for training.

cancan101 · 2014-12-30T20:41:25Z

@BenjaminBossan see also #12

dnouri · 2014-12-31T09:52:28Z

@BenjaminBossan I don't think there's an issue with StratifiedKFold, since the batch iterator will only ever see those samples that the KFold yielded, and it will simply discard the last n_samples mod batch_size samples.
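To make the discarding behaviour concrete, here is a minimal sketch of an iterator that only yields full batches. The function name `iter_even_batches` is hypothetical and this is not nolearn's actual `BatchIterator` code; it only illustrates that the trailing `n_samples mod batch_size` samples are never yielded.

```python
import numpy as np


def iter_even_batches(X, y, batch_size):
    """Yield only full batches (hypothetical helper, not nolearn's code).

    The trailing len(X) % batch_size rows are silently dropped.
    """
    n_full = len(X) // batch_size
    for i in range(n_full):
        sl = slice(i * batch_size, (i + 1) * batch_size)
        yield X[sl], y[sl]


# 10 samples with batch_size=4: two full batches, the last 2 samples are dropped.
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
batches = list(iter_even_batches(X, y, batch_size=4))
n_seen = sum(len(xb) for xb, _ in batches)
n_dropped = len(X) - n_seen
```

Whatever subset a cross-validation split hands to the iterator, the same arithmetic applies to that subset only.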

Shuffling is also not an issue. The batch iterator could simply shuffle X and y as the first thing in __iter__. Again, it will only ever see the training set (if self.test == False), so there's no chance it will accidentally mix validation set and training set. (It should not shuffle if self.test == True, unless you want to get predictions out in a different order. ;-)
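A rough sketch of that idea, assuming an iterator object with a `test` flag and a `batch_size`, follows. The class name `ShufflingBatchIterator` and the `seed` parameter are made up for illustration and are not part of nolearn; the sketch shuffles indices at the start of `__iter__` when training and keeps the original order when `test` is true.

```python
import numpy as np


class ShufflingBatchIterator:
    """Hypothetical iterator for illustration; not nolearn's BatchIterator."""

    def __init__(self, batch_size, test=False, seed=None):
        self.batch_size = batch_size
        self.test = test
        self.rng = np.random.RandomState(seed)

    def __call__(self, X, y=None):
        self.X, self.y = X, y
        return self

    def __iter__(self):
        indices = np.arange(len(self.X))
        if not self.test:
            # Reshuffle on every pass so a different remainder is dropped each epoch.
            self.rng.shuffle(indices)
        n_full = len(indices) // self.batch_size
        for i in range(n_full):
            batch = indices[i * self.batch_size:(i + 1) * self.batch_size]
            yb = None if self.y is None else self.y[batch]
            # X and y are indexed with the same indices, so pairs stay aligned.
            yield self.X[batch], yb
```

Because only the data passed into the iterator is shuffled, a validation set that was split off beforehand is never touched.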

@cancan101 I think I've nevertheless found a problem with this patch. It's that net.predict will return predictions that have a different size than len(X) unless that's divisible by batch_size without remainder. Certainly not what one would expect.

cancan101 · 2014-12-31T14:21:46Z

@dnouri What do you mean about net.predict?

dnouri · 2014-12-31T14:25:21Z

Take a look at the code. If forced_even is used, then it's not going to return predictions for all examples if len(X) % batch_size != 0.

cancan101 · 2014-12-31T15:25:31Z

It seems like that code will just raise an exception if forced_even is not used and len(X) % batch_size != 0.

On Wed Dec 31 2014 at 9:25:24 AM Daniel Nouri wrote:

> Take a look at the code https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne.py#L235. If forced_even is used, then it's not going to return predictions for all examples if len(X) % batch_size != 0.

dnouri · 2014-12-31T15:28:47Z

EDIT: Ah, if it's not used. Well, yes, but that's only a problem for some conv layer implementations, right?
~~@cancan101 What makes you think so?~~

cancan101 · 2014-12-31T15:29:26Z

Won't it hit the issue in #8? I can give it a test.

dnouri · 2014-12-31T15:31:37Z

I certainly wouldn't expect it to return predictions for only some of the samples that I passed in, if I set forced_even = True, regardless of anything else.

cancan101 · 2014-12-31T16:08:34Z

I suppose one option would be to pad the data out to the correct length (perhaps when test=True) rather than dropping data and then only use the correct number of rows.
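The padding option could look roughly like the sketch below. It assumes a hypothetical `predict_batch` callable that only accepts batches of exactly `batch_size` rows; `predict_padded` is likewise an illustrative name, not an existing nolearn function. The input is padded by repeating the last row, and the padded predictions are trimmed off so the output length equals `len(X)`.

```python
import numpy as np


def predict_padded(predict_batch, X, batch_size):
    """Pad X to a multiple of batch_size, predict, then trim (illustrative sketch).

    Assumes len(X) > 0 and that predict_batch returns one row per input row.
    """
    n = len(X)
    remainder = n % batch_size
    if remainder:
        pad = batch_size - remainder
        # Repeat the last row as filler; its predictions are discarded below.
        X = np.concatenate([X, np.repeat(X[-1:], pad, axis=0)])
    outputs = [
        predict_batch(X[i:i + batch_size])
        for i in range(0, len(X), batch_size)
    ]
    return np.concatenate(outputs)[:n]


def strict_predict(xb):
    """Stand-in for a model that requires a fixed batch size of 4."""
    assert len(xb) == 4
    return xb.sum(axis=1)


X = np.arange(20).reshape(10, 2)
preds = predict_padded(strict_predict, X, batch_size=4)
```

With this approach the caller gets one prediction per input row, in the original order, while the model still only ever sees full batches.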

Added forced_even to BatchIterator ensure equal batch size

d7690ab

dnouri added a commit that referenced this pull request Dec 29, 2014

Merge pull request #11 from cancan101/forced_even

3221f17

dnouri merged commit 3221f17 into dnouri:master Dec 29, 2014

dnouri mentioned this pull request Dec 29, 2014

Running net2 without a GPU dnouri/kfkd-tutorial#1

Closed

cancan101 mentioned this pull request Dec 29, 2014

Ability to Shuffle Data Before Each Epoch #13

Closed
