
MnistDatasetFetcher shuffles on reset causing unexpected behavior #6299

Closed
GuutBoy opened this issue Aug 28, 2018 · 4 comments · Fixed by #6302

Comments

@GuutBoy commented Aug 28, 2018

If the shuffle parameter is set on MnistDatasetFetcher, it shuffles the dataset on every call to reset(). Below is the relevant code from MnistDatasetFetcher:

@Override
public void reset() {
  cursor = 0;
  curr = null;
  if (shuffle)
    MathUtils.shuffleArray(order, rng);
}

This appears to be a bug. In particular, it seems to cause an MnistDataSetIterator constructed with numExamples < 60000 to iterate over a new random subset of the data on each call to reset(). The documentation is not explicit on this point, but it does not seem to be the intended behavior.

An example of this is demonstrated by the code below:

import java.io.IOException;
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;

public class MnistDatasetExperiments {

  public static void main(String[] args) throws IOException {
    MnistDataSetIterator it = new MnistDataSetIterator(1, 1, false, false, true, 0);
    DataSet data1 = it.next();
    it.reset();
    DataSet data2 = it.next();
    if (data1.get(0).getFeatures().equals(data2.get(0).getFeatures())) {
      System.out.println("Success 😃");
    } else {
      System.out.println("Failure 😭");
    }
  }
}

Here we construct an iterator over a single example with a batch size of 1. I would expect this iterator to return the same example on the first call to next() and on the call to next() after reset().
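One way to make reset() repeatable is sketched below in plain Java. This is a hypothetical illustration, not the actual patch merged in #6302: the idea is to restrict the shuffle to the first numExamples slots of the order array, so a reset re-orders the same subset instead of drawing a new one. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Random;

public class SubsetStableShuffle {

    /**
     * Shuffle only order[0..numExamples), leaving the remaining entries
     * untouched. A reset() that calls this re-orders the subset without
     * swapping new examples into it.
     */
    public static void shuffleSubset(int[] order, int numExamples, Random rng) {
        // Standard Fisher-Yates, restricted to the first numExamples slots
        for (int i = numExamples - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
    }

    /** Sorted copy of the first numExamples entries, for comparing membership. */
    public static int[] subsetMembers(int[] order, int numExamples) {
        int[] subset = Arrays.copyOf(order, numExamples);
        Arrays.sort(subset);
        return subset;
    }
}
```

With this variant, subsetMembers(order, numExamples) is identical before and after every reset, which is what the reproduction program above expects.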

@AlexDBlack AlexDBlack self-assigned this Aug 29, 2018

@Charele commented Aug 29, 2018

Why must we get the same dataset?
I think it isn't necessary.

AlexDBlack added a commit that referenced this issue Aug 29, 2018
@AlexDBlack (Contributor) commented Aug 29, 2018

Thanks for reporting.
Fixed here: #6302

@GuutBoy (Author) commented Aug 29, 2018

@Charele It is a design choice, and of course you could go with the current behavior. Either way, the actual behavior should be clearly documented to avoid confusion.

In my case I wanted to train a classifier for a number of epochs on a limited set of examples. To do this I used an MnistDataSetIterator with numExamples set to, say, 500, and called fit on a MultiLayerNetwork with the iterator as the argument. I would expect this to train the classifier on a fixed data set of 500 examples but, because of this issue, the classifier is trained on a new random subset each epoch. This was very confusing, because I ended up with a very good classifier even when training on a very small data set.
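A workaround for this use case can be sketched in plain Java: draw the subset once up front, cache it, and reshuffle only the cached copy each epoch, so every epoch trains on the same examples. This is illustrative only; CachedSubset and its methods are invented for the sketch and are not DL4J API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class CachedSubset<T> {

    private final List<T> subset;

    /** Draw numExamples items from the full data once, up front. */
    public CachedSubset(List<T> fullData, int numExamples, Random rng) {
        List<T> copy = new ArrayList<>(fullData);
        Collections.shuffle(copy, rng);
        this.subset = new ArrayList<>(copy.subList(0, numExamples));
    }

    /**
     * A freshly shuffled view of the cached subset for one epoch.
     * The members never change across epochs; only their order does.
     */
    public List<T> epochView(Random rng) {
        List<T> view = new ArrayList<>(subset);
        Collections.shuffle(view, rng);
        return view;
    }
}
```

Training loops would then call epochView once per epoch, instead of relying on the iterator's reset() behavior.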

AlexDBlack added a commit that referenced this issue Aug 30, 2018
DL4J: Misc fixes (#6302)
* Another pass on javadoc link formatting

* #6299 Mnist iterator subset shuffling repeatability

* #6128 fix StackVertex output type

* #6101 DataVec ObjectDetectionRecordReader image center validation

* #6280 validate and throw exception for invalid loss/activation combinations

* Cleanup and fix tests given new validation

* Another round of javadoc link fixes

* Re-enable some now passing tests

* Tweak arbiter max candidates condition to exclude queued candidates

* Small final test fix
sshepel added a commit that referenced this issue Aug 30, 2018
DL4J: Misc fixes (#6302)
@lock commented Sep 29, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Sep 29, 2018

3 participants