
DL4J: Cifar iterator can crash (array workspace leak?) #6834

Closed
AlexDBlack opened this issue Dec 11, 2018 · 5 comments

@AlexDBlack
Contributor

commented Dec 11, 2018

To reproduce: run the following on CUDA (possibly on CPU too; not confirmed):

import org.datavec.image.loader.CifarLoader;
import org.deeplearning4j.datasets.iterator.AsyncDataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.CifarDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class CifarRepro {

    private static int height = 32;
    private static int width = 32;
    private static int channels = 3;
    private static int batchSize = 48;
    private static boolean preProcessCifar = true;  //use Zagoruyko's preprocess for Cifar

    public static void main(String[] args) throws Exception {
        CifarDataSetIterator cifar = new CifarDataSetIterator(batchSize, CifarLoader.NUM_TRAIN_IMAGES,
            new int[] {height, width, channels}, preProcessCifar, true);

        //This is fine:
//        int iter = 0;
//        while (cifar.hasNext()) {
//            DataSet ds = cifar.next();
//            System.out.println(iter++);
//        }

        //This is not:
        DataSetIterator async = new AsyncDataSetIterator(cifar);
        int iter = 0;
        while (async.hasNext()) {
            DataSet ds = async.next();
            System.out.println(iter++);
        }
    }
}

Workaround: wrap the iterator in AsyncShieldDataSetIterator instead of AsyncDataSetIterator.

@crockpotveggies

commented Dec 11, 2018

To add to this: when running with AsyncShieldDataSetIterator, there's another error:

Exception in thread "main" java.lang.IllegalArgumentException: invalid example number: must be 0 to 9999, got 10000
	at org.nd4j.linalg.dataset.DataSet.get(DataSet.java:649)
	at org.datavec.image.loader.CifarLoader.next(CifarLoader.java:429)
	at org.deeplearning4j.datasets.iterator.impl.CifarDataSetIterator.next(CifarDataSetIterator.java:141)
	at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:440)
	at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:84)
	at org.deeplearning4j.datasets.iterator.AsyncShieldDataSetIterator.next(AsyncShieldDataSetIterator.java:166)
	at org.deeplearning4j.datasets.iterator.AsyncShieldDataSetIterator.next(AsyncShieldDataSetIterator.java:35)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1571)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1521)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1508)
	at org.deeplearning4j.examples.convolution.Cifar.main(Cifar.java:88)
@AlexDBlack

Contributor Author

commented Dec 11, 2018

As noted in Slack: that's a separate bug in the CIFAR iterator.
It seems to load batches of 10,000 examples into memory (5 batches total, I think?),
but then doesn't account for those splits when it tries to fetch a subset of what it has in memory.
Overall, the CIFAR iterator looks like it needs some work/optimization and proper tests...
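The split-accounting problem described above can be illustrated with a standalone sketch (hypothetical names; the actual CifarLoader code differs — CIFAR-10's training set ships as 5 files of 10,000 examples each, and a single global cursor must be mapped to a per-file offset before indexing the batch currently in memory):

```java
// Hypothetical sketch of the indexing bug. An iterator that keeps one global
// cursor over all 50,000 training examples, but indexes directly into the
// 10,000-example file currently loaded, fails as soon as the cursor hits
// 10000 ("invalid example number: must be 0 to 9999, got 10000").
public class CifarIndexSketch {
    static final int EXAMPLES_PER_FILE = 10_000;

    // Correct mapping: which of the 5 files the global index falls in.
    static int fileIndex(int globalIndex) {
        return globalIndex / EXAMPLES_PER_FILE;
    }

    // Correct mapping: offset within the in-memory file.
    static int localIndex(int globalIndex) {
        return globalIndex % EXAMPLES_PER_FILE;
    }

    public static void main(String[] args) {
        int global = 10_000; // first example of the second file
        // Buggy behavior: using the raw global cursor against the current batch.
        System.out.println("buggy index = " + global);
        // Fixed behavior: account for the file split first.
        System.out.println("file = " + fileIndex(global)
            + ", local = " + localIndex(global));
    }
}
```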

@AlexDBlack

Contributor Author

commented Dec 11, 2018

After discussion: we might just remove this CIFAR-10 iterator.
However, we may add a thin CIFAR-10 iterator that downloads the dataset in PNG format and uses ImageRecordReader internally...

@crockpotveggies


commented Dec 17, 2018

@lock


commented Jan 16, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Jan 16, 2019
