exception when changing the number of dataset using BalancedPathFilter #7738

Closed
forever1078 opened this issue May 15, 2019 · 11 comments

@forever1078
commented May 15, 2019

I want to control the dataset size using BalancedPathFilter(new Random(seed), labelMaker, 20000, numLabels, 6000). The total dataset size is 60000, but I only want to use 20000 samples for training.
However, I get the following exception:
java.lang.RuntimeException: java.lang.UnsupportedOperationException: Cannot do conversion to one hot using batched reader: 10 output classes, but array.size(1) is 4 (must be equal to 1 or numClasses = 10)
at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:430)
Caused by: java.lang.UnsupportedOperationException: Cannot do conversion to one hot using batched reader: 10 output classes, but array.size(1) is 4 (must be equal to 1 or numClasses = 10)
at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertWritablesBatched(RecordReaderMultiDataSetIterator.java:413)
at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertFeaturesOrLabels(RecordReaderMultiDataSetIterator.java:359)
at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.nextMultiDataSet(RecordReaderMultiDataSetIterator.java:332)
at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.next(RecordReaderMultiDataSetIterator.java:212)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:364)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:439)
at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:84)
at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:404)

How can I control the dataset size for training?

@forever1078 changed the title from "exception when changing the number of dataset using" to "exception when changing the number of dataset using BalancedPathFilter" on May 15, 2019

@AlexDBlack

Contributor

commented May 15, 2019

Please show the code you have used.

@forever1078

Author

commented May 15, 2019

It is just the MnistClassifier example:

 val trainData = File(rootDir + "mnist_png/training")
 val randNumGen = Random(1234)
 val trainSplit = FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, Random(1234))
 val labelMaker = ParentPathLabelGenerator() // use parent directory name as the image label
 val pathFilter = BalancedPathFilter(Random(42),labelMaker,20000,10,6000)
 val split = trainSplit.sample(pathFilter,1.0)
 val trainRR = ImageRecordReader(28, 28, 1, labelMaker)

 trainRR.initialize(split[0])
 samplesUsedInTraining = split[0].length().toInt()

 val trainIter = RecordReaderDataSetIterator(trainRR, 54, 1, 10)

 // pixel values from 0-255 to 0-1 (min-max scaling)
 val imageScaler = ImagePreProcessingScaler()
 imageScaler.fit(trainIter)
 trainIter.preProcessor = imageScaler
 model!!.fit(trainIter)

The errors are:

java.lang.RuntimeException: java.lang.UnsupportedOperationException: Cannot do conversion to one hot using batched reader: 10 output classes, but array.size(1) is 4 (must be equal to 1 or numClasses = 10)
        at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:430)
     Caused by: java.lang.UnsupportedOperationException: Cannot do conversion to one hot using batched reader: 10 output classes, but array.size(1) is 4 (must be equal to 1 or numClasses = 10)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertWritablesBatched(RecordReaderMultiDataSetIterator.java:413)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertFeaturesOrLabels(RecordReaderMultiDataSetIterator.java:359)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.nextMultiDataSet(RecordReaderMultiDataSetIterator.java:332)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.next(RecordReaderMultiDataSetIterator.java:212)
        at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:364)
        at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:439)
        at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:84)
        at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:404)

When I print trainRR.numLabels(), it is less than 10. How can I solve this?

@Charele

commented May 15, 2019

Hmm, I think there's a problem in the RandomPathFilter.filter() method.

@saudet

Member

commented May 16, 2019

@Charele Good idea!

@forever1078 Does RandomPathFilter work or are you having issues only with BalancedPathFilter?

@forever1078

Author

commented May 16, 2019

RandomPathFilter has the same issue:

 val trainData = File(rootDir + "mnist_png/training")
 val randNumGen = Random(1234)
 val trainSplit = FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, Random(1234))
 val labelMaker = ParentPathLabelGenerator() // use parent directory name as the image label
 val pathFilter = RandomPathFilter(Random(42),labelMaker,20000)
 val split = trainSplit.sample(pathFilter,1.0)
 val trainRR = ImageRecordReader(28, 28, 1, labelMaker)

 trainRR.initialize(split[0])
 samplesUsedInTraining = split[0].length().toInt()

 val trainIter = RecordReaderDataSetIterator(trainRR, 54, 1, 10)

 // pixel values from 0-255 to 0-1 (min-max scaling)
 val imageScaler = ImagePreProcessingScaler()
 imageScaler.fit(trainIter)
 trainIter.preProcessor = imageScaler
 model!!.fit(trainIter)

java.lang.RuntimeException: java.lang.UnsupportedOperationException: Cannot do conversion to one hot using batched reader: 10 output classes, but array.size(1) is 4 (must be equal to 1 or numClasses = 10)
        at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:430)
     Caused by: java.lang.UnsupportedOperationException: Cannot do conversion to one hot using batched reader: 10 output classes, but array.size(1) is 4 (must be equal to 1 or numClasses = 10)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertWritablesBatched(RecordReaderMultiDataSetIterator.java:413)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertFeaturesOrLabels(RecordReaderMultiDataSetIterator.java:359)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.nextMultiDataSet(RecordReaderMultiDataSetIterator.java:332)
        at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.next(RecordReaderMultiDataSetIterator.java:212)
        at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:364)
        at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:439)
        at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:84)
        at org.deeplearning4j.datasets.iterator.AsyncDataSetIterator$AsyncPrefetchThread.run(AsyncDataSetIterator.java:404)

@saudet

Member

commented May 16, 2019

It sounds like your dataset only has 4 classes, not 10. Make sure you have 10 classes in there.
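
A quick way to confirm this before building the iterator is to count the files per label in the selected subset. The following is only a minimal sketch in plain Java: the class name `LabelCheck`, the helper `countPerLabel`, and the sample paths are hypothetical, and it assumes (as ParentPathLabelGenerator does) that the parent directory name is the label.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LabelCheck {
    /** Counts files per label, taking the parent directory name as the label. */
    static Map<String, Integer> countPerLabel(List<String> paths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String p : paths) {
            Path parent = Paths.get(p).getParent();
            String label = parent.getFileName().toString();
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample; real code would pass the locations of the sampled split.
        List<String> sample = Arrays.asList(
                "mnist_png/training/0/1.png",
                "mnist_png/training/0/21.png",
                "mnist_png/training/1/3.png",
                "mnist_png/training/3/7.png");
        int expectedClasses = 10; // the numClasses value given to the iterator in the report
        Map<String, Integer> counts = countPerLabel(sample);
        System.out.println(counts);
        if (counts.size() != expectedClasses) {
            System.out.println("Only " + counts.size() + " of " + expectedClasses + " labels present");
        }
    }
}
```

If the number of distinct labels is smaller than the class count passed to the iterator, the one-hot conversion has fewer label columns than expected, which matches the exception in the report.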

@forever1078

Author

commented May 16, 2019

@saudet That's true. I want to use only part of the MNIST dataset, so I tried BalancedPathFilter and RandomPathFilter to select a subset. However, the selected samples contain only 4 classes.
How can I select a part of the dataset that contains all classes?
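
One way to get a subset that covers every class, independent of the filter classes, is to sample per label. This is a sketch under stated assumptions rather than a DataVec API: `StratifiedSubset` and `sample` are hypothetical names, labels are taken from the parent directory name, and the file list in `main` is a stand-in for the real dataset.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class StratifiedSubset {
    /** Returns up to perLabel randomly chosen paths for every label (parent directory name). */
    static List<String> sample(List<String> paths, int perLabel, Random rng) {
        // Group paths by label; TreeMap keeps the iteration order deterministic.
        Map<String, List<String>> byLabel = new TreeMap<>();
        for (String p : paths) {
            String label = Paths.get(p).getParent().getFileName().toString();
            byLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(p);
        }
        List<String> picked = new ArrayList<>();
        for (List<String> group : byLabel.values()) {
            Collections.shuffle(group, rng); // random choice within the label
            picked.addAll(group.subList(0, Math.min(perLabel, group.size())));
        }
        Collections.shuffle(picked, rng); // mix the labels so batches are not grouped by class
        return picked;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the file list: 10 label directories with 50 files each.
        List<String> all = new ArrayList<>();
        for (int label = 0; label < 10; label++) {
            for (int i = 0; i < 50; i++) {
                all.add("mnist_png/training/" + label + "/" + i + ".png");
            }
        }
        List<String> subset = sample(all, 20, new Random(42));
        System.out.println(subset.size() + " files selected");
    }
}
```

For the case in this thread, 20000 samples over 10 classes would correspond to `perLabel = 2000`.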

@saudet saudet self-assigned this May 18, 2019

@saudet

Member

commented May 18, 2019

I see. I think that could be fixed by shuffling the paths before filtering instead of after in filter():
https://github.com/deeplearning4j/deeplearning4j/blob/master/datavec/datavec-api/src/main/java/org/datavec/api/io/filters/RandomPathFilter.java#L66
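
The effect of the ordering can be illustrated with a small simulation. This is not the library's code: `ShuffleOrder` and its methods are hypothetical, and the sketch assumes the paths reach the filter grouped by label directory. It uses the per-digit image counts of the MNIST training set, so the first 20000 grouped entries only span digits 0 to 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

class ShuffleOrder {
    // Per-digit image counts of the MNIST training set (digits 0..9, 60000 in total).
    static final int[] MNIST_COUNTS = {5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949};

    /** One label entry per image, grouped by label (assumed input order). */
    static List<Integer> groupedLabels() {
        List<Integer> labels = new ArrayList<>();
        for (int digit = 0; digit < MNIST_COUNTS.length; digit++) {
            for (int i = 0; i < MNIST_COUNTS[digit]; i++) {
                labels.add(digit);
            }
        }
        return labels;
    }

    /** Cuts the list to maxPaths first and shuffles afterwards. */
    static List<Integer> truncateThenShuffle(List<Integer> labels, int maxPaths, Random rng) {
        List<Integer> kept = new ArrayList<>(labels.subList(0, Math.min(maxPaths, labels.size())));
        Collections.shuffle(kept, rng);
        return kept;
    }

    /** Shuffles the whole list first and cuts to maxPaths afterwards. */
    static List<Integer> shuffleThenTruncate(List<Integer> labels, int maxPaths, Random rng) {
        List<Integer> shuffled = new ArrayList<>(labels);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, Math.min(maxPaths, shuffled.size())));
    }

    static int distinctLabels(List<Integer> labels) {
        return new HashSet<>(labels).size();
    }

    public static void main(String[] args) {
        List<Integer> all = groupedLabels();
        System.out.println("truncate, then shuffle: "
                + distinctLabels(truncateThenShuffle(all, 20000, new Random(42))) + " classes");
        System.out.println("shuffle, then truncate: "
                + distinctLabels(shuffleThenTruncate(all, 20000, new Random(42))) + " classes");
    }
}
```

Under that ordering assumption, truncating first keeps only 4 classes, the same number reported in the exception, while shuffling first draws from all of them.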

@Charele

commented May 20, 2019

@saudet Yes, I think so. Maybe we should shuffle before applying the "maxPaths" rule.

But I found another problem in BalancedPathFilter.filter():
I can't get the expected number of paths.

@Charele

commented May 20, 2019

val trainSplit =
new FileSplit(new File("c:\\mnist_png\\training"), NativeImageLoader.ALLOWED_FORMATS, new Random())

val pathFilter = new BalancedPathFilter(new Random(), null, null);

val splits = trainSplit.sample(pathFilter, 1.0)

val trainRR =
new ImageRecordReader(28, 28, 1, new ParentPathLabelGenerator())

trainRR.initialize(splits(0))

val trainIter =
new RecordReaderDataSetIterator(trainRR, 1, 1, 10)

println("I get: " + trainIter.size)

I think I should get the full dataset, which is 60000 samples, but I don't:

o.n.n.Nd4jBlas - Number of threads used for BLAS: 4
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 7]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [8]; Memory: [3.5GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]
I get: 54210

@saudet

Member

commented May 21, 2019

@Charele That's normal for BalancedPathFilter; use RandomPathFilter instead.
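
The figure of 54210 is consistent with a balanced selection: if every label is capped at the size of the smallest one, the result is the smallest class count times the number of classes. The sketch below only checks that arithmetic against the MNIST training set counts; `BalancedCount` and `balancedSize` are hypothetical names, and the capping rule is an assumption about the filter's behaviour rather than a quote of its implementation.

```java
import java.util.Arrays;

class BalancedCount {
    // Per-digit image counts of the MNIST training set (digits 0..9, 60000 in total).
    static final int[] MNIST_COUNTS = {5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949};

    /** Size of a subset that keeps the same number of files for every label. */
    static int balancedSize(int[] countsPerLabel) {
        int smallest = Arrays.stream(countsPerLabel).min().orElse(0);
        return smallest * countsPerLabel.length;
    }

    public static void main(String[] args) {
        // The smallest class (digit 5) has 5421 images, so 10 * 5421 = 54210.
        System.out.println(balancedSize(MNIST_COUNTS)); // prints 54210
    }
}
```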

4 participants