Exception in thread "main" java.lang.NegativeArraySizeException while invoking totalOutcomes() on RecordReaderDataSetIterator #7140

Closed
rahul-raj opened this issue Feb 11, 2019 · 4 comments

@rahul-raj
commented Feb 11, 2019

Issue Description

Please describe your issue, along with:
- expected behavior
totalOutcomes() should return the number of labels. In the dataset below:
http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv
the first column is the output label, so the expected value is 7.

- encountered behavior
I'm getting the below exception:

Exception in thread "main" java.lang.NegativeArraySizeException
	at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertWritablesBatched(RecordReaderMultiDataSetIterator.java:396)
	at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertFeaturesOrLabels(RecordReaderMultiDataSetIterator.java:360)
	at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.nextMultiDataSet(RecordReaderMultiDataSetIterator.java:327)
	at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.next(RecordReaderMultiDataSetIterator.java:213)
	at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:365)
	at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.next(RecordReaderDataSetIterator.java:440)
	at org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator.totalOutcomes(RecordReaderDataSetIterator.java:390)
	at com.javadeeplearningcookbook.app.CSVRecordExample.main(CSVRecordExample.java:54)

Version Information

Please indicate relevant versions, including, if relevant:

  • Deeplearning4j version -> 1.0.0-beta3
  • platform information (OS, etc) -> Windows 7, 64-bit
  • CUDA version, if used -> NA
  • NVIDIA driver version, if in use -> NA

Contributing

You can use the below gist as reference to code:
https://gist.github.com/rahul-raj/d1c4dc088b94d784f49a6aaf023dbad7

Let me know if further input is required. Thank you.

Aha! Link: https://skymindai.aha.io/features/ND4J-114

@AlexDBlack AlexDBlack self-assigned this Feb 11, 2019

@AlexDBlack

Member

commented Feb 11, 2019

OK, turns out this was a simple one: your data has 7 columns, hence should be indexed 0 to 6 inclusive, but you're indexing 0 to 7 (as if you had 8 total columns).
It runs fine with this:
DataSetIterator dataSetIterator = new RecordReaderDataSetIterator(transformProcessRecordReader, writableConverter, 8, 1, 6, 2, -1, true);
Note the 6, not 7.

As an aside, looks like you've got features and labels reversed... this might be what you want?

        DataSetIterator dataSetIterator = new RecordReaderDataSetIterator.Builder(transformProcessRecordReader, 8)
                .classification(0, 2)   //Column 0, 2 possible classes
                .build();

(You can also do the same with the constructors instead of the builder.)

Edit: I'll add some better validation so it's more obvious what the problem actually is.
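To illustrate the kind of check described above, here is a minimal sketch of validating label indices against the record width before any arrays are allocated. `LabelIndexCheck` and `validate` are hypothetical names made up for this example; this is not the validation that was actually added to DL4J, only an illustration of the idea under the assumption that the record width is known.

```java
// Hypothetical helper for illustration only; not part of DL4J.
class LabelIndexCheck {

    /**
     * Throws if the inclusive range [labelIndexFrom, labelIndexTo]
     * does not fit inside a record that has numColumns values.
     */
    static void validate(int numColumns, int labelIndexFrom, int labelIndexTo) {
        if (labelIndexFrom < 0 || labelIndexTo < labelIndexFrom) {
            throw new IllegalArgumentException(
                    "Invalid label range: " + labelIndexFrom + " to " + labelIndexTo);
        }
        if (labelIndexTo >= numColumns) {
            throw new IllegalArgumentException(
                    "labelIndexTo=" + labelIndexTo + " is out of range: record has "
                            + numColumns + " columns, valid indices are 0 to "
                            + (numColumns - 1) + " inclusive");
        }
    }

    public static void main(String[] args) {
        validate(7, 1, 6); // 7 columns, labels in columns 1..6: accepted

        try {
            validate(7, 1, 7); // index 7 would need an 8th column: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, the out-of-range index is reported by name instead of surfacing later as a NegativeArraySizeException.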

@rahul-raj

Author

commented Feb 11, 2019

@AlexDBlack
Thank you very much for the response. The alternate approach that you suggested will suit my case better. I still have one question.
image
Are you using the same dataset that I have? For me there are 8 columns in total.
In that case, shouldn't the feature indices still run from 1 to 7, given that index 0 is the output label? I'm trying to understand why it cannot accept index 7 even though there's an 8th column.

I checked the schema transformation process too. The schema had 8 columns in total. It was then passed to the transform process:

                                                .removeColumns("Name","Fare") //remove 2 columns
                                                .categoricalToInteger("Sex") //no change
                                                .categoricalToOneHot("Pclass") //add 3 columns
                                                .removeColumns("Pclass[1]") //remove 1 column

By my calculation it still has 8 columns at the end, so I was trying to figure out why the upper index bound is 6 instead of 7.

@AlexDBlack

Member

commented Feb 11, 2019

@rahul-raj You can check the output of the transformProcessRecordReader by using .next() on it and looking at the size of the returned list. That has 7 elements when I run your code locally.
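One tally that is consistent with the 7 elements reported above can be sketched in plain Java, without DataVec. It assumes that categoricalToOneHot replaces the original "Pclass" column with three one-hot columns (a net gain of 2, not 3), and that "Pclass" has three categories. The class name `ColumnTally`, the column names other than "Name", "Fare", "Sex" and "Pclass", and the names "Pclass[2]" and "Pclass[3]" are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the column bookkeeping; it does not call DataVec.
class ColumnTally {

    static List<String> finalColumns() {
        // Assumed 8-column starting schema (several names are placeholders).
        List<String> cols = new ArrayList<>(Arrays.asList(
                "Survived", "Pclass", "Name", "Sex", "Age",
                "Siblings/Spouses Aboard", "Parents/Children Aboard", "Fare"));

        // removeColumns("Name", "Fare"): 8 -> 6
        cols.removeAll(Arrays.asList("Name", "Fare"));

        // categoricalToInteger("Sex"): the column is converted in place, still 6

        // categoricalToOneHot("Pclass"): the original column is replaced by
        // three one-hot columns, so 6 - 1 + 3 = 8
        int pclassIndex = cols.indexOf("Pclass");
        cols.remove(pclassIndex);
        cols.addAll(pclassIndex, Arrays.asList("Pclass[1]", "Pclass[2]", "Pclass[3]"));

        // removeColumns("Pclass[1]"): 8 -> 7
        cols.remove("Pclass[1]");

        return cols;
    }

    public static void main(String[] args) {
        List<String> cols = finalColumns();
        System.out.println(cols.size() + " columns, valid indices 0 to " + (cols.size() - 1));
    }
}
```

Under these assumptions the final record has 7 values, so the last valid index is 6, and index 0 is still the "Survived" label column.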

AlexDBlack added a commit that referenced this issue Feb 12, 2019

[WIP] DL4J/SameDiff Misc (#7145)
* Small SameDiff fix (variable creation)

* #7140 RRDSI better validation for invalid indices

* GELU tests + polishing

* Deconv3d

* Deconv3d fixes, test

* Switch to FB 1.10.0

* Small deconv3d tweaks

* Javadoc
@lock


commented Mar 14, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 14, 2019
