DL4J: MultiLayerNetwork.output(DataSetIterator,boolean) can fail for variable length RNN data #7352

Closed
AlexDBlack opened this issue Mar 26, 2019 · 1 comment

commented Mar 26, 2019

/**
 * Generate the output for all examples/batches in the input iterator, and concatenate them into a single array.
 * See {@link #output(INDArray)}<br>
 * NOTE: The output array can require a considerable amount of memory for iterators with a large number of examples
 *
 * @param iterator Data to pass through the network
 * @return output for all examples in the iterator, concatenated into a single array
 */
public INDArray output(DataSetIterator iterator, boolean train) {
    List<INDArray> outList = new ArrayList<>();
    while (iterator.hasNext()) {
        DataSet next = iterator.next();
        INDArray features = next.getFeatures();
        if (features == null)
            continue;
        INDArray fMask = next.getFeaturesMaskArray();
        INDArray lMask = next.getLabelsMaskArray();
        outList.add(this.output(features, train, fMask, lMask));
    }
    // Concatenates along dimension 0; this is the call that fails when batches differ in sequence length
    return Nd4j.concat(0, outList.toArray(new INDArray[outList.size()]));
}

For variable-length time series, we can't just use concat: the outputs for different batches can have different sequence lengths, so the array shapes won't match along the time dimension, and we would need to pad.
However, if we introduce padding in this method, we have no way of also returning the mask array, i.e., the user has no way (other than looking for all 0s) to work out which data is real and which is padded.
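
To make the padding problem concrete, here is a minimal plain-Java sketch. It does not use ND4J and is not how DL4J handles this. It assumes each batch output is a `float[miniBatch][size][timeSteps]` array that is non-empty and already uniform in length within the batch. The names `PaddedConcat`, `Result`, and `padAndConcat` are hypothetical, introduced only for illustration. The sketch pads every batch to the longest sequence length, concatenates along the example dimension, and returns a mask alongside the output so that padded steps can be told apart from real zeros.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (no ND4J): pad per-batch RNN outputs along time, then concatenate. */
final class PaddedConcat {

    /** Concatenated output plus a mask marking real (1) vs. padded (0) time steps. */
    static final class Result {
        final float[][][] output; // [totalExamples][size][maxTimeSteps]
        final float[][] mask;     // [totalExamples][maxTimeSteps]

        Result(float[][][] output, float[][] mask) {
            this.output = output;
            this.mask = mask;
        }
    }

    /** Each batch has shape [miniBatch][size][timeSteps]; timeSteps may differ between batches. */
    static Result padAndConcat(List<float[][][]> batches) {
        int total = 0, maxT = 0, size = 0;
        for (float[][][] b : batches) {
            total += b.length;
            size = b[0].length;
            maxT = Math.max(maxT, b[0][0].length);
        }
        float[][][] out = new float[total][size][maxT]; // zero-initialised, so padding is 0
        float[][] mask = new float[total][maxT];
        int row = 0;
        for (float[][][] b : batches) {
            for (float[][] example : b) {
                int t = example[0].length;
                for (int s = 0; s < size; s++) {
                    // Copy the real time steps; the tail stays 0 (padding)
                    System.arraycopy(example[s], 0, out[row][s], 0, t);
                }
                Arrays.fill(mask[row], 0, t, 1.0f); // mark real steps
                row++;
            }
        }
        return new Result(out, mask);
    }

    public static void main(String[] args) {
        List<float[][][]> batches = new ArrayList<>();
        batches.add(new float[2][3][5]); // 2 examples, 3 outputs, 5 time steps
        batches.add(new float[1][3][2]); // 1 example, 3 outputs, 2 time steps
        Result r = padAndConcat(batches);
        System.out.println(Arrays.toString(r.mask[2])); // mask row for the shorter example
    }
}
```

The point of returning `Result` rather than a bare array is the one raised above: without the mask (or the sequence lengths), a caller cannot distinguish a padded step from a genuine all-zero output. Any per-example masks supplied by the iterator within a batch would also have to be merged in, which this sketch leaves out.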

At the very least, we should check this and throw a useful exception; perhaps a better solution would be to return either a mask array or the sequence lengths somehow.
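
The "check and throw" option could look roughly like the following plain-Java sketch. This is an illustration under stated assumptions, not the validation that was actually added to DL4J: it uses `float[miniBatch][size][timeSteps]` arrays in place of `INDArray`, assumes non-empty batches, and the names `OutputShapeCheck` and `validateSameLength` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (no ND4J): fail fast when batch outputs cannot be concatenated. */
final class OutputShapeCheck {

    /** Throws if batch outputs of shape [miniBatch][size][timeSteps] disagree on timeSteps. */
    static void validateSameLength(List<float[][][]> batches) {
        int expected = -1;
        for (int i = 0; i < batches.size(); i++) {
            int t = batches.get(i)[0][0].length;
            if (expected < 0) {
                expected = t; // first batch sets the reference length
            } else if (t != expected) {
                throw new IllegalStateException(
                        "Cannot concatenate outputs: batch " + i + " has sequence length " + t
                                + " but earlier batches have sequence length " + expected
                                + ". Variable-length time series must be processed one batch at a time.");
            }
        }
    }

    public static void main(String[] args) {
        List<float[][][]> batches = new ArrayList<>();
        batches.add(new float[2][3][5]);
        batches.add(new float[1][3][2]); // different sequence length
        try {
            validateSameLength(batches);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Compared with padding silently, this keeps the return type unchanged and tells the user which batch caused the mismatch, at the cost of not supporting variable-length data through this method at all.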

@AlexDBlack AlexDBlack self-assigned this Mar 27, 2019

AlexDBlack added a commit that referenced this issue Mar 27, 2019

AlexDBlack added a commit that referenced this issue Mar 28, 2019

[WIP] Misc DL4J/ND4J/DataVec Issues (#7340)
* Add FirstDigitTransform (Benford's law) + tests

* Javadoc, polish

* #7325 Fix SameDiff.asFlatPrint

* Refactor DataVec readers to remove hard-coded use of Files, in favor of streams

* Add StreamInputSplit (partly complete)

* More tests, fixes

* Fixes for model import test failures

* DataVec fixes after earlier changes

* Another DataVec fix

* #7355 SameDiff array reuse fix

* #7343 SameDiff method for Pad op

* #7305 Fix getColumn on row vector (returning scalar, not view)

* #7168 Empty arrays - create only once

* #7002 Remove newFormat arg/field

* #7352 MultiLayerNetwork.output(DataSetIterator) validation

* Fixes

* Small fixes

* SameDiff variables: Switch to LinkedHashMap for consistent iteration order

* Fix validation NPE for LogFileWriter

* Reduce3 fixes

* Small test fix

* Small test fix

* Fix bad test

* Small test threshold tweak

* OpProfiler fix: null x array (random ops etc)

* Fix issue with array order not matching flattening order when Nd4j.ordering() == f - Nd4j.createFromArray
@lock


commented Apr 27, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Apr 27, 2019
