Train model from streams #7351

Closed
LuZhenHuan opened this issue Mar 26, 2019 · 4 comments

@LuZhenHuan

commented Mar 26, 2019

Use streams to train a model

I am currently working on a many-to-many prediction task with variable-length sequences, which saves many files in a folder. Some new requirements call for building the dataset from streams instead (i.e., others pass me some streams, and I use those streams to train the model without saving files locally).

How do I build a dataset and store it in HDFS without saving local files? Alex said he would support this.

Thanks a lot.

@AlexDBlack

Member

commented Mar 26, 2019

Thanks for the issue. Functionality to support this is being added here: #7340
Keep an eye on that pull request; after it has been merged, the functionality will be available on snapshots: https://deeplearning4j.org/docs/latest/deeplearning4j-config-snapshots

I will provide an example (in the form of a unit test) before that is merged.

@AlexDBlack

Member

commented Mar 27, 2019

Update: the functionality is done, but the PR is not yet merged. It should be merged later today, or tomorrow at the latest.
For HDFS (or similar) there are two parts:
(a) List your URIs and pass them to the StreamInputSplit constructor.
(b) Write your own Function<URI,InputStream> that opens a stream for a given URI.

https://github.com/deeplearning4j/deeplearning4j/blob/169bf67354100233fa5db80a009ca1d93b4e7306/datavec/datavec-api/src/test/java/org/datavec/api/split/TestStreamInputSplit.java

For opening streams from HDFS, you can use this as a reference:
https://github.com/deeplearning4j/deeplearning4j/blob/master/datavec/datavec-spark/src/main/java/org/datavec/spark/transform/utils/SparkUtils.java#L130-L131
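
The two parts above can be sketched with the JDK alone. This is only an illustration of the pattern, not the DataVec API: the class name, the in-memory store standing in for HDFS, and the example URIs are all hypothetical, and `java.util.function.Function` is used here as a stand-in (DataVec may expect its own `Function` interface, so check the linked unit test for the exact types).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class StreamOpenerSketch {

    // Hypothetical in-memory store standing in for HDFS, so the sketch runs without a cluster.
    static final Map<URI, byte[]> FAKE_STORE = new HashMap<>();
    static {
        FAKE_STORE.put(URI.create("hdfs://example/data/seq_0.csv"),
                "1,2,3\n4,5,6\n".getBytes(StandardCharsets.UTF_8));
        FAKE_STORE.put(URI.create("hdfs://example/data/seq_1.csv"),
                "7,8,9\n".getBytes(StandardCharsets.UTF_8));
    }

    // Part (a): list the URIs of the data. Nothing is written to local disk.
    static List<URI> listUris() {
        return Arrays.asList(
                URI.create("hdfs://example/data/seq_0.csv"),
                URI.create("hdfs://example/data/seq_1.csv"));
    }

    // Part (b): a function that opens an InputStream for a given URI.
    // A real implementation would open the stream from HDFS instead of the map.
    static Function<URI, InputStream> opener() {
        return uri -> {
            byte[] bytes = FAKE_STORE.get(uri);
            if (bytes == null) {
                throw new IllegalArgumentException("Unknown URI: " + uri);
            }
            return new ByteArrayInputStream(bytes);
        };
    }

    // Consumer side: roughly what a record reader does with the URI list and the opener.
    static List<String> readAllLines(List<URI> uris, Function<URI, InputStream> opener) {
        List<String> lines = new ArrayList<>();
        for (URI uri : uris) {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(opener.apply(uri), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : readAllLines(listUris(), opener())) {
            System.out.println(line);
        }
    }
}
```

In the DataVec version, the URI list and the opener function would be handed to `StreamInputSplit` as described in (a) and (b), and the record reader would take the place of `readAllLines`.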

@LuZhenHuan

Author

commented Mar 28, 2019

Alex, you are very professional. Thanks a lot!

@lock


commented Apr 27, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Apr 27, 2019
