A general method for packaging additional data with a serialized model #4957

Closed
treo opened this issue Apr 20, 2018 · 6 comments

@treo
commented Apr 20, 2018

A common question in the DL4J Gitter is how to get the labels back after predicting an output. The usual answer is that labels have to be saved and reloaded separately.

It would be nice if there was a general method of packaging additional data with a serialized model, just like normalizers can already be added to it.

Serialization and deserialization would still be the responsibility of the programmer, but at least all additional data that's needed to use the model would be packed into a single file.

I'm aware that the serialized model file is basically a zip file, and that nothing prevents me from doing exactly that myself. But I think it would be nice to have a pair of simple utility methods to add/retrieve additional entries from the file.
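For reference, the manual route could look roughly like the sketch below. It uses only the JDK's zip file system provider and treats the model file as a plain zip archive. The class name, method names and the entry name `labels.txt` are made up for illustration and are not part of DL4J.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a DL4J class): stores labels as a text entry in a zip file.
final class ModelZipLabels {

    // Mounts the zip archive as a file system; "create=true" creates the archive if it is missing.
    private static FileSystem open(Path zip) throws IOException {
        URI uri = URI.create("jar:" + zip.toUri());
        return FileSystems.newFileSystem(uri, Collections.singletonMap("create", "true"));
    }

    // Writes one label per line into the named entry, replacing the entry if it already exists.
    static void writeLabels(Path zip, String entry, List<String> labels) throws IOException {
        try (FileSystem fs = open(zip)) {
            Files.write(fs.getPath(entry), labels, StandardCharsets.UTF_8);
        }
    }

    // Reads the labels back, one per line.
    static List<String> readLabels(Path zip, String entry) throws IOException {
        try (FileSystem fs = open(zip)) {
            return Files.readAllLines(fs.getPath(entry), StandardCharsets.UTF_8);
        }
    }
}
```

With this, `ModelZipLabels.writeLabels(path, "labels.txt", labels)` followed by `ModelZipLabels.readLabels(path, "labels.txt")` round-trips the label list. The existing entries of the archive are left in place.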

@AlexDBlack

Contributor

commented Apr 23, 2018

This seems reasonable to me, though (depending on the scope) perhaps a little challenging to design well.

Here's some stuff we might want in a saved model, or be able to do with a saved model:

  • Data pipeline
    • Labels
    • Normalizers
    • RecordReader/RecordReaderDataSetIterator etc configuration
    • Some way of saying "here's some raw <images, text, etc>, give me net outputs applying the record reader, normalizer, etc"
    • Some way of getting output in a post-processed/non-INDArray format - for example a list of detected objects for an object detection model
  • Model metadata
    • Training data info, timestamps, etc
  • Some easy way to work out what is actually stored in the model file... i.e., what's present? what's not?
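
As a rough illustration of the last point, and assuming the model file is an ordinary zip archive, the stored entries can already be listed with the JDK alone. The class and method names here are hypothetical and are not a DL4J API.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not a DL4J class): reports which entries a zip-based model file contains.
final class ModelZipListing {

    // Returns the names of all entries in the archive.
    static List<String> listEntries(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

This only answers "what's present?" at the level of entry names. It says nothing about what each entry means, which is the harder design question.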

@treo

Author

commented Apr 24, 2018

The scope I'm asking for is pretty much the smallest possible: just a pair of methods to add/retrieve data from the model zip file. Something along the lines of the following should be enough for a start:

OutputStream addNamedDataToModel(String name);
InputStream retrieveNamedDataFromModel(String name);

(couldn't really think of better names, so it is more about the signature)
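To make the shape of that signature concrete, here is one way such a pair could be sketched on top of the JDK zip support. It is an assumption-laden sketch, not a proposed implementation: the extra `Path` parameter, the class name and the choice to tie the archive's lifetime to the returned stream are all mine.

```java
import java.io.FileNotFoundException;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch of the two proposed methods; the Path parameter is an added assumption.
final class NamedModelData {

    // Returns a stream writing to the named entry of an existing archive.
    // The entry is only committed to the archive once the stream is closed.
    static OutputStream addNamedDataToModel(Path modelZip, String name) throws IOException {
        FileSystem fs = FileSystems.newFileSystem(
                URI.create("jar:" + modelZip.toUri()), Collections.singletonMap("create", "false"));
        OutputStream entry;
        try {
            entry = Files.newOutputStream(fs.getPath(name));
        } catch (IOException | RuntimeException e) {
            fs.close();
            throw e;
        }
        return new FilterOutputStream(entry) {
            @Override public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len); // pass bulk writes through instead of byte-by-byte
            }
            @Override public void close() throws IOException {
                try { super.close(); } finally { fs.close(); } // closing the file system rewrites the zip
            }
        };
    }

    // Returns a stream reading the named entry; closing it also closes the archive.
    static InputStream retrieveNamedDataFromModel(Path modelZip, String name) throws IOException {
        ZipFile zf = new ZipFile(modelZip.toFile());
        InputStream in;
        try {
            ZipEntry e = zf.getEntry(name);
            if (e == null) throw new FileNotFoundException(name + " not found in " + modelZip);
            in = zf.getInputStream(e);
        } catch (IOException | RuntimeException e) {
            zf.close();
            throw e;
        }
        return new FilterInputStream(in) {
            @Override public void close() throws IOException {
                try { super.close(); } finally { zf.close(); }
            }
        };
    }
}
```

Both methods are meant to be used with try-with-resources, since nothing reaches the archive until the output stream is closed. How the payload is encoded stays entirely with the caller, as described above.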

@AlexDBlack

Contributor

commented Apr 25, 2018

Hm... that seems reasonable.

What about putting/getting objects though (perhaps in addition to the stream options)? It'd be even more convenient/easy to use, though perhaps prone to serialization changes between versions, at least for things like normalizers. It should be fine for labels and the like, however.
For normalizers, though, we could intercept the addObject(Object) method and use our own serialization instead of Java object serialization.
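To illustrate the object-based option, a put/get pair would presumably wrap plain Java object serialization along these lines. This is a stdlib-only sketch with hypothetical names, and it shows only the object-to-bytes step, not how the bytes end up in the model file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper (not a DL4J class): converts objects to and from bytes with Java serialization.
final class ObjectBytes {

    // Serializes any Serializable object into a byte array.
    static byte[] toBytes(Serializable o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Restores an object; the cast is unchecked, so the caller must know the stored type.
    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }
}
```

The version concern shows up in `fromBytes`: if the class definition has changed incompatibly since the object was written, `readObject` can fail with an `InvalidClassException`. Simple JDK types such as a `List<String>` of labels are much less exposed to that.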

@raver119

Contributor

commented Apr 25, 2018

Let's just put properties there, as we've discussed long ago?

@AlexDBlack AlexDBlack self-assigned this May 9, 2018

AlexDBlack added a commit that referenced this issue May 9, 2018
@AlexDBlack

Contributor

commented May 9, 2018

ModelSerializer now has the following methods:
addObjectToFile(File f, String key, Object o)
getObjectFromFile(File f, String key)
List<String> listObjectsInFile(File f)

#5097

@lock


commented Sep 22, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Sep 22, 2018
