Keras backend - support for Theano sequential models #2562

Merged
merged 54 commits into from Jan 10, 2017

Projects

None yet

4 participants

@pkoperek
Contributor
pkoperek commented Dec 22, 2016 edited

This is a very basic version of the Server for dl4j for Keras integration. It has some shortcomings:

  • different dimension ordering (Tensorflow vs Theano) isn't handled
  • only sequential model is supported for now
  • there is no feedback sent back to python
  • not sure what should be a good automated test here (i just used a simple script from https://github.com/pkoperek/keras-dl4j to test it) - a sample model/dataset dumped into test dir and checking if training works (there are no exceptions)?
  • there should be a jar containing all the deps created at the end - not sure how to handle different backend (multiple versions of the jar???)

All comments welcome!

@agibsonccc
Member

here should be a jar containing all the deps created at the end - not sure how to handle different backend (multiple versions of the jar???)

You can use maven-assembly for this

not sure what should be a good automated test here (i just used a simple script from https://github.com/pkoperek/keras-dl4j to test it) - a sample model/dataset dumped into test dir and checking if training works (there are no exceptions)?

Training without exception is acceptable.

@agibsonccc

A lot to do yet.

deeplearning4j-keras/pom.xml
+ <groupId>org.deeplearning4j</groupId>
+ <artifactId>deeplearning4j-keras</artifactId>
+ <version>0.7.2-SNAPSHOT</version>
+ <packaging>jar</packaging>
@agibsonccc
agibsonccc Dec 23, 2016 Member

Don't specify the version or groupId it's inherited from the parent already.

deeplearning4j-keras/pom.xml
+ <name>DL4J Keras support</name>
+ <description>DL4J as a backend for Keras</description>
+ <url>http://maven.apache.org</url>
+
@agibsonccc
agibsonccc Dec 23, 2016 Member

Strip the url please.

deeplearning4j-keras/pom.xml
+ <groupId>org.deeplearning4j</groupId>
+ <artifactId>deeplearning4j-modelimport</artifactId>
+ <version>${dl4j.version}</version>
+ <exclusions>
@agibsonccc
agibsonccc Dec 23, 2016 Member

Change this to project.version, there's zero reason to use dl4.version anywhere in the project.

deeplearning4j-keras/pom.xml
+ <groupId>org.deeplearning4j</groupId>
+ <artifactId>deeplearning4j-core</artifactId>
+ <version>${dl4j.version}</version>
+ <exclusions>
@agibsonccc
agibsonccc Dec 23, 2016 Member

Change this to project.version, there's zero reason to use dl4.version anywhere in the project.

deeplearning4j-keras/pom.xml
+ <groupId>junit</groupId>
+ <artifactId>junit</artifactId>
+ <scope>test</scope>
+ </dependency>
@agibsonccc
agibsonccc Dec 23, 2016 Member

Don't add the scope here, dependency management from the parent poms already declares the scope.

deeplearning4j-keras/pom.xml
+ <artifactId>nd4j-native</artifactId>
+ <version>${nd4j.version}</version>
+ <classifier>linux-x86_64</classifier>
+ </dependency>
@agibsonccc
agibsonccc Dec 23, 2016 Member

Don't specify the classifier here. Maven auto figures out what to use based on the os. This makes cross platform builds impossible otherwise. Also make sure to specify test (this is how we test all dl4j code) No code should EVER depend on a backend.

+ String type,
+ String trainFeaturesFile,
+ String trainLabelsFile,
+ int batchSize,
@agibsonccc
agibsonccc Dec 23, 2016 Member

This is a lot. Maybe consider a POJO using lombok to generate the builder would be easier?

+ try {
+ MultiLayerNetwork multiLayerNetwork;
+ if ("sequential".equals(type)) {
+ multiLayerNetwork = KerasModelImport.importKerasSequentialModelAndWeights(modelFilePath);
@agibsonccc
agibsonccc Dec 23, 2016 Member

Computation graph? /cc @turambar @eraly

+ }
+ }
+
+ logger.info("Learning model finished");
@agibsonccc
agibsonccc Dec 23, 2016 Member

Use lombok @Slf4j notation instead.

+ private long mulArray(long[] shape) {
+ long retVal = 1L;
+ for (int i = 0; i < shape.length; i++) {
+ retVal *= shape[i];
@agibsonccc
agibsonccc Dec 23, 2016 Member

Please use ArrayUtil.prodLong instead.

@pkoperek
Contributor
pkoperek commented Dec 26, 2016 edited

@agibsonccc: Updated with changes from comments - please have a look in a free moment. I'll be submitting the package building part together with the python code - i need to think about how they should be distributed together.

@agibsonccc

A lot more to go yet.

deeplearning4j-keras/pom.xml
+ <groupId>net.sf.py4j</groupId>
+ <artifactId>py4j</artifactId>
+ <version>0.10.4</version>
+ </dependency>
@agibsonccc
agibsonccc Dec 28, 2016 Member

Could you put a py4j.version at the root of the dl4j project?

@pkoperek
pkoperek Dec 28, 2016 Contributor

👍

deeplearning4j-keras/pom.xml
+ <groupId>org.nd4j</groupId>
+ <artifactId>nd4j-native</artifactId>
+ <version>${nd4j.version}</version>
+ </dependency>
@agibsonccc
agibsonccc Dec 28, 2016 Member

You should never have a dependency on a backend.

+import java.io.IOException;
+
+import static org.bytedeco.javacpp.hdf5.H5F_ACC_RDONLY;
+
+@Slf4j
+public class DeepLearning4jEntryPoint {
+
+ public void fit(EntryPointFitParameters entryPointFitParameters)
+ }
+
+ for (int i = 0; i < entryPointFitParameters.getNbEpoch(); i++) {
+ log.info("Fitting: " + i);
@agibsonccc
agibsonccc Dec 28, 2016 Member

This is a lot of code. Could we maybe have a unit test for just this section? I would encapsulate this in a method. The views here are my main concern.

+
+ private INDArray h5FileToNDArray(String inputFilePath) {
+ hdf5.H5File h5File = new hdf5.H5File();
+ h5File.openFile(inputFilePath, H5F_ACC_RDONLY);
@agibsonccc
agibsonccc Dec 28, 2016 Member

try/with would be cleaner

@pkoperek
pkoperek Dec 28, 2016 Contributor

👍

+ new RecursiveCopier(input, dataBuffer, shape).copy();
+
+ h5File.close();
+
@agibsonccc
agibsonccc Dec 28, 2016 Member

Again try /with

@pkoperek
pkoperek Dec 28, 2016 Contributor

👍

+ private int[] toInt(long[] array) {
+ int[] retVal = new int[array.length];
+
+ for (int i = 0; i < array.length; i++) {
@agibsonccc
agibsonccc Dec 28, 2016 Member

Add this in to ArrayUtil if it's not already. (This is in nd4j where I put this kind of stuff.)

+import lombok.Builder;
+
+@Builder
+public class EntryPointFitParameters {
@agibsonccc
agibsonccc Dec 28, 2016 Member

Also use lombok rather than manual getters and setters.

@pkoperek
pkoperek Dec 28, 2016 Contributor

👍

+import org.slf4j.LoggerFactory;
+import py4j.GatewayServer;
+
+public class Server {
@pkoperek
pkoperek Dec 28, 2016 Contributor

👍

+
+ private static final Logger logger = LoggerFactory.getLogger(Server.class);
+
+ public static void main(String[] args) {
@agibsonccc
agibsonccc Dec 28, 2016 Member

Use @Slf4j in lombok.

@pkoperek
pkoperek Dec 28, 2016 Contributor

👍

+ File features = prepareResource("theano_mnist/x_train.h5");
+ File labels = prepareResource("theano_mnist/y_train.h5");
+
+ EntryPointFitParameters entryPointParameters = EntryPointFitParameters
@agibsonccc
agibsonccc Dec 28, 2016 Member

Comment on overall api. Is there any way you can just use a neuralnet configuration directly or better yet leverage the work @turambar and @eraly are doing with model import?

@agibsonccc
agibsonccc Dec 28, 2016 Member

Nvm I see how you're doing it now.

@pkoperek pkoperek added a commit to pkoperek/nd4j that referenced this pull request Dec 28, 2016
@pkoperek pkoperek toInts - moved from deeplearning4j/deeplearning4j#2562 8d50071
+ .batchSize(128)
+ .nbEpoch(2)
+ .type("sequential")
+ .build();
@agibsonccc
agibsonccc Dec 28, 2016 Member

Make "sequential" an enum

@pkoperek pkoperek changed the title from [RFC] Keras backend to Keras backend - support for Theano sequential models Dec 29, 2016
@pkoperek
Contributor

Two general questions about the backends:

  1. You said that maven automatically selects the one with correct classifier based on the operating system: I was going through poms/plugins and couldn't figure out how this is done. Could you point me to some docs?
  2. I moved the nd4j-native to test scope (without this tests fail as i don't have a backend on classpath). However, if i want to build a package which is shippable with python wrapper code and/or can be run separately - i think i will have to add the backend as a dependency (have a separate proflle for this? not sure if this would make much sense).

@agibsonccc: What do you think about wrapping the server with a docker container? Using dl4j as a keras backend would require then:

  1. install the wrapper package with pip
  2. pull & start the docker container
  3. use the wrapper in keras script

I think there have to be some manual server installation steps - docker seems to offer least friction.

@saudet
Member
saudet commented Dec 29, 2016

The dependency is here: https://github.com/deeplearning4j/nd4j/blob/master/nd4j-backends/nd4j-backend-impls/nd4j-native/pom.xml#L18 but that doesn't work for Gradle, sbt, etc so don't rely on that, and use nd4j-native-platform as a separate dependency somewhere yes.

pkoperek added some commits Dec 18, 2016
@pkoperek pkoperek gitignore
stub code for server
pom - basic config
36a5bbe
@pkoperek pkoperek add keras to parent pom ec73261
@pkoperek pkoperek slf logging 12c1baf
@pkoperek pkoperek copying data from float[] to NDArray according to specified shape 52cfd68
@pkoperek pkoperek reading data from h5 files 53c6ba1
@pkoperek pkoperek ndj4 api deps b8894f3
@pkoperek pkoperek fix the issue with copying only the first array element 7c662d7
@pkoperek pkoperek select correct batch intervals for labels dataset 48d33c7
@pkoperek pkoperek remove obsolete log 291a2cd
@pkoperek pkoperek trace batch processing progress 74c570a
@pkoperek pkoperek removed ugly hack for tensorflow d8fdf98
@pkoperek pkoperek additional logging and import cleanup 158ce12
@pkoperek pkoperek pom: remove compiler plugin, add dependency to single backend (??? no…
…t sure which backend should be shipped with server)
77f2aac
@pkoperek pkoperek better package name 22a806f
@pkoperek pkoperek fixes from the comments f72c1d2
@pkoperek pkoperek changes from comments + little cleanup 600afe7
@pkoperek pkoperek pojo for fit entry point fd20c21
@pkoperek pkoperek fluent assertions + re-indent 070ef45
@pkoperek pkoperek use the pojo as parameter ec3abbe
@pkoperek pkoperek missing test scope d51b261
@pkoperek pkoperek failing test 89e0732
@pkoperek pkoperek not using assertj finally fdf9c02
@pkoperek pkoperek resources for Keras theano model for MNIST example ce1f6a2
@pkoperek pkoperek working test 9680e47
@pkoperek pkoperek logging in tests 0e93046
@pkoperek pkoperek use lombok in Server f8aca94
@pkoperek pkoperek changing poms according to comments ef78775
@pkoperek pkoperek javadoc 6adb1ed
@pkoperek pkoperek javadoc 7a88441
@pkoperek pkoperek try-with and other comments ee56e7a
@pkoperek pkoperek refactor a bit - extract fitInBatches b33d7f0
@pkoperek pkoperek javadoc 46c6d63
@pkoperek pkoperek enum for model types 1491441
@pkoperek pkoperek refactoring: using the enum 79558c6
@pkoperek pkoperek reorganizing pom (adding backend for tests) f8e09de
@pkoperek pkoperek refactoring into smaller pieces b254a32
@pkoperek pkoperek basic unit tests for neural network batch learner 1397533
@pkoperek pkoperek using nd4j-native-platform instead of nd4j-native 1d5770d
@agibsonccc

Needs refining still.

deeplearning4j-keras/pom.xml
+ <groupId>org.nd4j</groupId>
+ <artifactId>nd4j-native-platform</artifactId>
+ <version>${nd4j.version}</version>
+ <scope>test</scope>
@agibsonccc
agibsonccc Jan 1, 2017 Member

I'm not sure if you tested this :D. It should be physically impossible for this to work. You can only use nd4j-native when building from source. This includes test. Please change it back.

@pkoperek
pkoperek Jan 2, 2017 edited Contributor

Maybe I did something weird with maven, but in IntelliJ runs the unit tests without a problem :)
Doesn't nd4j-native-platform refer to nd4j-native?

Excerpt from *-platform pom:

    <properties>
        <nd4j.backend>nd4j-native</nd4j.backend>
    </properties>

    <dependencies>
        <dependency>
            <groupId>${project.groupId}</groupId>
            <artifactId>${nd4j.backend}</artifactId>
            <version>${project.version}</version>
        </dependency>

Either way - I'll change it back :)

+ try {
+ MultiLayerNetwork multiLayerNetwork = neuralNetworkReader.readNeuralNetwork(entryPointFitParameters);
+
+ INDArray features = ndArrayHDF5Reader.readFromPath(entryPointFitParameters.getTrainFeaturesFile());
@agibsonccc
agibsonccc Jan 1, 2017 Member

Is this how keras data is stored? Bit concerned about data representation among other things here..What about reading in numpy arrays?

@pkoperek
pkoperek Jan 2, 2017 Contributor

No - on Python side I'm dumping the dataset myself (and I chose HDF5 - this is the same format Keras uses to export models).

I know that there are problems with this e.g. de/serialization slows down the process, there are obvious practical size limitations. The reason I did it this way is that on the Python side the data coming from the raw data set can be pre-processed (and I think it usually is :) ) before learning and we would have to somehow copy that logic to Java in order to be able to perform the fitting.

The problem here is that we are not using Jython - because it cannot support numpy, and simply just export/import the Keras model. This requires exporting/importing the pre-processed data which was supposed to be fed into the model.

For large distributed datasets it won't work completely - however I'm not sure if in this case someone wouldn't just invest time in rewriting his model in DL4J directly.

Regarding reading/writing numpy arrays: the format used here is pickle (https://docs.python.org/2/library/pickle.html). I was looking for some decent library to read this inside JVM and couldn't really find anything what would convince me that it is stable and well tested. On the other hand HDF5 is already used by Keras and seems to work pretty well.

I don't really have a strong opinion which lib/data format should be used - if you feel there is a better way to do it let me know.

+
+import static org.bytedeco.javacpp.hdf5.H5F_ACC_RDONLY;
+
+public class NDArrayHDF5Reader {
@agibsonccc
agibsonccc Jan 1, 2017 Member

Javadoc.

+import org.nd4j.linalg.indexing.NDArrayIndex;
+
+@Slf4j
+public class NeuralNetworkBatchLearner {
+
+ int begin = 0;
+ int samplesCount = features.size(0);
+
@agibsonccc
agibsonccc Jan 1, 2017 Member

This feels wrong to me. Does it have to be an in memory object? We should be consistent and try to to use iterators if possible.

@pkoperek
pkoperek Jan 2, 2017 Contributor

I think that the whole HDF5 file is read into memory when I read it from disk to an NDArray
here:

INDArray input = Nd4j.create(shape);
new RecursiveCopier(input, dataBuffer, shape).copy();
+@Slf4j
+public class NeuralNetworkReader {
+
+ public MultiLayerNetwork readNeuralNetwork(EntryPointFitParameters entryPointFitParameters)
+
+import org.nd4j.linalg.api.ndarray.INDArray;
+
+public class RecursiveCopier {
+
+ private final INDArray output;
+ private final float[] input;
+ private final int[] shape;
@agibsonccc
agibsonccc Jan 1, 2017 Member

Use databuffers here. (Nd4j.createBuffer()) the input should never assumed to be float.

+ * Main class for the DL4J-as-Keras-backend. Simply launches py4j GatewayServer with
+ * an entry point exposing API available on Python side.
+ */
+@Slf4j
@agibsonccc
agibsonccc Jan 1, 2017 Member

More information. Also add @author

@pkoperek
pkoperek Jan 2, 2017 Contributor

I always kind of thought that adding author just makes the file longer as you can check the real author just by looking on blame of the file ;) but ok - I'll add this :)

@agibsonccc

A lot more to do.

@@ -25,14 +23,15 @@ public void fit(EntryPointFitParameters entryPointFitParameters) throws Exceptio
try {
MultiLayerNetwork multiLayerNetwork = neuralNetworkReader.readNeuralNetwork(entryPointFitParameters);
@agibsonccc
agibsonccc Jan 4, 2017 Member

What about ComputationGraph and sequential? This should only be a few lines of code difference anyways.

@pkoperek
pkoperek Jan 10, 2017 Contributor

This is the Keras sequential model, no? :)
Lets keep this PR simple - I'll add the ComputationGraph as a separate one (want to unblock myself to move on with the Python stuff)

+ DataSetIterator dataSetIterator = new HDF5MiniBatchDataSetIterator(
+ entryPointFitParameters.getTrainFeaturesDirectory(),
+ entryPointFitParameters.getTrainLabelsDirectory()
+ );
@agibsonccc
agibsonccc Jan 4, 2017 Member

Not quite what I expected but I can deal with this.

+ * Async support is turned off on purpose: otherwise there are indeterministic segfaults in JavaCPP
+ * when cleaning memory after HDF5 libs.
+ */
+ return false;
@agibsonccc
agibsonccc Jan 4, 2017 Member

@saudet anything we could do about this?

+
+ @Override
+ public void setPreProcessor(DataSetPreProcessor preProcessor) {
+ throw new UnsupportedOperationException();
@agibsonccc
agibsonccc Jan 4, 2017 Member

This doesn't strike me as reasonable. Just add a field for it and call set :D

+ @Override
+ public DataSetPreProcessor getPreProcessor() {
+ throw new UnsupportedOperationException();
+ }
@agibsonccc
agibsonccc Jan 4, 2017 Member

Same it's 1 field.

+
+ @Override
+ public List<String> getLabels() {
+ return null;
@agibsonccc
agibsonccc Jan 4, 2017 Member

Throw an unsupported operation exception here.

public class NDArrayHDF5Reader {
- public INDArray readFromPath(String inputFilePath) {
+ public INDArray readFromPath(Path inputFilePath) {
try (hdf5.H5File h5File = new hdf5.H5File()) {
@@ -21,7 +27,7 @@ public void copy() {
private void copyRecursive(int shapeIdx, int[] indexes) {
if (shape.length == shapeIdx) {
- output.putScalar(indexes, input[inputIdx++]);
+ output.putScalar(indexes, input.getFloat(inputIdx++));
@agibsonccc
agibsonccc Jan 4, 2017 Member

Wondering if we could use memcpy or something better here?

@pkoperek
pkoperek Jan 10, 2017 edited Contributor

not sure if i'm not doing something colossally stupid, but maybe instead of using the *Copier I could just do:

            INDArray input = Nd4j.create(shape);
            input.setData(dataBuffer);

right? :)

- .trainFeaturesFile(features.getAbsolutePath())
- .trainLabelsFile(labels.getAbsolutePath())
+ .trainFeaturesDirectory(features.toAbsolutePath().toString())
+ .trainLabelsDirectory(labels.toAbsolutePath().toString())
.batchSize(128)
@agibsonccc
agibsonccc Jan 4, 2017 Member

Better. Why not also allow files though? The internal implementation of this would just be equivalent to calling director on the file's parent.

@pkoperek
pkoperek Jan 10, 2017 Contributor

I think keeping only directories makes the server part more clear (simpler). The client side has to generate the batch files by using functions defined in the Python lib so either way we can 'enforce' the correct parameters.

@agibsonccc
agibsonccc Jan 10, 2017 Member

Fine with me :D.

+
+ private Set<String> listBatchFiles(String classpathDirectory) {
+ return new Reflections(classpathDirectory, new ResourcesScanner()).getResources(endsWith(".h5"));
+ }
@agibsonccc
agibsonccc Jan 4, 2017 Member

Not sure why you'd use reflections here. This shouldn't have to be on the classpath?

@agibsonccc
agibsonccc Jan 4, 2017 Member

Ahh, testing. That makes sense. That's still going to be slow though :D

return file;
}
+ private void copyClasspathResourceToFile(String resourceName, File file) throws IOException {
+ FileUtils.copyInputStreamToFile(this.getClass().getClassLoader().getResourceAsStream(resourceName), file);
@agibsonccc
agibsonccc Jan 4, 2017 Member

Again, I don't think data needs to be on the classpath.

@agibsonccc
agibsonccc Jan 4, 2017 Member

Ahh, testing. That makes sense. That's still going to be slow though :D

@SusanBot

Jenkins Build Triggered

@SusanBot

Jenkins Build Started

@agibsonccc

Done. Merging.

@agibsonccc agibsonccc merged commit dead64e into deeplearning4j:master Jan 10, 2017
@SusanBot

Build finished. 741 tests run, 31 skipped, 565 failed.

@SusanBot

Refer to this link for build results (access rights to CI server needed):
/job/deeplearning4j_PR/127/

Build result: ABORTED

[...truncated 3.15 GB...] at hudson.model.ResourceController.execute(ResourceController.java:98) at hudson.model.Executor.run(Executor.java:404)Caused by: java.io.FileNotFoundException: https://api.github.com/repos/deeplearning4j/deeplearning4j/statuses/1e4a1542b52bde14e16e4b5617264f6bdd8bfa4f at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1890) at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1885) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1884) at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1457) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254) at org.kohsuke.github.Requester.parse(Requester.java:535) at org.kohsuke.github.Requester._to(Requester.java:262) ... 12 moreCaused by: java.io.FileNotFoundException: https://api.github.com/repos/deeplearning4j/deeplearning4j/statuses/1e4a1542b52bde14e16e4b5617264f6bdd8bfa4f at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1836) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) at org.kohsuke.github.Requester.parse(Requester.java:525) ... 13 morechannel stopped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment