
Composing Models does not work with the MXNet backend #223

Open
2 tasks done
ssbusc1 opened this issue Jan 15, 2019 · 2 comments

Comments

@ssbusc1

ssbusc1 commented Jan 15, 2019

I would like to stack together different models similar to what is described here: https://stackoverflow.com/questions/50092589/how-to-vertically-stack-trained-models-in-keras

This does not seem to work with the MXNet backend. Even the simpler case of wrapping one model inside another fails. I've included sample code below that works with the Theano backend but does not work with the MXNet backend.

  • If running on MXNet, check that you are up-to-date with the latest version. The installation
    instructions can be found here

I'm on keras-mxnet 2.2.4.1 installed via pip.

  • Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

Example code below.

import numpy as np
np.random.seed(12345)

import keras
from keras.models import Model
from keras.layers import Input, Dense, Dropout

# Set up some dummy data. Basically, this just represents an identity function for odd/even numbers.

x = np.zeros((1000, 2))
y = np.zeros((1000, 2))

for i in range(x.shape[0]):
    if i % 2 == 0:
        x[i, 0] = 1
        y[i, 0] = 1
    else:
        x[i, 1] = 1
        y[i, 1] = 1

# Create a simple model
num_hidden = 10
dropout = 0.5
sigma = 0.01
weight_initializer = keras.initializers.RandomNormal(mean=0.0, stddev=sigma)
bias_initializer = keras.initializers.Zeros()
batch_size = 20
num_epoch = 200

inp = Input(shape=(2,))
hidden1 = Dense(num_hidden, kernel_initializer=weight_initializer, bias_initializer=bias_initializer, activation='relu')(inp)
dropout1 = Dropout(dropout)(hidden1)
output = Dense(2, kernel_initializer=weight_initializer, bias_initializer=bias_initializer, activation='softmax')(dropout1)

model = Model(inputs=[inp], outputs=[output])
learning_rate = 0.01
optimizer = keras.optimizers.SGD(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.fit(x=x, y=y, epochs=num_epoch, batch_size=batch_size, verbose=2)

# This model should clearly overfit to the data. Evaluation on a slice of the input:
model.predict(x[0:10])

array([[  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01]], dtype=float32)

# Wrap the model into another model, and predict again.
wrapping_model = Model(inputs=model.inputs, outputs=model.outputs)
wrapping_model.predict(x[0:10])

array([[ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616]], dtype=float32)

With MXNet, the predictions from the original model look fine, but the wrapped model's predictions look like those of an untrained network (roughly 0.5/0.5). With Theano, the wrapped model's results are identical to the predictions generated by the original model.
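
For context, the fuller composition I eventually want (following the linked Stack Overflow answer) looks roughly like the sketch below; the second model and its layer sizes are purely illustrative.

# Illustrative only: a second, separately-defined model stacked on top of
# the first model's output using the functional API.
inp2 = Input(shape=(2,))
hidden2 = Dense(num_hidden, activation='relu')(inp2)
output2 = Dense(2, activation='softmax')(hidden2)
model2 = Model(inputs=[inp2], outputs=[output2])

# Compose the two models: model2 applied to model's output for the same input.
stacked_output = model2(model(inp))
stacked_model = Model(inputs=[inp], outputs=[stacked_output])
stacked_model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])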

@roywei

roywei commented Jan 17, 2019

Hi @ssbusc1 , thanks for submitting this issue. In the MXNet backend, we have to override the Keras Model and use an MXNet Module under the hood, so the above code does not transfer the weights from model to wrapping_model. You have to copy the weights over explicitly.

You can do that by either:

  1. Save and load the weights if the two models have the same structure:
model.save_weights('weights.h5')
wrapping_model = Model(inputs=model.inputs, outputs=model.outputs)
wrapping_model.load_weights('weights.h5')
  2. Use layer.get_weights() and layer.set_weights() for the specific layers whose weights you want copied:
wrapping_model = Model(inputs=model.inputs, outputs=model.outputs)
for layer, wrapped_layer in zip(model.layers, wrapping_model.layers):
    print(layer.name)
    print(wrapped_layer.name)
    weights = layer.get_weights()
    wrapped_layer.set_weights(weights)
print(wrapping_model.predict(x[0:10]))
wrapping_model.summary()

This produces the same predictions as the original model:

[[9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]]

@ssbusc1
Author

ssbusc1 commented Jan 17, 2019

Thanks. The wrapping_model will eventually have a different structure, so I'll try approach #2 above. First-class support for this would definitely help (the other backends already support it) as the composition gets more involved.
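
For that different-structure case, something like the rough sketch below is what I have in mind: copy weights by layer name for only the layers the two models share, and leave everything else at its fresh initialization.

# Rough sketch (untested): copy weights layer-by-layer by name, so it still
# works when wrapping_model has extra layers that model does not have.
for layer in model.layers:
    weights = layer.get_weights()
    if not weights:
        continue  # e.g. Input and Dropout layers carry no weights
    try:
        wrapping_model.get_layer(layer.name).set_weights(weights)
    except ValueError:
        pass  # no layer with this name in wrapping_model; skip it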
