Keras/CNTK: BatchNormalization layer causes predict() values to be incorrect #1994

minimaxir opened this Issue Jun 12, 2017 · 3 comments



minimaxir commented Jun 12, 2017

Using a BatchNormalization layer when training a Keras model on the CNTK backend causes predict() to assign a nearly equal probability to every class, so the predictions are useless.

The problem does not reproduce when running the same script in Keras with the TensorFlow backend, or when commenting out the BatchNormalization() layer and rerunning in Keras/CNTK.
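To compare backends with the same script, multi-backend Keras reads the `KERAS_BACKEND` environment variable (or the `backend` field in `~/.keras/keras.json`) when it is first imported. The snippet below is a minimal sketch of switching backends from inside the script; `select_backend` is a hypothetical helper, not part of Keras.

```python
import os

# Backend names accepted by multi-backend Keras 2.x.
KNOWN_BACKENDS = ("tensorflow", "theano", "cntk")


def select_backend(name):
    """Set KERAS_BACKEND (hypothetical helper); call before the first `import keras`."""
    if name not in KNOWN_BACKENDS:
        raise ValueError("unknown Keras backend: %r" % (name,))
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("cntk")          # run that shows the problem
# select_backend("tensorflow")  # comparison run
# import keras                  # import only after the backend is selected
```

Setting the variable after Keras has already been imported has no effect on that process, so the call belongs at the very top of the script.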

Test Script

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Dropout, Input, BatchNormalization
from keras.optimizers import RMSprop
import numpy as np

batch_size = 128
num_classes = 10
epochs = 1

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

main_input = Input(shape=(784,))

hidden_1 = Dense(512, activation='relu')(main_input)
hidden_1 = BatchNormalization()(hidden_1)
hidden_1 = Dropout(0.2)(hidden_1)

hidden_2 = Dense(512, activation='relu')(hidden_1)
hidden_2 = Dropout(0.2)(hidden_2)

main_output = Dense(10, activation='softmax', name="main_out")(hidden_2)
model = Model(inputs=main_input, outputs=main_output)



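model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))

# Inspect the class probabilities for one test sample
# (assumed call; the issue only shows its output).
print(model.predict(x_test[:1]))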


Actual Results:

Classes have near equal probability.

[[ 0.09599514 0.09778792 0.10449217 0.10341065 0.09734494 0.09968996 0.09743218 0.09651197 0.10852019 0.09881491]]

Expected Results:

One class has probability near 1; the rest near 0.

[[ 1.33117704e-08 1.82817816e-09 1.40895963e-05 5.67311133e-07 1.03561837e-09 8.40251257e-10 2.87819811e-12 9.99984384e-01 1.15252341e-08 9.22094387e-07]]
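The difference between the two outputs can be checked mechanically. The following is a minimal sketch, using the two probability vectors quoted above; `is_near_uniform` and its `tol` threshold are illustrative assumptions, not part of Keras or CNTK.

```python
def is_near_uniform(probs, tol=0.05):
    """Return True if every probability is within `tol` of 1 / num_classes."""
    uniform = 1.0 / len(probs)
    return all(abs(p - uniform) <= tol for p in probs)


# Output reported with the CNTK backend.
actual = [0.09599514, 0.09778792, 0.10449217, 0.10341065, 0.09734494,
          0.09968996, 0.09743218, 0.09651197, 0.10852019, 0.09881491]

# Output reported with the TensorFlow backend.
expected = [1.33117704e-08, 1.82817816e-09, 1.40895963e-05, 5.67311133e-07,
            1.03561837e-09, 8.40251257e-10, 2.87819811e-12, 9.99984384e-01,
            1.15252341e-08, 9.22094387e-07]

print(is_near_uniform(actual))         # True
print(is_near_uniform(expected))       # False
print(expected.index(max(expected)))   # 7
```

A check like this could be applied to each row returned by predict() to flag a model that has collapsed to a flat distribution.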




souptc commented Jun 12, 2017

Thanks for the repro! We will take a look to see what happened.

@souptc souptc self-assigned this Jun 12, 2017



usuyama commented Jul 3, 2017

+1. I hit the same issue when using BatchNormalization with CNTK/Keras.
The predicted values seemed rather random, and changing only the batch size produced different predictions.
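The batch-size dependence described here can be measured directly. The sketch below is framework-independent and uses toy stand-ins for a model; `max_batch_size_discrepancy`, `per_sample_model`, and `batch_dependent_model` are hypothetical names. With Keras, `predict_fn` could presumably wrap `model.predict`. The batch-dependent toy only exercises the helper and is not a claim about the cause of the CNTK behaviour.

```python
def max_batch_size_discrepancy(predict_fn, samples, batch_sizes):
    """Largest absolute difference between predictions made at different batch sizes.

    predict_fn(batch) must return one list of outputs per sample in `batch`.
    """
    def predict_in_batches(size):
        outputs = []
        for start in range(0, len(samples), size):
            outputs.extend(predict_fn(samples[start:start + size]))
        return outputs

    reference = predict_in_batches(batch_sizes[0])
    worst = 0.0
    for size in batch_sizes[1:]:
        candidate = predict_in_batches(size)
        for ref_row, cand_row in zip(reference, candidate):
            for r, c in zip(ref_row, cand_row):
                worst = max(worst, abs(r - c))
    return worst


def per_sample_model(batch):
    # Toy model: each output depends only on its own input row.
    return [[2.0 * x for x in row] for row in batch]


def batch_dependent_model(batch):
    # Toy model: subtracts the per-feature batch mean, so outputs
    # change with the composition of the batch.
    n = len(batch)
    means = [sum(col) / n for col in zip(*batch)]
    return [[x - m for x, m in zip(row, means)] for row in batch]


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(max_batch_size_discrepancy(per_sample_model, data, [1, 2, 4]))       # 0.0
print(max_batch_size_discrepancy(batch_dependent_model, data, [1, 2, 4]))  # 3.0
```

In inference mode a correctly behaving model should give a discrepancy at or near zero, up to floating-point noise.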




souptc commented Jul 6, 2017

A pull request has been created at:

@cha-zhang cha-zhang closed this Jul 13, 2017
