# Deterministic TensorFlow

## Getting repeatable results with Keras models

---
If you're like me, then during your machine learning work you've struggled to get repeatable results out of your deep learning models. Often when building a network you don't have much leeway to repeat and retest your model and training cycle to confirm your results. Instead you take what you got the first time around and, as long as it has passed your validation and testing requirements, run with that to show off your work. However, if you have the luxury (or requirement) of retraining and repeating your work, you will most likely find that given the same training settings, hyperparameters and training time, the final results of your hard work will be different. If you are lucky, they are only slightly different and the change is positive, but if your deadline is close, more likely they will be moderately worse and throw your whole hypothesis and the last couple of weeks' work into doubt. This isn't an unexpected outcome; it is part and parcel of using machine learning approaches that are non-deterministic and rely on multiple random variables to kickstart change in the model. Still, it can be a real blow to your confidence in your results, especially when you are trying out something new and hoping for an improvement that is smaller than the run-to-run variance of the original model's performance. <br>
Now, the standard approach to dealing with these varying results is simply to repeat the process multiple times, average the results, and publish that value with a footnote stating that it is an average over several runs. When a paper doesn't do this, it can put the accuracy of its reported results into doubt: was this result representative of the average expected performance, or was it a lucky random seed that performed extra well?
<br>
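
As a minimal sketch of the "repeat and average" approach, the snippet below reports a mean and sample standard deviation over several runs. The function name `summarize_runs` and the accuracy values are made up for illustration; in practice the list would hold the metric from each of your own training runs.

```python
import statistics

def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two runs.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std

# Hypothetical test accuracies from five training runs with identical settings.
accuracies = [0.90, 0.92, 0.91, 0.89, 0.93]
mean, std = summarize_runs(accuracies)
print(f"accuracy: {mean:.3f} +/- {std:.3f} over {len(accuracies)} runs")
```

Reporting the spread alongside the mean makes it easier to judge whether a claimed improvement is larger than the run-to-run variance.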



The core parts of this tutorial are recognizing the various sources of randomness that are present in TensorFlow and machine learning models:
<ul>
<li>first, the seed randomness influencing weight creation and updates</li>
<li>next, the GPU randomness sources</li>
<li>then, the randomness of shuffling the dataset</li>
<li>and finally, the effect of execution order on randomness (particularly in notebook scenarios)</li>
</ul>
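
The first two sources in the list above can be addressed from one place. The helper below is a sketch, not a complete recipe: `set_global_seed` is a hypothetical name, and NumPy and TensorFlow are only seeded if they happen to be installed. It uses the same calls that appear in the notebook cells (`np.random.seed`, `tf.random.set_seed`, and the `TF_DETERMINISTIC_OPS` environment variable).

```python
import os
import random

def set_global_seed(seed=42):
    """Seed every random number generator that is available in this environment."""
    # Request deterministic op implementations from TensorFlow where supported.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    # Python's built-in generator.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)       # NumPy's legacy global generator
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)   # TensorFlow's global seed
    except ImportError:
        pass

set_global_seed(42)
```

Calling the helper at the top of every cell that builds or trains a model keeps the result independent of what ran before it.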
   

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
import os
sys.path.append("..") # Adds higher directory to python modules path.
import branchingdnn as branching

In [2]:
import tensorflow as tf
import keras
print(keras.__version__)
print (tf.__version__)
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.preprocessing.sequence import TimeseriesGenerator
# define dataset
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# define generator
n_input = 2
generator = TimeseriesGenerator(series, series, length=n_input, batch_size=8)
# define model
model = Sequential()
model.add(Dense(100, activation='relu', input_dim=n_input))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit_generator(generator, steps_per_epoch=1, epochs=200, verbose=0)
# make a one step prediction out of sample
x_input = array([9, 10]).reshape((1, n_input))
yhat = model.predict(x_input, verbose=0)
print(yhat)

2.3.1
2.2.0
[[11.435026]]


In [None]:
import tensorflow as tf
import keras
import numpy as np

print(keras.__version__)
print (tf.__version__)
print (np.__version__)

from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.preprocessing.sequence import TimeseriesGenerator
# define dataset
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# define generator
n_input = 2
generator = TimeseriesGenerator(series, series, length=n_input, batch_size=8)
# define model
model = Sequential()
model.add(Dense(100, activation='relu', input_dim=n_input))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(generator, steps_per_epoch=1, epochs=200, verbose=0)  # fit_generator is deprecated in newer Keras releases
# make a one step prediction out of sample
x_input = array([9, 10]).reshape((1, n_input))
yhat = model.predict(x_input, verbose=0)
print(yhat)

2.6.0
2.6.2
1.19.5




In [5]:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import os
import time
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()
CLASS_NAMES= ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
###normal method
validation_images, validation_labels = train_images[:5000], train_labels[:5000] #get the first 5k training samples as validation set
train_images, train_labels = train_images[5000:], train_labels[5000:] # now remove the validation set from the training set.
# The test set is used as-is; only the training set gives up samples for validation.

# shuffle() requires a buffer_size; the full training set size gives a uniform shuffle.
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(buffer_size=len(train_images))
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
validation_ds = tf.data.Dataset.from_tensor_slices((validation_images, validation_labels))

In [10]:

root_logdir = os.path.join(os.curdir, "logs", "fit")  # portable across Windows and POSIX

def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)


In [11]:
inputs = keras.Input(shape=(227,227,3))
x = keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(227,227,3))(inputs)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2))(x)
x = keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2))(x)
x = keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Conv2D(filters=384, kernel_size=(1,1), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Conv2D(filters=256, kernel_size=(1,1), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2))(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(4096, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)

# ### first branch
# branchLayer = keras.layers.Flatten(name=tf.compat.v1.get_default_graph().unique_name("branch_flatten"))(x)
# branchLayer = keras.layers.Dense(124, activation="relu",name=tf.compat.v1.get_default_graph().unique_name("branch124"))(branchLayer)
# branchLayer = keras.layers.Dense(64, activation="relu",name=tf.compat.v1.get_default_graph().unique_name("branch64"))(branchLayer)
# branchLayer = keras.layers.Dense(10, name=tf.compat.v1.get_default_graph().unique_name("branch_output"))(branchLayer)

x = keras.layers.Dense(4096, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)

# ### second Branch
# branchLayer2 = keras.layers.Flatten(name=tf.compat.v1.get_default_graph().unique_name("branch_flatten"))(x)
# branchLayer2 = keras.layers.Dense(10, name=tf.compat.v1.get_default_graph().unique_name("branch_output"))(branchLayer2)

x = keras.layers.Dense(10, activation='softmax')(x)



In [None]:
model = keras.Model(inputs=inputs, outputs=[x], name="alexnet")

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.optimizers.SGD(learning_rate=0.001, momentum=0.9), metrics=['accuracy'])
model.summary()

---


In [2]:
import os
import sys
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# Request deterministic op implementations; set this before TensorFlow is imported.
os.environ['TF_DETERMINISTIC_OPS'] = '1'
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sys.path.append("..") # Adds higher directory to python modules path.
import branchingdnn as branching
# dataset = branching.dataset.prepare.dataset(tf.keras.datasets.cifar10.load_data(),64,5000,22500,(227,227),include_targets=False)

In [3]:
# MNIST
from tensorflow.keras import layers, models

In [4]:
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [5]:


def summarize_keras_trainable_variables(model, message):
    s = sum(map(lambda x: x.sum(), model.get_weights()))
    print("summary of trainable variables %s: %.13f" % (message, s))
    return s
# train_ds, validation_ds, test_ds = dataset

for i in range(3):
    seed = 42
    # random.seed(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
    outputs =[]
    inputs = keras.Input(shape=(input_shape))
    x = layers.Flatten(input_shape=(28,28))(inputs)
    x = layers.Dense(512, activation="relu")(x)
    x= layers.Dropout(0.2)(x)
    #exit 2
    x = layers.Dense(512, activation="relu")(x)
    x= layers.Dropout(0.2)(x)
    #exit 3
    x = layers.Dense(512, activation="relu")(x)
    x= layers.Dropout(0.2)(x)
    #exit 4
    x = layers.Dense(512, activation="relu")(x)
    x= layers.Dropout(0.2)(x)
    #exit 5
    x = layers.Dense(512, activation="relu")(x)
    x= layers.Dropout(0.2)(x)
    # exit 1: the main branch exit is referred to as "exit 1" or "main exit" to avoid confusion when adding additional exits
    output1 = layers.Dense(10, name="output1")(x)
    softmax = layers.Softmax()(output1)

    outputs.append(softmax)
    model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model_normal")
    batch_size = 128
    epochs = 1
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    summarize_keras_trainable_variables(model,"before training")
    model.fit(x_train, y_train, shuffle=False,batch_size=batch_size, epochs=epochs, validation_split=0.1)
    summarize_keras_trainable_variables(model,"after training")

summary of trainable variables before training: 19.9035952091217
summary of trainable variables after training: -3558.5371806323528
summary of trainable variables before training: 19.9035952091217
summary of trainable variables after training: -3558.5371806323528
summary of trainable variables before training: 19.9035952091217
summary of trainable variables after training: -3558.5371806323528
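
Summing all weights, as `summarize_keras_trainable_variables` does, is a quick check, but different weights can in principle produce the same sum. A stricter comparison is to hash the raw bytes. The sketch below assumes the weights arrive as a list of NumPy arrays, which is what `model.get_weights()` returns; `weights_fingerprint` is a hypothetical helper and the arrays are stand-ins, not real model weights.

```python
import hashlib
import numpy as np

def weights_fingerprint(weights):
    """Hash a list of NumPy arrays (e.g. from model.get_weights()) into a hex digest."""
    digest = hashlib.sha256()
    for w in weights:
        arr = np.ascontiguousarray(w)
        # Include shape and dtype so arrays with equal bytes but different layouts differ.
        digest.update(str(arr.shape).encode())
        digest.update(str(arr.dtype).encode())
        digest.update(arr.tobytes())
    return digest.hexdigest()

# Stand-in arrays; with a Keras model you would pass model.get_weights().
run_a = [np.array([[0.1, -0.1]], dtype=np.float32), np.zeros(2, dtype=np.float32)]
run_b = [np.array([[-0.1, 0.1]], dtype=np.float32), np.zeros(2, dtype=np.float32)]

# Both stand-ins sum to zero, yet their fingerprints differ.
print(weights_fingerprint(run_a) == weights_fingerprint(run_a))
print(weights_fingerprint(run_a) == weights_fingerprint(run_b))
```

Two runs are bit-for-bit identical only if their fingerprints match, which is a stronger statement than matching sums.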


In [14]:
batch_size = 128
epochs = 1
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)




<keras.callbacks.History at 0x1f0e27b8c50>

In [8]:

# tf.debugging.experimental.enable_dump_debug_info(logdir, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

CLASS_NAMES= ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# import csv
# with open('results/altTrain_labels.csv', newline='') as f:
    # reader = csv.reader(f,quoting=csv.QUOTE_NONNUMERIC)
    # alt_trainLabels = list(reader)
# with open('results/altTest_labels.csv', newline='') as f:
    # reader = csv.reader(f,quoting=csv.QUOTE_NONNUMERIC)
    # alt_testLabels = list(reader)

# altTraining = tf.data.Dataset.from_tensor_slices((train_images,alt_trainLabels))

# validation_images, validation_labels = train_images[:5000], alt_trainLabels[:5000]
# train_ds = tf.data.Dataset.from_tensor_slices((train_images, alt_trainLabels))
# test_ds = tf.data.Dataset.from_tensor_slices((test_images, alt_testLabels))
train_labels = tf.keras.utils.to_categorical(train_labels,10)
test_labels = tf.keras.utils.to_categorical(test_labels,10)

###normal method
validation_images, validation_labels = train_images[:5000], train_labels[:5000] #get the first 5k training samples as validation set
train_images, train_labels = train_images[5000:], train_labels[5000:] # now remove the validation set from the training set.
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
validation_ds = tf.data.Dataset.from_tensor_slices((validation_images, validation_labels))

def augment_images(image, label):
    # Normalize images to have a mean of 0 and standard deviation of 1
    # image = tf.image.per_image_standardization(image)
    # Resize images from 32x32 to 227x227
    image = tf.image.resize(image, (227,227))
    return image, label

# Take the sizes from the source arrays instead of materialising each dataset as a list.
train_ds_size = len(train_images)
test_ds_size = len(test_images)
validation_ds_size = len(validation_images)

train_ds = (train_ds
                  .map(augment_images)
                  .shuffle(buffer_size=train_ds_size,seed=42,reshuffle_each_iteration=False)
                  .batch(batch_size=128, drop_remainder=True))

test_ds = (test_ds
                  .map(augment_images)
                #   .shuffle(buffer_size=train_ds_size)
                  .batch(batch_size=128, drop_remainder=True))

validation_ds = (validation_ds
                  .map(augment_images)
                #   .shuffle(buffer_size=validation_ds_size)
                  .batch(batch_size=128, drop_remainder=True))
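
The training pipeline above passes `seed=42` and `reshuffle_each_iteration=False` to `shuffle`, so the sample order is fixed. The idea can be illustrated in plain Python; this is an analogy under that assumption, not how `tf.data` implements shuffling, and `shuffled_indices` is a hypothetical helper.

```python
import random

def shuffled_indices(n, seed):
    """Return a permutation of range(n) that depends only on n and seed."""
    indices = list(range(n))
    # A private Random instance leaves the global generator's state untouched.
    random.Random(seed).shuffle(indices)
    return indices

first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)

print(first == second)                    # same seed, same order
print(sorted(first) == list(range(10)))   # still a permutation of all samples
```

Because the order depends only on the seed, every run sees the batches in the same sequence.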

In [9]:
inputs = keras.Input(shape=(227,227,3))
x = keras.layers.Conv2D(filters=96, kernel_size=(11,11), strides=(4,4), activation='relu', input_shape=(227,227,3))(inputs)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2))(x)
x = keras.layers.Conv2D(filters=256, kernel_size=(5,5), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2))(x)
x = keras.layers.Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Conv2D(filters=384, kernel_size=(1,1), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Conv2D(filters=256, kernel_size=(1,1), strides=(1,1), activation='relu', padding="same")(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.MaxPool2D(pool_size=(3,3), strides=(2,2))(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(4096, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(4096, activation='relu')(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=[x], name="alexnet")    
model.compile(loss='categorical_crossentropy', optimizer=tf.optimizers.SGD(learning_rate=0.001, momentum=0.9), metrics=['accuracy'])

model.save("distil_test.hdf5")

In [10]:

seed = 42
# random.seed(seed)
tf.random.set_seed(seed)
np.random.seed(seed)

def summarize_keras_trainable_variables(model, message):
    s = sum(map(lambda x: x.sum(), model.get_weights()))
    print("summary of trainable variables %s: %.13f" % (message, s))
    return s
# train_ds, validation_ds, test_ds = dataset


for i in range(3):
    tf.random.set_seed(seed)
    np.random.seed(seed)
    model = tf.keras.models.load_model("distil_test.hdf5")
    summarize_keras_trainable_variables(model,"before training")
    
    model.fit(train_ds, shuffle=False,validation_data = validation_ds,epochs=1)
    summarize_keras_trainable_variables(model,"after training")
    results = model.evaluate(test_ds)
    print(results)

summary of trainable variables before training: 2931.7305606603622
summary of trainable variables after training: 2656064.6382689797319
[1.4122194051742554, 0.48828125]
summary of trainable variables before training: 2931.7305606603622
summary of trainable variables after training: 2656064.6382689797319
[1.4122194051742554, 0.48828125]
summary of trainable variables before training: 2931.7305606603622
summary of trainable variables after training: 2656064.6382689797319
[1.4122194051742554, 0.48828125]
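
The loop above reseeds at the start of every iteration. The reason is that a global generator keeps advancing with each call, so re-running a notebook cell without reseeding draws different numbers. The sketch below shows this with Python's built-in `random` module as a stand-in for the TensorFlow and NumPy generators; `draw` is a hypothetical helper.

```python
import random

def draw(n=3):
    """Draw n numbers from the global generator, advancing its state."""
    return [random.random() for _ in range(n)]

random.seed(42)
first = draw()
second = draw()   # like re-running a cell without reseeding

random.seed(42)
third = draw()    # reseeding restores the original sequence

print(first == second)
print(first == third)
```

In a notebook this means the seed calls belong in the same cell as the code that consumes the randomness, not in a setup cell that runs once.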
