This repository has been archived by the owner on Jun 9, 2021. It is now read-only.

Benchmark: CNN proposal #25

Open
Willian-Zhang opened this issue Nov 20, 2020 · 47 comments

Comments

@Willian-Zhang

Willian-Zhang commented Nov 20, 2020

The following code implements @ylecun LeCun's original CNN architecture, with the Dropout layers commented out due to an issue.

import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds

tf.enable_v2_behavior()

from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')


(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

batch_size = 128

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)


ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)


model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.Conv2D(64, kernel_size=(3, 3),
                 activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#   tf.keras.layers.Dropout(0.25),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
#   tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

model.fit(
    ds_train,
    epochs=12,
    validation_data=ds_test,
)

Packages required to run:

pip install tensorflow_datasets
@Willian-Zhang
Author

Willian-Zhang commented Nov 20, 2020

These are my results on a MacBook Air 2020 M1 8G:

2020-11-20 23:47:18.141957: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 23:47:18.145970: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-20 23:47:18.479186: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/10
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1614 - accuracy: 0.9519/Users/willian/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1613 - accuracy: 0.9519 - val_loss: 0.0449 - val_accuracy: 0.9853
Epoch 2/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0427 - accuracy: 0.9867 - val_loss: 0.0336 - val_accuracy: 0.9885
Epoch 3/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0264 - accuracy: 0.9914 - val_loss: 0.0333 - val_accuracy: 0.9885
Epoch 4/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0167 - accuracy: 0.9946 - val_loss: 0.0393 - val_accuracy: 0.9879
Epoch 5/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0128 - accuracy: 0.9956 - val_loss: 0.0333 - val_accuracy: 0.9890
Epoch 6/10
469/469 [==============================] - 24s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9973 - val_loss: 0.0341 - val_accuracy: 0.9900
Epoch 7/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9975 - val_loss: 0.0379 - val_accuracy: 0.9887
Epoch 8/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9979 - val_loss: 0.0366 - val_accuracy: 0.9906
Epoch 9/10
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9982 - val_loss: 0.0512 - val_accuracy: 0.9859
Epoch 10/10
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9982 - val_loss: 0.0462 - val_accuracy: 0.9884

Key figures:

  • 24s/epoch
  • 48ms/step (batch-size relevant)
  • 98.8% final acc

On my Mac mini 2020 M1 16G:

  • 22s/epoch
  • 45ms/step
  • 98.9% final acc

@ephes

ephes commented Nov 20, 2020

Ran this on my MacBook Pro (16-inch, 2019), 2.3 GHz 8-core Intel Core i9, AMD Radeon Pro 5500M 8GB:

2020-11-20 17:42:23.136427: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-20 17:42:23.318515: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 17:42:24.014368: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1588 - accuracy: 0.9514/Users/jochen/projects/ds_tutorial/mac_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 57s 114ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1588 - accuracy: 0.9514 - val_loss: 0.0479 - val_accuracy: 0.9841
Epoch 2/12
469/469 [==============================] - 56s 116ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0442 - accuracy: 0.9863 - val_loss: 0.0348 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 56s 115ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0277 - accuracy: 0.9913 - val_loss: 0.0393 - val_accuracy: 0.9863
Epoch 4/12
469/469 [==============================] - 56s 115ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0189 - accuracy: 0.9940 - val_loss: 0.0387 - val_accuracy: 0.9876
Epoch 5/12
469/469 [==============================] - 56s 114ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0142 - accuracy: 0.9953 - val_loss: 0.0354 - val_accuracy: 0.9895
Epoch 6/12
469/469 [==============================] - 57s 117ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0092 - accuracy: 0.9970 - val_loss: 0.0407 - val_accuracy: 0.9881
...
real	11m31.063s
user	16m18.586s
sys	4m3.070s
  • 56s/epoch
  • 116ms/step
  • 98.96% final acc

@tranchis

My results with a MacBook Pro M1, 16 GB of RAM:

2020-11-20 21:18:55.599180: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-20 21:18:55.599898: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-20 21:18:55.889178: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1508 - accuracy: 0.9560/Users/sergio/repos/tf-test/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9561 - val_loss: 0.0479 - val_accuracy: 0.9851
Epoch 2/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0421 - accuracy: 0.9868 - val_loss: 0.0383 - val_accuracy: 0.9870
Epoch 3/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0262 - accuracy: 0.9916 - val_loss: 0.0407 - val_accuracy: 0.9874
Epoch 4/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944 - val_loss: 0.0353 - val_accuracy: 0.9868
Epoch 5/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0125 - accuracy: 0.9960 - val_loss: 0.0395 - val_accuracy: 0.9885
Epoch 6/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0094 - accuracy: 0.9971 - val_loss: 0.0393 - val_accuracy: 0.9898
Epoch 7/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0095 - accuracy: 0.9968 - val_loss: 0.0421 - val_accuracy: 0.9887
Epoch 8/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0066 - accuracy: 0.9978 - val_loss: 0.0437 - val_accuracy: 0.9892
Epoch 9/12
469/469 [==============================] - 25s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9982 - val_loss: 0.0437 - val_accuracy: 0.9897
Epoch 10/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9984 - val_loss: 0.0510 - val_accuracy: 0.9879
Epoch 11/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9986 - val_loss: 0.0401 - val_accuracy: 0.9912
Epoch 12/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9983 - val_loss: 0.0472 - val_accuracy: 0.9901
  • 23s/epoch
  • 46ms/step
  • 99.83% final training acc (99.01% val acc)

One thing to note is that there must be a bottleneck somewhere. I was monitoring the GPU usage in Activity Monitor and it never went above 60%.
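When comparing runs at different batch sizes, ms/step alone is misleading; throughput (batch_size divided by step time) is the comparable figure. A minimal stdlib sketch, checked against numbers quoted in this thread (46 ms/step at batch 128, and 320 ms/step at batch 1250 reported further down):

```python
def images_per_second(batch_size, ms_per_step):
    """Throughput implied by a Keras progress-bar step time."""
    return batch_size / (ms_per_step / 1000.0)

# Figures quoted in this thread (M1, MNIST):
print(round(images_per_second(128, 46)))    # → 2783 img/s at batch 128
print(round(images_per_second(1250, 320)))  # → 3906 img/s at batch 1250
```

This is why larger batches shorten the epoch even though each step takes longer.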

@anna-tikhonova
Collaborator

@Willian-Zhang Thank you for providing a reproducible test case. We will take a look.

@rnogy

rnogy commented Nov 21, 2020

MacBook Pro (13-inch, 2017), i5, 8GB, Intel Iris 640

Apple-compiled TensorFlow:

Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1539 - accuracy: 0.9537/Users/corgi/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 108s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1539 - accuracy: 0.9537 - val_loss: 0.0472 - val_accuracy: 0.9849
Epoch 2/12
469/469 [==============================] - 101s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0406 - accuracy: 0.9875 - val_loss: 0.0408 - val_accuracy: 0.9863
Epoch 3/12
469/469 [==============================] - 98s 201ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0261 - accuracy: 0.9922 - val_loss: 0.0427 - val_accuracy: 0.9873
Epoch 4/12
469/469 [==============================] - 100s 204ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0169 - accuracy: 0.9945 - val_loss: 0.0293 - val_accuracy: 0.9905
Epoch 5/12
469/469 [==============================] - 98s 202ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0120 - accuracy: 0.9963 - val_loss: 0.0332 - val_accuracy: 0.9902
Epoch 6/12
469/469 [==============================] - 98s 201ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0097 - accuracy: 0.9970 - val_loss: 0.0361 - val_accuracy: 0.9898
Epoch 7/12
469/469 [==============================] - 99s 203ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0088 - accuracy: 0.9971 - val_loss: 0.0409 - val_accuracy: 0.9880
Epoch 8/12
469/469 [==============================] - 99s 202ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9983 - val_loss: 0.0387 - val_accuracy: 0.9886
Epoch 9/12
469/469 [==============================] - 97s 200ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0411 - val_accuracy: 0.9888
Epoch 10/12
469/469 [==============================] - 99s 203ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0493 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 101s 206ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9983 - val_loss: 0.0446 - val_accuracy: 0.9892
Epoch 12/12
469/469 [==============================] - 100s 205ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9985 - val_loss: 0.0440 - val_accuracy: 0.9891
  • 100s/epoch
  • ~203ms/step
  • 99.85% final training acc (98.91% val acc)

pip version (tf 2.3.1)

Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1640 - accuracy: 0.9506WARNING:tensorflow:From /Users/corgi/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /Users/corgi/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
469/469 [==============================] - 67s 143ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1640 - accuracy: 0.9506 - val_loss: 0.0571 - val_accuracy: 0.9810
Epoch 2/12
469/469 [==============================] - 63s 134ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0431 - accuracy: 0.9868 - val_loss: 0.0397 - val_accuracy: 0.9864
Epoch 3/12
469/469 [==============================] - 57s 122ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0266 - accuracy: 0.9916 - val_loss: 0.0361 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 57s 122ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0186 - accuracy: 0.9940 - val_loss: 0.0351 - val_accuracy: 0.9895
Epoch 5/12
469/469 [==============================] - 56s 120ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0130 - accuracy: 0.9959 - val_loss: 0.0396 - val_accuracy: 0.9886
Epoch 6/12
469/469 [==============================] - 57s 121ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0097 - accuracy: 0.9967 - val_loss: 0.0392 - val_accuracy: 0.9880
Epoch 7/12
469/469 [==============================] - 59s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0083 - accuracy: 0.9970 - val_loss: 0.0376 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 59s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9978 - val_loss: 0.0423 - val_accuracy: 0.9880
Epoch 9/12
469/469 [==============================] - 56s 119ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9982 - val_loss: 0.0357 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 57s 121ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9981 - val_loss: 0.0378 - val_accuracy: 0.9902
Epoch 11/12
469/469 [==============================] - 56s 119ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0029 - accuracy: 0.9990 - val_loss: 0.0383 - val_accuracy: 0.9910
Epoch 12/12
469/469 [==============================] - 58s 124ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9985 - val_loss: 0.0435 - val_accuracy: 0.9903
  • 59s/epoch
  • 136ms/step
  • 99.85% final training acc (99.03% val acc)

TF compiled with the FMA, AVX, AVX2, SSE4.1, SSE4.2 flags; wheel from https://github.com/lakshayg/tensorflow-build:

Epoch 1/12
469/469 [==============================] - ETA: 0s - loss: 0.1570 - accuracy: 0.95272020-11-21 01:23:29.984485: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 64s 135ms/step - loss: 0.1570 - accuracy: 0.9527 - val_loss: 0.0511 - val_accuracy: 0.9836
Epoch 2/12
469/469 [==============================] - ETA: 0s - loss: 0.0425 - accuracy: 0.98662020-11-21 01:24:41.347821: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 67s 142ms/step - loss: 0.0425 - accuracy: 0.9866 - val_loss: 0.0405 - val_accuracy: 0.9867
Epoch 3/12
469/469 [==============================] - ETA: 0s - loss: 0.0274 - accuracy: 0.99152020-11-21 01:25:55.016136: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 68s 145ms/step - loss: 0.0274 - accuracy: 0.9915 - val_loss: 0.0339 - val_accuracy: 0.9886
...
Epoch 11/12
469/469 [==============================] - ETA: 0s - loss: 0.0034 - accuracy: 0.99892020-11-21 01:34:52.652276: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
469/469 [==============================] - 64s 137ms/step - loss: 0.0034 - accuracy: 0.9989 - val_loss: 0.0429 - val_accuracy: 0.9910
Epoch 12/12
469/469 [==============================] - 61s 129ms/step - loss: 0.0034 - accuracy: 0.9988 - val_loss: 0.0515 - val_accuracy: 0.9893
  • 63s/epoch
  • ~137ms/step
  • 99.88% final training acc (98.93% val acc)

It's interesting to see that Apple's optimized version of TensorFlow is slower than the pip version. Looking at the warning

I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA

I think the performance loss has to do either with Intel's oneAPI or with x86 instruction-set support. I tried the community-compiled binaries that support FMA, AVX, AVX2, SSE4.1, and SSE4.2 to see whether instruction support was the cause, but that run throws a cache warning (due to exhausted data; why only this run?; Batch size -> 118?). Anyhow, it would be nice if Apple provided more documentation about their version of TF. Please let me know if I am the only one finding tensorflow-macos slower than pip tensorflow (-> request for documentation / request for instruction-set support?).

@dkgaraujo

Running Apple's Mac-optimized TensorFlow on a 2019 16-inch MacBook Pro with an AMD Radeon Pro 5500M:

[Screenshot 2020-11-21 at 12 33 40]

  • ~60-61s/epoch
  • ~126ms/step
  • 99.07% validation accuracy

And here is the GPU performance after the first epochs have started.

[Screenshot 2020-11-21 at 12 17 38]

I suspect the slack in the GPU is due to the comparatively low batch size relative to the GPU memory capacity. When I change batch_size to 500, the results are as follows:

[Screenshot 2020-11-21 at 12 52 04]

With the following GPU usage:

[Screenshot 2020-11-21 at 12 51 50]

Note that each epoch now takes 27s, less than half the time taken with batch_size=128. I think this illustrates that each combination of backend, GPU, and dataset has a batch size that optimizes speed; it's up to the analyst to find it (perhaps by running single-epoch iterations to check speed at different settings).
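The single-epoch sweep suggested above can be sketched as a small harness. This is a minimal, hedged sketch: `run_epoch` stands in for a `model.fit(..., epochs=1)` call on a dataset rebatched at each candidate size, and `fake_epoch` is a hypothetical workload (not from this thread) in which fixed per-step overhead dominates at small batch sizes, mimicking the GPU under-utilisation reported here:

```python
import time

def sweep_batch_sizes(run_epoch, batch_sizes):
    """Time one epoch per candidate batch size; return (seconds, batch_size)
    pairs sorted fastest-first."""
    timings = []
    for bs in batch_sizes:
        start = time.perf_counter()
        run_epoch(bs)
        timings.append((time.perf_counter() - start, bs))
    return sorted(timings)

# Hypothetical stand-in workload: per-step overhead dominates, so larger
# batches (fewer steps) finish the epoch sooner.
def fake_epoch(batch_size, n=60_000, step_cost=1e-4, example_cost=1e-9):
    steps = -(-n // batch_size)  # ceil division, as tf.data would batch
    time.sleep(steps * step_cost + n * example_cost)

best_time, best_bs = sweep_batch_sizes(fake_epoch, [128, 512, 2048])[0]
print(best_bs)  # → 2048 under this stand-in cost model
```

In a real sweep, `run_epoch` would rebuild `ds_train` with `.batch(bs)` and call `model.fit` for one epoch.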

@anhornsby

To echo @dkgaraujo, I can run this at around 24s per epoch on a MacBook Pro 16" 2019 with Radeon Pro 5300M if I increase the batch size (e.g., batch_size = 1250). This is about 10s quicker per epoch compared to CPU and comparable to the M1 benchmarks posted above.

With low batch sizes (e.g. 128), GPU performance is comparable or slower vs CPU.

@Willian-Zhang
Author

Willian-Zhang commented Nov 21, 2020

@anhornsby with batch_size = 1250 (Train on 48 steps, validate on 8 steps)

on MacBook Air 2020 m1 8G, I get:

  • 17s/epoch
  • 320ms/step (batch-size relevant)
  • 98.7% final acc

on Mac mini 2020 m1 16G:

  • 12s/epoch
  • 220ms/step (batch-size relevant)
  • 99.0% final acc
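The step counts in these logs follow directly from the split sizes: MNIST has 60,000 training and 10,000 test examples, and `Dataset.batch` keeps the final partial batch by default, so steps per epoch is the ceiling of examples/batch_size. A quick stdlib check against the figures quoted in this thread:

```python
import math

def steps_per_epoch(n_examples, batch_size):
    """Batches per epoch when Dataset.batch keeps the last partial batch
    (its default, drop_remainder=False)."""
    return math.ceil(n_examples / batch_size)

print(steps_per_epoch(60_000, 128))   # → 469 train steps
print(steps_per_epoch(10_000, 128))   # → 79 validation steps
print(steps_per_epoch(60_000, 1250))  # → 48 train steps
print(steps_per_epoch(10_000, 1250))  # → 8 validation steps
```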

@robin7g

robin7g commented Nov 21, 2020

Results on my Mac Mini 2020 m1 16G.

GPU = 22s per epoch , CPU = 17s per epoch , Any = 28s per epoch (weird!)

Best results came from commenting out both the line that disables eager execution and the line that selects the GPU; leave these unset and I get the best results.

python3 cnn.py
Epoch 1/12
2020-11-21 17:27:02.971440: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-21 17:27:02.972299: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
469/469 [==============================] - 17s 34ms/step - loss: 0.3564 - accuracy: 0.8921 - val_loss: 0.0479 - val_accuracy: 0.9834
Epoch 2/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0488 - accuracy: 0.9857 - val_loss: 0.0395 - val_accuracy: 0.9868
Epoch 3/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0270 - accuracy: 0.9917 - val_loss: 0.0383 - val_accuracy: 0.9875
Epoch 4/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0182 - accuracy: 0.9946 - val_loss: 0.0347 - val_accuracy: 0.9889
Epoch 5/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0120 - accuracy: 0.9959 - val_loss: 0.0390 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0097 - accuracy: 0.9972 - val_loss: 0.0359 - val_accuracy: 0.9891
Epoch 7/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0072 - accuracy: 0.9976 - val_loss: 0.0387 - val_accuracy: 0.9886
Epoch 8/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0047 - accuracy: 0.9986 - val_loss: 0.0341 - val_accuracy: 0.9911
Epoch 9/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0043 - accuracy: 0.9985 - val_loss: 0.0450 - val_accuracy: 0.9890
Epoch 10/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0076 - accuracy: 0.9974 - val_loss: 0.0460 - val_accuracy: 0.9882
Epoch 11/12
469/469 [==============================] - 15s 33ms/step - loss: 0.0030 - accuracy: 0.9991 - val_loss: 0.0446 - val_accuracy: 0.9891
Epoch 12/12
469/469 [==============================] - 16s 33ms/step - loss: 0.0049 - accuracy: 0.9983 - val_loss: 0.0518 - val_accuracy: 0.9881

Highlights are:

  • 15s/epoch
  • 33ms/step (original batch size)
  • 98.8% final accuracy

@dkgaraujo

> To echo @dkgaraujo, I can run this at around 24s per epoch on a MacBook Pro 16" 2019 with Radeon Pro 5300M if I increase the batch size (e.g., batch_size = 1250). This is about 10s quicker per epoch compared to CPU and comparable to the M1 benchmarks posted above.
>
> With low batch sizes (e.g. 128), GPU performance is comparable or slower vs CPU.

Some more results:

I ran the same code as before, but with batch_size = 2000. With the GPU I had 20s/epoch, compared to the CPU with 85s/epoch.

@anhornsby

> Results on my Mac Mini 2020 m1 16G.
>
> GPU = 22s per epoch, CPU = 17s per epoch, Any = 28s per epoch (weird!)
>
> Best results were from commenting out the code that disables eager execution and also the code that selects GPU.
>
> Highlights are: 15s/epoch, 33ms/step (original batch size), 98.8% final accuracy.

Commenting out the line that disables eager execution seems helpful. 20s per epoch with batch_size = 1500.

@danielmbradley

danielmbradley commented Nov 22, 2020

> Results on my Mac Mini 2020 m1 16G.
>
> GPU = 22s per epoch, CPU = 17s per epoch, Any = 28s per epoch (weird!)
>
> Best results were from commenting out the code that disables eager execution and also the code that selects GPU.
>
> Highlights are: 15s/epoch, 33ms/step (original batch size), 98.8% final accuracy.
>
> Commenting out the line that disables eager execution seems helpful. 20s per epoch with batch_size = 1500.

Interestingly, when I removed the line that disables eager execution, my system just ended up hanging. Did you change anything else other than commenting that out, @anhornsby?

@anhornsby

@danielmbradley nope, same code as above, using the recommended virtualenv

@DVS70

DVS70 commented Nov 22, 2020

MacBook Pro M1, 16 GB of RAM
Standard tf installation with venv, run from the terminal, no other significant processes running:

batch size 128: 23s/epoch, 45ms/step, 98.98% final accuracy, GPU ~55%
batch size 256: 15s/epoch, 59ms/step, 99.11% final accuracy, GPU ~65%
batch size 512: 13s/epoch, 98ms/step, 99.01% final accuracy, GPU ~75%
batch size 1024: 12s/epoch, 180ms/step, 98.99% final accuracy, GPU ~80%
batch size 1280: 12s/epoch, 227ms/step, 98.86% final accuracy, GPU ~83%
batch size 2048: 13s/epoch, 375ms/step, 98.76% final accuracy, GPU ~88%
batch size 4096: 15s/epoch, 890ms/step, 98.57% final accuracy, GPU up to 90%

@danielmbradley

@anhornsby Interesting; there must be some difference in how eager execution is implemented between Intel Macs and M1 Macs, because mine completely falls over when that line is missing. I did find that increasing the batch size significantly increased processing speed, though (oddly, the time printed in the terminal was wrong once it hit 22 seconds).

@VictorownzuA11
Copy link

VictorownzuA11 commented Nov 24, 2020

Just for fun I wanted to try running this on a Windows 10 Laptop with a mobile 1060 (6G) and i7-7700HQ, 16GB RAM:

batch_size = 128

469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1642 - accuracy: 0.9517 - val_loss: 0.0566 - val_accuracy: 0.9817
Epoch 2/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9865 - val_loss: 0.0368 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0281 - accuracy: 0.9908 - val_loss: 0.0357 - val_accuracy: 0.9880
Epoch 4/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0179 - accuracy: 0.9941 - val_loss: 0.0335 - val_accuracy: 0.9893
Epoch 5/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0133 - accuracy: 0.9956 - val_loss: 0.0405 - val_accuracy: 0.9878
Epoch 6/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0095 - accuracy: 0.9968 - val_loss: 0.0305 - val_accuracy: 0.9912
Epoch 7/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9973 - val_loss: 0.0373 - val_accuracy: 0.9896
Epoch 8/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9972 - val_loss: 0.0443 - val_accuracy: 0.9877
Epoch 9/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9982 - val_loss: 0.0397 - val_accuracy: 0.9894
Epoch 10/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0487 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9985 - val_loss: 0.0502 - val_accuracy: 0.9866
Epoch 12/12
469/469 [==============================] - 5s 11ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9984 - val_loss: 0.0426 - val_accuracy: 0.9896

Highlights are:
5s/epoch
11ms/step (original batch size)
98.96% final accuracy

batch_size = 1250

48/48 [==============================] - 4s 78ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.5046 - accuracy: 0.8650 - val_loss: 0.1678 - val_accuracy: 0.9517
Epoch 2/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.1180 - accuracy: 0.9659 - val_loss: 0.0735 - val_accuracy: 0.9778
Epoch 3/12
48/48 [==============================] - 4s 76ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0654 - accuracy: 0.9811 - val_loss: 0.0520 - val_accuracy: 0.9828
Epoch 4/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0463 - accuracy: 0.9866 - val_loss: 0.0465 - val_accuracy: 0.9847
Epoch 5/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0388 - accuracy: 0.9882 - val_loss: 0.0448 - val_accuracy: 0.9852
Epoch 6/12
48/48 [==============================] - 4s 76ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0324 - accuracy: 0.9905 - val_loss: 0.0399 - val_accuracy: 0.9868
Epoch 7/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0257 - accuracy: 0.9924 - val_loss: 0.0373 - val_accuracy: 0.9885
Epoch 8/12
48/48 [==============================] - 4s 78ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0209 - accuracy: 0.9942 - val_loss: 0.0387 - val_accuracy: 0.9882
Epoch 9/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0174 - accuracy: 0.9950 - val_loss: 0.0368 - val_accuracy: 0.9883
Epoch 10/12
48/48 [==============================] - 4s 77ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0152 - accuracy: 0.9955 - val_loss: 0.0379 - val_accuracy: 0.9887
Epoch 11/12
48/48 [==============================] - 3s 72ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0124 - accuracy: 0.9964 - val_loss: 0.0397 - val_accuracy: 0.9880
Epoch 12/12
48/48 [==============================] - 3s 71ms/step - batch: 23.5000 - size: 1.0000 - loss: 0.0096 - accuracy: 0.9974 - val_loss: 0.0394 - val_accuracy: 0.9885

Highlights are:
3-4s/epoch
71-78ms/step
98.85% final accuracy

batch_size = 4096

Highlights are:
3s/epoch
200ms/step
98.58% final accuracy
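As a quick consistency check, each reported epoch time should roughly equal steps-per-epoch (ceil of 60,000 / batch size) times the per-step time, plus pipeline overhead. A sketch under that assumption (the helper name is mine):

```python
import math

def predicted_epoch_seconds(batch_size, ms_per_step, n_train=60_000):
    """Epoch time implied by a per-step timing (ignores data-pipeline overhead)."""
    steps = math.ceil(n_train / batch_size)
    return steps * ms_per_step / 1000

# The three 1060 configurations reported above:
assert round(predicted_epoch_seconds(128, 11)) == 5    # ~5 s/epoch
assert round(predicted_epoch_seconds(1250, 75)) == 4   # ~3-4 s/epoch
assert round(predicted_epoch_seconds(4096, 200)) == 3  # ~3 s/epoch
```

The predictions line up with the reported epoch times, so the per-step numbers are internally consistent.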

@SpaceMonkeyForever
Copy link

SpaceMonkeyForever commented Nov 24, 2020

MacBook Air 2020 M1 with 16 GB - Same as others with an M1 MacBook

2020-11-24 21:24:52.855304: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-24 21:24:52.856412: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-24 21:24:53.156975: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1565 - accuracy: 0.9534/Users/spacemonkey/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 26s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1563 - accuracy: 0.9535 - val_loss: 0.0468 - val_accuracy: 0.9847
Epoch 2/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0437 - accuracy: 0.9865 - val_loss: 0.0381 - val_accuracy: 0.9871
Epoch 3/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0277 - accuracy: 0.9912 - val_loss: 0.0390 - val_accuracy: 0.9879
Epoch 4/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0174 - accuracy: 0.9947 - val_loss: 0.0370 - val_accuracy: 0.9865
Epoch 5/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0123 - accuracy: 0.9961 - val_loss: 0.0399 - val_accuracy: 0.9873
Epoch 6/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0099 - accuracy: 0.9966 - val_loss: 0.0379 - val_accuracy: 0.9889
Epoch 7/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0086 - accuracy: 0.9971 - val_loss: 0.0417 - val_accuracy: 0.9878
Epoch 8/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9980 - val_loss: 0.0412 - val_accuracy: 0.9892
Epoch 9/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9984 - val_loss: 0.0411 - val_accuracy: 0.9904
Epoch 10/12
469/469 [==============================] - 25s 50ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9983 - val_loss: 0.0559 - val_accuracy: 0.9868
Epoch 11/12
469/469 [==============================] - 24s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9988 - val_loss: 0.0417 - val_accuracy: 0.9897
Epoch 12/12
469/469 [==============================] - 25s 49ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0448 - val_accuracy: 0.9893

@SpaceMonkeyForever
Copy link

SpaceMonkeyForever commented Nov 25, 2020

Windows, GeForce GTX 1080 Ti, Intel i7-5820K, using tensorflow-gpu 2.3.1

I had to comment out these lines:

from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
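To run the same script on both the Apple fork and stock CUDA builds, the ML Compute pinning can be guarded instead of commented out. A sketch (the helper name is mine; on stock TensorFlow builds, where the Apple-only module does not exist, it returns False and does nothing):

```python
def pin_mlc_device(device_name="gpu"):
    """Pin training to an ML Compute device on Apple's TensorFlow fork.

    On stock TensorFlow builds (e.g. CUDA on Windows/Linux) the
    mlcompute module is absent, so this becomes a no-op.
    """
    try:
        from tensorflow.python.compiler.mlcompute import mlcompute
    except ImportError:
        return False
    mlcompute.set_mlc_device(device_name=device_name)
    return True
```

With this guard in place, no lines need commenting when moving between machines.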

Results:
Batch size = 128
2s/epoch
5ms/step
val_accuracy: 0.9870

Log:

2020-11-25 00:22:53.068167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:53.076410: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:53.093385: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:53.110747: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:53.119427: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:53.139974: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:53.149363: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:53.185160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:53.188810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:53.192451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:53.202681: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:53.208791: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:53.212913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:53.218933: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:53.223650: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:53.229800: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:53.233966: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:53.239978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:53.907505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 00:22:53.911316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-11-25 00:22:53.914206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-11-25 00:22:53.919438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-11-25 00:22:53.930196: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14efa539c30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 00:22:53.935222: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-25 00:22:54.154381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 00:22:54.162717: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 00:22:54.169072: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:54.173108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 00:22:54.179071: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 00:22:54.183115: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 00:22:54.189077: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 00:22:54.193209: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:54.199286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 00:22:54.202657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 00:22:54.208740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2020-11-25 00:22:54.211523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2020-11-25 00:22:54.214418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
Train on 469 steps, validate on 79 steps
Epoch 1/12
2020-11-25 00:22:55.528873: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 00:22:57.078128: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 00:22:57.840244: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1612 - accuracy: 0.9516WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1612 - accuracy: 0.9516 - val_loss: 0.0503 - val_accuracy: 0.9850
Epoch 2/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0382 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0295 - accuracy: 0.9905 - val_loss: 0.0416 - val_accuracy: 0.9851
Epoch 4/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0205 - accuracy: 0.9932 - val_loss: 0.0342 - val_accuracy: 0.9889
Epoch 5/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0138 - accuracy: 0.9955 - val_loss: 0.0373 - val_accuracy: 0.9885
Epoch 6/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0103 - accuracy: 0.9967 - val_loss: 0.0395 - val_accuracy: 0.9881
Epoch 7/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9970 - val_loss: 0.0372 - val_accuracy: 0.9887
Epoch 8/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0072 - accuracy: 0.9977 - val_loss: 0.0389 - val_accuracy: 0.9897
Epoch 9/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0066 - accuracy: 0.9980 - val_loss: 0.0419 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9984 - val_loss: 0.0439 - val_accuracy: 0.9891
Epoch 11/12
469/469 [==============================] - 2s 5ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9989 - val_loss: 0.0503 - val_accuracy: 0.9889
Epoch 12/12
469/469 [==============================] - 2s 4ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9984 - val_loss: 0.0605 - val_accuracy: 0.9870

I ran again with batch size = 512 since I have a lot of memory on this GPU.

Results:
1s/epoch
12ms/step
val_accuracy: 0.9905

Log:

2020-11-25 10:32:51.405457: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.188682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-11-25 10:32:54.219663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:54.219952: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.224344: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:54.228633: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:54.230251: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:54.235019: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:54.237543: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:54.247529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:54.247745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:54.248102: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-25 10:32:54.257534: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x17097fc7d50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:32:54.257730: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-25 10:32:54.258020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:54.258307: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:54.258451: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:54.258593: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:54.258735: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:54.258879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:54.259021: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:54.259161: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:54.259371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:54.873392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:32:54.873552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-25 10:32:54.873646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-25 10:32:54.873964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-11-25 10:32:54.876885: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x170bb22d130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-25 10:32:54.877077: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-25 10:32:55.083716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-25 10:32:55.084009: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-25 10:32:55.084150: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:55.084287: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 10:32:55.084425: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 10:32:55.084567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-25 10:32:55.084708: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-25 10:32:55.084846: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:55.085016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 10:32:55.085171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 10:32:55.085319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-25 10:32:55.085408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-25 10:32:55.085599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8678 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
Train on 118 steps, validate on 20 steps
Epoch 1/12
2020-11-25 10:32:56.377688: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-25 10:32:57.797291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-25 10:32:58.516367: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
118/118 [==============================] - ETA: 0s - batch: 58.5000 - size: 1.0000 - loss: 0.3163 - accuracy: 0.9059WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From D:\Code\CondaEnvs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
118/118 [==============================] - 2s 14ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.3163 - accuracy: 0.9059 - val_loss: 0.0891 - val_accuracy: 0.9738
Epoch 2/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0695 - accuracy: 0.9798 - val_loss: 0.0621 - val_accuracy: 0.9800
Epoch 3/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0455 - accuracy: 0.9863 - val_loss: 0.0431 - val_accuracy: 0.9861
Epoch 4/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0331 - accuracy: 0.9897 - val_loss: 0.0386 - val_accuracy: 0.9876
Epoch 5/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0259 - accuracy: 0.9922 - val_loss: 0.0322 - val_accuracy: 0.9890
Epoch 6/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0203 - accuracy: 0.9937 - val_loss: 0.0329 - val_accuracy: 0.9895
Epoch 7/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0165 - accuracy: 0.9952 - val_loss: 0.0364 - val_accuracy: 0.9880
Epoch 8/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0126 - accuracy: 0.9960 - val_loss: 0.0303 - val_accuracy: 0.9909
Epoch 9/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0089 - accuracy: 0.9976 - val_loss: 0.0364 - val_accuracy: 0.9893
Epoch 10/12
118/118 [==============================] - 1s 13ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0082 - accuracy: 0.9976 - val_loss: 0.0357 - val_accuracy: 0.9900
Epoch 11/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9987 - val_loss: 0.0395 - val_accuracy: 0.9892
Epoch 12/12
118/118 [==============================] - 1s 12ms/step - batch: 58.5000 - size: 1.0000 - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0377 - val_accuracy: 0.9905

@SpaceMonkeyForever
Copy link

Question please:

Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

@lightb0x
Copy link

tested on ubuntu 20.04.1, rtx 3070, tensorflow container 20.11-tf2-py3

| batch | s/epoch | ms/step | acc.   | gpu-util (%) |
|-------|---------|---------|--------|--------------|
| 128   | 2       | 3-4     | 0.9884 | 73-75        |
| 256   | 1       | 5-6     | 0.9881 | 82           |
| 512   | 1       | 10-11   | 0.9881 | 87           |
| 1024  | 1       | 19-20   | 0.9889 | 92           |
| 1280  | 1       | 24-30   | 0.9880 | 94           |
| 2048  | 1       | 37-40   | 0.9883 | 95           |
| 4096  | 9->1    | 620->65 | 0.9872 | 97           |

batch size=4096 took longer on the first 3 epochs: 9, 5, and 2 seconds respectively (620, 363, and 90 ms per step).

@sidagrawal
Copy link

Tested on a MacBook Pro (13-inch, M1, 2020) with 8 GB RAM

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9523/Users/sidagrawal/MachineLearning/env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 106s 220ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9523 - val_loss: 0.0538 - val_accuracy: 0.9827
Epoch 2/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9863 - val_loss: 0.0388 - val_accuracy: 0.9874
Epoch 3/12
469/469 [==============================] - 103s 217ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0271 - accuracy: 0.9917 - val_loss: 0.0362 - val_accuracy: 0.9879
Epoch 4/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0170 - accuracy: 0.9950 - val_loss: 0.0300 - val_accuracy: 0.9897
Epoch 5/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0133 - accuracy: 0.9959 - val_loss: 0.0369 - val_accuracy: 0.9892
Epoch 6/12
469/469 [==============================] - 104s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0089 - accuracy: 0.9971 - val_loss: 0.0393 - val_accuracy: 0.9890
Epoch 7/12
469/469 [==============================] - 105s 219ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0068 - accuracy: 0.9977 - val_loss: 0.0474 - val_accuracy: 0.9867
Epoch 8/12
469/469 [==============================] - 105s 221ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9977 - val_loss: 0.0374 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9983 - val_loss: 0.0376 - val_accuracy: 0.9898
Epoch 10/12
469/469 [==============================] - 103s 216ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9983 - val_loss: 0.0493 - val_accuracy: 0.9888
Epoch 11/12
469/469 [==============================] - 104s 218ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0389 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 105s 220ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9990 - val_loss: 0.0424 - val_accuracy: 0.9904

No changes to the script:
105 s/epoch
220ms/step
99.04% final accuracy

Not sure why my numbers aren't comparable to the other M1 numbers.

@astrowonk
Copy link

astrowonk commented Nov 26, 2020

tested on ubuntu 20.04.1, rtx 3070, tensorflow container 20.11-tf2-py3
| batch | s/epoch | ms/step | acc.   | gpu-util (%) |
|-------|---------|---------|--------|--------------|
| 128   | 2       | 3-4     | 0.9884 | 73-75        |
| 256   | 1       | 5-6     | 0.9881 | 82           |
| 512   | 1       | 10-11   | 0.9881 | 87           |
| 1024  | 1       | 19-20   | 0.9889 | 92           |
| 1280  | 1       | 24-30   | 0.9880 | 94           |
| 2048  | 1       | 37-40   | 0.9883 | 95           |
| 4096  | 9->1    | 620->65 | 0.9872 | 97           |

batch size=4096 took longer on the first 3 epochs: 9, 5, and 2 seconds respectively (620, 363, and 90 ms per step).

A 3070 running TensorFlow, how did you do it? I thought you needed CUDA 11 on a 3070 and that there were problems with CUDA 11 and the nightly. I guess the difference is Windows vs Ubuntu.

One thing I hope is that, with support for Apple's ML Compute, this fork "just works" with faster/better Apple Silicon as the M series of Apple chips evolves, rather than needing an endless series of patches. The CUDA/cuDNN install dance on Windows never fails to thwart me.

@astrowonk

Question please:

Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate inference/prediction with trained CoreML models; as far as I can tell, it's not used in training. There doesn't seem to be any API to use it other than CoreML.

@SpaceMonkeyForever

Question please:
Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.

Oh, I didn't think of that. Do you have any source on this?

@danielmbradley

Question please:
Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.

Oh, I didn't think of that. Do you have any source on this?

I'm not sure how true that is. I've never had any issue with speed when making predictions on non-ML-specific hardware; it's always been the training that's been slow.

@astrowonk

astrowonk commented Nov 26, 2020

Question please:
Is there a way to get this script to use the "Neural Engine" i.e. dedicated ML hardware instead? That is way more interesting to benchmark.

I believe the Neural Engine is designed to accelerate trained CoreML models inference/prediction, as far as I can tell it's not used in training? There doesn't seem to be any API to use it other than CoreML.

Oh, I didn't think of that. Do you have any source on this?

I'm not sure how true that is? I've never had any issue with speed when making predictions on non-ML specific hardware, it's always been the training that's been slow

Information on the Neural Engine isn't great. CoreML is definitely a way to run trained models on device. This repo talks about what we know about the Neural Engine.

The impressive speedup of the super-resolution scaling in Pixelmator 2 cites the Neural Engine as helping on M1 Macs.

It's notable that the writeup on this branch of TensorFlow talks about using ML Compute to speed up training by using the CPU and GPU, but doesn't mention the Neural Engine itself. It would be great if we could use it to train! Perhaps that's coming someday?

@BrentOeyen-CA

MacBook Pro, 16 GB RAM, 500 GB HD; same script, but without disabling eager execution

Epoch 1/12
2020-11-27 00:02:50.544598: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-27 00:02:50.545510: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
469/469 [==============================] - 18s 35ms/step - loss: 0.3663 - accuracy: 0.8887 - val_loss: 0.0470 - val_accuracy: 0.9846
Epoch 2/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0449 - accuracy: 0.9865 - val_loss: 0.0438 - val_accuracy: 0.9844
Epoch 3/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss: 0.0314 - val_accuracy: 0.9885
Epoch 4/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0177 - accuracy: 0.9949 - val_loss: 0.0361 - val_accuracy: 0.9884
Epoch 5/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0108 - accuracy: 0.9965 - val_loss: 0.0310 - val_accuracy: 0.9903
Epoch 6/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0081 - accuracy: 0.9976 - val_loss: 0.0311 - val_accuracy: 0.9905
Epoch 7/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0069 - accuracy: 0.9977 - val_loss: 0.0441 - val_accuracy: 0.9880
Epoch 8/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0051 - accuracy: 0.9982 - val_loss: 0.0352 - val_accuracy: 0.9902
Epoch 9/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0056 - accuracy: 0.9981 - val_loss: 0.0371 - val_accuracy: 0.9901
Epoch 10/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0035 - accuracy: 0.9987 - val_loss: 0.0349 - val_accuracy: 0.9905
Epoch 11/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0035 - accuracy: 0.9990 - val_loss: 0.0381 - val_accuracy: 0.9895
Epoch 12/12
469/469 [==============================] - 16s 34ms/step - loss: 0.0044 - accuracy: 0.9987 - val_loss: 0.0401 - val_accuracy: 0.9901

@lightb0x

Tested on Ubuntu 20.04.1, RTX 3070, TensorFlow container 20.11-tf2-py3

| batch | s/epoch | ms/step | acc.   | GPU util (%) |
|------:|--------:|--------:|-------:|-------------:|
| 128   | 2       | 3-4     | 0.9884 | 73-75        |
| 256   | 1       | 5-6     | 0.9881 | 82           |
| 512   | 1       | 10-11   | 0.9881 | 87           |
| 1024  | 1       | 19-20   | 0.9889 | 92           |
| 1280  | 1       | 24-30   | 0.9880 | 94           |
| 2048  | 1       | 37-40   | 0.9883 | 95           |
| 4096  | 9->1    | 620->65 | 0.9872 | 97           |

batch size=4096 took longer on the first 3 epochs, taking 9, 5, and 2 seconds per epoch (620, 363, and 90 ms/step) respectively

A 3070 running TensorFlow: how did you do it? I thought you needed CUDA 11 on a 3070, and that there were problems with CUDA 11 and the nightly. I guess the difference is Windows vs. Ubuntu.

Just install a CUDA 11.1-compatible driver (455 for now) and use the aforementioned container.
The container takes care of the troublesome dependency problems. Check this for details.
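The step counts behind the batch-size table follow directly from MNIST's 60,000 training images: Keras runs ceil(60000 / batch_size) steps per epoch, which is where the 469 steps seen throughout this thread come from at the default batch size of 128. A quick sketch of the arithmetic (not part of the benchmark script):

```python
import math

TRAIN_IMAGES = 60_000  # MNIST training-set size used by the benchmark

def steps_per_epoch(batch_size: int) -> int:
    """Number of batches Keras runs per epoch (the last batch may be partial)."""
    return math.ceil(TRAIN_IMAGES / batch_size)

for bs in (128, 256, 512, 1024, 1280, 2048, 4096):
    print(bs, steps_per_epoch(bs))
# 128 -> 469, 256 -> 235, 512 -> 118, 1024 -> 59, 1280 -> 47, 2048 -> 30, 4096 -> 15
```

Since a larger batch means proportionally fewer steps, s/epoch can stay roughly flat even as ms/step grows, which matches the table above.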

@Shakshi3104

Tested on MacBook Air (13-inch, Early 2015, 1.6GHz Intel Core i5, Intel HD Graphics 6000) with 8GB RAM

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9540/Users/user/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 222s 461ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9540 - val_loss: 0.0448 - val_accuracy: 0.9861
Epoch 2/12
469/469 [==============================] - 231s 482ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0439 - accuracy: 0.9866 - val_loss: 0.0357 - val_accuracy: 0.9876
Epoch 3/12
469/469 [==============================] - 241s 503ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0265 - accuracy: 0.9915 - val_loss: 0.0342 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 277s 576ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0191 - accuracy: 0.9942 - val_loss: 0.0307 - val_accuracy: 0.9893
Epoch 5/12
469/469 [==============================] - 248s 512ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0117 - accuracy: 0.9964 - val_loss: 0.0329 - val_accuracy: 0.9897
Epoch 6/12
469/469 [==============================] - 230s 478ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0107 - accuracy: 0.9966 - val_loss: 0.0353 - val_accuracy: 0.9888
Epoch 7/12
469/469 [==============================] - 232s 482ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9973 - val_loss: 0.0533 - val_accuracy: 0.9864
Epoch 8/12
469/469 [==============================] - 268s 561ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9979 - val_loss: 0.0429 - val_accuracy: 0.9885
Epoch 9/12
469/469 [==============================] - 235s 485ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9982 - val_loss: 0.0363 - val_accuracy: 0.9899
Epoch 10/12
469/469 [==============================] - 253s 528ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0053 - accuracy: 0.9982 - val_loss: 0.0348 - val_accuracy: 0.9909
Epoch 11/12
469/469 [==============================] - 248s 507ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9984 - val_loss: 0.0405 - val_accuracy: 0.9905
Epoch 12/12
469/469 [==============================] - 248s 515ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0123 - accuracy: 0.9960 - val_loss: 0.0381 - val_accuracy: 0.9886

No changes to the script:
248 s/epoch
515ms/step
98.86% final acc

@ismaproco

MacBook Air 2020 (M1, 8 GB), connected to power. No real difference from the other M1 results.

2020-11-27 15:16:01.395210: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-11-27 15:16:01.398078: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
2020-11-27 15:16:01.702008: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1600 - accuracy: 0.9520/Users/savathos/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 25s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1598 - accuracy: 0.9520 - val_loss: 0.0498 - val_accuracy: 0.9834
Epoch 2/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0424 - accuracy: 0.9868 - val_loss: 0.0392 - val_accuracy: 0.9868
Epoch 3/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0270 - accuracy: 0.9918 - val_loss: 0.0382 - val_accuracy: 0.9872
Epoch 4/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9944 - val_loss: 0.0397 - val_accuracy: 0.9879
Epoch 5/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0116 - accuracy: 0.9962 - val_loss: 0.0449 - val_accuracy: 0.9870
Epoch 6/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9968 - val_loss: 0.0383 - val_accuracy: 0.9885
Epoch 7/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0068 - accuracy: 0.9979 - val_loss: 0.0441 - val_accuracy: 0.9865
Epoch 8/12
469/469 [==============================] - 23s 46ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0073 - accuracy: 0.9976 - val_loss: 0.0529 - val_accuracy: 0.9869
Epoch 9/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980 - val_loss: 0.0451 - val_accuracy: 0.9884
Epoch 10/12
469/469 [==============================] - 24s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0042 - accuracy: 0.9987 - val_loss: 0.0542 - val_accuracy: 0.9874
Epoch 11/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9984 - val_loss: 0.0505 - val_accuracy: 0.9877
Epoch 12/12
469/469 [==============================] - 23s 47ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0492 - val_accuracy: 0.9871

No changes to the script:
24 s/epoch
47 ms/step
99.89% final training accuracy (98.71% validation accuracy)

@kennyfrc

kennyfrc commented Nov 28, 2020

Device: MacBook Pro (13-inch, 2019), 2.4 GHz Quad-Core Intel Core i5, 8GB RAM, Radeon RX 5700 XT 8 GB

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - 63s 128ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1607 - accuracy: 0.9530 - val_loss: 0.0528 - val_accuracy: 0.9827
Epoch 2/12
469/469 [==============================] - 62s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0439 - accuracy: 0.9863 - val_loss: 0.0375 - val_accuracy: 0.9874
Epoch 3/12
469/469 [==============================] - 62s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0257 - accuracy: 0.9917 - val_loss: 0.0369 - val_accuracy: 0.9881
Epoch 4/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0188 - accuracy: 0.9937 - val_loss: 0.0327 - val_accuracy: 0.9899
Epoch 5/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0116 - accuracy: 0.9964 - val_loss: 0.0441 - val_accuracy: 0.9864
Epoch 6/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0092 - accuracy: 0.9970 - val_loss: 0.0341 - val_accuracy: 0.9903
Epoch 7/12
469/469 [==============================] - 61s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0338 - val_accuracy: 0.9897
Epoch 8/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9979 - val_loss: 0.0392 - val_accuracy: 0.9888
Epoch 9/12
469/469 [==============================] - 61s 125ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9980 - val_loss: 0.0404 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 61s 126ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9985 - val_loss: 0.0464 - val_accuracy: 0.9887
Epoch 11/12
469/469 [==============================] - 61s 127ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0047 - accuracy: 0.9986 - val_loss: 0.0473 - val_accuracy: 0.9890
Epoch 12/12
469/469 [==============================] - 63s 128ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0032 - accuracy: 0.9988 - val_loss: 0.0453 - val_accuracy: 0.9897

Summary:

  • 64s/epoch
  • 129ms/step
  • 98.9% final accuracy

@ismaproco

Desktop: Ryzen 2400G, 16 GB, Windows (Conda). Worth a try.

2020-11-27 16:43:00.273670: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Epoch 1/12
    469/Unknown - 62s 132ms/step - loss: 0.1622 - accuracy: 0.95162020-11-27 16:44:06.022659: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
2020-11-27 16:44:09.471968: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 66s 140ms/step - loss: 0.1622 - accuracy: 0.9516 - val_loss: 0.0600 - val_accuracy: 0.9799
Epoch 2/12
468/469 [============================>.] - ETA: 0s - loss: 0.0428 - accuracy: 0.98692020-11-27 16:45:17.461835: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 68s 145ms/step - loss: 0.0429 - accuracy: 0.9869 - val_loss: 0.0379 - val_accuracy: 0.9882
Epoch 3/12
468/469 [============================>.] - ETA: 0s - loss: 0.0276 - accuracy: 0.99152020-11-27 16:46:21.553304: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0277 - accuracy: 0.9915 - val_loss: 0.0349 - val_accuracy: 0.9882
Epoch 4/12
468/469 [============================>.] - ETA: 0s - loss: 0.0183 - accuracy: 0.99452020-11-27 16:47:25.641510: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0183 - accuracy: 0.9945 - val_loss: 0.0359 - val_accuracy: 0.9894
Epoch 5/12
468/469 [============================>.] - ETA: 0s - loss: 0.0146 - accuracy: 0.99512020-11-27 16:48:29.695354: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0146 - accuracy: 0.9951 - val_loss: 0.0367 - val_accuracy: 0.9890
Epoch 6/12
468/469 [============================>.] - ETA: 0s - loss: 0.0089 - accuracy: 0.99702020-11-27 16:49:33.919164: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0088 - accuracy: 0.9970 - val_loss: 0.0360 - val_accuracy: 0.9895
Epoch 7/12
468/469 [============================>.] - ETA: 0s - loss: 0.0084 - accuracy: 0.99752020-11-27 16:50:38.218212: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0084 - accuracy: 0.9975 - val_loss: 0.0499 - val_accuracy: 0.9873
Epoch 8/12
468/469 [============================>.] - ETA: 0s - loss: 0.0066 - accuracy: 0.99792020-11-27 16:51:42.458833: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0066 - accuracy: 0.9979 - val_loss: 0.0402 - val_accuracy: 0.9896
Epoch 9/12
468/469 [============================>.] - ETA: 0s - loss: 0.0067 - accuracy: 0.99762020-11-27 16:52:46.661109: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 137ms/step - loss: 0.0067 - accuracy: 0.9976 - val_loss: 0.0412 - val_accuracy: 0.9893
Epoch 10/12
468/469 [============================>.] - ETA: 0s - loss: 0.0041 - accuracy: 0.99872020-11-27 16:53:52.020888: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 65s 139ms/step - loss: 0.0041 - accuracy: 0.9987 - val_loss: 0.0374 - val_accuracy: 0.9901
Epoch 11/12
468/469 [============================>.] - ETA: 0s - loss: 0.0034 - accuracy: 0.99892020-11-27 16:54:55.984763: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 136ms/step - loss: 0.0034 - accuracy: 0.9989 - val_loss: 0.0458 - val_accuracy: 0.9904
Epoch 12/12
468/469 [============================>.] - ETA: 0s - loss: 0.0035 - accuracy: 0.99892020-11-27 16:55:59.786269: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
         [[{{node IteratorGetNext}}]]
469/469 [==============================] - 64s 136ms/step - loss: 0.0035 - accuracy: 0.9989 - val_loss: 0.0515 - val_accuracy: 0.9876
  • 64 s/epoch
  • 137 ms/step
  • 99.89% final training accuracy (98.76% validation accuracy)

@dmmajithia

Device: Mac Pro Late 2013 (3.7 GHz Quad-Core Intel Xeon E5, 2x AMD FirePro D300 2 GB, 64 GB).
Looks like neither of the GPUs is being used here: max GPU utilization is ~6% and CPU idle is ~60%.

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9538
469/469 [==============================] - 170s 355ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9538 - val_loss: 0.0556 - val_accuracy: 0.9806
Epoch 2/12
469/469 [==============================] - 172s 361ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0437 - accuracy: 0.9866 - val_loss: 0.0365 - val_accuracy: 0.9881
Epoch 3/12
469/469 [==============================] - 185s 389ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0269 - accuracy: 0.9916 - val_loss: 0.0356 - val_accuracy: 0.9887
Epoch 4/12
469/469 [==============================] - 182s 383ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0177 - accuracy: 0.9946 - val_loss: 0.0375 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 171s 359ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0131 - accuracy: 0.9959 - val_loss: 0.0405 - val_accuracy: 0.9883
Epoch 6/12
469/469 [==============================] - 171s 358ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9968 - val_loss: 0.0355 - val_accuracy: 0.9899
Epoch 7/12
469/469 [==============================] - 171s 358ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9977 - val_loss: 0.0387 - val_accuracy: 0.9892
Epoch 8/12
469/469 [==============================] - 170s 355ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9981 - val_loss: 0.0394 - val_accuracy: 0.9897
Epoch 9/12
469/469 [==============================] - 172s 361ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9981 - val_loss: 0.0404 - val_accuracy: 0.9902
Epoch 10/12
469/469 [==============================] - 169s 354ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9982 - val_loss: 0.0481 - val_accuracy: 0.9882
Epoch 11/12
469/469 [==============================] - 169s 354ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0054 - accuracy: 0.9980 - val_loss: 0.0403 - val_accuracy: 0.9892
Epoch 12/12
469/469 [==============================] - 166s 348ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0036 - accuracy: 0.9990 - val_loss: 0.0532 - val_accuracy: 0.9883

  • 185 s/epoch
  • 389 ms/step
  • 98.83% final accuracy

In the screenshot below, GPU slot 2 is connected to the display and slot 1 is the spare.

Screen Shot 2020-11-28 at 9 25 17 PM

Surprisingly, when I ran the code from issue #39 it switched to using the idle GPU with ~80% utilization.
Seems like set_mlc_device ignores my GPU recommendation when model size is small.

@rizky

rizky commented Dec 12, 2020

Mac Pro Late 2013 (3,5 GHz 6-Core Intel Xeon E5, 2x AMD FirePro D500 3 GB, 32GB).

Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1605 - accuracy: 0.9520
469/469 [==============================] - 44s 78ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1605 - accuracy: 0.9520 - val_loss: 0.0501 - val_accuracy: 0.9839
Epoch 2/12
469/469 [==============================] - 39s 77ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0460 - accuracy: 0.9859 - val_loss: 0.0373 - val_accuracy: 0.9880
Epoch 3/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0270 - accuracy: 0.9919 - val_loss: 0.0383 - val_accuracy: 0.9866
Epoch 4/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9937 - val_loss: 0.0334 - val_accuracy: 0.9896
Epoch 5/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0138 - accuracy: 0.9955 - val_loss: 0.0409 - val_accuracy: 0.9876
Epoch 6/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0107 - accuracy: 0.9965 - val_loss: 0.0381 - val_accuracy: 0.9886
Epoch 7/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0090 - accuracy: 0.9970 - val_loss: 0.0408 - val_accuracy: 0.9883
Epoch 8/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0363 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0062 - accuracy: 0.9979 - val_loss: 0.0385 - val_accuracy: 0.9908
Epoch 10/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9986 - val_loss: 0.0523 - val_accuracy: 0.9885
Epoch 11/12
469/469 [==============================] - 39s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0048 - accuracy: 0.9983 - val_loss: 0.0537 - val_accuracy: 0.9876
Epoch 12/12
469/469 [==============================] - 38s 76ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0050 - accuracy: 0.9983 - val_loss: 0.0439 - val_accuracy: 0.9893

@Aatiya25

Aatiya25 commented Dec 23, 2020

Can anyone explain how to install TensorFlow on a MacBook M1 (2020)? I am getting the error `zsh: illegal hardware instruction python` in the virtual environment (tensorflow_macos_venv) when I try to import TensorFlow.
I am using the terminal without Rosetta 2.

@mrdbourke

mrdbourke commented Dec 24, 2020

Thank you @Willian-Zhang for creating this!

I used it (code unchanged from above) to benchmark a few of my Macs + a GPU-powered Google Colab instance:

|   | MacBook Air (M1) | MacBook Pro 13-inch (M1) | MacBook Pro 16-inch (Intel) | Google Colab T4 GPU^ |
|---|---|---|---|---|
| tensorflow_macos benchmark | 23-24 s/epoch | 25-26 s/epoch | 20-21 s/epoch | 9 s/epoch |

Specs:

|   | MacBook Air (M1) | MacBook Pro 13-inch (M1) | MacBook Pro 16-inch (Intel) |
|---|---|---|---|
| CPU | 8-core M1 | 8-core M1 | 2.4 GHz 8-core Intel Core i9 |
| GPU | 7-core M1 | 8-core M1 | AMD Radeon Pro 5500M with 8 GB of GDDR6 memory |
| Neural Engine | 16-core M1 | 16-core M1 | N/A |
| Memory (RAM) | 16 GB | 16 GB | 64 GB |
| Storage | 256 GB | 512 GB | 2 TB |

Very interesting to see the M1 MacBook Air performing on par with, or better than, the M1 MacBook Pro.

The 16-inch I used is almost top-spec too (barely a year old)... it's incredible how performant Apple's new M1 chip is.

I also did a few more tests on each machine, namely:

  1. Final Cut Pro video export
  2. CreateML machine learning model training
  3. TensorFlow macOS code (Basic CNN, Transfer Learning, the benchmark test above)

See the results from the above on my blog. I also made a video running through each of them on YouTube.

@2black0

2black0 commented Dec 26, 2020

i5-8400T
16 GB 2400 MHz

I just disabled these two lines, since I don't have a GPU:

#from tensorflow.python.compiler.mlcompute import mlcompute
#mlcompute.set_mlc_device(device_name='gpu')
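Rather than deleting them, the two mlcompute lines can be guarded so the same script runs unchanged on machines with or without the macOS fork. A sketch (`device_name='gpu'` is the setting used elsewhere in this thread; the `backend` variable is just for illustration):

```python
# Fall back to stock TensorFlow when the macOS fork's mlcompute module is absent.
try:
    from tensorflow.python.compiler.mlcompute import mlcompute
    mlcompute.set_mlc_device(device_name='gpu')
    backend = 'mlcompute-gpu'
except ImportError:
    backend = 'default'  # stock TensorFlow (or no TensorFlow fork) on this machine
print(f'Running with backend: {backend}')
```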

Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1616 - accuracy: 0.9532/Users/thinkmac/opt/miniconda3/envs/tf-test/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 40s 81ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1616 - accuracy: 0.9532 - val_loss: 0.0551 - val_accuracy: 0.9816
Epoch 2/12
469/469 [==============================] - 40s 82ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0440 - accuracy: 0.9864 - val_loss: 0.0459 - val_accuracy: 0.9848
Epoch 3/12
469/469 [==============================] - 37s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0280 - accuracy: 0.9909 - val_loss: 0.0359 - val_accuracy: 0.9890
Epoch 4/12
469/469 [==============================] - 37s 75ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9937 - val_loss: 0.0332 - val_accuracy: 0.9894
Epoch 5/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0132 - accuracy: 0.9958 - val_loss: 0.0427 - val_accuracy: 0.9872
Epoch 6/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0102 - accuracy: 0.9969 - val_loss: 0.0420 - val_accuracy: 0.9877
Epoch 7/12
469/469 [==============================] - 37s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9974 - val_loss: 0.0525 - val_accuracy: 0.9843
Epoch 8/12
469/469 [==============================] - 38s 75ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0071 - accuracy: 0.9975 - val_loss: 0.0381 - val_accuracy: 0.9896
Epoch 9/12
469/469 [==============================] - 36s 74ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0046 - accuracy: 0.9985 - val_loss: 0.0438 - val_accuracy: 0.9879
Epoch 10/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0070 - accuracy: 0.9975 - val_loss: 0.0470 - val_accuracy: 0.9880
Epoch 11/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0039 - accuracy: 0.9986 - val_loss: 0.0423 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 36s 73ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0041 - accuracy: 0.9987 - val_loss: 0.0349 - val_accuracy: 0.9914

Result:

  • 40s/epoch
  • 82ms/step
  • 99.14% Accuracy

@igaspard

igaspard commented Dec 28, 2020

MacBook Pro (16-inch, 2019)
CPU: 2.3 GHz 8-Core Intel Core i9
GPU: AMD Radeon Pro 5500M 4 GB

2020-12-28 17:50:35.421277: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-28 17:50:35.544447: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-28 17:50:36.201512: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1651 - accuracy: 0.9515/Users/gaspardshen/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1651 - accuracy: 0.9515 - val_loss: 0.0520 - val_accuracy: 0.9835
Epoch 2/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0436 - accuracy: 0.9864 - val_loss: 0.0337 - val_accuracy: 0.9889
Epoch 3/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0275 - accuracy: 0.9918 - val_loss: 0.0360 - val_accuracy: 0.9877
Epoch 4/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0190 - accuracy: 0.9940 - val_loss: 0.0364 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 19s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0132 - accuracy: 0.9957 - val_loss: 0.0422 - val_accuracy: 0.9864
Epoch 6/12
469/469 [==============================] - 20s 38ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0101 - accuracy: 0.9965 - val_loss: 0.0375 - val_accuracy: 0.9892
Epoch 7/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0081 - accuracy: 0.9973 - val_loss: 0.0405 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0077 - accuracy: 0.9976 - val_loss: 0.0397 - val_accuracy: 0.9889
Epoch 9/12
469/469 [==============================] - 21s 39ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9984 - val_loss: 0.0492 - val_accuracy: 0.9872
Epoch 10/12
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0069 - accuracy: 0.9975 - val_loss: 0.0365 - val_accuracy: 0.9894
Epoch 11/12
469/469 [==============================] - 20s 38ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0045 - accuracy: 0.9985 - val_loss: 0.0374 - val_accuracy: 0.9907
Epoch 12/12
469/469 [==============================] - 20s 37ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0026 - accuracy: 0.9992 - val_loss: 0.0390 - val_accuracy: 0.9909
python cnn_benchmark.py  268.30s user 212.62s system 194% cpu 4:07.11 total

Results:

  • 20 s/epoch
  • 38 ms/step
  • 99.09% final validation accuracy

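As a quick sanity check, the step counts in these logs follow directly from the MNIST split sizes and the batch size in the benchmark script; a small sketch (pure arithmetic, no TensorFlow needed):

```python
import math

num_train = 60_000   # MNIST 'train' split size
num_test = 10_000    # MNIST 'test' split size
batch_size = 128     # batch size used in the benchmark script

steps_per_epoch = math.ceil(num_train / batch_size)
val_steps = math.ceil(num_test / batch_size)
print(steps_per_epoch, val_steps)  # 469 79, matching "Train on 469 steps, validate on 79 steps"

# Rough throughput implied by the 20 s/epoch run above
print(round(num_train / 20))  # 3000 images/s
```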
@ma010

ma010 commented Dec 29, 2020

Tested on a 2014 15-inch MacBook Pro (2.2 GHz Quad-Core Intel Core i7, Intel Iris Pro Graphics). I also observed that the Mac-optimized version seems slower than the non-optimized version (similar to the results of @rnogy).
With the macOS-optimized TensorFlow I set mlcompute.set_mlc_device(device_name='any'). I had to comment out disable_eager_execution(), otherwise I got a segmentation fault. Results:

  • 78 s/epoch
  • 170 ms/step
  • 99.92% final training accuracy
2020-12-29 16:49:46.272987: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/12
2020-12-29 16:49:46.886017: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
469/469 [==============================] - 79s 160ms/step - loss: 0.3364 - accuracy: 0.8962 - val_loss: 0.0550 - val_accuracy: 0.9823
Epoch 2/12
469/469 [==============================] - 73s 156ms/step - loss: 0.0420 - accuracy: 0.9871 - val_loss: 0.0393 - val_accuracy: 0.9865
Epoch 3/12
469/469 [==============================] - 78s 166ms/step - loss: 0.0239 - accuracy: 0.9934 - val_loss: 0.0320 - val_accuracy: 0.9896
Epoch 4/12
469/469 [==============================] - 80s 170ms/step - loss: 0.0172 - accuracy: 0.9945 - val_loss: 0.0423 - val_accuracy: 0.9871
Epoch 5/12
469/469 [==============================] - 75s 160ms/step - loss: 0.0112 - accuracy: 0.9967 - val_loss: 0.0421 - val_accuracy: 0.9860
Epoch 6/12
469/469 [==============================] - 75s 159ms/step - loss: 0.0080 - accuracy: 0.9976 - val_loss: 0.0451 - val_accuracy: 0.9878
Epoch 7/12
469/469 [==============================] - 74s 157ms/step - loss: 0.0071 - accuracy: 0.9979 - val_loss: 0.0392 - val_accuracy: 0.9885
Epoch 8/12
469/469 [==============================] - 83s 177ms/step - loss: 0.0069 - accuracy: 0.9976 - val_loss: 0.0433 - val_accuracy: 0.9882
Epoch 9/12
469/469 [==============================] - 78s 166ms/step - loss: 0.0053 - accuracy: 0.9984 - val_loss: 0.0399 - val_accuracy: 0.9907
Epoch 10/12
469/469 [==============================] - 78s 165ms/step - loss: 0.0050 - accuracy: 0.9983 - val_loss: 0.0412 - val_accuracy: 0.9901
Epoch 11/12
469/469 [==============================] - 75s 160ms/step - loss: 0.0035 - accuracy: 0.9990 - val_loss: 0.0461 - val_accuracy: 0.9897
Epoch 12/12
469/469 [==============================] - 75s 160ms/step - loss: 0.0025 - accuracy: 0.9992 - val_loss: 0.0466 - val_accuracy: 0.9889

Non-macOS-optimized TensorFlow (pip install tensorflow in a conda env). Results:

  • 64 s/epoch
  • 130 ms/step
  • 99.90% final training accuracy
2020-12-29 17:12:34.872512: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-12-29 17:12:34.872759: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-29 17:12:35.023844: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-29 17:12:35.676140: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1557 - accuracy: 0.9538/Users/fengxma/opt/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 69s 140ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1557 - accuracy: 0.9538 - val_loss: 0.0425 - val_accuracy: 0.9848
Epoch 2/12
469/469 [==============================] - 68s 138ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0406 - accuracy: 0.9879 - val_loss: 0.0446 - val_accuracy: 0.9869
Epoch 3/12
469/469 [==============================] - 65s 132ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0250 - accuracy: 0.9925 - val_loss: 0.0372 - val_accuracy: 0.9871
Epoch 4/12
469/469 [==============================] - 65s 134ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0176 - accuracy: 0.9944 - val_loss: 0.0325 - val_accuracy: 0.9887
Epoch 5/12
469/469 [==============================] - 63s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0119 - accuracy: 0.9963 - val_loss: 0.0349 - val_accuracy: 0.9901
Epoch 6/12
469/469 [==============================] - 66s 135ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0102 - accuracy: 0.9968 - val_loss: 0.0350 - val_accuracy: 0.9888
Epoch 7/12
469/469 [==============================] - 63s 129ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0069 - accuracy: 0.9978 - val_loss: 0.0375 - val_accuracy: 0.9904
Epoch 8/12
469/469 [==============================] - 64s 130ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0051 - accuracy: 0.9985 - val_loss: 0.0459 - val_accuracy: 0.9871
Epoch 9/12
469/469 [==============================] - 64s 131ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0072 - accuracy: 0.9975 - val_loss: 0.0347 - val_accuracy: 0.9905
Epoch 10/12
469/469 [==============================] - 66s 135ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9979 - val_loss: 0.0439 - val_accuracy: 0.9881
Epoch 11/12
469/469 [==============================] - 65s 132ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0036 - accuracy: 0.9988 - val_loss: 0.0476 - val_accuracy: 0.9885
Epoch 12/12
469/469 [==============================] - 64s 131ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0029 - accuracy: 0.9990 - val_loss: 0.0448 - val_accuracy: 0.9897

@DevReev

DevReev commented Dec 30, 2020

2020-12-30 14:50:04.896932: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-30 14:50:05.037206: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2020-12-30 14:50:06.878061: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1611 - accuracy: 0.9521
469/469 [==============================] - 23s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1610 - accuracy: 0.9521 - val_loss: 0.0496 - val_accuracy: 0.9846
Epoch 2/12
469/469 [==============================] - 23s 45ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0453 - accuracy: 0.9860 - val_loss: 0.0501 - val_accuracy: 0.9833
Epoch 3/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0284 - accuracy: 0.9910 - val_loss: 0.0380 - val_accuracy: 0.9868
Epoch 4/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0198 - accuracy: 0.9942 - val_loss: 0.0343 - val_accuracy: 0.9888
Epoch 5/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0318 - val_accuracy: 0.9904
Epoch 6/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0104 - accuracy: 0.9967 - val_loss: 0.0337 - val_accuracy: 0.9896
Epoch 7/12
469/469 [==============================] - 22s 42ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0080 - accuracy: 0.9974 - val_loss: 0.0363 - val_accuracy: 0.9895
Epoch 8/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0074 - accuracy: 0.9973 - val_loss: 0.0470 - val_accuracy: 0.9878
Epoch 9/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0065 - accuracy: 0.9976 - val_loss: 0.0436 - val_accuracy: 0.9887
Epoch 10/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9982 - val_loss: 0.0492 - val_accuracy: 0.9881
Epoch 11/12
469/469 [==============================] - 22s 44ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9983 - val_loss: 0.0429 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 22s 43ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9989 - val_loss: 0.0454 - val_accuracy: 0.9893

Had quite a lot of fan noise.

@singhsidhukuldeep

GPU name: Tesla T4, 16 GB VRAM
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
RAM: 16 GB
Precision: Float 32

Epoch 12/12
469/469 [==============================] - 8s 8ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0038 - accuracy: 0.9987 - val_loss: 0.0305 - val_accuracy: 0.9919
CPU times: user 2min, sys: 54.6 s, total: 2min 55s
Wall time: 2min 2s


GPU name: Tesla T4, 16 GB VRAM
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
RAM: 16 GB
Precision: Float 16

Epoch 12/12
469/469 [==============================] - 9s 8ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9982 - val_loss: 0.0422 - val_accuracy: 0.9894
CPU times: user 2min 5s, sys: 55.8 s, total: 3min 1s
Wall time: 2min 5s
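
When comparing runs like the float32 and float16 ones above, it can help to pull the per-step timing out of the Keras progress lines programmatically rather than eyeballing it. A minimal sketch, assuming the "Nms/step" log format shown throughout this thread:

```python
import re

# Progress lines copied from the two runs above
lines = [
    "469/469 [==============================] - 8s 8ms/step - loss: 0.0038 - accuracy: 0.9987",
    "469/469 [==============================] - 9s 8ms/step - loss: 0.0057 - accuracy: 0.9982",
]

def step_time_ms(line):
    """Return the per-step time in milliseconds, or None if the line has no timing."""
    m = re.search(r"(\d+)ms/step", line)
    return int(m.group(1)) if m else None

print([step_time_ms(l) for l in lines])  # [8, 8]
```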

@RahulBhalley

Seeing all these amazing results, you might not want to bother with this machine from 2016. 😅 Anyway, here is the information I got.

System: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)
Operating System: macOS Big Sur version 11.1
Processor: 2.9 GHz Dual-Core Intel Core i5
Memory: 8 GB 2133 MHz LPDDR3
Graphics: Intel Iris Graphics 550 1536 MB

2021-01-16 19:13:22.511385: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-16 19:13:23.496719: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 2.2993 - accuracy: 0.1251/Users/rahulbhalley/tensorflow_macos_venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 146s 288ms/step - batch: 234.0000 - size: 1.0000 - loss: 2.2993 - accuracy: 0.1251 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 2/12
469/469 [==============================] - 140s 291ms/step - batch: 234.0000 - size: 1.0000 - loss: 1.8151 - accuracy: 0.3670 - val_loss: 0.6209 - val_accuracy: 0.8441
Epoch 3/12
469/469 [==============================] - 140s 289ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.4052 - accuracy: 0.8984 - val_loss: 0.2491 - val_accuracy: 0.9445
Epoch 4/12
469/469 [==============================] - 158s 330ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1970 - accuracy: 0.9510 - val_loss: 0.1449 - val_accuracy: 0.9649
Epoch 5/12
469/469 [==============================] - 145s 301ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1394 - accuracy: 0.9653 - val_loss: 0.1099 - val_accuracy: 0.9695
Epoch 6/12
469/469 [==============================] - 152s 312ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1117 - accuracy: 0.9715 - val_loss: 0.0927 - val_accuracy: 0.9739
Epoch 7/12
469/469 [==============================] - 146s 300ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0933 - accuracy: 0.9766 - val_loss: 0.0828 - val_accuracy: 0.9787
Epoch 8/12
469/469 [==============================] - 180s 374ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0810 - accuracy: 0.9796 - val_loss: 0.0765 - val_accuracy: 0.9793
Epoch 9/12
469/469 [==============================] - 165s 342ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0717 - accuracy: 0.9817 - val_loss: 0.0718 - val_accuracy: 0.9811
Epoch 10/12
469/469 [==============================] - 140s 287ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0630 - accuracy: 0.9845 - val_loss: 0.0586 - val_accuracy: 0.9818
Epoch 11/12
469/469 [==============================] - 229s 480ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0570 - accuracy: 0.9859 - val_loss: 0.0727 - val_accuracy: 0.9817
Epoch 12/12
469/469 [==============================] - 146s 302ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0506 - accuracy: 0.9874 - val_loss: 0.0559 - val_accuracy: 0.9838

Key results:

  • Time per epoch: ~146 seconds
  • Total training time: ~29 minutes
  • Accuracy: 0.9838
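
The total-time figure above is consistent with the per-epoch times in the log; a rough check, using ~146 s as the average epoch time:

```python
epochs = 12
sec_per_epoch = 146          # approximate average from the log above
total_sec = epochs * sec_per_epoch
print(total_sec, round(total_sec / 60))  # 1752 s, i.e. about 29 minutes
```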

@dmitry-kabanov

dmitry-kabanov commented Jan 23, 2021

iMac Pro 2017, 3 GHz 10-Core Intel Xeon W, 32 GB 2666 MHz DDR4, Radeon Pro Vega 64 16 GB

On GPU:

2021-01-23 15:21:50.079691: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:21:50.183928: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:21:52.322549: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1696 - accuracy: 0.9476/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 17s 27ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1696 - accuracy: 0.9476 - val_loss: 0.0472 - val_accuracy: 0.9850
Epoch 2/12
469/469 [==============================] - 14s 27ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0391 - val_accuracy: 0.9874
...
Epoch 11/12
469/469 [==============================] - 15s 28ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0042 - accuracy: 0.9985 - val_loss: 0.0474 - val_accuracy: 0.9891
Epoch 12/12
469/469 [==============================] - 15s 28ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0051 - accuracy: 0.9983 - val_loss: 0.0446 - val_accuracy: 0.9892

On CPU:

2021-01-23 15:25:55.524865: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:25:55.617573: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:25:56.065950: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
469/469 [==============================] - ETA: 0s - batch: 234.0000 - size: 1.0000 - loss: 0.1579 - accuracy: 0.9530/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 45s 93ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1579 - accuracy: 0.9530 - val_loss: 0.0545 - val_accuracy: 0.9820
Epoch 2/12
469/469 [==============================] - 45s 93ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0457 - accuracy: 0.9858 - val_loss: 0.0446 - val_accuracy: 0.9856
...
Epoch 11/12
469/469 [==============================] - 47s 96ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0057 - accuracy: 0.9978 - val_loss: 0.0409 - val_accuracy: 0.9894
Epoch 12/12
469/469 [==============================] - 47s 96ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9990 - val_loss: 0.0409 - val_accuracy: 0.9890

With the pip-provided TensorFlow 2.4 (after removing the two mlcompute lines from the script), it is twice as fast:

2021-01-23 15:42:17.869355: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-23 15:42:17.869529: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-23 15:42:17.960406: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-01-23 15:42:18.386414: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9546/Users/dima/dev/learn/2021-01-23-apple-tensorflow/venv-tf-pip/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1506 - accuracy: 0.9546 - val_loss: 0.0459 - val_accuracy: 0.9854
Epoch 2/12
469/469 [==============================] - 24s 48ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0423 - accuracy: 0.9869 - val_loss: 0.0390 - val_accuracy: 0.9870
...
Epoch 11/12
469/469 [==============================] - 26s 51ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9985 - val_loss: 0.0505 - val_accuracy: 0.9896
Epoch 12/12
469/469 [==============================] - 25s 50ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0037 - accuracy: 0.9989 - val_loss: 0.0450 - val_accuracy: 0.9900
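
The three runs above can be compared at a glance; a small sketch using approximate steady-state per-epoch times read off the logs:

```python
# Approximate per-epoch times from the three runs above
epoch_sec = {
    "mlcompute GPU": 15,
    "mlcompute CPU": 46,
    "pip TF 2.4 (CPU)": 25,
}

baseline = epoch_sec["mlcompute CPU"]
for name, t in epoch_sec.items():
    print(f"{name}: {baseline / t:.1f}x vs mlcompute CPU")
```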

@nikolaeff

This code runs hot! I think I just toasted the GPU on my 16-inch MacBook Pro by running this benchmark. Make sure your warranty hasn't expired before experimenting.

@Leon-lianglyu

MacBook Air (M1, 2020), 7-core GPU
Train on 469 steps, validate on 79 steps
Epoch 1/12
467/469 [============================>.] - ETA: 0s - batch: 233.0000 - size: 1.0000 - loss: 0.1596 - accuracy: 0.9516/Users/leon/miniforge3/envs/tf-env/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically.
warnings.warn('Model.state_updates will be removed in a future version. '
469/469 [==============================] - 13s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1594 - accuracy: 0.9517 - val_loss: 0.0578 - val_accuracy: 0.9819
Epoch 2/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0430 - accuracy: 0.9871 - val_loss: 0.0362 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0269 - accuracy: 0.9913 - val_loss: 0.0375 - val_accuracy: 0.9870
Epoch 4/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0181 - accuracy: 0.9941 - val_loss: 0.0393 - val_accuracy: 0.9878
Epoch 5/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0127 - accuracy: 0.9956 - val_loss: 0.0347 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 12s 23ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0098 - accuracy: 0.9967 - val_loss: 0.0356 - val_accuracy: 0.9890
Epoch 7/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0087 - accuracy: 0.9970 - val_loss: 0.0341 - val_accuracy: 0.9896
Epoch 8/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0049 - accuracy: 0.9984 - val_loss: 0.0402 - val_accuracy: 0.9893
Epoch 9/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0061 - accuracy: 0.9978 - val_loss: 0.0480 - val_accuracy: 0.9884
Epoch 10/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0058 - accuracy: 0.9980 - val_loss: 0.0435 - val_accuracy: 0.9877
Epoch 11/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0043 - accuracy: 0.9986 - val_loss: 0.0410 - val_accuracy: 0.9913
Epoch 12/12
469/469 [==============================] - 12s 24ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0030 - accuracy: 0.9989 - val_loss: 0.0492 - val_accuracy: 0.9889

Process finished with exit code 0

@harshamodini

This was one of the factors that helped me choose between two laptops priced the same:
1.) MSI GF65, i7 10th gen, with a 6 GB RTX 2060
2.) Apple MacBook Air M1, base model

I ran the benchmark on both devices at the store and was surprised by how capable the Apple M1 is. Even though it couldn't beat the MSI, it gave a respectable result for a similarly priced machine. In the end I went with the MSI, as it gave me more options.

So here are my results:
Specs: i7 10th gen
GPU: RTX 2060 (6 GB), only about 40% utilized

Epoch 1/12
469/469 [==============================] - 7s 9ms/step - loss: 0.3589 - accuracy: 0.8936 - val_loss: 0.0471 - val_accuracy: 0.9855
Epoch 2/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0429 - accuracy: 0.9871 - val_loss: 0.0355 - val_accuracy: 0.9879
Epoch 3/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0258 - accuracy: 0.9918 - val_loss: 0.0318 - val_accuracy: 0.9894
Epoch 4/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0163 - accuracy: 0.9943 - val_loss: 0.0275 - val_accuracy: 0.9913
Epoch 5/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0117 - accuracy: 0.9962 - val_loss: 0.0349 - val_accuracy: 0.9894
Epoch 6/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0096 - accuracy: 0.9966 - val_loss: 0.0389 - val_accuracy: 0.9883
Epoch 7/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0078 - accuracy: 0.9973 - val_loss: 0.0510 - val_accuracy: 0.9869
Epoch 8/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0081 - accuracy: 0.9971 - val_loss: 0.0389 - val_accuracy: 0.9903
Epoch 9/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0033 - accuracy: 0.9989 - val_loss: 0.0456 - val_accuracy: 0.9895
Epoch 10/12
469/469 [==============================] - 4s 9ms/step - loss: 0.0053 - accuracy: 0.9983 - val_loss: 0.0410 - val_accuracy: 0.9903
Epoch 11/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0035 - accuracy: 0.9988 - val_loss: 0.0558 - val_accuracy: 0.9875
Epoch 12/12
469/469 [==============================] - 4s 8ms/step - loss: 0.0018 - accuracy: 0.9995 - val_loss: 0.0459 - val_accuracy: 0.9898

  • 4 s/epoch
  • 8 ms/step
  • accuracy: 0.9995
  • val_accuracy: 0.9898
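
Put against the other per-epoch times reported in this thread, the gap works out to about 3x over the M1 Air; a trivial check (numbers are approximate values taken from earlier comments):

```python
# Approximate per-epoch times reported in this thread
epoch_sec = {"RTX 2060 (this run)": 4, "Tesla T4": 8, "M1 MacBook Air": 12}

fastest = min(epoch_sec, key=epoch_sec.get)
for name, t in epoch_sec.items():
    print(f"{name}: {t / epoch_sec[fastest]:.1f}x the fastest")
```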

@thecaffeinedev

Tested on a MacBook Pro (13-inch, M1, 2020) with 8 GB RAM

Train on 469 steps, validate on 79 steps
Epoch 1/12
468/469 [============================>.] - ETA: 0s - batch: 233.5000 - size: 1.0000 - loss: 0.1554 - accuracy: 0.9533
469/469 [==============================] - 14s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1552 - accuracy: 0.9534 - val_loss: 0.0524 - val_accuracy: 0.9836
Epoch 2/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0447 - accuracy: 0.9865 - val_loss: 0.0402 - val_accuracy: 0.9863
Epoch 3/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0263 - accuracy: 0.9919 - val_loss: 0.0316 - val_accuracy: 0.9901
Epoch 4/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0319 - val_accuracy: 0.9885
Epoch 5/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0115 - accuracy: 0.9961 - val_loss: 0.0370 - val_accuracy: 0.9890
Epoch 6/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0103 - accuracy: 0.9965 - val_loss: 0.0376 - val_accuracy: 0.9893
Epoch 7/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0079 - accuracy: 0.9973 - val_loss: 0.0345 - val_accuracy: 0.9892
Epoch 8/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0055 - accuracy: 0.9982 - val_loss: 0.0340 - val_accuracy: 0.9900
Epoch 9/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0063 - accuracy: 0.9976 - val_loss: 0.0442 - val_accuracy: 0.9888
Epoch 10/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0040 - accuracy: 0.9987 - val_loss: 0.0374 - val_accuracy: 0.9895
Epoch 11/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0044 - accuracy: 0.9984 - val_loss: 0.0370 - val_accuracy: 0.9906
Epoch 12/12
469/469 [==============================] - 13s 25ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0034 - accuracy: 0.9988 - val_loss: 0.0478 - val_accuracy: 0.9883
CPU times: user 2min 6s, sys: 30.9 s, total: 2min 37s
Wall time: 3min 2s
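
The CPU-vs-wall-time figures above also hint at how busy the CPU was: total CPU time (user + sys) divided by wall time gives the average number of busy cores. A rough sketch with the numbers from this run; a value well below the core count may suggest the process spent much of its time waiting (e.g. on the GPU):

```python
user_sec = 2 * 60 + 6      # "user 2min 6s"
sys_sec = 30.9             # "sys: 30.9 s"
wall_sec = 3 * 60 + 2      # "Wall time: 3min 2s"

avg_cores_busy = (user_sec + sys_sec) / wall_sec
print(round(avg_cores_busy, 2))  # ~0.86 cores busy on average
```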

