# One Device Strategy 

<a target="_blank" href="https://colab.research.google.com/github/LuisAngelMendozaVelasco/TensorFlow-Advanced_Techniques_Specialization/blob/master/Custom_and_Distributed_Training_with_TensorFlow/Week4/Labs/C2_W4_Lab_4_one-device-strategy.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png">Run in Google Colab</a>

In this ungraded lab, you'll learn how to set up a [One Device Strategy](https://www.tensorflow.org/api_docs/python/tf/distribute/OneDeviceStrategy). This strategy places all variables and computation on a single device, and is typically used to test your code on one device before switching to a strategy that distributes across multiple devices. Please click the **Run in Google Colab** badge above so you can download the datasets and use a GPU-enabled lab environment.

## Imports

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from keras import Sequential, layers, optimizers, Input



## Define the Distribution Strategy

You can list the devices available on your machine, filtered by device type. This lets you verify the device name to pass to `tf.distribute.OneDeviceStrategy()`.

In [2]:
# Choose a device type such as CPU or GPU
devices = tf.config.list_physical_devices('GPU')
print(devices[0])

# You'll see that the name will look something like "/physical_device:GPU:0"
# Just take the GPU:0 part and use that as the name
gpu_name = "GPU:0"

# Define the strategy and pass in the device name
one_strategy = tf.distribute.OneDeviceStrategy(device=gpu_name)

PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')


2024-08-24 14:09:59.504895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1817 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
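The name-trimming step described in the code comments above can be automated. The following is a minimal sketch, not part of the original lab: `to_device_string` and `choose_device` are hypothetical helpers, and falling back to `"CPU:0"` when no GPU is listed is an assumption about the behavior you would want.

```python
def to_device_string(physical_name):
    """Strip the '/physical_device:' prefix from a physical device name."""
    prefix = "/physical_device:"
    if physical_name.startswith(prefix):
        return physical_name[len(prefix):]
    return physical_name


def choose_device(gpu_names):
    """Return the first GPU's short name, or 'CPU:0' if none is listed.

    The CPU fallback is an assumption made for this sketch.
    """
    if gpu_names:
        return to_device_string(gpu_names[0])
    return "CPU:0"


print(choose_device(["/physical_device:GPU:0"]))  # GPU:0
print(choose_device([]))                          # CPU:0
```

In the lab, you could pass `[d.name for d in tf.config.list_physical_devices('GPU')]` to `choose_device` and hand the result to `tf.distribute.OneDeviceStrategy(device=...)`.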


## Parameters

We'll define a few global variables for setting up the model and dataset.

In [3]:
pixels = 224
MODULE_HANDLE = 'https://tfhub.dev/tensorflow/resnet_50/feature_vector/1'
IMAGE_SIZE = (pixels, pixels)
BATCH_SIZE = 32

print("Using {} with input size {}".format(MODULE_HANDLE, IMAGE_SIZE))

Using https://tfhub.dev/tensorflow/resnet_50/feature_vector/1 with input size (224, 224)


## Download and Prepare the Dataset

We will use the [Cats vs Dogs](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs) dataset, fetched via TensorFlow Datasets (TFDS).

In [4]:
splits = ['train[:80%]', 'train[80%:90%]', 'train[90%:]']

(train_examples, validation_examples, test_examples), info = tfds.load('cats_vs_dogs', with_info=True, as_supervised=True, split=splits)

num_examples = info.splits['train'].num_examples
num_classes = info.features['label'].num_classes

In [5]:
# Resize the image and normalize pixel values
def format_image(image, label):
    image = tf.image.resize(image, IMAGE_SIZE) / 255.0

    return image, label

In [6]:
# Prepare batches
train_batches = train_examples.shuffle(num_examples // 4).map(format_image).batch(BATCH_SIZE).prefetch(1)
validation_batches = validation_examples.map(format_image).batch(BATCH_SIZE).prefetch(1)
test_batches = test_examples.map(format_image).batch(1)
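The batch size chosen above determines how many steps one epoch takes. As a rough sketch of that arithmetic, the hypothetical helper below computes the number of batches for a given split size. The figure of 18,610 training examples is an assumption (about 80% of the dataset), not a number stated in the lab; substitute the count you actually get.

```python
import math


def steps_per_epoch(num_examples, batch_size, drop_remainder=False):
    """Number of batches produced when a dataset is batched.

    Without drop_remainder, the last partial batch still counts as a step.
    """
    if drop_remainder:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


# Hypothetical training-split size (assumption, see the text above)
train_size = 18_610
print(steps_per_epoch(train_size, 32))  # 582
```

Under that assumption, the result agrees with the 582 steps shown in the training progress output.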

In [7]:
# Check if the batches have the correct size and the images have the correct shape
for image_batch, label_batch in train_batches.take(1):
    pass

print(image_batch.shape)

(32, 224, 224, 3)




## Define and Configure the Model

As with other strategies, setting up the model requires minimal code changes. Let's first define a utility function to build and compile the model.

In [8]:
# Whether to fine-tune the feature extractor's weights during training.
# False keeps the weights frozen; True makes them trainable.
do_fine_tuning = False

In [9]:
def build_and_compile_model():
    print("Building model with", MODULE_HANDLE)

    # Configures the feature extractor fetched from TF Hub
    feature_extractor = hub.KerasLayer(MODULE_HANDLE,
                                       trainable=do_fine_tuning)

    # Define the model
    model = Sequential([Input(shape=IMAGE_SIZE + (3,)),
                        layers.Lambda(lambda x: feature_extractor(x)),
                        # Append a dense layer with softmax activation, one unit per class
                        layers.Dense(num_classes, activation='softmax')])

    # Display summary
    model.summary()

    # Configure the optimizer, loss and metrics
    optimizer = optimizers.SGD(learning_rate=0.002, momentum=0.9) if do_fine_tuning else 'adam'
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model

You can now call the function under the strategy scope. This places variables and computations on the device you specified earlier.

In [10]:
# Build and compile under the strategy scope
with one_strategy.scope():
    model = build_and_compile_model()

Building model with https://tfhub.dev/tensorflow/resnet_50/feature_vector/1
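To see what the strategy scope does to variable placement, here is a minimal, self-contained sketch that is separate from the lab's model. It uses `"CPU:0"` only so that it runs without a GPU, which is an assumption made for illustration; in the lab the device is `"GPU:0"`.

```python
import tensorflow as tf

# "CPU:0" is used here only so the sketch runs on any machine (assumption)
strategy = tf.distribute.OneDeviceStrategy(device="CPU:0")

with strategy.scope():
    # Variables created inside the scope are placed on the strategy's device
    v = tf.Variable(1.0)

# Full device string of the variable, ending in the device chosen above
print(v.device)

# OneDeviceStrategy always uses exactly one replica
print(strategy.num_replicas_in_sync)  # 1
```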


`model.fit()` can be run as usual.

In [11]:
EPOCHS = 1
hist = model.fit(train_batches,
                 epochs=EPOCHS,
                 validation_data=validation_batches)

582/582 ━━━━━━━━━━━━━━━━━━━━ 117s 188ms/step - accuracy: 0.9608 - loss: 0.1021 - val_accuracy: 0.9918 - val_loss: 0.0233

Once everything is working correctly, you can switch to a different device or a different strategy that distributes to multiple devices.
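One way to make that switch is sketched below. This is not from the original lab: `make_strategy` and `PER_REPLICA_BATCH_SIZE` are hypothetical names, and both the fallback order (several GPUs, then one GPU, then CPU) and the practice of scaling the batch size by the replica count are assumptions you may want to adjust.

```python
import tensorflow as tf

PER_REPLICA_BATCH_SIZE = 32  # hypothetical name; mirrors BATCH_SIZE in the lab


def make_strategy():
    """Pick a strategy based on how many GPUs TensorFlow can see."""
    gpus = tf.config.list_logical_devices("GPU")
    if len(gpus) > 1:
        # Replicates the model across all visible GPUs
        return tf.distribute.MirroredStrategy()
    # One GPU if present, otherwise the CPU (assumed fallback)
    device = "GPU:0" if gpus else "CPU:0"
    return tf.distribute.OneDeviceStrategy(device=device)


strategy = make_strategy()

# Scale the batch so each replica still processes PER_REPLICA_BATCH_SIZE examples
global_batch_size = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync
print(type(strategy).__name__, global_batch_size)
```

The model-building code stays the same: call `build_and_compile_model()` inside `strategy.scope()` as before, and batch the datasets with `global_batch_size`.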