##### Copyright 2018 The TensorFlow Authors.

# Basic classification: CIFAR-10

This guide trains a neural network model to classify small color images from the CIFAR-10 dataset, such as airplanes, cars, birds, and cats. It's okay if you don't understand all the details; this is a fast-paced overview of a complete TensorFlow program with the details explained as you go.

This guide uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow.

In [1]:
# TensorFlow and tf.keras
import tensorflow as tf

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)



2.17.0


## Import the CIFAR-10 dataset

In [2]:
from pathlib import Path

cifar_data_path = Path('data/cifar-10/raw/tfds')
input_shape = (32, 32, 3)


In [3]:
# Load the images from disk with tf.keras.utils.image_dataset_from_directory,
# which returns a tf.data.Dataset of (images, labels) batches.
# tf.data.Dataset docs: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
# Example adapted from: https://www.tensorflow.org/tutorials/load_data/images
# API reference: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory


train_ds = tf.keras.utils.image_dataset_from_directory(
  cifar_data_path / 'train',
  seed=123,
  image_size=input_shape[:-1],
  )
test_ds = tf.keras.utils.image_dataset_from_directory(
  cifar_data_path / 'test',
  seed=123,
  image_size=input_shape[:-1],
  )

Found 50000 files belonging to 10 classes.



Found 10000 files belonging to 10 classes.


In [4]:
class_names = np.array(sorted([item.name for item in (cifar_data_path / 'train').glob('*') if item.name != "LICENSE.txt"]))
print(class_names)

['0' '1' '2' '3' '4' '5' '6' '7' '8' '9']


In [5]:
for f in train_ds.take(1):
  print(f[0].shape)

(32, 32, 32, 3)




In [6]:
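# cardinality() counts batches; multiplying by the batch size (32) gives an
# upper bound on the number of images, since the last batch may be partial.
print(tf.data.experimental.cardinality(train_ds).numpy()*32)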

50016
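`image_dataset_from_directory` groups images into batches of 32 by default, so `cardinality` returns the number of batches (1563, matching the progress bar during training), and multiplying by 32 overcounts whenever the last batch is partial. The arithmetic as a small plain-Python check, assuming the default batch size:

```python
import math

num_images = 50_000   # "Found 50000 files" reported above
batch_size = 32       # default batch_size of image_dataset_from_directory (assumed unchanged)

# A partial final batch still counts as one batch.
num_batches = math.ceil(num_images / batch_size)
last_batch = num_images - (num_batches - 1) * batch_size

print(num_batches)               # 1563
print(num_batches * batch_size)  # 50016, the overcount printed above
print(last_batch)                # 16 images in the final batch
```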


In [7]:
class_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]

In [8]:
normalization_layer = tf.keras.layers.Rescaling(1./255)

In [9]:
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))

0.058823533 1.0


In [10]:
AUTOTUNE = tf.data.AUTOTUNE

# Optional: cache the decoded images and prefetch batches to speed up training.
# train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
# test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)

In [11]:
num_classes=len(class_names)

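Scale the pixel values to a range of 0 to 1 before feeding them to the neural network model by dividing them by 255. It's important that the *training set* and the *testing set* be preprocessed in the same way. In this notebook the scaling is done by a `tf.keras.layers.Rescaling(1./255)` layer at the start of the model below, so both `train_ds` and `test_ds` pass through it automatically.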
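A minimal NumPy sketch of this preprocessing, assuming uint8 pixel arrays. `preprocess` is a hypothetical helper, and the random arrays only stand in for image batches; the point is that one function is applied to both splits:

```python
import numpy as np


def preprocess(images: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map uint8 pixels in [0, 255] to float32 in [0, 1]."""
    return images.astype(np.float32) / 255.0


rng = np.random.default_rng(0)
# Stand-in batches with the CIFAR-10 image shape (32, 32, 3); not real data.
train_batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
test_batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Apply the same function to both splits so they share one input scale.
train_scaled = preprocess(train_batch)
test_scaled = preprocess(test_batch)
print(train_scaled.min(), train_scaled.max())
```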

## Build the model

Building the neural network requires configuring the layers of the model, then compiling the model.

### Set up the layers

The basic building block of a neural network is the [*layer*](https://www.tensorflow.org/api_docs/python/tf/keras/layers). Layers extract representations from the data fed into them. Hopefully, these representations are meaningful for the problem at hand.

Most of deep learning consists of chaining together simple layers. Most layers, such as `tf.keras.layers.Dense`, have parameters that are learned during training.
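A `Dense` layer computes `activation(x @ W + b)`, and its learned parameters are the weight matrix `W` (the kernel) and the bias vector `b`. Here is a NumPy sketch of that forward pass with illustrative sizes (128 inputs, 10 units); the random weights are placeholders for values that training would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, units = 128, 10  # illustrative sizes

# The learned parameters of a dense layer: a weight matrix (kernel) and a bias vector.
W = rng.normal(size=(in_features, units)).astype(np.float32)
b = np.zeros(units, dtype=np.float32)


def relu(z: np.ndarray) -> np.ndarray:
    """Rectified linear unit: max(z, 0) elementwise."""
    return np.maximum(z, 0.0)


def dense(x: np.ndarray, W: np.ndarray, b: np.ndarray, activation=None) -> np.ndarray:
    """Forward pass of a fully connected layer: activation(x @ W + b)."""
    z = x @ W + b
    return activation(z) if activation is not None else z


x = rng.normal(size=(2, in_features)).astype(np.float32)  # batch of 2 feature vectors
print(dense(x, W, b).shape)  # (2, 10)
print(W.size + b.size)       # 1290 trainable parameters (128 * 10 + 10)
```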

In [12]:
model = tf.keras.Sequential([
  tf.keras.layers.Rescaling(1./255),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(num_classes)
])

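The first layer in this network, `tf.keras.layers.Rescaling(1./255)`, scales the pixel values to `[0, 1]`. Three blocks of `tf.keras.layers.Conv2D` (32 filters with a 3×3 kernel) followed by `tf.keras.layers.MaxPooling2D` then learn local image features and roughly halve their spatial size. `tf.keras.layers.Flatten` transforms the resulting feature maps (2 by 2 pixels with 32 channels) into a one-dimensional array of 2 * 2 * 32 = 128 values. Think of this layer as unstacking rows of the feature maps and lining them up. `Rescaling`, `MaxPooling2D`, and `Flatten` have no parameters to learn; they only transform or reformat the data.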
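The 2 by 2 by 32 shape follows from the Keras defaults: `Conv2D` uses stride 1 and `'valid'` (no) padding, so a 3×3 kernel trims one pixel from each border, and `MaxPooling2D` uses a 2×2 window with stride 2, rounding down. A plain-Python sketch that traces the spatial shapes through the three blocks:

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after a 'valid' (unpadded), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after 'valid' max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1


size, channels = 32, 3
trace = [(size, size, channels)]
for _ in range(3):  # three Conv2D(32, 3) + MaxPooling2D blocks
    size, channels = conv_out(size), 32
    trace.append((size, size, channels))
    size = pool_out(size)
    trace.append((size, size, channels))

flat = size * size * channels
print(trace)
print(flat)  # 128 values enter the first Dense layer
```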

After the feature maps are flattened, the network ends with two `tf.keras.layers.Dense` layers. These are densely connected, or fully connected, neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer returns a logits array with length of 10 (`num_classes`). Each node contains a score indicating how strongly the model associates the current image with one of the 10 classes.
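Logits are unnormalized scores. To read them as class probabilities, apply a softmax; in TensorFlow, `tf.nn.softmax` does this. A NumPy sketch with made-up logits for a single image:

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Hypothetical logits for one image (not real model output).
logits = np.array([0.5, 2.0, -1.0, 0.0, 0.3, -0.2, 1.1, 0.0, 3.2, 0.4])
probs = softmax(logits)
print(probs.sum())                          # approximately 1.0
print(class_names[int(np.argmax(probs))])   # ship
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether you take the argmax of the logits or of the probabilities.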

### Compile the model

Before the model is ready for training, it needs a few more settings. These are added during the model's [*compile*](https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile) step:

* [*Optimizer*](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers): This is how the model is updated based on the data it sees and its loss function.
* [*Loss function*](https://www.tensorflow.org/api_docs/python/tf/keras/losses): This measures how accurate the model is during training. You want to minimize this function to "steer" the model in the right direction.
* [*Metrics*](https://www.tensorflow.org/api_docs/python/tf/keras/metrics): Used to monitor the training and testing steps. The following example uses *accuracy*, the fraction of the images that are correctly classified.
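Accuracy compares the highest-scoring class for each image with its integer label. A NumPy sketch of that fraction-correct computation, on hypothetical logits for 4 images and 3 classes (an illustration, not Keras's internal implementation):

```python
import numpy as np


def accuracy_from_logits(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class equals the integer label."""
    predictions = np.argmax(logits, axis=-1)
    return float(np.mean(predictions == labels))


# Hypothetical logits for 4 images over 3 classes, plus their true labels.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 0.3, 1.5],
                   [1.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])
labels = np.array([0, 2, 0, 2])
print(accuracy_from_logits(logits, labels))  # 0.75 (the third image is misclassified)
```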

In [13]:
# Categorical cross-entropy on raw logits, with label smoothing of 0.1.
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True, label_smoothing=0.1)

def categorical_crossentropy_loss(y_true, y_pred):
    # image_dataset_from_directory yields integer labels, but CategoricalCrossentropy
    # expects one-hot targets (SparseCategoricalCrossentropy has no label_smoothing
    # argument), so encode the labels first.
    y_true_int = tf.cast(y_true, tf.int32)
    y_true_onehot = tf.one_hot(y_true_int, num_classes)
    return cce(y_true_onehot, y_pred)

model.compile(optimizer=tf.keras.optimizers.AdamW(),
              loss=categorical_crossentropy_loss,
              metrics=['accuracy'])
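With `label_smoothing=0.1`, Keras replaces each one-hot target `y` with `y * (1 - 0.1) + 0.1 / 10`, so the true class gets 0.91 and every other class 0.01. The NumPy sketch below reproduces that target and the resulting cross-entropy on made-up logits; it mirrors the formula, not Keras's exact code path:

```python
import numpy as np


def smoothed_targets(label: int, num_classes: int, eps: float) -> np.ndarray:
    """One-hot target mixed with a uniform distribution: y * (1 - eps) + eps / K."""
    one_hot = np.eye(num_classes)[label]
    return one_hot * (1.0 - eps) + eps / num_classes


def cross_entropy_from_logits(target: np.ndarray, logits: np.ndarray) -> float:
    """Cross-entropy between a target distribution and softmax(logits)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


target = smoothed_targets(label=3, num_classes=10, eps=0.1)
print(target[3], target[0])  # 0.91 and 0.01, up to floating-point rounding

# Hypothetical, very confident logits: smoothing penalizes the near-zero
# probabilities given to the other classes, so the loss stays well above zero.
logits = np.zeros(10)
logits[3] = 10.0
print(cross_entropy_from_logits(target, logits))
print(cross_entropy_from_logits(np.eye(10)[3], logits))  # hard-label loss, much smaller
```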

## Train the model

Training the neural network model requires the following steps:

1. Feed the training data to the model. In this example, the training data is the `train_ds` dataset of image and label batches.
2. The model learns to associate images and labels.
3. You ask the model to make predictions about a test set; in this example, the images in `test_ds`.
4. Verify that the predictions match the labels from `test_ds`.
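The same four steps can be seen without TensorFlow in a minimal NumPy sketch. It swaps in a much simpler model than the CNN above, a two-class softmax regression trained by plain gradient descent, and uses synthetic, hypothetical data (two Gaussian clusters) instead of CIFAR-10:

```python
import numpy as np

rng = np.random.default_rng(0)


def make_blobs(n_per_class: int):
    """Hypothetical two-class data: two well-separated Gaussian clusters in 2-D."""
    x0 = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    x = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y


train_x, train_y = make_blobs(200)
test_x, test_y = make_blobs(50)

n_classes = 2
W = np.zeros((2, n_classes))  # learned weights
b = np.zeros(n_classes)       # learned biases
learning_rate = 0.1

# Steps 1 and 2: feed the training data and learn by gradient descent
# on the mean softmax cross-entropy loss.
for _ in range(200):
    logits = train_x @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    grad_logits = probs - np.eye(n_classes)[train_y]  # dLoss/dlogits per sample
    W -= learning_rate * train_x.T @ grad_logits / len(train_x)
    b -= learning_rate * grad_logits.mean(axis=0)

# Step 3: predict classes for the held-out test set.
predictions = np.argmax(test_x @ W + b, axis=1)

# Step 4: verify the predictions against the test labels.
test_accuracy = float(np.mean(predictions == test_y))
print(test_accuracy)
```

In Keras, `model.fit` plays the role of the training loop, `model.predict` covers step 3, and `model.evaluate` combines steps 3 and 4.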


### Initialize Weights & Biases
Log in to Weights & Biases (`wandb`) and start a run so that the training metrics are tracked.

In [14]:
from wandb.integration.keras import WandbMetricsLogger, WandbModelCheckpoint
import wandb
wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: alon_pole. Use `wandb login --relogin` to force relogin


True

In [15]:
# Start a wandb run; the WandbMetricsLogger callback passed to model.fit logs metrics to it
wandb.init(
    project="cifar10",
    name="adamW"
)

### Feed the model

To start training, call the [`model.fit`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) method, so called because it "fits" the model to the training data:

In [16]:
history = model.fit(
    train_ds, 
    epochs=100,
    validation_data=test_ds,
    callbacks=[WandbMetricsLogger()]
    )

Epoch 1/100


[34m[1mwandb[0m: [32m[41mERROR[0m Unable to log learning rate.


1563/1563 ━━━━━━━━━━━━━━━━━━━━ 6s 2ms/step - accuracy: 0.2966 - loss: 1.9810 - val_accuracy: 0.5288 - val_loss: 1.5456
Epoch 2/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.5323 - loss: 1.5354 - val_accuracy: 0.5844 - val_loss: 1.4266
Epoch 3/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.5921 - loss: 1.4199 - val_accuracy: 0.6004 - val_loss: 1.3964
Epoch 4/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.6218 - loss: 1.3544 - val_accuracy: 0.6278 - val_loss: 1.3346
Epoch 5/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.6405 - loss: 1.3091 - val_accuracy: 0.6407 - val_loss: 1.3108
Epoch 6/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.6612 - loss: 1.2704 - val_accuracy: 0.6528 - val_loss: 1.2814
Epoch 7/100
1563/1

In [17]:
# Integer labels of the batch taken from train_ds in In [5]
f[1].numpy()

array([9, 1, 9, 6, 9, 4, 4, 0, 9, 1, 2, 0, 1, 1, 0, 7, 4, 8, 3, 7, 5, 7,
       9, 3, 6, 1, 3, 2, 2, 1, 5, 9], dtype=int32)
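These integer labels index into the class names. `image_dataset_from_directory` assigns labels in alphanumeric order of the subdirectory names, so folders `0` to `9` become labels 0 to 9; mapping them to the CIFAR-10 names assumes, as the `class_names` cell above does, that folder `0` holds airplanes, `1` automobiles, and so on. A NumPy sketch using the first eight labels printed above:

```python
import numpy as np

# Assumed mapping from folder/label index to CIFAR-10 class name.
class_names = np.array(["airplane", "automobile", "bird", "cat", "deer",
                        "dog", "frog", "horse", "ship", "truck"])

# The first eight labels printed above for one training batch.
labels = np.array([9, 1, 9, 6, 9, 4, 4, 0])

# Fancy indexing looks up one name per label.
print(", ".join(class_names[labels]))
# truck, automobile, truck, frog, truck, deer, deer, airplane
```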

In [18]:
wandb.finish()

Run history:

| Metric | History |
|---|---|
| epoch/accuracy | ▁▄▅▅▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇███████████████████ |
| epoch/epoch | ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ |
| epoch/loss | █▅▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| epoch/val_accuracy | ▁▄▆▇▇▇██████▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ |
| epoch/val_loss | █▅▃▂▁▁▁▁▁▁▂▁▂▂▂▂▂▃▂▂▂▃▃▃▃▄▄▄▃▄▃▄▄▄▄▄▄▄▄▄ |

Run summary:

| Metric | Value |
|---|---|
| epoch/accuracy | 0.87678 |
| epoch/epoch | 99.0 |
| epoch/loss | 0.8589 |
| epoch/val_accuracy | 0.6809 |
| epoch/val_loss | 1.36282 |


In [19]:
# Delete the local wandb run directory
!rm -rf wandb