# Food Vision Big

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

In [None]:
# Get helper functions file
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
from helper_functions import create_tensorboard_callback, plot_loss_curves, compare_historys

--2023-12-25 10:05:40--  https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10246 (10K) [text/plain]
Saving to: ‘helper_functions.py’


2023-12-25 10:05:41 (56.9 MB/s) - ‘helper_functions.py’ saved [10246/10246]



## Loading the data, exploring it and preprocessing

In [None]:
# Is the dataset that we're looking for available?
datasets_names = tfds.list_builders()
print("food101" in datasets_names)

True


In [None]:
(train_data, test_data), ds_info = tfds.load(name="food101",
                                             split=["train", "validation"],
                                             shuffle_files=True,
                                             as_supervised=True,
                                             with_info=True)

""" split = Not all datasets have train, validation and test, this one has only training and validation.
    as_supervised = download data in tuple format (sample, label), e.g. (image, label)
    with_info = include dataset metadata? (ds_info)
"""

' split = Not all datasets have train, validation and test, this one has only training and validation.\n    as_supervised = download data in tuple format (sample, label), e.g. (image, label)\n    with_info = include dataset metadata? (ds_info)\n'

In [12]:
print(ds_info.features)

# Get class names
class_names = ds_info.features["label"].names

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=101),
})


In [None]:
# Let's look at how this data is constructed, to know if we need to change anything
train_sample = train_data.take(1)

# Output info about our training sample
for image, label in train_sample:
  print(f""" Image shape: {image.shape} Image dtype: {image.dtype}
        Target class from Food101 (tensor form): {label}
        Class name (str form): {class_names[label.numpy()]}""")


NameError: ignored

All right, we see here that there are some things we need to correct, the first is that the shapes differ from image to image, we need to stay consistent and create the tensors to be of the same size.

The second thing that we notice is that the images are not in float format, but
rather int, we can't have that (since we'll use efficientnet we don't need to
worry about scalling the pixel-values of the image to 0-1 values, but in others we might).

The third is that the labels are not one-hot encoded so we might need to use a different loss function when creating the model.

**We'll solve these problems with a preprocessing function.**

In [None]:
# Make a function for preprocessing images
def preprocess_img(image, label, img_shape=224):
  """
  Converts image datatype from 'uint8' -> 'float32' and reshapes image to
  [img_shape, img_shape, color_channels]
  """
  image = tf.image.resize(image, [img_shape, img_shape]) # reshape to img_shape
  return tf.cast(image, tf.float32), label # return (float32_image, label) tuple

In [None]:
# Preprocess a single sample image and check the outputs
preprocessed_img = preprocess_img(image, label)[0]
print(f"Image after preprocessing:\n {preprocessed_img[:2]}...,\nShape: {preprocessed_img.shape}," +
      "\nDatatype: {preprocessed_img.dtype}")

Image after preprocessing:
 [[[25.244898   12.244898    4.7295923 ]
  [32.40816    19.408163   11.408163  ]
  [32.719387   19.719387   10.719387  ]
  ...
  [39.005093   20.994862    3.4999783 ]
  [36.31123    19.31123     2.1173685 ]
  [36.413277   20.000034    4.000035  ]]

 [[19.897957    6.897958    0.18367267]
  [28.943878   15.943878    7.9438777 ]
  [22.112244    9.112244    1.1122437 ]
  ...
  [48.89794    27.015284    6.1428356 ]
  [44.642845   23.852028    3.0714417 ]
  [42.7602     21.954084    3.0714283 ]]]...,
Shape: (224, 224, 3),
Datatype: {preprocessed_img.dtype}


## Batching and preparing the dataset

For loading data in the most performant way possible, see the TensorFlow docuemntation on Better performance with the tf.data API (https://www.tensorflow.org/guide/data_performance).

In [None]:
# Map preprocessing function to training data (and paralellize)
train_data = train_data.map(map_func=preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)

# Shuffle train_data and turn it into batches and prefetch it (load it faster)
train_data = train_data.shuffle(buffer_size=1000).batch(batch_size=32).prefetch(buffer_size=tf.data.AUTOTUNE)

# Map prepreprocessing function to test data
test_data = test_data.map(preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)

# Turn test data into batches (don't need to shuffle)
test_data = test_data.batch(32).prefetch(tf.data.AUTOTUNE)

"""Note: Extra: cache() - caches elements in a target dataset, saving loading time
(will only if your dataset is small enough to fit in memory, standard Colab instances
only have 12GB of memory)"""

train_data, test_data

(<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>,
 <_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>)

## Building Feature Extraction model with Mixed Precision Training

In [None]:
# Callbacks - Create ModelCheckpoint callback to save model's progress
checkpoint_path = "model_checkpoints/cp.ckpt" # saving weights requires ".ckpt" extension
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                      monitor="val_accuracy", # save the model weights with best validation accuracy
                                                      save_best_only=True, # only save the best weights
                                                      save_weights_only=True,
                                                      verbose=0) # don't print out whether or not model is being saved

# Setting mixed precision
tf.keras.mixed_precision.set_global_policy(policy="mixed_float16")

In [13]:
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
base_model.trainable = False;

inputs = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")

x = base_model(inputs, training=False) # Model in inference type mode only.
x = tf.keras.layers.GlobalAveragePooling2D(name="pooling_layer")(x)
x = tf.keras.layers.Dense(len(class_names))(x) # We want 1 output neuron per class

#Mixed precision requires dtype=float32
outputs = tf.keras.layers.Activation("softmax", dtype=tf.float32, name="softmax_float32")(x)

model = tf.keras.Model(inputs, outputs)

# Use sparse_categorical_crossentropy when labels are *not* one-hot encoded
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

In [None]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_layer (InputLayer)    [(None, 224, 224, 3)]     0         
                                                                 
 efficientnetv2-b0 (Functio  (None, None, None, 1280   5919312   
 nal)                        )                                   
                                                                 
 pooling_layer (GlobalMaxPo  (None, 1280)              0         
 oling2D)                                                        
                                                                 
 dense (Dense)               (None, 101)               129381    
                                                                 
 softmax_float32 (Activatio  (None, 101)               0         
 n)                                                              
                                                             

In [14]:
history1 = model.fit(train_data,
                     epochs=3,
                     steps_per_epoch=len(train_data),
                     validation_data=test_data,
                     validation_steps=int(0.15 * len(test_data)),
                     callbacks=[model_checkpoint])

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [15]:
model.evaluate(test_data)



[1.002677321434021, 0.7258613705635071]

## Saving model to file

In [16]:
# We need to save it to drive because the model is waaay too big to rerun everything
# everytime
from google.colab import drive

# Mount Google Drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [17]:
# Specify the path to save the model
model_path = "/content/gdrive/My Drive/Colab Notebooks/TENSORFLOW/ModelFoodVisionBig/"

model.save(model_path)

## Fine-Tunning

### Loading Model

In [20]:
# Load model previously saved above
loaded_model = tf.keras.models.load_model("/content/gdrive/My Drive/Colab Notebooks/TENSORFLOW/ModelFoodVisionBig/")

In [None]:
loaded_model.evaluate(test_data)



[2.4630420207977295, 0.5788118839263916]

### Setting EarlyStop callback, model checkpoint callback again AND ReduceLROnPlateau

The ReduceLROnPlateau callback helps to tune the learning rate for you.

Like the ModelCheckpoint and EarlyStopping callbacks, the ReduceLROnPlateau callback montiors a specified metric and when that metric stops improving, it reduces the learning rate by a specified factor (e.g. divides the learning rate by 10).

**But why lower the learning rate?**

Imagine having a coin at the back of the couch and you're trying to grab with your fingers. Now think of the learning rate as the size of the movements your hand makes towards the coin, the closer you get, the smaller you want your hand movements to be, otherwise the coin will be lost.

Our model's ideal performance is the equivalent of grabbing the coin. So as training goes on and our model gets closer and closer to it's ideal performance (also called **convergence**), we want the amount it learns to be less and less.

To do this we'll create an instance of the *ReduceLROnPlateau* callback to monitor the validation loss just like the EarlyStopping callback. Once the validation loss stops improving for two or more epochs, we'll reduce the learning rate by a factor of 5 (e.g. 0.001 to 0.0002).

And to make sure the learning rate doesn't get too low (and potentially result in our model learning nothing), we'll set the minimum learning rate to 1e-7.

In [18]:
# This will stop training if model's val_loss doesn't improve for 3 epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=3)

# This time, we will save best model during fine-tuning (monitor val_loss while
# training and save the best model (lowest val_loss))
checkpoint_path2 = "model_checkpoints2/cpFineTunning.ckpt" # saving weights requires ".ckpt" extension
model_checkpoint2 = tf.keras.callbacks.ModelCheckpoint(checkpoint_path2,
                                                      monitor="val_loss",
                                                      save_best_only=True)

# Creating learning rate reduction callback
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.2, # multiply the learning rate by 0.2 (reduce by 5x)
                                                 patience=2,
                                                 verbose=1, # print out when learning rate goes down
                                                 min_lr=1e-7)

### Unfreezing ALL layers, Compiling and Fitting

In [22]:
# Are any of the layers in our model frozen?
for layer in loaded_model.layers:
    layer.trainable = True # set all layers to trainable

In [23]:
loaded_model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=["accuracy"])


In [None]:
# And now to make sure it starts at the same checkpoint, we can load the checkpointed
# weights from checkpoint_path:
model.load_weights(checkpoint_path)

<tensorflow.python.checkpoint.checkpoint.CheckpointLoadStatus at 0x7b586b593c40>

In [24]:
history1_fine_tunned = loaded_model.fit(train_data,
                                        epochs=100,
                                        steps_per_epoch = len(train_data),
                                        validation_data = test_data,
                                        validation_steps = int(0.15 * len(test_data)),
                                        callbacks=[early_stopping, reduce_lr, model_checkpoint2])

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 4: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05.
Epoch 5/100


### Evaluating the model

In [25]:
loaded_model.evaluate(test_data)



[0.930356502532959, 0.8158019781112671]

## Saving the model fine-tunned

In [26]:
# We need to save it to drive because the model is waaay too big to rerun everything
# everytime
from google.colab import drive

# Mount Google Drive
drive.mount('/content/gdrive')

# Specify the path to save the model
model_path = "/content/gdrive/My Drive/Colab Notebooks/TENSORFLOW/ModelFoodVisionBigFT/"

model.save(model_path)

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


# Extra Exercises

## Exercise 1 Use the same evaluation techniques on the large-scale Food Vision model as you did in the previous notebook (Transfer Learning Part 3: Scaling up):

More specifically, it would be good to see:

* A confusion matrix between all of the model's predictions and true labels.
* A graph showing the f1-scores of each class.
* A visualization of the model making predictions on various images and
  comparing the predictions to the ground truth.
    * For example, plot a sample image from the test dataset and have the title
      of the plot show the prediction, the prediction probability and the ground truth label.

In [31]:
# prompt: A confusion matrix between all of the model's predictions and true labels

from sklearn.metrics import confusion_matrix
import numpy as np


# Get true labels


# Get predictions on the test data
predictions = loaded_model.predict(test_data)

# Create confusion matrix
cm = confusion_matrix(true_labels, predictions)
# Plot confusion matrix
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.colorbar()
plt.show()


  1/790 [..............................] - ETA: 2:10

  labels = np.array(labels)


 25/790 [..............................] - ETA: 1:00

KeyboardInterrupt: ignored

## Exercise 2: Take 3 of your own photos of food and use the Food Vision model to make predictions on them. How does it go?

## Exercise 3: Retrain the model (feature extraction and fine-tuning) we trained in this notebook, except this time use EfficientNetB4 as the base model instead of EfficientNetB0. Do you notice an improvement in performance? Does it take longer to train? Are there any tradeoffs to consider?

## Exercise 4: Name one important benefit of mixed precision training, how does this benefit take place?

#