<a href="https://colab.research.google.com/github/LxYuan0420/eat_tensorflow2_in_30_days/blob/master/notebooks/1_2_Example_Modeling_Procedure_for_Images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


In [2]:
%cd /gdrive/MyDrive/Colab Notebooks/git/eat_tensorflow2_in_30_days/notebooks

/gdrive/MyDrive/Colab Notebooks/git/eat_tensorflow2_in_30_days/notebooks


**1-2 Example: Modeling Procedure for Images**

**1. Data Preparation**

The cifar2 dataset is a subset of cifar10 that contains only two classes: airplane and automobile.

Each class contains 5000 images for training and 1000 images for testing.

The goal of this task is to train a model that classifies images as airplane or automobile.

The files of cifar2 are organized as follows:

`cifar2_datasets/train/0_airplane/*.jpg`

`cifar2_datasets/train/1_automobile/*.jpg`

`cifar2_datasets/test/0_airplane/*.jpg`

`cifar2_datasets/test/1_automobile/*.jpg`



There are two ways to prepare image data in TensorFlow.

The first is to build an image data generator with ImageDataGenerator in tf.keras.

The second is to build a data pipeline with tf.data.Dataset and the functions in tf.image.

The former is simpler and is demonstrated in this article (in Chinese).

The latter is TensorFlow's native approach; it is more flexible and, when used properly, may perform better.


The second method is introduced below.

In [3]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

BATCH_SIZE=100

def load_image(img_path, size=(32,32)):
    label = tf.constant(1, tf.int8) if tf.strings.regex_full_match(img_path, ".*automobile.*") else tf.constant(0, tf.int8)
    img = tf.io.read_file(img_path)
    img = tf.io.decode_jpeg(img)
    img = tf.image.resize(img, size) / 255.0

    return (img, label)
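
To make the labeling rule in `load_image` concrete, here is a minimal pure-Python sketch of the same full-match test. It uses the standard `re` module in place of `tf.strings.regex_full_match`; the helper name `label_from_path` and the sample file names are hypothetical and serve only as an illustration.

```python
import re

# Same pattern that load_image passes to tf.strings.regex_full_match
AUTOMOBILE_PATTERN = re.compile(r".*automobile.*")

def label_from_path(img_path: str) -> int:
    """Return 1 for automobile paths and 0 otherwise (hypothetical helper)."""
    return 1 if AUTOMOBILE_PATTERN.fullmatch(img_path) else 0

# Hypothetical file names that follow the directory layout described earlier
print(label_from_path("cifar2_datasets/train/1_automobile/0.jpg"))  # 1
print(label_from_path("cifar2_datasets/train/0_airplane/0.jpg"))    # 0
```

Every path that does not contain `automobile` gets label 0, so the rule only works as long as the directory names keep the class name in them.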

In [4]:
# sample element produced by Dataset.list_files(file_pattern):
# [b'../data/cifar2/train/automobile/4004.jpg']
# so each element is just a file path


# Preprocess in parallel with num_parallel_calls, and use prefetch to overlap data preparation with training
ds_train = tf.data.Dataset.list_files("../data/cifar2/train/*/*.jpg") \
            .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
            .shuffle(buffer_size=1000).batch(BATCH_SIZE) \
            .prefetch(tf.data.experimental.AUTOTUNE) 

ds_test = tf.data.Dataset.list_files("../data/cifar2/test/*/*.jpg") \
            .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
            .batch(BATCH_SIZE) \
            .prefetch(tf.data.experimental.AUTOTUNE) 
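
The same map, shuffle, batch, prefetch chain can be tried without the image files. The sketch below is only an illustration under stated assumptions: it replaces the JPEG files with random in-memory tensors, uses an arbitrary dataset size of 250, and uses `tf.data.AUTOTUNE`, which assumes TensorFlow 2.4 or later. The names `scale` and `ds_demo` are hypothetical.

```python
import tensorflow as tf

BATCH_SIZE = 100
NUM_IMAGES = 250  # arbitrary size chosen for illustration

# Random uint8 "images" and 0/1 labels stand in for the decoded JPEG files
images = tf.random.uniform((NUM_IMAGES, 32, 32, 3), maxval=256, dtype=tf.int32)
images = tf.cast(images, tf.uint8)
labels = tf.cast(tf.range(NUM_IMAGES) % 2, tf.int8)

def scale(img, label):
    # Same normalization as load_image: convert to float and scale to [0, 1]
    return tf.cast(img, tf.float32) / 255.0, label

ds_demo = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(scale, num_parallel_calls=tf.data.AUTOTUNE)  # parallel preprocessing
    .shuffle(buffer_size=NUM_IMAGES)                  # shuffle single examples
    .batch(BATCH_SIZE)                                # then group into batches
    .prefetch(tf.data.AUTOTUNE)                       # prepare next batch early
)

batch_sizes = [int(x.shape[0]) for x, _ in ds_demo]
print(batch_sizes)  # [100, 100, 50]
```

The last batch is smaller because 250 is not a multiple of 100; with 5000 images per class and a batch size of 100, the cifar2 training set divides evenly.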

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

#Checking part of the samples
from matplotlib import pyplot as plt 

plt.figure(figsize=(8,8)) 
for i,(img,label) in enumerate(ds_train.unbatch().take(9)):
    ax=plt.subplot(3,3,i+1)
    ax.imshow(img.numpy())
    ax.set_title("label = %d"%label)
    ax.set_xticks([])
    ax.set_yticks([]) 
plt.show()

In [None]:
for x, y in ds_train.take(1):
    print(x.shape, y.shape)

**2. Model Definition**

There are usually three ways to build a model with the Keras APIs: sequential modeling with Sequential(), arbitrary architectures with the functional API, and customized modeling by subclassing the base class Model.

Here we use the functional API.

In [6]:
def img_classifier():
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(32, kernel_size=(3,3))(inputs)
    x = tf.keras.layers.MaxPool2D()(x)
    x = tf.keras.layers.Conv2D(64, kernel_size=(5,5))(x)
    x = tf.keras.layers.MaxPool2D()(x)
    x = tf.keras.layers.Dropout(0.1)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(32, activation='relu')(x)
    outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model
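
For comparison, the same stack of layers could also be written with the first of the three approaches, `Sequential`. This is a sketch rather than part of the original notebook; the function name `img_classifier_sequential` is hypothetical.

```python
import tensorflow as tf

def img_classifier_sequential():
    # Same layers as the functional version, listed in the order they are applied
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3)),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(64, kernel_size=(5, 5)),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

seq_model = img_classifier_sequential()
print(seq_model.output_shape)  # (None, 1)
```

`Sequential` is enough here because the network is a plain chain of layers; the functional API becomes necessary once a model has branches, multiple inputs, or multiple outputs.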

In [7]:
model = img_classifier()

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        51264     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
dropout (Dropout)            (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0     
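
The output shapes and parameter counts listed in the summary can be checked by hand. The sketch below assumes the Keras defaults that the model relies on: `'valid'` padding with stride 1 for `Conv2D`, and a 2x2 window with stride 2 for `MaxPool2D`. The helper names are hypothetical.

```python
def conv_out(size, kernel):
    # 'valid' padding, stride 1: output = input - kernel + 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # 'valid' padding, stride equal to the pool size: output = floor(input / pool)
    return size // pool

def conv_params(kernel, in_channels, filters):
    # kernel * kernel * in_channels weights plus one bias, for each filter
    return (kernel * kernel * in_channels + 1) * filters

s = conv_out(32, 3)          # conv2d:          30
p1 = conv_params(3, 3, 32)   # conv2d params:   896
s = pool_out(s)              # max_pooling2d:   15
s = conv_out(s, 5)           # conv2d_1:        11
p2 = conv_params(5, 32, 64)  # conv2d_1 params: 51264
s = pool_out(s)              # max_pooling2d_1: 5
flat = s * s * 64            # flatten:         1600

print(p1, p2, flat)  # 896 51264 1600
```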

**3. Model Training**

There are three common ways to train a model: the built-in fit method, the built-in train_on_batch method, and a customized training loop. Here we use the simplest one: the built-in fit method.
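
For contrast with `fit`, the second approach, `train_on_batch`, might look like the sketch below. It is not part of the original notebook: the tiny model `demo_model`, the random data, and the batch size of 50 are all assumptions made to keep the example self-contained.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model and random data, used only to show the call pattern
demo_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
demo_model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy'],
)

rng = np.random.default_rng(0)
x = rng.random((200, 8)).astype('float32')
y = (x.sum(axis=1) > 4.0).astype('float32').reshape(-1, 1)

for epoch in range(2):
    for start in range(0, len(x), 50):
        # One gradient update per call; return_dict=True yields named metrics
        logs = demo_model.train_on_batch(
            x[start:start + 50], y[start:start + 50], return_dict=True)
    print(epoch + 1, logs)
```

With `train_on_batch` the loop over epochs and batches is written by hand, which gives finer control than `fit` at the cost of more code.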

In [8]:
import datetime
import os

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Write logs under ../data so that they match the --logdir passed to TensorBoard below
logdir = os.path.join('..', 'data', 'autograph', stamp)

## We recommend using pathlib under Python3
# from pathlib import Path
# stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# logdir = str(Path('../data/autograph/' + stamp))

tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)


model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
)

In [None]:
history = model.fit(ds_train,
                    epochs=1,
                    validation_data=ds_test,
                    callbacks=[tensorboard_callback])

**4. Model Evaluation**


In [None]:
%load_ext tensorboard

from tensorboard import notebook
notebook.list()

In [None]:
notebook.start("--logdir ../data/")

In [None]:
import pandas as pd 
dfhistory = pd.DataFrame(history.history)
dfhistory.index = range(1,len(dfhistory) + 1)
dfhistory.index.name = 'epoch'

dfhistory

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

import matplotlib.pyplot as plt

def plot_metric(history, metric):
    train_metrics = history.history[metric]
    val_metrics = history.history['val_'+metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics, 'bo--')
    plt.plot(epochs, val_metrics, 'ro-')
    plt.title('Training and validation '+ metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_"+metric, 'val_'+metric])
    plt.show()

In [None]:
plot_metric(history,"loss")

In [None]:
plot_metric(history,"accuracy")

In [None]:
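# Evaluate on the test set with model.evaluate
# (the workers argument applies only to generator or Sequence input, so it is omitted for a tf.data.Dataset)
val_loss, val_accuracy = model.evaluate(ds_test)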
print(val_loss,val_accuracy)

**5. Model Saving**

In [None]:
# Save the weights only: this stores the weight tensors, not the model architecture
model.save_weights('../model_weights/tf_model_weights.ckpt',save_format = "tf")
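
Because only the weight tensors are stored, restoring them requires rebuilding the same architecture first and then calling `load_weights`. The sketch below illustrates that round trip under stated assumptions: `tiny_model` is a hypothetical stand-in for `img_classifier()`, and the file is written in HDF5 format with a `.weights.h5` suffix (a swap from the checkpoint format used above, chosen because recent Keras versions require that suffix).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def tiny_model():
    # Hypothetical stand-in for img_classifier(), kept small for the demo
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

x = np.ones((2, 4), dtype='float32')
source = tiny_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    # Assumed file name; the ".weights.h5" suffix selects HDF5 weight storage
    path = os.path.join(tmp_dir, 'demo.weights.h5')
    source.save_weights(path)

    restored = tiny_model()      # same architecture, fresh random weights
    restored.load_weights(path)  # overwrite them with the saved values

same = np.allclose(source.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)  # True
```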