Copyright (C) 2019-2023 Software Platform Lab, Seoul National University

Licensed under the Apache License, Version 2.0 (the "License"); 

you may not use this file except in compliance with the License. 

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 

Unless required by applicable law or agreed to in writing, software 

distributed under the License is distributed on an "AS IS" BASIS, 


WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 


See the License for the specific language governing permissions and


limitations under the License.

## Defining a model in TensorFlow 


In TensorFlow, various libraries regarding the model definition are provided under `tf.keras`.

### Model Subclassing
We can build a fully-customizable model by **subclassing [tf.keras.Model]** (https://www.tensorflow.org/api_docs/python/tf/keras/models/Model) and **defining your own forward pass.** **Layers** are created in the `__init__` method, provided by the [tf.keras.layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers)  and they are set as attributes of the class instance. **The forward pass is defined in the `call` method.** You can access **model variables by `model.trainable_variables`.**

Below is an example of **a linear regression model** to be defined as a subclass of `tf.keras.Model`, and then be trained using loss function, gradient function and optimizer provided in [tf.keras.optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers). Useful loss functions are also provided in [tf.keras.losses](https://www.tensorflow.org/api_docs/python/tf/keras/losses). We will cover these in more detail as we go on.

In [18]:
import tensorflow as tf

In [19]:
NUM_EXAMPLES = 2000
# 2000*1 형태의 정규 분포 난수 만들기
toy_inputs = tf.random.normal([NUM_EXAMPLES, 1])
noise = tf.random.normal([NUM_EXAMPLES, 1])
toy_outputs = toy_inputs * 2 - 1 + noise * 1/4

In [20]:
toy_inputs

<tf.Tensor: shape=(2000, 1), dtype=float32, numpy=
array([[ 0.08734757],
       [-0.61965317],
       [ 0.9665108 ],
       ...,
       [-0.0677831 ],
       [ 1.4166945 ],
       [ 0.21652994]], dtype=float32)>

In [21]:
class ToyModel(tf.keras.Model):
    def __init__(self):
        """Define layers"""
        super(ToyModel, self).__init__()
        self.dense = tf.keras.layers.Dense(units=1)

    def call(self, input):
        """Define forward pass."""
        result = self.dense(input)        
        return result


# The loss function to be optimized (MSE loss)
def loss(model, inputs, targets):
    error = model(inputs) - targets
    return tf.reduce_mean(tf.square(error)) # 평균제곱오차

optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01)

model = ToyModel()
print("Initial loss: {:.3f}".format(loss(model, toy_inputs, toy_outputs)))
print("Trainable variables:")
for var in model.trainable_variables:
  print("\t", var.name, ": ", var.numpy())

Initial loss: 3.503
Trainable variables:
	 toy_model/dense_5/kernel:0 :  [[0.47264993]]
	 toy_model/dense_5/bias:0 :  [0.]


In [22]:
# Training loop
## GradientTape: 자동 미분을 통해 동적으로 Gradient 값들을 확인해 볼 수 있다는 장점
for i in range(300):
    with tf.GradientTape() as tape:
        loss_value = loss(model, toy_inputs, toy_outputs)
    grads = tape.gradient(loss_value, model.trainable_variables)
    # backpropagation: weight 업데이트
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(i, loss(model, toy_inputs, toy_outputs)))

print("Final loss: {:.3f}".format(loss(model, toy_inputs, toy_outputs)))
print("Trainable variables:")
for var in model.trainable_variables:
  print("\t", var.name, ": ", var.numpy())

Loss at step 000: 3.364
Loss at step 020: 1.507
Loss at step 040: 0.695
Loss at step 060: 0.340
Loss at step 080: 0.185
Loss at step 100: 0.117
Loss at step 120: 0.088
Loss at step 140: 0.075
Loss at step 160: 0.069
Loss at step 180: 0.066
Loss at step 200: 0.065
Loss at step 220: 0.065
Loss at step 240: 0.065
Loss at step 260: 0.065
Loss at step 280: 0.065
Final loss: 0.065
Trainable variables:
	 toy_model/dense_5/kernel:0 :  [[2.0078015]]
	 toy_model/dense_5/bias:0 :  [-0.9951335]


It's not required to set an input shape for the `tf.keras.Model` class since the parameters are set the first time input is passed to the layer.

tf.keras.layers classes create and contain their own model variables that are tied to the lifetime of their layer objects. To share layer variables, share their objects.

Below examples shows a new model that relies on the previous toy model. We are going to employ an additional bias to fit a slightly different data.

In [23]:
toy_outputs_2 = toy_outputs + 3

class ToyModel2(tf.keras.Model):
    def __init__(self, toy_model):
        """Define layers"""
        super(ToyModel2, self).__init__()
        self.toy_model = toy_model
        self.b = tf.Variable(0., name='another_bias')

    def call(self, input):
        """Define forward pass."""
        result = self.toy_model(input)        
        return result + self.b


model2 = ToyModel2(model)
print("Initial loss: {:.3f}".format(loss(model2, toy_inputs, toy_outputs_2)))
print("Trainable variables:")
for var in model2.trainable_variables:
  print("\t", var.name, ": ", var.numpy())

Initial loss: 9.053
Trainable variables:
	 toy_model/dense_5/kernel:0 :  [[2.0078015]]
	 toy_model/dense_5/bias:0 :  [-0.9951335]
	 another_bias:0 :  0.0


We are only optimizing the additional bias. The weight and bias of toy_model_1 does not change.

In [24]:
# Training loop
for i in range(300):
    with tf.GradientTape() as tape:
        loss_value = loss(model2, toy_inputs, toy_outputs_2)
    grads = tape.gradient(loss_value, [model2.b]) # gradient w.r.t. `model2.b`, not `model2.trainable_variables`
    optimizer.apply_gradients(zip(grads, [model2.b])) # optimize only `model2.b`
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(i, loss(model2, toy_inputs, toy_outputs_2)))

print("Final loss: {:.3f}".format(loss(model2, toy_inputs, toy_outputs_2)))
print("Trainable variables:")
for var in model2.trainable_variables:
  print("\t", var.name, ": ", var.numpy())

Loss at step 000: 8.697
Loss at step 020: 3.912
Loss at step 040: 1.779
Loss at step 060: 0.829
Loss at step 080: 0.405
Loss at step 100: 0.216
Loss at step 120: 0.132
Loss at step 140: 0.095
Loss at step 160: 0.078
Loss at step 180: 0.071
Loss at step 200: 0.067
Loss at step 220: 0.066
Loss at step 240: 0.065
Loss at step 260: 0.065
Loss at step 280: 0.065
Final loss: 0.065
Trainable variables:
	 toy_model/dense_5/kernel:0 :  [[2.0078015]]
	 toy_model/dense_5/bias:0 :  [-0.9951335]
	 another_bias:0 :  2.9911287


## Convolutional Neural Networks
Build simple CNN in TensorFlow.





### Preparing MNIST Dataset

In [25]:
import tensorflow as tf

# Download the mnist dataset using keras
data_train, data_test = tf.keras.datasets.mnist.load_data()

# Parse images and labels
(train_images, train_labels) = data_train
(test_images, test_labels) = data_test

# Numpy reshape & type casting
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')
train_labels = train_labels.astype('int64')
test_labels = test_labels.astype('int64')


# Normalizing the images to the range of [0., 1.]
train_images /= 255.
test_images /= 255.

print(train_images.shape, train_labels.shape)
print(test_images.shape, test_labels.shape)

(60000, 28, 28, 1) (60000,)
(10000, 28, 28, 1) (10000,)


### Define the CNN Model

In [26]:
from tensorflow.keras import Model
# Construct a tf.keras.model using tf.keras
class MyCNN(Model):
  def __init__(self):
    super(MyCNN, self).__init__()
    self.conv1 = tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='valid')
    self.conv2 = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='valid')
    self.conv3 = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='valid')
    self.maxpool = tf.keras.layers.MaxPooling2D((2, 2))
    self.flatten = tf.keras.layers.Flatten()
    self.dense1 = tf.keras.layers.Dense(256, activation='relu')
    self.dense2 = tf.keras.layers.Dense(10, activation='softmax')

  def call(self, x):
    x = self.conv1(x)
    x = self.maxpool(x)

    x = self.conv2(x)
    x = self.maxpool(x)

    x = self.conv3(x)
    x = self.maxpool(x)

    x = self.flatten(x)
    x = self.dense1(x)
    x = self.dense2(x)
    
    return x

# Create model
model = MyCNN()

### Setting up training
After the model is constructed, we specify optimizer and loss function. We can also monitor training using metrics:
* `optimizer`: This field specifies which optimizer to use. We can pass an optimizer instance (e.g., `tf.keras.optimizers.Adam`, `tf.keras.optimizers.RMSProp`), which are defined in  `tf.train` module.
* `loss`: The function to minimize during optimization. Common choices include `mean square error (mse)`, `[categorical|binary]_crossentropy`. Loss functions are specified by name or by passing a callable object from the `tf.keras.losses` module.
* `metrics`: Used to monitor training. We can put string names or callables defined in `tf.keras.metrics` module (e.g. `'accuracy'`)

In [27]:
# Choose loss function and optimizer for training
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# Metrics to measure loss and accuracy of the model
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

### Train and Test functions using `tf.function`
By annotating a train function with `tf.function`, TensorFlow internally **creates a graph so that it can benefit from graph-based execution.**

In [28]:
# Define function for training
@tf.function
def train_step(images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images, training=True)
    loss = loss_fn(labels, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

  train_loss(loss)
  train_accuracy(labels, predictions)

# Define function for testing
@tf.function
def test_step(images, labels):
  predictions = model(images, training=False)
  loss = loss_fn(labels, predictions)

  test_loss(loss)
  test_accuracy(labels, predictions)

### Prepare the dataset and start training

In [29]:
batch_size = 128

# Prepare the dataset using tf.data
train_ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
train_ds = train_ds.shuffle(10000)
train_ds = train_ds.batch(batch_size)

test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
test_ds = test_ds.batch(batch_size)



EPOCHS = 10

for epoch in range(EPOCHS):
    # Reset the metrics at each epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for images, labels in train_ds:
      train_step(images, labels)

    for images, labels in test_ds:
      test_step(images, labels)

    print('Epoch: %02d' % (epoch + 1),
          'Loss = {:2.4f}'.format(train_loss.result()),
          'Train accuracy = {:2.4f}'.format(train_accuracy.result()),
          'Test loss = {:2.4f}'.format(test_loss.result()),
          'Test accuracy = {:2.4f}'.format(test_accuracy.result()))

Epoch: 01 Loss = 0.2982 Train accuracy = 0.9123 Test loss = 0.0908 Test accuracy = 0.9711
Epoch: 02 Loss = 0.0808 Train accuracy = 0.9753 Test loss = 0.0831 Test accuracy = 0.9747
Epoch: 03 Loss = 0.0575 Train accuracy = 0.9824 Test loss = 0.0505 Test accuracy = 0.9847
Epoch: 04 Loss = 0.0467 Train accuracy = 0.9857 Test loss = 0.0533 Test accuracy = 0.9836
Epoch: 05 Loss = 0.0375 Train accuracy = 0.9879 Test loss = 0.0625 Test accuracy = 0.9801
Epoch: 06 Loss = 0.0321 Train accuracy = 0.9899 Test loss = 0.0411 Test accuracy = 0.9875
Epoch: 07 Loss = 0.0260 Train accuracy = 0.9915 Test loss = 0.0536 Test accuracy = 0.9845
Epoch: 08 Loss = 0.0232 Train accuracy = 0.9928 Test loss = 0.0385 Test accuracy = 0.9887
Epoch: 09 Loss = 0.0183 Train accuracy = 0.9940 Test loss = 0.0530 Test accuracy = 0.9850
Epoch: 10 Loss = 0.0188 Train accuracy = 0.9940 Test loss = 0.0437 Test accuracy = 0.9884


## More simplified process using Keras API
Keras API provides much simpler version to define a model and train a model.

### Defining a model
Let's take a look how we can define a model using Keras API.

In [30]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Let's build a stack of *sequential* layers, which is
# the most common form of neural network graphs.
model = models.Sequential()

# Adds a reshaping layer that transforms (28, 28, 1) to (784,)
model.add(layers.Reshape((784,), input_shape=(28, 28, 1)))

# Adds a dense layer with 128 units to the model
model.add(layers.Dense(units=128, activation='relu'))

# Adds another layer, which has L2 regularization applied to the kernel matrix
model.add(layers.Dense(units=64, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))

# Adds a dense layer with 10 output units
model.add(layers.Dense(units=10, activation='linear'))

### Setting up training
After the model is constructed, `compile` method configures how to learn the model, by specifying optimizer, loss function and metrics.

In [31]:
model.compile(optimizer=tf.keras.optimizers.RMSprop(0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

### Training a model
We can train the model using the `fit` method and then the model is "fit" to the training data. We can specify the training data to use (`images_train` and `labels_train`), how many epochs we will run (`epochs`), and how many items to be processed in a batch (`batch_size`).

In [32]:
model.fit(train_images, train_labels, epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f2ac85c0220>

### Evaluating the model
Finally, we evaluate the trained model using test dataset.

In [33]:
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('Test accuracy:', test_acc)

313/313 - 1s - loss: 0.1095 - accuracy: 0.9777 - 692ms/epoch - 2ms/step
Test accuracy: 0.9776999950408936


### **Quiz**
First, define a multi-layer model using Keras API following the CNN model defined in the beginning.

The model should contain at least 3 convolutional layers, 2 max pooling layers, and 1 dense layer.

The test accuracy after training should be higher than 99%.

In [16]:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models

############# Write here. #############
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='valid', input_shape=(28,28,1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='valid'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='valid'))
model.add(layers.Flatten())
model.add(layers.Dense(units=128, activation='softmax'))

# Create model

#######################################

Using the model and `(train_images, train_labels)` above, let's train the model using the following configuration:
* optimizer: `tf.keras.optimizers.Adam`
* learning rate: 0.001
* loss: `SparseCategoricalCrossentropy`
* metrics: `accuracy`
* batch size: 128
* epochs: 10

In [10]:
train_images.shape

(60000, 28, 28, 1)

In [17]:
############# Write here. #############
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10, batch_size=128)
#######################################

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
print('Test accuracy:', test_acc)

Epoch 1/10


  output, from_logits = _get_logits(


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
313/313 - 1s - loss: 0.0274 - accuracy: 0.9909 - 1s/epoch - 4ms/step
Test accuracy: 0.9908999800682068


## Wrap-up

So far, we have learned how we can define and train models in TensorFlow. For more information you can refer to [guides in TensorFlow official website](https://www.tensorflow.org/guide) and many other blog posts.