# Custom Training
We can train a model with a custom loop in three steps:
1. compute loss
2. backpropagate
3. apply gradient

## 1. Computing loss
TensorFlow provides a wide range of loss functions, such as cross entropy and mean squared error, in the keras.losses module. We compute the loss by passing the ground-truth labels and the predicted values to the loss function.
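
As a minimal sketch of that call pattern, the snippet below passes hand-picked values to two losses from keras.losses. The tensors are illustrative assumptions, not data from this notebook.

```python
import tensorflow as tf
from tensorflow import keras

# Mean squared error: average of the squared differences.
mse = keras.losses.MeanSquaredError()
y_true = tf.constant([[1.0], [2.0]])
y_pred = tf.constant([[1.0], [4.0]])
mse_value = mse(y_true, y_pred)  # (0^2 + 2^2) / 2 = 2.0

# Sparse categorical cross entropy: integer labels against per-class probabilities.
scce = keras.losses.SparseCategoricalCrossentropy()
labels = tf.constant([0, 1])
probs = tf.constant([[0.9, 0.1], [0.2, 0.8]])
ce_value = scce(labels, probs)  # mean of -log(0.9) and -log(0.8)
```

Both calls return a scalar tensor: the per-example losses are averaged over the batch by default.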


In [1]:
import tensorflow as tf
from tensorflow import keras




In [4]:
class SimpleModel(keras.Model):
    def __init__(self, n_class):
        super(SimpleModel,self).__init__()
        self.output_layer = keras.layers.Dense(n_class)
    
    def call(self,x):
        return self.output_layer(x)


In [20]:
model = SimpleModel(5)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True) 
# from_logits=True tells the loss that the model outputs raw logits rather than probabilities,
# so the loss applies softmax internally before computing the cross entropy.
with tf.GradientTape() as tape:
    # the forward pass and the loss must be computed inside tf.GradientTape so that gradients can be computed later
    x = tf.random.normal((32,100))
    y = tf.random.uniform((32,),0,5,dtype=tf.int64)

    y_predicted = model(x)
    computed_loss = loss(y_true=y, y_pred=y_predicted)
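
To make the role of from_logits concrete, here is a small sketch under assumed inputs (the logits and labels are made up for illustration). It compares the loss computed on raw logits with from_logits=True against the loss computed on softmax probabilities with from_logits=False.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical logits for 2 examples and 3 classes.
logits = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = tf.constant([0, 1])

loss_from_logits = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_from_probs = keras.losses.SparseCategoricalCrossentropy(from_logits=False)

# Raw logits go straight into the from_logits=True loss.
a = loss_from_logits(labels, logits)
# For from_logits=False we convert to probabilities ourselves first.
b = loss_from_probs(labels, tf.nn.softmax(logits))
```

The two values should agree up to floating-point error. Passing logits directly is generally the more numerically stable choice.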

## 2. Backpropagation
We can compute the gradients via backpropagation in one line of code:

In [21]:
step_grad = tape.gradient(computed_loss, model.trainable_variables)
# computes the gradient of the loss with respect to every trainable variable in the model
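
The same mechanism can be checked by hand on a toy function. The sketch below uses a single scalar variable (an illustrative assumption, unrelated to the model above) so that the expected gradient is easy to verify.

```python
import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    # Operations on trainable variables are recorded automatically.
    y = w * w
# dy/dw = 2 * w = 6.0
grad = tape.gradient(y, w)
```

By default a tape releases its resources after one call to gradient. Create it with tf.GradientTape(persistent=True) if several gradients are needed from the same recording.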

## 3. Apply gradients
To apply the gradients, we need an optimizer. TensorFlow provides the most widely used optimizers in the keras.optimizers module. All we have to do is call the optimizer's apply_gradients method with (gradient, variable) pairs.


In [23]:
optimizer = keras.optimizers.Adam(0.001)  # the argument is the learning rate
optimizer.apply_gradients(zip(step_grad, model.trainable_variables))

<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>
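
To see what a single update does, the sketch below swaps in plain SGD instead of Adam, because its update rule (variable minus learning rate times gradient) is easy to verify by hand. The variable and the quadratic loss are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras

w = tf.Variable(1.0)
opt = keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss_value = (w - 3.0) ** 2
# d(loss)/dw = 2 * (w - 3) = -4.0
grads = tape.gradient(loss_value, [w])

# apply_gradients expects an iterable of (gradient, variable) pairs.
opt.apply_gradients(zip(grads, [w]))
# w is now 1.0 - 0.1 * (-4.0) = 1.4
```

Adam follows the same calling convention but rescales each step using running estimates of the gradient moments, so its updates do not reduce to this simple formula.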

## 4. Training Epoch
The same three steps can be applied batch by batch over an entire dataset.

In [82]:
X = tf.random.normal((10000,100))
Y = tf.random.uniform((10000,),0,5,dtype=tf.int64)

data = tf.data.Dataset.from_tensor_slices((X,Y))

In [83]:
batchfier = data.batch(10)

In [84]:
def train_epoch(batchfier,model,optimizer):
    # 10000 samples / batch size 10 = 1000 steps per epoch
    pbar = tf.keras.utils.Progbar(1000)
    pbar_cnt = 0
    for x,y in batchfier:
        pbar_cnt+=1
        # 1. compute loss (uses the loss object defined in the earlier cell)
        with tf.GradientTape() as tape:
            y_predicted = model(x)
            computed_loss = loss(y_true=y, y_pred=y_predicted)
        # 2. backpropagate
        step_grad = tape.gradient(computed_loss, model.trainable_variables)
        # 3. apply gradients
        optimizer.apply_gradients(zip(step_grad, model.trainable_variables))
        pbar.update(pbar_cnt, [['loss',tf.reduce_mean(computed_loss)]])

In [85]:
train_epoch(batchfier,model,optimizer)
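
One way to extend this to several epochs is sketched below. The train function, its loss_fn and n_epochs parameters, the keras.Sequential stand-in model, and the smaller random dataset are all assumptions made for illustration and are not part of the original notebook.

```python
import tensorflow as tf
from tensorflow import keras

def train(dataset, model, optimizer, loss_fn, n_epochs):
    """Run n_epochs passes over dataset and return the mean loss of each epoch."""
    history = []
    for _ in range(n_epochs):
        epoch_loss = keras.metrics.Mean()  # running average over batches
        for x, y in dataset:
            with tf.GradientTape() as tape:
                batch_loss = loss_fn(y, model(x))
            grads = tape.gradient(batch_loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            epoch_loss.update_state(batch_loss)
        history.append(float(epoch_loss.result()))
    return history

# Small random dataset so the sketch runs quickly.
X = tf.random.normal((200, 100))
Y = tf.random.uniform((200,), 0, 5, dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).shuffle(200).batch(20)

model = keras.Sequential([keras.layers.Dense(5)])
history = train(
    dataset,
    model,
    keras.optimizers.Adam(0.001),
    keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    n_epochs=3,
)
```

Passing the loss function as an argument avoids relying on a global variable, and shuffling before batching gives each epoch a different batch order.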

