
# Trainable vs. Non-Trainable Parameters

In Keras, non-trainable parameters are weights that the optimizer does not update during training, i.e. they are not adjusted by gradient descent.
Whether a layer's weights are trainable is controlled by that layer's trainable argument, for example:



### Model-1

In [1]:
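from keras.layers import BatchNormalization, Dense
from keras.models import Sequential

# A single frozen Dense layer: 100 inputs -> 10 units, excluded from training.
model = Sequential()
model.add(Dense(10, trainable=False, input_shape=(100,)))
model.summary()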


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                1010      
                                                                 
Total params: 1010 (3.95 KB)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 1010 (3.95 KB)
_________________________________________________________________
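
The 1010 in the Param # column can be checked by hand: a Dense layer holds one weight per input-unit pair plus one bias per unit. A minimal sketch of that arithmetic follows; dense_param_count is a hypothetical helper written for illustration, not part of Keras.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Count the weights of a fully connected layer (hypothetical helper, not a Keras API)."""
    kernel = input_dim * units        # one weight per (input, unit) pair
    bias = units if use_bias else 0   # one bias per unit
    return kernel + bias


total = dense_param_count(input_dim=100, units=10)
print(total)  # 1010 = 100 * 10 + 10
```

The count is the same whether the layer is frozen or not; the trainable flag only changes which column of the summary the 1010 parameters are reported in.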


If you now mark the layer as trainable with model.layers[0].trainable = True, the summary reports all 1010 parameters as trainable:



In [2]:
model.layers[0].trainable = True
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                1010      
                                                                 
Total params: 1010 (3.95 KB)
Trainable params: 1010 (3.95 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


### Model-2

In [3]:
model = Sequential()
model.add(Dense(10, trainable=True, input_shape=(100,)))
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
Total params: 1010 (3.95 KB)
Trainable params: 1010 (3.95 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


Now all parameters are trainable and there are zero non-trainable parameters. Some layers, however, have both trainable and non-trainable parameters. One example is the BatchNormalization layer, which stores a moving mean and a moving variance of the activations for use at test time.

In Model-3 below, the BatchNormalization layer has 40 parameters in total: 20 trainable and 20 non-trainable. The 20 trainable parameters are the per-feature scale (gamma) and offset (beta). The 20 non-trainable parameters are the moving mean and moving variance of the 10 activations. These statistics are computed from the batches seen during training rather than by gradient descent, so they are counted as non-trainable even when the layer's trainable flag is True.
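
The moving statistics are maintained with an exponential moving average rather than a gradient step. Below is a rough sketch of that update for a single feature in plain Python, assuming a momentum of 0.99 (the Keras default) and the default starting values of mean 0 and variance 1. The function update_moving_stats is a hypothetical name used for illustration, not the layer's actual implementation.

```python
def update_moving_stats(moving_mean, moving_var, batch, momentum=0.99):
    """One exponential-moving-average update for a single feature (illustrative only)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    # Population (biased) variance of the current batch.
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return new_mean, new_var


# Start from the default initial values: mean 0, variance 1.
mean, var = update_moving_stats(0.0, 1.0, batch=[1.0, 2.0, 3.0, 4.0])
print(mean, var)
```

No gradient appears anywhere in this update, which is why these values are reported as non-trainable.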



### Model-3

In [4]:
model.add(BatchNormalization())
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
 batch_normalization (Batch  (None, 10)                40        
 Normalization)                                                  
                                                                 
Total params: 1050 (4.10 KB)
Trainable params: 1030 (4.02 KB)
Non-trainable params: 20 (80.00 Byte)
_________________________________________________________________
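
The totals in this summary follow from the per-layer counts. A short sketch of the bookkeeping, assuming the four BatchNormalization vectors (gamma, beta, moving mean, moving variance) each have one entry per feature; the variable names are illustrative only.

```python
features = 10  # output width of the Dense layer feeding BatchNormalization

dense_params = 100 * features + features  # kernel + bias = 1010, all trainable
bn_trainable = 2 * features               # gamma and beta
bn_non_trainable = 2 * features           # moving mean and moving variance

trainable = dense_params + bn_trainable
non_trainable = bn_non_trainable
total = trainable + non_trainable

print(total, trainable, non_trainable)  # 1050 1030 20
```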


### Summary

*To sum up:* 'trainable parameters' are those whose values are modified according to their gradient (the derivative of the error/loss/cost with respect to the parameter), whereas 'non-trainable parameters' are those whose values are not optimized according to their gradient.
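
As a toy illustration of that distinction, the sketch below applies one gradient descent step to a small dictionary of parameters and skips any that are marked non-trainable. The names used here (sgd_step and the layout of params) are hypothetical and only mirror the idea; this is not how Keras stores or updates its weights.

```python
def sgd_step(params, grads, learning_rate=0.1):
    """Apply one gradient descent step, leaving non-trainable parameters untouched."""
    updated = {}
    for name, (value, trainable) in params.items():
        if trainable:
            value = value - learning_rate * grads[name]
        updated[name] = (value, trainable)
    return updated


# Each entry maps a name to (value, trainable flag).
params = {"w": (1.0, True), "frozen_w": (1.0, False)}
grads = {"w": 2.0, "frozen_w": 2.0}

new_params = sgd_step(params, grads)
print(new_params["w"][0], new_params["frozen_w"][0])  # 0.8 1.0
```

Both parameters receive the same gradient, but only the trainable one moves.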