8.6 Adjusting the learning rate

Slides

One of the most important hyperparameters of deep learning models is the learning rate. It is a tuning parameter of the optimization algorithm that determines the step size, i.e. how far the weights move at each iteration while descending toward a minimum of the loss function.
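To make the idea concrete, here is a minimal sketch (not part of the course code) of a single plain gradient-descent update, where the learning rate scales the gradient before it is subtracted from the weights; the Adam optimizer used below adds momentum and adaptive scaling on top of this basic rule:

# Illustrative only: one plain gradient-descent update step
def gradient_descent_step(weights, gradients, learning_rate=0.01):
    # A large learning rate takes big steps and may overshoot the minimum;
    # a small one takes tiny steps and may converge very slowly.
    return [w - learning_rate * g for w, g in zip(weights, gradients)]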

We can experiment with different learning rates to find the value for which the model performs best. To try different learning rates conveniently, we first define a function that builds and compiles the model, for instance:

# Imports used by this snippet (typically loaded earlier in the notebook)
from tensorflow import keras
from tensorflow.keras.applications.xception import Xception

# Function to create and compile the model for a given learning rate
def make_model(learning_rate=0.01):
    base_model = Xception(weights='imagenet',
                          include_top=False,
                          input_shape=(150,150,3))

    # Freeze the pretrained convolutional base so its weights are not updated
    base_model.trainable = False
    
    #########################################
    
    inputs = keras.Input(shape=(150,150,3))
    base = base_model(inputs, training=False)
    vectors = keras.layers.GlobalAveragePooling2D()(base)
    outputs = keras.layers.Dense(10)(vectors)  # 10 classes, raw logits (no softmax)
    model = keras.Model(inputs, outputs)
    
    #########################################
    
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    # from_logits=True because the Dense layer outputs raw logits
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)

    # Compile the model
    model.compile(optimizer=optimizer,
                  loss=loss,
                  metrics=['accuracy'])
    
    return model

Next, we can loop over a list of learning rates, training a model with each one and storing its training history:

# Dictionary to store history with different learning rates
scores = {}

# List of learning rates
lrs = [0.0001, 0.001, 0.01, 0.1]

for lr in lrs:
    print(lr)
    
    model = make_model(learning_rate=lr)
    history = model.fit(train_ds, epochs=10, validation_data=val_ds)
    scores[lr] = history.history
    
    print()
    print()

Visualizing the training and validation accuracies helps us determine which learning rate works best for the model. One typical way to choose is to look at the gap between training and validation accuracy: a smaller gap (with validation accuracy still high) indicates a better learning rate.
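For example, the histories stored in scores can be compared on a single plot (a minimal sketch assuming matplotlib is available; the 'accuracy' and 'val_accuracy' keys are what model.fit records when the model is compiled with metrics=['accuracy']):

import matplotlib.pyplot as plt

# Compare training and validation accuracy for every learning rate we tried
for lr, hist in scores.items():
    plt.plot(hist['accuracy'], label=f'train lr={lr}')
    plt.plot(hist['val_accuracy'], '--', label=f'val lr={lr}')

plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()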

Notes

Add notes from the video (PRs are welcome)

  • learning rate analogy: the speed of reading a book
  • reading fast (skimming, so missing details) vs reading slow (making little progress and not getting through many books)
  • finding the optimal learning rate
⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
