Hello,
I have a question regarding adjusting the learning rate with LAMP.
In your case you have a fixed learning rate of 0.000125, and you then divide or multiply it by some factors to get the correct base learning rate depending on the number of GPUs. Then you apply another equation to get the final learning rate:
```python
BASE_LR_BATCHSIZE = 32

total_gpus = num_gpus_per_machine * config.machines
global_batch_size = config.batch_size * total_gpus
# linear LR scaling (https://arxiv.org/abs/1706.02677)
lr = config.base_lr * (global_batch_size / BASE_LR_BATCHSIZE)
```
This means that using 16 nodes at Amazon we get a bigger global batch size and a bigger learning rate:
0.00020833333 * (96 * 16 * 8 / 32) = 0.07999999872
While a single node at Amazon gets a smaller global batch size and a smaller learning rate:
0.00003125 * (96 * 1 * 8 / 32) = 0.00075
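For reference, here is a minimal sketch (my own, not taken from the repo) that plugs both configurations into the scaling formula above; `PER_GPU_BATCH = 96` and `GPUS_PER_NODE = 8` are assumptions read off the arithmetic:

```python
# Minimal sketch reproducing the two calculations above.
# PER_GPU_BATCH and GPUS_PER_NODE are assumed from the numbers in this issue.
BASE_LR_BATCHSIZE = 32
PER_GPU_BATCH = 96
GPUS_PER_NODE = 8

def scaled_lr(base_lr: float, num_nodes: int) -> float:
    """Apply the linear LR scaling rule for a given node count."""
    global_batch_size = PER_GPU_BATCH * GPUS_PER_NODE * num_nodes
    return base_lr * (global_batch_size / BASE_LR_BATCHSIZE)

print(scaled_lr(0.00020833333, 16))  # ~0.08   (16 nodes)
print(scaled_lr(0.00003125, 1))      # 0.00075 (1 node)
```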
My questions are:
1. Why is BASE_LR_BATCHSIZE 32 and not 96?
2. If I want to train the model on x nodes with a per-GPU batch size of y, how can I determine the correct base_lr?
Thanks a lot.