Correctly adjust LR with LAMP #19

agemagician opened this issue Jul 11, 2019 · 1 comment

agemagician commented Jul 11, 2019

Hello,

I have a question regarding adjusting the learning rate with LAMP.

In your case you have a fixed learning rate of 0.000125, which you then divide or multiply by some factor to get the correct base learning rate, depending on the number of GPUs:

one_machine:      'base_lr' = 0.000125 * 5 / 3 = 0.00020833333
sixteen_machines: 'base_lr' = 0.000125 / 4     = 0.00003125

Then you apply another equation to get the final learning rate:

BASE_LR_BATCHSIZE = 32
total_gpus = num_gpus_per_machine * config.machines
global_batch_size = config.batch_size * total_gpus

# linear LR scaling (https://arxiv.org/abs/1706.02677)
lr = config.base_lr * (global_batch_size / BASE_LR_BATCHSIZE)

This means that using 16 nodes at Amazon we get a bigger batch size and a bigger learning rate:
0.00020833333 * (96 * 16 * 8 / 32) = 0.07999999872
while a single node at Amazon gets a smaller batch size and a smaller learning rate:
0.00003125 * (96 * 1 * 8 / 32) = 0.00075
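
To sanity-check that arithmetic, here is a minimal sketch of the same linear scaling formula; the scaled_lr helper name is mine, and the per-GPU batch size of 96 with 8 GPUs per machine are taken from the numbers above:

BASE_LR_BATCHSIZE = 32

def scaled_lr(base_lr, batch_size_per_gpu, gpus_per_machine, machines):
    # Linear LR scaling (https://arxiv.org/abs/1706.02677)
    global_batch_size = batch_size_per_gpu * gpus_per_machine * machines
    return base_lr * (global_batch_size / BASE_LR_BATCHSIZE)

print(scaled_lr(0.00020833333, 96, 8, 16))  # ~0.08     (16 machines)
print(scaled_lr(0.00003125, 96, 8, 1))      # 0.00075   (1 machine)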

My questions are:

  1. Why is BASE_LR_BATCHSIZE 32 and not 96?
  2. If I want to train the model on x nodes with a per-GPU batch size of y, how can I determine the correct base_lr?

Thanks a lot.

@yaroslavvb

  1. We've observed successful convergence at batch size 32 and LR 0.000125, so we made everything relative to that batch size.
  2. Base-LR is relative to BASE_LR_BATCHSIZE, so just apply linear scaling to get the proper learning rate at that base batch size, and that will be your base_lr (see the sketch below).
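
For concreteness, a minimal sketch of that recipe, assuming linear scaling throughout; the derive_base_lr helper and the example values for x, y, and GPUs per node are illustrative, not from the codebase:

BASE_LR_BATCHSIZE = 32

def derive_base_lr(known_good_lr, known_good_global_batch_size):
    # Linear scaling in reverse: bring a known-good LR back to the
    # reference batch size that base_lr is defined against.
    return known_good_lr * (BASE_LR_BATCHSIZE / known_good_global_batch_size)

# The point observed to converge above: LR 0.000125 at global batch size 32,
# so base_lr is simply 0.000125.
base_lr = derive_base_lr(0.000125, 32)

# The training script then re-scales it linearly for x nodes, each with
# 8 GPUs and a per-GPU batch size of y (example values only):
x, y, gpus_per_node = 4, 64, 8
lr = base_lr * (y * gpus_per_node * x / BASE_LR_BATCHSIZE)
print(lr)  # 0.008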
