
Advanced-optimizer-with-Gradient-Centralization

Advanced DNN optimizers with Gradient Centralization (GC). Please refer to the Gradient Centralization paper listed in the references below.

Introduction

We embed GC into several advanced DNN optimizers, including SGD, Adam, AdamW, RAdam, Lookahead+SGD, Lookahead+Adam and Ranger; see the corresponding files (SGD.py, Adam.py, etc.) in this repository.

There are three hyper-parameters: use_gc, gc_conv_only and gc_loc.

- use_gc=True adds the GC operation to the optimizer; use_gc=False disables it.
- gc_conv_only=True applies GC only to Conv layers; gc_conv_only=False applies it to both Conv and FC layers.
- gc_loc controls where the GC operation is applied in adaptive learning rate algorithms such as Adam, RAdam and Ranger. There are two possible locations: the original gradient, and the generalized gradient, i.e. the variable that is directly used to update the weight. For adaptive learning rate algorithms we suggest gc_loc=False. For SGD the two locations are equivalent, so SGD does not expose gc_loc.
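
To make concrete what use_gc and gc_conv_only control, the snippet below is a minimal sketch of the GC operation itself, assuming PyTorch tensors; the function name centralize_gradient is illustrative and not taken from this repo's code.

import torch

def centralize_gradient(grad: torch.Tensor, gc_conv_only: bool = False) -> torch.Tensor:
    # gc_conv_only=True: centralize only Conv kernels (tensors with more than 3 dims).
    # gc_conv_only=False: also centralize FC weight matrices (2-D tensors).
    # 1-D parameters (biases, BN weights) are left untouched in both cases.
    min_dim = 3 if gc_conv_only else 1
    if grad.dim() > min_dim:
        # Subtract, per output channel (dim 0), the mean of the gradient over all other dims.
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad

For adaptive optimizers, gc_loc only decides whether this operation is applied to the original gradient or to the generalized gradient described above; for SGD the choice makes no difference.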

We also give an example of how to use these optimizers on CIFAR:

# SGD
optimizer = SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=args.weight_decay, use_gc=True, gc_conv_only=False)
# Adam
optimizer = Adam(net.parameters(), lr=args.lr, weight_decay=args.weight_decay, use_gc=True, gc_conv_only=False, gc_loc=False)
# RAdam
optimizer = RAdam(net.parameters(), lr=args.lr, weight_decay=args.weight_decay, use_gc=True, gc_conv_only=False, gc_loc=False)
# Lookahead + SGD
base_opt = SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=args.weight_decay, use_gc=False, gc_conv_only=False)
optimizer = Lookahead(base_opt, k=5, alpha=0.5)
# Ranger
optimizer = Ranger(net.parameters(), lr=args.lr, weight_decay=args.weight_decay, use_gc=True, gc_conv_only=False, gc_loc=False)
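
Once constructed, these optimizers are used like any other PyTorch optimizer, and GC happens inside the optimizer itself. The following is a minimal sketch of a training step; net, criterion and trainloader are placeholder names, not objects defined in this repo.

# Hypothetical training step with one of the GC-enabled optimizers above.
for inputs, targets in trainloader:
    optimizer.zero_grad()                    # clear previously accumulated gradients
    loss = criterion(net(inputs), targets)   # forward pass and loss
    loss.backward()                          # backward pass: compute gradients
    optimizer.step()                         # GC is applied inside the optimizer's update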

References:

Hongwei Yong, Jianqiang Huang, Xiansheng Hua, Lei Zhang. Gradient Centralization: A New Optimization Technique for Deep Neural Networks. ECCV 2020.