
lr_warmup should not be passed when adafactor is used as the optimizer #617

Closed
martianunlimited opened this issue Apr 13, 2023 · 3 comments

Comments

@martianunlimited

It's not really a major bug, more of an inconvenience: the GUI passes lr_warmup to the training command whenever it is non-zero, which causes library/train_util.py to raise a ValueError if the optimizer is Adafactor. The error is raised midway through the run, after latents are cached and just before the dataloader is created, rather than at the start of train_network. This wastes time and makes it harder for users who are not used to reading stack traces to pick out what caused the failure.

ValueError: adafactor:0.0001 does not require num_warmup_steps. Set None or 0.
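For context, the guard in library/train_util.py behaves roughly like the sketch below. This is an illustration of the failure mode, not the actual library source; the function and parameter names are assumptions:

```python
# Rough illustration (not the actual library/train_util.py code) of the
# guard that produces the ValueError above. The names check_scheduler_args,
# optimizer_type, lr, and num_warmup_steps are assumptions.
def check_scheduler_args(optimizer_type: str, lr: float, num_warmup_steps: int):
    # Adafactor manages its own learning-rate schedule, so a warmup step
    # count is rejected outright rather than silently ignored.
    if optimizer_type.lower() == "adafactor" and num_warmup_steps:
        raise ValueError(
            f"{optimizer_type}:{lr} does not require num_warmup_steps. Set None or 0."
        )
```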

Suggested fixes, in order of preference:
a) Pop up a warning in the GUI if the user did not set lr_warmup to 0 when the optimizer is set to Adafactor (recommended; see the sketch after this list)
or
b) Raise an error in train_network.py when an invalid combination of optimizer and lr_warmup is used.
c) Change train_util.py to raise a warning and ignore the lr_warmup value (not recommended)
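A minimal sketch of option a), assuming a plain-Python validation hook that runs before the GUI assembles the training command. The function name and the way the warning is surfaced are illustrative, not the actual kohya_ss helpers:

```python
# Hypothetical GUI-side validation for option a); not the actual
# kohya_ss code. Intended to run before the training command is built.
def warn_on_invalid_lr_warmup(optimizer: str, lr_warmup: float) -> bool:
    """Return True if training may proceed, False if the user must fix settings."""
    if optimizer.lower() == "adafactor" and lr_warmup != 0:
        # Surface the problem up front instead of letting train_util.py
        # raise a ValueError after latents are already cached.
        print(
            "Warning: Adafactor does not use num_warmup_steps. "
            "Set 'LR warmup (% of steps)' to 0 before starting training."
        )
        return False
    return True
```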

Unless there are differing opinions as to why a) and b) are not the way to go, I will go ahead and send a pull request for a) and b) over the weekend with said "fix".

@bmaltais
Owner

I can't change train_network.py because it is maintained by kohya in his repo. I can implement option a easily enough ;-)

@bmaltais
Owner

The dev branch now has the fix.

@bmaltais mentioned this issue Apr 18, 2023
@danielaixer

Does setting "LR warmup (% of steps)" to "0" act as a workaround?
