
How to change optimizer and lr scheduler in the middle of training? #3095

Closed
abecciu opened this issue Aug 21, 2020 · 14 comments
Labels
question Further information is requested

Comments

@abecciu

abecciu commented Aug 21, 2020

What is your question?

I need to train a model with a pre-trained backbone. For the first 10 epochs, I want to keep the backbone completely frozen (i.e., not touched by the optimizer). After epoch 10, I want to start training certain layers of the backbone. In plain PyTorch, I would instantiate a new optimizer that includes the backbone params I want to train, and then swap out both the optimizer and the lr_scheduler.

What's the recommended way to do something like this in PL?
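
For context, a minimal sketch of the plain-PyTorch approach described above (the `model`, its `head`/`backbone` attributes, the `train_one_epoch` helper, and all hyperparameters are illustrative, not from this issue):

    import torch

    # Epochs 0-9: only the head is optimized; the backbone stays frozen.
    optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5)

    for epoch in range(15):
        if epoch == 10:
            # Unfreeze part of the backbone and rebuild optimizer + scheduler.
            for p in model.backbone.parameters():
                p.requires_grad = True
            trainable = [p for p in model.parameters() if p.requires_grad]
            optimizer = torch.optim.Adam(trainable, lr=1e-4)
            scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5)
        train_one_epoch(model, optimizer)  # user-defined training loop
        scheduler.step()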

@abecciu abecciu added the question Further information is requested label Aug 21, 2020
@abecciu abecciu changed the title How to change optimizer and lr scheduler in the middle of training How to change optimizer and lr scheduler in the middle of training? Aug 21, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@rohitgr7
Contributor

I would suggest calling trainer.fit twice:

class PreFineLightningModule(pl.LightningModule):

    def __init__(self, mode='pre_train'):
        super().__init__()

        self.mode = mode
        # define other params

    def _configure_optim_backbone(self):
        # return optimizers and schedulers for pre-training
        optimizer = ...  # pre-train optimizer
        scheduler = ...  # pre-train scheduler
        return [optimizer], [scheduler]

    def _configure_optim_finetune(self):
        # return optimizers and schedulers for fine-tuning
        optimizer = ...  # fine-tune optimizer
        scheduler = ...  # fine-tune scheduler
        return [optimizer], [scheduler]

    def configure_optimizers(self):
        if self.mode == 'pre_train':
            return self._configure_optim_backbone()
        elif self.mode == 'fine_tune':
            return self._configure_optim_finetune()


total_epochs = 15  # can be anything

# Pre-train
model = PreFineLightningModule(mode='pre_train')
freeze_backbone(model)  # user-defined helper, see the sketch below
trainer = Trainer(max_epochs=10)
trainer.fit(model)

# Fine-tune
model = PreFineLightningModule.load_from_checkpoint(saved_model_checkpoint, mode='fine_tune')
unfreeze_layers(model)  # user-defined helper, see the sketch below
trainer = Trainer(max_epochs=total_epochs - 10)
trainer.fit(model)
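
`freeze_backbone` and `unfreeze_layers` are left to the user above; a possible implementation, assuming the module exposes a `backbone` attribute and that only its last layers should be unfrozen, could look like:

    def freeze_backbone(model):
        # Stop gradients for every backbone parameter during pre-training.
        for p in model.backbone.parameters():
            p.requires_grad = False

    def unfreeze_layers(model, prefixes=("layer3", "layer4")):
        # Re-enable gradients only for the backbone layers we want to fine-tune.
        for name, p in model.backbone.named_parameters():
            if name.startswith(prefixes):
                p.requires_grad = True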

@abecciu
Author

abecciu commented Aug 22, 2020

Thanks so much for your response! I was hoping to avoid that, since it would force me to write a lot more boilerplate code in my project to keep the whole training process automated, configurable, and model-agnostic.

Why doesn't PL provide an API to replace optimizer and scheduler instances? Have you considered that?

@rohitgr7
Contributor

Yeah, you can do that with a callback too, using trainer.init_optimizers(new_optimizers, schedulers), but you also need to take care of 16-bit precision. I guess PL could have a trainer method like replace_optimizers or setup_optimizers that takes any optimizers and schedulers (new or old) and handles all of this, to make the process even smoother. What do you think @awaelchli?

@abecciu
Author

abecciu commented Aug 23, 2020

What would be ideal for me is to be able to replace the optimizer and scheduler from within the LightningModule class that's being used for training. Would it be possible to achieve this from an existing hook like optimizer_step by simply returning new instances of the optimizer and scheduler?

@rohitgr7
Contributor

rohitgr7 commented Aug 23, 2020

You can try the on_train_epoch_start method in a callback and reconfigure the optimizers and schedulers the way you want. Here are some links that can help. To set them:

class SomeCallback(Callback):
    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch == 10:
            trainer.optimizers = [new_optimizer]
            trainer.lr_schedulers = trainer.configure_schedulers([new_scheduler])
            trainer.optimizer_frequencies = []  # or optimizer frequencies if you have any

trainer = Trainer(callbacks=[SomeCallback()], ...)

I think this should work.

@abecciu
Author

abecciu commented Aug 23, 2020

Thanks so much! This looks very promising; I couldn't tell from the docs that on_train_epoch_start receives trainer and pl_module as params. Seems like a great solution.

I'll try this out and report back.

@rohitgr7
Contributor

fixed the link

@abecciu
Author

abecciu commented Sep 5, 2020

This solution worked great for me. It's important to note that the on_train_epoch_start callback does not exist in older versions of PL.

Thanks again @rohitgr7 for the help!

@abecciu abecciu closed this as completed Sep 5, 2020
@icedpanda

icedpanda commented Jan 12, 2022

Hi @rohitgr7, how do I change the optimizer and LR scheduler if I am using ReduceLROnPlateau? I am not sure how to set monitor, interval, etc. when using ReduceLROnPlateau.

This is my configure_optimizers at init:

    def configure_optimizers(self):
        optimizer = optim.Adam(
            filter(
                lambda p: p.requires_grad,
                self.net.parameters()),
            lr=self.lr,
            weight_decay=self.weight_decay)
        scheduler = optim.lr_scheduler.ReduceLROnPlateau(
            optimizer,
            mode="min",
            factor=0.1,
            patience=self.scheduler_patience,
            cooldown=3,
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                "monitor": "val/loss",
                "interval": "epoch",
                "frequency": 1,
            },
        }

However, how do I set this monitor and interval when swapping the optimizer and scheduler in on_train_start in the middle of training?

    def on_train_start(self, trainer, pl_module) -> None:

        if trainer.current_epoch == self._unfreeze_at_epoch:
            print("unfreeze and add param group...")
            pl_module.net.freeze_backbone(False)
            new_optimizer = optim.Adam(
                filter(
                    lambda p: p.requires_grad,
                    pl_module.net.parameters()),
                lr=pl_module.lr,
                weight_decay=pl_module.weight_decay)
            new_schedulers = optim.lr_scheduler.ReduceLROnPlateau(
                new_optimizer,
                mode="min",
                factor=0.1,
                patience=pl_module.scheduler_patience,
                cooldown=3,
            )
            trainer.optimizers = [new_optimizer]
            trainer.lr_schedulers = [new_schedulers]
            trainer.optimizer_frequencies = []  # must be a list; empty if no frequencies are used

Thanks in advance

@lxxue

lxxue commented Apr 10, 2022

> [quoting @icedpanda's comment above]

I just tried the callback solution and found that you cannot simply set

    trainer.lr_schedulers = [new_schedulers]

but need to do this instead:

    trainer.lr_schedulers = trainer._configure_schedulers(
        schedulers, monitor=None, is_manual_optimization=False)

to properly set the schedulers with the monitor and the other default configurations.
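
Putting the two comments above together, a minimal sketch of such a callback for the ReduceLROnPlateau case might look like this (assuming a PL 1.x Trainer that still exposes trainer.lr_schedulers and trainer._configure_schedulers; the epoch threshold, monitor key, and hyperparameters are illustrative):

    import torch.optim as optim
    from pytorch_lightning import Callback

    class UnfreezeBackboneCallback(Callback):
        def __init__(self, unfreeze_at_epoch=10):
            self.unfreeze_at_epoch = unfreeze_at_epoch

        def on_train_epoch_start(self, trainer, pl_module):
            if trainer.current_epoch != self.unfreeze_at_epoch:
                return
            pl_module.net.freeze_backbone(False)
            new_optimizer = optim.Adam(
                (p for p in pl_module.net.parameters() if p.requires_grad),
                lr=pl_module.lr,
            )
            new_scheduler = {
                "scheduler": optim.lr_scheduler.ReduceLROnPlateau(new_optimizer, mode="min"),
                "monitor": "val/loss",
                "interval": "epoch",
                "frequency": 1,
            }
            trainer.optimizers = [new_optimizer]
            # _configure_schedulers fills in the remaining defaults and keeps the monitor.
            trainer.lr_schedulers = trainer._configure_schedulers(
                [new_scheduler], monitor=None, is_manual_optimization=False
            )
            trainer.optimizer_frequencies = []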

@wangm23456

What should I do in 2.0.2? Why can't we have a function like configure_optimizers() for this?

@wangm23456

Looking at the Strategy source in 2.x:

    def setup_optimizers(self, trainer: "pl.Trainer") -> None:
        """Creates optimizers and schedulers.

        Args:
            trainer: the Trainer, these optimizers should be connected to
        """
        if trainer.state.fn != TrainerFn.FITTING:
            return
        assert self.lightning_module is not None
        self.optimizers, self.lr_scheduler_configs = _init_optimizers_and_lr_schedulers(self.lightning_module)

    def setup(self, trainer: "pl.Trainer") -> None:
        """Setup plugins for the trainer fit and creates optimizers.

        Args:
            trainer: the trainer instance
        """
        assert self.accelerator is not None
        self.accelerator.setup(trainer)
        self.setup_optimizers(trainer)
        self.setup_precision_plugin()
        _optimizers_to_device(self.optimizers, self.root_device)

So, just self.trainer.strategy.setup(self.trainer)?

@BrunoBelucci

You could use only the setup_optimizers method of the strategy, so as not to mess with the other setup steps, so I would say trainer.strategy.setup_optimizers(trainer).
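
Combining this with the earlier mode-switching LightningModule, a sketch for PL 2.x could look like the following (assuming the module branches on a mode attribute in configure_optimizers; setup_optimizers re-runs configure_optimizers and reattaches the result to the trainer, discarding any existing optimizer state):

    from lightning.pytorch.callbacks import Callback

    class SwitchOptimizerCallback(Callback):
        def __init__(self, switch_at_epoch=10):
            self.switch_at_epoch = switch_at_epoch

        def on_train_epoch_start(self, trainer, pl_module):
            if trainer.current_epoch == self.switch_at_epoch:
                # Flip the flag that configure_optimizers branches on, then
                # ask the strategy to rebuild optimizers and schedulers.
                pl_module.mode = 'fine_tune'
                trainer.strategy.setup_optimizers(trainer)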
