
Allow extra_epochs flag in Trainer.fit to control finetuning time #13273

Open
franchesoni opened this issue Jun 12, 2022 · 7 comments
Labels: question (Further information is requested), trainer: argument

Comments

franchesoni commented Jun 12, 2022

🚀 Feature

Trainer(max_epochs=100).fit(model, train_dl, ckpt_path=ckpt_path, extra_epochs=True) would finetune for 100 epochs

Motivation

Finetuning for N additional epochs currently requires knowing the previous number of epochs M and setting Trainer(max_epochs=M+N). Searching did not turn up a way around this.

Pitch

Finetuning training time or number of epochs should be configurable.

Alternatives

Setting a very large max_epochs and stopping training manually.

Additional context

It would be nice to have the same for max_time too. I hope this is already solved and this issue is unnecessary.
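
For example (hypothetical API, mirroring the proposed extra_epochs flag above; extra_time does not exist today):

Trainer(max_time="00:12:00:00").fit(model, train_dl, ckpt_path=ckpt_path, extra_time=True) would finetune for 12 more hours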


cc @justusschock @kaushikb11 @awaelchli @Borda @rohitgr7

franchesoni added the needs triage (Waiting to be triaged by maintainers) label on Jun 12, 2022
carmocca (Contributor) commented

You can accomplish this by doing:

trainer.fit_loop.max_epochs += 100

before trainer.fit() is called.

carmocca added the question (Further information is requested) and trainer: argument labels and removed the needs triage (Waiting to be triaged by maintainers) label on Jun 14, 2022
franchesoni (Author) commented Jun 19, 2022

If this worked, it would be very counter-intuitive, because the current number of epochs is only known after calling .fit(..., ckpt_path=ckpt_path).
I think you are assuming I'm loading the model first, which is not the case, since I'm using the ckpt_path argument.

When I try your solution:

trainer = pl.Trainer(**trainer_params)
trainer.fit_loop.max_epochs += 2
trainer.fit(model, train_dl, val_dl, ckpt_path=best_ckpt)

I don't get the desired behavior.

cc @justusschock @kaushikb11 @awaelchli @Borda @rohitgr7

carmocca reopened this on Jun 22, 2022
franchesoni (Author) commented

Hello, is there any further news or an alternative answer?

awaelchli (Member) commented Jul 24, 2022

@franchesoni If I understand correctly, you are saying this is not an option for you?

model = Model.load_from_checkpoint("path/to/pretrained/checkpoint.ckpt")
trainer = pl.Trainer(**trainer_params, max_epochs=N) 
trainer.fit(model, train_dl, val_dl)

I assume it's because you want some parts of the trainer state restored from the checkpoint, e.g. the optimizer state, but not the full loop state.

Then I think this is just another version of request #5339, to be able to control what gets restored. I think this is something we need to start adding to the roadmap and think hard about.

carmocca (Contributor) commented

There are 2 potential solutions:

  1. Pre-load the checkpoint manually:

ckpt = torch.load(...)
current_epoch = ckpt["current_epoch"]
trainer = Trainer(max_epochs=current_epoch + N)

An issue with this method is that it loads the full checkpoint just for this change. This relates to #5339 and #12712.

  2. Extract the state from the checkpoint in on_load_checkpoint and modify the Trainer's max_epochs. This requires editing the LightningModule hook to do this or creating a Callback just for it (see the sketch below).
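
A rough, untested sketch of option 2 as a standalone Callback. Assumptions: a recent Lightning version where Callback.on_load_checkpoint receives the full checkpoint dict, the finished epoch count lives under the "epoch" key, and trainer.fit_loop.max_epochs can still be adjusted at that point:

import pytorch_lightning as pl

class ExtendMaxEpochs(pl.Callback):
    # Hypothetical helper: extend the epoch budget relative to the resumed checkpoint.
    def __init__(self, extra_epochs: int):
        self.extra_epochs = extra_epochs

    def on_load_checkpoint(self, trainer, pl_module, checkpoint):
        # Assumption: the checkpoint dict stores the last epoch under "epoch";
        # adjust the key if your checkpoint stores it differently.
        trainer.fit_loop.max_epochs = checkpoint["epoch"] + self.extra_epochs

# Usage sketch: finetune for N more epochs on top of whatever the checkpoint already ran.
# trainer = pl.Trainer(callbacks=[ExtendMaxEpochs(extra_epochs=N)], **trainer_params)
# trainer.fit(model, train_dl, val_dl, ckpt_path=best_ckpt)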

stale bot commented Apr 15, 2023

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!

stale bot added the won't fix (This will not be worked on) label on Apr 15, 2023
JIAOJIAYUASD commented Jul 26, 2023


Hello, I have an image inpainting project, Paint-by-Example, implemented in pytorch_lightning. I want to finetune the Stable Diffusion model using LoRA, but I can't find the model definition and don't know how to add the LoRA finetuning process to the project. Can you give me some advice?

stale bot removed the won't fix (This will not be worked on) label on Jul 26, 2023