Hey! I love this repo, thanks for making it 💯
Everything works well except for one thing. After some digging around and experimenting, here's what I've found:
Below are some figures for the training loss and training accuracy (on MNIST, using a resnet18).
Problem:
1) Using LRFinder on a model and then training with that same model afterwards appears to hurt the model's learning (see the pink curve below).
Solution:
2) Using LRFinder on the model and manually restoring the weights before training appears to train the model optimally (see the green curve below; a sketch is given after the option 3 example).
3) Using LRFinder on a clone of the model and then using the original model for training also appears to train the model optimally (see the green curve below).
In the figures below, both models were trained with the same hyperparameters.
An in-code example of option 1) would be similar to what was given in the README.md:
from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
# then use "model" for training (this is the case that hurts learning)
An in-code example of option 3) would be:
from torch_lr_finder import LRFinder
import copy
import torch.nn as nn
import torch.optim as optim

model = ...
# make a throw-away copy of the model (same architecture)
temp_model = copy.deepcopy(model)
# copy the weights over (already identical after deepcopy, kept for clarity)
temp_model.load_state_dict(model.state_dict())
criterion = nn.CrossEntropyLoss()
# note: the optimizer should wrap temp_model's parameters so the range test
# updates the copy rather than the original model
optimizer = optim.Adam(temp_model.parameters(), lr=1e-7, weight_decay=1e-2)
# use the temp model in lr_finder
lr_finder = LRFinder(temp_model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
# then build a fresh optimizer over model.parameters() and train "model" as usual
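For completeness, a rough sketch of option 2) (manually restoring the weights) could look like the following. It just snapshots the model's and optimizer's state dicts before the range test and loads them back afterwards, using plain PyTorch, nothing specific to this library:
from torch_lr_finder import LRFinder
import copy
import torch.nn as nn
import torch.optim as optim

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)

# snapshot the weights (and optimizer state) before running the range test;
# deepcopy is needed because state_dict() returns references to the live tensors
model_state = copy.deepcopy(model.state_dict())
optimizer_state = copy.deepcopy(optimizer.state_dict())

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()

# manually restore the weights and optimizer state before training
model.load_state_dict(model_state)
optimizer.load_state_dict(optimizer_state)
# then use "model" for training as usual
(If the library exposes a reset()-style helper that restores the model and optimizer to their initial state, that would be an even simpler way to do this.)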