Plotting training and validation loss #122
Losses are only computed on the training set, not the validation set. The reason is that for pair/triplet-based losses, it doesn't really make sense to compute a single value for the entire set, unless you're willing to form every possible pair/triplet. Plus, some loss values don't correlate with accuracy at all. So everything returned by `hooks.get_loss_history()` is computed on the training set.

To compare train/val performance, you can get the tester to compute accuracy for both the training set and validation set, by changing the `dataset_dict`:

```python
dataset_dict = {"train": train_dataset, "val": val_dataset}
model_folder = "example_saved_models"
tester = testers.GlobalEmbeddingSpaceTester(end_of_testing_hook=hooks.end_of_testing_hook)
end_of_epoch_hook = hooks.end_of_epoch_hook(tester, dataset_dict, model_folder)
```

And then use `hooks.get_accuracy_history()` to retrieve the accuracies for each split.
By default, only the primary metric will be returned by `get_accuracy_history()`. If you want to make a plot at the end of every epoch, you should be able to do something like this:
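The exact snippet isn't shown above, so here is a minimal, hedged sketch of an end-of-epoch plotting function. It assumes `loss_history` is a dict mapping loss names to per-iteration values, as returned by `hooks.get_loss_history()`; the `plot_loss_history` helper name is illustrative, not part of the library:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

def plot_loss_history(loss_history, filename="losses.png"):
    """Plot each recorded loss series against iteration number."""
    fig, ax = plt.subplots()
    for name, values in loss_history.items():
        ax.plot(range(len(values)), values, label=name)
    ax.set_xlabel("iteration")
    ax.set_ylabel("loss")
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
```

Calling this from an end-of-epoch hook would simply overwrite the image each epoch with the latest history.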
Closing for now. Please reopen if you have more questions.
I am curious about the value on the x-axis. I am sure it is not the number of epochs, but it also doesn't seem to be the number of iterations. I set batch_size = 32 in the trainer, and my training data has 458 images --> there should be 15 batches for each epoch.
Actually it will be 14 batches per epoch, since the last incomplete batch is dropped.

The x-axis should be the number of iterations, so it should have length 14*25 = 350. It looks like the plot stops short of 350, but have you manually checked the length to be sure?

```python
print(len(loss_history["metric_loss"]))
```
That functionality doesn't exist currently. As a workaround you could try taking the average of each 14-element chunk in `loss_history["metric_loss"]`.
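The chunk-averaging workaround could be sketched like this, assuming 14 iterations per epoch (`per_epoch_average` is an illustrative helper, not a library function):

```python
def per_epoch_average(values, iters_per_epoch):
    """Average consecutive chunks of a per-iteration series,
    yielding one value per epoch."""
    return [
        sum(chunk) / len(chunk)
        for chunk in (
            values[i:i + iters_per_epoch]
            for i in range(0, len(values), iters_per_epoch)
        )
    ]

# e.g. per_epoch_average(loss_history["metric_loss"], 14)
```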
The "best" model is based on the primary metric. Or you can change the primary metric to the one that you're plotting, which is AMI. Then the best model will be chosen based on AMI:

```python
hook = logging_presets.get_hook_container(record_keeper, primary_metric="AMI")
```
Hi, thank you for your reply.
Your first post actually shows training went for 26 epochs (not 25). If it ran for 338 iterations, then there were 13 batches per epoch, which implies a training size between 416 and 447.

Now you've trained for 50 epochs. If it ran for 1350 iterations, then there were 27 batches per epoch, which implies a training size between 432 and 447.

Can you double check the training set size, and also show the code where you initialize/run the hooks, tester, and trainer? Also a screenshot of the training logs would be helpful because it should have progress bars that indicate the number of iterations per epoch.
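The arithmetic above can be reproduced with a small sketch: given total iterations, epochs, and batch size, the implied training-set size lies between `batches_per_epoch * batch_size` and one sample short of the next full batch. The `size_range` helper is illustrative, and the batch sizes (32 and 16) are inferred from the quoted ranges:

```python
def size_range(total_iters, epochs, batch_size):
    """Infer the possible training-set size from iteration counts,
    assuming the last partial batch is dropped each epoch."""
    batches_per_epoch = total_iters // epochs
    low = batches_per_epoch * batch_size  # smallest size giving this many batches
    high = low + batch_size - 1           # one short of another full batch
    return low, high

print(size_range(338, 26, 32))   # -> (416, 447)
print(size_range(1350, 50, 16))  # -> (432, 447)
```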
Can you show the part where you initialize the sampler? If you're using MPerClassSampler, then what's probably happening is that it is reducing the epoch size to ensure the list length is a multiple of `batch_size`.
Yes, it's just that the length of the iterable returned by MPerClassSampler gets truncated such that it is a multiple of `batch_size`. See pytorch-metric-learning/src/pytorch_metric_learning/samplers/m_per_class_sampler.py, lines 23 to 27 in 52bb21a.

So that's why the number of iterations per epoch is slightly less than expected.
Thanks for making that clear. Very helpful.
Sorry to bother you again. How can I calculate the number 27 automatically, given my dataset size and batch size?
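As a rough sketch of the calculation: the sampler's index list is truncated to a multiple of the batch size, so the last partial batch is dropped. Note the real MPerClassSampler also enforces constraints involving `m`, so treat this as an approximation; `iterations_per_epoch` is an illustrative helper, not a library function:

```python
def iterations_per_epoch(num_samples, batch_size):
    """Simplified: the sampled index list is truncated to a multiple
    of batch_size, so any partial batch at the end is dropped."""
    list_size = num_samples - (num_samples % batch_size)
    return list_size // batch_size

print(iterations_per_epoch(458, 32))  # -> 14
```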
Hi @KevinMusgrave, can you please guide me or provide code on how to plot the training and validation loss together? Thanks!
Thanks for your response @KevinMusgrave. How can I access both the training and validation losses and track them during training in TensorBoard?
Validation loss isn't computed because there is no clear definition for it with tuple-based losses. For example, if you're using triplet loss, you wouldn't want to use all triplets (there will be too many). You could try a random sampling of triplets and compute the loss based on that, but that functionality isn't built in to this library.
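A random sampling of validation triplets could be sketched like this (`sample_triplets` is an illustrative helper, not part of the library):

```python
import random

def sample_triplets(labels, num_triplets):
    """Randomly sample (anchor, positive, negative) index triplets
    from a list of integer labels."""
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    # anchors need at least one same-label partner
    valid = [lab for lab, idxs in by_label.items() if len(idxs) >= 2]
    triplets = []
    while len(triplets) < num_triplets:
        pos_label = random.choice(valid)
        a, p = random.sample(by_label[pos_label], 2)
        neg_label = random.choice([lab for lab in by_label if lab != pos_label])
        n = random.choice(by_label[neg_label])
        triplets.append((a, p, n))
    return triplets
```

The sampled index triplets could then be fed to a triplet-style loss (for example via the loss's `indices_tuple` argument, where supported) to get an approximate validation loss.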
OK, thanks @KevinMusgrave. Can you please guide me on how to do this?
You can loop through the validation set:

```python
loss = 0
for data, labels in val_loader:
    embeddings = model(data)
    loss += loss_fn(embeddings, labels)
loss /= len(val_loader)
```
It's not included in this library because the loss value is often meaningless, e.g. triplet loss can be flat from the beginning to the end of training, but validation accuracy can still go up.
If you're using the trainers and hooks you can do this:

```python
import tqdm

def end_of_testing_hook(tester):
    for split, (embeddings, labels) in tester.embeddings_and_labels.items():
        dataset = common_functions.EmbeddingDataset(
            embeddings.cpu().numpy(), labels.squeeze(1).cpu().numpy()
        )
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=1)
        total_loss = 0
        with torch.no_grad():
            print(f"getting loss for {split} set")
            for E, L in tqdm.tqdm(dataloader):
                total_loss += loss(E, L)
        total_loss /= len(dataloader)
        tester.all_accuracies[split]["loss"] = total_loss
    hooks.end_of_testing_hook(tester)

# Create the tester
tester = testers.GlobalEmbeddingSpaceTester(
    end_of_testing_hook=end_of_testing_hook,
    dataloader_num_workers=2,
    accuracy_calculator=AccuracyCalculator(k="max_bin_count"),
)
```
Thanks @KevinMusgrave, it works!
I understand the point that with ContrastiveLoss or TripletLoss the input will be a pair or a triplet of images, so maybe there is no clear definition of how to calculate it. But how about the case of ArcFace/CosFace/AdaCos? I found one full implementation of a model based on ArcFace/CosFace/AdaCos (https://www.kaggle.com/tanulsingh077/pytorch-metric-learning-pipeline-only-images).
You raise a good point regarding classification losses. And since it seems common to compute the average loss during validation, it's probably worth adding to the library. I've created a separate issue to keep track of this feature.
I would like to plot training and validation loss over the training iterations. I'm using hooks.get_loss_history() and working with record-keeper to visualize the loss. It's working, but I'm not able to plot the training and validation loss in the same plot, and I'm not sure which loss I'm plotting with hooks.get_loss_history() in the first place. Would be grateful for any advice, thanks!