Loss is NAN Training Stopped. #18

Open
A-n-o-r-a-k opened this issue Nov 12, 2023 · 0 comments

While working through the code, I ran into a breaking error when running the following training loop:

# Train the model for three epochs
for epoch in range(num_epochs):
    # train for one epoch, printing every iteration
    train_one_epoch(model, optimizer, data_loader_train, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_val, device=device)

    checkpoint_path = f'trained_model_{epoch+1}_epochs.pth'
    torch.save(model.state_dict(), checkpoint_path)

It produces the following error:

Epoch: [0] [ 0/65] eta: 0:01:55 lr: 0.000125 loss: 2.9749 (2.9749) loss_classifier: 1.1639 (1.1639) loss_box_reg: 0.0148 (0.0148) loss_objectness: 1.5950 (1.5950) loss_rpn_box_reg: 0.2011 (0.2011) time: 1.7780 data: 0.4853 max mem: 5038
Loss is nan, stopping training
{'loss_classifier': tensor(1.3285, device='cuda:0', grad_fn=), 'loss_box_reg': tensor(0.0082, device='cuda:0', grad_fn=), 'loss_objectness': tensor(nan, device='cuda:0', grad_fn=), 'loss_rpn_box_reg': tensor(0.1605, device='cuda:0', grad_fn=)}

An exception has occurred, use %tb to see the full traceback.

SystemExit: 1

/home/q/anaconda3/envs/xview3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3556: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
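In the log above only `loss_objectness` is nan, while the other three losses are finite. One possible cause worth ruling out (an assumption, not something confirmed in this issue) is a ground-truth box with a non-finite coordinate or with zero or negative width or height. The sketch below is a minimal, framework-free check; `find_invalid_boxes` is a hypothetical helper name, and it assumes the boxes are in `[x_min, y_min, x_max, y_max]` format.

```python
import math


def find_invalid_boxes(boxes):
    """Return (index, reason) pairs for boxes that look malformed.

    `boxes` is an iterable of [x_min, y_min, x_max, y_max] sequences.
    For a tensor of boxes, pass `boxes_tensor.tolist()`.
    """
    problems = []
    for i, box in enumerate(boxes):
        if len(box) != 4:
            problems.append((i, "expected 4 coordinates"))
            continue
        x_min, y_min, x_max, y_max = box
        # Check for nan/inf first, since comparisons with nan are always False.
        if not all(math.isfinite(v) for v in box):
            problems.append((i, "non-finite coordinate"))
        elif x_max <= x_min or y_max <= y_min:
            problems.append((i, "zero or negative width/height"))
    return problems


# Hypothetical usage, assuming the loader yields (images, targets) and each
# target is a dict with a "boxes" entry (an assumption about this dataset):
#
# for images, targets in data_loader_train:
#     for t in targets:
#         bad = find_invalid_boxes(t["boxes"].tolist())
#         if bad:
#             print(bad)
```

If this check comes back clean, the annotations are probably not the source of the nan and the cause lies elsewhere.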
