NaN: Dear author, thanks for your great work. I am currently trying to run your code, but it always reports a NaN error; the error traceback is included below. Can you have a look? Thanks in advance! #15
Comments
Thanks for the feedback. Can you clarify which script you are using and which step you are in?
meta_training_coco_resnet101_stage_2.yaml. When I run this step, it completes only a few iterations before reporting the error "Predicted boxes or scores contain Inf/NaN".
Are you trying to reproduce our experiments on the COCO dataset? This is weird. Can you show me the full training log so I can better understand what happened in your training?
Yes! I have finished the meta_training_coco stage_1 of the meta_training_coco_multi... , but when I run meta_training_coco_resnet101_stage_2, it shows the following:
[10/17 15:21:08] d2.data.datasets.coco INFO: Loading datasets/coco/new_annotations/final_split_non_voc_instances_train2014.json takes 3.08 seconds.
May I know the model hyper-parameter configs, e.g., the batch size? Did you change the default values?
Due to my limited GPU memory, I had to change the batch size to 4 and the learning rate to 0.0005. Can you please tell me how to adjust them? (The full traceback is reproduced at the end of this thread.)
Unfortunately, our model works best with batch_size >= 8 in the second step. Using a smaller batch_size leads to unstable training. You can try to decrease BASE_LR, increase WARMUP_ITERS, decrease the number of SUPPORT_SHOT, or find other ways to compensate for the small batch_size, but the detection accuracy may not be guaranteed.
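A minimal sketch of those remedies, assuming `cfg` is the detectron2 CfgNode built by the repo's own setup() (so its few-shot keys already exist). The helper name, the concrete numbers, and the INPUT.FS.SUPPORT_SHOT key path are illustrative assumptions, not verified defaults from this repo:

```python
def shrink_for_small_batch(cfg, default_batch: int = 8, actual_batch: int = 4):
    """Hypothetical small-batch adjustments for stage-2 meta-training."""
    cfg.SOLVER.IMS_PER_BATCH = actual_batch
    # Linear-scaling heuristic: cut the LR in proportion to the batch size.
    cfg.SOLVER.BASE_LR *= actual_batch / default_batch
    # A longer warmup damps the early iterations, where divergence usually starts.
    cfg.SOLVER.WARMUP_ITERS *= 2
    # Fewer support shots per episode also lowers memory pressure.
    # NOTE: INPUT.FS.SUPPORT_SHOT is the key used in the author's FewX codebase;
    # its exact path in Meta-Faster-R-CNN is an assumption.
    cfg.INPUT.FS.SUPPORT_SHOT = max(1, cfg.INPUT.FS.SUPPORT_SHOT // 2)
    # One of the "other ways": gradient clipping, built into detectron2's solver config.
    cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
    cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "norm"
    cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0
    return cfg
```

Gradient clipping is not named in the reply above, but it is a common fallback for exactly this divergence and ships with detectron2's default solver options.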
File "/home/sjk/anaconda3/envs/chpy/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/1Tm2/CH/Meta-Faster-R-CNN/meta_faster_rcnn/modeling/fsod/fsod_rpn.py", line 523, in predict_proposals
return find_top_rpn_proposals(
File "/home/sjk/anaconda3/envs/chpy/lib/python3.8/site-packages/detectron2/modeling/proposal_generator/proposal_utils.py", line 103, in find_top_rpn_proposals
raise FloatingPointError(
FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.
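For context on where the error comes from: find_top_rpn_proposals in detectron2 validates the RPN outputs every iteration and raises as soon as any predicted box or objectness score is non-finite. Below is a paraphrased, self-contained version of that check; consult the installed proposal_utils.py for the exact code:

```python
import torch

# Paraphrase of the finiteness check in detectron2's
# modeling/proposal_generator/proposal_utils.py (find_top_rpn_proposals).
def check_finite(boxes: torch.Tensor, scores: torch.Tensor, training: bool):
    # boxes: (N, 4) predicted proposal boxes; scores: (N,) objectness logits.
    valid = torch.isfinite(boxes).all(dim=1) & torch.isfinite(scores)
    if not valid.all():
        if training:
            raise FloatingPointError(
                "Predicted boxes or scores contain Inf/NaN. Training has diverged."
            )
        # At inference time the invalid proposals are dropped instead.
        boxes, scores = boxes[valid], scores[valid]
    return boxes, scores
```

Since the check fires on the model's own predictions (reached here via predict_proposals in fsod_rpn.py), the NaNs originate from diverged weights rather than bad input data, which matches the small-batch instability diagnosed above.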