
Nan when training? How to solve? #9

Closed

chenkang455 opened this issue Dec 26, 2023 · 9 comments

Comments

@chenkang455

I've tried running BAD-NeRF on my own dataset, but I encountered NaN values during training. Which parameters can be adjusted to solve this problem?

@LingzheZhao
Member

Hi, you can first try lowering the pose_lrate, as we discovered and described here in the README.
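
For context, a minimal sketch of what lowering the pose learning rate could look like, assuming the learnable SE(3) camera poses get their own Adam optimizer as in nerf-pytorch-style codebases; the variable names below are placeholders, not the repo's actual identifiers:

```python
import torch

# Placeholder pose parameters: 30 cameras x 6-DoF se(3) tangent vectors.
# The actual BAD-NeRF variables and defaults may differ; this is only a sketch.
se3_pose_params = torch.nn.Parameter(torch.zeros(30, 6))

pose_lrate = 1e-4  # lowered from a larger default such as 1e-3
optimizer_pose = torch.optim.Adam([se3_pose_params], lr=pose_lrate)
```

Presumably, smaller pose steps keep the spline control poses in a better-conditioned range, which is why lowering this rate helps.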

@chenkang455
Author

Thanks for your advice. Could you clarify at what point the NaN values occurred in your experiments? I observe NaN values appearing after roughly 10,000 iterations, and the issue persists even after lowering the pose learning rate to 1e-4.
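
For anyone hitting the same thing, a simple way to localize where the NaN first appears is to guard the training step and skip non-finite updates. A minimal sketch of a hypothetical debugging helper, not part of the BAD-NeRF codebase:

```python
import torch

def guarded_step(loss, optimizer, params, iteration):
    """Skip the update if the loss or any gradient is non-finite.
    Hypothetical debugging helper; not part of the BAD-NeRF codebase."""
    if not torch.isfinite(loss):
        print(f"[iter {iteration}] non-finite loss {loss.item()}, skipping step")
        return False
    optimizer.zero_grad()
    loss.backward()
    bad = [p for p in params if p.grad is not None and not torch.isfinite(p.grad).all()]
    if bad:
        print(f"[iter {iteration}] non-finite gradients in {len(bad)} tensor(s), skipping step")
        optimizer.zero_grad()
        return False
    optimizer.step()
    return True
```

torch.autograd.set_detect_anomaly(True) can also pinpoint the operation that produced the NaN, at a significant speed cost.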

@chenkang455
Author

I've tried adjusting near and far to 2 and 6, which are the parameters for the lego scene in NeRF, but the problem still exists.

@wangpeng000
Member

wangpeng000 commented Dec 27, 2023

@chenkang455 Hi, we didn't work with 360° scenes (like lego); this codebase is aimed at "llff" scenes. The NaN problem may arise in the spline function. In our earlier experiments it appeared with small probability, and it basically stopped happening once we decreased the initial pose learning rate.
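
To make the spline point concrete: one common source of NaNs in SE(3) interpolation is the SO(3) exp/log map, where dividing by a rotation angle near zero gives 0/0. This is a guess at the failure mode rather than a confirmed diagnosis; an epsilon-guarded exponential map might look like the following sketch, which is not the repo's implementation:

```python
import torch

def so3_exp(phi: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Axis-angle vectors (N, 3) -> rotation matrices (N, 3, 3) via Rodrigues'
    formula, with a small-angle clamp so the 0/0 case cannot produce NaNs.
    Illustrative sketch only, not the repo's implementation."""
    theta = phi.norm(dim=-1, keepdim=True).clamp(min=eps)   # (N, 1)
    axis = phi / theta                                       # safe: theta >= eps
    K = torch.zeros(phi.shape[0], 3, 3, dtype=phi.dtype, device=phi.device)
    K[:, 0, 1], K[:, 0, 2] = -axis[:, 2], axis[:, 1]
    K[:, 1, 0], K[:, 1, 2] = axis[:, 2], -axis[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -axis[:, 1], axis[:, 0]
    s = torch.sin(theta).unsqueeze(-1)                       # (N, 1, 1)
    c = torch.cos(theta).unsqueeze(-1)
    eye = torch.eye(3, dtype=phi.dtype, device=phi.device).expand_as(K)
    return eye + s * K + (1.0 - c) * (K @ K)
```

Checking torch.isfinite on the interpolated poses right after the spline call would confirm whether that is where things blow up.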

If you want to handle 360° blurry data, my advice is to transfer our spline method to a NeRF framework that can directly handle 360° scenes, such as nerfstudio (https://github.com/WU-CVGL/BAD-NeRFstudio) or another framework. What's more, in the original nerf-pytorch code, the "ndc", "near" and "far" parameters also depend on the scene type; we expect the code to work well on forward-facing ("llff") scenes.
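
To illustrate the last point about ndc/near/far, this is roughly how nerf-pytorch-style training scripts choose bounds per dataset type. It is paraphrased from memory of that codebase, so the exact names and values should be treated as approximate:

```python
import numpy as np

def choose_bounds(dataset_type: str, no_ndc: bool, bds=None):
    """Roughly how nerf-pytorch-style scripts pick near/far per scene type.
    Approximate sketch; the exact values/names in this repo may differ."""
    if dataset_type == 'llff':
        if no_ndc:
            # real bounds taken from poses_bounds.npy
            return float(bds.min()) * 0.9, float(bds.max()) * 1.0
        # with NDC, the forward-facing frustum is mapped into [0, 1]
        return 0.0, 1.0
    if dataset_type == 'blender':
        # 360-degree synthetic scenes such as lego use fixed bounds and no NDC
        return 2.0, 6.0
    raise ValueError(f"unknown dataset_type: {dataset_type}")
```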

@LingzheZhao
Member

As @wangpeng000 points out, this codebase uses NDC scene contraction by default, so if your custom data does not follow the LLFF style, a workaround may be to turn that option off.

You can also try out our actively maintained BAD-NeRFstudio, since nerfstudio can handle various data types and it runs much faster.

@chenkang455
Author

@LingzheZhao @wangpeng000 Thanks for your advice. I set ndc to False and no_ndc to True, and it now seems to work for my lego dataset with no NaN.

@chenkang455
Author

Hi @LingzheZhao @wangpeng000,

Thank you for your detailed responses. I've come across another issue. Since all the datasets in your paper are in LLFF style, I'm trying to use a 360-degree scene, like lego, which necessitates setting ndc to False. However, the results appear to be relatively subpar.

I'm wondering if the problem is attributable to the load_llff_data function (using load_blender_data instead might resolve it) or if it's connected to the NDC setting.

[attached image: rendered result]

[TRAIN] Iter: 198200 Loss: 0.004275559913367033  coarse_loss:, 0.0025209172163158655, PSNR: 27.5581111907959
[TRAIN] Iter: 198300 Loss: 0.0122279217466712  coarse_loss:, 0.0069176210090518, PSNR: 22.748807907104492
[TRAIN] Iter: 198400 Loss: 0.008158434182405472  coarse_loss:, 0.0045228805392980576, PSNR: 24.3942928314209
[TRAIN] Iter: 198500 Loss: 0.009360695257782936  coarse_loss:, 0.005309312138706446, PSNR: 23.923967361450195
[TRAIN] Iter: 198600 Loss: 0.005872755311429501  coarse_loss:, 0.0035669079516083, PSNR: 26.371694564819336
[TRAIN] Iter: 198700 Loss: 0.00870824046432972  coarse_loss:, 0.004816955421119928, PSNR: 24.099069595336914
[TRAIN] Iter: 198800 Loss: 0.0047332243993878365  coarse_loss:, 0.002672248985618353, PSNR: 26.859272003173828
[TRAIN] Iter: 198900 Loss: 0.006757638417184353  coarse_loss:, 0.003632021602243185, PSNR: 25.050642013549805
[TRAIN] Iter: 199000 Loss: 0.005781751591712236  coarse_loss:, 0.0031937966123223305, PSNR: 25.870431900024414
[TRAIN] Iter: 199100 Loss: 0.005918695125728846  coarse_loss:, 0.003178289858624339, PSNR: 25.62185287475586
[TRAIN] Iter: 199200 Loss: 0.006465458311140537  coarse_loss:, 0.0036145057529211044, PSNR: 25.45009994506836
[TRAIN] Iter: 199300 Loss: 0.005777256563305855  coarse_loss:, 0.003232533112168312, PSNR: 25.943593978881836
[TRAIN] Iter: 199400 Loss: 0.005243922583758831  coarse_loss:, 0.0030064096208661795, PSNR: 26.502344131469727
[TRAIN] Iter: 199500 Loss: 0.005706350319087505  coarse_loss:, 0.0031587404664605856, PSNR: 25.938671112060547
[TRAIN] Iter: 199600 Loss: 0.0033667951356619596  coarse_loss:, 0.0018724045949056745, PSNR: 28.255355834960938
[TRAIN] Iter: 199700 Loss: 0.006072608754038811  coarse_loss:, 0.00363011471927166, PSNR: 26.121665954589844
[TRAIN] Iter: 199800 Loss: 0.0029980246908962727  coarse_loss:, 0.0016257810639217496, PSNR: 28.625686645507812
[TRAIN] Iter: 199900 Loss: 0.0053638434037566185  coarse_loss:, 0.002897790865972638, PSNR: 26.079975128173828
[TRAIN] Iter: 200000 Loss: 0.0062975799664855  coarse_loss:, 0.0033614598214626312, PSNR: 25.3222599029541
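
Regarding the load_llff_data vs. load_blender_data question above, a guess rather than a confirmed answer: load_llff_data assumes forward-facing captures (recentering and NDC-friendly bounds), whereas 360-degree synthetic scenes in nerf-pytorch go through load_blender_data with near/far = 2/6 and no NDC. A quick sanity check on whether the poses actually look forward-facing, as a hypothetical debugging helper that is not part of the repo:

```python
import numpy as np

def looks_forward_facing(poses: np.ndarray, cos_thresh: float = 0.5) -> bool:
    """Crude heuristic: forward-facing (LLFF-style) captures have viewing
    directions that mostly agree, while a 360-degree orbit around an object
    does not. Hypothetical debugging helper, not part of the repo.

    poses: (N, 3, 4) camera-to-world matrices in the OpenGL convention,
    where the camera looks along -z (so the world-space view direction is
    the negated third rotation column)."""
    view_dirs = -poses[:, :3, 2]
    view_dirs = view_dirs / np.linalg.norm(view_dirs, axis=-1, keepdims=True)
    mean_dir = view_dirs.mean(axis=0)
    mean_dir = mean_dir / (np.linalg.norm(mean_dir) + 1e-8)
    cos_to_mean = view_dirs @ mean_dir
    return bool(np.mean(cos_to_mean > cos_thresh) > 0.9)
```

If this comes back False for the lego captures, the LLFF-style pose processing is probably the wrong fit regardless of the NDC flag, and a blender-style loader is the closer comparison.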

@wangpeng000
Member

@chenkang455 We have no plans to update this repository. Please refer to #9 (comment) and limacv/Deblur-NeRF#37.

@chenkang455
Author


Got it! Thanks for your advice.
