
Nan when training? How to solve? #9

Closed

chenkang455 opened this issue Dec 26, 2023 · 9 comments

Comments

@chenkang455

I've tried running BAD-NeRF on my own dataset, but I encountered NaN values during training. Which parameters can be adjusted to solve this problem?

@LingzheZhao
Member

Hi, you can first try lowering the pose_lrate, as we discovered and described here in the README.
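
For context, a minimal sketch of what lowering the pose learning rate could look like, assuming the learnable SE(3) camera poses get their own Adam optimizer as in nerf-pytorch-style codebases; the variable names below are placeholders, not the repo's actual identifiers:

```python
import torch

# Placeholder pose parameters: 30 cameras x 6-DoF se(3) tangent vectors.
# The actual BAD-NeRF variables and defaults may differ; this is only a sketch.
se3_pose_params = torch.nn.Parameter(torch.zeros(30, 6))

pose_lrate = 1e-4  # lowered from a larger default such as 1e-3
optimizer_pose = torch.optim.Adam([se3_pose_params], lr=pose_lrate)
```

Presumably, smaller pose steps keep the spline control poses in a better-conditioned range, which is why lowering this rate helps.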

@chenkang455
Author

Thanks for your advice. Could you clarify at what point the NaN values occurred in your experiments? I observe NaN values appearing after roughly 10,000 iterations, and the issue persists even after lowering the pose learning rate to 1e-4.
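
For anyone hitting the same thing, a simple way to localize where the NaN first appears is to guard the training step and skip non-finite updates. A minimal sketch of a hypothetical debugging helper, not part of the BAD-NeRF codebase:

```python
import torch

def guarded_step(loss, optimizer, params, iteration):
    """Skip the update if the loss or any gradient is non-finite.
    Hypothetical debugging helper; not part of the BAD-NeRF codebase."""
    if not torch.isfinite(loss):
        print(f"[iter {iteration}] non-finite loss {loss.item()}, skipping step")
        return False
    optimizer.zero_grad()
    loss.backward()
    bad = [p for p in params if p.grad is not None and not torch.isfinite(p.grad).all()]
    if bad:
        print(f"[iter {iteration}] non-finite gradients in {len(bad)} tensor(s), skipping step")
        optimizer.zero_grad()
        return False
    optimizer.step()
    return True
```

torch.autograd.set_detect_anomaly(True) can also pinpoint the operation that produced the NaN, at a significant speed cost.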

@chenkang455
Author

I've tried adjusting near and far to 2 and 6, which are the parameters for the lego scene in NeRF, but the problem still exists.

@wangpeng000
Member

wangpeng000 commented Dec 27, 2023

@chenkang455 Hi, we didn't work with 360° scenes (like lego); this codebase is aimed at "llff" scenes. The NaN problem may arise in the spline function. In our earlier experiments it appeared with small probability, and it basically stopped happening once we decreased the initial pose learning rate.
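
To make the spline point concrete: one common source of NaNs in SE(3) interpolation is the SO(3) exp/log map, where dividing by a rotation angle near zero gives 0/0. This is a guess at the failure mode rather than a confirmed diagnosis; an epsilon-guarded exponential map might look like the following sketch, which is not the repo's implementation:

```python
import torch

def so3_exp(phi: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Axis-angle vectors (N, 3) -> rotation matrices (N, 3, 3) via Rodrigues'
    formula, with a small-angle clamp so the 0/0 case cannot produce NaNs.
    Illustrative sketch only, not the repo's implementation."""
    theta = phi.norm(dim=-1, keepdim=True).clamp(min=eps)   # (N, 1)
    axis = phi / theta                                       # safe: theta >= eps
    K = torch.zeros(phi.shape[0], 3, 3, dtype=phi.dtype, device=phi.device)
    K[:, 0, 1], K[:, 0, 2] = -axis[:, 2], axis[:, 1]
    K[:, 1, 0], K[:, 1, 2] = axis[:, 2], -axis[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -axis[:, 1], axis[:, 0]
    s = torch.sin(theta).unsqueeze(-1)                       # (N, 1, 1)
    c = torch.cos(theta).unsqueeze(-1)
    eye = torch.eye(3, dtype=phi.dtype, device=phi.device).expand_as(K)
    return eye + s * K + (1.0 - c) * (K @ K)
```

Checking torch.isfinite on the interpolated poses right after the spline call would confirm whether that is where things blow up.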

If you want to handle 360° blurry data, my advice is to transfer our spline method to a NeRF framework that can directly handle 360° scenes, such as nerfstudio (https://github.com/WU-CVGL/BAD-NeRFstudio) or another framework. What's more, in the original nerf-pytorch code, the "ndc", "near" and "far" parameters also depend on the scene type; we expect the code to work well on forward-facing ("llff") scenes.
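
To illustrate the last point about ndc/near/far, this is roughly how nerf-pytorch-style training scripts choose bounds per dataset type. It is paraphrased from memory of that codebase, so the exact names and values should be treated as approximate:

```python
import numpy as np

def choose_bounds(dataset_type: str, no_ndc: bool, bds=None):
    """Roughly how nerf-pytorch-style scripts pick near/far per scene type.
    Approximate sketch; the exact values/names in this repo may differ."""
    if dataset_type == 'llff':
        if no_ndc:
            # real bounds taken from poses_bounds.npy
            return float(bds.min()) * 0.9, float(bds.max()) * 1.0
        # with NDC, the forward-facing frustum is mapped into [0, 1]
        return 0.0, 1.0
    if dataset_type == 'blender':
        # 360-degree synthetic scenes such as lego use fixed bounds and no NDC
        return 2.0, 6.0
    raise ValueError(f"unknown dataset_type: {dataset_type}")
```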

@LingzheZhao
Member

As @wangpeng000 points out, this codebase uses NDC scene contraction by default, so if your custom data does not follow the LLFF style, a workaround may be to turn that option off.

You can also try out our actively maintained BAD-NeRFstudio, since nerfstudio can handle various data types and it runs much faster.

@chenkang455
Author

@LingzheZhao @wangpeng000 Thanks for your advice. I set ndc to False and no_ndc to True, and it now seems to work for my lego dataset with no NaN.

@chenkang455
Author

Hi @LingzheZhao @wangpeng000,

Thank you for your detailed responses. I've come across another issue. Since all the datasets in your paper are in LLFF style, I'm trying to use a 360-degree scene, like lego, which necessitates setting ndc to False. However, the results appear to be relatively subpar.

I'm wondering if the problem is attributable to the load_llff_data function (using load_blender_data instead might resolve it) or if it's connected to the NDC setting.

[attached image: rendered result]

[TRAIN] Iter: 198200 Loss: 0.004275559913367033  coarse_loss:, 0.0025209172163158655, PSNR: 27.5581111907959
[TRAIN] Iter: 198300 Loss: 0.0122279217466712  coarse_loss:, 0.0069176210090518, PSNR: 22.748807907104492
[TRAIN] Iter: 198400 Loss: 0.008158434182405472  coarse_loss:, 0.0045228805392980576, PSNR: 24.3942928314209
[TRAIN] Iter: 198500 Loss: 0.009360695257782936  coarse_loss:, 0.005309312138706446, PSNR: 23.923967361450195
[TRAIN] Iter: 198600 Loss: 0.005872755311429501  coarse_loss:, 0.0035669079516083, PSNR: 26.371694564819336
[TRAIN] Iter: 198700 Loss: 0.00870824046432972  coarse_loss:, 0.004816955421119928, PSNR: 24.099069595336914
[TRAIN] Iter: 198800 Loss: 0.0047332243993878365  coarse_loss:, 0.002672248985618353, PSNR: 26.859272003173828
[TRAIN] Iter: 198900 Loss: 0.006757638417184353  coarse_loss:, 0.003632021602243185, PSNR: 25.050642013549805
[TRAIN] Iter: 199000 Loss: 0.005781751591712236  coarse_loss:, 0.0031937966123223305, PSNR: 25.870431900024414
[TRAIN] Iter: 199100 Loss: 0.005918695125728846  coarse_loss:, 0.003178289858624339, PSNR: 25.62185287475586
[TRAIN] Iter: 199200 Loss: 0.006465458311140537  coarse_loss:, 0.0036145057529211044, PSNR: 25.45009994506836
[TRAIN] Iter: 199300 Loss: 0.005777256563305855  coarse_loss:, 0.003232533112168312, PSNR: 25.943593978881836
[TRAIN] Iter: 199400 Loss: 0.005243922583758831  coarse_loss:, 0.0030064096208661795, PSNR: 26.502344131469727
[TRAIN] Iter: 199500 Loss: 0.005706350319087505  coarse_loss:, 0.0031587404664605856, PSNR: 25.938671112060547
[TRAIN] Iter: 199600 Loss: 0.0033667951356619596  coarse_loss:, 0.0018724045949056745, PSNR: 28.255355834960938
[TRAIN] Iter: 199700 Loss: 0.006072608754038811  coarse_loss:, 0.00363011471927166, PSNR: 26.121665954589844
[TRAIN] Iter: 199800 Loss: 0.0029980246908962727  coarse_loss:, 0.0016257810639217496, PSNR: 28.625686645507812
[TRAIN] Iter: 199900 Loss: 0.0053638434037566185  coarse_loss:, 0.002897790865972638, PSNR: 26.079975128173828
[TRAIN] Iter: 200000 Loss: 0.0062975799664855  coarse_loss:, 0.0033614598214626312, PSNR: 25.3222599029541
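
Regarding the load_llff_data vs. load_blender_data question above, a guess rather than a confirmed answer: load_llff_data assumes forward-facing captures (recentering and NDC-friendly bounds), whereas 360-degree synthetic scenes in nerf-pytorch go through load_blender_data with near/far = 2/6 and no NDC. A quick sanity check on whether the poses actually look forward-facing, as a hypothetical debugging helper that is not part of the repo:

```python
import numpy as np

def looks_forward_facing(poses: np.ndarray, cos_thresh: float = 0.5) -> bool:
    """Crude heuristic: forward-facing (LLFF-style) captures have viewing
    directions that mostly agree, while a 360-degree orbit around an object
    does not. Hypothetical debugging helper, not part of the repo.

    poses: (N, 3, 4) camera-to-world matrices in the OpenGL convention,
    where the camera looks along -z (so the world-space view direction is
    the negated third rotation column)."""
    view_dirs = -poses[:, :3, 2]
    view_dirs = view_dirs / np.linalg.norm(view_dirs, axis=-1, keepdims=True)
    mean_dir = view_dirs.mean(axis=0)
    mean_dir = mean_dir / (np.linalg.norm(mean_dir) + 1e-8)
    cos_to_mean = view_dirs @ mean_dir
    return bool(np.mean(cos_to_mean > cos_thresh) > 0.9)
```

If this comes back False for the lego captures, the LLFF-style pose processing is probably the wrong fit regardless of the NDC flag, and a blender-style loader is the closer comparison.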

@wangpeng000
Member

@chenkang455 We have no plans to update this repository. Please refer to #9 (comment) and limacv/Deblur-NeRF#37.

@chenkang455
Author


Got it! Thanks for your advice.
