NeRF 9.5h results #7
Comments
Hi kwea123, thanks for your attention. Previously, I evaluated NeRF with your implementation. I think there are two main differences here: 1) we trained the scenes with 16 training views instead of all images; 2) in the LLFF/nerf scenes, we didn't use the NDC ray to align their near-far range with the DTU dataset.
Recently I found that the NDC ray sampling influences the quality of LLFF a lot, so I am re-evaluating them and also IBRNet with their official code. I will update the results in our revision.
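For context, the NDC conversion being discussed is the one the original NeRF applies to forward-facing LLFF scenes. A minimal PyTorch sketch of that conversion, following the standard `ndc_rays` helper from the NeRF codebase (not this repository's code), looks roughly like:

```python
import torch

def ndc_rays(H, W, focal, near, rays_o, rays_d):
    """Shift ray origins to the near plane and map rays into NDC space,
    as done for forward-facing LLFF scenes in the original NeRF code."""
    # Move each ray origin onto the near plane (z = -near)
    t = -(near + rays_o[..., 2]) / rays_d[..., 2]
    rays_o = rays_o + t[..., None] * rays_d

    # Project origins into normalized device coordinates
    o0 = -1. / (W / (2. * focal)) * rays_o[..., 0] / rays_o[..., 2]
    o1 = -1. / (H / (2. * focal)) * rays_o[..., 1] / rays_o[..., 2]
    o2 = 1. + 2. * near / rays_o[..., 2]

    # Project directions into normalized device coordinates
    d0 = -1. / (W / (2. * focal)) * (rays_d[..., 0] / rays_d[..., 2] - rays_o[..., 0] / rays_o[..., 2])
    d1 = -1. / (H / (2. * focal)) * (rays_d[..., 1] / rays_d[..., 2] - rays_o[..., 1] / rays_o[..., 2])
    d2 = -2. * near / rays_o[..., 2]

    return torch.stack([o0, o1, o2], -1), torch.stack([d0, d1, d2], -1)
```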
I see, maybe that makes a difference too; here I use all except one image, so 61 images.
According to the paper, "In this work, we parameterize (u, v, z) using the normalized device coordinate (NDC) at the reference view." appears just after equation 4, and many other statements mention that you use NDC (for your method). So I'm confused about why you didn't use NDC for LLFF/nerf?
Oh, "we didn't use NDC rays" means we didn't pre-process the dataset, i.e., we use the real near-far ranges (nerf: 2-6, LLFF: ~2-12, DTU: 2.125-5); this process refers to this setting. And yes, we are using the NDC position of the encoding volume (reference view), e.g., the xyz_NDC of this line.
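As a rough illustration of what "NDC position of the encoding volume (reference view)" means here, a hypothetical sketch could look like the following; the function name and the exact normalization are my own assumptions, not the repository's actual xyz_NDC code:

```python
import torch

def world_to_ref_ndc(xyz_world, w2c_ref, K_ref, near, far, W, H):
    """Hypothetical sketch: express world-space sample points as (u, v, z)
    in the reference view's NDC, i.e. pixel coordinates scaled to [0, 1]
    and depth scaled by the near-far range.
    Shapes: xyz_world (N, 3), w2c_ref (4, 4), K_ref (3, 3)."""
    # World -> reference camera coordinates
    xyz_h = torch.cat([xyz_world, torch.ones_like(xyz_world[:, :1])], dim=-1)  # (N, 4)
    xyz_cam = (w2c_ref @ xyz_h.T).T[:, :3]                                     # (N, 3)

    # Camera -> pixel coordinates via the reference intrinsics
    xyz_pix = (K_ref @ xyz_cam.T).T                                            # (N, 3)
    uv = xyz_pix[:, :2] / xyz_pix[:, 2:3]                                      # (N, 2)

    # Normalize: u, v to [0, 1] over the image plane, z to [0, 1] over [near, far]
    u = uv[:, 0] / W
    v = uv[:, 1] / H
    z = (xyz_cam[:, 2] - near) / (far - near)
    return torch.stack([u, v, z], dim=-1)
```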
Hi oOXpycTOo, "we use three neighboring views (i.e. source views) to create the feature volume" means the input is three images with a relatively small baseline, and we project the other two images' features to the reference view (index-0 view). I hope this answers your question, thanks.
Yes, but that's quite different from the original MVSNet idea, where they projected these features onto the predicted image plane. What is the physical meaning of projecting the other two features onto one of the source views?
Oh, I got your point, yeah, it's different. We think building the volume in the target view may provide better depth quality, but it is not an efficient way to do free-viewpoint rendering. I think the projection is the homography plane warping.
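For readers unfamiliar with the homography plane warping mentioned above, a hypothetical MVSNet-style plane-sweep warp of a source-view feature map onto the reference view's depth planes could look like this sketch; it is an illustration under assumed projection-matrix conventions, not the repository's implementation:

```python
import torch
import torch.nn.functional as F

def warp_src_to_ref(src_feat, proj_src, proj_ref_inv, depth_values):
    """Hypothetical sketch of plane-sweep warping: sample a source-view
    feature map at the pixels where each reference-view depth hypothesis
    projects, so the cost volume lives in the reference view.
    src_feat: (C, H, W); proj_src, proj_ref_inv: (4, 4) full projection
    matrices (intrinsics @ extrinsics); depth_values: (D,)."""
    C, H, W = src_feat.shape
    D = depth_values.shape[0]
    device = src_feat.device

    # Pixel grid of the reference view, in homogeneous coordinates
    y, x = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                          torch.arange(W, device=device, dtype=torch.float32),
                          indexing='ij')
    ones = torch.ones_like(x)
    pix = torch.stack([x, y, ones], dim=0).reshape(3, -1)                   # (3, H*W)

    # Back-project each pixel to every depth plane, then project into the source view
    pts = pix[None] * depth_values.view(D, 1, 1)                            # (D, 3, H*W)
    pts = torch.cat([pts, ones.reshape(1, 1, -1).expand(D, 1, -1)], dim=1)  # (D, 4, H*W)
    pts_src = proj_src @ (proj_ref_inv @ pts)                               # (D, 4, H*W)
    uv = pts_src[:, :2] / pts_src[:, 2:3].clamp(min=1e-6)                   # (D, 2, H*W)

    # Normalize to [-1, 1] for grid_sample and fetch the source features
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)                # (D, H*W, 2)
    warped = F.grid_sample(src_feat[None].expand(D, -1, -1, -1),
                           grid.view(D, H, W, 2),
                           align_corners=True, padding_mode='zeros')
    return warped                                                           # (D, C, H, W)
```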
I would like to ask what you meant when you said "trained the scenes with 16 training views" here. In the DTU training data you provide, it seems that you trained each scan with 49 pairs of views (1 target and 3 source views). So what am I misunderstanding?
Oh, the statement refers to the fine-tuning stage, and the fine-tuning pairing file is 'configs/pairs.th', thanks~
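A minimal sketch of inspecting that pairing file (the key names in the comments are an assumption; check the loaded object for the actual layout):

```python
import torch

# 'configs/pairs.th' is a torch-serialized object holding the view splits
# used for fine-tuning; the exact keys (e.g. per-scene train/val index lists)
# are an assumption here, so print them to see the real layout.
pairs = torch.load('configs/pairs.th')
print(pairs.keys())
# e.g. train_ids = pairs['dtu_train']  # hypothetical key for the 16 training views
```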
Which implementation did you use to generate these results? They seem much worse than the results reported in the paper and the results I got with my implementation.
Here's my result with horns: PSNR already reaches 31.6 after 9.5h (it reaches 25.91 after 22 mins, btw). I believe there must be some simplification in the implementation you adopted, because I implement it almost the same way as the original paper does and use the same hyperparameters.
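For reference, the PSNR numbers quoted above correspond to the usual -10·log10(MSE) for images in [0, 1]; a minimal helper, not taken from either implementation:

```python
import torch

def psnr(pred, gt):
    """PSNR between two image tensors, assuming pixel values in [0, 1]."""
    mse = torch.mean((pred - gt) ** 2)
    return -10.0 * torch.log10(mse)
```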