
NeRF 9.5h results #7

Open · kwea123 opened this issue Jul 6, 2021 · 11 comments

kwea123 commented Jul 6, 2021

Which implementation did you use to generate these results? They seem much worse than the results reported in the paper and the ones I got with my implementation.

[Screenshots of the results in question]

Here's my result on horns: PSNR already reaches 31.6 after 9.5h (and reaches 25.91 after 22 minutes, by the way). I believe there must be some simplification in the implementation you adopted, because my implementation follows the original paper almost exactly and uses the same hyperparameters.
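
(For context, a minimal sketch of how PSNR is typically computed from a rendered image and its ground truth; the tensor names are just placeholders, not code from either repo:)

```python
import torch

def psnr(render: torch.Tensor, gt: torch.Tensor) -> float:
    """PSNR in dB for images whose values lie in [0, 1]."""
    mse = torch.mean((render - gt) ** 2)
    return (-10.0 * torch.log10(mse)).item()
```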

apchenstu (Owner)

Hi kwea123, thanks for your attention. Previously, we evaluated NeRF with your implementation. I think there are two main differences here: 1) we trained the scenes with 16 training views instead of all images, and 2) in the LLFF/NeRF scenes, we didn't use NDC rays, in order to align their near-far range with the DTU dataset.

apchenstu (Owner) commented Jul 6, 2021

Recently I found that NDC ray sampling influences the quality on LLFF a lot, so I am re-evaluating it, and also IBRNet with their official code. I will update the results in our revision.

kwea123 (Author) commented Jul 7, 2021

  1. we trained the scenes with 16 training views instead of all images

I see, maybe that makes a difference too. Here I use all but one image, so 61 images.

kwea123 (Author) commented Jul 7, 2021

According to the paper, just after Equation 4: "In this work, we parameterize (u, v, z) using the normalized device coordinate (NDC) at the reference view." Many other statements also mention that you use NDC (for your method).

So I'm confused about why you didn't use NDC for LLFF/NeRF?

apchenstu (Owner) commented Jul 7, 2021

  2. in the LLFF/NeRF scenes, we didn't use NDC rays, in order to align their near-far range with the DTU dataset

Oh, "we didn't use NDC rays" means we didn't pre-process the dataset, i.e., we use the real near-far ranges (NeRF: 2-6, LLFF: ~2-12, DTU: 2.125-5); this processing refers to this setting.

And yes, we are using the NDC position for the encoding volume (reference view), e.g., the xyz_NDC in this line.
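
(For readers who want to see what this parameterization might look like, here is a minimal sketch of mapping a world-space point to the (u, v, z) NDC of the reference view; it is not the repository's code, and `K`, `w2c`, `near`, `far`, `H`, `W` are illustrative placeholders:)

```python
import torch

def world_to_ref_ndc(xyz_world, K, w2c, near, far, H, W):
    """Map world-space points to (u, v, z) NDC of the reference view.

    u, v: pixel coordinates normalized to [0, 1]
    z:    depth normalized linearly between near and far
    All arguments besides xyz_world are illustrative placeholders.
    """
    ones = torch.ones_like(xyz_world[..., :1])
    xyz_cam = (torch.cat([xyz_world, ones], -1) @ w2c.T)[..., :3]  # world -> camera
    uvd = xyz_cam @ K.T                                            # camera -> (u*d, v*d, d)
    u = uvd[..., 0] / uvd[..., 2] / (W - 1)                        # normalize to [0, 1]
    v = uvd[..., 1] / uvd[..., 2] / (H - 1)
    z = (xyz_cam[..., 2] - near) / (far - near)                    # normalize depth
    return torch.stack([u, v, z], -1)
```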

@oOXpycTOo

May I also ask a question about the implementation here?
I see that you're using 3 neighbouring views in MVSNet to create the feature volume. However, the original MVSNet used a reference image (the one we need to render, and which we lack when doing inference). According to your code, you project all images onto the image plane of the view with index 0, which, I believe, is a source image. This seems to contradict your paper; could you please clarify this point?
This is the code part I'm writing about:
[screenshot of the code in question]

apchenstu (Owner)

Hi oOXpycTOo, "we use three neighboring views (i.e., source views) to create the feature volume" means the input is three images with a relatively small baseline, and we project the other two images' features to the reference view (the index-0 view). I hope this answers your question, thanks.

@oOXpycTOo

Yes, but that's quite different from the original MVSNet idea, where these features were projected onto the image plane of the predicted view. What is the physical meaning of projecting the other two views' features onto one of the source views?

apchenstu (Owner)

Oh, I got your point. Yes, it's different: we think building the volume in the target view might provide better depth quality, but it is not an efficient way to do free-viewpoint rendering. The projection is the homography plane warping.
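
(A minimal sketch of homography / plane-sweep warping with `F.grid_sample`, for readers unfamiliar with the idea; it is independent of the repository's implementation, and the projection-matrix conventions and variable names are assumptions:)

```python
import torch
import torch.nn.functional as F

def homo_warp(src_feat, proj_src, proj_ref_inv, depth_values, H, W):
    """Warp a source-view feature map (B, C, H, W) onto D fronto-parallel
    depth planes of the reference view, producing (B, C, D, H, W).

    proj_src:     source 3x4 projection matrix (K_src @ [R|t]_src)
    proj_ref_inv: inverse of the reference view's 4x4 projection matrix
    depth_values: (D,) sweep-plane depths in the reference frame
    """
    B, C = src_feat.shape[:2]
    D = depth_values.shape[0]
    device = src_feat.device

    # Pixel grid of the reference view, lifted to each depth plane.
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)      # (3, H*W)
    pts = pix[None] * depth_values.view(D, 1, 1)                            # (D, 3, H*W)
    pts = torch.cat([pts, torch.ones(D, 1, H * W, device=device)], 1)       # homogeneous

    # Reference pixels + depth -> world -> source pixels.
    world = proj_ref_inv[None] @ pts                                        # (D, 4, H*W)
    src = proj_src[None] @ world                                            # (D, 3, H*W)
    src_xy = src[:, :2] / src[:, 2:].clamp(min=1e-6)                        # (D, 2, H*W)

    # Normalize to [-1, 1] and sample the source features per depth plane.
    grid_x = src_xy[:, 0] / ((W - 1) / 2) - 1
    grid_y = src_xy[:, 1] / ((H - 1) / 2) - 1
    grid = torch.stack([grid_x, grid_y], -1).view(1, D * H, W, 2).expand(B, -1, -1, -1)
    warped = F.grid_sample(src_feat, grid, align_corners=True)              # (B, C, D*H, W)
    return warped.view(B, C, D, H, W)
```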

@zcong17huang

Hi kwea123, thanks for your attention. Previously, we evaluated NeRF with your implementation. I think there are two main differences here: 1) we trained the scenes with 16 training views instead of all images, and 2) in the LLFF/NeRF scenes, we didn't use NDC rays, in order to align their near-far range with the DTU dataset.

I would like to ask what you meant by "trained the scenes with 16 training views" here. In the DTU training data you provide, it seems that each scan is trained with 49 pairs of views (1 target and 3 source views). What am I understanding wrong?

apchenstu (Owner)

Oh, that statement refers to the fine-tuning stage; the fine-tuning pairing file is 'configs/pairs.th', thanks~
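
(As a hedged illustration of how such a pairing file might be consumed; the key names below are guesses for illustration, not the repository's documented schema:)

```python
import torch

# 'configs/pairs.th' is a torch-serialized dict of per-scene view-index lists.
# The exact keys ('horns_train' / 'horns_val' here) are assumptions; inspect
# them with `pairs.keys()` before use.
pairs = torch.load('configs/pairs.th')
train_views = pairs['horns_train']   # e.g. the 16 views used for fine-tuning
val_views = pairs['horns_val']       # held-out views for evaluation
print(len(train_views), len(val_views))
```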
