
NeRF 9.5h results #7

Open · kwea123 opened this issue Jul 6, 2021 · 11 comments

kwea123 commented Jul 6, 2021

Which implementation did you use to generate these results? They seem much worse than the results reported in the paper and the ones I got with my implementation.

[Screenshots of the results in question]

Here's my result on horns: PSNR already reaches 31.6 after 9.5h (and reaches 25.91 after 22 minutes, by the way). I believe there must be some simplification in the implementation you adopted, because my implementation follows the original paper almost exactly and uses the same hyperparameters.
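
(For context, a minimal sketch of how PSNR is typically computed from a rendered image and its ground truth; the tensor names are just placeholders, not code from either repo:)

```python
import torch

def psnr(render: torch.Tensor, gt: torch.Tensor) -> float:
    """PSNR in dB for images whose values lie in [0, 1]."""
    mse = torch.mean((render - gt) ** 2)
    return (-10.0 * torch.log10(mse)).item()
```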

apchenstu (Owner)

Hi kwea123, thanks for your attention. Previously, we evaluated NeRF with your implementation. I think there are two main differences here: 1) we trained the scenes with 16 training views instead of all images, and 2) in the LLFF/NeRF scenes, we didn't use NDC rays, in order to align their near-far range with the DTU dataset.

apchenstu (Owner) commented Jul 6, 2021

Recently I found that NDC ray sampling influences the quality on LLFF a lot, so I am re-evaluating it, and also IBRNet with their official code. I will update the results in our revision.

kwea123 (Author) commented Jul 7, 2021

  1. we trained the scenes with 16 training views instead of all images

I see, maybe that makes a difference too. Here I use all but one image, so 61 images.

kwea123 (Author) commented Jul 7, 2021

According to the paper, just after Equation 4: "In this work, we parameterize (u, v, z) using the normalized device coordinate (NDC) at the reference view." Many other statements also mention that you use NDC (for your method).

So I'm confused about why you didn't use NDC for LLFF/NeRF?

apchenstu (Owner) commented Jul 7, 2021

  2. in the LLFF/NeRF scenes, we didn't use NDC rays, in order to align their near-far range with the DTU dataset

Oh, "we didn't use NDC rays" means we didn't pre-process the dataset, i.e., we use the real near-far ranges (NeRF: 2-6, LLFF: ~2-12, DTU: 2.125-5); this processing refers to this setting.

And yes, we are using the NDC position for the encoding volume (reference view), e.g., the xyz_NDC in this line.
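
(For readers who want to see what this parameterization might look like, here is a minimal sketch of mapping a world-space point to the (u, v, z) NDC of the reference view; it is not the repository's code, and `K`, `w2c`, `near`, `far`, `H`, `W` are illustrative placeholders:)

```python
import torch

def world_to_ref_ndc(xyz_world, K, w2c, near, far, H, W):
    """Map world-space points to (u, v, z) NDC of the reference view.

    u, v: pixel coordinates normalized to [0, 1]
    z:    depth normalized linearly between near and far
    All arguments besides xyz_world are illustrative placeholders.
    """
    ones = torch.ones_like(xyz_world[..., :1])
    xyz_cam = (torch.cat([xyz_world, ones], -1) @ w2c.T)[..., :3]  # world -> camera
    uvd = xyz_cam @ K.T                                            # camera -> (u*d, v*d, d)
    u = uvd[..., 0] / uvd[..., 2] / (W - 1)                        # normalize to [0, 1]
    v = uvd[..., 1] / uvd[..., 2] / (H - 1)
    z = (xyz_cam[..., 2] - near) / (far - near)                    # normalize depth
    return torch.stack([u, v, z], -1)
```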

@oOXpycTOo

May I also ask a question about the implementation here?
I see that you're using 3 neighbouring views in MVSNet to create the feature volume. However, the original MVSNet used a reference image (the one we need to render, and which we lack when doing inference). According to your code, you project all images onto the image plane of the view with index 0, which, I believe, is a source image. This seems to contradict your paper; could you please clarify this point?
This is the code part I'm writing about:
[screenshot of the code in question]

apchenstu (Owner)

Hi oOXpycTOo, "we use three neighboring views (i.e., source views) to create the feature volume" means the input is three images with a relatively small baseline, and we project the other two images' features to the reference view (the index-0 view). I hope this answers your question, thanks.

@oOXpycTOo

Yes, but that's quite different from the original MVSNet idea, where these features were projected onto the image plane of the predicted view. What is the physical meaning of projecting the other two views' features onto one of the source views?

apchenstu (Owner)

Oh, I got your point. Yes, it's different: we think building the volume in the target view might provide better depth quality, but it is not an efficient way to do free-viewpoint rendering. The projection is the homography plane warping.
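
(A minimal sketch of homography / plane-sweep warping with `F.grid_sample`, for readers unfamiliar with the idea; it is independent of the repository's implementation, and the projection-matrix conventions and variable names are assumptions:)

```python
import torch
import torch.nn.functional as F

def homo_warp(src_feat, proj_src, proj_ref_inv, depth_values, H, W):
    """Warp a source-view feature map (B, C, H, W) onto D fronto-parallel
    depth planes of the reference view, producing (B, C, D, H, W).

    proj_src:     source 3x4 projection matrix (K_src @ [R|t]_src)
    proj_ref_inv: inverse of the reference view's 4x4 projection matrix
    depth_values: (D,) sweep-plane depths in the reference frame
    """
    B, C = src_feat.shape[:2]
    D = depth_values.shape[0]
    device = src_feat.device

    # Pixel grid of the reference view, lifted to each depth plane.
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)      # (3, H*W)
    pts = pix[None] * depth_values.view(D, 1, 1)                            # (D, 3, H*W)
    pts = torch.cat([pts, torch.ones(D, 1, H * W, device=device)], 1)       # homogeneous

    # Reference pixels + depth -> world -> source pixels.
    world = proj_ref_inv[None] @ pts                                        # (D, 4, H*W)
    src = proj_src[None] @ world                                            # (D, 3, H*W)
    src_xy = src[:, :2] / src[:, 2:].clamp(min=1e-6)                        # (D, 2, H*W)

    # Normalize to [-1, 1] and sample the source features per depth plane.
    grid_x = src_xy[:, 0] / ((W - 1) / 2) - 1
    grid_y = src_xy[:, 1] / ((H - 1) / 2) - 1
    grid = torch.stack([grid_x, grid_y], -1).view(1, D * H, W, 2).expand(B, -1, -1, -1)
    warped = F.grid_sample(src_feat, grid, align_corners=True)              # (B, C, D*H, W)
    return warped.view(B, C, D, H, W)
```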

@zcong17huang

Hi kwea123, thanks for your attention. Previously, we evaluated NeRF with your implementation. I think there are two main differences here: 1) we trained the scenes with 16 training views instead of all images, and 2) in the LLFF/NeRF scenes, we didn't use NDC rays, in order to align their near-far range with the DTU dataset.

I would like to ask what you meant by "trained the scenes with 16 training views" here. In the DTU training data you provide, it seems that each scan is trained with 49 pairs of views (1 target and 3 source views). What am I understanding wrong?

apchenstu (Owner)

Oh, that statement refers to the fine-tuning stage; the fine-tuning pairing file is 'configs/pairs.th', thanks~
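
(As a hedged illustration of how such a pairing file might be consumed; the key names below are guesses for illustration, not the repository's documented schema:)

```python
import torch

# 'configs/pairs.th' is a torch-serialized dict of per-scene view-index lists.
# The exact keys ('horns_train' / 'horns_val' here) are assumptions; inspect
# them with `pairs.keys()` before use.
pairs = torch.load('configs/pairs.th')
train_views = pairs['horns_train']   # e.g. the 16 views used for fine-tuning
val_views = pairs['horns_val']       # held-out views for evaluation
print(len(train_views), len(val_views))
```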
