Tanks and Temples Setup #14

Closed

tejaskhot opened this issue Nov 19, 2018 · 18 comments
@tejaskhot

Thanks for sharing the code.

I am trying to reproduce the results on Tanks and Temples with the pre-trained model but have not succeeded so far. An example camera file looks like:

extrinsic
0.333487 -0.0576322 -0.940992 -0.0320506
0.0582181 -0.994966 0.0815704 -0.0245921
-0.940956 -0.0819853 -0.328452 0.248608
0.0 0.0 0.0 1.0

intrinsic
1165.71 0 962.81
0 1165.71 541.723
0 0 1

0.193887 0.00406869 778 3.35933

I have parsed the file to adjust the depth min and max, but it doesn't seem to help much. I only have 12GB of GPU memory, so I am running at half the image resolution, which shouldn't hurt much. However, the outputs I am getting are pretty bad and nothing like the paper. Moreover, I find that I have to change the parameters for every single scan (Horse, Family, etc.) separately, and no single set of values seems to work for all of them.
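For reference, here is a minimal sketch of how I am parsing the file, assuming the last line is depth_min, depth_interval, depth_num, depth_max (my reading of the format; the example is consistent with that, since 0.193887 + 778 * 0.00406869 ≈ 3.35933):

import numpy as np

def load_cam(path, max_d=256):
    # Rough parser for an MVSNet-style cam.txt (my own helper, not a repo script).
    with open(path) as f:
        words = f.read().split()
    assert words[0] == 'extrinsic' and words[17] == 'intrinsic'
    extrinsic = np.array(words[1:17], dtype=np.float64).reshape(4, 4)
    intrinsic = np.array(words[18:27], dtype=np.float64).reshape(3, 3)
    depth_vals = [float(w) for w in words[27:]]
    depth_min, depth_interval = depth_vals[0], depth_vals[1]
    # If I re-sample the range to max_d planes, I recompute the interval
    # so that depth_min + max_d * interval still reaches depth_max.
    if len(depth_vals) >= 4:
        depth_max = depth_vals[3]
        depth_interval = (depth_max - depth_min) / max_d
    return extrinsic, intrinsic, depth_min, depth_interval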

@YoYo000 Since there are multiple similar questions about this, it would be great if you could summarize the detailed steps for reproducing the scan results, including the parameters to use and any changes to the repository scripts.

@YoYo000
Owner

YoYo000 commented Nov 20, 2018

I will generate the original cam.txt files for the Tanks and Temples dataset soon. The depth min and max of the above camera might be a little too relaxed.

Meanwhile, if you are using the resized images, some post-processing parameters might need to be tuned further. I will try to find a proper config and provide a simple script.

@tejaskhot
Author

Thanks a lot @YoYo000 !
Do you have an approximate timeline for this? When do you think you'll be able to share it?

@YoYo000
Owner

YoYo000 commented Nov 20, 2018

@tejaskhot Hopefully today or tomorrow

@YoYo000
Owner

YoYo000 commented Nov 20, 2018

@tejaskhot you could use these cams and try the new commit for the Family dataset:

python test.py --dense_folder '/data/intel_dataset/intermediate/mvsnet_input/family/' --max_d 256 --max_w 960 --max_h 512 --interval_scale 1

python depthfusion.py --dense_folder '/data/intel_dataset/intermediate/mvsnet_input/family' --prob_threshold 0.6 --disp_thresh 0.25 --num_consistent 4

If you want to tune the point cloud, I think you could change --prob_threshold. Also, I found that the downsized setting and the Fusibile post-processing affect the reconstruction:

[Image comparison of the four settings: downsized image + Fusibile post-processing, downsized image + proposed post-processing, original image + proposed post-processing, original image + Fusibile post-processing]

@tejaskhot
Author

tejaskhot commented Nov 20, 2018

Thanks for the quick response. I have two follow-up questions.

  1. I generated outputs using the steps you mentioned with downsized images (the same values you gave above) and got an output for Family which I think is similar to what you posted. However, a zoomed-out view of it shows plenty of the surrounding areas being reconstructed, as shown below. Is this normal/expected?

[image: zoomed-out view of the Family reconstruction]

  2. Using the same hyperparameters, I produced outputs for a few of the other scans and they don't look as expected. For example, here are two views of Horse.

[images: two views of the Horse reconstruction]

Does this mean we have to set the hyperparameters for every scan of Tanks and Temples individually? Were the results in your paper produced with such individually tuned values, or did you use the same set of values across the dataset?

@YoYo000
Owner

YoYo000 commented Nov 21, 2018

Could you check the input cameras and the other settings again? The new cameras should have a tight depth range, but your reconstruction appears to use a wide depth range. The zoomed-out views of my reconstructions with the two provided commands and parameters look like:

[images: four zoomed-out views of my reconstructions]

Also, I use the rectified images I sent to you and do not pre-resize them. The test.py script will automatically do the resizing and cropping.
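Roughly speaking, the resizing/cropping has to keep the intrinsics consistent with the images; a simplified illustration of the idea (not the exact test.py code):

import numpy as np

def scale_and_crop_intrinsics(K, scale, crop_x, crop_y):
    # Adjust a 3x3 intrinsic matrix after resizing the image by `scale` and
    # cropping with a top-left offset of (crop_x, crop_y). Illustration only.
    K = K.copy()
    K[0, 0] *= scale                      # fx
    K[1, 1] *= scale                      # fy
    K[0, 2] = K[0, 2] * scale - crop_x    # cx
    K[1, 2] = K[1, 2] * scale - crop_y    # cy
    return K

So if you pre-resize the images yourself, the cam.txt intrinsics have to be adjusted the same way, which is easy to get wrong.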

For hyperparameters on the DTU evaluation, we use the parameters described in the paper. For Tanks and Temples, we fixed all parameters except the probability threshold (0.6 ± 0.2), because some of the scenes contain a large portion of background and sky. Tuning the probability threshold effectively controls the filtering of these parts.
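The idea of the threshold is simply to mask out low-confidence pixels before fusion; a minimal sketch of that step (not the exact depthfusion.py code):

import numpy as np

def filter_depth(depth_map, prob_map, prob_threshold=0.6):
    # Keep a depth estimate only where the probability map is confident enough;
    # masked pixels (set to 0 here) are ignored during fusion.
    filtered = depth_map.copy()
    filtered[prob_map < prob_threshold] = 0.0
    return filtered

Raising the threshold towards 0.8 removes more of the sky/background noise at the cost of a sparser cloud; lowering it towards 0.4 keeps more points but also more outliers.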

@tejaskhot
Author

I tried downloading the files you posted, freshly cloning the repo, and running the commands as-is, but all depth predictions come out as NaN and the resulting point cloud is empty. Can you please verify the files you linked? There seems to be some issue.

@YoYo000
Owner

YoYo000 commented Nov 22, 2018

@tejaskhot I see, I gave you the wrong link... here are the new cams

Sorry for my mistake!

@tejaskhot
Author

Thanks! These files work and I am able to reproduce the results. I have one question regarding the full-resolution results. As reported in the paper, I tried using images of size 1920 x 1056 with D=256, interval=1, N=5 on a 16GB GPU, but that also runs out of memory for me. How are you able to run inference at full resolution? Is there something I am missing?
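For context, my rough back-of-the-envelope estimate of just the cost volume (assuming 32-channel features at quarter resolution, which is how I read the MVSNet paper) already comes to about 4 GB:

# My own rough estimate, not from the repo.
W, H, D, C, bytes_per_float = 1920, 1056, 256, 32, 4
cost_volume_bytes = (W // 4) * (H // 4) * D * C * bytes_per_float
print(cost_volume_bytes / 1024**3)  # ~3.9 GiB for a single copy of the volume

and the 3D regularization network keeps several activation tensors of comparable size, so I am surprised it fits in 16GB at all.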

@YoYo000
Owner

YoYo000 commented Nov 23, 2018

Is the GPU also occupied by other applications (e.g., Chrome) during your experiments? I have encountered the OOM problem when as little as ~200 MB of memory was unavailable. By the way, my experiments were run on the Google Cloud ML platform with a P100 GPU.

@tejaskhot
Author

Thanks!

@whubaichuan

@tejaskhot Hi, why do Yao's results have a tight depth range while your reconstruction has a wide depth range when zoomed out?

Is that because of the depth range listed in the last line of cam.txt?

@tejaskhot
Author

@whubaichuan I don't remember the specifics, to be honest, but that seems to be a fair guess. As @YoYo000 pointed out, the camera parameters and depth range are crucial for getting good results.

@whubaichuan

@tejaskhot Thanks for the reply. Have you tested different settings on the T&T leaderboard? Is prob_threshold the main parameter influencing the results on T&T?

@tatsy

tatsy commented Nov 13, 2020

@YoYo000 I am sorry to bother you at this busy time, but I'd like to ask how I can reproduce the results for the Tanks and Temples dataset in the R-MVSNet paper.

I think the above images and the results in the MVSNet paper were made with the camera parameters in short_range_cameras_for_mvsnet that you provided in this repo. However, these short-range camera parameters are not provided for the advanced dataset (which is natural, since the scenes in the advanced set might not fit within short ranges).

So, I thought this meant that the results in the R-MVSNet paper were made with the camera parameters NOT in short_range_cameras_for_mvsnet, namely those stored in the cams sub-folder of the folders with scene names such as Auditorium. However, as far as I tested, the reconstruction quality using these camera parameters was significantly lower than what I could see in the R-MVSNet paper.

So, I am wondering if you could share some tips for tuning the depth range used in the R-MVSNet paper. Thank you very much for your help.

@YoYo000
Owner

YoYo000 commented Nov 16, 2020

Hi @tatsy

Yes, you are right: the R-MVSNet paper uses the long-range cameras for reconstruction, for both the intermediate set and the advanced set. Only MVSNet uses short_range_cameras_for_mvsnet, as it is restricted to a small number of depth planes.
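To make the constraint concrete (a rough illustration, not code from this repo): with a fixed number of planes, a wider depth range directly coarsens the depth resolution, which is why the short-range cameras were needed for MVSNet.

import numpy as np

def uniform_depth_planes(depth_min, depth_max, num_planes):
    # Uniformly sampled depth hypotheses; the interval grows with the range.
    return np.linspace(depth_min, depth_max, num_planes)

short_range = uniform_depth_planes(0.19, 3.36, 256)   # interval ~0.012, as in the example cam.txt above
long_range = uniform_depth_planes(0.19, 100.0, 256)   # interval ~0.39, too coarse for only 256 planes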

For benchmarking on the advanced dataset, the post-processing is important. From what I observed, the Fusibile point cloud is quite noisy, so I was using the fusion + refinement strategy described in the paper to get the benchmark results.

@tatsy

tatsy commented Nov 16, 2020

Hi @YoYo000

Thank you very much for your reply. So, I guess the problem is the variational depth refinement from the R-MVSNet paper, because the results I got were rather sparse as well as noisy. Actually, Fusibile works pretty well for the DTU dataset with MVSNet (not R-MVSNet), and moreover, the behavior of R-MVSNet is quite similar to the second row of Table 3 in the R-MVSNet paper (the refinement-ablated variant).

I have already implemented the variational depth refinement, but it is quite unstable during gradient descent. As I posted in another issue, I am wondering how the ZNCC data term and the bilateral smoothing term are formulated and weighted.
#35 (comment)

Concretely, my questions are as follows (a sketch of the energy I am assuming appears after the list):

  • Is ZNCC used as the data term as-is? Or is it transformed, e.g., into exp(-ZNCC) or something similar?
  • Are the weights for the data term (ZNCC) and the bilateral smoothing term both 1? Or are they different, e.g., 1 for the data term and 0.01 or so for the smoothing term?
  • Are the neighbors for the bilateral smoothing term (N(p_1) in the paper) the 4-neighbor pixels, the 8-neighbor pixels, or a larger window?
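For reference, the energy I am currently minimizing looks roughly like this (my own reading of the paper, so f, \lambda, and the neighborhood N(p) are exactly the unknowns above):

E(D) = \sum_{p} f\bigl(\mathrm{ZNCC}(p, D(p))\bigr) + \lambda \sum_{p} \sum_{q \in N(p)} w_{\mathrm{bilateral}}(p, q)\,\lvert D(p) - D(q) \rvert

where f turns the ZNCC similarity into a cost (ZNCC with a sign flip, 1 - ZNCC, exp(-ZNCC), ...), \lambda balances the two terms, and w_bilateral is the bilateral weight computed from pixel distance and intensity difference.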

Thank you very much for your help.
