
Some questions about test results #23

Closed
zjulabwjt opened this issue Jul 22, 2023 · 16 comments
Labels
good first issue Good for newcomers

Comments

@zjulabwjt

zjulabwjt commented Jul 22, 2023

Thanks for your great work! I ran your code on some datasets and have some questions.

1. About the FPS reported in your main paper: I ran your code on an A40 server on the Replica dataset (about 2000 frames), and it took about 20 minutes with your default settings (tracking iters 10, mapping iters 20). How should I interpret the FPS reported in your paper?

2. I found that on real-world datasets, such as ScanNet and TUM RGB-D, the mesh results are poor. It seems the tracking and mapping iterations are not enough, because the PSNR is low. Can you give me some advice on improving performance on real datasets? Should I add iterations or use a higher voxel resolution at the cost of efficiency?

Thanks for your reply!
[screenshot: reconstructed mesh]

@HengyiWang
Owner

Hi @zjulabwjt, thank you for your questions. I'll address each of your concerns separately.

Regarding your first question, you can check Jingwen's reply here #11. Feel free to reach out if you cannot address this issue following those steps.

Regarding your second question:

To paraphrase: the mesh is bad & the PSNR is low -> tracking/mapping iterations are not enough -> you are asking for suggestions for better reconstruction (giving two options that hurt efficiency: increasing the number of iterations or the resolution).

  1. Good or bad reconstruction on real-world datasets can depend on many aspects: sensor quality, camera trajectories, illumination conditions, the effectiveness of the algorithms, subjective perception, etc. It is better to compare with other methods, as those datasets are quite challenging. The screenshot you show is a reconstruction of a scene in the TUM dataset, which contains lots of unobserved regions and missing depth measurements. I have attached the results of KinectFusion for your reference; our reconstruction usually has better completion (e.g., the screen). BTW, if you want to get rid of floaters, simply run marching cubes after masking out points that are outside of the camera frustum (see the sketch at the end of this comment).
[image: KinectFusion reconstruction for reference]
  2. Regarding PSNR, it's important to note that our primary focus is SLAM rather than novel view synthesis. Instead of building a 3D radiance field, we mostly care about the color of the surface points, no matter which view direction we use. Over-fitting to the color of each frame is therefore not necessary, since the video contains many redundant observations of surface points. (This is why we keep only 5% of the pixels of each keyframe, to save CPU memory; you should not expect a high re-rendered PSNR from only 5% of the pixels, but the actual color of the mesh is still comparable to, and sometimes even better than, NICE-SLAM.) Whether the number of mapping iterations is enough usually depends on the tracking performance, as poor mapping leads to lost tracking.

  3. Adding that extra computation will give you slightly better results. However, Co-SLAM aims to demonstrate improved performance while being computationally efficient compared to NICE-SLAM. Depending on your application, I would suggest some other ways to improve the quality of the reconstruction.
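
Roughly, the frustum masking from point 1 looks like this (a minimal sketch, not the actual Co-SLAM code; `sdf_grid`, `poses`, and the bounds are placeholder names, and `skimage` is assumed for marching cubes):

```python
import numpy as np
from skimage import measure

def frustum_mask(points, K, c2w_list, width, height, near=0.01, far=10.0):
    """True for world points inside at least one camera frustum
    (OpenCV convention: +z is the viewing direction)."""
    mask = np.zeros(len(points), dtype=bool)
    for c2w in c2w_list:
        w2c = np.linalg.inv(c2w)
        cam = points @ w2c[:3, :3].T + w2c[:3, 3]      # world -> camera
        z = cam[:, 2]
        valid = (z > near) & (z < far)
        uv = cam[valid] @ K.T                          # homogeneous pixels (u*z, v*z, z)
        u, v = uv[:, 0] / uv[:, 2], uv[:, 1] / uv[:, 2]
        inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        idx = np.flatnonzero(valid)
        mask[idx[inside]] = True
    return mask

def cull_and_march(sdf_grid, bounds_min, bounds_max, K, poses, width, height):
    """Mark SDF samples outside every frustum as 'empty' before marching cubes."""
    D, H, W = sdf_grid.shape
    axes = [np.linspace(bounds_min[i], bounds_max[i], s) for i, s in enumerate((D, H, W))]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), -1).reshape(-1, 3)
    sdf = sdf_grid.reshape(-1).copy()
    sdf[~frustum_mask(pts, K, poses, width, height)] = 1.0  # unseen -> positive SDF
    spacing = (np.asarray(bounds_max) - np.asarray(bounds_min)) / (np.array([D, H, W]) - 1)
    return measure.marching_cubes(sdf.reshape(D, H, W), level=0.0, spacing=tuple(spacing))
```

This only handles the frustum; proper culling also removes surfaces that were never actually observed (see the occlusion-aware variant mentioned later in this thread).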

@zjulabwjt
Author

Hi @HengyiWang, thank you for your quick and detailed reply and advice.

I also have some questions about the difference between the global BA in your code and the local BA in other neural-field SLAM systems.

As mentioned in your main paper, global BA selects 5% of the rays from the global keyframe set, whereas local BA selects fewer than 10 keyframes in a sliding window. In my opinion, global BA gives more global consistency but less efficiency. Does your global BA strategy (selecting 5% of rays) aim for more globally consistent tracking at the cost of rendering performance? And would local BA improve rendering performance because it selects more rays from nearby views?

Thanks for your reply and help!

@HengyiWang
Owner

Hi @zjulabwjt, thank you for your questions.

Regarding the 5% of pixels mentioned in the paper, it serves as a keyframe management strategy to optimize CPU memory usage by storing 5% of the pixels of each keyframe in the CPU. The key idea is that neural implicit SLAM doesn't require storing frame-level information for all pixels, and keeping 5% of the keyframe pixels in the CPU is often sufficient for the reconstruction. You can still increase the percentage, up to 100%, and performance will usually improve. This keyframe management strategy is independent of global BA; a minimal sketch follows.
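
For concreteness, the storage/sampling idea looks roughly like this (a toy sketch with hypothetical names, not the repo's actual keyframe database):

```python
import torch

class KeyframeStore:
    """Toy sketch of the 5%-pixel keyframe store described above."""
    def __init__(self, keep_ratio=0.05):
        self.keep_ratio = keep_ratio
        self.frames = []  # per-keyframe (rays_o, rays_d, rgb, depth), kept on CPU

    def add_keyframe(self, rays_o, rays_d, rgb, depth):
        # tensors are flattened per frame: shape (H*W, C)
        n = rgb.shape[0]
        idx = torch.randperm(n)[: int(self.keep_ratio * n)]
        # store only a random 5% subset on the CPU to bound memory growth
        self.frames.append(tuple(t[idx].cpu() for t in (rays_o, rays_d, rgb, depth)))

    def sample_global(self, n_rays):
        """Global-BA style sampling: draw rays uniformly from ALL keyframes."""
        rays_o, rays_d, rgb, depth = map(torch.cat, zip(*self.frames))
        idx = torch.randint(0, rgb.shape[0], (n_rays,))
        return rays_o[idx], rays_d[idx], rgb[idx], depth[idx]
```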

As for local BA, it can definitely help with rendering performance, as it samples more rays from unexplored areas. However, performing local BA usually requires you to freeze the decoder; otherwise, you may run into catastrophic forgetting. Check out our project page for a comparison with concurrent work (ESLAM) and more discussion of global/local BA in #19.

Feel free to ask if you have any further questions. Your support is highly appreciated, and if you find our work helpful for your research, we'd be thrilled if you could star this repository ;)

@zjulabwjt
Author


Thanks for your quick reply! I really appreciate it, and I have starred your excellent work!

@zjulabwjt
Author

zjulabwjt commented Jul 23, 2023

Hi @HengyiWang, I am sorry to bother you again. Actually, I also have some questions about the code.

1. I found that the mesh quality in ScanNet scenes is poor. In ScanNet scene_0059_00 the hash size is only 12; is the small hash size just to reduce the memory burden, or has this setting already reached the performance bottleneck?

2. I removed the OneBlob encoding and kept only the hash encoding in ScanNet scene_0000, scene_0059, scene_0106, and scene_0169, and found almost no difference in the mesh results. In my opinion, OneBlob might fill holes and smooth the surface, but might also lose geometric detail.

3. Lastly, I have a silly question: how do I run marching cubes while masking out points that are outside of the camera frustum, as you mentioned above? If it is convenient for you, could you share that part of the code?

Thank you in advance for your reply!

@HengyiWang
Owner

Hi @zjulabwjt, thanks for your question.

  1. The description "the mesh is bad" is quite vague... Could you elaborate on why you think it is bad, or show some comparisons, so that I can help you with it? The hash size is determined by the size of the scene. You can tune it; the smaller it is, the less memory you will use.

  2. Coordinate encoding gives better hole-filling ability, as shown in Fig. 2 of our paper. However, in ScanNet the holes are usually quite small, and the continuity of the feature grid is usually more than enough. Additionally, joint encoding helps with memory compression: you can further reduce the hash look-up size and get something like Fig. 9 in our main paper. As for the concern about losing geometric detail, we have not observed such issues, as sparse parametric encoding usually complements coordinate encoding and effectively preserves geometric detail. (A toy sketch of the joint encoding follows this list.)

  3. You can try the mesh culling described in neural_slam_eval, using either the frustum (NICE-SLAM and iMAP) or frustum + occlusion (Neural-RGBD and GO-Surf) strategy.
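
To make point 2 concrete, here is a pure-PyTorch toy of a joint encoding (Co-SLAM itself uses tiny-cuda-nn's OneBlob and HashGrid; everything below, including the sizes and the simplified spatial hash, is illustrative only):

```python
import torch
import torch.nn as nn

class JointEncoding(nn.Module):
    """Toy joint encoding: smooth coordinate (OneBlob-like) features
    concatenated with multi-resolution hashed grid features."""
    def __init__(self, n_bins=16, n_levels=8, level_dim=2, log2_hashmap_size=13):
        super().__init__()
        self.n_bins = n_bins
        # stand-in for a multi-resolution hash grid: one table per level
        self.tables = nn.ModuleList(
            nn.Embedding(2 ** log2_hashmap_size, level_dim) for _ in range(n_levels)
        )

    def oneblob(self, x):
        # x in [0, 1]^3, shape (N, 3): place a Gaussian "blob" over n_bins bins
        bins = torch.linspace(0, 1, self.n_bins, device=x.device)
        return torch.exp(-((x[..., None] - bins) ** 2)
                         / (2 * (1.0 / self.n_bins) ** 2)).flatten(1)

    def forward(self, x):
        coord = self.oneblob(x)  # smooth -> hole-filling, coherent surfaces
        feats = []
        for lvl, table in enumerate(self.tables):
            res = 16 * 2 ** lvl                       # per-level grid resolution
            idx = (x * res).long()
            # simplified spatial hash (real implementations hash corner indices)
            h = idx[:, 0] * 73856093 ^ idx[:, 1] * 19349663 ^ idx[:, 2] * 83492791
            feats.append(table(h % table.num_embeddings))
        # joint feature fed to the SDF/color decoders
        return torch.cat([coord] + feats, dim=-1)
```

Shrinking `log2_hashmap_size` shrinks the tables (more collisions), while the smooth coordinate part keeps the surface coherent; that is the memory-compression effect mentioned above.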

@zjulabwjt
Author

zjulabwjt commented Jul 24, 2023

Hi @HengyiWang, thank you again for your patient reply!
I used the pretrained Co-SLAM model for the entire ScanNet scene0059_00 scene and rendered the whole image sequence, but got the result below: the rendered images are blurry. Is this because the SLAM pipeline only focuses on a subset of pixels, which limits rendering quality (even with global BA, only sparse pixels of previous frames are kept)? I don't know whether this is reasonable (as you said before, your work focuses on tracking, not rendering). I want to do incremental reconstruction with higher rendering quality; can you give me some advice?
[screenshot: blurry rendered image]

@HengyiWang
Owner

Hi @zjulabwjt, thanks for sharing the results, now I see what you mean. The rendering result might be affected by the following factors:

  1. Insufficient pixel sampling: we have discussed this issue before.
  2. Tracking accuracy: as far as I remember, the ATE is around 10 cm on this scene, leading to thicker structures in the reconstruction.

I have added one more function, current-frame mapping, in the latest commit. The decoder is frozen so that this local mapping does not ruin the optimisation of the decoder (see the sketch below); this might help with your concern and can also potentially address the concerns raised by @supremeccccc in #19.
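
The freezing itself is simple; a minimal sketch (the `model.decoder` / `model.embed_fn` attribute names are hypothetical, not necessarily those in the repo):

```python
import torch

# freeze the shared MLP decoder so local mapping cannot overwrite
# globally learned geometry (catastrophic forgetting)
for p in model.decoder.parameters():
    p.requires_grad_(False)

# optimise only the grid features during current-frame mapping
local_optimizer = torch.optim.Adam(model.embed_fn.parameters(), lr=1e-2)

# ... run the current-frame mapping iterations ...

# unfreeze for the next global BA step
for p in model.decoder.parameters():
    p.requires_grad_(True)
```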

You can try some of the following things:

  1. Add some iterations for current-frame mapping here:
    cur_frame_iters: 0
  2. Increase the percentage of pixels stored per keyframe here (this only increases CPU memory usage):
    n_pixels: 0.05
  3. Increase the iterations & samples for BA (it is always better to increase the batch size first and only then increase the iterations).

@zjulabwjt
Author


Thanks for your reply; I will try your advice. Thanks again for your excellent work and help.

@zjulabwjt
Author

zjulabwjt commented Jul 25, 2023

Hi @HengyiWang, I tried turning tracking off and mapping with gt_pose (without optimizing the poses), but ran into some problems.
On ScanNet scene_0059, when mapping frame 1456 the hash-encoded features become NaN, but this does not happen on ScanNet scene_0106. Besides, during the mapping iterations of later frames, the PSNR gradually decreases to nearly 20 and cannot recover. Is the map_optimizer not updating? What might cause these problems? Can you give me some advice?
I removed the pose_optimizer and modified the poses in global_BA in coslam.py, so that self.est_c2w_data is updated from self.pose_gt at every frame:

```python
current_pose = self.est_c2w_data[cur_frame_id][None, ...]
poses_all = torch.cat([poses, current_pose], dim=0).to(self.device)
```

and at every mapping iteration I step and zero the map_optimizer:

```python
loss.backward()
self.map_optimizer.step()
self.map_optimizer.zero_grad()
```

Thank you in advance for your help!

@HengyiWang
Owner

  1. The ScanNet poses are obtained by BundleFusion, and they contain NaNs. Check whether the NaN is caused by the pose (see the sketch below); as far as I remember, scene_0059 has NaN poses.
  2. Try increasing cur_frame_iters: 0 and see if it helps with the PSNR in your case.
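
A quick check along the lines of point 1 (a minimal sketch; `poses` is a placeholder for your loaded list of 4x4 camera-to-world matrices):

```python
import numpy as np

# find frames whose pose contains NaN or inf
bad = [i for i, c2w in enumerate(poses) if not np.all(np.isfinite(c2w))]
print(f"{len(bad)} invalid poses at frames: {bad}")
# typical fix: skip these frames during mapping, or reuse the last valid pose
```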

@zjulabwjt
Author

zjulabwjt commented Jul 26, 2023


Thanks for your help! I found that in scene0059_00, frames 1460-1472 have inf values, which caused this problem. I will try increasing cur_frame_iters for better rendering performance. But I also find that on ScanNet scene0106_00, with the same config (cur_frame_iters is zero), the mesh built with the tracked poses looks better than the one built with gt_pose when visualized.
What causes this result? Is the gt_pose not reliable?
The first image uses the tracked poses and the second uses gt_pose; both meshes are culled.
[image: mesh from tracked poses]
[image: mesh from gt_pose]

@HengyiWang
Owner

Yes, the poses are obtained by BundleFusion, so they are not strictly ground-truth poses. For example, take a look at the gt mesh of scene0000: you can see lots of misaligned areas, which are caused by inaccurate poses from BundleFusion.

@zjulabwjt
Author

zjulabwjt commented Jul 26, 2023


Thanks for your reply! So the gt_pose is inaccurate; does evaluating the estimated poses against it still have reference value? Why do neural SLAM papers use these not-strictly-ground-truth poses for evaluation?

@HengyiWang
Owner

Because you can only get 100% ground-truth poses on a synthetic dataset.

@HengyiWang HengyiWang added the good first issue Good for newcomers label Jul 26, 2023
@HengyiWang HengyiWang pinned this issue Jul 26, 2023
@zjulabwjt
Author

Hi @HengyiWang, I have some questions about how to evaluate runtime and parameter count.

1. For computing the number of parameters, what specific Python statements should I use? Is the snippet below correct?

```python
params = list(model.parameters())
num_params = sum(p.numel() for p in params)
# note: num_params * 4 / (1024 * 1024) is the size in MB assuming float32
# parameters, not the parameter count in millions
print("num_params: {} M".format(num_params * 4 / (1024 * 1024)))
```

2. How do I accurately time each mapping and tracking iteration? What specific Python statements should I use? (See the sketch below for what I have in mind.)

Thanks for your reply!
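
For question 2, something like this is what I have in mind (a rough sketch; torch.cuda.synchronize makes sure GPU work is included, and the `slam.tracking_step` / `slam.mapping_step` names are placeholders for the actual calls):

```python
import time
import torch

def avg_time(fn, *args, n_warmup=3, n_runs=10):
    """Average wall-clock seconds per call, including GPU work."""
    for _ in range(n_warmup):
        fn(*args)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(n_runs):
        fn(*args)
    torch.cuda.synchronize()
    return (time.time() - t0) / n_runs

# e.g. per-iteration cost (placeholder method names):
# t_track = avg_time(slam.tracking_step, frame)
# t_map   = avg_time(slam.mapping_step, batch)
```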
