Evaluation on Replica Dataset #18

Closed
Davidyao99 opened this issue Dec 12, 2023 · 2 comments
Comments

@Davidyao99

Great Work!

From my understanding, it seems like you are creating point clouds for each 3D segment instead of "spraying" the masked segments onto a fixed set of scene point clouds. If this is correct, may I ask how you evaluate the mAcc and F-mIoU reported in the paper for semantic segmentation on the Replica dataset? I ask because the ground-truth point clouds may not coincide with the point clouds that your concept graph creates.

Thanks!

@georgegu1997
Contributor

georgegu1997 commented Jan 4, 2024

Thanks for your interest and sorry for the delayed reply!

To compute the segmentation metrics on the Replica dataset, we did the following steps:

  1. We obtained the GT point clouds with semantic segmentation by running a SLAM system with per-point embeddings. We used the dataset provided by SemanticNeRF, where semantic segmentation masks are rendered in addition to the RGB-D images. The 2D semantic masks are treated as a one-hot embedding per pixel and fused into 3D points using GradSLAM. The resulting point cloud with per-point embeddings is treated as the ground-truth 3D semantic segmentation.
  2. For evaluation, for each point in the GT point cloud, we compute its 1-NN in the predicted point cloud. In the case of concept graph, we stack all points from all objects together and find the 1-NN among all of them. By comparing the GT class of the point and the predicted class of its 1-NN, we build up the confusion matrix. The mAcc is then computed as the class-mean recall score from the confusion matrix (a minimal sketch of this step is included below).

In this way, we always take the GT point cloud as the reference in evaluation, and all reported numbers are evaluated against the same GT point cloud.
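
For concreteness, here is a rough sketch of what the matching and scoring in step 2 (and the one-hot representation in step 1) look like. This is not the exact evaluation script in the repo; it assumes NumPy and SciPy, and the array names (`gt_points`, `gt_labels`, `pred_points`, `pred_labels`) and helper functions are illustrative placeholders:

```python
import numpy as np
from scipy.spatial import cKDTree


def mask_to_onehot(semantic_mask, num_classes):
    """Step 1's per-pixel representation: an (H, W) mask of class indices
    becomes an (H, W, num_classes) one-hot embedding, which is then fused
    into 3D points like any other per-point feature."""
    return np.eye(num_classes)[semantic_mask]


def evaluate_1nn(gt_points, gt_labels, pred_points, pred_labels, num_classes):
    """Step 2: match every GT point to its nearest predicted point and score.

    gt_points:   (N, 3) ground-truth point cloud (the fixed reference)
    gt_labels:   (N,)   GT class index per point
    pred_points: (M, 3) all points from all predicted objects, stacked
    pred_labels: (M,)   predicted class index per point
    """
    # 1-NN lookup: nearest predicted point for every GT point
    tree = cKDTree(pred_points)
    _, nn_idx = tree.query(gt_points, k=1)
    nn_pred = pred_labels[nn_idx]

    # Confusion matrix: rows = GT class, cols = predicted class of the 1-NN
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt_labels, nn_pred), 1)

    # mAcc = class-mean recall, averaged over classes present in the GT
    gt_counts = conf.sum(axis=1)
    valid = gt_counts > 0
    macc = (conf.diagonal()[valid] / gt_counts[valid]).mean()

    # Per-class IoU (tp / (tp + fp + fn)), usable for an F-mIoU-style metric
    tp = conf.diagonal().astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = np.where(valid, tp / np.maximum(tp + fp + fn, 1), np.nan)

    return macc, iou, conf
```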

@georgegu1997
Contributor

Updated the evaluation scripts in the repo and closing this issue. Let me know if you have any other questions! Thanks!
