Question about the Voxel Features #77

Closed
taylover-pei opened this issue Sep 18, 2021 · 4 comments

Comments

@taylover-pei

Congratulations on your great work!

I have read your paper and have several questions that bother me:

In your work,

  1. First, the voxel grid is generated.
  2. Second, the grid_to_lidar, lidar_to_cam, and cam_to_img transformations are used to find the correspondence between the grid coordinates and the image coordinates.
  3. Third, grid_sample is used to sample features from the frustum into the voxel grid.
  4. Finally, the voxel features are collapsed to BEV features.
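
The coordinate chain in step 2 can be sketched for a single voxel as below. This is a minimal illustration, not the repository's code: the voxel size, range minimum, camera intrinsics, the half-cell offset, and the pure axis swap used for lidar_to_cam are all assumed values chosen for the example (a real calibration also includes a rotation and translation).

```python
# Hypothetical constants for illustration only (not taken from this thread).
VOXEL_SIZE = (0.16, 0.16, 0.16)      # metres per cell along x, y, z
RANGE_MIN = (2.0, -30.08, -3.0)      # metric position of the grid origin
FX, FY, CX, CY = 700.0, 700.0, 620.0, 190.0  # made-up pinhole intrinsics


def grid_to_lidar(index):
    """Map a grid cell index to the metric LiDAR position of the cell centre."""
    return tuple(lo + (i + 0.5) * s
                 for i, s, lo in zip(index, VOXEL_SIZE, RANGE_MIN))


def lidar_to_cam(point):
    """Assumed axis swap: LiDAR (x forward, y left, z up) to
    camera (x right, y down, z forward)."""
    x, y, z = point
    return (-y, -z, x)


def cam_to_img(point):
    """Pinhole projection; returns pixel (u, v) plus the depth along the ray."""
    x, y, depth = point
    return (FX * x / depth + CX, FY * y / depth + CY, depth)


def voxel_to_pixel(index):
    """Compose the three transforms for one voxel index."""
    return cam_to_img(lidar_to_cam(grid_to_lidar(index)))


u, v, depth = voxel_to_pixel((100, 188, 12))
print(u, v, depth)
```

The resulting (u, v, depth) triple is the location in the frustum at which step 3 would sample a feature for that voxel.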

In my opinion, the BEV features already represent world coordinates. My question is: why not use the BEV features directly to generate a 'BEV grid' that represents real-world (LiDAR) coordinates, so that the grid_to_lidar step can be omitted? Am I right?

I am still confused about the 'Voxel Features'. I don't know what they are used for.

Thank you very much, looking forward to your reply!

@codyreading
Member

codyreading commented Sep 20, 2021

Hi and thanks for the interest!

So to answer your second question, voxel_features refers to the 3D voxel feature grid, which is referred to in the paper as V. We generate this as an intermediate 3D representation before collapsing it to a BEV feature grid, bev_features.

For both voxel_features and bev_features, the coordinates aren't real-world coordinates but rather what I refer to as grid coordinates, where each coordinate is a grid cell index. This means coordinates range over (0, R), where R is the number of cells along a given axis. Real-world coordinates are measured in meters, over the range shown here. You need the grid_to_lidar transformation to convert from grid indices to real-world coordinates in meters.
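
To make the distinction between grid indices and metric coordinates concrete, here is a small sketch. The range and voxel size are assumed, illustrative values (they are not quoted in this thread), and the half-cell offset that places a point at the cell center is also an assumption of this example.

```python
import math

# Assumed, illustrative values; substitute the range and voxel size of your config.
RANGE_MIN = (2.0, -30.08, -3.0)   # metres: x, y, z lower bounds
RANGE_MAX = (46.8, 30.08, 1.0)    # metres: x, y, z upper bounds
VOXEL_SIZE = (0.16, 0.16, 0.16)   # metres per grid cell


def grid_shape():
    """Number of cells R along each axis, so indices run from 0 to R - 1."""
    return tuple(round((hi - lo) / s)
                 for lo, hi, s in zip(RANGE_MIN, RANGE_MAX, VOXEL_SIZE))


def grid_to_lidar(index):
    """Grid cell index -> metric coordinates of the cell centre."""
    return tuple(lo + (i + 0.5) * s
                 for i, s, lo in zip(index, VOXEL_SIZE, RANGE_MIN))


def lidar_to_grid(point):
    """Metric coordinates -> index of the cell containing the point."""
    return tuple(math.floor((p - lo) / s)
                 for p, s, lo in zip(point, VOXEL_SIZE, RANGE_MIN))


shape = grid_shape()
last = tuple(r - 1 for r in shape)
print("cells per axis:", shape)
print("first cell centre (m):", grid_to_lidar((0, 0, 0)))
print("last cell centre (m):", grid_to_lidar(last))
```

The same index means a different metric location as soon as the range or voxel size changes, which is why the transformation cannot be dropped even when the grid is already laid out in BEV.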

@taylover-pei
Author

Thanks for your reply!

There exists another question:

Is it possible to directly transform the Frustum Features to BEV features without using the Voxel Features?

Thank you very much, looking forward to your reply!

@codyreading
Member

Yes, it would be possible if you use the same strategy as PointPillars. Essentially, you construct your voxel grid such that it only has one height layer (voxel_size_z = 4 for KITTI). This results in voxel_features being equivalent to bev_features, which you can use directly in the 3D object detection stage. An issue I foresee with this is that you only have one sampling point for each "pillar" (the center of the pillar in CaDDN), whereas the pillar feature should include information from all points within the pillar. This is why we construct the voxel grid first and then collapse it to BEV, so that it includes information from all points within the pillar.
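
As a rough sketch of the difference, the snippet below collapses a toy voxel grid to BEV by stacking the height axis into the channel axis, so each BEV cell keeps one value per height sample. This is only one simple way to collapse, shown with plain nested lists; the `[C][Z][Y][X]` layout and the toy values are assumptions of this example, and a learned layer would typically reduce the stacked channels afterwards.

```python
def collapse_to_bev(voxel_features):
    """Stack height into channels: [C][Z][Y][X] -> [C * Z][Y][X]."""
    return [z_slice for channel in voxel_features for z_slice in channel]


def make_grid(num_c, num_z, num_y, num_x):
    """Toy grid whose values encode their own (c, z, x) position."""
    return [[[[100 * c + 10 * z + x for x in range(num_x)]
              for _ in range(num_y)]
             for z in range(num_z)]
            for c in range(num_c)]


# Multi-layer grid: every pillar contributes Z samples per channel.
voxels = make_grid(num_c=2, num_z=3, num_y=1, num_x=2)
bev = collapse_to_bev(voxels)
print(len(bev), "BEV channels from 2 channels x 3 height layers")

# Single-layer grid: one sample per pillar, so the voxel grid already is BEV.
single_layer = make_grid(num_c=2, num_z=1, num_y=1, num_x=2)
bev_single = collapse_to_bev(single_layer)
print(len(bev_single), "BEV channels from 2 channels x 1 height layer")
```

With a single height layer the collapse is a no-op apart from dropping the Z axis, which matches the trade-off described above: the shortcut is simpler, but each pillar sees only one sampled location.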

@taylover-pei
Copy link
Author

Thank you very much. I have got it! It really helps me a lot.
