
Few questions. #6

Closed
gachiemchiep opened this issue Aug 21, 2020 · 5 comments
@gachiemchiep

Hello @DC1991
Thank you for your work.

I ran your code and it executed well.
Unfortunately, I don't understand the meaning of the output values (R and T). Would you mind giving me some explanation?

I found the part below in your paper. Is the code for this labeling process included in the current source code?
If it isn't, would you mind giving me a link to it?

However, both LINEMOD and YCB-Video datasets do not contain the label for each point of the point cloud. To train G2L-Net in a supervised fashion, we adopt an automatic way to label each point of the point cloud of [?]. As described in [?], we label each point in two steps: First, for the 3D model of an object, we transform it into the camera coordinate using the corresponding ground truth. We adopt the implementation provided by [14] for this process.
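
If I read this correctly, the transform is just the standard rigid model-to-camera mapping. Here is my understanding as a minimal sketch (the function and variable names are mine, not from the paper):

```python
import numpy as np

# Map 3D model points into the camera frame using a ground-truth pose.
# R_gt: (3, 3) rotation matrix, t_gt: (3,) translation vector.
def model_to_camera(points_model, R_gt, t_gt):
    points_model = np.asarray(points_model)        # (N, 3) model points
    R_gt = np.asarray(R_gt)
    t_gt = np.asarray(t_gt).reshape(3)
    # Row-wise version of p_cam = R_gt @ p + t_gt
    return points_model @ R_gt.T + t_gt
```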

Thank you

@DC1991
Owner

DC1991 commented Aug 21, 2020

Hi @gachiemchiep Thanks for your interest in the paper. The output value R is the coordinates of the 3D bounding box corners, which form the 24D vector in the paper, and the output value T is the translation [x, y, z]. The labeling process is not available yet, but we use the implementation in this repo (https://github.com/thodan/bop_toolkit) to transform the 3D object model into the scene.
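
As a rough sketch of how these two outputs can be read (variable names here are just for illustration, not the exact code in the repo):

```python
import numpy as np

# The 24D vector holds the 8 corners of the 3D bounding box, flattened,
# and the 3D vector is the translation [x, y, z] in the camera frame.
def decode_outputs(R_out, T_out):
    corners = np.asarray(R_out).reshape(8, 3)  # 8 corners x (x, y, z)
    t = np.asarray(T_out).reshape(3)           # translation vector
    return corners, t

# Dummy example:
corners, t = decode_outputs(np.zeros(24), [0.1, -0.05, 0.8])
```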

@gachiemchiep
Author

@DC1991
Thank you for your reply.
I understand the meaning of R now. But what is the meaning of [x, y, z] in T?
I will look into bop_toolkit to find more details about creating the training dataset.

@DC1991
Owner

DC1991 commented Aug 22, 2020

@gachiemchiep Sorry for the unclear description. [x, y, z] is the 3D coordinate of T, which is the translation vector.

@gachiemchiep
Author

@DC1991 Thank you for your explanation.
I'm trying to visualize the detection result.

Do the depth data and the RGB image use the same coordinate origin? If so, can [x, y, z] be understood as:

  1. (x, y) = coordinates of the detected point in image space
  2. z = the depth value

Sorry, I'm totally lost trying to understand T (the translation). Is T the translation between the image and the depth?

@DC1991
Owner

DC1991 commented Aug 24, 2020

@gachiemchiep We use RGB to locate the 2D bounding box of the object, and we transform the depth image into a point cloud with known camera parameters. So [x, y, z] gives the 3D coordinates of the points.
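
A minimal sketch of that depth-to-point-cloud step, assuming a standard pinhole camera model (not the exact code in the repo; fx, fy, cx, cy are the camera intrinsics):

```python
import numpy as np

# Back-project a depth image to a 3D point cloud in the camera frame.
# depth: (H, W) depth map in meters.
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth
```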
