
Pose Estimation without tracking, poor performance #97

Open
dhanushTDS opened this issue May 1, 2024 · 5 comments

Comments

@dhanushTDS

When doing pose estimation only (no tracking), what is the suggested format for the RGB, depth, and mask images? Currently I have pieced together some of your code as shown below, but I am getting poor results.

Also, to clarify: does the pose correspond to the transformation of the object with respect to the camera, or the other way around?

import cv2
import imageio
import numpy as np

def get_color(image_input):
    # imageio returns RGB (cv2.imread would return BGR); drop any alpha channel
    color = imageio.imread(image_input)[..., :3]
    # NOTE: cv2.resize takes (width, height) -- make sure this matches your camera resolution
    color = cv2.resize(color, (480, 640), interpolation=cv2.INTER_NEAREST)
    return color

def get_depth(depth_input):
    # read the depth image unchanged (e.g. 16-bit PNG) and convert millimeters to meters
    depth = cv2.imread(depth_input, -1) / 1e3
    depth = cv2.resize(depth, (480, 640), interpolation=cv2.INTER_NEAREST)
    # zero out invalid depth (too close or infinite)
    depth[(depth < 0.1) | (depth >= np.inf)] = 0
    return depth

def get_mask(mask_input):
    mask = cv2.imread(mask_input, -1)
    if len(mask.shape) == 3:
        # take the first non-empty channel of a 3-channel mask
        for c in range(3):
            if mask[..., c].sum() > 0:
                mask = mask[..., c]
                break
    mask = cv2.resize(mask, (480, 640), interpolation=cv2.INTER_NEAREST).astype(bool).astype(np.uint8)
    return mask


peg_rgb_image = get_color(rgb_image_path)
depth_image = get_depth(depth_image_path)
mask_image = get_mask(mask_image_path).astype(bool)

cv2.imshow('1', peg_rgb_image)
cv2.waitKey(0)  # Wait indefinitely until a key is pressed
cv2.destroyAllWindows()

pose = est.register(K=rros_camera_I_matrix, rgb=peg_rgb_image, depth=depth_image, ob_mask=mask_image, iteration=est_refine_iter)

center_pose = pose @ np.linalg.inv(to_origin)
vis = draw_posed_3d_box(rros_camera_I_matrix, img=peg_rgb_image, ob_in_cam=center_pose, bbox=bbox)
vis = draw_xyz_axis(peg_rgb_image, ob_in_cam=center_pose, scale=0.1, K=rros_camera_I_matrix, thickness=3, transparency=0, is_input_rgb=True)
# cv2.imshow('1', vis[...,::-1])
cv2.imshow('1', vis)
cv2.waitKey(0)  # Wait indefinitely until a key is pressed
cv2.destroyAllWindows()
@dhanushTDS
Author

@wenbowen123

@wenbowen123
Collaborator

Hi, problems like this are typically caused by how the data is set up, in particular the depth format. I'd suggest searching the existing issues, as there are a couple of related ones.
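As a quick sanity check on the depth encoding (a minimal sketch; `check_depth` is a hypothetical helper, and the synthetic array stands in for `cv2.imread(path, -1)` on your file):

```python
import numpy as np

def check_depth(depth_raw):
    """Print basic stats to verify the depth encoding before feeding it to the estimator."""
    valid = depth_raw[depth_raw > 0]
    print(depth_raw.dtype, valid.min(), valid.max())
    # uint16 with values in the hundreds/thousands usually means millimeters
    # (divide by 1e3); float values around 0.3-5.0 are already meters.
    return valid.max() > 100  # heuristic: True suggests millimeter encoding

# synthetic example standing in for cv2.imread(depth_image_path, -1)
depth_mm = np.full((480, 640), 800, np.uint16)  # 0.8 m everywhere, in mm
assert check_depth(depth_mm)  # millimeter-scale values detected
```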

The estimated poses are object-to-camera, i.e. the object's pose expressed in the camera frame.
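Since the estimated pose is a 4x4 homogeneous object-in-camera transform, the camera-in-object transform is simply its inverse. A minimal sketch (the pose values here are illustrative, not from a real estimate):

```python
import numpy as np

# illustrative object-in-camera pose: 4x4 homogeneous transform [R|t; 0 1]
ob_in_cam = np.eye(4)
ob_in_cam[:3, :3] = np.array([[0, -1, 0],
                              [1,  0, 0],
                              [0,  0, 1]], dtype=float)  # 90 deg rotation about z
ob_in_cam[:3, 3] = [0.1, 0.0, 0.5]  # object 0.5 m in front of the camera

# camera-in-object is the matrix inverse (for a rigid transform, [R^T | -R^T t])
cam_in_ob = np.linalg.inv(ob_in_cam)

# mapping the object's origin into camera coordinates recovers the translation
p_ob = np.array([0.0, 0.0, 0.0, 1.0])  # object origin, homogeneous
p_cam = ob_in_cam @ p_ob
print(p_cam[:3])  # -> [0.1 0.  0.5]
```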

@utsavrai

Hi @dhanushTDS and @wenbowen123, in order to perform pose estimation on multiple images, do we need to provide a mask for each input RGB image?

@dhanushTDS
Author

If we do what @wenbowen123 mentioned, where we are not tracking and instead treat each image as a sample from an unordered sequence, then we have to supply a mask to the estimator for each image.

In their example code, they track across the frames of a continuous video and only need the mask for the first frame.
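The two modes can be sketched as below. This assumes the `register`/`track_one` method names from the demo code; `StubEstimator` is a placeholder so the control flow runs standalone, so swap in your real `est`:

```python
import numpy as np

class StubEstimator:
    """Placeholder mirroring the estimator's interface for illustration only."""
    def register(self, K, rgb, depth, ob_mask, iteration):
        # the real estimator does global pose estimation here (needs a mask)
        return np.eye(4)
    def track_one(self, rgb, depth, K, iteration):
        # the real estimator refines from the previous frame's pose (no mask)
        return np.eye(4)

est = StubEstimator()
K = np.eye(3)  # camera intrinsics
frames = [(np.zeros((480, 640, 3)), np.zeros((480, 640))) for _ in range(3)]
masks = [np.ones((480, 640), bool) for _ in range(3)]

# Mode 1: unordered images -- call register (and supply a mask) for every image
poses_unordered = [est.register(K=K, rgb=rgb, depth=depth, ob_mask=m, iteration=5)
                   for (rgb, depth), m in zip(frames, masks)]

# Mode 2: continuous video -- register once with a mask, then track the rest
pose = est.register(K=K, rgb=frames[0][0], depth=frames[0][1],
                    ob_mask=masks[0], iteration=5)
poses_video = [pose] + [est.track_one(rgb=rgb, depth=depth, K=K, iteration=2)
                        for rgb, depth in frames[1:]]
```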

@utsavrai
Copy link

Thanks @dhanushTDS
