Preprocessing of depth image for model-based inference #44

Closed

Trevor-wen opened this issue Apr 10, 2024 · 17 comments
@Trevor-wen

Hi,
I have prepared a .obj model of a teacup and recorded a sequence of RGB and depth images. All the required files are organized like the provided demo data. However, the inferred pose of the teacup is completely incorrect.

Maybe there is something wrong with my depth image preprocessing. I have a RealSense D435i camera and have scaled the depth images like the LineMOD dataset (each pixel value equals the distance in millimeters).

Could you specify the expected preprocessing of the depth images?

@wenbowen123
Collaborator

#25 (comment)
Can you try the suggestions there?

@savidini

savidini commented Apr 10, 2024

Hi @Trevor-wen,

maybe I can help, as I also had to solve some problems when first using FoundationPose on my own objects. 😅

1. Make sure that your CAD model uses meters as mesh units

Unlike other methods that use mm as the mesh unit, FoundationPose uses meters.

Example if the mesh units are wrong (in mm):

44_wrong_scale.mp4
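A quick way to check this is to look at the mesh's bounding-box extents: a hand-sized object should measure on the order of 0.1, not 100, in mesh units. A minimal sketch using trimesh (model.obj is just a placeholder path):

import trimesh

# Load the CAD model as a single mesh (placeholder path).
mesh = trimesh.load("model.obj", force="mesh")

# extents are the side lengths of the axis-aligned bounding box in mesh units.
print("extents:", mesh.extents)

# Values in the tens or hundreds usually mean the mesh is in mm;
# rescale to meters before using it with FoundationPose.
if mesh.extents.max() > 10:
    mesh.apply_scale(0.001)
    mesh.export("model_meters.obj")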

2. RGB and depth images must be aligned

The captured RGB and depth frames must be aligned. How this is done depends on the sensor used.

The following Python script is adapted from librealsense and can be used to record aligned and unaligned frames with a RealSense camera (I used it for the examples):

record_realsense_foundationpose.py
## License: Apache 2.0. See LICENSE file in root directory.
## Copyright(c) 2017 Intel Corporation. All Rights Reserved.

#####################################################
##              Align Depth to Color               ##
#####################################################

import pyrealsense2 as rs
import numpy as np
import cv2
import json
import time
import os

# Create a pipeline
pipeline = rs.pipeline()

# Create a config and configure the pipeline to stream
# different resolutions of color and depth streams
config = rs.config()

# Get device product line for setting a supporting resolution
pipeline_wrapper = rs.pipeline_wrapper(pipeline)
pipeline_profile = config.resolve(pipeline_wrapper)
device = pipeline_profile.get_device()
device_product_line = str(device.get_info(rs.camera_info.product_line))

found_rgb = False
for s in device.sensors:
    if s.get_info(rs.camera_info.name) == "RGB Camera":
        found_rgb = True
        break
if not found_rgb:
    print("The demo requires Depth camera with Color sensor")
    exit(0)

config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

if device_product_line == "L500":
    config.enable_stream(rs.stream.color, 960, 540, rs.format.bgr8, 30)
else:
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

# Start streaming
profile = pipeline.start(config)

# Getting the depth sensor's depth scale (see rs-align example for explanation)
depth_sensor = profile.get_device().first_depth_sensor()
depth_scale = depth_sensor.get_depth_scale()
print("Depth Scale is: ", depth_scale)

# We will be removing the background of objects more than
#  clipping_distance_in_meters meters away
clipping_distance_in_meters = 1  # 1 meter
clipping_distance = clipping_distance_in_meters / depth_scale

# Create an align object
# rs.align allows us to perform alignment of depth frames to others frames
# The "align_to" is the stream type to which we plan to align depth frames.
align_to = rs.stream.color
align = rs.align(align_to)

# Get the absolute path to the subfolder
script_dir = os.path.dirname(os.path.abspath(__file__))
subfolder_depth = os.path.join(script_dir, "out/depth")
subfolder_rgb = os.path.join(script_dir, "out/rgb")
subfolder_depth_unaligned = os.path.join(script_dir, "out/depth_unaligned")
subfolder_rgb_unaligned = os.path.join(script_dir, "out/rgb_unaligned")

# Check if the subfolder exists, and create it if it does not
if not os.path.exists(subfolder_depth):
    os.makedirs(subfolder_depth)
if not os.path.exists(subfolder_rgb):
    os.makedirs(subfolder_rgb)
if not os.path.exists(subfolder_depth_unaligned):
    os.makedirs(subfolder_depth_unaligned)
if not os.path.exists(subfolder_rgb_unaligned):
    os.makedirs(subfolder_rgb_unaligned)

# Flag indicating whether frames are currently being recorded
RecordStream = False

# Streaming loop
try:
    while True:
        # Get frameset of color and depth
        frames = pipeline.wait_for_frames()
        # frames.get_depth_frame() is a 640x480 depth image

        # Align the depth frame to color frame
        aligned_frames = align.process(frames)

        # Get aligned frames
        aligned_depth_frame = (
            aligned_frames.get_depth_frame()
        )  # aligned_depth_frame is a 640x480 depth image
        color_frame = aligned_frames.get_color_frame()

        unaligned_depth_frame = frames.get_depth_frame()
        unaligned_color_frame = frames.get_color_frame()

        # Get intrinsics from the aligned depth frame (after alignment these match the color stream)
        intrinsics = aligned_depth_frame.profile.as_video_stream_profile().intrinsics

        # Validate that both frames are valid
        if not aligned_depth_frame or not color_frame:
            continue

        depth_image = np.asanyarray(aligned_depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        # Remove background - Set pixels further than clipping_distance to grey
        grey_color = 153
        depth_image_3d = np.dstack(
            (depth_image, depth_image, depth_image)
        )  # depth image is 1 channel, color is 3 channels
        bg_removed = np.where(
            (depth_image_3d > clipping_distance) | (depth_image_3d <= 0),
            grey_color,
            color_image,
        )

        unaligned_depth_image = np.asanyarray(unaligned_depth_frame.get_data())
        unaligned_rgb_image = np.asanyarray(unaligned_color_frame.get_data())

        # Render images:
        #   depth align to color on left
        #   depth on right
        depth_colormap = cv2.applyColorMap(
            cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET
        )
        images = np.hstack((color_image, depth_colormap))

        cv2.namedWindow("Align Example", cv2.WINDOW_NORMAL)
        cv2.imshow("Align Example", images)

        key = cv2.waitKey(1)

        # Start saving the frames if space is pressed once until it is pressed again
        if key & 0xFF == ord(" "):
            if not RecordStream:
                time.sleep(0.2)
                RecordStream = True

                with open(os.path.join(script_dir, "out/cam_K.txt"), "w") as f:
                    f.write(f"{intrinsics.fx} {0.0} {intrinsics.ppx}\n")
                    f.write(f"{0.0} {intrinsics.fy} {intrinsics.ppy}\n")
                    f.write(f"{0.0} {0.0} {1.0}\n")

                print("Recording started")
            else:
                RecordStream = False
                print("Recording stopped")

        if RecordStream:
            framename = int(round(time.time() * 1000))

            # Define the path to the image file within the subfolder
            image_path_depth = os.path.join(subfolder_depth, f"{framename}.png")
            image_path_rgb = os.path.join(subfolder_rgb, f"{framename}.png")
            image_path_depth_unaligned = os.path.join(subfolder_depth_unaligned, f"{framename}.png")
            image_path_rgb_unaligned = os.path.join(subfolder_rgb_unaligned, f"{framename}.png")

            cv2.imwrite(image_path_depth, depth_image)
            cv2.imwrite(image_path_rgb, color_image)
            cv2.imwrite(image_path_depth_unaligned, unaligned_depth_image)
            cv2.imwrite(image_path_rgb_unaligned, unaligned_rgb_image)

        # Press esc or 'q' to close the image window
        if key & 0xFF == ord("q") or key == 27:

            cv2.destroyAllWindows()

            break
finally:
    pipeline.stop()

Example if the RGB and depth frames are not aligned properly:

44_unaligned.mp4
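As a side note on the depth format: the script above writes the raw 16-bit RealSense depth values, so with the typical D435 depth scale of 0.001 m per unit (printed by the script) a pixel value corresponds to millimeters, i.e. the LineMOD-style convention mentioned in the question. A minimal sketch of reading one of the saved frames back and converting it to meters (the file name is a placeholder):

import cv2
import numpy as np

# IMREAD_UNCHANGED keeps the 16-bit depth values instead of converting to 8 bit.
depth_raw = cv2.imread("out/depth/1712736000000.png", cv2.IMREAD_UNCHANGED)

# With a depth scale of 0.001 m per unit, raw values are millimeters.
depth_m = depth_raw.astype(np.float32) / 1000.0

print(depth_raw.dtype, "max depth:", depth_m.max(), "m")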

3. Wrong sensor intrinsics

Make sure that you use the correct intrinsics in the following format (for a RealSense with pyrealsense2, see the code above):

intrinsics.fx 0.0 intrinsics.ppx
0.0 intrinsics.fy intrinsics.ppy
0.0 0.0 1.0

Example of very wrong intrinsics:

44_intrinsics.mp4
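To double-check, you can load cam_K.txt back as a 3x3 matrix and verify that the values are plausible in pixels (for a 640x480 stream, the principal point should be near the image center). A minimal sketch, assuming the cam_K.txt written by the recording script above:

import numpy as np

# cam_K.txt stores the 3x3 intrinsic matrix, one row per line.
K = np.loadtxt("out/cam_K.txt")

fx, fy = K[0, 0], K[1, 1]
ppx, ppy = K[0, 2], K[1, 2]
print(f"fx={fx:.1f} fy={fy:.1f} ppx={ppx:.1f} ppy={ppy:.1f}")

# For a 640x480 stream the principal point should be roughly (320, 240).
assert 0 < ppx < 640 and 0 < ppy < 480, "principal point looks off for 640x480"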

4. Impressive pose estimation when everything is done right

Example if everything works fine:

44_correct.mp4

Edit: @wenbowen123 was quicker, but maybe it still helps. 😃

@ethanshenze

@savidini Could you tell me how to change the units of the YCB-Video object CAD models?

@savidini

@Ethan-shen-lab you can use software like MeshLab to manually scale down objects:
Import > Filters > Normals, ... > Transform: Scale, ... > Check Uniform and scale one axis > Export

You can also use a package such as trimesh to do this in Python; see the simple example below:

import trimesh
mesh = trimesh.load('path_to_your_file.obj')
mesh.apply_scale(0.001)
mesh.export('scaled_down_file.obj')

I am not sure what you want to do (run_ycb_video.py, or run_demo.py with custom data?). However, it seems there are two versions of the YCB-V dataset: the BOP version is converted to millimeters, while the original version uses meters and should, I suppose, work in FoundationPose without any changes.

@ethanshenze


@savidini Thank you very much! I have another question, though. I used the RealSense code you provided to collect image data, but after running run_demo.py I got an error like this: RuntimeError: Cuda error: 2[cudaMalloc(&m_gpuPtr, bytes);].
So I suspected there was a problem with my own data, because it runs successfully with the officially provided image data. Could you share the data you collected? I would like to verify whether my guess is correct.

@savidini

@ethanshenze Example data with the Rubik's Cube used for the last video in my comment above. (Setup: RTX4090 and Docker with CUDA 12.1 as described in #27)

@ethanshenze

ethanshenze commented Apr 11, 2024


I really appreciate your help!

@Trevor-wen
Author

@savidini Thanks a lot for the detailed instructions! I will try them as soon as possible. I really appreciate the help!

@monajalal

@ethanshenze You can use either MeshLab or Blender to scale down by 0.001 along each axis.

@ethanshenze


Copy that! I will try it, thank you very much~

@aThinkingNeal

aThinkingNeal commented Apr 16, 2024

@savidini @wenbowen123
Thanks for your instructions before!

However, after trying the steps, I am facing an issue where the bounding box is too small and does not follow the object (a banana).

Could you help me with this and tell me what I could try? Thanks in advance!

More Context:

The screenshot and files in the debug folder are attached below.

debug folder: https://drive.google.com/drive/folders/1bDfOyJq7fFKyRybSMrxkKaP9_HN6Xr6N?usp=sharing

image

I am using the banana CAD model from the official YCB-V website, linked from Bowen's previous repo https://github.com/wenbowen123/iros20-6d-pose-tracking.

image

I have checked the scene_complete.ply file in a visualizer and it seems fine to me (so I assume the depth images are OK?).

image

I have checked the model.obj file and it seems fine to me

image

@savidini

savidini commented Apr 16, 2024

@aThinkingNeal could you maybe provide your intrinsics (cam_K.txt) file? The model seems fine, but I was able to reproduce somewhat similar behavior using wrong units for the intrinsics:

44_small.mp4

@aThinkingNeal

@savidini Thanks for the advice! I have calibrated the cam_K.txt file and the bounding box size is back to normal.

However, the pose estimate seems to wander off after the first frame. I am using an apple as the object and have put the debug info in the following folder:

https://drive.google.com/drive/folders/1BxYq0pwn7ROqTe-IhSWgmc2gw_cGN8WF?usp=sharing

The behavior is shown in the images below: the first frame is fine, then the bounding box starts to drift away, even though the object is not moving:

image

image

image

@savidini

@aThinkingNeal from the images of your debug output, it looks like there are several "skips" in the images you recorded, i.e. after img_1.png and img_54.png. Is this correct? And if so, are you running the inference on all images with the default run_demo.py?

Below is a video showing the effect of "skipping" frames, resulting in sudden changes in the tracked object:

44_skip.mp4

Apparently this cannot be handled by FoundationPose's tracking (although the pose will eventually become correct again if enough frames are provided after a skip). This behavior differs from methods that do not use tracking but instead re-run the pose estimation on every frame.

If my assumption is correct, but you can't avoid these skips in your input, see #37 for running the pose estimation on every frame.
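For clarity, here is a rough sketch of the difference between the default register-once-then-track loop and re-running the registration on every frame (as discussed in #37). The estimator, camera matrix, frames, and mask are placeholders for your own objects; the register()/track_one() calls follow the pattern used in run_demo.py, but check that script for the exact arguments:

def run_sequence(est, frames, K, mask, reestimate_every_frame=False):
    """frames is an iterable of (rgb, depth) pairs; est is the pose estimator."""
    poses = []
    for i, (rgb, depth) in enumerate(frames):
        if i == 0 or reestimate_every_frame:
            # Full pose estimation from scratch (needs an object mask).
            pose = est.register(K=K, rgb=rgb, depth=depth, ob_mask=mask, iteration=5)
        else:
            # Tracking refines the previous pose and assumes small inter-frame
            # motion, which is why skipped frames cause the drift shown above.
            pose = est.track_one(rgb=rgb, depth=depth, K=K, iteration=2)
        poses.append(pose)
    return poses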

@wenbowen123
Collaborator

Your banana model seems to be at the wrong scale; it is 2 meters long.

@wenbowen123
Collaborator

@aThinkingNeal the file names need to be zero-padded so that they all have a fixed number of digits (see our example data).
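For example, a minimal renaming sketch for the timestamped PNGs written by the recording script above (the out/ paths and the 6-digit width are assumptions; match them to the example data):

import os

# Rename timestamped frames to zero-padded sequential names.
# Sorting the timestamps keeps rgb/depth pairs matched, assuming both folders
# contain the same set of frame names.
for folder in ("out/rgb", "out/depth"):
    names = sorted(f for f in os.listdir(folder) if f.endswith(".png"))
    for idx, name in enumerate(names):
        os.rename(os.path.join(folder, name),
                  os.path.join(folder, f"{idx:06d}.png"))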

@aThinkingNeal

@wenbowen123 @savidini Thanks for the help!

I think my problem is solved by:

  1. adjusting the file names
  2. calibrating the cam_K.txt file
  3. using the .obj model provided by the official YCB-V dataset

image

I am now facing another issue about how to obtain an accurate custom CAD model, but I will ask about it in a separate issue.

Thanks again for your help!
