Preprocessing of depth image for model-based inference #44

Closed

Trevor-wen opened this issue Apr 10, 2024 · 17 comments
@Trevor-wen

Hi,
I have prepared a .obj model of a teacup and recorded a sequence of RGB and depth images. All the required files are organized like the provided demo data. However, the inferred pose of the teacup is completely incorrect.

Maybe there is something wrong with my depth image preprocessing. I have a RealSense D435i camera and have scaled the depth images like the LineMOD dataset (each pixel value equals the distance in millimeters).

Could you specify the expected preprocessing of the depth images?

@wenbowen123
Collaborator

#25 (comment)
Can you try the suggestions there?

@savidini

savidini commented Apr 10, 2024

Hi @Trevor-wen,

maybe I can help, as I also had to solve some problems when first using FoundationPose on my own objects. 😅

1. Make sure that your CAD model uses meters as mesh units

Unlike other methods that use mm as the mesh unit, FoundationPose uses meters.

Example if the mesh units are wrong (in mm):

44_wrong_scale.mp4
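A quick way to check this is to look at the mesh's bounding-box extents: a hand-sized object should measure on the order of 0.1, not 100, in mesh units. A minimal sketch using trimesh (model.obj is just a placeholder path):

import trimesh

# Load the CAD model as a single mesh (placeholder path).
mesh = trimesh.load("model.obj", force="mesh")

# extents are the side lengths of the axis-aligned bounding box in mesh units.
print("extents:", mesh.extents)

# Values in the tens or hundreds usually mean the mesh is in mm;
# rescale to meters before using it with FoundationPose.
if mesh.extents.max() > 10:
    mesh.apply_scale(0.001)
    mesh.export("model_meters.obj")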

2. RGB and depth images must be aligned

The captured RGB and depth frames must be aligned. How this is done depends on the sensor used.

The following Python script is adapted from librealsense and can be used to record aligned and unaligned frames with a RealSense camera (I used it for the examples):

record_realsense_foundationpose.py
## License: Apache 2.0. See LICENSE file in root directory.
## Copyright(c) 2017 Intel Corporation. All Rights Reserved.

#####################################################
##              Align Depth to Color               ##
#####################################################

import pyrealsense2 as rs
import numpy as np
import cv2
import json
import time
import os

# Create a pipeline
pipeline = rs.pipeline()

# Create a config and configure the pipeline to stream
# different resolutions of color and depth streams
config = rs.config()

# Get device product line for setting a supporting resolution
pipeline_wrapper = rs.pipeline_wrapper(pipeline)
pipeline_profile = config.resolve(pipeline_wrapper)
device = pipeline_profile.get_device()
device_product_line = str(device.get_info(rs.camera_info.product_line))

found_rgb = False
for s in device.sensors:
    if s.get_info(rs.camera_info.name) == "RGB Camera":
        found_rgb = True
        break
if not found_rgb:
    print("The demo requires Depth camera with Color sensor")
    exit(0)

config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

if device_product_line == "L500":
    config.enable_stream(rs.stream.color, 960, 540, rs.format.bgr8, 30)
else:
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

# Start streaming
profile = pipeline.start(config)

# Getting the depth sensor's depth scale (see rs-align example for explanation)
depth_sensor = profile.get_device().first_depth_sensor()
depth_scale = depth_sensor.get_depth_scale()
print("Depth Scale is: ", depth_scale)

# We will be removing the background of objects more than
#  clipping_distance_in_meters meters away
clipping_distance_in_meters = 1  # 1 meter
clipping_distance = clipping_distance_in_meters / depth_scale

# Create an align object
# rs.align allows us to perform alignment of depth frames to others frames
# The "align_to" is the stream type to which we plan to align depth frames.
align_to = rs.stream.color
align = rs.align(align_to)

# Get the absolute path to the subfolder
script_dir = os.path.dirname(os.path.abspath(__file__))
subfolder_depth = os.path.join(script_dir, "out/depth")
subfolder_rgb = os.path.join(script_dir, "out/rgb")
subfolder_depth_unaligned = os.path.join(script_dir, "out/depth_unaligned")
subfolder_rgb_unaligned = os.path.join(script_dir, "out/rgb_unaligned")

# Check if the subfolder exists, and create it if it does not
if not os.path.exists(subfolder_depth):
    os.makedirs(subfolder_depth)
if not os.path.exists(subfolder_rgb):
    os.makedirs(subfolder_rgb)
if not os.path.exists(subfolder_depth_unaligned):
    os.makedirs(subfolder_depth_unaligned)
if not os.path.exists(subfolder_rgb_unaligned):
    os.makedirs(subfolder_rgb_unaligned)

# Flag indicating whether frames are currently being recorded
RecordStream = False

# Streaming loop
try:
    while True:
        # Get frameset of color and depth
        frames = pipeline.wait_for_frames()
        # frames.get_depth_frame() is a 640x480 depth image

        # Align the depth frame to color frame
        aligned_frames = align.process(frames)

        # Get aligned frames
        aligned_depth_frame = (
            aligned_frames.get_depth_frame()
        )  # aligned_depth_frame is a 640x480 depth image
        color_frame = aligned_frames.get_color_frame()

        unaligned_depth_frame = frames.get_depth_frame()
        unaligned_color_frame = frames.get_color_frame()

        # Get intrinsics from the aligned depth frame (after alignment these match the color stream)
        intrinsics = aligned_depth_frame.profile.as_video_stream_profile().intrinsics

        # Validate that both frames are valid
        if not aligned_depth_frame or not color_frame:
            continue

        depth_image = np.asanyarray(aligned_depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        # Remove background - Set pixels further than clipping_distance to grey
        grey_color = 153
        depth_image_3d = np.dstack(
            (depth_image, depth_image, depth_image)
        )  # depth image is 1 channel, color is 3 channels
        bg_removed = np.where(
            (depth_image_3d > clipping_distance) | (depth_image_3d <= 0),
            grey_color,
            color_image,
        )

        unaligned_depth_image = np.asanyarray(unaligned_depth_frame.get_data())
        unaligned_rgb_image = np.asanyarray(unaligned_color_frame.get_data())

        # Render images:
        #   depth align to color on left
        #   depth on right
        depth_colormap = cv2.applyColorMap(
            cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET
        )
        images = np.hstack((color_image, depth_colormap))

        cv2.namedWindow("Align Example", cv2.WINDOW_NORMAL)
        cv2.imshow("Align Example", images)

        key = cv2.waitKey(1)

        # Start saving the frames if space is pressed once until it is pressed again
        if key & 0xFF == ord(" "):
            if not RecordStream:
                time.sleep(0.2)
                RecordStream = True

                with open(os.path.join(script_dir, "out/cam_K.txt"), "w") as f:
                    f.write(f"{intrinsics.fx} {0.0} {intrinsics.ppx}\n")
                    f.write(f"{0.0} {intrinsics.fy} {intrinsics.ppy}\n")
                    f.write(f"{0.0} {0.0} {1.0}\n")

                print("Recording started")
            else:
                RecordStream = False
                print("Recording stopped")

        if RecordStream:
            framename = int(round(time.time() * 1000))

            # Define the path to the image file within the subfolder
            image_path_depth = os.path.join(subfolder_depth, f"{framename}.png")
            image_path_rgb = os.path.join(subfolder_rgb, f"{framename}.png")
            image_path_depth_unaligned = os.path.join(subfolder_depth_unaligned, f"{framename}.png")
            image_path_rgb_unaligned = os.path.join(subfolder_rgb_unaligned, f"{framename}.png")

            cv2.imwrite(image_path_depth, depth_image)
            cv2.imwrite(image_path_rgb, color_image)
            cv2.imwrite(image_path_depth_unaligned, unaligned_depth_image)
            cv2.imwrite(image_path_rgb_unaligned, unaligned_rgb_image)

        # Press esc or 'q' to close the image window
        if key & 0xFF == ord("q") or key == 27:

            cv2.destroyAllWindows()

            break
finally:
    pipeline.stop()

Example if the RGB and depth frames are not aligned properly:

44_unaligned.mp4
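As a side note on the depth format: the script above writes the raw 16-bit RealSense depth values, so with the typical D435 depth scale of 0.001 m per unit (printed by the script) a pixel value corresponds to millimeters, i.e. the LineMOD-style convention mentioned in the question. A minimal sketch of reading one of the saved frames back and converting it to meters (the file name is a placeholder):

import cv2
import numpy as np

# IMREAD_UNCHANGED keeps the 16-bit depth values instead of converting to 8 bit.
depth_raw = cv2.imread("out/depth/1712736000000.png", cv2.IMREAD_UNCHANGED)

# With a depth scale of 0.001 m per unit, raw values are millimeters.
depth_m = depth_raw.astype(np.float32) / 1000.0

print(depth_raw.dtype, "max depth:", depth_m.max(), "m")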

3. Wrong sensor intrinsics

Make sure that you use the correct intrinsics in the following format (for a RealSense with pyrealsense2, see the code above):

intrinsics.fx 0.0 intrinsics.ppx
0.0 intrinsics.fy intrinsics.ppy
0.0 0.0 1.0

Example of very wrong intrinsics:

44_intrinsics.mp4
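To double-check, you can load cam_K.txt back as a 3x3 matrix and verify that the values are plausible in pixels (for a 640x480 stream, the principal point should be near the image center). A minimal sketch, assuming the cam_K.txt written by the recording script above:

import numpy as np

# cam_K.txt stores the 3x3 intrinsic matrix, one row per line.
K = np.loadtxt("out/cam_K.txt")

fx, fy = K[0, 0], K[1, 1]
ppx, ppy = K[0, 2], K[1, 2]
print(f"fx={fx:.1f} fy={fy:.1f} ppx={ppx:.1f} ppy={ppy:.1f}")

# For a 640x480 stream the principal point should be roughly (320, 240).
assert 0 < ppx < 640 and 0 < ppy < 480, "principal point looks off for 640x480"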

4. Impressive pose estimation when everything is done right

Example if everything works fine:

44_correct.mp4

Edit: @wenbowen123 was quicker, but maybe it still helps. 😃

@ethanshenze

@savidini Could you tell me how to change the units of the YCB-Video object CAD models?

@savidini

@Ethan-shen-lab you can use software like MeshLab to manually scale down objects:
Import > Filters > Normals, ... > Transform: Scale, ... > Check Uniform and scale one axis > Export

You can also use a package such as trimesh to do this in Python; see the simple example below:

import trimesh
mesh = trimesh.load('path_to_your_file.obj')
mesh.apply_scale(0.001)
mesh.export('scaled_down_file.obj')

I am not sure what you want to do (run_ycb_video.py, or run_demo.py with custom data?). However, it seems there are two versions of the YCB-V dataset: the BOP version is converted to millimeters, while the original version uses meters and should, I suppose, work in FoundationPose without any changes.

@ethanshenze


@savidini Thank you very much! I have another question, though. I used the RealSense code you provided to collect image data, but after running run_demo.py I got an error like this: RuntimeError: Cuda error: 2[cudaMalloc(&m_gpuPtr, bytes);].
So I suspected there was a problem with my own data, because it runs successfully with the officially provided image data. Could you share the data you collected? I would like to verify whether my guess is correct.

@savidini

@ethanshenze Example data with the Rubik's Cube used for the last video in my comment above. (Setup: RTX4090 and Docker with CUDA 12.1 as described in #27)

@ethanshenze

ethanshenze commented Apr 11, 2024


I really appreciate your help!

@Trevor-wen
Author

@savidini Thanks a lot for the detailed instructions! I will try them as soon as possible. I really appreciate the help!

@monajalal

@ethanshenze You can use either MeshLab or Blender to scale down by 0.001 along each axis.

@ethanshenze


Copy that! I will try it, thank you very much~

@aThinkingNeal

aThinkingNeal commented Apr 16, 2024

@savidini @wenbowen123
Thanks for your instructions before!

However, after trying the steps, I am facing an issue where the bounding box is too small and does not follow the object (a banana).

Could you help me with this and tell me what I could try? Thanks in advance!

More Context:

The screenshot and files in the debug folder are attached below.

debug folder: https://drive.google.com/drive/folders/1bDfOyJq7fFKyRybSMrxkKaP9_HN6Xr6N?usp=sharing

image

I am using the banana CAD model from the official YCB-V website, linked from Bowen's previous repo https://github.com/wenbowen123/iros20-6d-pose-tracking.

image

I have checked the scene_complete.ply file in a visualizer and it seems fine to me (so I assume the depth images are OK?).

image

I have checked the model.obj file and it seems fine to me

image

@savidini

savidini commented Apr 16, 2024

@aThinkingNeal could you maybe provide your intrinsics (cam_K.txt) file? The model seems fine, but I was able to reproduce somewhat similar behavior using wrong units for the intrinsics:

44_small.mp4

@aThinkingNeal

@savidini Thanks for the advice! I have calibrated the cam_K.txt file and the bounding box size is back to normal.

However, the pose estimate seems to wander off after the first frame. I am using an apple as the object and have put the debug info in the following folder:

https://drive.google.com/drive/folders/1BxYq0pwn7ROqTe-IhSWgmc2gw_cGN8WF?usp=sharing

The behavior is shown in the images below: the first frame is fine, then the bounding box starts to drift away, even though the object is not moving:

image

image

image

@savidini

@aThinkingNeal from the images of your debug output, it looks like there are several "skips" in the images you recorded, i.e. after img_1.png and img_54.png. Is this correct? And if so, are you running the inference on all images with the default run_demo.py?

Below is a video showing the effect of "skipping" frames, resulting in sudden changes in the tracked object:

44_skip.mp4

Apparently this cannot be handled by FoundationPose's tracking (although the pose will eventually become correct again if enough frames are provided after a skip). This behavior differs from methods that do not use tracking but instead re-run the pose estimation on every frame.

If my assumption is correct, but you can't avoid these skips in your input, see #37 for running the pose estimation on every frame.
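For clarity, here is a rough sketch of the difference between the default register-once-then-track loop and re-running the registration on every frame (as discussed in #37). The estimator, camera matrix, frames, and mask are placeholders for your own objects; the register()/track_one() calls follow the pattern used in run_demo.py, but check that script for the exact arguments:

def run_sequence(est, frames, K, mask, reestimate_every_frame=False):
    """frames is an iterable of (rgb, depth) pairs; est is the pose estimator."""
    poses = []
    for i, (rgb, depth) in enumerate(frames):
        if i == 0 or reestimate_every_frame:
            # Full pose estimation from scratch (needs an object mask).
            pose = est.register(K=K, rgb=rgb, depth=depth, ob_mask=mask, iteration=5)
        else:
            # Tracking refines the previous pose and assumes small inter-frame
            # motion, which is why skipped frames cause the drift shown above.
            pose = est.track_one(rgb=rgb, depth=depth, K=K, iteration=2)
        poses.append(pose)
    return poses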

@wenbowen123
Collaborator

Your banana model seems to be at the wrong scale; it is 2 meters long.

@wenbowen123
Collaborator

@aThinkingNeal the file names need to be zero-padded so that they all have a fixed number of digits (see our example data).
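For example, a minimal renaming sketch for the timestamped PNGs written by the recording script above (the out/ paths and the 6-digit width are assumptions; match them to the example data):

import os

# Rename timestamped frames to zero-padded sequential names.
# Sorting the timestamps keeps rgb/depth pairs matched, assuming both folders
# contain the same set of frame names.
for folder in ("out/rgb", "out/depth"):
    names = sorted(f for f in os.listdir(folder) if f.endswith(".png"))
    for idx, name in enumerate(names):
        os.rename(os.path.join(folder, name),
                  os.path.join(folder, f"{idx:06d}.png"))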

@aThinkingNeal

@wenbowen123 @savidini Thanks for the help!

I think my problem is solved by:

  1. adjusting the file names
  2. calibrating the cam_K.txt file
  3. using the .obj model provided by the official YCB-V dataset

image

I am now facing another issue about how to obtain an accurate custom CAD model, but I will ask about it in a separate issue.

Thanks again for your help!
