
How to use Realsense L515 to find the real world (x,y) coordinate of the corner of an object? #10484

Closed
kennangaibel opened this issue May 9, 2022 · 27 comments

Comments

@kennangaibel

kennangaibel commented May 9, 2022


Required Info
Camera Model: L515
Firmware Version: 01.05.08.01
Operating System & Version: Windows 11
Kernel Version (Linux Only):
Platform: PC
SDK Version:
Language: Python
Segment: Robot

Issue Description

Hi, I am very new to using RealSense and would like some guidance on this problem. I need to take a single picture, then use that image to convert a single point (the corner of an object) into a real-world Cartesian coordinate. The depth of the camera from the object will be known, and both the camera and the object will be in fixed positions. The idea is that this camera will be used so that a robot knows where an object is and can grab it properly.

  1. How do I take a single image and not a continual stream of frames?
  2. How would I go about outputting a single (x,y) real-world coordinate from a corner of this image? I saw some interesting posts mentioning aligning the depth stream and then deprojecting; I'm unsure if that is the best way to approach this.
    Thank you!
@MartyG-RealSense
Collaborator

Hi @kennangaibel. You could use a RealSense SDK instruction called save_single_frameset() to save a single frame to a file format called a rosbag. This is like a video recording of camera data that can be used instead of a live camera, except in this case the file will only contain one frame. Information about this instruction and a link to an example Python program for it can be found at #9778 (comment)

In regard to obtaining the 3D world-space coordinate of a single specific pixel coordinate on an image, this can be done with the instruction rs2_project_color_pixel_to_depth_pixel and information about this instruction is at #5603 (comment)

If you plan to use the bag file created by save_single_frameset() for the frame to obtain the pixel coordinate from then you will need to use the instruction cfg.enable_device_from_file to tell your program to retrieve the data from the file instead of a live camera. The RealSense SDK Python example program read_bag_example.py demonstrates how to do this by placing the instruction just before the pipe start line of a script.

https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/read_bag_example.py#L36-L42
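
For reference, a minimal sketch of that playback step might look like the following, assuming the bag produced by save_single_frameset() is called "single_frame.bag" (a placeholder name; substitute whatever filename the frameset was saved under):

import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Read from the recorded bag instead of a live camera; the placeholder
# "single_frame.bag" stands in for the file written by save_single_frameset().
rs.config.enable_device_from_file(config, "single_frame.bag")
profile = pipeline.start(config)

# The bag holds a single frameset, so one wait_for_frames() call retrieves it.
frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

pipeline.stop()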

@kennangaibel
Author

kennangaibel commented May 10, 2022

Hi @MartyG-RealSense I've been able to get started thanks to your really helpful guidance but have a few more questions:

  1. In cfg.enable_device_from_file, I need to input args.input or the bag filename, if I'm not mistaken. I am a bit confused as to where to get this from my current program. From my integration of save_single_frameset() I have:
    pipeline = rs.pipeline()
    config = rs.config()
    rs.config.enable_device_from_file(config, args.input)

    config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 15)
    config.enable_stream(rs.stream.color, 1920, 1080, rs.format.rgb8, 15)

    profile = pipeline.start(config)

    filt = rs.save_single_frameset()

    for x in range(100):
        pipeline.wait_for_frames()

    frame = pipeline.wait_for_frames()
    filt.process(frame)

Would the bag file name be filt? Or must I pass in the bag file_name another way?

  2. For the 3D world-space coordinate, with the example code from [Question] How to use rs2_project_color_pixel_to_depth_pixel in python to convert 1 point from color coordinate into depth coordinate #5603 that you sent, I'm assuming the color_points matrix stands for the 4 colored dots? For my specific application, I would just replace that with a single point, a pixel I find from my own OpenCV script, right? Additionally, how do I extract the x,y coordinate from depth_point_?
    depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
    depth_min = 0.11 #meter
    depth_max = 1.0 #meter

    depth_intrin = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_intrinsics()
    color_intrin = profile.get_stream(rs.stream.color).as_video_stream_profile().get_intrinsics()

    depth_to_color_extrin =  profile.get_stream(rs.stream.depth).as_video_stream_profile().get_extrinsics_to( profile.get_stream(rs.stream.color))
    color_to_depth_extrin =  profile.get_stream(rs.stream.color).as_video_stream_profile().get_extrinsics_to( profile.get_stream(rs.stream.depth))

    color_points = [
        [400.0, 150.0],
        [560.0, 150.0],
        [560.0, 260.0],
        [400.0, 260.0]
    ]
    for color_point in color_points:

       depth_point_ = rs.rs2_project_color_pixel_to_depth_pixel(
                    depth_frame.get_data(), depth_scale,
                    depth_min, depth_max,
                    depth_intrin, color_intrin, depth_to_color_extrin, color_to_depth_extrin, color_point)
       print(depth_point_)

I apologize for all of the questions, it is my first time doing something like this. Thank you!

@MartyG-RealSense
Collaborator

MartyG-RealSense commented May 10, 2022

Do not worry, there is not a limit on the number of questions that you can ask. :) In regard to those questions:

  1. You can either use args.input or instead provide the bag file name as a specific filename or a string stored in a 'path' variable.

There should not be a space after the comma that separates the arguments in the parentheses, and specific filenames should be placed within quotation marks. For example:

rs.config.enable_device_from_file(config,"test.bag")

Or

rs.config.enable_device_from_file(config,path)

  2. I would also assume (not certain) that the four white dots in the image at [Question] How to use rs2_project_color_pixel_to_depth_pixel in python to convert 1 point from color coordinate into depth coordinate #5603 (comment) represent the 4 color points of the matrix.

I researched Python scripts for rs2_project_color_pixel_to_depth_pixel carefully and concluded that the 4-dot script that you are using above is one of the best and most appropriate scripts for your project.

In a long and detailed discussion at #6239 about a RealSense user who wanted to use rs2_project_color_pixel_to_depth_pixel and OpenCV, a RealSense team member also provides a script at #6239 (comment) that includes a line for getting the depth value of a specific pixel.
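
As a small illustration of that last point, reading the depth at one pixel can be done with the SDK's get_distance() helper. A sketch, assuming depth_frame came from pipeline.wait_for_frames() and the chosen pixel lies inside the depth resolution:

# Depth (in meters) at a single pixel of the depth frame.
x, y = 320, 240
distance_m = depth_frame.get_distance(x, y)
print("Depth at ({}, {}): {:.3f} m".format(x, y, distance_m))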

@kennangaibel
Author

kennangaibel commented May 12, 2022

@MartyG-RealSense

I was able to create a function def get_bag_file(self) that can successfully use save_single_frameset to output a bag file. After I get this bag file, I will then need to input it into rs.config.enable_device_from_file(). Here is what I have so far for that:

# Obtains a bag file from a single frame taken by L515
def get_bag_file(self):
    # Obtains a bag file from a single frame taken by L515
    pipeline = rs.pipeline()
    config = rs.config()

    # lower resolution 640, 480 rather than 1280, 720 for pipeline.start(config) to work
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 1920, 1080, rs.format.rgb8, 15)

    # Align objects
    align_to = rs.stream.color
    align = rs.align(align_to)

    profile = pipeline.start(config)
    filt = rs.save_single_frameset()

    for x in range(100):
        pipeline.wait_for_frames()

    frame = pipeline.wait_for_frames()
    filt.process(frame)

    pipeline.stop()
    # Returns the single frame captured by the camera
    return filt

# Finds coordinate
def find_corner(filt):
    # Creates Pipeline
    pipeline = rs.pipeline()
    # Creates a config object
    config = rs.config()
    # Tell config that we will use a recorded device from file to be used by the pipeline through playback.
    # Allows us to use the bag file created by save_single_frameset
    rs.config.enable_device_from_file(config,filt)

    # Configure the pipeline to stream the depth stream
    # Change this parameters according to the recorded bag file resolution
    config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 15)
    config.enable_stream(rs.stream.color, 1920, 1080, rs.format.rgb8, 15)

    # Start streaming from file
    profile = pipeline.start(config)
  1. Must I pipeline.stop() after getting the bag file and create a new pipeline for the find_corner function?
  2. Additionally, is passing in filt for rs.config.enable_device_from_file(config,filt) the correct way to insert the path of the ROSBag file I just captured?
  3. Lastly, I am a bit lost as to how to find the specific pixel (the corner I'm trying to find) to extract the real-world x,y coordinate from. Will I have to use some sort of OpenCV script to get that? The OpenCV functions seem to take in images; how would that work with a bag file? Will I need to convert it?

I believe the resources you have shown me will make it really easy to get the x,y real-world coordinate once I find this pixel.

@MartyG-RealSense
Collaborator

  1. I would think that it would not be necessary to do pipeline.stop() as you are just loading a single frameset into memory and are not looping playback.

  2. If the path is stored in the filt variable then yes, that approach is fine.

  3. As mentioned earlier in this case, you can use the SDK instruction rs2_project_color_pixel_to_depth_pixel to find the real-world coordinates of a specific pixel. A diagram at the link below shows how to calculate the plus and minus coordinates of pixels for each of the 4 corners of the image for a particular resolution (the diagram uses 640x480). Basically, divide the resolution's X and Y value in half (such as 1280x720 being 640,360) and assign plus and minus to X and Y depending on which corner pixel you are using.

https://www.researchgate.net/figure/Coordinates-of-the-center-and-corners-of-a-screen-of-640x480-pixels_fig4_34711357
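
As a rough illustration of that arithmetic, here is a sketch using ordinary top-left-origin pixel coordinates rather than the centred plus/minus system shown in the diagram:

def image_reference_points(width, height):
    # Centre and four corner pixel coordinates for a given resolution.
    cx, cy = width // 2, height // 2
    return {
        "center": (cx, cy),
        "top_left": (0, 0),
        "top_right": (width - 1, 0),
        "bottom_left": (0, height - 1),
        "bottom_right": (width - 1, height - 1),
    }

print(image_reference_points(1280, 720))  # centre is (640, 360)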

@kennangaibel
Author

kennangaibel commented May 13, 2022

@MartyG-RealSense

Thank you for the help! The reason I was asking the third question is because of how I will need to find the corner. From my understanding, I thought that to implement an OpenCV "find_corner" script I would have to transform the bag file into a .png file and then work with that file. Is that not necessary? Can I simply use something like color_image = np.asanyarray(color_frame.get_data()) and then use OpenCV on the numpy array?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented May 15, 2022

A bag file is treated by the SDK like a live camera, except that its data is pre-recorded. So if you can access a particular pixel coordinate such as the corners in real time with rs2_project_color_pixel_to_depth_pixel, then you should also be able to access the corner-pixel coordinate with that instruction and do a numpy operation on it when using a bag file.

It seems that 'find corner' scripts in OpenCV do tend to use PNG instead of bag as the data source, such as the Harris corner detection in various OpenCV tutorials online including the official one in the link below.

https://docs.opencv.org/4.x/dc/d0d/tutorial_py_features_harris.html

If a RealSense depth frame is saved to a PNG, though, most of the depth information is lost, which makes PNG an inaccurate way of storing depth data.
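
For what it's worth, the PNG round-trip can be skipped entirely for the corner detection itself, since OpenCV works directly on the color frame's numpy array. A sketch, assuming color_frame came from the pipeline and the color stream uses rs.format.rgb8:

import cv2
import numpy as np

color_image = np.asanyarray(color_frame.get_data())
gray = cv2.cvtColor(color_image, cv2.COLOR_RGB2GRAY)

# Harris response map; threshold it to get candidate corner pixels.
dst = cv2.cornerHarris(np.float32(gray), blockSize=5, ksize=3, k=0.04)
ys, xs = np.where(dst > 0.1 * dst.max())
corner_pixels = list(zip(xs, ys))  # (x, y) coordinates of corner candidates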

@kennangaibel
Author

kennangaibel commented May 20, 2022

@MartyG-RealSense
I want to filter out the pixels from the OpenCV corner detection algorithm based on a known depth. I was able to pull this off; however, there are some inaccuracies because some of the depth data is recorded as 0. The black dots on the edges all report a depth of 0.0 meters even though they are not actually at zero distance. Is there a way to fix this?
image

For context, when I got this bag file, I did

config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
# config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 15)
config.enable_stream(rs.stream.color, 1920, 1080, rs.format.rgb8, 15)

Additionally, I had to make the depth stream 640x480 because I was getting an error when it was higher (the commented-out config depth stream). Is there a way to get the depth stream to match 1920x1080? I am unsure why the lower resolution works without error while the higher resolution does not.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented May 21, 2022

The L515 does not support a depth resolution of 1920x1080. It can use 320x240, 640x480 or 1024x768.

In regard to the black edges, you could try applying a post-processing filter on your L515 called a Temporal Filter, which has a 'persistency index' function for replacing missing pixels.

https://dev.intelrealsense.com/docs/post-processing-filters#section-temporal-filter

#2851 (comment) provides an example of Python code for implementing the filter.

image

#10078 discusses setting the temporal filter's persistency value with rs.option.holes_fill

image
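
A minimal sketch of applying the filter, assuming depth_frame was retrieved from the pipeline and pyrealsense2/numpy are imported as in the scripts above; the persistency index is exposed through rs.option.holes_fill and takes values from 0 to 8:

temporal = rs.temporal_filter()
# Persistency index: higher values fill missing pixels more aggressively
# from the history of previous frames.
temporal.set_option(rs.option.holes_fill, 3)

filtered_depth = temporal.process(depth_frame)
depth_image = np.asanyarray(filtered_depth.get_data())

Note that the temporal filter builds its history across successive frames, so it has the most effect when several frames are processed in sequence rather than a single frameset.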

@kennangaibel
Author

kennangaibel commented May 24, 2022

@MartyG-RealSense
In an earlier comment, you mentioned that I can provide the bag file name either as a specific filename or as a string stored in a 'path' variable:
Method 1. rs.config.enable_device_from_file(config,"test.bag")

Or

Method 2. rs.config.enable_device_from_file(config,path)
If I save a recording with path = rs.save_single_frameset(), get the filename that it outputs to my current directory, and pass in that filename (Method 1), it works fine. However, in my application I would like to completely automate the process, meaning I would like the RealSense to take a picture by itself and then immediately use the path variable (Method 2). I have tried this with the following code:

import pyrealsense2 as rs
import numpy as np
import cv2
from PIL import Image
MIN_DEPTH = 0.4
MAX_DEPTH = 1.0

# Obtains a bag file from a single frame taken by L515
def get_bag_file():
    # Obtains a bag file from a single frame taken by L515
    # try:
    pipeline = rs.pipeline()
    config = rs.config()
    # lower resolution for pipeline.start(config) to work
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 1920, 1080, rs.format.rgb8, 15)
    # Align objects
    align_to = rs.stream.color
    align = rs.align(align_to)

    profile = pipeline.start(config)
    # Bag file representing single frame taken by camera
    path = rs.save_single_frameset()

    for x in range(100):
        pipeline.wait_for_frames()

    frame = pipeline.wait_for_frames()
    path.process(frame)

    # !: Must I use pipeline.stop?
    # pipeline.stop()

    # Returns the single frame captured by the camera
    return path

# From get_corners.py
# Gets an array of pixels that represent corners of an image
def get_corner_pixels(path):
    # Creates Pipeline
    pipeline = rs.pipeline()
    # Creates a config object
    config = rs.config()
    # Tell config that we will use a recorded device from file to be used by the pipeline through playback.
    # Allows us to use the bag file created by save_single_frameset
    rs.config.enable_device_from_file(config,path)

    profile = pipeline.start(config)

    # Converts bag file to numpy image that opencv can use
    # Wait for a coherent pair of frames: depth and color
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()

    # Convert images to numpy arrays
    depth_image = np.asanyarray(depth_frame.get_data())
    color_image = np.asanyarray(color_frame.get_data())

path = get_bag_file()
coordinate = get_corner_pixels(path)

(omitted a lot of def get_corner_pixels for easier reading)
But I get the error:
image
The compiler is having a problem with rs.config.enable_device_from_file(config,path) in get_corner_pixels. I know that def get_bag_file works independently.

Am I transmitting the path the wrong way?

@kennangaibel
Author

kennangaibel commented May 24, 2022

@MartyG-RealSense
Additionally, I finally got my program to the point where it can use the rs.rs2_project_color_pixel_to_depth_pixel in practice.
Here is my entire program:

import pyrealsense2 as rs
import numpy as np
import cv2
from PIL import Image
MIN_DEPTH = 0.4
MAX_DEPTH = 1.0

# Obtains a bag file from a single frame taken by L515
def get_bag_file():
    # Obtains a bag file from a single frame taken by L515
    # try:
    pipeline = rs.pipeline()
    config = rs.config()
    # lower resolution for pipeline.start(config) to work
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 1920, 1080, rs.format.rgb8, 15)
    # Align objects
    align_to = rs.stream.color
    align = rs.align(align_to)

    profile = pipeline.start(config)
    # Bag file representing single frame taken by camera
    path = rs.save_single_frameset()

    for x in range(100):
        pipeline.wait_for_frames()

    frame = pipeline.wait_for_frames()
    path.process(frame)

    # !: Must I use pipeline.stop?
    # pipeline.stop()

    # Returns the single frame captured by the camera
    return path

# From get_corners.py
# Gets an array of pixels that represent corners of an image
def get_corner_pixels(path):
    # Creates Pipeline
    pipeline = rs.pipeline()
    # Creates a config object
    config = rs.config()
    # Tell config that we will use a recorded device from file to be used by the pipeline through playback.
    # Allows us to use the bag file created by save_single_frameset
    rs.config.enable_device_from_file(config,'./test.bag')
    # rs.config.enable_device_from_file(config,path,True)

    profile = pipeline.start(config)

    # Converts bag file to numpy image that opencv can use
    # Wait for a coherent pair of frames: depth and color
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()

    # Convert images to numpy arrays
    depth_image = np.asanyarray(depth_frame.get_data())
    color_image = np.asanyarray(color_frame.get_data())

    # Gets numpy array color_image and converts it to a png that opencv can use
    im = Image.fromarray(color_image)
    # What to save the image as
    im.save('stringer_image.png')

    # Loads image and runs through opencv corner finding algorithm
    img = cv2.imread('stringer_image.png')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = np.float32(gray)
    dst = cv2.cornerHarris(gray, 5, 3, 0.04)
    ret, dst = cv2.threshold(dst, 0.1 * dst.max(), 255, 0)
    dst = np.uint8(dst)
    ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
    corners = cv2.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)
    # Filter out corners based on DEPTH, only want corner within certain depth range
    # List to store coordinates of corners we actually want
    filtered_corners = []
    for i in range(1, len(corners)):
        # Gets the pixel coordinates of the corner
        x_pixel = int(corners[i][0] / 3)
        y_pixel = int(corners[i][1] / 3)
        # Filters based on desired depth range
        # if (depth_image[y_pixel][x_pixel] > MIN_DEPTH and
        #         depth_image[y_pixel][x_pixel] < MAX_DEPTH):
        print("Corner detected:")
        print(corners[i])
        if (depth_image[y_pixel][x_pixel] < 1):
            filtered_corners.append(corners[i])
            print("filtered corners")
            print(corners[i])
    img[dst > 0.1 * dst.max()] = [0, 0, 255]

    # Deproject pixels from filtered_corners
    depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
    depth_min = MIN_DEPTH  # meter
    depth_max = MAX_DEPTH  # meter

    depth_intrin = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_intrinsics()
    color_intrin = profile.get_stream(rs.stream.color).as_video_stream_profile().get_intrinsics()

    depth_to_color_extrin = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_extrinsics_to(
        profile.get_stream(rs.stream.color))
    color_to_depth_extrin = profile.get_stream(rs.stream.color).as_video_stream_profile().get_extrinsics_to(
        profile.get_stream(rs.stream.depth))
    print("filtered_corners: ")
    print(filtered_corners)
    # List to store the real world coordinates of filtered_corners
    coordinate = []
    for color_point in filtered_corners:
        depth_point_ = rs.rs2_project_color_pixel_to_depth_pixel(
            depth_frame.get_data(), depth_scale,
            depth_min, depth_max,
            depth_intrin, color_intrin, depth_to_color_extrin, color_to_depth_extrin, color_point)
        print("depth value ")
        print(depth_point_)
        coordinate.append(depth_point_)
    # Shows image with corners until any key press
    # (Delete later for program to run without interruption)
    cv2.imshow('image', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # Returns the array of corners from our program
    return coordinate

# Converts 2D pixel coordinates to 3D world coordinates
path = get_bag_file()
coordinate = get_corner_pixels(path)

When I print depth_point_ from

depth_point_ = rs.rs2_project_color_pixel_to_depth_pixel(
            depth_frame.get_data(), depth_scale,
            depth_min, depth_max,
            depth_intrin, color_intrin, depth_to_color_extrin, color_to_depth_extrin, color_point)

I get [230.83444213867188, 282.647705078125].

  1. Are these the real-world x and y coordinates? If so, is it basing the z value (depth) on the LiDAR or on the values of depth_min and depth_max? I am confused because I thought I was going to get back (x, y, z), not just two values.
  2. What do depth_min and depth_max do for the rs.rs2_project_color_pixel_to_depth_pixel function?
  3. How do I verify that this depth_point_ is a correct value?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented May 25, 2022

  1. It should produce real-world coordinates, yes. I do not know the rs2_project_color_pixel_to_depth_pixel well enough to explain your output though, unfortunately. Looking at the script at Getting distance and 3D coordinate from specific pixel #9945 (comment) and its output in the image below may help you to work it out. I do apologize.

image

  2. The purpose of depth_min and depth_max is not documented, but my interpretation is that they define a minimum and maximum range within which depth values are considered valid, excluding depth values outside of that range (like a depth threshold filter).

The SDK file rsutil.h describes it in terms of finding a projected pixel with unknown depth.

https://github.com/IntelRealSense/librealsense/blob/master/include/librealsense2/rsutil.h#L31-L34

The code in your script that checks for valid depth values for the corner pixels seems to back up this theory.

# Filters based on desired depth range
# if (depth_image[y_pixel][x_pixel] > MIN_DEPTH and
#         depth_image[y_pixel][x_pixel] < MAX_DEPTH):
print("Corner detected:")
  3. If the projected depth values are extracted from a color PNG image then there is a good chance that they will not be as accurate as those from a live camera or a file format suited to storing depth values, such as .raw or .csv.

As you have already defined a numpy depth array, a quick way to check the depth value for accuracy against the value obtained from the PNG may be to save the array's depth data to a Python .npy file as an array of scaled matrices, as demonstrated in a script at #4934 (comment)
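
A possible sketch of that check, assuming depth_image is the uint16 array taken from the depth frame and depth_scale was read from the depth sensor:

import numpy as np

# Save the depth frame as distances in meters so the values can be
# inspected later without the losses of a color PNG.
depth_m = depth_image.astype(np.float32) * depth_scale
np.save("depth_frame_meters.npy", depth_m)

# Reload and spot-check a pixel (numpy indexing is [row, column], i.e. [y, x]).
restored = np.load("depth_frame_meters.npy")
print(restored[240, 320])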

The simplest way to check actual real-world distance against calculated distance is to use a ruler or tape measure between the camera and the observed object / surface.

@kennangaibel
Author

kennangaibel commented May 27, 2022

Hi @MartyG-RealSense
I think I may have figured the problem out. It seems that rs.rs2_project_color_pixel_to_depth_pixel returns the pixel (x then y value) of the original depth frame, which I don't think I can really use in my application. I changed it to use
rs.rs2_deproject_pixel_to_point, which returns three values: the x, y, and z of what I believe is the 3D real-world coordinate, since the z value is an accurate depth value. Doing this seems to give me the correct 3D values in depth_point_.

    for color_point in filtered_corners:
        depth = depth_frame.get_distance(int(color_point[0]), int(color_point[1]))
        depth_point_ = rs.rs2_deproject_pixel_to_point(depth_intrin, [color_point[0], color_point[1]], depth)

Additionally, I have a few questions:
When I align the depth and RGB frames

align_to = rs.stream.color
#  Aligns Depth->Color
align = rs.align(align_to)

Based off

config.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.rgb8, 15)

Does that mean the depth numpy array depth_image is then mapped onto a 1280x720 pixel format rather than 848x480? Here is the code together:

    align_to = rs.stream.color
#  Aligns Depth->Color
    align = rs.align(align_to)

    # Align the depth frame to color frame
    aligned_frames = align.process(frames)

    # Get aligned frames
    depth_frame = aligned_frames.get_depth_frame()  # aligned_depth_frame is a 640x480 depth image
    color_frame = aligned_frames.get_color_frame()

    spat_filter = rs.spatial_filter()  # Spatial - edge-preserving spatial smoothing
    temp_filter = rs.temporal_filter()  # Temporal - reduces temporal noise
    spat_filter.set_option(rs.option.filter_magnitude, 4)
    spat_filter.set_option(rs.option.holes_fill, 3)
    frame = spat_filter.process(frames)
    frame = temp_filter.process(frame)

    # Convert images to numpy arrays
    depth_image = np.asanyarray(depth_frame.get_data())
    color_image = np.asanyarray(color_frame.get_data())
  1. Essentially, I am wondering, when I get a pixel from depth_image or color_image, will the range of values then be 1280x720 for both numpy arrays because of the align?
  2. Now, I want to get the depth values of the RGB pixels that my OpenCV algorithm chooses. When I directly get depth values from the numpy array with something like depth_image[y_pixel, x_pixel], I get an error when I input the x value before the y value rather than y before x. Is there a reason for this? Are the parameters for getting pixel values (y, x) rather than (x, y) for certain functions in RealSense? I have found using the .get_distance function to be more accurate and less strange.

@MartyG-RealSense
Collaborator

When depth is aligned to color with align_to, the RealSense SDK's 'align processing block' mechanism will resize the depth's field of view (FOV) size to match the color FOV size. The processing block will also automatically adjust for differences in resolution between the depth and color streams.

If the color FOV is smaller than the depth FOV then this may result in the outer edges of the depth image being cut off in the aligned image.

If color-to-depth alignment is performed instead, by using align_to = rs.stream.depth, then if the color FOV is smaller than the depth FOV the image will stretch out to fill the screen instead of having its edges cut off.

  1. The official SDK documentation about alignment in the link below states that aligned frames will share the same resolution. My expectation would be that the shared resolution will be that of the stream being aligned to (the color stream's 1280x720 in this particular case).

https://github.com/IntelRealSense/librealsense/wiki/Projection-in-RealSense-SDK-2.0#frame-alignment

  2. The link below provides advice about why y,x is used instead of x,y to access a pixel in OpenCV. Apparently it is because of OpenCV's matrix notation.

https://stackoverflow.com/questions/54549322/why-should-i-use-y-x-instead-of-x-y-to-access-a-pixel-in-opencv
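
To make both points concrete, a short sketch, assuming frames came from pipeline.wait_for_frames() with depth at 848x480 and color at 1280x720 as in your configuration:

import numpy as np
import pyrealsense2 as rs

# Align depth to color: both frames then share the color resolution.
align = rs.align(rs.stream.color)
aligned = align.process(frames)
depth_frame = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

depth_image = np.asanyarray(depth_frame.get_data())
color_image = np.asanyarray(color_frame.get_data())
print(depth_image.shape)   # (720, 1280) after alignment
print(color_image.shape)   # (720, 1280, 3)

# numpy arrays are indexed [row, column], i.e. [y, x];
# get_distance() takes (x, y) and already applies the depth scale.
x, y = 640, 360
raw_units = depth_image[y, x]
meters = depth_frame.get_distance(x, y)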

@kennangaibel
Author

kennangaibel commented May 31, 2022

Hi @MartyG-RealSense,
I am just about finished with this issue. I just have two more questions about some weird quirks with this program.

  1. Earlier, we went over save_single_frameset as well as using the path saved from it. I found that this wasn't necessary and I could simply run my code, since the camera will be placed statically. Running this code in any Python file works fine; however, when I put this code into main.py, the RealSense seems to completely ignore my code and just shows the live-stream frames of "stabilized" and normal video. Using the same exact code in any Python file not named main.py, it works as normal. Do you know why this is? If so, am I able to fix it so that main.py runs my code?

  2. Sometimes when I test the program, the image comes out like
    stringer_image
    vs. normal
    image

The image is strangely dark and green tinted. This behavior seems to be random. I am wondering why this is the case and how to fix it?

Here is my entire program (seems to work pretty well besides these quirks)

import pyrealsense2 as rs
import numpy as np
import cv2
from PIL import Image

MIN_DEPTH = 0.4
MAX_DEPTH = 0.7

# Gets an array of pixels that represent corners of an image
def get_corner_pixels():
    # Creates Pipeline
    pipeline = rs.pipeline()
    # Creates a config object
    config = rs.config()
    # Set resolutions for color and depth frames
    config.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 1280, 720, rs.format.rgb8, 15)

    # Replays an already captured bag file rather than using a present stream
    # Comment out rs.config.enable_device_from_file if you want to take a picture
    # in the present and automate the process
    # rs.config.enable_device_from_file(config,'RealSense Frameset 117.bag')

    profile = pipeline.start(config)

    # Wait for a coherent pair of frames: depth and color
    frames = pipeline.wait_for_frames()
    # Creates an align object
    # The "align_to" is the stream type to which we plan to align depth frames.
    align_to = rs.stream.color
    #  Aligns depth frame to color frame
    align = rs.align(align_to)
    aligned_frames = align.process(frames)

    # Get aligned frames
    depth_frame = aligned_frames.get_depth_frame()  # aligned_depth_frame is a 640x480 depth image
    color_frame = aligned_frames.get_color_frame()

    spat_filter = rs.spatial_filter()  # Spatial - edge-preserving spatial smoothing
    # Uses previous frames to decide whether missing values should be filled
    # with previous data
    temp_filter = rs.temporal_filter()  # Temporal - reduces temporal noise
    # Sets up custom parameters for spatial filter
    # Can play around with parameters to see what yields
    # best result
    spat_filter.set_option(rs.option.filter_magnitude, 4)
    spat_filter.set_option(rs.option.holes_fill, 3)
    frame = spat_filter.process(frames)
    frame = temp_filter.process(frame)

    # Convert images to numpy arrays
    depth_image = np.asanyarray(depth_frame.get_data())
    color_image = np.asanyarray(color_frame.get_data())
    # Runs opencv algorithm
    corners = find_corners(color_image)

    # Filter out corners based on DEPTH, only want corner within certain depth range
    # List to store coordinates of corners we actually want
    filtered_corners = []
    for i in range(1, len(corners)):
        # Gets the pixel coordinates of the corner
        x_pixel = int(corners[i][0])
        y_pixel = int(corners[i][1])
        print(f"X pixel: {x_pixel}")
        print(f"Y pixel: {y_pixel}")

        # Filters based on desired depth range
        print("Corner detected:")
        print(corners[i])
        if ((MIN_DEPTH < depth_frame.get_distance(int(corners[i][0]), int(corners[i][1])) < MAX_DEPTH)):
            filtered_corners.append(corners[i])
            print("filtered corners")
            print(corners[i])
            print("depth")
            print(depth_frame.get_distance(int(corners[i][0]), int(corners[i][1])))

    # Deproject pixels from filtered_corners
    depth_intrin = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_intrinsics()

    print("filtered_corners: ")
    print(filtered_corners)
    # List to store the real world coordinates of filtered_corners
    coordinates = []
    # Converts all pixels into (x,y,z) coordinates
    for color_pixel in filtered_corners:
        # get the 3D coordinate
        # camera_coordinate = rs.rs2_deproject_pixel_to_point(depth_intrs, [x, y], dis)
        print(color_pixel)

        # Gets depth value of a pixel
        depth = depth_frame.get_distance(int(color_pixel[0]), int(color_pixel[1]))
        # Gets the 3D coordinate of that pixel
        depth_point_ = rs.rs2_deproject_pixel_to_point(depth_intrin, [color_pixel[0], color_pixel[1]], depth)

        print("depth value ")
        print(depth_point_[0])
        print(depth_point_[1])
        print(depth_point_[2])

        coordinates.append(depth_point_)
    # Shows image with corners until any key press
    # (Delete later for program to run without interruption)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    # Returns the array of corner coordinates from our program
    return coordinates

def find_corners(color_image):
    # Gets numpy array color_image and converts it to a png that opencv can use
    im = Image.fromarray(color_image)

    # !: In final implementation, delete png after use
    im.save('stringer_image.png')
    # Loads image and runs through opencv corner finding algorithm
    img = cv2.imread('stringer_image.png')
    # Gets rid of "salt and pepper" noise
    img = cv2.bilateralFilter(img, 11, 21, 7)

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = np.float32(gray)
    dst = cv2.cornerHarris(gray, 5, 3, 0.04)
    ret, dst = cv2.threshold(dst, 0.1 * dst.max(), 255, 0)
    dst = np.uint8(dst)
    ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
    corners = cv2.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)
    img[dst > 0.1 * dst.max()] = [0, 0, 255]
    # Shows the image with the algorithm
    cv2.imshow('image', img)
    return corners

# Array of 3D real-world coordinates from corners
coordinate = get_corner_pixels()
exit()

@MartyG-RealSense
Collaborator

  1. In the main.py version of your script, please check whether # rs.config.enable_device_from_file(config,'RealSense Frameset 117.bag') has the # symbol removed from the start of the line. Otherwise, if # is present then the script would ignore the bag file loading instruction and instead use a live camera as the data source if one is attached to the computer.

  2. If the darkening and green tinting of the image only happens sometimes then I assume that the problem is not recorded in the bag file but is occurring in real-time after the bag frames have been read.

If the image is darkening then that makes me think that the RGB exposure may be dropping to a low value. If it was an RGB white-balance problem then the image would not darken like that. If the data is from a bag file and not a live camera then it is difficult to see what might cause such a problem, as there are no physical sensors involved that could be affected.

@kennangaibel
Author

Hi @MartyG-RealSense,

  1. I am purposely commenting out the rs.config.enable_device_from_file(config,'RealSense Frameset 117.bag') so that I can run the program automatically. On programs not named main.py this surprisingly works, seemingly saving a single frame, outputting the png of it with corners marked and then ending perfectly. This is to fix an earlier issue, where I tried to use rs.config.enable_device_from_file(config,path), but got the error
    image

So this was the only way I could figure out to automate it. rs.config.enable_device_from_file(config,"test.bag") only works when I manually get a bagfile from the camera, save it, and then type in the bag file name in this function, which is not ideal since I want the whole process automated.

  2. Yes, the problem seems to be in real time. The data is from a live camera, as I have commented out # rs.config.enable_device_from_file(config,'RealSense Frameset 117.bag'). Is there a way to fix the RGB exposure dropping to a low value?

@MartyG-RealSense
Collaborator

If the folder that the script looks for the bag in will only ever contain one bag file, a way to automate the process may be to have the file path be whichever file in the folder has the .bag extension, without needing to know the filename. Code for doing so can be found at the link below.

https://stackoverflow.com/questions/3964681/find-all-files-in-a-directory-with-extension-txt-in-python
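
A minimal sketch of that idea, assuming the working folder contains exactly one bag file:

import glob
import pyrealsense2 as rs

bag_files = glob.glob("*.bag")
if not bag_files:
    raise FileNotFoundError("No .bag file found in the current folder")

config = rs.config()
# Hand the discovered path to the playback config instead of a hard-coded name.
rs.config.enable_device_from_file(config, bag_files[0])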

As it is the RGB sensor being affected rather than the depth sensor: if the apparent exposure drop is caused by a flash of strong sunlight, there is likely not much that can be done to reduce the likelihood of the sensor being saturated, other than purchasing a physical filter product and applying it over the lens of the RGB sensor on the outside of the camera to add protection from the light. The L515 is designed for use indoors in controlled lighting conditions and is not suited for outdoor use, or for indoor areas with natural light.

@MartyG-RealSense
Collaborator

Hi @kennangaibel Do you require further assistance with this case, please? Thanks!

@kennangaibel
Author

Hi @MartyG-RealSense
We can close the case. Thanks so much for all the help, I couldn't have done it without you. Feel free to reference my repository if anyone you come across needs to do something similar.

@MartyG-RealSense
Collaborator

You are very welcome, @kennangaibel - thanks very much for the update! :)

@Nel1212
Copy link

Nel1212 commented Oct 7, 2022

Hello
How would I go about outputting a single (x,y) real-world coordinate from a corner of this image?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Oct 7, 2022

Hi @Nel1212. As mentioned above, the real-world XYZ for a specific XY pixel coordinate can be obtained with the RealSense SDK instruction rs2_project_color_pixel_to_depth_pixel, which converts a color pixel to a depth pixel.

The center pixel for any RGB resolution can be found by dividing it by 2 - for example, the center pixel of 1280x720 would be the coordinate 640x360.

So the four corner coordinates of resolution 1280x720 on an RGB image will be:

Top left: 0,0
Bottom left: 0,720
Top right: 1280,0
Bottom right: 1280, 720

image

On a RealSense 400 Series depth image, the displayed corner coordinates are divided by 2.

Top left: 0,0
Bottom left: 0,360
Top right: 640,0
Bottom right: 640, 360

image

An L515 camera does not divide the depth coordinates though. So for 640x480 depth:

Top left: 0,0
Bottom left: 0,480
Top right: 640,0
Bottom right: 640, 480

@SammaniSandumini

SammaniSandumini commented Feb 12, 2024

Hi @MartyG-RealSense. I'm working on a project using a RealSense D435 camera and I've encountered some issues.
I used the following code to record RGBD data from an Intel RealSense D435 camera to an mp4 file.

import pyrealsense2 as rs
import cv2
import numpy as np
from datetime import datetime

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

pipeline.start(config)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
depth_filename = f'depth_video_{timestamp}.mp4'
color_filename = f'color_video_{timestamp}.mp4'

depth_fourcc = cv2.VideoWriter_fourcc(*'mp4v')
depth_out = cv2.VideoWriter(depth_filename, depth_fourcc, 30, (640, 480), isColor=False)

color_fourcc = cv2.VideoWriter_fourcc(*'mp4v')
color_out = cv2.VideoWriter(color_filename, color_fourcc, 30, (640, 480))

try:
    while True:
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()

        if not depth_frame or not color_frame:
            continue

        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        depth_out.write(depth_image)
        color_out.write(color_image)

        cv2.imshow('Color Image', color_image)
        cv2.imshow('Depth Image', depth_image)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

finally:
    pipeline.stop()
    depth_out.release()
    color_out.release()
    cv2.destroyAllWindows()
  1. Is this approach correct for saving depth information for later use? Can we save this depth information in mp4 format?
  2. Is there a way to convert x,y pixel coordinates in the color video into real-world coordinates without using the depth video?
  3. Can we use the rs2_deproject_pixel_to_point function for this task without the depth information?

@MartyG-RealSense
Collaborator

Using cv2.VideoWriter fourcc code to save RealSense data to .mp4 format is an appropriate method of export to that format. However, most of the depth information will be lost.

To preserve the depth values, the data should be exported to a format that preserves it. An example of such a format in Python is .npy and #4934 (comment) has a script for exporting depth to .npy.
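
One possible sketch of that alternative, assuming the same pipeline setup as the script above and a short fixed-length capture: accumulate the raw 16-bit depth arrays in memory and save them as one .npy stack instead of writing them into the .mp4.

import numpy as np

depth_stack = []
for _ in range(90):  # roughly 3 seconds at 30 fps
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    if not depth_frame:
        continue
    # Copy so the array survives after the SDK recycles the frame buffer.
    depth_stack.append(np.asanyarray(depth_frame.get_data()).copy())

np.save("depth_frames.npy", np.stack(depth_stack))  # shape (N, 480, 640), dtype uint16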

You can obtain Z-depth for a color pixel without depth data by converting it to a depth pixel with rs2_project_color_pixel_to_depth_pixel. See #5603 (comment) for a Python example of the instruction's use.

rs2_deproject_pixel_to_point requires use of depth information.

@SammaniSandumini

@MartyG-RealSense Can this rs2_project_color_pixel_to_depth_pixel method be applied to mp4 files recorded using an Intel RealSense D435 (the mp4 file only contains color stream data), or is it only applicable in real time? Is there a method to convert these pixel coordinates in mp4 files into real-world coordinates?

@MartyG-RealSense
Collaborator

Hi @SammaniSandumini The RealSense SDK cannot directly read MP4 files. An indirect approach would be to read the frames using OpenCV and then send them to the SDK's software-device virtual camera interface, as discussed at #2249

More information about software-device can be found at the link below.

https://github.com/IntelRealSense/librealsense/tree/master/examples/software-device
