Will the application of depth camera lead to false detection of yolov5 algorithm? #9134

Closed
yangminwei95 opened this issue Jun 1, 2021 · 8 comments


@yangminwei95


Required Info
- Camera Model: D435i
- Firmware Version: opencv
- Operating System & Version: Win (10)
- Kernel Version (Linux Only):
- Platform: PC
- SDK Version: pyrealsense2
- Language: Python
- Segment: Robot

Issue Description

I want to integrate the depth camera into the YOLOv5 algorithm so that the output contains not only the confidence but also the three-dimensional coordinates of the center point of the detection box. However, the results are not good: the modified YOLOv5 keeps detecting the background as an object. It is not as accurate as the original color-camera detection through OpenCV. Is this because of the alignment between the depth map and the color map?
In addition, the following problem sometimes occurs while the program is running:

```
depth = depth_frame.as_depth_frame().get_distance(pixel_x, pixel_y)
RuntimeError: out of range value for argument "y"
```

I would like to know why this happens. My code is attached below.
```python
import pyrealsense2 as rs
import argparse
import time
from pathlib import Path
import numpy as np
import cv2
import torch
import torch.backends.cudnn as cudnn
from collections import defaultdict
from numpy import random
from realsense_device_manager import *
from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import (check_img_size, check_requirements, check_imshow, non_max_suppression, apply_classifier,
                           scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path)
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized
import calibration_kabsch as ck
def detect(save_img=False):
weights, view_img, save_txt, imgsz = opt.weights, opt.view_img, opt.save_txt, opt.img_size
# webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
# ('rtsp://', 'rtmp://', 'http://'))
# Directories
save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)) # increment run
print('save_dir=', save_dir)
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
print('weights=', weights, 'view_img=', view_img, 'save_txt=', save_txt, 'imgsz=', imgsz)

# Initialize
set_logging()
device = select_device(opt.device)
print('device=', device)
half = device.type != 'cpu'  # half precision only supported on CUDA
print('half=', half)
# Load model
model = attempt_load(weights, map_location=device)  # load FP32 model
# print('model=', model)
stride = int(model.stride.max())  # model stride
print('stride=', stride)
imgsz = check_img_size(imgsz, s=stride)  # check img_size
print('imgsz=', imgsz)
if half:
    model.half()  # to FP16

# Second-stage classifier

classify = False
if classify:
    modelc = load_classifier(name='resnet101', n=2)  # initialize
    modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

# Set Dataloader
vid_path, vid_writer = None, None
# view_img = check_imshow()
# cudnn.benchmark = True  # set True to speed up constant image size inference
# dataset = LoadStreams(source, img_size=imgsz, stride=stride)
# print('dataset0=', dataset)
resolution_width = 640  # pixels
resolution_height = 480  # pixels
frame_rate = 15  # fps
dispose_frames_for_stablisation = 30  # frames
rs_config = rs.config()
rs_config.enable_stream(rs.stream.depth, resolution_width, resolution_height, rs.format.z16, frame_rate)
rs_config.enable_stream(rs.stream.infrared, 1, resolution_width, resolution_height, rs.format.y8, frame_rate)
rs_config.enable_stream(rs.stream.color, resolution_width, resolution_height, rs.format.bgr8, frame_rate)
device_manager = DeviceManager(rs.context(), rs_config)
device_manager.enable_all_devices()

# Allow some frames for the auto-exposure controller to stabilise
for frame in range(dispose_frames_for_stablisation):
    frames = device_manager.poll_frames()
    # frames={'042222070147': {stream.depth: <pyrealsense2.frame Z16 #23>,
    # (stream.infrared, 1): <pyrealsense2.frame Y8 #23>, stream.color: <pyrealsense2.frame BGR8 #33>}}

assert (len(device_manager._available_devices) > 0)
intrinsics_devices = device_manager.get_device_intrinsics(frames)
print("intrinsics_devices=", intrinsics_devices)
extrinsics_devices = device_manager.get_depth_to_color_extrinsics(frames)


# Get the calibration info as a dictionary to help with display of the measurements onto the color image instead of infrared image

calibration_info_devices = defaultdict(list)
for calibration_info in ( intrinsics_devices, extrinsics_devices):
    for key, value in calibration_info.items():
        calibration_info_devices[key].append(value)
print("calibration_info_devices=", calibration_info_devices)
# Get names and colors
names = model.module.names if hasattr(model, 'module') else model.names
print('names=', names)

colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]
print('colors=', colors)

# Run inference
if device.type != 'cpu':
    model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    print('device!=cpu')
t0 = time.time()
print('t0=', t0)

a = 1
while a:
    frames = device_manager.poll_frames()
    corners3D = {}
    for (serial, frameset) in frames.items():
        depth_frame = post_process_depth_frame(frameset[rs.stream.depth])
        print('depth_frame=', depth_frame)
        infrared_frame = frameset[(rs.stream.infrared, 1)]
        print('infrared_frame=', infrared_frame)
        depth_intrinsics = intrinsics_devices[serial][rs.stream.depth]
        print('depth_intrinsics=', depth_intrinsics)
        infrared_image = np.asanyarray(infrared_frame.get_data())
        color_frame = frameset[rs.stream.color]
        img_color = np.asanyarray(color_frame.get_data())

        img = np.transpose(img_color)
        img = torch.from_numpy(img).to(device)
        print('img=', img)
     
        img = img.half() if half else img.float()  # uint8 to fp16/32 ?
        print('img=', img)
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        print('img=', img)
        if img.ndimension() == 3:  
            img = img.unsqueeze(0)  
            print('img=', img)
        t1 = time_synchronized()  
        pred = model(img, augment=opt.augment)[0]
        print('pred=', pred)

        # Apply NMS
      
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        print('pred=', pred)
        t2 = time_synchronized()  
        print('t2=', t2)

        # Process detections
       
        for i, det in enumerate(pred):  # detections per image
            p = '1'  # to Path
            p = Path(p)
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('_') + (str(a))  # img.txt
            a = a + 1
            s = '0: '
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(img.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if len(det):

                # Print results
             
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    print('n=', n)
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string
                    print('s=', s)

            # Write results
                '''
                dict1 = {0: 'dst_points0', 1: 'dst_points1', 2: 'dst_points2'}
                src_points = np.zeros(shape=(len(det), 3))
                dst_points0 = np.random.rand(len(det), 3)
                dst_points1 = np.random.rand(len(det), 3)
                dst_points2 = np.random.rand(len(det), 3)
                '''
                m = 0
                for *xyxy, conf, cls in reversed(det):
                    print('*xyxy=', *xyxy, 'conf=', conf, 'cls=', cls)
                    # Write to file
                  
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                    print('xywh=', xywh)
                    pixel_x = round(xywh[0])
                    pixel_y = round(xywh[1])
                    depth = depth_frame.as_depth_frame().get_distance(pixel_x, pixel_y)

                    x = (pixel_x - depth_intrinsics.ppx) / depth_intrinsics.fx * depth
                    x = round(x)
                    y = (pixel_y - depth_intrinsics.ppy) / depth_intrinsics.fy * depth
                    y = round(y)
                    z = round(depth)
                    m = m+1
                    if save_txt:
                        line = (cls, *xywh, x, y, z, conf) if opt.save_conf else (cls, *xywh, x, y, z)  # label format
                        print('line=', line)
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')
                        

                    if save_img or view_img:  # Add bbox to image
                        label = f'{names[int(cls)]} {conf:.2f}'
                        print('label=', label)
                      
                        plot_one_box(xyxy, img_color, label=label, color=colors[int(cls)], line_thickness=3,
                                 points_x=x, points_y=y,
                                 points_z=z)
             
            print(f'{s}Done. ({t2 - t1:.3f}s)')
            print('111')
            if view_img:
                cv2.imshow('frame', img_color)
          
                cv2.waitKey(1)  # 1 millisecond
                print('view_img')
            
        '''
        if save_img:
            print('save_img')
            if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)  
            else:  # 'video'
                if vid_path != save_path:  # new video
                    vid_path = save_path
                    if isinstance(vid_writer, cv2.VideoWriter): 
                        vid_writer.release()  # release previous video writer

                    fourcc = 'mp4v'  # output video codec
                    fps = vid_cap.get(cv2.CAP_PROP_FPS)  
                    w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                    h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                    vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h))
                vid_writer.write(im0)
                '''

if save_txt or save_img:
    s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''

    print(f"Results saved to {save_dir}{s}")
print(f'Done. ({time.time() - t0:.3f}s)')
print('222')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str,
                        default='G:/GitHub/yolov5-master/runs/train/exp37/weights/best.pt', help='model.pt path(s)')
    # parser.add_argument('--source', type=str, default='0', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_false', help='display results')
    parser.add_argument('--save-txt', action='store_false', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_false', help='save confidences in --save-txt labels')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_false', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print('opt=', opt)
    check_requirements()

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()
```

(Screenshot: 2021-06-01 165954)

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jun 1, 2021

Hi @yangminwei95 Depth to color alignment should be beneficial when the alignment is accurate, as it can help to distinguish background pixels from foreground pixels.

The get_distance instruction seems to have a problem with the value of pixel_y that is being fed to it. I have seen a similar Python case in the past involving an out of range error when using alignment and get_distance, and storing pixel values in x and y variables. In that particular case, the error was most likely to occur when reading coordinates near the edge of the image rather than the center of the image.

#7395

The RealSense user in that case concluded that their problem was being caused by the alignment, and they posted an updated script at the end of the case.

#7395 (comment)
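As an illustration only (this is not the script from #7395), one common way to avoid that error is to align depth to the color stream and clamp the detection-box centre to the depth frame's pixel range before calling get_distance, so that coordinates near the image edge cannot go out of range. The 640x480 resolution and the hard-coded pixel values below are placeholders:

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Map depth pixels onto the color image, so a (pixel_x, pixel_y) taken from a
# detection on the color frame indexes the matching depth value.
align = rs.align(rs.stream.color)

try:
    frames = align.process(pipeline.wait_for_frames())
    depth_frame = frames.get_depth_frame()

    # In the real script these would be the detection-box centre coordinates.
    pixel_x, pixel_y = 639, 523  # pixel_y is deliberately outside a 480-row frame

    # Clamp to the valid range so get_distance() cannot raise
    # 'out of range value for argument "y"'.
    pixel_x = min(max(int(pixel_x), 0), depth_frame.get_width() - 1)
    pixel_y = min(max(int(pixel_y), 0), depth_frame.get_height() - 1)

    distance_m = depth_frame.get_distance(pixel_x, pixel_y)  # metres
    print('distance =', distance_m)
finally:
    pipeline.stop()
```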


If you are performing object detection with Python then it may also be worth considering using the RealSense SDK's compatibility wrapper for the open-source TensorFlow platform.

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/tensorflow

@yangminwei95
Author

Thanks for your reply, but it didn't solve my problem. Is there a way for OpenCV to open the depth camera and color camera of the D435i at the same time?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jun 1, 2021

If you are aiming for a program for OpenCV that uses Python and displays depth and color as separate panels, the SDK's opencv_viewer_example program may meet your needs.

https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/opencv_viewer_example.py
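That example boils down to something like the sketch below (a paraphrase, not a verbatim copy): grab a depth frame and a color frame from one pipeline, colorize the depth image, and show the two panels side by side with OpenCV:

```python
import numpy as np
import cv2
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue

        depth_image = np.asanyarray(depth_frame.get_data())  # uint16, raw depth units (1 mm by default)
        color_image = np.asanyarray(color_frame.get_data())  # uint8 BGR

        # Scale 16-bit depth into 8-bit and apply a colormap for display.
        depth_colormap = cv2.applyColorMap(
            cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET)

        # Depth and color panels side by side.
        cv2.imshow('RealSense depth | color', np.hstack((depth_colormap, color_image)))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    pipeline.stop()
    cv2.destroyAllWindows()
```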

The link below describes how to adapt that example for multiple cameras on Windows if you should need to.

https://rahulvishwakarma.wordpress.com/2019/08/17/realsense-435i-depth-rgb-multi-camera-setup-and-opencv-python-wrapper-intel-realsense-sdk-2-0-compiled-from-source-on-win10/

If you need depth to color alignment with Python, the SDK example align_depth2color.py imports OpenCV for image rendering.

https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/align-depth2color.py


If use of Python is not important to you though then the OpenCV-equipped depth to color alignment project in the link below may be helpful.

https://github.com/UnaNancyOwen/RealSense2Sample/tree/master/sample/Align

And Intel provide an OpenCV C++ script for alignment here:

https://dev.intelrealsense.com/docs/opencv-wrapper#section-get-aligned-color-depth

@sam598

sam598 commented Jun 1, 2021

@yangminwei95

Do you want yolov5 to use both RGB color and Depth data when it is running inference to better detect an object?

Or do you want to detect an object from just the RGB data, and then find that object's 3D position using the depth data within that bounding box?

@yangminwei95
Author

> @yangminwei95
>
> Do you want yolov5 to use both RGB color and Depth data when it is running inference to better detect an object?
>
> Or do you want to detect an object from just the RGB data, and then find that object's 3D position using the depth data within that bounding box?

Yes, I want to detect an object from just the RGB data, and then find that object's 3D position using the depth data within that bounding box. However, if the color camera and depth camera are turned on at the same time using the method provided by pyrealsense2, the detection performance is poor: the real target object is not detected, and the background is always detected as the target instead.
Do you have any good suggestions? Thank you in advance.
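For reference, the usual pattern for that second approach looks roughly like the sketch below: run detection on the color image, take the centre of the box, and deproject that pixel to a 3D point with rs2_deproject_pixel_to_point. The helper name and box coordinates are illustrative, not part of the script in this thread; it assumes the depth frame has already been aligned to color with rs.align(rs.stream.color).

```python
import pyrealsense2 as rs

def box_center_to_3d(depth_frame, box_xyxy):
    """Hypothetical helper: deproject the centre of a detection box to a 3D point in metres."""
    x1, y1, x2, y2 = box_xyxy
    cx = int((x1 + x2) / 2)
    cy = int((y1 + y2) / 2)

    # Keep the pixel inside the depth image to avoid out-of-range errors.
    cx = min(max(cx, 0), depth_frame.get_width() - 1)
    cy = min(max(cy, 0), depth_frame.get_height() - 1)

    depth_m = depth_frame.get_distance(cx, cy)
    if depth_m == 0:  # no valid depth at that pixel
        return None

    # Intrinsics of the (aligned) depth stream.
    intrin = depth_frame.profile.as_video_stream_profile().intrinsics
    return rs.rs2_deproject_pixel_to_point(intrin, [cx, cy], depth_m)  # [X, Y, Z] in metres
```

With that in place, the box coordinates coming out of YOLOv5's NMS can be fed straight in, e.g. `point = box_center_to_3d(depth_frame, xyxy)`.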

@MartyG-RealSense
Collaborator

Hi @yangminwei95 Do you require further assistance with this case, please? Thanks!

@yangminwei95
Author

> Do you require further assistance with this case, please? Thanks!

I don't need it for the time being. I found the cause of the problem: I made a mistake with the dimensions of the image data, which led to the false detections. Thank you sincerely for your help these days.
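For readers who hit the same symptom: the posted script converts the color frame with `img = np.transpose(img_color)`, which does not produce the contiguous RGB CHW tensor that YOLOv5 expects. A preprocessing step along these lines (a sketch of the standard YOLOv5 pattern; the letterbox resize is omitted for brevity, and the exact mistake in the original script was not confirmed in the thread) is the kind of dimension fix being described:

```python
import numpy as np
import torch

def preprocess(img_color, device, half=False):
    """Convert an HxWx3 BGR frame into the 1x3xHxW float tensor YOLOv5 expects.

    Note: YOLOv5's own detect.py also letterboxes the image to a
    stride-aligned input size; that resize step is omitted here.
    """
    img = img_color[:, :, ::-1].transpose(2, 0, 1)  # BGR HWC -> RGB CHW
    img = np.ascontiguousarray(img)

    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()
    img /= 255.0                                    # 0-255 -> 0.0-1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)                      # add batch dimension
    return img
```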

@MartyG-RealSense
Collaborator

It is great to hear that you found the solution @yangminwei95 - thanks very much for the update! I will close this case.
