
Accuracy yolov3 in deepstream lower than darknet #5413

Open
daliel opened this issue Apr 29, 2020 · 10 comments
daliel commented Apr 29, 2020

I use yolov3, trained with the parameter letter_box=1. In testing I use the -letter_box flag and I'm happy with the result.
When I use DeepStream I set maintain-aspect-ratio=1, and as a result objects that touch the bottom edge of the image are not detected. If I set maintain-aspect-ratio=0, those objects do get bounding boxes, but overall accuracy drops.
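
One possible cause (my assumption; the thread doesn't confirm how DeepStream pads): Darknet's letterbox_image centers the padding on both sides of the scaled image, while a top-left-anchored resize puts all of it at the bottom/right, so the last rows of the frame end up where the network only ever saw padding during training. A minimal sketch of the two conventions, with illustrative names, using the frame and network sizes given later in the thread:

#include <algorithm>
#include <cstdio>

struct Letterbox { float scale, padX, padY; }; // illustrative helper type

// Darknet-style letterbox: scale to fit, split the padding evenly.
static Letterbox letterboxCentered(float imgW, float imgH, float netW, float netH)
{
    float s = std::min(netW / imgW, netH / imgH);
    return { s, (netW - imgW * s) / 2.0f, (netH - imgH * s) / 2.0f };
}

// Top-left-anchored variant (my assumption for maintain-aspect-ratio=1):
// same scale, but all padding lands on the right/bottom.
static Letterbox letterboxAnchored(float imgW, float imgH, float netW, float netH)
{
    float s = std::min(netW / imgW, netH / imgH);
    return { s, 0.0f, 0.0f };
}

int main()
{
    Letterbox c = letterboxCentered(1920, 1080, 608, 608);
    Letterbox a = letterboxAnchored(1920, 1080, 608, 608);
    // Centered: ~133 padded rows above and below the image.
    // Anchored: 0 above, ~266 below - the bottom border moves.
    printf("centered padY: %.0f, anchored padY: %.0f\n", c.padY, a.padY);
    return 0;
}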

my code:

static std::vector<NvDsInferParseObjectInfo>
nonMaximumSuppression(const float nmsThresh, std::vector<NvDsInferParseObjectInfo> binfo, const uint& netW, const uint& netH)
{
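    // Length of the 1-D overlap between [x1min, x1max] and [x2min, x2max].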
    auto overlap1D = [](float x1min, float x1max, float x2min, float x2max) -> float {
        if (x1min > x2min)
        {
            std::swap(x1min, x2min);
            std::swap(x1max, x2max);
        }
        return x1max < x2min ? 0 : std::min(x1max, x2max) - x2min;
    };
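    // Intersection-over-union (IoU) of two axis-aligned boxes.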
    auto computeIoU
        = [&overlap1D](NvDsInferParseObjectInfo& bbox1, NvDsInferParseObjectInfo& bbox2) -> float {
        float overlapX
            = overlap1D(bbox1.left, bbox1.left + bbox1.width, bbox2.left, bbox2.left + bbox2.width);
        float overlapY
            = overlap1D(bbox1.top, bbox1.top + bbox1.height, bbox2.top, bbox2.top + bbox2.height);
        float area1 = (bbox1.width) * (bbox1.height);
        float area2 = (bbox2.width) * (bbox2.height);
        float overlap2D = overlapX * overlapY;
        float u = area1 + area2 - overlap2D;
        return u == 0 ? 0 : overlap2D / u;
    };

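    // Sort by descending confidence, then greedily keep a box only if it
    // overlaps every already-kept box by at most nmsThresh.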
    std::stable_sort(binfo.begin(), binfo.end(),
                     [](const NvDsInferParseObjectInfo& b1, const NvDsInferParseObjectInfo& b2) {
                         return b1.detectionConfidence > b2.detectionConfidence;
                     });
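    // "out" holds kept boxes for the IoU tests; "out_out" holds the same
    // boxes after squaring and clamping.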
    std::vector<NvDsInferParseObjectInfo> out;
    std::vector<NvDsInferParseObjectInfo> out_out;
    for (auto i : binfo)
    {
        bool keep = true;
        for (auto j : out)
        {
            if (keep)
            {
                float overlap = computeIoU(i, j);
                keep = overlap <= nmsThresh;
            }
            else
                break;
        }
        if (keep)
        {
            out.push_back(i);
            //out_out.push_back(i);
            // Turn the kept box into a square around its center, capped at
            // the network input size.
            float centerx = i.left + (i.width / 2);
            float centery = i.top + (i.height / 2);
            float side = std::min(float(std::min(netW, netH) - 1), std::max(i.width, i.height));
            i.left = std::max(centerx - (side / 2), 0.0f);
            i.top = std::max(centery - (side / 2), 0.0f);
            i.width = side;
            i.height = side;
            //i.left = clamp(i.left, 1, (netW - 1));
            //i.top = clamp(i.top, 1, (netH - 1));
            printf("points: %i, %i, %i, %i, wh: %i, %i\n",int(i.left), int(i.top), int(i.left+i.width), int(i.top+i.height), int(i.width), int(i.height));
            // Shift the square back inside the network input if it spills
            // past the right or bottom edge.
            if ((i.left + i.width) > (netW - 1)) {
                i.left = netW - 1 - i.width;
                printf(" change left %i\n", int(i.left));
            }
            if ((i.top + i.height) > (netH - 1)) {
                i.top = netH - 1 - i.height;
                printf(" change top %i\n", int(i.top));
            }
            if (i.left < 0) {
                i.left = 0;
            }
            if (i.top < 0) {
                i.top = 0;
            }
            out_out.push_back(i);
        }
    }
    return out_out;
}
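
The commented-out lines near the bottom call a clamp helper that isn't shown in the snippet; a minimal sketch of one possible definition (needs <algorithm>; C++17's std::clamp does the same):

// Assumed definition of the clamp helper referenced above (not part of
// the original snippet).
static float clamp(const float val, const float minVal, const float maxVal)
{
    return std::max(minVal, std::min(val, maxVal));
}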
@marcoslucianops

The accuracy is slightly lower than Darknet's, but try changing lines 307-308 in nvdsparsebbox_Yolo.cpp

    static const float kNMS_THRESH = 0.3f;
    static const float kPROB_THRESH = 0.7f;

to (Darknet values)

    static const float kNMS_THRESH = 0.45f;
    static const float kPROB_THRESH = 0.25f;

And see if you get better results.

@AlexeyAB
Owner

What approach to resizing is used in TRT for keeping the aspect ratio? #232 (comment)

  • as in Darknet with letter_box=1?
  • or as in OpenCV?

What network resolution do you use?
And what image size do you use?

daliel commented Apr 29, 2020

What approach to resizing is used in TRT for keeping the aspect ratio? #232 (comment)

I don't know which approach DeepStream uses, but I assume letterbox_image.

Objects on the left, right, and top borders of the image are marked with bounding boxes.
Network resolution: 608x608x3
Image resolution: 1920x1080x3
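
For reference, with these sizes a detection in 608x608 network space maps back to frame space as follows under a centered letterbox (a hedged sketch; the variable names and example coordinates are illustrative, not from the thread):

#include <algorithm>
#include <cstdio>

int main()
{
    // 608x608 network input, 1920x1080 frames (the sizes above).
    const float netW = 608.f, netH = 608.f, imgW = 1920.f, imgH = 1080.f;
    float scale = std::min(netW / imgW, netH / imgH); // ~0.3167
    float padY  = (netH - imgH * scale) / 2.f;        // ~133 rows top and bottom
    float padX  = (netW - imgW * scale) / 2.f;        // 0 here

    // Example detection in network coordinates (illustrative values).
    float left = 100.f, top = 400.f;
    // Undo the letterbox: remove the padding, then undo the scale.
    float frameLeft = (left - padX) / scale;
    float frameTop  = (top - padY) / scale;
    printf("frame coords: %.0f, %.0f\n", frameLeft, frameTop);
    return 0;
}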

@AlexeyAB
Owner

You can also try https://github.com/ceccocats/tkDNN + TensorRT instead of DeepStream + TensorRT.

tkDNN supports YOLOv4 at higher speed than DeepStream.

@marcoslucianops

tkDNN supports YOLOv4 at higher speed than DeepStream.

In my preliminary tests, tkDNN is slower than DeepStream (I used yolov3-tiny on a Jetson Nano), but I haven't tested it properly yet.

daliel commented Apr 30, 2020

Sorry, but I want to use a cascade of neural networks. This is implemented in DeepStream, but for some unknown reason classification works better with square boxes from YOLO than with the built-in crop function that maintains the aspect ratio.

@marcoslucianops

@daliel, are you using a patch for non-square bboxes in DeepStream?

daliel commented May 1, 2020

@marcoslucianops, square means all sides are equal. I changed the bounding-box output code in DeepStream as in my first message (it works correctly with maintain-aspect-ratio=1). But the problem is the detection of objects on the lower border of the image (objects partially outside the frame): boxes are not displayed when an object touches the bottom edge of the image.

@marcoslucianops

@daliel, try the new DeepStream 5: https://developer.nvidia.com/deepstream-sdk

@shubham-shahh

What approach to resizing is used in TRT for keeping the aspect ratio? #232 (comment)

  • as in Darknet with letter_box=1?
  • or as in OpenCV?

What network resolution do you use?
And what image size do you use?

Is letter_box=1 used for training or for detection?
