
Accuracy yolov3 in deepstream lower than darknet #5413

Open
daliel opened this issue Apr 29, 2020 · 10 comments
daliel commented Apr 29, 2020

I use yolov3, trained with the parameter letter_box=1. In testing I use the -letter_box flag and I'm happy with the result.
When I use DeepStream I set maintain-aspect-ratio=1, and as a result objects that touch the bottom edge of the image are not detected. If I set maintain-aspect-ratio=0, those objects do get bounding boxes, but overall accuracy drops.
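
One possible cause (my assumption; the thread doesn't confirm how DeepStream pads): Darknet's letterbox_image centers the padding on both sides of the scaled image, while a top-left-anchored resize puts all of it at the bottom/right, so the last rows of the frame end up where the network only ever saw padding during training. A minimal sketch of the two conventions, with illustrative names, using the frame and network sizes given later in the thread:

#include <algorithm>
#include <cstdio>

struct Letterbox { float scale, padX, padY; }; // illustrative helper type

// Darknet-style letterbox: scale to fit, split the padding evenly.
static Letterbox letterboxCentered(float imgW, float imgH, float netW, float netH)
{
    float s = std::min(netW / imgW, netH / imgH);
    return { s, (netW - imgW * s) / 2.0f, (netH - imgH * s) / 2.0f };
}

// Top-left-anchored variant (my assumption for maintain-aspect-ratio=1):
// same scale, but all padding lands on the right/bottom.
static Letterbox letterboxAnchored(float imgW, float imgH, float netW, float netH)
{
    float s = std::min(netW / imgW, netH / imgH);
    return { s, 0.0f, 0.0f };
}

int main()
{
    Letterbox c = letterboxCentered(1920, 1080, 608, 608);
    Letterbox a = letterboxAnchored(1920, 1080, 608, 608);
    // Centered: ~133 padded rows above and below the image.
    // Anchored: 0 above, ~266 below - the bottom border moves.
    printf("centered padY: %.0f, anchored padY: %.0f\n", c.padY, a.padY);
    return 0;
}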

my code:

static std::vector<NvDsInferParseObjectInfo>
nonMaximumSuppression(const float nmsThresh, std::vector<NvDsInferParseObjectInfo> binfo, const uint& netW, const uint& netH)
{
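    // Length of the 1-D overlap between [x1min, x1max] and [x2min, x2max].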
    auto overlap1D = [](float x1min, float x1max, float x2min, float x2max) -> float {
        if (x1min > x2min)
        {
            std::swap(x1min, x2min);
            std::swap(x1max, x2max);
        }
        return x1max < x2min ? 0 : std::min(x1max, x2max) - x2min;
    };
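    // Intersection-over-union (IoU) of two axis-aligned boxes.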
    auto computeIoU
        = [&overlap1D](NvDsInferParseObjectInfo& bbox1, NvDsInferParseObjectInfo& bbox2) -> float {
        float overlapX
            = overlap1D(bbox1.left, bbox1.left + bbox1.width, bbox2.left, bbox2.left + bbox2.width);
        float overlapY
            = overlap1D(bbox1.top, bbox1.top + bbox1.height, bbox2.top, bbox2.top + bbox2.height);
        float area1 = (bbox1.width) * (bbox1.height);
        float area2 = (bbox2.width) * (bbox2.height);
        float overlap2D = overlapX * overlapY;
        float u = area1 + area2 - overlap2D;
        return u == 0 ? 0 : overlap2D / u;
    };

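    // Sort by descending confidence, then greedily keep a box only if it
    // overlaps every already-kept box by at most nmsThresh.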
    std::stable_sort(binfo.begin(), binfo.end(),
                     [](const NvDsInferParseObjectInfo& b1, const NvDsInferParseObjectInfo& b2) {
                         return b1.detectionConfidence > b2.detectionConfidence;
                     });
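    // "out" holds kept boxes for the IoU tests; "out_out" holds the same
    // boxes after squaring and clamping.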
    std::vector<NvDsInferParseObjectInfo> out;
    std::vector<NvDsInferParseObjectInfo> out_out;
    for (auto i : binfo)
    {
        bool keep = true;
        for (auto j : out)
        {
            if (keep)
            {
                float overlap = computeIoU(i, j);
                keep = overlap <= nmsThresh;
            }
            else
                break;
        }
        if (keep)
        {
            out.push_back(i);
            //out_out.push_back(i);
            // Turn the kept box into a square around its center, capped at
            // the network input size.
            float centerx = i.left + (i.width / 2);
            float centery = i.top + (i.height / 2);
            float side = std::min(float(std::min(netW, netH) - 1), std::max(i.width, i.height));
            i.left = std::max(centerx - (side / 2), 0.0f);
            i.top = std::max(centery - (side / 2), 0.0f);
            i.width = side;
            i.height = side;
            //i.left = clamp(i.left, 1, (netW - 1));
            //i.top = clamp(i.top, 1, (netH - 1));
            printf("points: %i, %i, %i, %i, wh: %i, %i\n",int(i.left), int(i.top), int(i.left+i.width), int(i.top+i.height), int(i.width), int(i.height));
            // Shift the square back inside the network input if it spills
            // past the right or bottom edge.
            if ((i.left + i.width) > (netW - 1)) {
                i.left = netW - 1 - i.width;
                printf(" change left %i\n", int(i.left));
            }
            if ((i.top + i.height) > (netH - 1)) {
                i.top = netH - 1 - i.height;
                printf(" change top %i\n", int(i.top));
            }
            if (i.left < 0) {
                i.left = 0;
            }
            if (i.top < 0) {
                i.top = 0;
            }
            out_out.push_back(i);
        }
    }
    return out_out;
}
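
The commented-out lines near the bottom call a clamp helper that isn't shown in the snippet; a minimal sketch of one possible definition (needs <algorithm>; C++17's std::clamp does the same):

// Assumed definition of the clamp helper referenced above (not part of
// the original snippet).
static float clamp(const float val, const float minVal, const float maxVal)
{
    return std::max(minVal, std::min(val, maxVal));
}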
@marcoslucianops

The accuracy is slightly lower than Darknet's, but try changing lines 307-308 in nvdsparsebbox_Yolo.cpp

    static const float kNMS_THRESH = 0.3f;
    static const float kPROB_THRESH = 0.7f;

to (Darknet values)

    static const float kNMS_THRESH = 0.45f;
    static const float kPROB_THRESH = 0.25f;

And see if you get better results.

@AlexeyAB
Owner

What approach to resizing is used in TRT for keeping the aspect ratio? #232 (comment)

  • as in Darknet with letter_box=1?
  • or as in OpenCV?

What network resolution do you use?
And what image size do you use?

daliel commented Apr 29, 2020

What approach to resizing is used in TRT for keeping the aspect ratio? #232 (comment)

I don't know which approach DeepStream uses, but I assume letterbox_image.

Objects on the left, right, and top borders of the image are marked with bounding boxes.
Network resolution: 608x608x3
Image resolution: 1920x1080x3
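
For reference, with these sizes a detection in 608x608 network space maps back to frame space as follows under a centered letterbox (a hedged sketch; the variable names and example coordinates are illustrative, not from the thread):

#include <algorithm>
#include <cstdio>

int main()
{
    // 608x608 network input, 1920x1080 frames (the sizes above).
    const float netW = 608.f, netH = 608.f, imgW = 1920.f, imgH = 1080.f;
    float scale = std::min(netW / imgW, netH / imgH); // ~0.3167
    float padY  = (netH - imgH * scale) / 2.f;        // ~133 rows top and bottom
    float padX  = (netW - imgW * scale) / 2.f;        // 0 here

    // Example detection in network coordinates (illustrative values).
    float left = 100.f, top = 400.f;
    // Undo the letterbox: remove the padding, then undo the scale.
    float frameLeft = (left - padX) / scale;
    float frameTop  = (top - padY) / scale;
    printf("frame coords: %.0f, %.0f\n", frameLeft, frameTop);
    return 0;
}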

@AlexeyAB
Owner

You can also try https://github.com/ceccocats/tkDNN + TensorRT instead of DeepStream + TensorRT.

tkDNN supports YOLOv4 at higher speed than DeepStream.

@marcoslucianops

tkDNN supports YOLOv4 at higher speed than DeepStream.

In my preliminary tests, tkDNN is slower than DeepStream (I used yolov3-tiny on a Jetson Nano), but I haven't tested it properly yet.

daliel commented Apr 30, 2020

Sorry, but I want to use a cascade of neural networks. This is implemented in DeepStream, but for some unknown reason classification works better with square boxes from YOLO than with the built-in crop function that maintains the aspect ratio.

@marcoslucianops

@daliel, are you using a patch for non-square bboxes in DeepStream?

daliel commented May 1, 2020

@marcoslucianops, square means all sides are equal. I changed the bounding-box output code in DeepStream as in my first message (it works correctly with maintain-aspect-ratio=1). But the problem is the detection of objects on the lower border of the image (objects partially outside the frame): boxes are not displayed when an object touches the bottom edge of the image.

@marcoslucianops

@daliel, try the new DeepStream 5: https://developer.nvidia.com/deepstream-sdk

@shubham-shahh

What approach to resizing is used in TRT for keeping the aspect ratio? #232 (comment)

  • as in Darknet with letter_box=1?
  • or as in OpenCV?

What network resolution do you use?
And what image size do you use?

Is letter_box=1 used for training or for detection?
