
Training mmod detector crashes when using bounding box regression #2153

@arrufat

Description

I am training a standard mmod detector, like the ones in the examples, but with bounding box regression enabled via:

net.subnet().layer_details().set_num_filters(5 * options.detector_windows.size());

Current Behavior

I frequently get the following error:

Error detected at line 1591.                                       
Error detected in file ../../external/dlib/dlib/../dlib/dnn/loss.h.

which refers to:

dlib/dlib/dnn/loss.h

Lines 1591 to 1592 in b401185

DLIB_CASSERT(w > 0);
DLIB_CASSERT(h > 0);

I was not getting this error with the same code a while ago, but many things have changed since then (especially the CUDA version).
Most of the time, the error happens at the beginning of training, when everything is still very chaotic. Sometimes, however, I get it after the loss has stabilized as well.

I tried updating the gradients for h and w only when they are positive, letting the gradient be 0 otherwise.
This avoids the crash, but it messes up the training (the loss goes to inf).

I've also noticed this didn't happen when I changed the lambda value to something much lower, such as 1, instead of the default 100. Maybe the lambda is just too big?
EDIT: it also happens with the lower lambda.

How would you proceed?

  • Version: dlib master (19.21.99)
  • Where did you get dlib: github
  • Platform: Linux 64 bit, CUDA 11.0.2, CUDNN 8.0.2.39
  • Compiler: GCC-9.3.0 for C (in order to enable CUDA) and GCC-10.2 for C++
