Description
I am training a normal mmod detector like the ones in the examples, but I enable bounding box regression with:
net.subnet().layer_details().set_num_filters(5 * options.detector_windows.size());
Current Behavior
I get the following error really often:
Error detected at line 1591.
Error detected in file ../../external/dlib/dlib/../dlib/dnn/loss.h.
which refers to:
Lines 1591 to 1592 in b401185:
DLIB_CASSERT(w > 0);
DLIB_CASSERT(h > 0);
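For context on why such assertions might exist: box regression targets are commonly parameterized relative to a reference box, dividing the center offset by the reference width and height and taking the log of the size ratio. The sketch below uses that common R-CNN-style form. It is an assumption that dlib's loss has the same shape, and `Box`, `Targets`, and `regression_targets` are hypothetical names for illustration, not dlib API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical box type: center (cx, cy) plus width and height.
struct Box { double cx, cy, w, h; };

// Hypothetical regression targets: center offsets and log size ratios.
struct Targets { double dx, dy, dw, dh; };

// R-CNN-style targets of `truth` expressed relative to `ref`.
// Both the division and the logarithm are only meaningful when
// ref.w > 0 and ref.h > 0.
Targets regression_targets(const Box& ref, const Box& truth)
{
    return {
        (truth.cx - ref.cx) / ref.w,   // center offset in units of ref width
        (truth.cy - ref.cy) / ref.h,   // center offset in units of ref height
        std::log(truth.w / ref.w),     // log width ratio
        std::log(truth.h / ref.h)      // log height ratio
    };
}
```

With a reference box of size 20 x 20 and a truth box of size 40 x 20, the log ratios are log(2) and 0. With a negative reference width or height, the log terms become NaN, which is the kind of value a check like `w > 0` would guard against.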
I was not getting this error with the same code a while ago, but many things have changed since (especially CUDA versions).
Most of the time, this error happens at the beginning of the training, when everything is very chaotic. However, sometimes I get the error after the loss has stabilized, as well.
I tried updating the gradient for h and w only when they are positive, leaving the gradient at 0 otherwise.
This avoids the crash, but breaks the training (loss goes to inf).
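One possible explanation, offered only as a guess: if the loss term is still computed from a non-positive w or h, it can be NaN or infinite even when its gradient has been zeroed. A minimal sketch of skipping both the loss and the gradient for such a sample follows, assuming a squared-error term on the log-width offset scaled by lambda. `Term` and `bbr_width_term` are hypothetical names, not dlib API, and the real loss may be structured differently.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical per-sample result: loss contribution and its gradient
// with respect to the predicted log-width offset.
struct Term { double loss; double grad; };

// Squared error between pred_dw and log(truth_w / ref_w), scaled by lambda.
// Returns std::nullopt when either width is non-positive (or NaN), so the
// caller can skip the loss and the gradient together.
std::optional<Term> bbr_width_term(double pred_dw, double ref_w,
                                   double truth_w, double lambda)
{
    if (!(ref_w > 0) || !(truth_w > 0))
        return std::nullopt;

    const double target = std::log(truth_w / ref_w);
    const double err = pred_dw - target;
    return Term{0.5 * lambda * err * err, lambda * err};
}
```

In this sketch the gradient scales linearly with lambda, so moving from 100 to 1 shrinks each regression update by a factor of 100. That may make a degenerate box less likely without ruling it out.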
I've also noticed this didn't happen when I changed the lambda value to much lower values, such as 1, instead of the default 100. Maybe it's just that the lambda is too big?
EDIT: it also happens with the lower lambda value.
How would you proceed?
- Version: dlib master (19.21.99)
- Where did you get dlib: github
- Platform: Linux 64 bit, CUDA 11.0.2, CUDNN 8.0.2.39
- Compiler: GCC-9.3.0 for C (in order to enable CUDA) and GCC-10.2 for C++