Loss of precision with v1.3 #247

Closed
Xavier31 opened this issue Jan 5, 2021 · 8 comments

Comments

@Xavier31

Xavier31 commented Jan 5, 2021

Hi,
It seems that the results of the latest update (version 1.3.1) are noticeably worse than the previous one (version 1.2).
I ran some benchmarks, and although the difference does not seem like much in terms of NME, it is there, and it is even more noticeable when you look at the images.

[image: 2D_benchmark_comparisons_0.02]

(look at the eye and temple landmarks)

[image: 00023_pred] version 1.2

[image: 00023_pred_1.3] version 1.3

I looked at a bunch of results on the FFHQ dataset and noticed consistently worse precision.

I could not track down what causes this difference; my current suspicion is the new batch inference code, but I could not pinpoint it yet.

@1adrianb
Owner

1adrianb commented Jan 5, 2021

This is strange indeed. What function are you using to get the landmarks? Could you please attach the input image as seen by the detector, without the points on top?

@Xavier31
Author

Xavier31 commented Jan 5, 2021

I am using get_landmarks_from_image() with the face bounding box detected by SFD passed as an argument.
Here is the original image:

[image: 00023]
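For reference, a minimal sketch of how the call is made (the constructor settings and the bounding-box coordinates below are illustrative, not the exact ones used for the benchmark):

```python
import face_alignment
from skimage import io

# Sketch only: detector/device settings and the box coordinates are assumptions.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D,
                                  face_detector='sfd', device='cuda')

image = io.imread('00023.png')

# A precomputed SFD detection is passed as [x_min, y_min, x_max, y_max]
# instead of letting the library run face detection internally.
sfd_box = [256.0, 210.0, 620.0, 640.0]  # hypothetical coordinates
preds = fa.get_landmarks_from_image(image, detected_faces=[sfd_box])
print(preds[0].shape)  # (68, 2): 2D coordinates of the 68 landmarks
```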

@1adrianb
Owner

1adrianb commented Jan 5, 2021

Thanks for attaching it. The difference is caused by how the image normalization is performed. The correct one for SFD should be BGR + subtracting the mean. Prior to 1.3.0 there were some inconsistencies on this matter (during batch detection this step was wrongly not performed).
Now, it looks like the fix made things slightly worse in certain cases, which implies that the scale used is slightly suboptimal.
If on average you are seeing better performance with the previous behaviour, I can look into reverting this change, though that would not be the proper fix. Have you tried this on other datasets too? This particular dataset has a strong bias towards large frontal poses that cover most of the image and may not be representative of more "in-the-wild" images.
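For clarity, that normalization looks roughly like the sketch below (the per-channel mean values are the commonly used VGG-style ones and are an assumption here, not taken from the repository):

```python
import numpy as np

def preprocess_for_sfd(image_rgb: np.ndarray) -> np.ndarray:
    """Sketch of SFD-style preprocessing: RGB -> BGR, then subtract a
    per-channel mean. The mean values are assumed (VGG-style), not
    necessarily the exact ones used by this library."""
    image_bgr = image_rgb[..., ::-1].astype(np.float32)  # swap channel order
    mean_bgr = np.array([104.0, 117.0, 123.0], dtype=np.float32)
    return image_bgr - mean_bgr
```

This is the step that, per the comment above, was skipped in the batch path prior to 1.3.0.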

@Xavier31
Author

Xavier31 commented Jan 6, 2021

Thanks for taking the time to look into this.
I did try on a private dataset, but it also consists of large frontal poses like selfies, so I guess it is also biased. Results there are also worse with v1.3 (graph at the top of my first post). On the other hand, the results using the BlazeFace detector are fine. I have not tried on a more in-the-wild dataset such as 300W.
The question is: how tuned is the actual implementation to the face detector? I can see several hard-coded variables when computing the scale (the reference_scale, the 200.0 in the transform function...). Shouldn't the reference_scale be different for different detectors? Perhaps you tuned those values with the wrong normalization for SFD?
A more general question: how would I go about using another face detector, especially one with non-square boxes?

@Xavier31
Author

Xavier31 commented Jan 6, 2021

Bonus question: for selfie images, cropping the face to a square introduces a lot of deformation. Would you recommend retraining (or fine-tuning) the network on these images?

@1adrianb
Owner

1adrianb commented Jan 6, 2021

@Xavier31 At the time these models were trained, the noise was generated synthetically during training; neither SFD nor BlazeFace had been released yet.
There is not much tuning performed: the values for the scales and shifts were selected by testing 2-3 values on a small subset of images. Ideally, yes, these values should be slightly different for each detector, assuming of course that they define or predict bounding boxes of different sizes (i.e. some may consider the full face as the face, others just the region enclosing the eyes and the mouth).

The bounding boxes don't have to be square. The way the cropping works is by taking the bounding box and computing a center point and a scale from it. Based on these, a center crop and re-scaling are performed. There is no distortion introduced; the aspect ratio is preserved by the cropping function.

The detectors provided already predict rectangles, so they are not squares. You can use pretty much any detector you like; there is no particular preference for one or the other as long as it performs well.
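As a rough sketch of that bounding-box-to-crop logic (the constants below, such as the reference scale and the vertical shift, are illustrative assumptions rather than the library's exact values):

```python
import numpy as np

def bbox_to_center_scale(box, reference_scale=195.0, vertical_shift=0.12):
    """Turn a detector box [x1, y1, x2, y2] into a center point and a single
    scalar scale. The default reference_scale and vertical_shift are
    illustrative; each detector could use its own values."""
    x1, y1, x2, y2 = box
    center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    # Nudge the center upwards (smaller y in image coordinates) so the crop
    # sits better on the face than the raw detector box does.
    center[1] -= (y2 - y1) * vertical_shift
    # A single scalar scale means the crop window is square around the center,
    # so the image content is never stretched anisotropically.
    scale = (x2 - x1 + y2 - y1) / reference_scale
    return center, scale
```

Because the crop is defined only by a center and one scale, the aspect ratio of the face is preserved; a non-square detector box simply changes where the square crop lands and how large it is.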

@Xavier31
Author

Xavier31 commented Jan 6, 2021

Ah, thanks for the explanation, I did not realize the aspect ratio was preserved. But that means that in some cases the cropped image will include a lot of background, and the heatmaps that are far from the center of the image are much noisier, right?

@1adrianb
Owner

1adrianb commented Jan 6, 2021

What happens is that sometimes the face can indeed end up too small (i.e. the crop includes a lot of background) or partially cropped out (let's say the chin may get cut off by mistake if the scale is off).
The networks can be retrained with more rectangular shapes if that's desired. For example, for human pose, since more often than not humans can be contained in a tall rectangle, the shape of the heatmaps is indeed rectangular.

1adrianb closed this as completed Aug 4, 2021