Detection Confidence Needed. #262

Closed
MagicFrogSJTU opened this issue Apr 1, 2021 · 14 comments

@MagicFrogSJTU
Contributor

The current code outputs grid coordinates as detection results without any detection confidence. As a result, the model often generates confusing detections for some edge-case images.
It is easy to get the face detection confidence, but hard to get the alignment confidence. I went through the code, but it is not an easy job for newcomers. Is there a recommended approach?

@1adrianb
Owner

1adrianb commented Apr 1, 2021

You can use the max value of the heatmap as a network confidence measure. While this is not perfect, it can be used to detect wrong points fairly accurately. To achieve this, you can modify this function:

def get_preds_fromhm(hm, center=None, scale=None):
and also return the max value in addition to the argmax.
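
A minimal sketch of the idea above, assuming the heatmaps arrive as a NumPy array of shape (B, C, H, W) with one channel per landmark. The function name `heatmap_peaks_with_scores` is hypothetical and is not part of the library. It only shows how the peak value can be returned next to the peak location, and it omits any refinement or center/scale mapping that the real `get_preds_fromhm` may perform.

```python
import numpy as np


def heatmap_peaks_with_scores(hm):
    """Return per-landmark peak coordinates and peak values.

    hm: array of shape (B, C, H, W), one heatmap per landmark (assumed layout).
    Returns (coords, scores): coords has shape (B, C, 2) holding (x, y) in
    heatmap pixels; scores has shape (B, C) holding the raw peak values.
    """
    B, C, H, W = hm.shape
    flat = hm.reshape(B, C, H * W)
    idx = flat.argmax(axis=-1)   # flat index of each peak
    scores = flat.max(axis=-1)   # raw peak value, used as the confidence
    xs = idx % W                 # column of the peak
    ys = idx // W                # row of the peak
    coords = np.stack([xs, ys], axis=-1).astype(np.float32)
    return coords, scores
```

Because the score is the raw peak of the heatmap, nothing in this sketch rescales it to a fixed range.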

@MagicFrogSJTU
Contributor Author

> You can use the max value of the heatmap as a network confidence measure. While this is not perfect, it can be used to detect wrong points fairly accurately. To achieve this, you can modify this function:
>
> def get_preds_fromhm(hm, center=None, scale=None):
>
> and also return the max value in addition to the argmax.

Thank you for your reply! I have already implemented the method. Is there any plan to add this as a formal feature?

@1adrianb
Owner

1adrianb commented Apr 2, 2021

There were a few similar questions in the past, so it's probably worth adding. Feel free to make a pull request.

@MagicFrogSJTU
Contributor Author

> There were a few similar questions in the past, so it's probably worth adding. Feel free to make a pull request.

I will try to make a pull request in the next few days!

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Apr 29, 2021

@1adrianb Very sorry, I have been quite busy these days.

Before making a PR, should we discuss the API first? How about this:

  1. Add a keyword argument: return_confidence=False
  2. Return the landmark point confidence along with the coordinates. Shape before: (68, 2). Shape after: (68, 3).

What do you think?

@1adrianb
Owner

@MagicFrogSJTU No worries!

  1. Agree
  2. My only concern with this is that depending on the detection type (2D or 3D) the value of the 3rd column may change from representing the depth (for 3D points) to confidence for 2D. Perhaps returning a separate vector with 68 values is simpler?

@MagicFrogSJTU
Contributor Author

> 2. My only concern with this is that depending on the detection type (2D or 3D) the value of the 3rd column may change from representing the depth (for 3D points) to confidence for 2D. Perhaps returning a separate vector with 68 values is simpler?

My concern is:

  1. For detected faces, both the confidence and the coordinates are combined in one variable. For instance, [p1_x, p1_y, p2_x, p2_y, confidence]. Thus, for landmarks, should we keep the same pattern?
  2. If the last column represents landmark confidence, we can still make 2D and 3D compatible. For 2D, the 3rd column is for confidence. For 3D, the 4th column is for confidence.

What do you think?
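
The two layouts under discussion can be compared with a short sketch. It assumes 68 landmarks stored as NumPy arrays; the helper `append_confidence` and the random data are purely illustrative and do not come from the library.

```python
import numpy as np


def append_confidence(landmarks, confidence):
    """Layout A: store the confidence as an extra last column."""
    return np.concatenate([landmarks, confidence[:, None]], axis=1)


rng = np.random.default_rng(0)
pts_2d = rng.random((68, 2))   # x, y
pts_3d = rng.random((68, 3))   # x, y, depth
conf = rng.random(68)          # one confidence value per landmark

merged_2d = append_confidence(pts_2d, conf)   # shape (68, 3): x, y, confidence
merged_3d = append_confidence(pts_3d, conf)   # shape (68, 4): x, y, depth, confidence

# The ambiguity raised above: 2D points with confidence and plain 3D points
# share the same shape, so the meaning of column 3 depends on the mode.
same_shape = merged_2d.shape == pts_3d.shape

# Layout B: keep the coordinates untouched and return the confidence separately.
separate = (pts_2d, conf)
```

With layout A a caller has to know the detection type to interpret the array, while layout B leaves the coordinate array's shape unchanged in both modes.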

@MagicFrogSJTU
Contributor Author

> @MagicFrogSJTU No worries!
>
>   1. Agree
>   2. My only concern with this is that depending on the detection type (2D or 3D) the value of the 3rd column may change from representing the depth (for 3D points) to confidence for 2D. Perhaps returning a separate vector with 68 values is simpler?

I know very little about API design, so maybe you are right. Just give me the final decision and I will implement it!

@1adrianb
Owner

Can we go with a separate array, please? Could you also describe both the new flag and the returned value in the function's docstring? Thanks!

@MagicFrogSJTU
Contributor Author

MagicFrogSJTU commented Apr 30, 2021

> Can we go with a separate array, please? Could you also describe both the new flag and the returned value in the function's docstring? Thanks!

One last thing: which one should we take?

  1. return landmark, landmark_confidence, detected_faces
  2. return (landmark, landmark_confidence), detected_faces

@MagicFrogSJTU
Contributor Author

def get_landmarks_from_image(self, image_or_path, detected_faces=None, return_bboxes=False, return_landmark_score=False):
    """Predict the landmarks for each face present in the image.

    This function predicts a set of 68 2D or 3D landmarks, one for each face present.
    If detected_faces is None, the method will also run a face detector.

    Arguments:
        image_or_path {string or numpy.array or torch.tensor} -- The input image or path to it.

    Keyword Arguments:
        detected_faces {list of numpy.array} -- List of bounding boxes, one for each face found
        in the image (default: {None})
        return_bboxes {boolean} -- If True, return the face bounding boxes in addition to the keypoints.
        return_landmark_score {boolean} -- If True, return the keypoint scores along with the keypoints.

    Return:
        result:
            1. If both return_bboxes and return_landmark_score are True, result will be:
                (landmarks, landmarks_scores), detected_faces
            2. If only return_landmark_score is True, result will be:
                landmarks, landmarks_scores
            3. If only return_bboxes is True, result will be:
                landmarks, detected_faces
            4. Otherwise:
                landmarks
    """

This seems overcomplicated, because we will end up with a lot of combinations.

What about always returning three objects, landmark, landmark_confidence, detected_faces, and setting the latter two to None by default? For example: landmark, None, None

@1adrianb
Owner

Agreed. Let's go with two cases only: if either landmarks_confidence or detected_faces is True, we return 3 values as you suggested; if both are False, we return a single value for now (i.e. only the landmarks). This should simplify the conditional logic while maintaining backward compatibility. This can be unified later, in a more major code revision.
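
The two-case convention agreed on here could look roughly like the sketch below. The helper name `pack_results` is hypothetical, the flag names are borrowed from the docstring proposed earlier in the thread, and filling an unrequested slot with None is an assumption based on the earlier "landmark, None, None" suggestion rather than something the thread confirms.

```python
def pack_results(landmarks, landmark_scores, detected_faces,
                 return_landmark_score=False, return_bboxes=False):
    """Shape the return value according to the two agreed cases."""
    # Case 1: neither extra was requested, so keep the old single-value
    # return for backward compatibility.
    if not return_landmark_score and not return_bboxes:
        return landmarks
    # Case 2: at least one extra was requested, so always return three values.
    # Assumption: an extra that was not requested is returned as None.
    return (
        landmarks,
        landmark_scores if return_landmark_score else None,
        detected_faces if return_bboxes else None,
    )
```

A caller that sets either flag can then always unpack three values, for example `landmarks, scores, bboxes = pack_results(l, s, b, return_bboxes=True)`, while existing callers that pass no flags still receive only the landmarks.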

@MagicFrogSJTU
Contributor Author

#271

@1adrianb closed this as completed May 4, 2021

@hengfei-wang

What is the scale of the confidence score? I got values like 1.7108647, 1.718052, 1.6957333, 1.6364386, 1.5783452, 1.6006193. Is it not in (0, 1)?
