Reproducing results with SuperPoint + MNN on Megadepth-1500 #56

Closed
guipotje opened this issue Aug 26, 2023 · 2 comments

@guipotje

Hello, first of all, thank you for the great work and for making the license permissive; it will surely boost research in image matching!

I am trying to reproduce SuperPoint + MNN as a baseline. For that, I follow the protocol of the paper, trying to achieve results as close as possible to the values reported in Table 2 of the LightGlue paper. I am taking the following steps:

  • Resize the image such that the longer dimension is 1600 px;
  • Extract the top 2048 keypoints using the default SuperPoint parameters defined in this repo;
  • Match descriptors using NN + mutual check (sketched after this list);
  • Use OpenCV findEssentialMat with prob=0.99999 and the default "classic" cv2.RANSAC. As those details are not explicitly mentioned, I basically follow the LoFTR protocol defined in their original repo, as suggested in the LightGlue paper, to compute pose AUC @ [5, 10, 20];
  • Test several inlier thresholds in the range [0.25, 2.5] px. The best results I can achieve are the following:
ransac_thr = 1.5 
{'auc@5': 0.251782299270867, 'auc@10': 0.3987322068921645, 'auc@20': 0.5415882032042043}
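For concreteness, here is roughly what I mean by NN + mutual check (a minimal sketch with my own helper name, assuming L2-normalized SuperPoint descriptors as N x D numpy arrays):

import numpy as np

def mutual_nn_match(desc0, desc1):
    # Cosine similarity between all descriptor pairs (equivalent to
    # L2-distance ranking for unit-norm descriptors).
    sim = desc0 @ desc1.T
    nn01 = sim.argmax(axis=1)  # best match in image 1 for each keypoint of image 0
    nn10 = sim.argmax(axis=0)  # best match in image 0 for each keypoint of image 1
    ids0 = np.arange(len(desc0))
    mutual = nn10[nn01] == ids0  # keep pairs that are each other's nearest neighbor
    return np.stack([ids0[mutual], nn01[mutual]], axis=1)  # (K, 2) match indices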

I also attempted to run LO-RANSAC instead of cv2.RANSAC, since it gives a great boost in AUC in Table 2, but without success. I tested implementations from both pydegensac and cv2's USAC variants, but the results were very far from the reported AUC@5 of 0.51, even after trying several inlier thresholds and different flags. Could you kindly provide more details on the SuperPoint parameters, the RANSAC implementation, and the hyperparameters used to achieve these results, specifically for SuperPoint + MNN matching (Table 2)?

Thank you in advance!

@Phil26AT
Collaborator

Phil26AT commented Aug 31, 2023

Hi @guipotje,

Sorry for the late reply. The pipeline is: resize to 1600 px -> inference -> rescale keypoints to the original image size -> estimate the relative pose. We use the top 2048 keypoints and set the detection threshold to 0. Your threshold range is correct; we found the best results at th=1 for SP+NN.
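The rescaling step looks roughly like this (a sketch with an illustrative helper name, assuming (N, 2) keypoints in (x, y) pixel coordinates of the resized image):

import numpy as np

def rescale_keypoints(kpts, resized_wh, original_wh):
    # kpts: (N, 2) array of (x, y) coordinates detected on the resized image.
    # resized_wh / original_wh: (width, height) of the resized and original images.
    scale = np.array(original_wh, dtype=np.float64) / np.array(resized_wh, dtype=np.float64)
    return kpts * scale[None]

This keeps the keypoints consistent with the original intrinsics K0/K1 used for pose estimation.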

Here is the pose estimation code for OpenCV:

import cv2
import numpy as np


def estimate_relative_pose(
    kpts0, kpts1, K0, K1, thresh, conf=0.99999, solver=cv2.RANSAC
):
    # The five-point solver needs at least 5 correspondences.
    if len(kpts0) < 5:
        return None

    # Normalize the pixel threshold by the mean focal length.
    f_mean = np.mean([K0[0, 0], K1[0, 0], K0[1, 1], K1[1, 1]])
    norm_thresh = thresh / f_mean

    # Convert keypoints to normalized camera coordinates.
    kpts0 = (kpts0 - K0[[0, 1], [2, 2]][None]) / K0[[0, 1], [0, 1]][None]
    kpts1 = (kpts1 - K1[[0, 1], [2, 2]][None]) / K1[[0, 1], [0, 1]][None]

    E, mask = cv2.findEssentialMat(
        kpts0, kpts1, np.eye(3), threshold=norm_thresh, prob=conf, method=solver
    )

    if E is None:
        return None

    # findEssentialMat may return several stacked 3x3 candidates;
    # keep the decomposition that yields the most inliers.
    best_num_inliers = 0
    ret = None
    for _E in np.split(E, len(E) // 3):
        n, R, t, _ = cv2.recoverPose(_E, kpts0, kpts1, np.eye(3), 1e9, mask=mask)
        if n > best_num_inliers:
            best_num_inliers = n
            ret = (R, t[:, 0], mask.ravel() > 0)
    return ret
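For reference, it can then be called like this (variable names are illustrative; mkpts0/mkpts1 are the matched keypoints in original-image pixel coordinates):

ret = estimate_relative_pose(mkpts0, mkpts1, K0, K1, thresh=1.0)
if ret is not None:
    R, t, inliers = ret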

For the best results (LO-RANSAC) we used the excellent PoseLib, which provides Python bindings. There, we tested thresholds in the range [0.5, 3.0].

Here is a small script for poselib:

import poselib


def intrinsics_to_camera(K):
    # Convert a 3x3 intrinsics matrix into a PoseLib camera dict.
    # Width/height are approximated from the principal point.
    px, py = K[0, 2], K[1, 2]
    fx, fy = K[0, 0], K[1, 1]
    return {
        "model": "PINHOLE",
        "width": int(2 * px),
        "height": int(2 * py),
        "params": [fx, fy, px, py],
    }


# th is the inlier threshold in pixels (we searched the range [0.5, 3.0]).
M, info = poselib.estimate_relative_pose(
    kpts0, kpts1,
    intrinsics_to_camera(K0),
    intrinsics_to_camera(K1),
    {"max_epipolar_error": th},
)

R, t, inl = M.R, M.t, info["inliers"]

@guipotje
Author

Hello @Phil26AT, thank you very much for the detailed answer!

PoseLib indeed provides impressive gains in pose accuracy. After following your suggestions, I was able to reproduce the SuperPoint results by running the suggested pipeline, but only when using MNN + a ratio test with r = 0.95 (31.4 AUC@5, sketched below); with MNN alone, my results are worse than those reported in Table 2 (24.3 AUC@5). However, I think this is sufficient to validate the baseline. Thanks a lot!
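For reference, the ratio test I combine with MNN looks roughly like this (a sketch with my own helper name, assuming L2-normalized descriptors, in the style of the mutual-check sketch above):

import numpy as np

def mnn_ratio_match(desc0, desc1, ratio=0.95):
    sim = desc0 @ desc1.T  # cosine similarities for unit-norm descriptors
    nn01 = sim.argmax(axis=1)
    nn10 = sim.argmax(axis=0)
    ids0 = np.arange(len(desc0))
    mutual = nn10[nn01] == ids0
    # Lowe-style ratio test on descriptor distances: keep a match only if the
    # best distance is at most `ratio` times the second-best distance.
    # For unit-norm descriptors, d = sqrt(2 - 2 * sim).
    top2 = np.partition(sim, -2, axis=1)
    d1 = np.sqrt(np.clip(2 - 2 * top2[:, -1], 0, None))  # best (smallest) distance
    d2 = np.sqrt(np.clip(2 - 2 * top2[:, -2], 0, None))  # second-best distance
    keep = mutual & (d1 <= ratio * d2)
    return np.stack([ids0[keep], nn01[keep]], axis=1)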
