
Evaluating on KITTI Improved Ground Truth #21

Open
alelopes opened this issue Nov 22, 2023 · 1 comment

alelopes commented Nov 22, 2023

Hi! First of all, congratulations on this great work!

I'm evaluating recent depth estimation techniques, and I'm wondering if you could help me validate the results.

I downloaded your SwinLarge predictions and wanted to evaluate them directly against the KITTI Improved Ground Truth [1] by comparing your output maps with the GT.

I followed your instructions: I divided by 256 (as with the GT data) and interpolated the model output just as your code does, using F.interpolate with mode=bicubic and align_corners=True.

I'm following the Monodepth2 evaluation procedure for the comparison, so I am not applying Garg's crop here.
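
For context, this is roughly the crop mask that Monodepth2-style evaluation applies (the standard Garg crop ratios, written from memory, so please double-check against Monodepth2's evaluate_depth.py) and that I am deliberately skipping:

```python
import numpy as np

def garg_crop_mask(gt_height: int, gt_width: int) -> np.ndarray:
    """Boolean mask for the Garg crop used in Monodepth2-style KITTI evaluation."""
    crop = np.array([0.40810811 * gt_height, 0.99189189 * gt_height,
                     0.03594771 * gt_width,  0.96405229 * gt_width]).astype(np.int32)
    mask = np.zeros((gt_height, gt_width), dtype=bool)
    mask[crop[0]:crop[1], crop[2]:crop[3]] = True
    return mask

# If I were using it, I would AND it with the valid-depth mask before computing metrics:
# mask = np.logical_and(mask, garg_crop_mask(gt_height, gt_width))
```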

The results are the following:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| ------- | ------ | ---- | -------- | -- | -- | -- |
| 0.086 | 0.539 | 4.228 | 0.153 | 0.913 | 0.979 | 0.991 |

I was expecting much lower error values. Could you please validate these steps? Are the SwinLarge predictions giving the correct output?

The code is quite simple, and I'll share it below so you can check it (if you want).

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F


def compute_errors(gt, pred):
    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)

    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3


MIN_DEPTH = 1e-3
MAX_DEPTH = 80

# pred_path / gt_path: paths to one prediction PNG and the matching improved-GT PNG.
# Both are stored as 16-bit PNGs scaled by 256.
pred = cv2.imread(pred_path, -1).astype(np.float32) / 256.0
gt_depth = cv2.imread(gt_path, -1).astype(np.float32) / 256.0
gt_height, gt_width = gt_depth.shape[:2]

# Valid pixels only: the improved GT is sparse (invalid pixels are 0).
mask = np.logical_and(gt_depth > MIN_DEPTH, gt_depth < MAX_DEPTH)

# Resize the prediction to the GT resolution, as done on the model output.
pred_depth = F.interpolate(
    torch.from_numpy(pred).unsqueeze(0).unsqueeze(0),
    (gt_height, gt_width),
    mode="bicubic",
    align_corners=True,
).squeeze().numpy()

# Clamp to the evaluation range and keep only valid GT pixels.
pred_depth = np.clip(pred_depth, MIN_DEPTH, MAX_DEPTH)

print(compute_errors(gt_depth[mask], pred_depth[mask]))
```

I was expecting values lower than those reported in the paper (Abs Rel probably below 0.05), but I actually got much higher ones (Abs Rel of 0.086).

Thanks again for your work!

Reference:

[1] Uhrig, Jonas, et al. "Sparsity Invariant CNNs." 2017 International Conference on 3D Vision (3DV). IEEE, 2017.

@lpiccinelli-eth (Collaborator)

Hey, thank you for the kind words.

The evaluation crop is usually quite important.
Anyway, the validation RGB images typically correspond to the cropped validation images with shape (352, 1216); you can look at kitti.py, lines 171-176.
The snippet you provided seems fine, so the problem lies in the data: either the predictions I provided or the validation ground truth.
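
For reference, a (352, 1216) crop on KITTI is usually a bottom-center crop along these lines; this is only a sketch of the idea, not the exact code from kitti.py:

```python
import numpy as np

def kb_crop(img: np.ndarray, crop_h: int = 352, crop_w: int = 1216) -> np.ndarray:
    """Bottom-center crop to the KITTI benchmark resolution (sketch; see kitti.py lines 171-176)."""
    h, w = img.shape[:2]
    top = h - crop_h          # keep the bottom rows of the image
    left = (w - crop_w) // 2  # center the crop horizontally
    return img[top:top + crop_h, left:left + crop_w]
```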

I am adding here the results with the Eigen crop and with no crop, obtained by running test.py with the KITTI config and the SwinLarge model, changing the "crop" flag in the .json file ("eigen" for the Eigen crop, "any" for no crop).
| Crop | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| ---- | ------- | ------ | ---- | -------- | -- | -- | -- |
| No-crop | 0.0574 | 0.201 | 2.463 | 0.088 | 0.966 | 0.995 | 0.999 |
| Eigen | 0.059 | 0.216 | 2.573 | 0.089 | 0.965 | 0.995 | 0.999 |
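
For completeness, the "eigen" setting corresponds to an evaluation mask commonly built with the ratios below (a sketch using the widely used values, e.g. from BTS; the implementation here may differ slightly):

```python
import numpy as np

def eigen_crop_mask(gt_height: int, gt_width: int) -> np.ndarray:
    """Boolean mask for the Eigen evaluation crop (commonly used ratios)."""
    mask = np.zeros((gt_height, gt_width), dtype=bool)
    mask[int(0.3324324 * gt_height):int(0.91351351 * gt_height),
         int(0.0359477 * gt_width):int(0.96405229 * gt_width)] = True
    return mask
```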
