
Does anybody achieve the metric depth estimation on a custom dataset successfully? #68

Open
MichaelWangGo opened this issue Feb 1, 2024 · 16 comments

Comments

@MichaelWangGo

Hi,

This post is to discuss how to achieve metric depth estimation on a custom dataset; in my case I am using the SCARED dataset. If anyone has successfully fine-tuned the model and achieved metric depth estimation, could you tell me which code you modified?

@MichaelWangGo changed the title from "Does anybody fine-tune the metric depth estimation model successfully?" to "Does anybody achieve the metric depth estimation on a custom dataset successfully?" on Feb 1, 2024
@1ssb
Contributor

1ssb commented Feb 3, 2024

I did it on RealEstate10k. Unfortunately, ground-truth depth does not exist for that dataset, so quantitative evaluation is not possible, but anecdotally the results look pretty good.

@Denny-kef

@1ssb Have you seen results like this with the metric depth outdoor checkpoints? Across all of my experiments the sky predictions are not good, although the relative depth model predicts the sky and other "background" regions extremely well!
[image attached]

@1ssb
Contributor

1ssb commented Feb 5, 2024 via email

@Denny-kef

Denny-kef commented Feb 5, 2024

Hi @1ssb thanks for getting back to me! I am using the outdoor checkpoints and just wondering if you (or anyone else) has seen similar results with the metric depth predictions?

@Denny-kef

Denny-kef commented Feb 5, 2024

That's interesting about the metric vs. relative distinction. I could mask out the background using the relative depth network or a lightweight segmentation network, but it seems like there should be a better way...

@1ssb
Contributor

1ssb commented Feb 5, 2024

Oh, this is interesting @Denny-kef. I took a look at your image, and it seems to me it was captured with either a fisheye lens or is otherwise a bit distorted: even to the eye the clouds look much closer than they actually are, which is quite interesting in itself. I am not sure I have an exact answer for this, but it may well be out-of-distribution (OOD) for the model.

@pestrstr

pestrstr commented Feb 6, 2024

@Denny-kef I don't know if this helps with your task, but this is my personal interpretation:
the authors report in the paper that they set the disparity value (the inverse of depth) to 0 for all pixels labeled as "sky" by a semantic segmentation model (see section 3.1 of the paper on arXiv). I haven't seen the implementation details of their code, but this can affect training in different ways:

  • if the disparity value of 0 is treated as invalid and masked out during training, the MDE model is never trained to decode the true disparity/depth of the sky
  • even if the disparity value of 0 is treated as "valid" and not masked from the model during training, the MDE model sees all sky pixels with the same disparity value (0), so the learned features for sky pixels would not be expressive enough to decode other values later

That said, I think the features from their frozen encoder are really powerful for metric depth estimation, but I suspect it would be very hard to use them to produce correct absolute-scale values for the sky.
In relative depth estimation, the overall qualitative quality of the sky predictions could come from the semantic feature alignment done during training (see section 3.3).
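
To make the two cases above concrete, here is a minimal PyTorch sketch of how a training loss could handle sky pixels under either interpretation; the names (pred_disp, gt_disp, sky_mask) are hypothetical and not taken from the Depth Anything code:

```python
import torch
import torch.nn.functional as F

def disparity_loss(pred_disp, gt_disp, sky_mask):
    """Illustrative loss showing the two ways sky pixels could be handled.

    pred_disp, gt_disp: (B, H, W) disparity tensors
    sky_mask: (B, H, W) boolean tensor, True where a segmentation model said "sky"
    """
    # Case 1: sky pixels (disparity forced to 0) are treated as invalid and
    # excluded from the loss, so the model gets no supervision for the sky.
    valid = ~sky_mask
    loss_masked = F.l1_loss(pred_disp[valid], gt_disp[valid])

    # Case 2: sky pixels stay in the loss but are supervised toward disparity 0,
    # so the model only ever learns a single constant value for the sky.
    gt_with_sky_zeroed = torch.where(sky_mask, torch.zeros_like(gt_disp), gt_disp)
    loss_unmasked = F.l1_loss(pred_disp, gt_with_sky_zeroed)

    return loss_masked, loss_unmasked
```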

@loevlie

loevlie commented Feb 8, 2024

The best solution that I found for the "background" issue with metric depth estimation predictions is this:

  1. Retrieve the relative depth map as a secondary output from the metric depth estimation model.
  2. Using that depth map (since it is much better at predicting the background) I was able to generate a binary mask to eliminate things like the sky from my metric depth results.

Importantly, these operations don't add any significant time to inference.
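
For reference, a minimal NumPy sketch of steps 1-2, assuming you already have a metric prediction and a relative prediction for the same image (the function and threshold names here are hypothetical, not from the repo):

```python
import numpy as np

def mask_sky_from_metric(metric_depth, relative_depth, rel_threshold=1e-3):
    """Use the relative output to drop sky/background pixels from the metric map.

    metric_depth: (H, W) metric depth prediction in metres
    relative_depth: (H, W) relative prediction, where values near 0 correspond
                    to the sky / extremely far regions
    """
    sky_mask = relative_depth <= rel_threshold   # True where the pixel is "sky"
    cleaned = metric_depth.copy()
    cleaned[sky_mask] = 0.0                      # or np.nan, so it drops out of a point cloud
    return cleaned, sky_mask
```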

@1ssb
Contributor

1ssb commented Feb 8, 2024

Bottomline: Don't try to predict skies or reflections.

@loevlie

loevlie commented Feb 8, 2024

I was not trying to predict skies; I was trying to remove them from the output depth map so they don't show up in the point cloud. But yes, do not try to predict the depth of the sky or reflections!
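
For completeness, a small sketch of dropping those masked pixels when back-projecting the depth map into a point cloud; the pinhole intrinsics (fx, fy, cx, cy) and the sky_mask are assumed inputs, not something the repo provides:

```python
import numpy as np

def depth_to_point_cloud(depth, sky_mask, fx, fy, cx, cy):
    """Back-project a metric depth map to 3D points, skipping masked (sky) pixels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates, each shaped (H, W)
    keep = (~sky_mask) & (depth > 0)                 # drop sky and invalid pixels
    z = depth[keep]
    x = (u[keep] - cx) * z / fx
    y = (v[keep] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)              # (N, 3) point cloud
```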

@LiheYoung
Owner

LiheYoung commented Feb 9, 2024

Hi @loevlie, if you are trying to detect the sky and remove it, you can try our relative depth models. The output value 0 from these models can be considered as the sky (or extremely far). Alternatively, you can use a pre-trained semantic segmentation model to detect the sky.

@loevlie

loevlie commented Feb 9, 2024

Hi @LiheYoung, yes that works very well! Thank you!

@xiaobh1519

@1ssb Have you seen results like this with the metric depth outdoor checkpoints? Across all of my experiments the sky predictions are not good, although the relative depth model predicts the sky and other "background" regions extremely well! [image attached]

Could you kindly share the parameters you adjusted during fine-tuning? I've been encountering poor performance in my experiments with another dataset, and I've been struggling to resolve the issue. The details of the problem are in #172 (comment).

@andrewhbradley9

andrewhbradley9 commented May 17, 2024

@Denny-kef
Hi Denny, would you be able to explain how you got the metric depth working and your output depth images? I've been trying to run the metric outdoor model on my custom dataset but have been running into a lot of issues. Any help would be greatly appreciated!

@shilpaullas97

I did it on RealEstate10k. Unfortunately, ground-truth depth does not exist for that dataset, so quantitative evaluation is not possible, but anecdotally the results look pretty good.

Hi @1ssb ,

Are you able to train metric depth estimation on a dataset without depth maps (labels)?
Could you please share more details about your training run?

@1ssb
Contributor

1ssb commented Jun 27, 2024 via email
