near real-time?? #17

Closed
monajalal opened this issue Jun 27, 2023 · 8 comments

Comments

@monajalal

Hi @wenbowen123
I am trying to fill in a gap in my own understanding.
So, your method works for 6-DoF pose estimation of novel objects (and novel classes) that were not seen during training and are only shown to the model at inference time.

However, what I don't understand is why, as reported by other users, it takes ~3 hours just to run the demo for the milk object.

Also, in the paper you mention that we would only need the segmentation mask for the first frame of the video; however, in the milk input data from Google Drive, the entire video has segmentation masks.

Any update would be really great, especially about 6-DoF pose for novel objects of novel classes.


@redgreenblue3

redgreenblue3 commented Jun 27, 2023

Regarding segmentation, I think the idea is to run the XMem segmentation code (https://github.com/hkchengrex/XMem).

XMem requires the mask for only one frame and then outputs masks for all the other frames. This worked for me (just follow the setup and instructions in the XMem repo). I think the README.md of this (BundleSDF) project mentions something along those lines (and says that this code couldn't be included in this repo because of copyright issues).
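
Roughly, the flow looks like the sketch below. Note this is only a sketch: `xmem_propagate` is a placeholder for XMem's actual inference code, `first_frame_mask.png` is a made-up filename for the one hand-annotated mask, and the `rgb/`/`masks/` folder layout is assumed from the milk demo data.

```python
import os
import cv2
import numpy as np

def xmem_propagate(frames, first_mask):
    """Placeholder for XMem inference (https://github.com/hkchengrex/XMem):
    given all RGB frames plus the mask for frame 0, return one mask per frame.
    Swap in the actual XMem eval/inference code here."""
    raise NotImplementedError

video_dir = "2022-11-18-15-10-24_milk"  # assumed demo layout: rgb/, depth/, masks/, cam_K.txt
rgb_files = sorted(os.listdir(os.path.join(video_dir, "rgb")))
frames = [cv2.imread(os.path.join(video_dir, "rgb", f)) for f in rgb_files]

# The only hand-made annotation: a binary mask for the first frame.
# ("first_frame_mask.png" is a hypothetical name for this sketch.)
first_mask = cv2.imread(os.path.join(video_dir, "first_frame_mask.png"), cv2.IMREAD_GRAYSCALE)

masks = xmem_propagate(frames, first_mask)

# Write one mask per frame, named like its RGB frame, which is the
# format the milk demo data from Google Drive appears to follow.
os.makedirs(os.path.join(video_dir, "masks"), exist_ok=True)
for fname, mask in zip(rgb_files, masks):
    cv2.imwrite(os.path.join(video_dir, "masks", fname), (mask > 0).astype(np.uint8) * 255)
```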

@wenbowen123
Collaborator

Thanks @redgreenblue3 for the explanation! Yes, and partly for that reason, the current release includes some code refactoring that does not necessarily reproduce the same speed. But we hope this code release can still benefit the community, and we are also looking forward to seeing future work make it blazingly fast. That said, there are many parameters in the config that you can tune; the current config is not optimal for live running. Like I mentioned, settings such as the sync restriction, image resolution, etc. will also affect the speed a lot. @monajalal The V100 is also not a state-of-the-art GPU.
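
For illustration only, tuning these trade-offs might look something like the snippet below. The key names here are hypothetical, so check the actual config.yml in this repo for the real parameter names before changing anything.

```python
# Hypothetical illustration of the speed/quality knobs described above.
# These key names are made up; consult BundleSDF's actual config.yml
# for the real parameter names.
import yaml

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

cfg["downscale"] = 2             # process images at lower resolution for speed
cfg["sync_every_n_frames"] = 10  # looser sync restriction -> fewer blocking waits
cfg["max_keyframes"] = 20        # smaller keyframe pool for bundle adjustment

with open("config_fast.yml", "w") as f:
    yaml.safe_dump(cfg, f)
```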

@monajalal
Author

While I understand the V100 is not a state-of-the-art GPU, could you please state the minimum GPU needed for your experiments? Is it a 3090 Ti with 24 GB of VRAM, or a GPU with certain specs and compute capability (CC)?

Thanks for your help with the open-source community.

As someone who is interested in reproducing the results but doesn't necessarily have access to a 3090 Ti, I do have access to an Azure subscription and could potentially find a suitable VM if I knew enough of the specs, so this would be a great help.
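
(For anyone else shopping for a cloud VM, a quick way to print the name, VRAM, and compute capability of whatever GPU you are given, assuming PyTorch is installed:)

```python
import torch

# Print the name, total VRAM, and compute capability of GPU 0.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB, CC {props.major}.{props.minor}")
```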

@wenbowen123
Collaborator

#18

@wang-shuaikang

> Regarding segmentation, I think the idea is to run the XMem segmentation code (https://github.com/hkchengrex/XMem).
>
> XMem requires the mask for only one frame and then outputs masks for all the other frames. This worked for me (just follow the setup and instructions in the XMem repo). I think the README.md of this (BundleSDF) project mentions something along those lines (and says that this code couldn't be included in this repo because of copyright issues).

May I ask how to combine the two? Is the idea to first obtain the segmentation masks from XMem, and then use them to assemble the dataset that BundleSDF requires? See the sketch below for what I imagine.
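
To make my question concrete, here is a rough sanity check I imagine running after XMem and before BundleSDF. The rgb/, depth/, masks/, cam_K.txt layout is my assumption based on the milk demo data; please correct me if the expected structure is different.

```python
import os

video_dir = "my_video"  # assumed layout (from the milk demo): rgb/, depth/, masks/, cam_K.txt

rgb = sorted(os.listdir(os.path.join(video_dir, "rgb")))
masks = set(os.listdir(os.path.join(video_dir, "masks")))

# Every RGB frame should have a same-named mask produced by XMem,
# since BundleSDF reads the two folders side by side.
missing = [f for f in rgb if f not in masks]
if missing:
    raise SystemExit(f"{len(missing)} frames are missing masks, e.g. {missing[:3]}")
print(f"OK: {len(rgb)} frames, all with masks. Ready to run BundleSDF on {video_dir}.")
```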

@Jingranxia

> Hi @wenbowen123
> I am trying to fill in a gap in my own understanding.
> So, your method works for 6-DoF pose estimation of novel objects (and novel classes) that were not seen during training and are only shown to the model at inference time.
>
> However, what I don't understand is why, as reported by other users, it takes ~3 hours just to run the demo for the milk object.
>
> Also, in the paper you mention that we would only need the segmentation mask for the first frame of the video; however, in the milk input data from Google Drive, the entire video has segmentation masks.
>
> Any update would be really great, especially about 6-DoF pose for novel objects of novel classes.

Hello, thank you for this work. I have the same question: why does running the demo command take ~3 hours for some people? Doesn't this conflict with the author's claim of near real-time?

@Jingranxia

Thank you for your excellent work @wenbowen123. Running run_demo.py took an hour; does that conflict with what you said about near real-time? My understanding is not thorough enough, so what exactly does this pipeline do? It would be great if you could answer my questions, and sorry for taking up some of your valuable time.

@monajalal
Author

@wenbowen123 With an RTX 6000 Ada GPU with 48 GB of VRAM, it is not (near) real-time even if we pass in precomputed masks for every frame of the video (and don't count that time). Are you planning to release the version of the code that corresponds to the statement in your abstract, as referenced in the OP?
