near real-time?? #17

Closed
monajalal opened this issue Jun 27, 2023 · 8 comments

Comments

@monajalal

Hi @wenbowen123
I am trying to fill in a gap in my own understanding.
So, your method works for 6-DoF pose estimation of novel objects (and novel classes) that were not seen during training and are only shown to the model at inference time.

However, what I don't understand is why, as reported by other users, it takes ~3 hours just to run the demo for the milk object.

Also, in the paper you mention that we would only need the segmentation mask for the first frame of the video; however, in the milk input data from Google Drive, the entire video has segmentation masks.

Any update would be really great, especially about 6-DoF pose for novel objects of novel classes.


@redgreenblue3

redgreenblue3 commented Jun 27, 2023

Regarding segmentation, I think the idea is to run the XMem segmentation code (https://github.com/hkchengrex/XMem).

XMem requires the mask for only one frame and then outputs masks for all the other frames. This worked for me (just follow the setup and instructions in the XMem repo). I think the README.md of this (BundleSDF) project mentions something along those lines (and says that this code couldn't be included in this repo because of copyright issues).
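
Roughly, the flow looks like the sketch below. Note this is only a sketch: `xmem_propagate` is a placeholder for XMem's actual inference code, `first_frame_mask.png` is a made-up filename for the one hand-annotated mask, and the `rgb/`/`masks/` folder layout is assumed from the milk demo data.

```python
import os
import cv2
import numpy as np

def xmem_propagate(frames, first_mask):
    """Placeholder for XMem inference (https://github.com/hkchengrex/XMem):
    given all RGB frames plus the mask for frame 0, return one mask per frame.
    Swap in the actual XMem eval/inference code here."""
    raise NotImplementedError

video_dir = "2022-11-18-15-10-24_milk"  # assumed demo layout: rgb/, depth/, masks/, cam_K.txt
rgb_files = sorted(os.listdir(os.path.join(video_dir, "rgb")))
frames = [cv2.imread(os.path.join(video_dir, "rgb", f)) for f in rgb_files]

# The only hand-made annotation: a binary mask for the first frame.
# ("first_frame_mask.png" is a hypothetical name for this sketch.)
first_mask = cv2.imread(os.path.join(video_dir, "first_frame_mask.png"), cv2.IMREAD_GRAYSCALE)

masks = xmem_propagate(frames, first_mask)

# Write one mask per frame, named like its RGB frame, which is the
# format the milk demo data from Google Drive appears to follow.
os.makedirs(os.path.join(video_dir, "masks"), exist_ok=True)
for fname, mask in zip(rgb_files, masks):
    cv2.imwrite(os.path.join(video_dir, "masks", fname), (mask > 0).astype(np.uint8) * 255)
```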

@wenbowen123
Collaborator

Thanks @redgreenblue3 for the explanation! Yes, and partly for that reason, the current release includes some code refactoring that does not necessarily reproduce the same speed. But we hope this code release can still benefit the community, and we are also looking forward to seeing future work make it blazingly fast. That said, there are many parameters in the config that you can tune; the current config is not optimal for live running. Like I mentioned, settings such as the sync restriction, image resolution, etc. will also affect the speed a lot. @monajalal The V100 is also not a state-of-the-art GPU.
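
For illustration only, tuning these trade-offs might look something like the snippet below. The key names here are hypothetical, so check the actual config.yml in this repo for the real parameter names before changing anything.

```python
# Hypothetical illustration of the speed/quality knobs described above.
# These key names are made up; consult BundleSDF's actual config.yml
# for the real parameter names.
import yaml

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

cfg["downscale"] = 2             # process images at lower resolution for speed
cfg["sync_every_n_frames"] = 10  # looser sync restriction -> fewer blocking waits
cfg["max_keyframes"] = 20        # smaller keyframe pool for bundle adjustment

with open("config_fast.yml", "w") as f:
    yaml.safe_dump(cfg, f)
```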

@monajalal
Author

While I understand the V100 is not a state-of-the-art GPU, could you please state the minimum GPU needed for your experiments? Is it a 3090 Ti with 24 GB of VRAM, or a GPU with certain specs and compute capability (CC)?

Thanks for your help with the open-source community.

As someone who is interested in reproducing the results but doesn't necessarily have access to a 3090 Ti, I do have access to an Azure subscription and could potentially find a suitable VM if I knew enough of the specs, so this would be a great help.
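
(For anyone else shopping for a cloud VM, a quick way to print the name, VRAM, and compute capability of whatever GPU you are given, assuming PyTorch is installed:)

```python
import torch

# Print the name, total VRAM, and compute capability of GPU 0.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB, CC {props.major}.{props.minor}")
```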

@wenbowen123
Collaborator

#18

@wang-shuaikang

> Regarding segmentation, I think the idea is to run the XMem segmentation code (https://github.com/hkchengrex/XMem).
>
> XMem requires the mask for only one frame and then outputs masks for all the other frames. This worked for me (just follow the setup and instructions in the XMem repo). I think the README.md of this (BundleSDF) project mentions something along those lines (and says that this code couldn't be included in this repo because of copyright issues).

May I ask how to combine the two? Is the idea to first obtain the segmentation masks from XMem, and then use them to assemble the dataset that BundleSDF requires? See the sketch below for what I imagine.
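
To make my question concrete, here is a rough sanity check I imagine running after XMem and before BundleSDF. The rgb/, depth/, masks/, cam_K.txt layout is my assumption based on the milk demo data; please correct me if the expected structure is different.

```python
import os

video_dir = "my_video"  # assumed layout (from the milk demo): rgb/, depth/, masks/, cam_K.txt

rgb = sorted(os.listdir(os.path.join(video_dir, "rgb")))
masks = set(os.listdir(os.path.join(video_dir, "masks")))

# Every RGB frame should have a same-named mask produced by XMem,
# since BundleSDF reads the two folders side by side.
missing = [f for f in rgb if f not in masks]
if missing:
    raise SystemExit(f"{len(missing)} frames are missing masks, e.g. {missing[:3]}")
print(f"OK: {len(rgb)} frames, all with masks. Ready to run BundleSDF on {video_dir}.")
```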

@Jingranxia

> Hi @wenbowen123
> I am trying to fill in a gap in my own understanding.
> So, your method works for 6-DoF pose estimation of novel objects (and novel classes) that were not seen during training and are only shown to the model at inference time.
>
> However, what I don't understand is why, as reported by other users, it takes ~3 hours just to run the demo for the milk object.
>
> Also, in the paper you mention that we would only need the segmentation mask for the first frame of the video; however, in the milk input data from Google Drive, the entire video has segmentation masks.
>
> Any update would be really great, especially about 6-DoF pose for novel objects of novel classes.

Hello, thank you for this work. I have the same question: why does running the demo command take ~3 hours for some people? Doesn't this conflict with the author's claim of near real-time?

@Jingranxia

Thank you for your excellent work @wenbowen123. Running run_demo.py took an hour; does that conflict with what you said about near real-time? My understanding is not thorough enough, so what exactly does this pipeline do? It would be great if you could answer my questions, and sorry for taking up some of your valuable time.

@monajalal
Author

@wenbowen123 With an RTX 6000 Ada GPU with 48 GB of VRAM, it is not (near) real-time even if we pass in precomputed masks for every frame of the video (and don't count that time). Are you planning to release the version of the code that corresponds to the statement in your abstract, as referenced in the OP?
