
Tracking more than 8 frames per sequence #10

Open · phongnhhn92 opened this issue Dec 5, 2022 · 8 comments

Comments

@phongnhhn92

Hi,

In demo.py, when I tried to change S = 8 to S = 10, the model doesn't work. Is the model hard-coded to only work with 8 input frames at a time?

@aharley
Owner

aharley commented Dec 12, 2022

Yes, the released model weights are for S=8. For longer tracking, you need to chain the model over time. There is code for this in chain_demo.py.
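
For reference, here is a minimal sketch of that chaining idea. It assumes the call signature used in demo.py (preds, preds_anim, vis_e, stats = model(xy, rgbs, iters=6)), and unlike chain_demo.py it re-initializes on a fixed stride of S-1 frames instead of choosing the step size from the visibility scores:

    import torch

    S = 8  # window length supported by the released weights

    def chain_track_fixed(model, rgbs, xy0, iters=6):
        # rgbs: B,T,3,H,W video tensor; xy0: B,N,2 start positions in frame 0
        B, T = rgbs.shape[:2]
        trajs = [xy0]            # one B,N,2 entry per frame
        cur, xy = 0, xy0
        while cur < T - 1:
            window = rgbs[:, cur:cur+S]
            if window.shape[1] < S:  # pad a short tail window by repeating the last frame
                pad = window[:, -1:].repeat(1, S - window.shape[1], 1, 1, 1)
                window = torch.cat([window, pad], dim=1)
            preds, _, _, _ = model(xy, window, iters=iters)
            traj = preds[-1]     # B,S,N,2 trajectories within this window
            keep = min(S - 1, T - 1 - cur)  # never keep predictions on padded frames
            trajs.extend(traj[:, s] for s in range(1, keep + 1))
            xy = traj[:, keep]   # re-initialize from the last kept frame
            cur += keep
        return torch.stack(trajs, dim=1)  # B,T,N,2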

@phongnhhn92
Author

Hi @aharley, thanks for your reply! I have another question: how can I obtain dense optical flow output, similar to RAFT? In your method, in both examples, we need to choose a small number of points to track. In my case, I need the trajectories of every pixel in the starting frame.

@aharley
Owner

aharley commented Dec 12, 2022

In the part where you specify the start locations:
https://github.com/aharley/pips/blob/main/demo.py#L31-L37

change it to a dense grid, like this:

    grid_y, grid_x = utils.basic.meshgrid2d(B, H, W, stack=False, norm=False, device='cuda')
    grid_y = grid_y.reshape(B, -1)  # flatten so each pixel is one point
    grid_x = grid_x.reshape(B, -1)
    xy = torch.stack([grid_x, grid_y], dim=-1) # B, H*W, 2

If you run out of memory when trying to run the model for this many particles, split the list into batches of a good size for your GPU, like this: https://github.com/aharley/pips/blob/main/test_on_davis.py#L111-L125
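
For concreteness, a hedged sketch of that batching loop (shapes follow demo.py; max_pts is a knob you would tune to your GPU):

    max_pts = 4096  # tune to your GPU memory
    xy = xy.reshape(B, -1, 2)  # B, H*W, 2 dense start positions
    traj_chunks = []
    for start in range(0, xy.shape[1], max_pts):
        xy_chunk = xy[:, start:start+max_pts]       # B, n, 2
        preds, _, _, _ = model(xy_chunk, rgbs, iters=6)
        traj_chunks.append(preds[-1])               # B, S, n, 2
    trajs_e = torch.cat(traj_chunks, dim=2)         # B, S, H*W, 2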

@phongnhhn92
Author

Thanks for the pointer! I will close the issue now.

@phongnhhn92
Author

Hi @aharley, I have a further question: in the DAVIS code, I see that you only predict flow for the entire image over a sequence of 8 frames. However, in chain_demo.py, you show an example with a single tracked pixel.

I wonder if you have tried extending chain_demo.py to dense predictions? I assume the confidence-score thresholding is important here to make sure the trajectories are correct.

@phongnhhn92 reopened this Dec 16, 2022
@aharley
Owner

aharley commented Dec 19, 2022

Tracking all pixels is generally very memory-heavy, and pairing that with chaining is tricky but doable. The tricky part is that the chaining technique lets each target choose a variable-length step size for re-initializing the tracker (within the S=8 window, by looking at the confidence/visibility scores, like you said), so you need some clever bookkeeping to parallelize as much as possible. I think you can aim for a model that runs at most K forward passes, where K is the number of frames times the amount you need to serialize (e.g., on an 80GB GPU maybe no serialization is necessary, but on a 12GB GPU maybe you do 4 forward passes to get all the pixels' trajectories).
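
As a rough sketch of that serialization (simplified to a fixed re-init stride, so it skips the per-target variable-step bookkeeping described above; chain_track_fixed is the sketch from my earlier comment):

    import torch

    def dense_chain_track(model, rgbs, xy_dense, max_pts=4096):
        # xy_dense: B, H*W, 2 start positions; max_pts sized for your GPU
        chunks = []
        for start in range(0, xy_dense.shape[1], max_pts):
            chunk = xy_dense[:, start:start+max_pts]
            chunks.append(chain_track_fixed(model, rgbs, chunk))
        return torch.cat(chunks, dim=2)  # B, T, H*W, 2
    # total forward passes is roughly num_chunks * num_windows, i.e. the K above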

@dkhanna511

Hi, I also wanted to figure out how to track multiple points in a frame for a longer duration. I've run demo.py and chain_demo.py, and as I understand it, demo.py takes a grid of points while chain_demo.py takes only one point. I would like to run it on longer sequences with different data to make sense of the outputs. Can either of these files be changed to make that work?

@dkhanna511

Hi @phongnhhn92, did you have any luck running chain_demo.py with multiple points simultaneously and creating a single GIF?
