Support for Multi-Object Multi-Camera Tracking #4915
Comments
@saurabheights, thanks for submitting the issue.
CVAT doesn't have batch processing for now; I remember that users have complained about that.
Also, we have the OpenCV tracker and #4886 from @dschoerk, which we will merge soon.
The number of steps is large, but it will give you the best quality. A related article: https://arxiv.org/pdf/2003.07618.pdf
@saurabheights, I noticed you are using v2.1.0. When using any single-object tracker, you should experiment with the develop branch: until recently there was an off-by-one error in the frame number of the prediction, see #4870 (the relevant commit is linked in that issue; use any version newer than it).
I would like to see this as a new feature! Currently, single-object tracking is stateless on the nuclio side, which means the tracker state is sent to nuclio for each frame. Without having tested it, I think this is a significant computation overhead; at some point I had a tracker state of ~3 MB for the TransT implementation, but I haven't investigated it further. For Siamese trackers like SiamMask, and also TransT, this state includes at least the cropped search region and the template image in some shape or form. Just an FYI: TransT is slightly slower for me than SiamMask (using an RTX 3090), but it is far more accurate in my use case of pedestrian tracking. A neat benefit of Siamese trackers over object detection is that they are typically class-agnostic.
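For context on the overhead described above, here is a minimal, self-contained sketch (not CVAT's actual nuclio code; the state contents and array shapes are illustrative assumptions) that measures how large a serialized Siamese-style tracker state can get when it has to travel with every frame request:

```python
# Toy "tracker state" (template crop + search region, as for Siamese
# trackers) serialized the way a stateless per-frame request might do it.
# Shapes and fields are illustrative only, not TransT's or SiamMask's.
import base64
import pickle

import numpy as np

state = {
    "template": np.zeros((127, 127, 3), dtype=np.uint8),       # exemplar crop
    "search_region": np.zeros((255, 255, 3), dtype=np.uint8),  # per-frame crop
    "score_maps": np.zeros((5, 25, 25), dtype=np.float32),     # model internals
}

payload = base64.b64encode(pickle.dumps(state)).decode("ascii")
print(f"per-frame state payload: {len(payload) / 1e6:.2f} MB")
# When the tracker function is stateless, a payload like this travels
# client -> nuclio -> client on every single frame, which is the
# computation/transfer overhead mentioned in the comment above.
```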
My actions before raising this issue
Expected Behaviour
Current Behaviour
I need to track each object visible on 4 cameras. AFAIK, CVAT doesn't support multiple cameras. To work around this problem, I have created a single video by tiling the videos from the 4 cameras.
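For reference, a tiling workaround like the one described above can be sketched with OpenCV as below (file names, per-tile resolution, and FPS are placeholder assumptions, not the author's actual setup):

```python
# Minimal sketch: tile four synchronized camera videos into one 2x2 video.
import cv2
import numpy as np

caps = [cv2.VideoCapture(f"cam{i}.mp4") for i in range(4)]  # hypothetical paths
w, h, fps = 960, 540, 30  # per-tile size and frame rate (assumed)
out = cv2.VideoWriter("tiled.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (2 * w, 2 * h))

while True:
    frames = []
    for cap in caps:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (w, h)))
    if len(frames) < 4:
        break  # stop as soon as any camera runs out of frames
    top = np.hstack(frames[:2])
    bottom = np.hstack(frames[2:])
    out.write(np.vstack([top, bottom]))

for cap in caps:
    cap.release()
out.release()
```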
However, to annotate faster, I would prefer some form of automatic annotation, or at least semi-automatic annotation with minimal supervision. I have tested an object detection model as well as SiamMask, but both come with their own problems.
Faster R-CNN doesn't generate tracks. Also, generating automatic annotations for a 30-minute video took a few hours (I need to profile it, but it is quite slow). Q. Is there a way to speed this up, for example by increasing the inference batch size?
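Regarding the batch-size question: this is not CVAT's serverless detector code, but as a standalone illustration of the batching idea, a sketch with torchvision's Faster R-CNN might look like the following (batch size and device handling are assumptions):

```python
# Batched inference sketch for torchvision's Faster R-CNN.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval().to("cuda" if torch.cuda.is_available() else "cpu")
device = next(model.parameters()).device


def detect_batch(frames, batch_size=8):
    """frames: list of HxWx3 uint8 numpy arrays; returns one prediction dict per frame."""
    results = []
    with torch.no_grad():
        for i in range(0, len(frames), batch_size):
            batch = [
                torch.from_numpy(f).permute(2, 0, 1).float().div(255).to(device)
                for f in frames[i:i + batch_size]
            ]
            # torchvision detection models accept a list of image tensors
            results.extend(model(batch))
    return results
```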
SiamMask -
Q. Can you please provide any ideas to improve this process in general, or correct me where you think I might be doing it wrong?
Another idea is to add an object detection and tracking model that doesn't require seeds, and use it instead of SiamMask to generate automatic annotations before the manual annotation pass. However, I am not sure whether tracking annotations generated by a model can be directly ingested into CVAT.
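On ingesting externally generated tracks: CVAT can import annotations in several formats, including MOT 1.1. As a rough sketch (the exact column layout and archive structure CVAT expects should be verified against its import documentation; `tracks` here is a hypothetical list of model output), the tracker output could be dumped to a MOT-style text file like this:

```python
# Hypothetical tracker output: (frame_idx, track_id, x, y, w, h) per detection.
tracks = [
    (1, 1, 100.0, 150.0, 60.0, 120.0),
    (2, 1, 104.0, 151.0, 60.0, 120.0),
]

with open("gt.txt", "w") as f:
    for frame, track_id, x, y, w, h in tracks:
        # Common MOTChallenge layout:
        # frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z
        f.write(f"{frame},{track_id},{x:.2f},{y:.2f},{w:.2f},{h:.2f},1,-1,-1,-1\n")
```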
Your Environment
Git hash commit (git log -1): commit 3bd7c7e422d57986bd629da07214a3a3e666c68c (HEAD -> master, tag: v2.1.0, origin/master)
Docker version (docker version, e.g. Docker 17.0.05): 20.10.9