DistributedVideoSampler IndexError #62

Closed
imhgchoi opened this issue Mar 14, 2022 · 5 comments

Comments

@imhgchoi

Hi, I'm trying to train with "sh track_exps/crowdhuman_mot_trainhalf.sh" on MOT20, starting from the pretrained model "crowdhuman_final.pth".
My GPU environment is 8 RTX 3090s, but I keep getting the error below.
Has anyone run into the same issue?
Thanks

Traceback (most recent call last):
  File "main_track.py", line 390, in <module>
    main(args)
  File "main_track.py", line 195, in main
    sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
    split_flags = [c[0] for c in chunks]
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
    split_flags = [c[0] for c in chunks]
IndexError: index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "main_track.py", line 390, in <module>
    main(args)
  File "main_track.py", line 195, in main
    sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
    split_flags = [c[0] for c in chunks]
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
    split_flags = [c[0] for c in chunks]
IndexError: index 0 is out of bounds for axis 0 with size 0
Done (t=2.44s)
creating index...
Traceback (most recent call last):
  File "main_track.py", line 390, in <module>
    main(args)
  File "main_track.py", line 195, in main
    sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
    split_flags = [c[0] for c in chunks]
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
    split_flags = [c[0] for c in chunks]
IndexError: index 0 is out of bounds for axis 0 with size 0

@PeizeSun
Owner

Hi~
You are using 8 GPUs, but MOT20 has fewer than 8 videos, so some GPUs receive no input. Try reducing the number of GPUs so that it is no larger than the number of videos.
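
To illustrate why extra GPUs lead to this IndexError, here is a minimal sketch. It assumes the sampler divides the video list into one NumPy chunk per process, which is consistent with the NumPy-style error message in the traceback but is not confirmed by it. The helpers `split_videos` and `check_world_size` and the count of 4 videos are hypothetical, not taken from TransTrack.

```python
import numpy as np


def split_videos(video_ids, num_replicas):
    """Split video ids into one chunk per process (hypothetical helper)."""
    return np.array_split(np.asarray(video_ids), num_replicas)


def check_world_size(video_ids, num_replicas):
    """Fail early with a readable message instead of an IndexError."""
    num_videos = len(set(video_ids))
    if num_replicas > num_videos:
        raise ValueError(
            f"{num_replicas} processes requested but only {num_videos} videos; "
            f"use at most {num_videos} GPUs"
        )


video_ids = [0, 1, 2, 3]  # assumption: 4 validation videos
chunks = split_videos(video_ids, 8)

# Ranks whose chunk is empty; indexing c[0] on these raises IndexError.
empty_ranks = [rank for rank, c in enumerate(chunks) if len(c) == 0]
print(empty_ranks)
```

Calling a check such as `check_world_size(video_ids, 8)` before building the sampler would report the mismatch directly, while `check_world_size(video_ids, 4)` passes.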

@imhgchoi
Author

Wow, that was fast 👍
I see, I'll try it out.
Thank you so much :)

@imhgchoi
Author

Training works fine with 4 GPUs, but now I'm having trouble with evaluation.
The evaluation log stops after 100 steps, and then the run appears to freeze.

Test:  [  0/829]  eta: 0:10:56    time: 0.7915  data: 0.4873  max mem: 8853
Test:  [ 10/829]  eta: 0:04:03    time: 0.2967  data: 0.0471  max mem: 8853
Test:  [ 20/829]  eta: 0:03:37    time: 0.2429  data: 0.0032  max mem: 8853
Test:  [ 30/829]  eta: 0:03:26    time: 0.2367  data: 0.0033  max mem: 8853
Test:  [ 40/829]  eta: 0:03:19    time: 0.2364  data: 0.0034  max mem: 8853
Test:  [ 50/829]  eta: 0:03:13    time: 0.2334  data: 0.0035  max mem: 8853
Test:  [ 60/829]  eta: 0:03:08    time: 0.2286  data: 0.0034  max mem: 8853
Test:  [ 70/829]  eta: 0:03:03    time: 0.2241  data: 0.0032  max mem: 8853
Test:  [ 80/829]  eta: 0:02:59    time: 0.2208  data: 0.0033  max mem: 8853
Test:  [ 90/829]  eta: 0:02:55    time: 0.2253  data: 0.0034  max mem: 8853
Test:  [100/829]  eta: 0:02:52    time: 0.2261  data: 0.0033  max mem: 8853
Test: Total time: 0:00:25 (0.0305 s / it)

Has anyone else encountered this?
Thanks again

@PeizeSun
Owner

This is a display bug (the number of images differs across GPUs). The program is actually still running.
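
As a rough illustration of how per-GPU image counts can differ, the sketch below assigns whole videos to processes and sums the frames each one receives. The helper `frames_per_rank` and the video lengths are made up for the example; whole-video assignment is an assumption about the sampler, not something confirmed in this thread.

```python
import numpy as np


def frames_per_rank(video_lengths, num_replicas):
    """Total frames each process handles when whole videos are assigned to it."""
    chunks = np.array_split(np.arange(len(video_lengths)), num_replicas)
    return [int(sum(video_lengths[i] for i in chunk)) for chunk in chunks]


video_lengths = [120, 800, 400, 250]  # hypothetical frame counts per video
counts = frames_per_rank(video_lengths, 4)

# A process with a short video finishes its loop long before the others,
# so its progress log can go quiet while the rest are still evaluating.
print(counts)
print(max(counts) - min(counts))
```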

@imhgchoi
Author

Whoops, my bad.
It seems that aggregation just takes a long time.
Evaluation works perfectly fine.

Thanks a lot Mr. Sun
