Bars of tqdm seems fixed at 0/50000 and fail to continue the Q-labeling #15
Comments
Just wait longer. Or, if you have to eyeball the bar moving, set |
Thanks for your reply. I tried adding this argument, but still nothing happened; it shows the same message as before. And last time when I did |
Gotcha, yes one thing I forgot to mention is that if something is odd set |
Thanks! I have checked the link here https://github.com/dotchen/WorldOnRails/issues/6. I found when I set
|
I have been running the code for 3 days now on 6 workers. The script does not seem to end. I enabled local mode and no error was shown in the logs. I used the released 1M dataset. How long did the labeling take? |
I need more information to help you debug. Can you tell me what the progress bar says? Or does the progress bar completely freeze? Also, do you see any GPU utilization while running the script? |
Also, you shouldn't use local_mode when |
Hi, when I set local_mode=False, I encounter the following issue with num_workers > 1. If I set local_mode=True, the program seems to run but does not use multiple GPUs. I think it is taking a lot of time to create the worker threads, and I do not see the tqdm bar at all. Do you have any insight into this? @dotchen I am running the code on a very small subset of the data to figure out the issue. |
Can you check if this is related to #6?
By definition of local mode, the jobs run sequentially. |
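To illustrate the difference, here is a minimal standard-library sketch. It is not Ray itself and not code from this repo: `label_chunk` is a hypothetical stand-in for one labeling job, and a thread pool stands in for multiple workers. Running jobs one after another, as local mode does, takes roughly the sum of the job times, while a pool of workers overlaps them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def label_chunk(chunk_id, delay=0.1):
    """Hypothetical stand-in for one labeling job; sleeps to simulate work."""
    time.sleep(delay)
    return chunk_id * 2


def run_sequential(chunk_ids):
    # Analogous to local mode: one job at a time, in the calling thread.
    return [label_chunk(c) for c in chunk_ids]


def run_parallel(chunk_ids, num_workers=4):
    # Analogous to several workers: jobs overlap in time.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(label_chunk, chunk_ids))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    chunks = list(range(4))
    _, seq_time = timed(run_sequential, chunks)
    _, par_time = timed(run_parallel, chunks)
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

So if local mode is on, a slow-moving bar is expected even when several workers are requested.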
Hi, I think I figured out the issue. The above error occurs if the cluster resources are not available. My admin settings probably block me from auto-balancing when using sbatch scripts. I think the issue can be closed. |
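For anyone hitting the same thing, a quick pre-flight check can catch this before the job is submitted. The sketch below is only an illustration: `can_schedule` is a hypothetical helper, and the one-GPU-and-one-CPU-per-worker requirement is an assumption, not something taken from this repo. The resource dict has the same shape as the one returned by `ray.available_resources()` (resource name mapped to a float amount).

```python
def can_schedule(available, num_workers, gpus_per_worker=1.0, cpus_per_worker=1.0):
    """Check whether the requested workers fit into the available resources.

    `available` maps resource name -> amount, e.g. {"CPU": 8.0, "GPU": 2.0}.
    The per-worker requirements are assumptions made for illustration.
    Returns (ok, reason).
    """
    need_gpu = num_workers * gpus_per_worker
    need_cpu = num_workers * cpus_per_worker
    have_gpu = available.get("GPU", 0.0)
    have_cpu = available.get("CPU", 0.0)
    if have_gpu < need_gpu:
        return False, f"need {need_gpu:g} GPU(s), only {have_gpu:g} available"
    if have_cpu < need_cpu:
        return False, f"need {need_cpu:g} CPU(s), only {have_cpu:g} available"
    return True, "ok"


if __name__ == "__main__":
    # Example: a node that exposes 8 CPUs and 2 GPUs to the scheduler.
    print(can_schedule({"CPU": 8.0, "GPU": 2.0}, num_workers=4))
```

If the check fails, workers may sit pending while they wait for resources that never arrive.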
Make sure you have read FAQ before posting.
Thanks!
Hello,
After running all the above programs correctly as you suggested, I trained the ego-model and successfully collected the nocrash data (about 186 GB). What I need to do next is label Q, so I ran python -m rails.data_phase2 --num-workers=4, and it shows as follows:
I have checked my GPU, and it shows that Ray is running now.
When I pressed CTRL+C, it shows:
It seems it is just waiting now? (But we do not need to launch CARLA in this phase.)
The data-dir is set to the collected data directory, and config.yaml is set to the no-crash config (default='/home/shy/Desktop/WorldOnRails/experiments/config_nocrash.yaml'); I just copied config.yaml to the experiments folder. Thanks a lot!