Hello authors, thanks for open-sourcing this excellent work. I ran into a problem while evaluating the model on R2R val_unseen.
I set up the environment following the instructions and used the provided evaluation scripts and configs unchanged. The evaluation starts normally, but the process aborts after one or two episodes, at a seemingly random point.
After checking the output, I suspect the problem comes from the Habitat simulator (habitat_sim 0.2.4) rather than from this repo: the process receives SIGABRT while inside habitat_sim's get_observation. Still, I'm wondering whether you have encountered a similar problem before. The relevant part of the printed log, including the error messages, is as follows:
......
↑↑↑↑<|im_end|>
[18:25:50.242867] actions [1, 1, 1, 1]
[18:26:10.990485] <|im_start|>assistant
↑↑↑←<|im_end|>
[18:26:10.990602] actions [1, 1, 1, 2]
[18:26:31.169344] 64 You are an autonomous navigation assistant. Your task is to Move forward to the doorway on the opposite side of the hall. Stop in the archway. Devise an action sequence to follow the instruction using the four actions: TURN LEFT (←) or TURN RIGHT (→) by 15 degrees, MOVE FORWARD (↑) by 25 centimeters, or STOP. These are your historical observations <memory>.
[18:26:31.730885] <|im_start|>assistant
↑STOP<|im_end|>
[18:26:31.730971] actions [1, 0]
Fatal Python error: Aborted
Thread 0x00007f4d8481b640 (most recent call first):
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File ".../miniconda3/envs/streamvln/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
Thread 0x00007f4d8781c640 (most recent call first):
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File ".../miniconda3/envs/streamvln/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File ".../miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
Current thread 0x00007f58ed2c2740 (most recent call first):
File ".../miniconda3/envs/streamvln/lib/python3.9/site-packages/habitat_sim-0.2.4-py3.9-linux-x86_64.egg/habitat_sim/simulator.py", line 780 in get_observation
File ".../miniconda3/envs/streamvln/lib/python3.9/site-packages/habitat_sim-0.2.4-py3.9-linux-x86_64.egg/habitat_sim/simulator.py", line 458 in get_sensor_observations
File ".../miniconda3/envs/streamvln/lib/python3.9/site-packages/habitat_sim-0.2.4-py3.9-linux-x86_64.egg/habitat_sim/simulator.py", line 522 in step
File ".../VLN_workspace/StreamVLN/habitat-lab/habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py", line 418 in step
File ".../VLN_workspace/StreamVLN/habitat-lab/habitat-lab/habitat/tasks/nav/nav.py", line 1051 in step
File ".../VLN_workspace/StreamVLN/habitat-lab/habitat-lab/habitat/core/embodied_task.py", line 311 in _step_single_action
File ".../VLN_workspace/StreamVLN/habitat-lab/habitat-lab/habitat/core/embodied_task.py", line 333 in step
File ".../VLN_workspace/StreamVLN/habitat-lab/habitat-lab/habitat/core/env.py", line 309 in step
File ".../VLN_workspace/StreamVLN/streamvln/streamvln_eval.py", line 344 in eval_action
File ".../VLN_workspace/StreamVLN/streamvln/streamvln_eval.py", line 553 in evaluate
File ".../VLN_workspace/StreamVLN/streamvln/streamvln_eval.py", line 534 in eval
File ".../VLN_workspace/StreamVLN/streamvln/streamvln_eval.py", line 584 in <module>
[2025-08-11 18:26:52,339] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -6) local_rank: 0 (pid: 649640) of binary: ...//miniconda3/envs/streamvln/bin/python3.9
Traceback (most recent call last):
File "...//miniconda3/envs/streamvln/bin/torchrun", line 8, in <module>
sys.exit(main())
File "...//miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "...//miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "...//miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File ".../miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File .../miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=======================================================
streamvln/streamvln_eval.py FAILED
-------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-08-11_18:26:52
host : cudo-gpu-ai-2-cluster-4763e7cd
rank : 0 (local_rank: 0)
exitcode : -6 (pid: 649640)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 649640
=======================================================
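For anyone reading the report above: a negative exit code from a Python-launched child process means the child was terminated by a signal, so `exitcode: -6` matches the `Signal 6 (SIGABRT)` line in the torchrun summary. A minimal sketch of that decoding, using only the standard library (the helper name `describe_exitcode` is hypothetical and not part of StreamVLN or torchrun):

```python
import signal


def describe_exitcode(exitcode: int) -> str:
    """Turn a subprocess-style exit code into a readable description.

    Hypothetical helper for illustration only. A negative value -N
    means the process was terminated by signal N.
    """
    if exitcode < 0:
        # signal.Signals maps the signal number to its symbolic name.
        return f"killed by {signal.Signals(-exitcode).name}"
    return f"exited with status {exitcode}"


# Exit code reported in the log above.
print(describe_exitcode(-6))
```

This only confirms that the process was aborted by a signal rather than by a Python exception; it does not say why the simulator aborted.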