An EOF error on R2R-CE #13

Open
Bowen-sdu opened this issue Mar 27, 2024 · 7 comments

Comments

@Bowen-sdu

When I ran fine-tuning on R2R-CE, the program reported an EOF error, and I don't know why. The command I executed is

CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_r2r/main.bash train 2333

My installation environment is consistent with environment.txt.

###### train mode ######
2024-03-27 16:30:19,659 Initializing dataset VLN-CE-v1
2024-03-27 16:30:20,222 SPLTI: train, NUMBER OF SCENES: 61
2024-03-27 16:30:23,015 Initializing dataset VLN-CE-v1
2024-03-27 16:30:23,565 initializing sim Sim-v1
2024-03-27 16:30:36,184 Initializing task VLN-v0
2024-03-27 16:30:36,372 LOCAL RANK: 0, ENV NUM: 1, DATASET LEN: 10819
2024-03-27 16:30:43,860 Agent parameters: 337.67 MB. Trainable: 180.98 MB.
2024-03-27 16:30:43,860 Finished setting up policy.
2024-03-27 16:30:43,863 Traning Starts... GOOD LUCK!
Traceback (most recent call last):
  File "run.py", line 114, in <module>
    main()
  File "run.py", line 50, in main
    run_exp(**vars(args))
  File "run.py", line 107, in run_exp
    trainer.train()
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 666, in train
    logs = self._train_interval(interval, self.config.IL.ml_weight, sample_ratio)  # (200, 1.0, 0.75)
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 698, in _train_interval
    self.rollout('train', ml_weight, sample_ratio)
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 1095, in rollout
    teacher_actions = self._teacher_action_new(nav_inputs['gmap_vp_ids'], no_vp_left)
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 322, in _teacher_action_new
    curr_dis_to_goal = self.envs.call_at(i, "current_dist_to_goal")
  File "/home/huangbw/navigation/habitat-lab/habitat/core/vector_env.py", line 515, in call_at
    result = self._connection_read_fns[index]()
  File "/home/huangbw/navigation/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
    res = self.read_fn()
  File "/home/huangbw/navigation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7f6d72a15cc0>>
Traceback (most recent call last):
  File "/home/huangbw/navigation/habitat-lab/habitat/core/vector_env.py", line 589, in __del__
    self.close()
  File "/home/huangbw/navigation/habitat-lab/habitat/core/vector_env.py", line 456, in close
    read_fn()
  File "/home/huangbw/navigation/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
    res = self.read_fn()
  File "/home/huangbw/navigation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError:
@MarSaKi
Owner

MarSaKi commented Mar 27, 2024

Hi there,
I frankly don't know why this bug happens. It may be a bug inherited from Habitat; I have run into the same problem before.
My workaround is to restart the machine or switch to another machine.

@Bowen-sdu
Author

Thank you very much for your reply. I will try to do so.


@Bowen-sdu
Author


Hello, I tried running the script on another server, but the error still occurred. I then upgraded habitat-sim to 0.2.0; that resolved the EOF error, but another error appeared:

###### train mode ######
2024-03-30 11:16:24,452 Initializing dataset VLN-CE-v1
2024-03-30 11:16:25,056 SPLTI: train, NUMBER OF SCENES: 61
2024-03-30 11:16:27,929 Initializing dataset VLN-CE-v1
2024-03-30 11:16:28,513 initializing sim Sim-v1
Warning: Logging before InitGoogleLogging() is written to STDERR
E0330 11:16:28.628017 90056 SemanticScene.h:155] ::loadSemanticSceneDescriptor : File data/scene_datasets/mp3d/e9zR4mvMWw7/e9zR4mvMWw7.scn does not exist. Aborting load
2024-03-30 11:16:41,681 Initializing task VLN-v0
2024-03-30 11:16:41,859 LOCAL RANK: 0, ENV NUM: 1, DATASET LEN: 10819
2024-03-30 11:16:49,562 Agent parameters: 337.67 MB. Trainable: 180.98 MB.
2024-03-30 11:16:49,562 Finished setting up policy.
2024-03-30 11:16:49,568 Traning Starts... GOOD LUCK!
Traceback (most recent call last):
  File "run.py", line 114, in <module>
    main()
  File "run.py", line 50, in main
    run_exp(**vars(args))
  File "run.py", line 107, in run_exp
    trainer.train()
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 666, in train
    logs = self._train_interval(interval, self.config.IL.ml_weight, sample_ratio)  # (200, 1.0, 0.75)
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 698, in _train_interval
    self.rollout('train', ml_weight, sample_ratio)
  File "/home/huangbw/navigation/BEVBert/bevbert_ce/vlnce_baselines/ss_trainer_BEV.py", line 973, in rollout
    batch = batch_obs(observations, self.device)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/huangbw/navigation/habitat-lab-0.2.0/habitat_baselines/utils/common.py", line 171, in batch_obs
    reverse=True,
  File "/home/huangbw/navigation/habitat-lab-0.2.0/habitat_baselines/utils/common.py", line 170, in <lambda>
    else np.prod(obs[name].shape),
AttributeError: 'list' object has no attribute 'shape'
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7f5e2ed90390>>
Traceback (most recent call last):
  File "/home/huangbw/navigation/habitat-lab-0.2.0/habitat/core/vector_env.py", line 588, in __del__
    self.close()
  File "/home/huangbw/navigation/habitat-lab-0.2.0/habitat/core/vector_env.py", line 459, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/huangbw/navigation/habitat-lab-0.2.0/habitat/core/vector_env.py", line 118, in __call__
    self.write_fn(data)
  File "/home/huangbw/navigation/habitat-lab-0.2.0/habitat/utils/pickle5_multiprocessing.py", line 63, in send
    self.send_bytes(buf.getvalue())
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/huangbw/miniconda3/envs/python36/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
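
For context, the AttributeError above comes from batch_obs calling np.prod(obs[name].shape) on an observation that arrives as a plain Python list rather than a NumPy array or tensor. A minimal, self-contained sketch of that failure mode (illustrative only, not the repo's actual code):

import numpy as np

# Illustration of the AttributeError above: np.prod(x.shape) works for
# NumPy arrays but not for plain Python lists, which have no .shape attribute.
array_obs = np.zeros((224, 224, 3), dtype=np.float32)
list_obs = [[0.0, 1.0], [2.0, 3.0]]  # e.g. an observation returned as a raw list

print(np.prod(array_obs.shape))  # 150528

try:
    np.prod(list_obs.shape)  # raises: 'list' object has no attribute 'shape'
except AttributeError as err:
    print("AttributeError:", err)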

I couldn't figure out what the data/scene_datasets/mp3d/e9zR4mvMWw7/e9zR4mvMWw7.scn file is. Is it part of the MP3D dataset, or do I need to generate it myself? I am looking forward to your reply.

@MarSaKi
Owner

MarSaKi commented Mar 30, 2024

No, this repo doesn't support Habitat 0.2.0, and running on that version will result in some strange bugs.
To see the detailed error logs, I suggest commenting out "export GLOG_minloglevel=2" and "export MAGNUM_LOG=quiet" in the bash script, since those settings suppress the logging output.
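
If editing the bash script is inconvenient, a rough alternative (a sketch only, assuming glog and Magnum read these variables when habitat / habitat_sim first initialize) is to override them at the very top of the entry script, before any habitat imports:

import os

# Sketch: re-enable verbose logging before habitat / habitat_sim are imported,
# since glog and Magnum pick these variables up at initialization time.
os.environ["GLOG_minloglevel"] = "0"   # 0 = print INFO and above
os.environ["MAGNUM_LOG"] = "verbose"   # instead of "quiet"

# The habitat imports must come after the environment is adjusted, e.g.:
# import habitat  # noqa: E402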

@Bowen-sdu
Author

Thank you very much for your suggestions and help. The cause of this EOF error was that I had not put the .navmesh files from MP3D into the scene_datasets folder. The problem is now solved.

@dongxinfeng1


Hi, I also encountered this problem. Could you describe your solution in detail? Thanks a lot!

@Bowen-sdu
Author


Of course. I encountered this issue because I had placed only the .glb files in the scene_datasets folder. Please make sure you place all the relevant files for each scan in the scene_datasets folder, including the files with the .glb, .house, .navmesh, and .ply suffixes.
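
A quick way to confirm that every scan folder is complete (a rough sketch; the scene_datasets path and required extensions simply follow this thread, so adjust them to your own layout):

import os

# Sketch: report MP3D scan folders under scene_datasets that are missing any
# of the file types mentioned in this thread (.glb, .house, .navmesh, .ply).
SCENES_DIR = "data/scene_datasets/mp3d"
REQUIRED_EXTS = (".glb", ".house", ".navmesh", ".ply")

for scan in sorted(os.listdir(SCENES_DIR)):
    scan_dir = os.path.join(SCENES_DIR, scan)
    if not os.path.isdir(scan_dir):
        continue
    files = os.listdir(scan_dir)
    missing = [ext for ext in REQUIRED_EXTS
               if not any(f.endswith(ext) for f in files)]
    if missing:
        print("{}: missing {}".format(scan, ", ".join(missing)))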
