"EOFError" during training for both Matterport and Gibson datasets #26
Comments
I also get the same error when I use the Gibson dataset to train the model, and I don't know how to solve it. The log output is:
I0126 20:40:37.990522 8736 Simulator.cpp:42] Deconstructing Simulator
I0126 20:40:37.990520 8739 Simulator.cpp:42] Deconstructing Simulator
I0126 20:40:37.990571 8737 Simulator.cpp:42] Deconstructing Simulator
I0126 20:40:37.990518 8738 Simulator.cpp:42] Deconstructing Simulator
I0126 20:40:37.994117 8736 SemanticScene.h:40] Deconstructing SemanticScene
I0126 20:40:37.994128 8739 SemanticScene.h:40] Deconstructing SemanticScene
I0126 20:40:37.994138 8736 SceneManager.h:24] Deconstructing SceneManager
I0126 20:40:37.994138 8739 SceneManager.h:24] Deconstructing SceneManager
I0126 20:40:37.994135 8737 SemanticScene.h:40] Deconstructing SemanticScene
I0126 20:40:37.994141 8736 SceneGraph.h:20] Deconstructing SceneGraph
I0126 20:40:37.994143 8739 SceneGraph.h:20] Deconstructing SceneGraph
I0126 20:40:37.994148 8737 SceneManager.h:24] Deconstructing SceneManager
I0126 20:40:37.994151 8737 SceneGraph.h:20] Deconstructing SceneGraph
I0126 20:40:37.994186 8738 SemanticScene.h:40] Deconstructing SemanticScene
I0126 20:40:37.994195 8738 SceneManager.h:24] Deconstructing SceneManager
I0126 20:40:37.994202 8738 SceneGraph.h:20] Deconstructing SceneGraph
I0126 20:40:38.046005 8737 Renderer.cpp:38] Deconstructing Renderer
I0126 20:40:38.075238 8737 WindowlessContext.cpp:240] Deconstructing GL context
I0126 20:40:38.085269 8736 Renderer.cpp:38] Deconstructing Renderer
I0126 20:40:38.085384 8736 WindowlessContext.cpp:240] Deconstructing GL context
I0126 20:40:38.087087 8739 Renderer.cpp:38] Deconstructing Renderer
I0126 20:40:38.087186 8739 WindowlessContext.cpp:240] Deconstructing GL context
I0126 20:40:38.092942 8738 Renderer.cpp:38] Deconstructing Renderer
I0126 20:40:38.093048 8738 WindowlessContext.cpp:240] Deconstructing GL context
Process ForkServerProcess-4:
Process ForkServerProcess-3:
Process ForkServerProcess-2:
Process ForkServerProcess-1:
Killed
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/media/mhy/data/neural_slam/navigation_via_reinforcement_learning/env/habitat/habitat_api/habitat/core/vector_env.py", line 208, in _worker_env
command, data = connection_read_fn()
Traceback (most recent call last):
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
EOFError
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/media/mhy/data/neural_slam/navigation_via_reinforcement_learning/env/habitat/habitat_api/habitat/core/vector_env.py", line 208, in _worker_env
command, data = connection_read_fn()
File "/media/mhy/data/neural_slam/navigation_via_reinforcement_learning/env/habitat/habitat_api/habitat/core/vector_env.py", line 208, in _worker_env
command, data = connection_read_fn()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
EOFError
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/media/mhy/data/neural_slam/navigation_via_reinforcement_learning/env/habitat/habitat_api/habitat/core/vector_env.py", line 208, in _worker_env
command, data = connection_read_fn()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/mhy/anaconda3/envs/neural_slam/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
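For context on what the traceback means: each habitat VectorEnv worker blocks in connection_read_fn() waiting for its next command over a multiprocessing pipe; the Killed line suggests the main training process was terminated (apparently by the OOM killer, see below), which closes the parent end of every pipe and makes each worker raise EOFError. A minimal, self-contained sketch of that failure mode (illustration only, not the project's code):

```python
# Illustration only (not the project's code): each habitat VectorEnv worker
# blocks on a pipe waiting for its next command. If the parent process dies
# (e.g. is killed by the OOM killer), the pipe is closed and the blocked
# recv() raises EOFError, which is exactly what each worker prints above.
import multiprocessing as mp

def worker(child_conn, parent_conn):
    parent_conn.close()               # drop the inherited copy of the parent's end
    try:
        command = child_conn.recv()   # blocks, like connection_read_fn()
        print("received:", command)
    except EOFError:
        print("worker got EOFError: the parent's end of the pipe was closed")

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn, parent_conn))
    p.start()
    child_conn.close()                # parent drops its copy of the child's end
    parent_conn.close()               # simulate the parent dying without sending anything
    p.join()
```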
Try the solution here. I had the same problem and observed that it was due to running out of CPU memory. I fixed this by closing the
@ZhuFengdaaa Thank you for the information. I'll try both your suggestion and the habitat-sim update (in two different and isolated environments).
Thank you for your reply, I will try to solve this problem via the method you have provided!
@ZhuFengdaaa Following your suggestions, we are investigating the origin of the problem. As mentioned in your linked issue, we analyzed the output of the dmesg command after the error, and it seems to be related to virtual memory (RAM). During the latest run we also monitored the GPU memory, and its usage never reached critical values. The relevant dmesg line is:
[20192.302390] [ 3865] 1000 3865 3860896 643675 9486336 122973 0 python
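To watch host RAM directly while training (rather than only inspecting dmesg after the fact), here is a minimal sketch assuming psutil is available; it is a generic monitor, not part of Neural-SLAM:

```python
# Generic host-RAM monitor (not part of Neural-SLAM). Assumes psutil is
# installed: pip install psutil. Run it in a separate terminal while training
# to see whether RAM/swap climb toward the limit before the OOM kill.
import time
import psutil

def log_memory(interval_s: float = 30.0) -> None:
    """Print used/total RAM and swap every `interval_s` seconds."""
    while True:
        vm = psutil.virtual_memory()
        sw = psutil.swap_memory()
        print(f"RAM: {vm.used / 1e9:.1f}/{vm.total / 1e9:.1f} GB ({vm.percent:.0f}%), "
              f"swap: {sw.used / 1e9:.1f} GB")
        time.sleep(interval_s)

if __name__ == "__main__":
    log_memory()
```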
A way to reduce the memory footprint during training is to reduce the memory size used for training the Neural SLAM module via the --slam_memory_size argument. Please update the thread if you find a solution.
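For intuition, a rough back-of-envelope sketch (the per-entry sizes below are assumptions for illustration, not values taken from the code): the SLAM training buffer grows linearly with --slam_memory_size, so halving it roughly halves that part of the RAM footprint.

```python
# Rough back-of-envelope only: the per-entry layout below is an ASSUMPTION for
# illustration, not taken from the Neural-SLAM code. The point is that the SLAM
# training buffer scales linearly with --slam_memory_size.
import numpy as np

def buffer_gb(slam_memory_size: int,
              obs_shape=(3, 128, 128),    # assumed RGB observation per entry
              map_shape=(2, 240, 240),    # assumed map channels per entry
              dtype=np.float32) -> float:
    bytes_per_entry = (np.prod(obs_shape) + np.prod(map_shape)) * np.dtype(dtype).itemsize
    return slam_memory_size * bytes_per_entry / 1e9

for size in (500000, 100000, 50000):
    print(f"--slam_memory_size {size}: ~{buffer_gb(size):.1f} GB under the assumed layout")
```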
Hi all, I successfully installed NeuralSlam, and I can run the test code. Everything seems to work.
Since I want to train the Network with different datasets, I downloaded Matterport3D and Gibson.
Following your tutorial, with minor changes, I can run the training code with both datasets, but at a certain (random) episode I receive an EOFError like the one in the traceback above.
Since it happens with both datasets, I investigated the memory consumption (my computer has two Nvidia GeForce RTX 2080 Ti): the GPU memory used during the training phase was acceptable, even when the error occurred.
I tried different configurations; in particular, the last one I used for Gibson is the following: --exp_name gibson_orig --save_periodic 2500 --slam_memory_size 100000
Do you have any suggestions?
I really appreciate any help you can provide.