Only one valid platform is required to run AI2-THOR #374

YYDS-cc · 2023-08-31T12:42:52Z

When I run the command

PYTHONPATH=. python allenact/main.py training_a_pointnav_model -o storage/robothor-pointnav-rgb-resnet-resnet -b projects/tutorials

on a remote server with an attached display, I get the following error:

Exception: The following builds were found, but had missing dependencies. Only one valid platform is required to run AI2-THOR.
Platform Linux64 failed validation with the following errors: Invalid display: :0.0. Failed to connect Can't connect to display ":0.0": b'No protocol specified\n'
Linux64 requires a X11 server to be running with GLX. The following valid displays were found :13.0

How can I solve this issue?
Please help, thanks!

YYDS-cc · 2023-08-31T12:52:04Z

Additionally, when I run
python main.py object_nav_ithor_ppo_one_object -b projects/tutorials -s 12345
the monitor goes black momentarily. I know this is to open the search window, but after the monitor comes back, the terminal output is no longer updated.
I have also run
sudo python scripts/startx.py &
but it doesn't do anything.

jordis-ai2 · 2023-08-31T16:39:30Z

Hi @YYDS-cc,

Given your setup, I think it would be worth it to try using THOR in headless mode. For that, you need to pass a gpu_device instead of an x_display (using the CloudRendering platform). You can see an example here:

allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py

Line 255 in 9772eee

device_dict = dict(

Let us know if this unblocked you!
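As a rough illustration of the suggestion above, the choice between the two modes could be sketched as follows. The helper name `thor_device_args` and its parameters are hypothetical and are not part of AllenAct. `gpu_device`, `x_display`, and `platform` are the keyword names mentioned in this thread, the display string is the one reported as valid in the error message, and the string default for the platform is only a placeholder (in practice the caller would pass `ai2thor.platform.CloudRendering`).

```python
from typing import Any, Dict, Optional


def thor_device_args(
    headless: bool,
    gpu_id: int,
    x_display: Optional[str] = None,
    cloud_platform: Any = "CloudRendering",  # placeholder; pass ai2thor.platform.CloudRendering in practice
) -> Dict[str, Any]:
    """Hypothetical helper: build the device-related controller kwargs."""
    if headless:
        # Headless rendering: select a GPU and the CloudRendering platform,
        # and do not pass an X display at all.
        return {"gpu_device": gpu_id, "platform": cloud_platform}
    if x_display is None:
        raise ValueError("x_display is required when not running headless")
    # Display-backed rendering: point THOR at an existing X server.
    return {"x_display": x_display}


print(thor_device_args(headless=True, gpu_id=0))
print(thor_device_args(headless=False, gpu_id=0, x_display=":13.0"))
```

The point of the sketch is only that the two sets of arguments are mutually exclusive: a headless run should not carry an `x_display` entry at all.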

YYDS-cc · 2023-09-01T03:40:27Z

Hi @jordis-ai2,
I tried changing headless to True, but it doesn't work.

allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py

Line 75 in 9772eee

headless: bool = False,

I also tried commenting out this code, but it still doesn't work.

allenact/projects/objectnav_baselines/experiments/objectnav_thor_base.py

Line 236 in 9772eee

if not self.headless:

Did I change the code in the wrong place?

jordis-ai2 · 2023-09-01T07:40:07Z

I think I need to see the output you get when using headless mode. Can you copy it here?

YYDS-cc · 2023-09-01T09:46:47Z

[09/01 17:24:13 INFO:] Running with args Namespace(approx_ckpt_step_interval=None, ... ,[main.py: 452]
[09/01 17:24:18 INFO:] Git diff saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 890]
[09/01 17:24:18 INFO:] Config files saved to experiment_output/used_configs/ObjectNavThorPPO/2023-09-01_17-24-15 [runner.py: 935]
[09/01 17:24:18 INFO:] Using 1 train workers on devices (device(type='cuda', index=0),) [runner.py: 317]
[09/01 17:24:19 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:19 INFO:] Using local worker ids [0] (total 1 workers in machine 0) [runner.py: 326]
[09/01 17:24:19 INFO:] Started 1 train processes [runner.py: 595]
[09/01 17:24:19 INFO:] Using 1 valid workers on devices (device(type='cuda', index=1),) [runner.py: 317]
[09/01 17:24:19 INFO:] Started 1 valid processes [runner.py: 622]
[09/01 17:24:21 INFO:] valid 0 args [...][runner.py: 433]
[09/01 17:24:21 INFO:] train 0 args [...] [runner.py: 416]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:22 INFO:] there are 1 belief models: ['single_belief'] [visual_nav_models.py: 116]
[09/01 17:24:29 INFO:] Starting 0-th VectorSampledTask worker with args [...]
[09/01 17:24:31 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:24:31 INFO:] Starting 1-th VectorSampledTask worker with args [...]
[09/01 17:24:33 INFO:] Starting 0-th SingleProcessVectorSampledTasks generator with args [...]
[09/01 17:29:33 ERROR:] [train worker 0 ] Encountered TimeoutError , exiting. [engine.py: 1858]
File "/allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 272, in read_with_timeout
raise TimeoutError(
TimeoutError: Did not receive output from 'VectorSampledTask' worker for 300 seconds.
[engine.py: 1861]
[09/01 17:29:34 ERROR:] Encountered Exception. Terminating runner. [runner.py: 1467]
[09/01 17:29:34 ERROR:] Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[runner.py: 1468]
Traceback (most recent call last):
File "/allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1434, in log_and_close
raise Exception(
Exception: Train worker 0 abnormally terminated
[09/01 17:29:34 INFO:] Terminating train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Terminating valid 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Train-0. Worker Train-0 is already closed, exiting. [runner.py: 348]
[09/01 17:29:34 INFO:] Joining train 0 [runner.py: 1543]
[09/01 17:29:34 INFO:] Termination signal sent to worker Valid-0. Forcing worker Valid-0 to close and exiting. [runner.py: 353]
[09/01 17:29:35 INFO:] Closed train 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Joining valid 0 [runner.py: 1543]
[09/01 17:29:35 INFO:] Closed valid 0 [runner.py: 1543]

jordis-ai2 · 2023-09-01T10:32:42Z

If you do export ALLENACT_DEBUG_VST_TIMEOUT=1000 before calling the command you are currently using to start your experiment, does it also fail (just after a longer period of waiting)?
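One way to confirm that the exported variable is actually visible to the Python process is a small check like the one below. This is a minimal sketch and not AllenAct's own code: the function name `effective_timeout` is hypothetical, and the fallback of 300 seconds is taken from the timeout reported in the log above. How AllenAct itself reads the variable is not shown in this thread.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 300  # the timeout reported in the log above


def effective_timeout(env=os.environ) -> int:
    """Return the timeout from ALLENACT_DEBUG_VST_TIMEOUT, or the default."""
    raw = env.get("ALLENACT_DEBUG_VST_TIMEOUT")
    if raw is None or raw.strip() == "":
        # Variable not set in this process, e.g. it was exported in another shell.
        return DEFAULT_TIMEOUT_SECONDS
    return int(raw)


print(effective_timeout({"ALLENACT_DEBUG_VST_TIMEOUT": "1000"}))
print(effective_timeout({}))
```

Calling `effective_timeout()` with no argument, from the same shell session that launches the experiment, shows whether the `export` took effect there.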

YYDS-cc · 2023-09-01T11:16:21Z

Changing the waiting time doesn't help.
In fact, export ALLENACT_DEBUG_VST_TIMEOUT=1000 doesn't change the waiting time; it is still 300 seconds.
So I changed the timeout directly in

allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py

Line 237 in 9772eee

    
           for space in read_fn(timeout_to_use=5 * self.read_timeout if self.read_timeout is not None else None)  # type: ignore

and I still get the same error; only the waiting time has changed.

jordis-ai2 · 2023-09-01T11:41:32Z

I assume at this point you must have already tried starting a standalone THOR controller to ensure everything is correctly installed, but just in case you haven't, can you try to run a script like:

from ai2thor.platform import CloudRendering
from ai2thor.controller import Controller
import cv2

c = Controller(platform=CloudRendering, gpu_device=0)
cv2.imwrite("/path/to/debug_output_image.png", c.last_event.frame[:,:,::-1])
c.stop()

?

YYDS-cc · 2023-09-01T12:19:26Z

Running the new code installed the thor-CloudRendering platform and raised a new error. I had already hit the same error when running the PointNav task with the command PYTHONPATH=. python allenact/main.py training_a_pointnav_model -o storage/robothor-pointnav-rgb-resnet-resnet -b projects/tutorials .

Error: RuntimeError: vulkaninfo failed to run, please ask your administrator to install vulkaninfo (e.g. on Ubuntu systems this requires running sudo apt install vulkan-tools).

But when I run sudo apt install vulkan-tools, the server can't locate the package vulkan-tools.
Running sudo apt-get update first doesn't help either.

I installed the same environment on my PC (Ubuntu 18.04) following the tutorial, and both the PointNav and ObjectNav tasks run there without problems.
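The vulkaninfo requirement from the error above can be checked before launching an experiment. The following is a minimal sketch using only the Python standard library. The function name `find_vulkaninfo` is hypothetical, and the check only tests whether the executable is on the PATH, not whether it runs successfully.

```python
import shutil
from typing import Optional


def find_vulkaninfo() -> Optional[str]:
    """Return the full path of the vulkaninfo executable, or None if absent."""
    return shutil.which("vulkaninfo")


path = find_vulkaninfo()
if path is None:
    print("vulkaninfo not found on PATH; the CloudRendering platform will likely fail")
else:
    print(f"vulkaninfo found at {path}")
```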

jordis-ai2 · 2023-09-04T16:24:27Z

https://packages.ubuntu.com/search?keywords=vulkan-tools has a list of packages for different Ubuntu versions. It's possible that third parties provide vulkan-tools for other/older versions.

It sounds like this is out-of-scope for AllenAct, so I'm closing the issue.

jordis-ai2 closed this as completed Sep 4, 2023
