CARLA: modeld crashes on startup #23666

Closed
maykonpacheco opened this issue Jan 30, 2022 · 17 comments · Fixed by #24271
Labels: good first issue (Feasible for new contributors), PC (Issues related to running openpilot on PC)

Comments

@maykonpacheco
Contributor

Describe the bug

With CARLA running, pressing key 1 to engage cruise produces the error message "openpilot Unavailable: Interprocess communication problem".

How to reproduce
Open a terminal and run:
cd tools/sim
./start_carla.sh

Open another terminal and run:
cd tools/sim
./start_openpilot_docker.sh

Press 1 a few times while focused on bridge.py.


OS Version

Ubuntu 20.04

openpilot version or commit

a584436

Additional info

No response

maykonpacheco added the PC label Jan 30, 2022
@pd0wm
Contributor

pd0wm commented Jan 31, 2022

Looks like modeld crashes.

Onnx using  ['CUDAExecutionProvider', 'CPUExecutionProvider']
ready to run onnx model ['input_imgs', 'desire', 'traffic_convention', 'initial_state'] [[1, 12, 128, 256], [1, 8], [1, 2], [1, 512]]
Traceback (most recent call last):
  File "/home/batman/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 60, in <module>
    run_loop(ort_session)
  File "/home/batman/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 34, in run_loop
    inputs.append(read(ts).reshape(shp))
  File "/home/batman/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 17, in read
    assert(len(st) > 0)
AssertionError

pd0wm changed the title from "Error when pressing key 1 to start the cruise in CARLA" to "CARLA: modeld crashes on startup" Jan 31, 2022
pd0wm added the good first issue label Jan 31, 2022
@maykonpacheco
Contributor Author

Onnx available providers: ['CPUExecutionProvider']
Onnx selected provider: ['CPUExecutionProvider']
Traceback (most recent call last):
  File "/home/maykonpacheco/Develop/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 59, in <module>
    ort_session = ort.InferenceSession(sys.argv[1], options, providers=[provider])
IndexError: list index out of range
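
The IndexError in this traceback comes from `sys.argv[1]`: the script was started without a model path argument, presumably by running onnx_runner.py directly instead of letting modeld launch it. A minimal sketch of a guard that makes the missing argument explicit; the helper name `model_path_from_argv` is hypothetical and not part of openpilot:

```python
from typing import List, Optional


def model_path_from_argv(argv: List[str]) -> Optional[str]:
    """Return the first command-line argument, or None if it is missing.

    Hypothetical helper for illustration; not taken from onnx_runner.py.
    """
    # argv[0] is the script name, so the model path is expected at argv[1].
    return argv[1] if len(argv) > 1 else None


# Started with no arguments: the path is missing, so indexing argv[1] would raise IndexError.
print(model_path_from_argv(["onnx_runner.py"]))
# Started with a model path, as a parent process would do.
print(model_path_from_argv(["onnx_runner.py", "model.onnx"]))
```

A caller could print a usage message and exit when the helper returns None, rather than failing with a bare IndexError.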

@jackhong12
Contributor

jackhong12 commented Feb 18, 2022

I think the cause is libzmq. After installing the newest version of libzmq, modeld works properly.

@maykonpacheco
Contributor Author

This is the version that is running here, but the problem still persists.
libzmq3-dev is already the newest version (4.3.2-2ubuntu1).

@jackhong12
Contributor

jackhong12 commented Feb 18, 2022

Actually, I installed libzmq from source:
https://github.com/zeromq/libzmq

@maykonpacheco
Contributor Author

@jackhong12 Could you describe in more detail how you do this? :)

@jackhong12
Contributor

jackhong12 commented Feb 19, 2022

@maykonpacheco You can press Ctrl-C to terminate bridge.py and launch_openpilot.sh in the Docker container. Then follow the steps below to reinstall libzmq.

  1. Remove the libzmq library:
apt purge 'libzmq*' -y
  2. Compile and install from source:
git clone https://github.com/zeromq/libzmq.git /libzmq
cd /libzmq
./autogen.sh
./configure
make
make install
ldconfig
  3. Run bridge.py and launch_openpilot.sh again.
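
After `ldconfig`, it can help to confirm which libzmq version the dynamic loader actually resolves, since a leftover distro copy could still be picked up. A minimal sketch using only the standard library and libzmq's C function `zmq_version`; the helper name `libzmq_version` is hypothetical, and which copy gets loaded depends on the loader search path of the environment where this runs (for example, inside the container):

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple


def libzmq_version() -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) of the libzmq the loader finds, or None."""
    name = ctypes.util.find_library("zmq")
    if name is None:
        return None  # no libzmq visible to the loader
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None  # found by name but could not be loaded
    major, minor, patch = ctypes.c_int(), ctypes.c_int(), ctypes.c_int()
    # C signature: void zmq_version(int *major, int *minor, int *patch)
    lib.zmq_version.restype = None
    lib.zmq_version(ctypes.byref(major), ctypes.byref(minor), ctypes.byref(patch))
    return major.value, minor.value, patch.value


print(libzmq_version())
```

If this still reports the version of the Ubuntu package after the source build, the old library is probably still being found first.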

@maykonpacheco
Contributor Author

Even doing all these steps, it's still giving the same problem. =/


@maykonpacheco
Contributor Author

This is the error I have in bash 0
image

@maykonpacheco
Contributor Author

I solved the problem by repeating the libzmq installation steps one by one in both bash shells.

@ebadi
Contributor

ebadi commented Mar 16, 2022

Hi,
This comment used to work for me a few weeks ago but not anymore!
image
image

Has a change elsewhere broken this on the master branch?

maykonpacheco reopened this Mar 16, 2022
@maykonpacheco
Contributor Author

It stopped working for me too; I'm trying to find out why.

@ebadi
Contributor

ebadi commented Mar 17, 2022

Thanks @maykonpacheco. However, this seems to be a new issue that is not related to the libzmq library, so I opened a new issue.

@jyoung8607
Collaborator

jyoung8607 commented Mar 17, 2022

This isn't specific to CARLA. #23695 is a better description of the issue, but it doesn't show the earlier assert.

@jackhong12 is correct that it traces back to libzmq. Here's the sequence of events:

  1. Due to zeromq/libzmq#3313 ("Assertion failed: (src/mailbox.cpp:99)"), the current Ubuntu package for libzmq was built without support for fork().
  2. Since it doesn't know about fork(), nothing happens to close ZMQ sockets after fork/exec calls.
  3. The ONNX model runner (and only the ONNX runner, not SNPE) forks a child process to actually run the model.
  4. The parent _modeld process asserts in libzmq because there are now two processes with the same ZMQ socket.
  5. The child onnx_runner.py process asserts because its parent isn't sending data.
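
Step 5 can be reproduced in isolation. A minimal sketch, assuming the runner reads its inputs from a pipe held by the parent; the traceback only shows a `read` helper asserting `len(st) > 0`, so the pipe and the 16-byte input size are assumptions made for illustration. When the writer goes away, a read returns zero bytes and the same kind of assert fires:

```python
import subprocess
import sys

# Stand-in for the child: read a fixed-size input and assert that data arrived.
CHILD = r"""
import sys
data = sys.stdin.buffer.read(16)  # returns b"" at EOF, i.e. when the writer is gone
assert len(data) > 0
"""


def run_child(payload: bytes) -> int:
    """Start the stand-in child, send payload on stdin, return its exit code."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide the child's AssertionError traceback
    )
    # communicate() writes the payload and then closes the child's stdin.
    proc.communicate(input=payload)
    return proc.returncode


print("parent sent nothing:", run_child(b""))
print("parent sent 16 bytes:", run_child(b"\x00" * 16))
```

The first call mimics a parent that has already aborted: the child sees end-of-file and exits with a non-zero code, which matches the AssertionError in the logs being a symptom rather than the root cause.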

The fix is conceptually simple: upgrade libzmq to the current release. Unfortunately, the Ubuntu package is a couple of versions out of date, so we'd have to roll our own somewhere (I guess ubuntu_setup.sh, and maybe also the base openpilot or CARLA container Dockerfiles? I'm actually not sure what pulls libzmq5 into the sim docker). Perhaps a friendly nudge to the Ubuntu package maintainers would be in order.

Launching modeld alone:
Note the two separate assert events, one in the parent and one in the child.

root@veedub-hq:/openpilot/selfdrive/modeld# ./modeld
./modeld: 3: Bad substitution
platform[0] CL_PLATFORM_NAME: Intel(R) CPU Runtime for OpenCL(TM) Applications
vendor: Intel(R) Corporation
platform version: OpenCL 2.1 LINUX
profile: FULL_PROFILE
extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint 
name :Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
device version :OpenCL 2.1 (Build 0)
max work group size :8192
type = 2 = CL_DEVICE_TYPE_CPU
selfdrive/modeld/modeld.cc: models loaded, modeld starting
Assertion failed: ok (src/mailbox.cpp:99)
Aborted (core dumped)
root@veedub-hq:/openpilot/selfdrive/modeld# Onnx available providers:  ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Onnx selected provider:  ['CUDAExecutionProvider']
2022-03-17 13:48:37.745188106 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
Onnx using  ['CPUExecutionProvider']
ready to run onnx model ['input_imgs', 'big_input_imgs', 'desire', 'traffic_convention', 'initial_state'] [[1, 12, 128, 256], [1, 12, 128, 256], [1, 8], [1, 2], [1, 512]]
Traceback (most recent call last):
  File "/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 60, in <module>
    run_loop(ort_session)
  File "/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 34, in run_loop
    inputs.append(read(ts).reshape(shp))
  File "/openpilot/selfdrive/modeld/runners/onnx_runner.py", line 17, in read
    assert(len(st) > 0)
AssertionError

@maykonpacheco
Contributor Author

@jyoung8607
I understand that we need a newer version of libzmq, but when we follow @jackhong12's steps, we still have the same problem.

@Ming2888

I have the same problem too. Openpilot, while engaged, is not accelerating in the CARLA simulator (engaged: True ; throttle: 0.0)

@ebadi
Contributor

ebadi commented Mar 18, 2022

@jyoung8607 this pull request solves the issue with libzmq: https://github.com/commaai/openpilot/pull/23792/files

However, we are now dealing with a new problem, so I opened a new issue (#23985).
