[Bug] Resizing terminal causes TVM RPC segfault #17063

Open
happyme531 opened this issue Jun 4, 2024 · 0 comments
Labels
needs-triage (PRs or issues that need to be investigated by maintainers to find the right assignees to address it), type: bug

happyme531 commented Jun 4, 2024

As the title says, when I use TVM MetaSchedule with RPC to run tuning on another device, resizing the terminal of the host tuning process makes an RPC runner process on the host segfault immediately.

Expected behavior

TVM should not segfault when the terminal is resized.

Actual behavior

2024-06-04 21:24:22 [INFO] [task_scheduler.cc:180] TaskScheduler picks Task #112: "conv2d21"
2024-06-04 21:24:31 [INFO] [task_scheduler.cc:193] Sending 64 sample(s) to builder
!!!!!!! TVM encountered a Segfault !!!!!!!
Stack trace:
  0: tvm::runtime::(anonymous namespace)::backtrace_handler(int)
        at /home/zt/rk3588-nn/tvm/src/runtime/logging.cc:214
  1: 0x00007f925569fadf
  2: tvm::runtime::EnvCAPIRegistry::CheckSignals()
        at /home/zt/rk3588-nn/tvm/src/runtime/registry.cc:186
  3: long tvm::support::RetryCallOnEINTR<tvm::support::TCPSocket::Recv(void*, unsigned long, int)::{lambda()#1}, int (*)()>(tvm::support::TCPSocket::Recv(void*, unsigned long, int)::{lambda()#1}, int (*)())
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/errno_handling.h:58
  4: tvm::support::TCPSocket::Recv(void*, unsigned long, int)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/socket.h:481
  5: tvm::runtime::SockChannel::Recv(void*, unsigned long)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_socket_impl.cc:56
  6: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1::operator()(void*, unsigned long) const
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:705
  7: unsigned long tvm::support::RingBuffer::WriteWithCallback<tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1>(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1, unsigned long)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/ring_buffer.h:174
  8: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:704
  9: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:870
  10: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:1087
  11: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_module.cc:129

2024-06-04 21:24:42 [INFO] [task_scheduler.cc:195] Sending 64 sample(s) to runner
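
The `RetryCallOnEINTR` and `EnvCAPIRegistry::CheckSignals` frames in the trace suggest that the blocking `recv` was interrupted by a signal, and resizing a terminal normally delivers `SIGWINCH` to the foreground process group. Assuming that signal is the trigger (not confirmed here), the resize could be simulated by sending it directly to the affected process. This is only a sketch: `send_sigwinch` and `on_winch` are hypothetical helper names, and the demo sends the signal to its own PID just to show delivery.

```python
import os
import signal

received = []  # signal numbers seen by the demo handler


def on_winch(signum, frame):
    """Demo handler: record that the signal arrived."""
    received.append(signum)


def send_sigwinch(pid):
    """Send SIGWINCH to `pid`, as a terminal resize would (assumed trigger)."""
    os.kill(pid, signal.SIGWINCH)


# Self-test: install a handler, signal this process, then restore the old handler.
previous = signal.signal(signal.SIGWINCH, on_winch)
send_sigwinch(os.getpid())
signal.signal(signal.SIGWINCH, previous)
```

To try this against the bug, one would call `send_sigwinch` with the PID of the RPC runner process while tuning is in progress, instead of resizing the terminal by hand.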

Environment

Host:
Manjaro Linux 24.0.1
TVM master branch 78a1f80

Device:
RK3588 ARM SoC
Debian 11
TVM master branch 78a1f80

Steps to reproduce

# %%
import tvm
from tvm import relay
from tvm import relax
from tvm.relax.frontend.onnx import from_onnx
from tvm.relax.testing import relay_translator
from tvm.driver.tvmc.transform import apply_graph_transforms
import onnx
import tvm.testing
import tvm.topi.testing
from tvm.ir.module import IRModule
from tvm import meta_schedule as ms
import tvm.tir.tensor_intrin.arm_cpu 
from tvm.meta_schedule.runner import (
    EvaluatorConfig,
    LocalRunner,
    PyRunner,
    RPCConfig,
    RPCRunner,
)

# %%
target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -num-cores=1")
onnx_model_path = "yolov5s.onnx" 
shape_dict = {"images": (1, 3, 640, 640)}

# %%
onnx_model = onnx.load(onnx_model_path)
mod0, params = relay.frontend.from_onnx(onnx_model, shape_dict)
mod: IRModule = relay_translator.from_relay(mod0["main"], target, params)
mod = apply_graph_transforms(
    mod,
    {
        "mixed_precision": True,
        "mixed_precision_calculation_type": "float16",
        "mixed_precision_acc_type": "float16",
    },
)
rpc_config = RPCConfig(
    tracker_host="127.0.0.1",
    tracker_port=9190,
    tracker_key="rk3588", 
    session_priority=1,
    session_timeout_sec=10,
)
evaluator_config = EvaluatorConfig(
    number=1,
    repeat=1,
    min_repeat_ms=5,
    enable_cpu_cache_flush=True,
)
runner = RPCRunner(rpc_config, evaluator_config)
database = ms.relax_integration.tune_relax(
    mod=mod,
    params=params,
    target=target,
    max_trials_global=10000, 
    runner=runner,
    work_dir="./work2",
    seed=0
)

# %%
# Compile the best schedule
lib = ms.relay_integration.compile_relay(
    database=database,
    mod=mod,
    params=params,
    target=target,
)

# %%
import tvm.driver.tvmc.model as tvmc_model
model = tvmc_model.TVMCModel(mod, params)
model.export_package(lib, onnx_model_path.replace(".onnx", ".tar"), "aarch64-linux-gnu-gcc") 
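
A possible workaround, untested against this bug: ignore `SIGWINCH` in the tuning script before calling `tune_relax`. This assumes that `SIGWINCH` is the trigger and that the RPC runner worker processes inherit the ignored disposition from the parent. A library that installs its own handler afterwards would undo it. `ignore_sigwinch` is a hypothetical helper name.

```python
import signal


def ignore_sigwinch():
    """Ignore terminal-resize signals; return the previous handler so it can be restored."""
    return signal.signal(signal.SIGWINCH, signal.SIG_IGN)


# Call once at the top of the tuning script, before any runner is created.
previous_handler = ignore_sigwinch()
```

The original handler can be restored later with `signal.signal(signal.SIGWINCH, previous_handler)`.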

Triage

  • core:rpc
happyme531 added the needs-triage and type: bug labels on Jun 4, 2024