Panic with IllegalAddress error in scan kernel #1

Open
dssgabriel opened this issue Jun 14, 2023 · 0 comments
Assignees
Labels
bug Something isn't working

dssgabriel commented Jun 14, 2023

Description

When running the scan algorithm benchmark, the program panics with an IllegalAddress error while trying to synchronize the first call to the scan kernel (before doing any recursion).

Expected behavior

The scan kernel should not cause an IllegalAddress panic and it should compute the correct result.

Current behavior

The program panics with the following error:

thread 'main' panicked at 'failed to synchronize kernel `scan` at depth 0: IllegalAddress', src/kernels.rs:163:14

The relevant code snippet is the following:

HARP/src/kernels.rs

Lines 147 to 163 in 93c9643

// Launch first step of the kernel
unsafe {
    launch!(
        scan_kernel<<<grid_size, block_size, smem_size * size_of::<i32>() as u32, stream>>>(
            d_in.as_device_ptr(),
            d_in.len(),
            d_out.as_device_ptr(),
            block_sums.as_device_ptr(),
            max_elems_per_block,
            smem_size
        )
    )
    .expect("failed to launch kernel `scan`");
}
stream
    .synchronize()
    .expect(format!("failed to synchronize kernel `scan` at depth {depth}").as_str());
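An IllegalAddress reported at synchronization usually means the kernel read or wrote outside a device allocation, so one way to narrow this down is to validate the launch parameters on the host before launching. The following is only a sketch under assumptions that the snippet above does not confirm: each block is assumed to handle `max_elems_per_block` elements and to write exactly one entry into `block_sums`. The `LaunchConfig` type and the `check_launch_config` function are hypothetical and not part of HARP.

```rust
use std::mem::size_of;

/// Host-side launch parameters for one scan pass (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct LaunchConfig {
    grid_size: u32,
    smem_bytes: u32,
}

/// Derives and validates the launch configuration before the kernel runs.
/// ASSUMPTION: one block scans `max_elems_per_block` elements and writes
/// one entry to `block_sums`; the issue does not confirm either point.
fn check_launch_config(
    len: usize,
    max_elems_per_block: u32,
    smem_size: u32,
    block_sums_len: usize,
) -> Result<LaunchConfig, String> {
    if max_elems_per_block == 0 {
        return Err("max_elems_per_block must be non-zero".to_string());
    }
    let per_block = max_elems_per_block as usize;
    // Ceiling division: number of blocks needed to cover `len` elements.
    let blocks = (len + per_block - 1) / per_block;
    if blocks > u32::MAX as usize {
        return Err(format!("{blocks} blocks do not fit in a u32 grid size"));
    }
    // Every block needs its own slot in the block sums buffer.
    if block_sums_len < blocks {
        return Err(format!(
            "block_sums has {block_sums_len} entries but {blocks} blocks are launched"
        ));
    }
    // Dynamic shared memory is given in bytes, not in elements.
    let smem_bytes = smem_size
        .checked_mul(size_of::<i32>() as u32)
        .ok_or_else(|| "shared memory size in bytes overflows u32".to_string())?;
    Ok(LaunchConfig {
        grid_size: blocks as u32,
        smem_bytes,
    })
}
```

Calling `check_launch_config(d_in.len(), max_elems_per_block, smem_size, block_sums.len())` just before `launch!` would turn a mismatched buffer size into a host-side error instead of a device fault.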

Additional error information

When running with RUST_BACKTRACE=1, the call stack is the following:

thread 'main' panicked at 'failed to synchronize kernel `scan` at depth 0: IllegalAddress', src/kernels.rs:163:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/result.rs:1651:5
   3: harp::kernels::device::scan
   4: harp::drivers::device::cuda_scan
   5: harp::drivers::scan
   6: harp::main
The full backtrace (with RUST_BACKTRACE=full) is the following:
thread 'main' panicked at 'failed to synchronize kernel `scan` at depth 0: IllegalAddress', src/kernels.rs:163:14
stack backtrace:
   0:     0x55f9f71d3891 - std::backtrace_rs::backtrace::libunwind::trace::h4e5cd7155e2ebaac
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x55f9f71d3891 - std::backtrace_rs::backtrace::trace_unsynchronized::hb4d504f8def07b70
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55f9f71d3891 - std::sys_common::backtrace::_print_fmt::h270ee65403a6a640
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x55f9f71d3891 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hfa127bbe4d370ae8
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x55f9f71f60cf - core::fmt::rt::Argument::fmt::h975c0825ea1bb836
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/fmt/rt.rs:138:9
   5:     0x55f9f71f60cf - core::fmt::write::hb200bbda235147d0
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/fmt/mod.rs:1094:21
   6:     0x55f9f71d1691 - std::io::Write::write_fmt::hf4eeaa80392fd692
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/io/mod.rs:1713:15
   7:     0x55f9f71d36a5 - std::sys_common::backtrace::_print::h2427e2e0721aca68
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x55f9f71d36a5 - std::sys_common::backtrace::print::h8c074174f5a65b94
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x55f9f71d4b67 - std::panicking::default_hook::{{closure}}::habdecd03f278805d
  10:     0x55f9f71d4954 - std::panicking::default_hook::hfeee4c9ec6e7984a
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:288:9
  11:     0x55f9f71d501c - std::panicking::rust_panic_with_hook::h50748255142a0809
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:705:13
  12:     0x55f9f71d4f17 - std::panicking::begin_panic_handler::{{closure}}::h1532befb1017034b
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:597:13
  13:     0x55f9f71d3cc6 - std::sys_common::backtrace::__rust_end_short_backtrace::h36f919598d3260ac
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/sys_common/backtrace.rs:151:18
  14:     0x55f9f71d4c62 - rust_begin_unwind
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:593:5
  15:     0x55f9f70f9703 - core::panicking::panic_fmt::h637089c9b9878b43
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/panicking.rs:67:14
  16:     0x55f9f70f9b43 - core::result::unwrap_failed::h18e2f5da912951f3
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/result.rs:1651:5
  17:     0x55f9f7131c3d - harp::kernels::device::scan::h847138cb3248648f
  18:     0x55f9f71028fe - harp::drivers::device::cuda_scan::h8108c82dbccdbc57
  19:     0x55f9f71270f4 - harp::drivers::scan::h89ef3c614e92ed26
  20:     0x55f9f713c4c8 - harp::main::h8849d6b7566eb6a4
  21:     0x55f9f7118d13 - std::sys_common::backtrace::__rust_begin_short_backtrace::h5177598e656e9a5e
  22:     0x55f9f7118d29 - std::rt::lang_start::{{closure}}::h6d198479f9d90738
  23:     0x55f9f71cc225 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hea64a749880f8ff2
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/core/src/ops/function.rs:284:13
  24:     0x55f9f71cc225 - std::panicking::try::do_call::h767f64e3f6e064fb
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:500:40
  25:     0x55f9f71cc225 - std::panicking::try::h90cf534a1e5ea4ae
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:464:19
  26:     0x55f9f71cc225 - std::panic::catch_unwind::h9bac3c528abd1cb9
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panic.rs:142:14
  27:     0x55f9f71cc225 - std::rt::lang_start_internal::{{closure}}::h8709a6d2fd226842
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/rt.rs:148:48
  28:     0x55f9f71cc225 - std::panicking::try::do_call::h1408f9ff8d60cf9d
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:500:40
  29:     0x55f9f71cc225 - std::panicking::try::hc2659d179f01b076
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panicking.rs:464:19
  30:     0x55f9f71cc225 - std::panic::catch_unwind::h8e83755629085503
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/panic.rs:142:14
  31:     0x55f9f71cc225 - std::rt::lang_start_internal::hd3b3887afec46100
                               at /rustc/371994e0d8380600ddda78ca1be937c7fb179b49/library/std/src/rt.rs:148:20
  32:     0x55f9f713c4f5 - main
  33:     0x7f9431229d90 - __libc_start_call_main
                               at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  34:     0x7f9431229e40 - __libc_start_main_impl
                               at ./csu/../csu/libc-start.c:392:3
  35:     0x55f9f70f9e05 - _start
  36:                0x0 - <unknown>

Steps to reproduce

  1. git checkout scan-illegal-address
  2. cargo run --release -- iscan --lengths <VECTOR_LENGTH>
  3. The program panics with the error described above

Environment

OS: Ubuntu 22.04
Kernel: 5.19.0-43
Toolchains:

  • rustc 1.72.0-nightly (371994e0d 2023-06-13)
  • gcc 11.3.0
  • nvcc 12.1.105 (CUDA SDK 12.1)

Additional general information

The equivalent C++ code in the cpp_scan directory does not crash with this error. However, it does not produce the expected result either: it does not seem to add the computed partial sums to the subsequent thread blocks.

To run the code:

cd cpp_scan
make run
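To check the output of either implementation once the kernel runs, a CPU reference that follows the same three phases (per-block scan, scan of the block sums, adding each block's offset to its elements) can be compared against a plain sequential scan. This is a minimal sketch, not HARP's code: the function names are hypothetical, and an inclusive scan over `i32` is assumed, since the issue does not say whether `iscan` is inclusive or exclusive.

```rust
/// Sequential inclusive prefix sum, used as the ground truth.
fn inclusive_scan(input: &[i32]) -> Vec<i32> {
    let mut acc = 0i32;
    input
        .iter()
        .map(|&x| {
            acc = acc.wrapping_add(x);
            acc
        })
        .collect()
}

/// Blocked inclusive scan mirroring the three GPU phases on the CPU.
/// ASSUMPTION: `block_len` plays the role of `max_elems_per_block`.
fn blocked_inclusive_scan(input: &[i32], block_len: usize) -> Vec<i32> {
    assert!(block_len > 0, "block_len must be non-zero");
    let mut out = Vec::with_capacity(input.len());
    let mut block_sums = Vec::new();

    // Phase 1: scan each block independently and record its total.
    for block in input.chunks(block_len) {
        let scanned = inclusive_scan(block);
        block_sums.push(*scanned.last().expect("chunks are never empty"));
        out.extend(scanned);
    }

    // Phase 2: an exclusive scan of the block totals gives each block's offset.
    let mut offsets = Vec::with_capacity(block_sums.len());
    let mut running = 0i32;
    for &sum in &block_sums {
        offsets.push(running);
        running = running.wrapping_add(sum);
    }

    // Phase 3: add each block's offset to all of its elements.
    // Skipping this phase leaves every block after the first one wrong.
    for (block, &offset) in out.chunks_mut(block_len).zip(&offsets) {
        for value in block {
            *value = value.wrapping_add(offset);
        }
    }
    out
}
```

If the device output matches the result of phase 1 alone but differs from `inclusive_scan`, the block offsets are not being applied, which is the symptom described for the C++ version.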
@dssgabriel dssgabriel self-assigned this Jun 14, 2023
@dssgabriel dssgabriel added the bug Something isn't working label Jun 14, 2023