[nvmf_device_removal_pci_remove] bdevperf hit by SIGSEGV during device removal #3354

Open
mikeBashStuff (Contributor) opened this issue Apr 25, 2024 · 8 comments

CI Intermittent Failure

Probably related to #3345 but without triggering the kernel panic.

https://ci.spdk.io/results/autotest-spdk-master-vs-dpdk-v22.11/builds/1744/archive/nvmf-phy-autotest_70239/index.html

Point of failure:

[2024-04-25T10:13:58.955Z] [2024-04-25 12:13:58.773395] rdma.c:3563:nvmf_rdma_handle_device_removal: *NOTICE*: Port 192.168.100.9:4420 on device mlx5_1 is being removed.
[2024-04-25T10:13:58.955Z] [2024-04-25 12:13:58.773467] rdma_verbs.c: 113:spdk_rdma_qp_disconnect: *ERROR*: rdma_disconnect failed, errno Invalid argument (22)
[2024-04-25T10:13:58.955Z] [2024-04-25 12:13:58.776443] rdma_verbs.c: 113:spdk_rdma_qp_disconnect: *ERROR*: rdma_disconnect failed, errno Invalid argument (22)
[2024-04-25T10:13:58.955Z] [2024-04-25 12:13:58.776461] rdma.c: 845:nvmf_rdma_qpair_destroy: *WARNING*: Destroying qpair when queue depth is 97
[2024-04-25T10:14:03.146Z]   12:14:02	-- target/device_removal.sh@147 -- # seq 1 10
[2024-04-25T10:14:03.146Z]  12:14:02	-- target/device_removal.sh@147 -- # for i in $(seq 1 10)
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@148 -- # check_rdma_dev_exists_in_nvmf_tgt mlx5_1
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@76 -- # local rdma_dev_name=mlx5_1
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@77 -- # rpc_cmd nvmf_get_stats
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@77 -- # jq -r '.poll_groups[0].transports[].devices[].name'
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@77 -- # grep mlx5_1
[2024-04-25T10:14:03.147Z]  12:14:02	-- common/autotest_common.sh@549 -- # xtrace_disable
[2024-04-25T10:14:03.147Z]  12:14:02	-- common/autotest_common.sh@10 -- # set +x
[2024-04-25T10:14:03.147Z]  12:14:02	-- common/autotest_common.sh@577 -- # [[ 0 == 0 ]]
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@78 -- # return 1
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@149 -- # break
[2024-04-25T10:14:03.147Z]   12:14:02	-- target/device_removal.sh@158 -- # get_rdma_dev_count_in_nvmf_tgt
[2024-04-25T10:14:03.147Z]   12:14:02	-- target/device_removal.sh@82 -- # local rdma_dev_name=
[2024-04-25T10:14:03.147Z]   12:14:02	-- target/device_removal.sh@83 -- # rpc_cmd nvmf_get_stats
[2024-04-25T10:14:03.147Z]   12:14:02	-- target/device_removal.sh@83 -- # jq -r '.poll_groups[0].transports[].devices | length'
[2024-04-25T10:14:03.147Z]   12:14:02	-- common/autotest_common.sh@549 -- # xtrace_disable
[2024-04-25T10:14:03.147Z]   12:14:02	-- common/autotest_common.sh@10 -- # set +x
[2024-04-25T10:14:03.147Z]   12:14:02	-- common/autotest_common.sh@577 -- # [[ 0 == 0 ]]
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@158 -- # ib_count_after_remove=1
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@160 -- # rescan_pci
[2024-04-25T10:14:03.147Z]  12:14:02	-- target/device_removal.sh@57 -- # echo 1
[2024-04-25T10:14:03.713Z] Connection closed with partial response:
[2024-04-25T10:14:03.713Z] 
[2024-04-25T10:14:03.713Z] 
[2024-04-25T10:14:04.648Z] [2024-04-25 12:14:04.477108] rdma.c:3252:nvmf_rdma_rescan_devices: *WARNING*: Failed to init ibv device 0x1e9ff60, err 11. Skip rescan.
[2024-04-25T10:14:04.648Z] test/nvmf/target/device_removal.sh: line 163: 1583737 Segmentation fault      (core dumped) $rootdir/build/examples/bdevperf -m $bdevperf_core_mask -z -r $bdevperf_rpc_sock -q 128 -o 4096 -w verify -t 90 &> $testdir/try.txt
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@162 -- # seq 1 10
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@162 -- # for i in $(seq 1 10)
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@163 -- # ls /sys/devices/pci0000:ae/0000:ae:02.0/0000:af:00.1/net
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@163 -- # new_net_dev=mlx_0_1
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@164 -- # [[ -z mlx_0_1 ]]
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@166 -- # [[ mlx_0_1 != \m\l\x\_\0\_\1 ]]
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@171 -- # break
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@175 -- # [[ -z mlx_0_1 ]]
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@179 -- # ip link set mlx_0_1 up
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@180 -- # get_ip_address mlx_0_1
[2024-04-25T10:14:04.648Z]   12:14:04	-- nvmf/common.sh@112 -- # interface=mlx_0_1
[2024-04-25T10:14:04.648Z]   12:14:04	-- nvmf/common.sh@113 -- # ip -o -4 addr show mlx_0_1
[2024-04-25T10:14:04.648Z]   12:14:04	-- nvmf/common.sh@113 -- # awk '{print $4}'
[2024-04-25T10:14:04.648Z]   12:14:04	-- nvmf/common.sh@113 -- # cut -d/ -f1
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@180 -- # [[ -z '' ]]
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@181 -- # ip addr add 192.168.100.9/24 dev mlx_0_1
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@186 -- # seq 1 10
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@186 -- # for i in $(seq 1 10)
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@187 -- # get_rdma_dev_count_in_nvmf_tgt
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@82 -- # local rdma_dev_name=
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@83 -- # rpc_cmd nvmf_get_stats
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@83 -- # jq -r '.poll_groups[0].transports[].devices | length'
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@549 -- # xtrace_disable
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@10 -- # set +x
[2024-04-25T10:14:04.648Z] [2024-04-25 12:14:04.592606] rdma.c:2562:create_ib_device: *NOTICE*: Create IB device mlx5_1(0x1eb5750/0x1ef7f70) succeed.
[2024-04-25T10:14:04.648Z] [2024-04-25 12:14:04.592989] rdma.c:3018:nvmf_rdma_listen: *NOTICE*: *** NVMe/RDMA Target Listening on 192.168.100.9 port 4420 ***
[2024-04-25T10:14:04.648Z] [2024-04-25 12:14:04.593005] rdma.c:3311:nvmf_rdma_retry_listen_port: *NOTICE*: Port 192.168.100.9:4420 come back
[2024-04-25T10:14:04.648Z] [2024-04-25 12:14:04.593018] rdma.c:3841:nvmf_process_ib_event: *NOTICE*: Async event: GID table change
[2024-04-25T10:14:04.648Z] [2024-04-25 12:14:04.593033] rdma.c:3841:nvmf_process_ib_event: *NOTICE*: Async event: GID table change
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@577 -- # [[ 0 == 0 ]]
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@187 -- # ib_count=2
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@188 -- # (( ib_count > ib_count_after_remove ))
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@189 -- # break
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@200 -- # stop_bdevperf
[2024-04-25T10:14:04.648Z]  12:14:04	-- target/device_removal.sh@116 -- # wait 1583864
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@116 -- # trap - ERR
[2024-04-25T10:14:04.648Z]   12:14:04	-- target/device_removal.sh@116 -- # print_backtrace
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@1139 -- # [[ ehxBET =~ e ]]
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@1141 -- # args=('test_remove_and_rescan' 'nvmf_device_removal_pci_remove' '--transport=rdma')
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@1141 -- # local args
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@1143 -- # xtrace_disable
[2024-04-25T10:14:04.648Z]   12:14:04	-- common/autotest_common.sh@10 -- # set +x
[2024-04-25T10:14:04.648Z] ========== Backtrace start: ==========
[2024-04-25T10:14:04.648Z] 
[2024-04-25T10:14:04.648Z] in test/nvmf/target/device_removal.sh:116 -> stop_bdevperf([])
[2024-04-25T10:14:04.648Z]      ...
[2024-04-25T10:14:04.648Z]    111 	
[2024-04-25T10:14:04.648Z]    112 		sleep 5
[2024-04-25T10:14:04.648Z]    113 	}
[2024-04-25T10:14:04.648Z]    114 	
[2024-04-25T10:14:04.648Z]    115 	function stop_bdevperf() {
[2024-04-25T10:14:04.648Z] => 116 		wait $bdevperf_rpc_pid
[2024-04-25T10:14:04.648Z]    117 	
[2024-04-25T10:14:04.648Z]    118 		killprocess $bdevperf_pid
[2024-04-25T10:14:04.648Z]    119 		bdevperf_pid=
[2024-04-25T10:14:04.648Z]    120 	
[2024-04-25T10:14:04.648Z]    121 		cat $testdir/try.txt
[2024-04-25T10:14:04.648Z]      ...
[2024-04-25T10:14:04.914Z] in test/nvmf/target/device_removal.sh:200 -> test_remove_and_rescan([])
[2024-04-25T10:14:04.914Z]      ...
[2024-04-25T10:14:04.914Z]    195 				fi
[2024-04-25T10:14:04.914Z]    196 				sleep 2
[2024-04-25T10:14:04.914Z]    197 			done
[2024-04-25T10:14:04.914Z]    198 		done
[2024-04-25T10:14:04.914Z]    199 	
[2024-04-25T10:14:04.914Z] => 200 		stop_bdevperf
[2024-04-25T10:14:04.914Z]    201 	
[2024-04-25T10:14:04.914Z]    202 		killprocess $nvmfpid
[2024-04-25T10:14:04.914Z]    203 		nvmfpid=
[2024-04-25T10:14:04.914Z]    204 	
[2024-04-25T10:14:04.914Z]    205 		return 0
[2024-04-25T10:14:04.914Z]      ...
[2024-04-25T10:14:04.914Z] in /var/jenkins/workspace/nvmf-phy-autotest/spdk/test/common/autotest_common.sh:1111 -> run_test(["nvmf_device_removal_pci_remove"],["test_remove_and_rescan"])
[2024-04-25T10:14:04.914Z]      ...
[2024-04-25T10:14:04.914Z]    1106		timing_enter $test_name
[2024-04-25T10:14:04.914Z]    1107		echo "************************************"
[2024-04-25T10:14:04.914Z]    1108		echo "START TEST $test_name"
[2024-04-25T10:14:04.914Z]    1109		echo "************************************"
[2024-04-25T10:14:04.914Z]    1110		xtrace_restore
[2024-04-25T10:14:04.914Z]    1111		time "$@"
[2024-04-25T10:14:04.914Z]    1112		xtrace_disable
[2024-04-25T10:14:04.914Z]    1113		echo "************************************"
[2024-04-25T10:14:04.914Z]    1114		echo "END TEST $test_name"
[2024-04-25T10:14:04.914Z]    1115		echo "************************************"
[2024-04-25T10:14:04.914Z]    1116		timing_exit $test_name
[2024-04-25T10:14:04.914Z]      ...
[2024-04-25T10:14:04.914Z] in test/nvmf/target/device_removal.sh:312 -> main(["--transport=rdma"])
[2024-04-25T10:14:04.914Z]      ...
[2024-04-25T10:14:04.914Z]    307 		fi
[2024-04-25T10:14:04.914Z]    308 		test_bonding_slaves_on_nics "${target_nics[@]}"
[2024-04-25T10:14:04.914Z]    309 	}
[2024-04-25T10:14:04.914Z]    310 	
[2024-04-25T10:14:04.914Z]    311 	run_test "nvmf_device_removal_pci_remove_no_srq" test_remove_and_rescan --no-srq
[2024-04-25T10:14:04.914Z] => 312 	run_test "nvmf_device_removal_pci_remove" test_remove_and_rescan
[2024-04-25T10:14:04.914Z]    313 	# bond slaves case needs lag_master & vport_manager are enabled by mlxconfig
[2024-04-25T10:14:04.914Z]    314 	# and not work on CI machine currently.
[2024-04-25T10:14:04.914Z]    315 	# run_test "nvmf_device_removal_bond_slaves" test_bond_slaves
[2024-04-25T10:14:04.914Z]    316 	
[2024-04-25T10:14:04.914Z]    317 	nvmftestfini
[2024-04-25T10:14:04.914Z]      ...
[2024-04-25T10:14:04.914Z] 
[2024-04-25T10:14:04.914Z] ========== Backtrace end ==========
...
[2024-04-25T10:14:20.176Z] 1583693 /var/jenkins/workspace/nvmf-phy-autotest/spdk/build/bin/nvmf_tgt is still up, killing
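
Reading the shell trace: the port on mlx5_1 is removed, rdma_disconnect fails twice with EINVAL during teardown, the script confirms via nvmf_get_stats that the device is gone from the target, and rescan_pci re-triggers PCI enumeration (the echo 1 at device_removal.sh line 57); bdevperf segfaults right at that point (device_removal.sh line 163). The "rdma_disconnect failed, errno Invalid argument (22)" errors themselves are expected during hot removal: once the device underneath a cm_id is gone, the id is no longer in a connected state, so librdmacm returns -1 with errno set to EINVAL. What matters is that teardown still completes, and completes exactly once. A minimal sketch of that pattern, where teardown_cm_id is a hypothetical helper for illustration and not SPDK's actual code:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <rdma/rdma_cma.h>

/* Hypothetical helper: tolerate a failed disconnect after hot removal,
 * but guarantee rdma_destroy_id() runs exactly once per cm_id. */
static void teardown_cm_id(struct rdma_cm_id **id)
{
    if (*id == NULL) {
        return; /* already destroyed; repeat calls are no-ops */
    }
    if (rdma_disconnect(*id) != 0) {
        /* Expected after hot removal; matches the log line
         * "rdma_disconnect failed, errno Invalid argument (22)". */
        fprintf(stderr, "rdma_disconnect: %s\n", strerror(errno));
    }
    rdma_destroy_id(*id); /* frees cma/verbs state inside librdmacm */
    *id = NULL;           /* destroying the same id twice would free
                           * that state twice and corrupt the heap */
}

Destroying an id twice, or destroying it while librdmacm's own device-removal path is freeing the same cma device, is the kind of error that would produce the heap-corruption crash in the backtrace below.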

bt (backtrace from the bdevperf core dump):

[2024-04-25T10:14:14.900Z] Core was generated by `/var/jenkins/workspace/nvmf-phy-autotest/spdk/build/examples/bdevperf -m 0x4 -z'.
[2024-04-25T10:14:14.900Z] Program terminated with signal SIGSEGV, Segmentation fault.
[2024-04-25T10:14:14.900Z] #0  0x00007fdfcb8e5132 in unlink_chunk.isra () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] [Current thread is 1 (Thread 0x7fdfcad3ba00 (LWP 1583737))]
[2024-04-25T10:14:14.900Z] 
[2024-04-25T10:14:14.900Z] Thread 2 (Thread 0x7fdfcac006c0 (LWP 1583739)):
[2024-04-25T10:14:14.900Z] #0  0x00007fdfcb95ec62 in epoll_wait () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #1  0x00007fdfcc59edc4 in eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=<optimized out>) at ../lib/eal/linux/eal_interrupts.c:1077
[2024-04-25T10:14:14.900Z]         events = 0x7fdfcabfeb80
[2024-04-25T10:14:14.900Z]         nfds = <optimized out>
[2024-04-25T10:14:14.900Z]         events = <optimized out>
[2024-04-25T10:14:14.900Z]         nfds = <optimized out>
[2024-04-25T10:14:14.900Z] #2  eal_intr_thread_main (arg=<optimized out>) at ../lib/eal/linux/eal_interrupts.c:1163
[2024-04-25T10:14:14.900Z]         pipe_event = {events = 3, data = {ptr = 0x4, fd = 4, u32 = 4, u64 = 4}}
[2024-04-25T10:14:14.900Z]         src = <optimized out>
[2024-04-25T10:14:14.900Z]         numfds = <optimized out>
[2024-04-25T10:14:14.900Z]         pfd = <optimized out>
[2024-04-25T10:14:14.900Z]         __func__ = "eal_intr_thread_main"
[2024-04-25T10:14:14.900Z] #3  0x00007fdfcb8d8947 in start_thread () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #4  0x00007fdfcb95e860 in clone3 () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] 
[2024-04-25T10:14:14.900Z] Thread 1 (Thread 0x7fdfcad3ba00 (LWP 1583737)):
[2024-04-25T10:14:14.900Z] #0  0x00007fdfcb8e5132 in unlink_chunk.isra () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #1  0x00007fdfcb8e683b in _int_free () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #2  0x00007fdfcb8e920e in free () from /usr/lib64/libc.so.6
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #3  0x00007fdfcb83aff2 in ibv_close_device () from /usr/lib64/libibverbs.so.1
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #4  0x00007fdfcb811ec5 in remove_cma_dev.part () from /usr/lib64/librdmacm.so.1
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #5  0x00007fdfcb814321 in ucma_free_id () from /usr/lib64/librdmacm.so.1
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #6  0x00007fdfcb815867 in rdma_destroy_id () from /usr/lib64/librdmacm.so.1
[2024-04-25T10:14:14.900Z] No symbol table info available.
[2024-04-25T10:14:14.900Z] #7  0x00007fdfcd2eb2e6 in nvme_rdma_qpair_destroy (rqpair=0x2000192d7500) at nvme_rdma.c:1917
[2024-04-25T10:14:14.900Z]         qpair = 0x2000192d7500
[2024-04-25T10:14:14.900Z]         rctrlr = 0x2000192e03c0
[2024-04-25T10:14:14.900Z]         entry = 0x0
[2024-04-25T10:14:14.900Z]         tmp = 0x2000192d7500
[2024-04-25T10:14:14.900Z]         __PRETTY_FUNCTION__ = "nvme_rdma_qpair_destroy"
[2024-04-25T10:14:14.900Z] #8  0x00007fdfcd2ebcc8 in nvme_rdma_qpair_wait_until_quiet (rqpair=0x2000192d7500) at nvme_rdma.c:1975
[2024-04-25T10:14:14.900Z] No locals.
[2024-04-25T10:14:14.900Z] #9  0x00007fdfcd2ec23d in nvme_rdma_ctrlr_disconnect_qpair_poll (ctrlr=0x2000192e03c0, qpair=0x2000192d7500) at nvme_rdma.c:2028
[2024-04-25T10:14:14.900Z]         rqpair = 0x2000192d7500
[2024-04-25T10:14:14.900Z]         rc = 0
[2024-04-25T10:14:14.900Z]         __PRETTY_FUNCTION__ = "nvme_rdma_ctrlr_disconnect_qpair_poll"
[2024-04-25T10:14:14.900Z] #10 0x00007fdfcd2f998d in nvme_rdma_poll_group_process_completions (tgroup=0xcb6a10, completions_per_qpair=128, disconnected_qpair_cb=0x7fdfcd84233c <bdev_nvme_disconnected_qpair_cb>) at nvme_rdma.c:3170
[2024-04-25T10:14:14.900Z]         qpair = 0x2000192d7500
[2024-04-25T10:14:14.900Z]         tmp_qpair = 0x0
[2024-04-25T10:14:14.900Z]         rqpair = 0x0
[2024-04-25T10:14:14.900Z]         tmp_rqpair = 0x0
[2024-04-25T10:14:14.900Z]         group = 0xcb6a10
[2024-04-25T10:14:14.900Z]         poller = 0x0
[2024-04-25T10:14:14.900Z]         batch_size = 1
[2024-04-25T10:14:14.900Z]         rc = 0
[2024-04-25T10:14:14.900Z]         rc2 = 0
[2024-04-25T10:14:14.900Z]         total_completions = 0
[2024-04-25T10:14:14.900Z]         completions_allowed = 0
[2024-04-25T10:14:14.900Z]         completions_per_poller = 0
[2024-04-25T10:14:14.900Z]         poller_completions = 0
[2024-04-25T10:14:14.900Z]         rdma_completions = 0
[2024-04-25T10:14:14.900Z]         __func__ = "nvme_rdma_poll_group_process_completions"
[2024-04-25T10:14:14.900Z] #11 0x00007fdfcd271873 in nvme_transport_poll_group_process_completions (tgroup=0xcb6a10, completions_per_qpair=0, disconnected_qpair_cb=0x7fdfcd84233c <bdev_nvme_disconnected_qpair_cb>) at nvme_transport.c:715
[2024-04-25T10:14:14.900Z] No locals.
[2024-04-25T10:14:14.900Z] #12 0x00007fdfcd2b754e in spdk_nvme_poll_group_process_completions (group=0xede740, completions_per_qpair=0, disconnected_qpair_cb=0x7fdfcd84233c <bdev_nvme_disconnected_qpair_cb>) at nvme_poll_group.c:157
[2024-04-25T10:14:14.900Z]         tgroup = 0xcb6a10
[2024-04-25T10:14:14.900Z]         local_completions = 0
[2024-04-25T10:14:14.900Z]         error_reason = 0
[2024-04-25T10:14:14.900Z]         num_completions = 0
[2024-04-25T10:14:14.900Z]         __PRETTY_FUNCTION__ = "spdk_nvme_poll_group_process_completions"
[2024-04-25T10:14:14.900Z] #13 0x00007fdfcd842e23 in bdev_nvme_poll (arg=0xf68f70) at bdev_nvme.c:1638
[2024-04-25T10:14:14.900Z]         group = 0xf68f70
[2024-04-25T10:14:14.900Z]         num_completions = 0
[2024-04-25T10:14:14.900Z] #14 0x00007fdfcc838f08 in thread_execute_poller (thread=0xcb3eb0, poller=0xcb6870) at thread.c:953
[2024-04-25T10:14:14.900Z]         rc = 0
[2024-04-25T10:14:14.900Z]         __PRETTY_FUNCTION__ = "thread_execute_poller"
[2024-04-25T10:14:14.900Z]         __func__ = "thread_execute_poller"
[2024-04-25T10:14:14.900Z] #15 0x00007fdfcc83ac8e in thread_poll (thread=0xcb3eb0, max_msgs=0, now=15266876399874836) at thread.c:1079
[2024-04-25T10:14:14.900Z]         poller_rc = 0
[2024-04-25T10:14:14.900Z]         msg_count = 0
[2024-04-25T10:14:14.900Z]         poller = 0xcb6870
[2024-04-25T10:14:14.900Z]         tmp = 0x0
[2024-04-25T10:14:14.900Z]         critical_msg = 0x0
[2024-04-25T10:14:14.900Z]         rc = 0
[2024-04-25T10:14:14.900Z] #16 0x00007fdfcc83b895 in spdk_thread_poll (thread=0xcb3eb0, max_msgs=0, now=15266876399874836) at thread.c:1163
[2024-04-25T10:14:14.900Z]         orig_thread = 0x0
[2024-04-25T10:14:14.900Z]         rc = 0
[2024-04-25T10:14:14.900Z] #17 0x00007fdfccc0b8a9 in _reactor_run (reactor=0xbfab80) at reactor.c:914
[2024-04-25T10:14:14.900Z]         thread = 0xcb3eb0
[2024-04-25T10:14:14.900Z]         lw_thread = 0xcb41f8
[2024-04-25T10:14:14.900Z]         tmp = 0x0
[2024-04-25T10:14:14.900Z]         now = 15266876399874836
[2024-04-25T10:14:14.900Z]         rc = 0
[2024-04-25T10:14:14.900Z] #18 0x00007fdfccc0bef9 in reactor_run (arg=0xbfab80) at reactor.c:952
[2024-04-25T10:14:14.900Z]         reactor = 0xbfab80
[2024-04-25T10:14:14.900Z]         thread = 0x7fdfcc583768 <rte_lcore_is_enabled+8>
[2024-04-25T10:14:14.900Z]         lw_thread = 0x7fdfcc83c3c2 <thread_send_msg_notification+34>
[2024-04-25T10:14:14.900Z]         tmp = 0x0
[2024-04-25T10:14:14.900Z]         thread_name = "reactor_2\000X\314\337\177\000\000\000\000\000\000\000\000\000\000\0378\317\314\002\000\000"
[2024-04-25T10:14:14.900Z]         last_sched = 0
[2024-04-25T10:14:14.900Z]         __func__ = "reactor_run"
[2024-04-25T10:14:14.900Z] #19 0x00007fdfccc0ca06 in spdk_reactors_start () at reactor.c:1068
[2024-04-25T10:14:14.900Z]         reactor = 0xbfab80
[2024-04-25T10:14:14.900Z]         i = 4294967295
[2024-04-25T10:14:14.901Z]         current_core = 2
[2024-04-25T10:14:14.901Z]         rc = 0
[2024-04-25T10:14:14.901Z]         __func__ = "spdk_reactors_start"
[2024-04-25T10:14:14.901Z]         __PRETTY_FUNCTION__ = "spdk_reactors_start"
[2024-04-25T10:14:14.901Z] #20 0x00007fdfccc00ebe in spdk_app_start (opts_user=0x7fff6820ce90, start_fn=0x41cda6 <bdevperf_run>, arg1=0x0) at app.c:893
[2024-04-25T10:14:14.901Z]         rc = 0
[2024-04-25T10:14:14.901Z]         tty = 0x0
[2024-04-25T10:14:14.901Z]         tmp_cpumask = {str = '\000' <repeats 256 times>, cpus = "\004", '\000' <repeats 126 times>}
[2024-04-25T10:14:14.901Z]         g_env_was_setup = false
[2024-04-25T10:14:14.901Z]         opts_local = {name = 0x44bc7b "bdevperf", json_config_file = 0x0, json_config_ignore_errors = false, reserved17 = "\000\000\000\000\000\000", rpc_addr = 0x7fff6820de11 "/var/tmp/bdevperf.sock", reactor_mask = 0x7fff6820de07 "0x4", tpoint_group_mask = 0x0, shm_id = -1, reserved52 = "\000\000\000", shutdown_cb = 0x41d790 <spdk_bdevperf_shutdown_cb>, enable_coredump = true, reserved65 = "\000\000", mem_channel = -1, main_core = -1, mem_size = -1, no_pci = false, hugepage_single_segments = false, unlink_hugepage = false, no_huge = false, reserved84 = "\000\000\000", hugedir = 0x0, print_level = SPDK_LOG_INFO, reserved100 = "\000\000\000", num_pci_addr = 0, pci_blocked = 0x0, pci_allowed = 0x0, iova_mode = 0x0, delay_subsystem_init = false, reserved137 = "\000\000\000\000\000\000", num_entries = 32768, env_context = 0x0, log = 0x0, base_virtaddr = 35184372088832, opts_size = 252, disable_signal_handlers = false, interrupt_mode = false, reserved186 = "\000\000\000\000\000", msg_mempool_size = 262143, rpc_allowlist = 0x0, vf_token = 0x0, lcore_map = 0x0, rpc_log_level = SPDK_LOG_DISABLED, rpc_log_file = 0x0, json_data = 0x0, json_data_size = 0}
[2024-04-25T10:14:14.901Z]         opts = 0x7fff6820cbb0
[2024-04-25T10:14:14.901Z]         i = 128
[2024-04-25T10:14:14.901Z]         __func__ = "spdk_app_start"
[2024-04-25T10:14:14.901Z] #21 0x000000000041ec57 in main (argc=14, argv=0x7fff6820d0c8) at bdevperf.c:2886
[2024-04-25T10:14:14.901Z]         opts = {name = 0x44bc7b "bdevperf", json_config_file = 0x0, json_config_ignore_errors = false, reserved17 = "\000\000\000\000\000\000", rpc_addr = 0x7fff6820de11 "/var/tmp/bdevperf.sock", reactor_mask = 0x7fff6820de07 "0x4", tpoint_group_mask = 0x0, shm_id = -1, reserved52 = "\000\000\000", shutdown_cb = 0x41d790 <spdk_bdevperf_shutdown_cb>, enable_coredump = true, reserved65 = "\000\000", mem_channel = -1, main_core = -1, mem_size = -1, no_pci = false, hugepage_single_segments = false, unlink_hugepage = false, no_huge = false, reserved84 = "\000\000\000", hugedir = 0x0, print_level = SPDK_LOG_INFO, reserved100 = "\000\000\000", num_pci_addr = 0, pci_blocked = 0x0, pci_allowed = 0x0, iova_mode = 0x0, delay_subsystem_init = false, reserved137 = "\000\000\000\000\000\000", num_entries = 32768, env_context = 0x0, log = 0x0, base_virtaddr = 35184372088832, opts_size = 252, disable_signal_handlers = false, interrupt_mode = false, reserved186 = "\000\000\000\000\000", msg_mempool_size = 0, rpc_allowlist = 0x0, vf_token = 0x0, lcore_map = 0x0, rpc_log_level = SPDK_LOG_DISABLED, rpc_log_file = 0x0, json_data = 0x0, json_data_size = 0}
[2024-04-25T10:14:14.901Z]         rc = 1
[2024-04-25T10:14:14.901Z] 
[2024-04-25T10:14:14.901Z] --
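
Reading the backtrace: thread 1 faults in glibc's unlink_chunk() underneath free(), which ibv_close_device() (frame #3) calls while rdma_destroy_id() (frame #6) tears down the cma device for the removed port; the destroy is driven from nvme_rdma_qpair_destroy() at nvme_rdma.c:1917 during disconnect polling on the bdevperf side. A fault inside unlink_chunk() under free() is the classic signature of heap metadata corrupted earlier, by a double free or a use-after-free write, rather than a bug at the faulting call site itself. A self-contained illustration of how that signature arises (demonstration only, not related to SPDK's actual allocations):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* 0x1000-byte requests are too large for tcache/fastbins and too
     * small for mmap, so freed chunks land in the bins that
     * unlink_chunk() manages. */
    char *a = malloc(0x1000);
    char *b = malloc(0x1000);
    char *guard = malloc(16); /* keeps b away from the top chunk */
    (void)guard;

    free(a);
    /* Use-after-free write: clobbers the freed chunk's fd/bk bin
     * pointers, i.e. the allocator's own metadata. */
    memset(a, 0x41, 32);

    /* Freeing b makes glibc consolidate it backwards with its
     * corrupted free neighbour; unlink_chunk() dereferences the bogus
     * fd/bk pointers and the process takes SIGSEGV, the same frame #0
     * as in this core. */
    free(b);
    return 0;
}

If that reading is right, the faulting free() in ibv_close_device() is only the victim; the interesting question is which earlier free or use-after-free write on librdmacm/libibverbs device state corrupted the heap before this free() tripped over it.
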
@jimharris (Member) commented:

[Bug Scrub] Assigning to @AlekseyMarchuk to take a look if we see any more reports.

@spdkci commented May 9, 2024

Another instance of this failure. Reported by @apaszkie. log: https://ci.spdk.io/public_build/autotest-per-patch_121916.html

@spdkci commented May 9, 2024

Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-nightly_3684.html

@xjjx (Contributor) commented May 10, 2024

Another instance of this issue, but in a different test: nvmf_device_removal_pci_remove_no_srq
https://ci.spdk.io/public_build/autotest-spdk-master-vs-dpdk-main_3458.html

@spdkci commented May 14, 2024

Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-spdk-master-vs-dpdk-main_3472.html

@spdkci commented May 15, 2024

Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-spdk-master-vs-dpdk-main_3476.html

@spdkci commented May 15, 2024

Another instance of this failure. Reported by @DamiaSan. log: https://ci.spdk.io/public_build/autotest-per-patch_122926.html

@spdkci commented May 16, 2024

Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-spdk-master-vs-dpdk-v23.11_429.html
