got a soft lockup when i try to use it #23

Closed
ltgoter opened this issue Aug 25, 2020 · 1 comment

Comments

@ltgoter

ltgoter commented Aug 25, 2020

I tried to use Infiniswap on CloudLab, but got a soft lockup. Do you have any idea about it?
The details are as follows:

Environment:
Infiniswap: latest version of the master branch
Ubuntu 14.04 (3.13.0-168-generic)
MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu14.04-x86_64
docker-ce 17.06.0~ce-0~ubuntu

Hardware:
I use two m510 nodes with ConnectX-3.

I did just as the README.md says and used the scripts to install and run it successfully. Then I used a Docker container to run the application. But when I try to run the TPC-C benchmark in VoltDB, the container and the node go down. I then changed /proc/sys/kernel/softlockup_panic to 1 to keep the node reachable, and reran the test with a simple program that just allocates memory continuously in the container. When it starts to use swap, it hits the same error again. I ran dmesg and got the following message:

[ 680.398599] BUG: soft lockup - CPU#6 stuck for 22s! [docker:11971]
[ 680.426590] Modules linked in: infiniswap(OX) ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs nfsv3 ipod(OX) ib_iser(OX) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd lpc_ich ipmi_si shpchp nfsd knem(OX) mac_hid wmi acpi_power_meter auth_rpcgss nfs_acl lp parport nfs lockd sunrpc fscache rdma_ucm(OX) ib_ucm(OX) rdma_cm(OX) iw_cm(OX) configfs ib_ipoib(OX) ib_cm(OX) ib_uverbs(OX) ib_umad(OX) mlx5_ib(OX) mlx5_core(OX) mlx4_ib(OX) ib_sa(OX) ib_mad(OX) ib_core(OX) ib_addr(OX) ib_netlink(OX) mlx4_en(OX) vxlan ip_tunnel ptp pps_core mlx4_core(OX) nvme mlx_compat(OX)
[ 680.791448] CPU: 6 PID: 11971 Comm: docker Tainted: G D OX 3.13.0-168-generic #218-Ubuntu
[ 680.833909] Hardware name: HP ProLiant m510 Server Cartridge/ProLiant m510 Server Cartridge, BIOS H05 05/09/2016
[ 680.882638] task: ffff880ffe6fc800 ti: ffff880fdab22000 task.ti: ffff880fdab22000
[ 680.918015] RIP: 0010:[] [] smp_call_function_many+0x28e/0x2f0
[ 680.958596] RSP: 0000:ffff880fdab23c30 EFLAGS: 00000202
[ 680.982393] RAX: 000000000000000d RBX: ffff88107fcd46a8 RCX: ffff88107fdb7ab8
[ 681.014626] RDX: 000000000000000d RSI: 0000000000000100 RDI: 0000000000000000
[ 681.046588] RBP: ffff880fdab23c80 R08: ffff88107fcd4688 R09: 0000000000000004
[ 681.078563] R10: ffff88107fcd4688 R11: 0000000000000000 R12: 0000000000000006
[ 681.110655] R13: 0000010000000006 R14: 000000000000faa0 R15: 000000fc00013b80
[ 681.142801] FS: 00007f4f157fa700(0000) GS:ffff88107fcc0000(0000) knlGS:0000000000000000
[ 681.179177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 681.205209] CR2: 00007f4f1e5676c0 CR3: 0000001024c26000 CR4: 0000000000360770
[ 681.237166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 681.269155] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[ 681.301400] Stack:
[ 681.310363] ffff88107fcd46a8 0000000000014640 ffff880fdab23c90 ffffffff81060e20
[ 681.343509] 0000010000000001 ffff8810237c33c0 00007f4f1e5686c0 ffff8810237c3100
[ 681.376719] 00007f4f1e5676c0 ffff880fdab5fb38 ffff880fdab23ca8 ffffffff8106103e
[ 681.410197] Call Trace:
[ 681.421135] [] ? do_kernel_range_flush+0x40/0x40
[ 681.449502] [] native_flush_tlb_others+0x2e/0x30
[ 681.477577] [] flush_tlb_mm_range+0x8a/0x120
[ 681.504522] [] ptep_clear_flush+0x53/0x60
[ 681.529867] [] do_wp_page+0x2a5/0x860
[ 681.554043] [] handle_mm_fault+0x6fb/0xfb0
[ 681.579771] [] ? n_tty_read+0x40c/0xc00
[ 681.604430] [] __do_page_fault+0x183/0x570
[ 681.630373] [] ? wake_up_state+0x20/0x20
[ 681.655321] [] do_page_fault+0x1a/0x70
[ 681.680765] [] page_fault+0x28/0x30
[ 681.703836] Code: 3b 05 cf 52 c3 00 89 c2 0f 8d fd fd ff ff 48 98 49 8b 4d 00 48 03 0c c5 40 7c d1 81 f6 41 20 01 74 cb 0f 1f 00 f3 90 f6 41 20 01 <75> f8 eb be 0f b6 4d d0 48 8b 55 c0 44 89 ef 48 8b 75 c8 e8 ca
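
The exact allocation program is not included in this report. As a rough illustration only, a minimal sketch of a program that "allocates memory continuously" could look like the following, assuming Python 3 and a 4 KiB page size. The function name `allocate_chunks` and its parameters are hypothetical, not taken from the original test.

```python
import time

PAGE_SIZE = 4096  # assumed page size in bytes (typical on x86_64)


def allocate_chunks(chunk_mb=64, max_chunks=None, pause_s=0.0):
    """Allocate memory in fixed-size chunks and keep every chunk alive.

    With max_chunks=None the loop runs until the process is killed or
    allocation fails, which pushes the container past its memory limit
    and into swap. Returns the total number of bytes allocated.
    """
    chunk_bytes = chunk_mb * 1024 * 1024
    chunks = []  # holding references prevents the memory from being freed
    while max_chunks is None or len(chunks) < max_chunks:
        buf = bytearray(chunk_bytes)
        # Write one byte per page so the kernel actually backs each page
        # with physical memory instead of leaving it untouched.
        for offset in range(0, chunk_bytes, PAGE_SIZE):
            buf[offset] = 1
        chunks.append(buf)
        if pause_s:
            time.sleep(pause_s)
    return len(chunks) * chunk_bytes
```

Calling `allocate_chunks()` with no arguments inside a memory-limited container keeps allocating until swap is in use; passing a small `max_chunks` bounds the run for a quick check.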

Hoping for your help,
thanks.

@ltgoter
Author

ltgoter commented Apr 2, 2021

It seems this module does not work on some CloudLab node types. I switched to c6220 nodes, and it works there.

@ltgoter ltgoter closed this as completed Apr 2, 2021