Xeon D QuickData engine behavior issue #3

nufosmatic · 2021-07-29T22:10:59Z

We have a Red Hat 7.7 system where the IOATDMA driver is not quite built for this particular IOAT DMA engine. We managed to get it to build and run, but we have an issue where transfers are not completing and the COMPLETION TIMEOUT is being invoked. The transfer is happening, but the DMA engine doesn't think so.

We're suspicious of a caching issue, but I haven't found the magic twiddle to get past it. If our data buffer is >8MB, then the transfer completes successfully with good transfer speeds.

We discovered your page (actually was here once chasing down NTB/IOAT stuff), and decided to grab a late-breaking snapshot. However, it isn't working either, and it's not working in a different way. You had made a change with regard to completion in this class of DMA engine. It seems like the point where the other code is not getting a completion, your code is complaining about errors but not errors...

This is the device in question:

03:00.0 System peripheral: Intel(R) Xeon Processor D Family QuickData Technology Register DMA Channel 0
03:00.0 0880: 8086:6f50

And once a channel is allocated and a DMA_MEMCPY transfer is submitted, we get this spew on the console:

[ 6138.674756] ioatdma 0000:03:00.0: Errors:
[ 6138.778480] ioatdma 0000:03:00.0: CHANSTS: 0x80701d941 CHANERR: 0x0
[ 6138.784761] ioatdma 0000:03:00.0: Errors:
[ 6138.888486] ioatdma 0000:03:00.0: CHANSTS: 0x80701d981 CHANERR: 0x0
[ 6138.894767] ioatdma 0000:03:00.0: Errors:
[ 6138.998491] ioatdma 0000:03:00.0: CHANSTS: 0x80701d9c1 CHANERR: 0x0
[ 6139.004772] ioatdma 0000:03:00.0: Errors:
[ 6139.108496] ioatdma 0000:03:00.0: CHANSTS: 0x80701da01 CHANERR: 0x0
[ 6139.114778] ioatdma 0000:03:00.0: Errors:
[ 6139.218501] ioatdma 0000:03:00.0: CHANSTS: 0x80701da41 CHANERR: 0x0
[ 6139.224782] ioatdma 0000:03:00.0: Errors:
[ 6139.328506] ioatdma 0000:03:00.0: CHANSTS: 0x80701da81 CHANERR: 0x0
[ 6139.334787] ioatdma 0000:03:00.0: Errors:
[ 6139.438511] ioatdma 0000:03:00.0: CHANSTS: 0x80701dac1 CHANERR: 0x0
[ 6139.444792] ioatdma 0000:03:00.0: Errors:
[ 6139.548516] ioatdma 0000:03:00.0: CHANSTS: 0x80701db01 CHANERR: 0x0
[ 6139.554798] ioatdma 0000:03:00.0: Errors:
<>

This is happening with any of your code after about Jun 2017...

Can you give us a hint where to look next?

vendor_id : GenuineIntel
cpu family : 6
model : 86
model name : Intel(R) Xeon(R) CPU D-1559 @ 1.50GHz
stepping : 4
microcode : 0xf000015
cpu MHz : 1500.000
cache size : 18432 KB
physical id : 0
siblings : 24
core id : 13
cpu cores : 12
apicid : 27
initial apicid : 27
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
bogomips : 2999.85
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

davejiang · 2021-07-29T22:23:28Z

Are you doing mem->mem transfer or mem->mmio transfer? Have you tried increasing the timeout value if target is MMIO?

Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <dhowells@redhat.com>

The SRv6 layer allows defining HMAC data that can later be used to sign IPv6 Segment Routing Headers. This configuration is realised via netlink through four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual length of the SECRET attribute, it is possible to provide invalid combinations (e.g., secret = "", secretlen = 64). This case is not checked in the code and with an appropriately crafted netlink message, an out-of-bounds read of up to 64 bytes (max secret length) can occur past the skb end pointer and into skb_shared_info: Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 208 memcpy(hinfo->secret, secret, slen); (gdb) bt #0 seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600, extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>, family=<optimized out>) at net/netlink/genetlink.c:731 #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00, family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775 #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792 #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>) at net/netlink/af_netlink.c:2501 #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803 #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000) at net/netlink/af_netlink.c:1319 #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>) at net/netlink/af_netlink.c:1345 #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921 ... (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end $1 = 0xffff88800b1b76c0 (gdb) p/x secret $2 = 0xffff88800b1b76c0 (gdb) p slen $3 = 64 '@' The OOB data can then be read back from userspace by dumping HMAC state. This commit fixes this by ensuring SECRETLEN cannot exceed the actual length of SECRET. Reported-by: Lucas Leong <wmliang.tw@gmail.com> Tested: verified that EINVAL is correctly returned when secretlen > len(secret) Fixes: 4f4853d ("ipv6: sr: implement API to control SR HMAC structure") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

During stress testing with CONFIG_SMP disabled, KASAN reports as below: ================================================================== BUG: KASAN: use-after-free in __mutex_lock+0xe5/0xc30 Read of size 8 at addr ffff8881094223f8 by task stress/7789 CPU: 0 PID: 7789 Comm: stress Not tainted 6.0.0-rc1-00002-g0d53d2e882f9 #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <TASK> .. __mutex_lock+0xe5/0xc30 .. z_erofs_do_read_page+0x8ce/0x1560 .. z_erofs_readahead+0x31c/0x580 .. Freed by task 7787 kasan_save_stack+0x1e/0x40 kasan_set_track+0x20/0x30 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0x10c/0x190 kmem_cache_free+0xed/0x380 rcu_core+0x3d5/0xc90 __do_softirq+0x12d/0x389 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x97/0xb0 call_rcu+0x3d/0x3f0 erofs_shrink_workstation+0x11f/0x210 erofs_shrink_scan+0xdc/0x170 shrink_slab.constprop.0+0x296/0x530 drop_slab+0x1c/0x70 drop_caches_sysctl_handler+0x70/0x80 proc_sys_call_handler+0x20a/0x2f0 vfs_write+0x555/0x6c0 ksys_write+0xbe/0x160 do_syscall_64+0x3b/0x90 The root cause is that erofs_workgroup_unfreeze() doesn't reset to orig_val thus it causes a race that the pcluster reuses unexpectedly before freeing. Since UP platforms are quite rare now, such path becomes unnecessary. Let's drop such specific-designed path directly instead. Fixes: 73f5c66 ("staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'") Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220902045710.109530-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>

davejiang closed this as completed Aug 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Xeon D QuickData engine behavior issue #3

Xeon D QuickData engine behavior issue #3

nufosmatic commented Jul 29, 2021

davejiang commented Jul 29, 2021

Xeon D QuickData engine behavior issue #3

Xeon D QuickData engine behavior issue #3

Comments

nufosmatic commented Jul 29, 2021

davejiang commented Jul 29, 2021