EDS 4.16 hsu fixes #23

Closed
wants to merge 3 commits into from
Closed


htot

@htot htot commented Apr 28, 2018

Hi Andy, here is what I have working now. I don't expect you to merge this yet. Although it works, it is still a hack that is enabled on ttyS1 only. But it is a big improvement, allowing 4 Mb/s even while loading the kernel (I used iperf3 over a USB-to-Ethernet converter). I would like to hear your comments and make it perfect; maybe you could eventually upstream this.

The 1st (linear buffer for transmit) you've seen before. The 2nd silences the HSU DMA interrupt storm. The 3rd sets up RX DMA prior to reception instead of on the interrupt for the first received char. At 2 Mb/s the interrupt latency was too large for the FIFO length (on x86_64 at least), even without load.

The hack is, of course, that I enabled this only on ttyS1. For some reason it breaks ttyS0 and ttyS2 if I enable it on all ports. I think DMA is not enabled on the console (ttyS0), and ttyS2 is maybe a serdev and initialized differently?

8250_dma uses the circular xmit->buf as the DMA output buffer. This causes messages that wrap around
in the circular buffer to be transmitted using 2 DMA transfers. Depending on baud rate and processor
load this can cause an interchar gap in the middle of the message. On the receiving end the gap
may cause a short receive timeout, possibly too short to restart a receive DMA transfer, causing
a receive buffer overrun.
Fix this by creating a linear tx_buffer and copying all of xmit->buf into it.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
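
The linearization idea in the commit message above can be sketched in plain user-space C. This is only an illustration, not the patch itself: `struct ring` and `ring_linearize()` are hypothetical names standing in for the kernel's circular buffer and the driver's copy step, and the buffer size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a circular transmit buffer. */
struct ring {
	char *buf;
	size_t head;	/* index where the producer writes next */
	size_t tail;	/* index where the consumer reads next */
	size_t size;	/* capacity, assumed to be a power of two */
};

/*
 * Copy all pending bytes into a linear buffer so that they can be sent
 * as one contiguous transfer, even when the data wraps around the end
 * of the ring. Returns the number of bytes copied.
 */
static size_t ring_linearize(const struct ring *r, char *dst, size_t dst_len)
{
	size_t pending, first;

	assert(r->size != 0 && (r->size & (r->size - 1)) == 0);

	pending = (r->head - r->tail) & (r->size - 1);
	if (pending > dst_len)
		pending = dst_len;

	/* Bytes from tail up to the end of the ring. */
	first = r->size - r->tail;
	if (first > pending)
		first = pending;

	memcpy(dst, r->buf + r->tail, first);
	/* Remaining bytes, if any, start again at index 0. */
	memcpy(dst + first, r->buf, pending - first);

	return pending;
}
```

With the data in one linear buffer, a single transfer covers a message that would otherwise be split at the wrap point.
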
On Intel Tangier B0 and Anniedale the interrupt line, despite having
different numbers, is shared between the HSU DMA and UART IPs. In that case
the IRQ handler is called in the UART driver only, which clears the interrupt source.

But as the interrupt is level triggered an interrupt storm happens on the
HSU DMA until the interrupt is cleared in the UART driver. To prevent this,
don't enable the HSU DMA interrupt at all.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
Instead of initiating RX DMA when the first char arrives in the UART, arm RX DMA
continuously. DMA transfers automatically empty the RX FIFO, preventing overruns. As
before, a transfer terminates when the DMA buffer is full or when a 4-char
interchar gap is received (timeout). After a timeout we arm the DMA again after
a short delay from a work queue.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>
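
To get a feel for the time budgets involved, the character timing can be worked out with a small helper. This is a back-of-the-envelope sketch: `uart_chars_ns()` is a hypothetical name, and 10 bits on the wire per character (8N1 framing) is an assumption that the thread does not state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in nanoseconds needed to shift `chars` characters over the wire
 * at `baud` bits per second, with `bits_per_char` bits per character
 * (10 for the assumed 8N1 framing: start bit, 8 data bits, stop bit).
 */
static uint64_t uart_chars_ns(uint32_t chars, uint32_t bits_per_char,
			      uint32_t baud)
{
	assert(baud != 0);
	return (uint64_t)chars * bits_per_char * 1000000000ULL / baud;
}
```

Under these assumptions, `uart_chars_ns(4, 10, 2000000)` gives 20000 ns, so the 4-char interchar timeout at 2 Mb/s is about 20 µs. The same helper gives the time an RX FIFO of a given depth takes to fill, which is the budget the interrupt latency has to fit into when DMA is only started on the first received char.
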
@htot
Author

htot commented May 2, 2018

I tested this on x86_32. I have now retested it on x86_64 and, disappointingly, it doesn't work at all. I need to go back to x86_32 to figure out why it worked or seemed to work.

In the meantime, don't spend time here. I'll close this for now.

@htot htot closed this May 2, 2018
@htot htot deleted the eds-4.16-hsu-fixes branch June 16, 2018 19:08
htot pushed a commit to edison-fw/linux that referenced this pull request Nov 9, 2020
The atomic check hooks must look up the encoder to be used with a
connector from the connector's atomic state, and not assume that it's
the connector's currently attached encoder. The latter can change
under the atomic check func, or may not be set yet, as in the case of MST
connectors.
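
The rule can be illustrated with mock types. Everything in this sketch is hypothetical (`struct encoder`, `struct connector`, `struct connector_state` and `state_encoder_port()` are not the real DRM or i915 definitions); it only shows a check hook reading the encoder from the state being checked rather than from the connector itself.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types for illustration only. */
struct encoder {
	int port;
};

struct connector {
	struct encoder *encoder;	/* currently attached; may be NULL or stale */
};

struct connector_state {
	struct connector *connector;
	struct encoder *best_encoder;	/* encoder chosen for this state */
};

/*
 * Look up the encoder from the state under check. Returns 1 and stores
 * the port if the state has an encoder, 0 otherwise. It deliberately
 * never dereferences new_state->connector->encoder.
 */
static int state_encoder_port(const struct connector_state *new_state,
			      int *port_out)
{
	const struct encoder *enc;

	assert(new_state != NULL && port_out != NULL);

	enc = new_state->best_encoder;
	if (!enc)
		return 0;

	*port_out = enc->port;
	return 1;
}
```

In this mock, a connector whose `encoder` pointer is still NULL is handled safely as long as the state carries the encoder.
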

This fixes
[    7.940719] Oops: 0000 [#1] SMP NOPTI
[    7.944407] CPU: 2 PID: 143 Comm: kworker/2:2 Not tainted 5.6.0-1023-oem #23-Ubuntu
[    7.952102] Hardware name: Dell Inc. Latitude 7320/, BIOS 88.87.11 09/07/2020
[    7.959278] Workqueue: events output_poll_execute [drm_kms_helper]
[    7.965511] RIP: 0010:intel_psr_atomic_check+0x37/0xa0 [i915]
[    7.971327] Code: 80 2d 06 00 00 20 74 42 80 b8 34 71 00 00 00 74 39 48 8b 72 08 48 85 f6 74 30 80 b8 f8 71 00 00 00 74 27 4c 8b 87 80 04 00 00 <41> 8b 78 78 83 ff 08 77 19 31 c9 83 ff 05 77 19 48 81 c1 20 01 00
[    7.977541] input: PS/2 Generic Mouse as /devices/platform/i8042/serio1/input/input5
[    7.990154] RSP: 0018:ffffb864c073fac8 EFLAGS: 00010202
[    7.990155] RAX: ffff8c5d55ce0000 RBX: ffff8c5d54519000 RCX: 0000000000000000
[    7.990155] RDX: ffff8c5d55cb30c0 RSI: ffff8c5d89a0c800 RDI: ffff8c5d55fcf800
[    7.990156] RBP: ffffb864c073fac8 R08: 0000000000000000 R09: ffff8c5d55d9f3a0
[    7.990156] R10: ffff8c5d55cb30c0 R11: 0000000000000009 R12: ffff8c5d55fcf800
[    7.990156] R13: ffff8c5d55cb30c0 R14: ffff8c5d56989cc0 R15: ffff8c5d56989cc0
[    7.990158] FS:  0000000000000000(0000) GS:ffff8c5d8e480000(0000) knlGS:0000000000000000
[    8.047193] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.052970] CR2: 0000000000000078 CR3: 0000000856500005 CR4: 0000000000760ee0
[    8.060137] PKRU: 55555554
[    8.062867] Call Trace:
[    8.065361]  intel_digital_connector_atomic_check+0x53/0x130 [i915]
[    8.071703]  intel_dp_mst_atomic_check+0x5b/0x200 [i915]
[    8.077074]  drm_atomic_helper_check_modeset+0x1db/0x790 [drm_kms_helper]
[    8.083942]  intel_atomic_check+0x92/0xc50 [i915]
[    8.088705]  ? drm_plane_check_pixel_format+0x4f/0xb0 [drm]
[    8.094345]  ? drm_atomic_plane_check+0x7a/0x3a0 [drm]
[    8.099548]  drm_atomic_check_only+0x2b1/0x450 [drm]
[    8.104573]  drm_atomic_commit+0x18/0x50 [drm]
[    8.109070]  drm_client_modeset_commit_atomic+0x1c9/0x200 [drm]
[    8.115056]  drm_client_modeset_commit_force+0x55/0x160 [drm]
[    8.120866]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[    8.128415]  drm_fb_helper_set_par+0x34/0x50 [drm_kms_helper]
[    8.134225]  drm_fb_helper_hotplug_event.part.0+0xb4/0xe0 [drm_kms_helper]
[    8.141150]  drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
[    8.147481]  intel_fbdev_output_poll_changed+0x6f/0xa0 [i915]
[    8.153287]  drm_kms_helper_hotplug_event+0x2c/0x40 [drm_kms_helper]
[    8.159709]  output_poll_execute+0x1aa/0x1c0 [drm_kms_helper]
[    8.165506]  process_one_work+0x1e8/0x3b0
[    8.169561]  worker_thread+0x4d/0x400
[    8.173249]  kthread+0x104/0x140
[    8.176515]  ? process_one_work+0x3b0/0x3b0
[    8.180726]  ? kthread_park+0x90/0x90
[    8.184416]  ret_from_fork+0x1f/0x40

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2361
References: https://gitlab.freedesktop.org/drm/intel/-/issues/2486
Reported-by: William Tseng <william.tseng@intel.com>
Reported-by: Cooper Chiou <cooper.chiou@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201027160928.3665377-1-imre.deak@intel.com
(cherry picked from commit 00e5deb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
htot pushed a commit to edison-fw/linux that referenced this pull request Nov 24, 2020
This fix is for a failure that occurred in the DWARF unwind perf test.

Stack unwinders may probe memory when looking for frames.

Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.

This can lead to false memory sanitizer failures for the use of an
uninitialized value.

Avoid this problem by removing the poison on the copied stack.
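
A minimal sketch of the approach, assuming a Clang build where `__has_feature(memory_sanitizer)` is available: after copying the stack, the copy is marked as initialized with `__msan_unpoison()` from `<sanitizer/msan_interface.h>`. The names `copy_stack_snapshot()` and `HAVE_MSAN` are hypothetical and are not taken from the perf sources; without MSan the function is a plain `memcpy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Detect MemorySanitizer (Clang); other compilers skip this block. */
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
#define HAVE_MSAN 1
#endif
#endif

/*
 * Copy a stack region for later unwinding. The source may contain
 * uninitialized bytes, and MSan propagates that poison through
 * memcpy(), so remove it from the copy to avoid false reports when
 * the unwinder probes the snapshot.
 */
static void copy_stack_snapshot(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);

	memcpy(dst, src, len);
#ifdef HAVE_MSAN
	__msan_unpoison(dst, len);
#endif
}
```
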

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
    #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445

SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
htot pushed a commit to htot/linux that referenced this pull request Feb 5, 2021
Zygo reported the following KASAN splat:

  BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
  Read of size 8 at addr ffff888112402950 by task btrfs/28836

  CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  Call Trace:
   dump_stack+0xbc/0xf9
   ? btrfs_backref_cleanup_node+0x18a/0x420
   print_address_description.constprop.8+0x21/0x210
   ? record_print_text.cold.34+0x11/0x11
   ? btrfs_backref_cleanup_node+0x18a/0x420
   ? btrfs_backref_cleanup_node+0x18a/0x420
   kasan_report.cold.10+0x20/0x37
   ? btrfs_backref_cleanup_node+0x18a/0x420
   __asan_load8+0x69/0x90
   btrfs_backref_cleanup_node+0x18a/0x420
   btrfs_backref_release_cache+0x83/0x1b0
   relocate_block_group+0x394/0x780
   ? merge_reloc_roots+0x4a0/0x4a0
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   ? check_flags.part.50+0x6c/0x1e0
   ? btrfs_relocate_chunk+0x120/0x120
   ? kmem_cache_alloc_trace+0xa06/0xcb0
   ? _copy_from_user+0x83/0xc0
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   ? __kasan_check_read+0x11/0x20
   ? check_chain_key+0x1f4/0x2f0
   ? __asan_loadN+0xf/0x20
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? kvm_sched_clock_read+0x18/0x30
   ? check_chain_key+0x1f4/0x2f0
   ? lock_downgrade+0x3f0/0x3f0
   ? handle_mm_fault+0xad6/0x2150
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags+0x26/0x30
   ? lock_is_held_type+0xc3/0xf0
   ? syscall_enter_from_user_mode+0x1b/0x60
   ? do_syscall_64+0x13/0x80
   ? rcu_read_lock_sched_held+0xa1/0xd0
   ? __kasan_check_read+0x11/0x20
   ? __fget_light+0xae/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f4c4bdfe427

  Allocated by task 28836:
   kasan_save_stack+0x21/0x50
   __kasan_kmalloc.constprop.18+0xbe/0xd0
   kasan_kmalloc+0x9/0x10
   kmem_cache_alloc_trace+0x410/0xcb0
   btrfs_backref_alloc_node+0x46/0xf0
   btrfs_backref_add_tree_node+0x60d/0x11d0
   build_backref_tree+0xc5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 28836:
   kasan_save_stack+0x21/0x50
   kasan_set_track+0x20/0x30
   kasan_set_free_info+0x1f/0x30
   __kasan_slab_free+0xf3/0x140
   kasan_slab_free+0xe/0x10
   kfree+0xde/0x200
   btrfs_backref_error_cleanup+0x452/0x530
   build_backref_tree+0x1a5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache().  This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up.  However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache.  Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
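
The difference between freeing a node and dropping it can be sketched with a tiny user-space list. The types and functions here (`struct node`, `struct cache`, `cache_add_leaf()`, `cache_drop_node()`, `cache_release()`) are hypothetical analogues, not the btrfs code: the point is only that a node must be unlinked before it is freed, so that a later walk over the list never touches freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical analogue of a backref node linked into a cache list. */
struct node {
	struct node *prev, *next;
};

struct cache {
	struct node leaves;	/* sentinel of a circular doubly linked list */
};

static void cache_init(struct cache *c)
{
	c->leaves.prev = c->leaves.next = &c->leaves;
}

static struct node *cache_add_leaf(struct cache *c)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->prev = c->leaves.prev;
	n->next = &c->leaves;
	c->leaves.prev->next = n;
	c->leaves.prev = n;
	return n;
}

/* "Drop": unlink from the list first, then free. */
static void cache_drop_node(struct node *n)
{
	assert(n != NULL);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	free(n);
}

/* Release everything still linked; returns how many nodes were dropped. */
static size_t cache_release(struct cache *c)
{
	size_t dropped = 0;

	while (c->leaves.next != &c->leaves) {
		cache_drop_node(c->leaves.next);
		dropped++;
	}
	return dropped;
}
```

If an error path called `free()` directly on a linked node instead of `cache_drop_node()`, `cache_release()` would later follow a pointer into freed memory, which is the pattern the KASAN report above describes.
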

Fixes: 75bfb9a ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
htot pushed a commit to edison-fw/linux that referenced this pull request Mar 13, 2021
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
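
A user-space analogue of the idea, under the assumption that a short `nanosleep()` can stand in for what `cpu_chill()` does on RT: the retrying side sleeps between attempts instead of spinning, so whoever holds the resource gets a chance to run. `chill()`, `retry_with_chill()` and `succeed_after()` are hypothetical names, and the 1 ms sleep is an arbitrary choice for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Sleep briefly instead of busy-waiting (1 ms is an arbitrary value). */
static void chill(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };

	nanosleep(&ts, NULL);
}

/*
 * Call try_once() until it reports success, sleeping between attempts.
 * Returns the number of attempts used, or -1 after max_tries failures.
 */
static int retry_with_chill(int (*try_once)(void *), void *arg, int max_tries)
{
	assert(try_once != NULL && max_tries > 0);

	for (int i = 1; i <= max_tries; i++) {
		if (try_once(arg))
			return i;
		chill();
	}
	return -1;
}

/* Example callback: succeeds once the counter it points to reaches zero. */
static int succeed_after(void *arg)
{
	int *remaining = arg;

	return --(*remaining) <= 0;
}
```
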

Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.

+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().

+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
|  #0:  (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
htot pushed a commit to edison-fw/linux that referenced this pull request Mar 13, 2021
[ Upstream commit cf9bf87 ]

According to Errata #23 "The per-CPU GbE interrupt is limited to Core
0", we can't use the per-cpu interrupt mechanism on the Armada 3700
family.

This is correctly checked for RSS configuration, but the initial queue
mapping is still done by having the queues spread across all the CPUs in
the system, both in the init path and in the cpu_hotplug path.
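
The mapping decision can be sketched as a pure function. This is an illustration under assumptions, not the driver code: `rxq_to_cpu()` is a hypothetical name, and spreading queues with `rxq % nr_cpus` is assumed here as the "spread across all the CPUs" policy.

```c
#include <assert.h>

/*
 * Pick the CPU that should service a receive queue. When per-CPU
 * interrupts are usable, spread queues round-robin over the CPUs;
 * when they are not (the erratum case), everything goes to CPU 0.
 */
static int rxq_to_cpu(int rxq, int nr_cpus, int percpu_irq_ok)
{
	assert(rxq >= 0 && nr_cpus > 0);

	if (!percpu_irq_ok)
		return 0;
	return rxq % nr_cpus;
}
```

Applying the same function in both the init path and the hotplug path keeps the two consistent, which is the gap the commit message points out.
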

Fixes: 2636ac3 ("net: mvneta: Add network support for Armada 3700 SoC")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
htot pushed a commit to edison-fw/linux that referenced this pull request Mar 13, 2021
commit eddda68 upstream.

A weird KASAN problem that Zygo reported could have been easily caught
if we checked for basic things in our backref freeing code.  We have two
methods of freeing a backref node

- btrfs_backref_free_node: this just is kfree() essentially.
- btrfs_backref_drop_node: this actually unlinks the node and cleans up
  everything and then calls btrfs_backref_free_node().

We should mostly be using btrfs_backref_drop_node(), to make sure the
node is properly unlinked from the backref cache, and only use
btrfs_backref_free_node() when we know the node isn't actually linked to
the backref cache.  We made a mistake here and thus got the KASAN splat.

Make this style of issue easier to find by adding some ASSERT()'s to
btrfs_backref_free_node() and adjusting our deletion stuff to properly
init the list so we can rely on list_empty() checks working properly.
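
The hardening described above can be sketched in user space. All names are hypothetical (`struct list_head` is redefined locally, and `list_del_reinit()`, `node_free()` and `node_drop()` are stand-ins, not the btrfs functions): deletion re-initializes the entry so that an emptiness check is reliable, and the low-level free asserts that the node is no longer linked.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h)
{
	h->prev = h->next = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Unlink and re-initialize, so list_is_empty(e) holds afterwards. */
static void list_del_reinit(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

struct node {
	struct list_head link;
};

/* Low-level free: only valid for nodes that are not linked anywhere. */
static void node_free(struct node *n)
{
	assert(list_is_empty(&n->link));
	free(n);
}

/* Full cleanup: unlink first, then free. */
static void node_drop(struct node *n)
{
	list_del_reinit(&n->link);
	node_free(n);
}
```

With the assertion in place, calling `node_free()` on a node that is still linked fails immediately at the call site instead of surfacing later as a use-after-free.
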

  BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
  Read of size 8 at addr ffff888112402950 by task btrfs/28836

  CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  Call Trace:
   dump_stack+0xbc/0xf9
   ? btrfs_backref_cleanup_node+0x18a/0x420
   print_address_description.constprop.8+0x21/0x210
   ? record_print_text.cold.34+0x11/0x11
   ? btrfs_backref_cleanup_node+0x18a/0x420
   ? btrfs_backref_cleanup_node+0x18a/0x420
   kasan_report.cold.10+0x20/0x37
   ? btrfs_backref_cleanup_node+0x18a/0x420
   __asan_load8+0x69/0x90
   btrfs_backref_cleanup_node+0x18a/0x420
   btrfs_backref_release_cache+0x83/0x1b0
   relocate_block_group+0x394/0x780
   ? merge_reloc_roots+0x4a0/0x4a0
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   ? check_flags.part.50+0x6c/0x1e0
   ? btrfs_relocate_chunk+0x120/0x120
   ? kmem_cache_alloc_trace+0xa06/0xcb0
   ? _copy_from_user+0x83/0xc0
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   ? __kasan_check_read+0x11/0x20
   ? check_chain_key+0x1f4/0x2f0
   ? __asan_loadN+0xf/0x20
   ? btrfs_ioctl_get_supported_features+0x30/0x30
   ? kvm_sched_clock_read+0x18/0x30
   ? check_chain_key+0x1f4/0x2f0
   ? lock_downgrade+0x3f0/0x3f0
   ? handle_mm_fault+0xad6/0x2150
   ? do_vfs_ioctl+0xfc/0x9d0
   ? ioctl_file_clone+0xe0/0xe0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags.part.50+0x6c/0x1e0
   ? check_flags+0x26/0x30
   ? lock_is_held_type+0xc3/0xf0
   ? syscall_enter_from_user_mode+0x1b/0x60
   ? do_syscall_64+0x13/0x80
   ? rcu_read_lock_sched_held+0xa1/0xd0
   ? __kasan_check_read+0x11/0x20
   ? __fget_light+0xae/0x110
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f4c4bdfe427
  RSP: 002b:00007fff33ee6df8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007fff33ee6e98 RCX: 00007f4c4bdfe427
  RDX: 00007fff33ee6e98 RSI: 00000000c4009420 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
  R10: fffffffffffff59d R11: 0000000000000202 R12: 0000000000000001
  R13: 0000000000000000 R14: 00007fff33ee8a34 R15: 0000000000000001

  Allocated by task 28836:
   kasan_save_stack+0x21/0x50
   __kasan_kmalloc.constprop.18+0xbe/0xd0
   kasan_kmalloc+0x9/0x10
   kmem_cache_alloc_trace+0x410/0xcb0
   btrfs_backref_alloc_node+0x46/0xf0
   btrfs_backref_add_tree_node+0x60d/0x11d0
   build_backref_tree+0xc5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 28836:
   kasan_save_stack+0x21/0x50
   kasan_set_track+0x20/0x30
   kasan_set_free_info+0x1f/0x30
   __kasan_slab_free+0xf3/0x140
   kasan_slab_free+0xe/0x10
   kfree+0xde/0x200
   btrfs_backref_error_cleanup+0x452/0x530
   build_backref_tree+0x1a5/0x700
   relocate_tree_blocks+0x2be/0xb90
   relocate_block_group+0x2eb/0x780
   btrfs_relocate_block_group+0x26e/0x4c0
   btrfs_relocate_chunk+0x52/0x120
   btrfs_balance+0xe2e/0x1900
   btrfs_ioctl_balance+0x3a7/0x460
   btrfs_ioctl+0x24c8/0x4360
   __x64_sys_ioctl+0xc3/0x100
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff888112402900
   which belongs to the cache kmalloc-128 of size 128
  The buggy address is located 80 bytes inside of
   128-byte region [ffff888112402900, ffff888112402980)
  The buggy address belongs to the page:
  page:0000000028b1cd08 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888131c810c0 pfn:0x112402
  flags: 0x17ffe0000000200(slab)
  raw: 017ffe0000000200 ffffea000424f308 ffffea0007d572c8 ffff888100040440
  raw: ffff888131c810c0 ffff888112402000 0000000100000009 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888112402800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff888112402880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  >ffff888112402900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                   ^
   ffff888112402980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffff888112402a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Link: https://lore.kernel.org/linux-btrfs/20201208194607.GI31381@hungrycats.org/
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
andy-shev pushed a commit that referenced this pull request Jul 15, 2021
On Fri, Jul 09, 2021 at 07:10:14PM +0300, Andy Shevchenko wrote:
> On Fri, Jul 9, 2021 at 5:40 PM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >
> > Parallel (4 bits)  panel display stopped working.
>
> This part appears to be a configuration issue. So, we have only one
> left, i.e. oops on remove.

Could you please test whether this little change fixes the oops?

-- >8 --

Fix this oops:
[  218.825445] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[  218.832965] BUG: unable to handle page fault for address: ffff8f8f06559dc0
[  218.839863] #PF: supervisor instruction fetch in kernel mode
[  218.845540] #PF: error_code(0x0011) - permissions violation
[  218.851132] PGD e601067 P4D e601067 PUD e602067 PMD 645a063 PTE 8000000006559063
[  218.858587] Oops: 0011 [#1] SMP PTI
[  218.862099] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G         C      5.13.0+ #23
[  218.869870] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[  218.878681] Workqueue: kacpi_hotplug acpi_device_del_work_fn
[  218.884380] RIP: 0010:0xffff8f8f06559dc0
[  218.888328] Code: ff ff d2 2b 21 8c ff ff ff ff 08 00 00 00 00 00 00 00 78 34 fa 02 8f 8f ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <72> 65 67 75 6c 61 74 6f 72 3a 72 65 67 75 6c 61 74 6f 72 2e 30 2d
[  218.907138] RSP: 0000:ffffad36c0043c90 EFLAGS: 00010246
[  218.912387] RAX: ffff8f8f06559dc0 RBX: ffff8f8f062cbc00 RCX: ffff8f8f01239fc8
[  218.919542] RDX: 000000002a3cccf8 RSI: 0000000000000001 RDI: ffff8f8f06559480
[  218.926701] RBP: ffffffffc037e279 R08: 00000000d5832520 R09: 0000000000000001
[  218.933856] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f8f062cbc00
[  218.941010] R13: ffffffffc038e028 R14: ffffffff8c5e0b60 R15: 00000000fffffffd
[  218.948166] FS:  0000000000000000(0000) GS:ffff8f8f3e200000(0000) knlGS:0000000000000000
[  218.956286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  218.962053] CR2: ffff8f8f06559dc0 CR3: 0000000002ef4000 CR4: 00000000001006f0
[  218.969208] Call Trace:
[  218.971678]  ? hd44780_common_clear_display+0x17/0x30 [hd44780_common]
[  218.978252]  ? charlcd_write_char+0x21a/0x810 [charlcd]
[  218.983519]  ? charlcd_puts+0x30/0x60 [charlcd]
[  218.988083]  ? charlcd_unregister+0x24/0x70 [charlcd]
[  218.993167]  ? hd44780_remove+0x1e/0x30 [hd44780]
[  218.997901]  ? platform_remove+0x1f/0x40

Reported-By: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
andy-shev pushed a commit that referenced this pull request Jul 19, 2021
andy-shev pushed a commit that referenced this pull request Aug 4, 2021
andy-shev pushed a commit that referenced this pull request Aug 13, 2021
andy-shev pushed a commit that referenced this pull request Sep 9, 2021
andy-shev pushed a commit that referenced this pull request Sep 10, 2021
andy-shev pushed a commit that referenced this pull request Jan 6, 2024
 WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
 CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
 RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
 Call Trace:
  ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
  ? rcu_read_lock_sched_held+0x79/0x80
  vmx_queue_exception+0x104/0x160 [kvm_intel]
  ? vmx_queue_exception+0x104/0x160 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
  kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? __fget+0xf3/0x210
  do_vfs_ioctl+0xa4/0x700
  ? __fget+0x114/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25

This is triggered occasionally by running both win7 and win2016 in L2 with
EPT disabled on both L1 and L2. It can't be reproduced easily.

Commit 0b6ac34 (KVM: nVMX: Correct handling of exception injection) mentioned
that "KVM wants to inject page-faults which it got to the guest. This function
assumes it is called with the exit reason in vmcs02 being a #PF exception".
Commit e011c66 (KVM: nVMX: Check all exceptions for intercept during delivery to
L2) extended the check to all exceptions during delivery to L2. However, there
is no guarantee that the current exit reason is an exception. If an external
interrupt occurs on the host (for example a host timer interrupt, which should
not be injected into the guest) and an exception is queued somewhere along the
way, nested_vmx_check_exception() is called and the vmexit emulation code
tries to emulate the "Acknowledge interrupt on exit" behavior, which triggers
the warning.

Reusing the exit reason from the L2->L0 vmexit is wrong in this case,
the reason must always be EXCEPTION_NMI when injecting an exception into
L1 as a nested vmexit.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: e011c66 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>