vc4: 7" LCD touchscreen not supported #8

Closed
anholt opened this Issue Feb 15, 2016 · 34 comments

anholt commented Feb 15, 2016

The DSI1 connector isn't enabled yet. drm-vc4-dsi-boot is the WIP for it, but currently writes to the device don't appear to take effect.

anholt pushed a commit that referenced this issue Feb 27, 2016

Adrian Hunter Arnaldo Carvalho de Melo
perf tools: tracepoint_error() can receive e=NULL, robustify it
Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-privileged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situations where parse_events() can be called without
parse_events_error. See the result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tong Zhang <ztong@vt.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
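
The "check before using" fix described in the commit message above can be illustrated with a small userspace sketch. This is not the perf source: `struct parse_error` and `report_tracepoint_error()` are hypothetical, simplified stand-ins for the error structure and for `tracepoint_error()`, and the message text is invented for illustration.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for perf's error-reporting struct. */
struct parse_error {
	char str[128];	/* human-readable message */
};

/*
 * Record a tracepoint lookup failure.
 * Callers that add tracepoints automatically may pass e == NULL because
 * they do not want error details, so guard before dereferencing.
 * Returns 1 if a message was stored, 0 if reporting was skipped.
 */
int report_tracepoint_error(struct parse_error *e, int err,
			    const char *sys, const char *name)
{
	if (!e)
		return 0;	/* nothing to fill in; avoids the NULL dereference */

	snprintf(e->str, sizeof(e->str),
		 "can't access trace event %s:%s (errno %d)", sys, name, err);
	return 1;
}
```

With the guard in place, a call such as `report_tracepoint_error(NULL, 13, "sched", "sched_switch")` returns without touching memory, which corresponds to the `e=0x0` case in frame #0 of the backtrace.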

destans commented Mar 2, 2016

+1

kika123 commented Mar 9, 2016

Some registers are locked down by the firmware, like the ones used for OTP access

nnexai commented Mar 15, 2016

+1

@LittleKita

+1

Cardhu commented Mar 22, 2016

+1

anholt commented Apr 1, 2016

Progress on this: drm-vc4-dsi-boot configures most of the vc4 hardware for the panel, and console and X work.

It doesn't successfully initialize if the panel wasn't enabled at boot (tested using ignore_lcd=1 in config.txt) and doesn't control the Toshiba chip on the panel (so DPMS results in the panel flipping out due to pixels no longer showing up when they should).

@AndreeeCZ

+1

zarr commented Apr 3, 2016

Any guesstimates on this hitting upstream? I tried merging with rpi-4.6.y to test, but I probably failed as the result doesn't boot. It gets to a black screen with some random flashing characters every few seconds.

anholt commented Apr 4, 2016

@zarr since the code isn't done, just merging upstream wouldn't work.

bird43 commented Apr 19, 2016

Any progress on this?

anholt pushed a commit that referenced this issue Apr 20, 2016

perf tools: tracepoint_error() can receive e=NULL, robustify it
commit ec183d2 upstream.

Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-privileged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situations where parse_events() can be called without
parse_events_error. See the result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tong Zhang <ztong@vt.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

greyltc commented May 16, 2016

Hi @anholt,
I have an official DSI display connected to my Pi 3 running the latest Raspbian (with HDMI not connected).
Is it still the case that there's no way to use your VC4 driver here?
When I boot with this configuration, I get a blank screen.

Is there anything that I should add to my config.txt (besides what raspi-config adds when I enable your driver via that)?

Would it help if I provided any logs or diagnostic output for you?

anholt commented May 17, 2016

@greyltc This bug is that the DSI display is not supported yet, so there's no way to get it to work.

sirdel commented Jun 7, 2016

+1

anholt commented Jun 14, 2016

New rpi-4.4.y-dsi branch is up porting the upstream-targeted DSI work to downstream. bcm2709_defconfig-only, and requires this in config.txt:

ignore_lcd=1
(Otherwise we lose our i2c0 pin setup frequently, and our transaction interrupts get stolen)

It may also need:

mask_gpu_interrupt1=0x1000
(more transaction interrupt theft prevention)

This branch does not work! The panel at best gets turned on but scans out white instead of pixel data. The transactions are timing out, and after doing them, even the i2c reads from the atmel start going from mostly-not-timing-out to always failing.

We should probably be using i2c-gpio on the pins to avoid clashing with the i2c0 that the firmware also uses. Even still, the write (not read!) transactions were working on my upstream branch, and are failing here, so something went wrong in the backport. Also the tc358762 reads from i2c at boot are bad looking, and I don't know how to do tc358762 writes over i2c.

I'm hoping @ghollingworth could take a look at this at some point while I'm on vacation and fill in how to do tc358762 writes over i2c.

@greysAcademicCode

Any progress on this? @ghollingworth ? @anholt ?

anholt commented Jul 7, 2016

I just sent a pull request for cut-down DSI support (no power management, requires firmware initialization). Working on touchscreen input now.

sandersr commented Aug 1, 2016

Thank you for your work on this!

sandersr commented Aug 5, 2016

Eric, do you have a rough idea where this could be implemented? I understand you've got plenty of other stuff on your plate and it's not a high priority.

anholt commented Aug 10, 2016

The stub implementation is now merged downstream, so the LCD should now minimally work with the open driver. Keeping this open to track development and merge for upstream.

sandersr Aug 11, 2016

Hi Eric,

That's fantastic news! I've upgraded the kernel and the display indeed lit up. Unfortunately I didn't have much luck with Xorg server. According to the log, it all starts fine, but nothing on the screen. Switching back to VT works fine.

Xorg Log:
http://dpaste.com/1FSCMZ2

sandersr Aug 11, 2016

I also tried kwin_wayland --drm to see if I can get plasma5 DRM backend running via EGL, but it stopped with:

FATAL ERROR: Creating connecting to XServer failed: 5

dmesg gave me this:

[ 535.818190] disable

sandersr Aug 11, 2016

Creating connection*

sandersr Aug 11, 2016

can this be down to:

[Thu Aug 11 16:58:24 2016] [drm] Initialized drm 1.1.0 20060810
[Thu Aug 11 16:58:24 2016] random: nonblocking pool is initialized
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[Thu Aug 11 16:58:24 2016] usbcore: registered new interface driver brcmfmac
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3f700000.dsi (ops vc4_dsi_ops [vc4])
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
[Thu Aug 11 16:58:24 2016] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[Thu Aug 11 16:58:24 2016] [drm] No driver support for vblank timestamp query.
[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: No connectors reported connected with modes
[Thu Aug 11 16:58:24 2016] [drm] Cannot find any crtc or sizes - going 1024x768
[Thu Aug 11 16:58:24 2016] Console: switching to colour frame buffer device 128x48

[Thu Aug 11 16:58:24 2016] vc4-drm soc:gpu: fb0: frame buffer device

anholt commented Aug 11, 2016

@sandersr It would be great to open your own bug report so that we don't interleave too many conversations (and I'll close this bug whenever I do the upstream support, anyway). Also, please attach full logs, not snippets or pastebin links.

The panel gets probed late due to being a module. That dmesg snippet makes sense for the panel not having been probed at that point. Then later the panel got probed, and it looks like that was done by the time X started. Does xrandr say DSI is on (asterisk next to the mode) and connected when it's displaying black? Is the HDMI showing the desktop correctly? When you switch to another VT and DSI comes on, does switching back to X go back to black?

sandersr Aug 12, 2016

@anholt you're absolutely right, I should have opened a new bug. I'm sorry, I'll get it all sorted.

andreamerello pushed a commit to andreamerello/linux-mach-bcm that referenced this issue Sep 21, 2016

brcmfmac: Check rtnl_lock is locked when removing interface
Check rtnl_lock is locked in brcmf_p2p_ifp_removed() by passing
rtnl_locked flag. Actually the caller brcmf_del_if() checks whether
the rtnl_lock is locked, but doesn't pass it to brcmf_p2p_ifp_removed().

Without this fix, wpa_supplicant hits a soft lockup while holding
rtnl_lock (this means all other processes using netlink are locked up too)

e.g.
[ 4495.876627] INFO: task wpa_supplicant:7307 blocked for more than 10 seconds.
[ 4495.876632]       Tainted: G        W       4.8.0-rc1+ #8
[ 4495.876635] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4495.876638] wpa_supplicant  D ffff974c647b39a0     0  7307      1 0x00000000
[ 4495.876644]  ffff974c647b39a0 0000000000000000 ffff974c00000000 ffff974c7dc59c58
[ 4495.876651]  ffff974c6b7417c0 ffff974c645017c0 ffff974c647b4000 ffffffff86f16c08
[ 4495.876657]  ffff974c645017c0 0000000000000246 00000000ffffffff ffff974c647b39b8
[ 4495.876664] Call Trace:
[ 4495.876671]  [<ffffffff868aeccc>] schedule+0x3c/0x90
[ 4495.876676]  [<ffffffff868af065>] schedule_preempt_disabled+0x15/0x20
[ 4495.876682]  [<ffffffff868b0996>] mutex_lock_nested+0x176/0x3b0
[ 4495.876686]  [<ffffffff867a2067>] ? rtnl_lock+0x17/0x20
[ 4495.876690]  [<ffffffff867a2067>] rtnl_lock+0x17/0x20
[ 4495.876720]  [<ffffffffc0ae9a5d>] brcmf_p2p_ifp_removed+0x4d/0x70 [brcmfmac]
[ 4495.876741]  [<ffffffffc0aebde6>] brcmf_remove_interface+0x196/0x1b0 [brcmfmac]
[ 4495.876760]  [<ffffffffc0ae9901>] brcmf_p2p_del_vif+0x111/0x220 [brcmfmac]
[ 4495.876777]  [<ffffffffc0adefab>] brcmf_cfg80211_del_iface+0x21b/0x270 [brcmfmac]
[ 4495.876820]  [<ffffffffc097b39e>] nl80211_del_interface+0xfe/0x3a0 [cfg80211]
[ 4495.876825]  [<ffffffff867ca335>] genl_family_rcv_msg+0x1b5/0x370
[ 4495.876832]  [<ffffffff860e5d8d>] ? trace_hardirqs_on+0xd/0x10
[ 4495.876836]  [<ffffffff867ca56d>] genl_rcv_msg+0x7d/0xb0
[ 4495.876839]  [<ffffffff867ca4f0>] ? genl_family_rcv_msg+0x370/0x370
[ 4495.876846]  [<ffffffff867c9a47>] netlink_rcv_skb+0x97/0xb0
[ 4495.876849]  [<ffffffff867ca168>] genl_rcv+0x28/0x40
[ 4495.876854]  [<ffffffff867c93c3>] netlink_unicast+0x1d3/0x2f0
[ 4495.876860]  [<ffffffff867c933b>] ? netlink_unicast+0x14b/0x2f0
[ 4495.876866]  [<ffffffff867c97cb>] netlink_sendmsg+0x2eb/0x3a0
[ 4495.876870]  [<ffffffff8676dad8>] sock_sendmsg+0x38/0x50
[ 4495.876874]  [<ffffffff8676e4df>] ___sys_sendmsg+0x27f/0x290
[ 4495.876882]  [<ffffffff8628b935>] ? mntput_no_expire+0x5/0x3f0
[ 4495.876888]  [<ffffffff8628b9be>] ? mntput_no_expire+0x8e/0x3f0
[ 4495.876894]  [<ffffffff8628b935>] ? mntput_no_expire+0x5/0x3f0
[ 4495.876899]  [<ffffffff8628bd44>] ? mntput+0x24/0x40
[ 4495.876904]  [<ffffffff86267830>] ? __fput+0x190/0x200
[ 4495.876909]  [<ffffffff8676f125>] __sys_sendmsg+0x45/0x80
[ 4495.876914]  [<ffffffff8676f172>] SyS_sendmsg+0x12/0x20
[ 4495.876918]  [<ffffffff868b5680>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 4495.876924]  [<ffffffff860e2b8f>] ? trace_hardirqs_off_caller+0x1f/0xc0

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
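
The idea in the commit above, forwarding an "already locked" flag so the callee does not take the same lock a second time, can be sketched in userspace with a pthread mutex. This is an illustrative analogue only, not the brcmfmac code: `cfg_lock`, `iface_removed()`, `del_iface()` and `removed_count` are hypothetical names, and `cfg_lock` merely stands in for the kernel's rtnl_lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock standing in for the kernel's rtnl_lock. */
pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
int removed_count;	/* state that must only change under cfg_lock */

/*
 * Tear down an interface. `locked` says whether the caller already holds
 * cfg_lock. Taking a non-recursive mutex twice from the same thread would
 * block forever, which is the kind of hang shown in the trace above.
 */
void iface_removed(bool locked)
{
	if (!locked)
		pthread_mutex_lock(&cfg_lock);

	removed_count++;	/* work that needs the lock */

	if (!locked)
		pthread_mutex_unlock(&cfg_lock);
}

/* Caller that knows its own locking state and passes it down. */
void del_iface(bool locked)
{
	iface_removed(locked);
}
```

A caller that already holds the lock would invoke `del_iface(true)`, while an unlocked caller would use `del_iface(false)`; in both cases the lock is taken exactly once.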

andreamerello pushed a commit to andreamerello/linux-mach-bcm that referenced this issue Sep 21, 2016

brcmfmac: Change vif_event_lock to spinlock
Change vif_event_lock to spinlock from mutex, since this lock is
used in wait_event_timeout() via vif_event_equals(). This caused
a warning report as below.

As far as I can see, this lock protects regions where structure
members are updated, not function calls. Also, since those
regions are not called from interrupt handlers (of course, it
was a mutex), spin_lock is used instead of spin_lock_irqsave.

[  186.678550] ------------[ cut here ]------------
[  186.678556] WARNING: CPU: 2 PID: 7140 at /home/mhiramat/ksrc/linux/kernel/sched/core.c:7545 __might_sleep+0x7c/0x80
[  186.678560] do not call blocking ops when !TASK_RUNNING; state=2 set at [<ffffffff980d9090>] prepare_to_wait_event+0x60/0x100
[  186.678560] Modules linked in: brcmfmac xt_CHECKSUM rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables bnep nls_iso8859_1 i2c_designware_platform i2c_designware_core snd_hda_codec_hdmi snd_hda_codec_realtek dcdbas snd_hda_codec_generic snd_hda_intel snd_hda_codec intel_rapl snd_hda_core x86_pkg_temp_thermal intel_powerclamp coretemp
[  186.678594]  snd_pcm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 joydev glue_helper snd_hwdep lrw gf128mul uvcvideo ablk_helper snd_seq_midi cryptd snd_seq_midi_event snd_rawmidi videobuf2_vmalloc videobuf2_memops snd_seq input_leds videobuf2_v4l2 cfg80211 videobuf2_core snd_timer videodev serio_raw btusb snd_seq_device media btrtl rtsx_pci_ms snd mei_me memstick hid_multitouch mei soundcore brcmutil idma64 virt_dma intel_lpss_pci processor_thermal_device intel_soc_dts_iosf hci_uart btbcm btqca btintel bluetooth int3403_thermal dell_smo8800 intel_lpss_acpi intel_lpss int3402_thermal int340x_thermal_zone intel_hid mac_hid int3400_thermal shpchp sparse_keymap acpi_pad acpi_thermal_rel acpi_als kfifo_buf industrialio kvm_intel kvm irqbypass parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq
[  186.678631]  usbhid nouveau ttm i915 rtsx_pci_sdmmc mxm_wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops psmouse drm ahci rtsx_pci nvme nvme_core libahci i2c_hid hid pinctrl_sunrisepoint video wmi pinctrl_intel fjes [last unloaded: brcmfmac]
[  186.678646] CPU: 2 PID: 7140 Comm: wpa_supplicant Not tainted 4.8.0-rc1+ #8
[  186.678647] Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 01.02.00 04/07/2016
[  186.678648]  0000000000000000 ffff9d8c64b5b900 ffffffff98442f23 ffff9d8c64b5b950
[  186.678651]  0000000000000000 ffff9d8c64b5b940 ffffffff9808b22b 00001d790000000d
[  186.678653]  ffffffff98c75e78 000000000000026c 0000000000000000 ffff9d8c2706d058
[  186.678655] Call Trace:
[  186.678659]  [<ffffffff98442f23>] dump_stack+0x85/0xc2
[  186.678666]  [<ffffffff9808b22b>] __warn+0xcb/0xf0
[  186.678668]  [<ffffffff9808b29f>] warn_slowpath_fmt+0x4f/0x60
[  186.678671]  [<ffffffff980d9090>] ? prepare_to_wait_event+0x60/0x100
[  186.678672]  [<ffffffff980d9090>] ? prepare_to_wait_event+0x60/0x100
[  186.678674]  [<ffffffff980b922c>] __might_sleep+0x7c/0x80
[  186.678680]  [<ffffffff988b0853>] mutex_lock_nested+0x33/0x3b0
[  186.678682]  [<ffffffff980e5d8d>] ? trace_hardirqs_on+0xd/0x10
[  186.678689]  [<ffffffffc0c57d2d>] brcmf_cfg80211_wait_vif_event+0xcd/0x130 [brcmfmac]
[  186.678691]  [<ffffffff980d9190>] ? wake_atomic_t_function+0x60/0x60
[  186.678697]  [<ffffffffc0c628e9>] brcmf_p2p_del_vif+0xf9/0x220 [brcmfmac]
[  186.678702]  [<ffffffffc0c57fab>] brcmf_cfg80211_del_iface+0x21b/0x270 [brcmfmac]
[  186.678716]  [<ffffffffc0b0539e>] nl80211_del_interface+0xfe/0x3a0 [cfg80211]
[  186.678718]  [<ffffffff987ca335>] genl_family_rcv_msg+0x1b5/0x370
[  186.678720]  [<ffffffff980e5d8d>] ? trace_hardirqs_on+0xd/0x10
[  186.678721]  [<ffffffff987ca56d>] genl_rcv_msg+0x7d/0xb0
[  186.678722]  [<ffffffff987ca4f0>] ? genl_family_rcv_msg+0x370/0x370
[  186.678724]  [<ffffffff987c9a47>] netlink_rcv_skb+0x97/0xb0
[  186.678726]  [<ffffffff987ca168>] genl_rcv+0x28/0x40
[  186.678727]  [<ffffffff987c93c3>] netlink_unicast+0x1d3/0x2f0
[  186.678729]  [<ffffffff987c933b>] ? netlink_unicast+0x14b/0x2f0
[  186.678731]  [<ffffffff987c97cb>] netlink_sendmsg+0x2eb/0x3a0
[  186.678733]  [<ffffffff9876dad8>] sock_sendmsg+0x38/0x50
[  186.678734]  [<ffffffff9876e4df>] ___sys_sendmsg+0x27f/0x290
[  186.678737]  [<ffffffff9828b935>] ? mntput_no_expire+0x5/0x3f0
[  186.678739]  [<ffffffff9828b9be>] ? mntput_no_expire+0x8e/0x3f0
[  186.678741]  [<ffffffff9828b935>] ? mntput_no_expire+0x5/0x3f0
[  186.678743]  [<ffffffff9828bd44>] ? mntput+0x24/0x40
[  186.678744]  [<ffffffff98267830>] ? __fput+0x190/0x200
[  186.678746]  [<ffffffff9876f125>] __sys_sendmsg+0x45/0x80
[  186.678748]  [<ffffffff9876f172>] SyS_sendmsg+0x12/0x20
[  186.678749]  [<ffffffff988b5680>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  186.678751]  [<ffffffff980e2b8f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[  186.678752] ---[ end trace e224d66c5d8408b5 ]---

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

andreamerello pushed a commit to andreamerello/linux-mach-bcm that referenced this issue Sep 21, 2016

ALSA: timer: Fix zero-division by continue of uninitialized instance
When a user timer instance is continued without an explicit start
beforehand, the system eventually hits a zero-division error like:

  divide error: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
  CPU: 1 PID: 27320 Comm: syz-executor Not tainted 4.8.0-rc3-next-20160825+ #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   task: ffff88003c9b2280 task.stack: ffff880027280000
   RIP: 0010:[<ffffffff858e1a6c>]  [<     inline     >] ktime_divns include/linux/ktime.h:195
   RIP: 0010:[<ffffffff858e1a6c>]  [<ffffffff858e1a6c>] snd_hrtimer_callback+0x1bc/0x3c0 sound/core/hrtimer.c:62
  Call Trace:
   <IRQ>
   [<     inline     >] __run_hrtimer kernel/time/hrtimer.c:1238
   [<ffffffff81504335>] __hrtimer_run_queues+0x325/0xe70 kernel/time/hrtimer.c:1302
   [<ffffffff81506ceb>] hrtimer_interrupt+0x18b/0x420 kernel/time/hrtimer.c:1336
   [<ffffffff8126d8df>] local_apic_timer_interrupt+0x6f/0xe0 arch/x86/kernel/apic/apic.c:933
   [<ffffffff86e13056>] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:957
   [<ffffffff86e1210c>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:487
   <EOI>
   .....

Although a similar issue was spotted and a fix was merged in
commit [6b760bb: ALSA: timer: fix division by zero after
SNDRV_TIMER_IOCTL_CONTINUE], that fix seems to cover only part of
the iceberg.

In this patch, we fix the issue a bit more drastically.  Basically,
continuing an uninitialized timer is supposed to be a fresh start, so
we do that for user timers.  For the direct snd_timer_continue() call,
there is no way to pass the initial tick value, so we bail out in the
uninitialized case.
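
The two behaviours described above can be sketched in plain C. This is a simplified user-space model, not the ALSA code: the `toy_*` names, the struct layout, and the choice of -EINVAL as the error are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified timer instance; not the ALSA structures. */
struct toy_timer {
    bool initialized;     /* set once the timer has been started */
    unsigned long ticks;  /* tick period; stays 0 until a start */
};

/* A start always supplies a non-zero tick value. */
static int toy_timer_start(struct toy_timer *t, unsigned long ticks)
{
    if (ticks == 0)
        return -EINVAL;
    t->ticks = ticks;
    t->initialized = true;
    return 0;
}

/* Direct continue: no tick value can be passed in, so an
 * uninitialized timer is rejected (error code is an assumption). */
static int toy_timer_continue(struct toy_timer *t)
{
    if (!t->initialized)
        return -EINVAL;
    return 0;
}

/* User-facing continue: treat an uninitialized timer as a fresh start. */
static int toy_user_timer_continue(struct toy_timer *t,
                                   unsigned long default_ticks)
{
    if (!t->initialized)
        return toy_timer_start(t, default_ticks);
    return toy_timer_continue(t);
}

/* The kind of division that faults when ticks is still 0. */
static int toy_elapsed_periods(const struct toy_timer *t,
                               unsigned long elapsed, unsigned long *out)
{
    if (!t->initialized || t->ticks == 0)
        return -EINVAL;
    *out = elapsed / t->ticks;
    return 0;
}
```

With this shape, the divisor can only be read after some start path has set it, which is the property the commit message is after.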

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

anholt pushed a commit that referenced this issue Oct 17, 2016

IB/core: correctly handle rdma_rw_init_mrs() failure
Function ib_create_qp() was failing to return an error when
rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp()
when trying to dereference the qp pointer which was actually a negative
errno.
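
As a rough illustration of this class of bug, here is a user-space sketch of the "errno encoded in a pointer" convention. The `toy_*` helpers only imitate the idea behind the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(); they are hypothetical stand-ins, and `toy_create_qp()` is not the real ib_create_qp(). For what it is worth, the RBX value fffffffffffffff4 in the register dump below reads as -12 when taken as a signed 64-bit integer, which would be consistent with a negative errno sitting where a pointer was expected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (illustrative helper). */
static inline void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno from such a pointer. */
static inline long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer value falls in the top TOY_MAX_ERRNO addresses. */
static inline int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-(intptr_t)TOY_MAX_ERRNO);
}

struct toy_qp { int id; };

static struct toy_qp toy_qp_storage;

/* Stand-in for an init step that can fail. */
static int toy_init_mrs(struct toy_qp *qp, int fail)
{
    (void)qp;
    return fail ? -ENOMEM : 0;
}

static struct toy_qp *toy_create_qp(int fail_init)
{
    struct toy_qp *qp = &toy_qp_storage;
    int ret = toy_init_mrs(qp, fail_init);

    /* Return the error right here; carrying on with an encoded
     * errno as if it were a valid qp is what leads to a crash. */
    if (ret)
        return toy_err_ptr(ret);

    qp->id = 1;
    return qp;
}
```

Callers then check `toy_is_err()` before touching the result, and use `toy_ptr_err()` to get the errno back.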

The crash:

crash> log|grep BUG
[  136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
crash> bt
PID: 3736   TASK: ffff8808543215c0  CPU: 2   COMMAND: "kworker/u64:2"
 #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0
 #1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758
 #2 [ffff88084d323480] crash_kexec at ffffffff8111682d
 #3 [ffff88084d3234b0] oops_end at ffffffff81032bd6
 #4 [ffff88084d3234e0] no_context at ffffffff8106e431
 #5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610
 #6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4
 #7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc
 #8 [ffff88084d323620] do_page_fault at ffffffff8106f057
 #9 [ffff88084d323660] page_fault at ffffffff816e3148
    [exception RIP: ib_create_qp+427]
    RIP: ffffffffa02554fb  RSP: ffff88084d323718  RFLAGS: 00010246
    RAX: 0000000000000004  RBX: fffffffffffffff4  RCX: 000000018020001f
    RDX: ffff880830997fc0  RSI: 0000000000000001  RDI: ffff88085f407200
    RBP: ffff88084d323778   R8: 0000000000000001   R9: ffffea0020bae210
    R10: ffffea0020bae218  R11: 0000000000000001  R12: ffff88084d3237c8
    R13: 00000000fffffff4  R14: ffff880859fa5000  R15: ffff88082eb89800
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm]
#11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma]
#12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma]
#13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma]
#14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma]
#15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm]
#16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm]
#17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm]
#18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm]
#19 [ffff88084d323cb0] process_one_work at ffffffff810a1483
#20 [ffff88084d323d90] worker_thread at ffffffff810a211d
#21 [ffff88084d323ec0] kthread at ffffffff810a6c5c
#22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf

Fixes: 632bc3f ("IB/core, RDMA RW API: Do not exceed QP SGE send limit")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

anholt pushed a commit that referenced this issue Nov 17, 2016

reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
commit 420902c upstream.

If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can
deadlock our own worker - mount blocks kworker/3:2, sleeps forever more.

crash> ps|grep UN
    715      2   3  ffff880220734d30  UN   0.0       0      0  [kworker/3:2]
   9369   9341   2  ffff88021ffb7560  UN   1.3  493404 123184  Xorg
   9665   9664   3  ffff880225b92ab0  UN   0.0   47368    812  udisks-daemon
  10635  10403   3  ffff880222f22c70  UN   0.0   14904    936  mount
crash> bt ffff880220734d30
PID: 715    TASK: ffff880220734d30  CPU: 3   COMMAND: "kworker/3:2"
 #0 [ffff8802244c3c20] schedule at ffffffff8144584b
 #1 [ffff8802244c3cc8] __rt_mutex_slowlock at ffffffff814472b3
 #2 [ffff8802244c3d28] rt_mutex_slowlock at ffffffff814473f5
 #3 [ffff8802244c3dc8] reiserfs_write_lock at ffffffffa05f28fd [reiserfs]
 #4 [ffff8802244c3de8] flush_async_commits at ffffffffa05ec91d [reiserfs]
 #5 [ffff8802244c3e08] process_one_work at ffffffff81073726
 #6 [ffff8802244c3e68] worker_thread at ffffffff81073eba
 #7 [ffff8802244c3ec8] kthread at ffffffff810782e0
 #8 [ffff8802244c3f48] kernel_thread_helper at ffffffff81450064
crash> rd ffff8802244c3cc8 10
ffff8802244c3cc8:  ffffffff814472b3 ffff880222f23250   .rD.....P2."....
ffff8802244c3cd8:  0000000000000000 0000000000000286   ................
ffff8802244c3ce8:  ffff8802244c3d30 ffff880220734d80   0=L$.....Ms ....
ffff8802244c3cf8:  ffff880222e8f628 0000000000000000   (.."............
ffff8802244c3d08:  0000000000000000 0000000000000002   ................
crash> struct rt_mutex ffff880222e8f628
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 65537
    }
  },
  wait_list = {
    node_list = {
      next = 0xffff8802244c3d48,
      prev = 0xffff8802244c3d48
    }
  },
  owner = 0xffff880222f22c71,
  save_state = 0
}
crash> bt 0xffff880222f22c70
PID: 10635  TASK: ffff880222f22c70  CPU: 3   COMMAND: "mount"
 #0 [ffff8802216a9868] schedule at ffffffff8144584b
 #1 [ffff8802216a9910] schedule_timeout at ffffffff81446865
 #2 [ffff8802216a99a0] wait_for_common at ffffffff81445f74
 #3 [ffff8802216a9a30] flush_work at ffffffff810712d3
 #4 [ffff8802216a9ab0] schedule_on_each_cpu at ffffffff81074463
 #5 [ffff8802216a9ae0] invalidate_bdev at ffffffff81178aba
 #6 [ffff8802216a9af0] vfs_load_quota_inode at ffffffff811a3632
 #7 [ffff8802216a9b50] dquot_quota_on_mount at ffffffff811a375c
 #8 [ffff8802216a9b80] finish_unfinished at ffffffffa05dd8b0 [reiserfs]
 #9 [ffff8802216a9cc0] reiserfs_fill_super at ffffffffa05de825 [reiserfs]
    RIP: 00007f7b9303997a  RSP: 00007ffff443c7a8  RFLAGS: 00010202
    RAX: 00000000000000a5  RBX: ffffffff8144ef12  RCX: 00007f7b932e9ee0
    RDX: 00007f7b93d9a400  RSI: 00007f7b93d9a3e0  RDI: 00007f7b93d9a3c0
    RBP: 00007f7b93d9a2c0   R8: 00007f7b93d9a550   R9: 0000000000000001
    R10: ffffffffc0ed040e  R11: 0000000000000202  R12: 000000000000040e
    R13: 0000000000000000  R14: 00000000c0ed040e  R15: 00007ffff443ca20
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@anholt

Owner

anholt commented Dec 6, 2016

I've updated the upstream-targeting branch again recently. Still not working: it just displays a white screen, like the backlight is on but no particular pixel data has latched. Some open questions:

  • Is our i2c-gpio working properly?

I'm not getting errors from the transactions, and the backlight does flash, but running through fading the backlight in and out might be useful for increasing confidence in the I2C side of things.

  • Can we tell what is failing?

We did some debug with a scope at the Pi offices, and the conclusion there was that we had HSYNC/VSYNC going into the bridge but not coming out. I don't have a scope myself to test more as I make changes, and I never got confident enough in the scope at the office that I understood what we were seeing.

  • Is it possible to hook up different panels?

Could we run the FPC out to a breadboard and wire up somebody else's panel, maybe something with an existing panel driver in the kernel? It would be great to be able to bisect the bug space between the panel driver (driving the fine atmel chip) and the DSI driver itself.

  • With recent bugfixes in the branch, could we get things working with firmware starting up the panel and then disabling/enabling it from there?

I used to have a mode in this driver where we inherited from the firmware and got things scanning out, but encoder disable/enable would fail. Trying to go from the downstream tree's use_firmware_setup mode incrementally toward reprogramming everything on disable/enable might be informative.

@anholt

Owner

anholt commented Dec 14, 2016

The drm-vc4-dsi branch now has a working driver stack against 4.9-rc1. I'm hoping to merge this for 4.11. The transactions over the DSI bus hadn't been working, and it took writing some nasty firmware workaround code to figure it out. I'd love to be using DSI transactions instead of I2C, but I'm about out of debug energy for DSI at this point.

@greyltc

greyltc commented Dec 14, 2016

I have an oscilloscope that can decode I2C, a pi3, the official display and I can compile and deploy the kernel. Is there anything I can do to help?

@anholt

Owner

anholt commented Dec 14, 2016

At this point what's left to debug is why the ULPS latching fails (the warning thrown at boot on that branch) and why the DSI transactions fail (see #if 0 code in panel-raspberrypi-touchscreen.c). I2C seems to be stable.

anholt pushed a commit that referenced this issue Feb 2, 2017

xhci: workaround for hosts missing CAS bit
commit 346e997 upstream.

If a device is unplugged and replugged during Sx system suspend,
some Intel xHC hosts will overwrite the CAS (Cold Attach Status) flag
and no device connection is noticed on resume.

A device in this state can be identified in resume if its link state
is in polling or compliance mode, and the current connect status is 0.
A device in this state needs to be warm reset.
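
The detection rule above boils down to a small predicate. The sketch below is a stand-alone model: the enum values and struct are hypothetical and do not correspond to real xHCI register encodings or to the driver's own helpers.

```c
#include <stdbool.h>

/* Hypothetical link states; illustrative names, not register values. */
enum toy_link_state {
    TOY_LINK_U0,
    TOY_LINK_U3,
    TOY_LINK_RX_DETECT,
    TOY_LINK_POLLING,
    TOY_LINK_COMPLIANCE
};

/* Hypothetical snapshot of a port's status taken at resume. */
struct toy_port {
    enum toy_link_state link;
    bool connected;          /* current connect status */
};

/* A port stuck in polling or compliance mode while reporting
 * "not connected" matches the state described above and should
 * be warm reset. */
static bool toy_port_needs_warm_reset(const struct toy_port *p)
{
    bool stuck = p->link == TOY_LINK_POLLING ||
                 p->link == TOY_LINK_COMPLIANCE;

    return stuck && !p->connected;
}
```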

Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8

Observed on Cherryview and Apollolake as they go into compliance mode
if LFPS times out during polling, and re-plugged devices are not
discovered at resume.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Feb 28, 2017

vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
The recently added mediated VFIO driver doesn't know about powerpc iommu.
It thus doesn't register a struct iommu_table_group in the iommu group
upon device creation. The iommu_data pointer hence remains null.

This causes a kernel oops when userspace tries to set the iommu type of a
container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU.

[   82.585440] mtty mtty: MDEV: Registered
[   87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 10
[   87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 10
[  116.297184] Unable to handle kernel paging request for data at address 0x00000030
[  116.297389] Faulting instruction address: 0xd000000007870524
[  116.297465] Oops: Kernel access of bad area, sig: 11 [#1]
[  116.297611] SMP NR_CPUS=2048
[  116.297611] NUMA
[  116.297627] PowerNV
...
[  116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 4.10.0-rc5-mdev-test #8
[  116.297993] task: c000000e7718b680 task.stack: c000000e77214000
[  116.298025] NIP: d000000007870524 LR: d000000007870518 CTR: 0000000000000000
[  116.298064] REGS: c000000e77217990 TRAP: 0300   Not tainted  (4.10.0-rc5-mdev-test)
[  116.298103] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  116.298107]   CR: 84004444  XER: 00000000
[  116.298154] CFAR: c00000000000888c DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1
               GPR00: d000000007870518 c000000e77217c10 d00000000787b0ed c000000eed2103c0
               GPR04: 0000000000000000 0000000000000000 c000000eed2103e0 0000000f24320000
               GPR08: 0000000000000104 0000000000000001 0000000000000000 d0000000078729b0
               GPR12: c00000000025b7e0 c00000000fe08400 0000000000000001 000001002d31d100
               GPR16: 000001002c22c850 00003ffff315c750 0000000043145680 0000000043141bc0
               GPR20: ffffffffffffffed fffffffffffff000 0000000020003b65 d000000007706018
               GPR24: c000000f16cf0d98 d000000007706000 c000000003f42980 c000000003f42980
               GPR28: c000000f1575ac00 c000000003f429c8 0000000000000000 c000000eed2103c0
[  116.298504] NIP [d000000007870524] tce_iommu_attach_group+0x10c/0x360 [vfio_iommu_spapr_tce]
[  116.298555] LR [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce]
[  116.298601] Call Trace:
[  116.298610] [c000000e77217c10] [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable)
[  116.298671] [c000000e77217cb0] [d0000000077033a0] vfio_fops_unl_ioctl+0x278/0x3e0 [vfio]
[  116.298713] [c000000e77217d40] [c0000000002a3ebc] do_vfs_ioctl+0xcc/0x8b0
[  116.298745] [c000000e77217de0] [c0000000002a4700] SyS_ioctl+0x60/0xc0
[  116.298782] [c000000e77217e30] [c00000000000b220] system_call+0x38/0xfc
[  116.298812] Instruction dump:
[  116.298828] 7d3f4b78 409effc8 3d220000 e9298020 3c800140 38a00018 608480c0 e8690028
[  116.298869] 4800249d e8410018 7c7f1b79 41820230 <e93e0030> 2fa90000 419e0114 e9090020
[  116.298914] ---[ end trace 1e10b0ced08b9120 ]---

This patch fixes the oops.
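
A minimal sketch of the kind of guard the commit title describes, assuming the attach path simply refuses a group whose iommu_data is null. The `toy_*` types and the -ENODEV error code are assumptions for illustration, not the actual vfio_iommu_spapr_tce code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures named in the commit. */
struct toy_table_group { int tables; };
struct toy_iommu_group { struct toy_table_group *iommu_data; };

static int toy_attach_group(struct toy_iommu_group *grp)
{
    struct toy_table_group *tg = grp->iommu_data;

    /* A driver that never registered a table group leaves
     * iommu_data null; fail the attach instead of dereferencing. */
    if (tg == NULL)
        return -ENODEV;   /* error code chosen for this sketch */

    return tg->tables > 0 ? 0 : -EINVAL;
}
```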

Reported-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
@greyltc

greyltc commented Mar 15, 2017

@anholt Did your drm-vc4-dsi branch make it into 4.11?

@anholt

Owner

anholt commented Mar 15, 2017

The DSI driver is in, but the panel driver is blocked.

@greyltc

greyltc commented Mar 16, 2017

What, if any, of this stuff is in https://github.com/raspberrypi/linux rpi-4.9.y?
Should I be able to use vc4 with my official touchscreen display if I use that kernel now?

@anholt

Owner

anholt commented Mar 16, 2017

It's all present in rpi-4.9.y and should work.

anholt pushed a commit that referenced this issue Mar 30, 2017

dccp/tcp: fix routing redirect race
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry is added for each such host, which
creates more dst_entrys with lower reference counts and makes the race
more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.
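
The idea of the fix can be sketched as a tiny state machine. This is a user-space model only: the `toy_*` names are hypothetical, and the pending-flag mechanism used here to run the redirect later is an assumption of the sketch, not necessarily how the kernel defers the work.

```c
#include <stdbool.h>

/* Hypothetical, simplified socket state. */
struct toy_sock {
    bool owned_by_user;      /* user space currently holds the socket lock */
    bool redirect_pending;   /* a redirect arrived while it was locked */
    int  redirects_applied;  /* how many times the redirect logic ran */
};

/* Stand-in for the work that touches the cached route. */
static void toy_do_redirect(struct toy_sock *sk)
{
    sk->redirects_applied++;
}

/* ICMP error path: never run the redirect while user space owns
 * the socket; remember it instead. */
static void toy_icmp_redirect(struct toy_sock *sk)
{
    if (sk->owned_by_user) {
        sk->redirect_pending = true;
        return;
    }
    toy_do_redirect(sk);
}

/* When user space lets go of the socket, run any deferred redirect. */
static void toy_release_sock(struct toy_sock *sk)
{
    sk->owned_by_user = false;
    if (sk->redirect_pending) {
        sk->redirect_pending = false;
        toy_do_redirect(sk);
    }
}
```

The point of the ordering is that the route reference is only ever dropped by one context at a time, so the double decrement described above cannot happen in this model.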

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Apr 8, 2017

dccp/tcp: fix routing redirect race
As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A common crashing backtrace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry is added for each such host, which
creates more dst_entrys with lower reference counts and makes the race
more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

anholt pushed a commit that referenced this issue Apr 19, 2017

inet: frag: release spinlock before calling icmp_send()
Dmitry reported a lockdep splat [1] (false positive) that we can fix
by releasing the spinlock before calling icmp_send() from ip_expire().

This is a false positive because sending an ICMP message cannot
possibly re-enter the IP frag engine.
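
The general pattern, "decide under the lock, act after dropping it", can be sketched as follows. This is a toy single-threaded model with a flag standing in for the spinlock; every `toy_*` name is hypothetical, and copying out a plain integer stands in for whatever state the real code has to keep alive across the unlock.

```c
#include <stdbool.h>

/* Hypothetical fragment queue with a flag in place of a spinlock. */
struct toy_frag_queue {
    bool locked;             /* models q->lock being held */
    bool complete;           /* reassembly already finished */
    int  head_id;            /* state needed to build the ICMP reply */
    int  sent;               /* total sends */
    int  sent_while_locked;  /* sends issued with the lock held */
};

static void toy_lock(struct toy_frag_queue *q)   { q->locked = true; }
static void toy_unlock(struct toy_frag_queue *q) { q->locked = false; }

/* Stand-in for the send; records whether the lock was held. */
static void toy_icmp_send(struct toy_frag_queue *q, int head_id)
{
    (void)head_id;
    if (q->locked)
        q->sent_while_locked++;
    q->sent++;
}

static void toy_expire(struct toy_frag_queue *q)
{
    bool send = false;
    int head_id = 0;

    toy_lock(q);
    if (!q->complete) {
        /* Copy out what the send needs while still protected. */
        send = true;
        head_id = q->head_id;
    }
    toy_unlock(q);

    /* The send happens with the queue lock released, so no lock
     * ordering between the queue lock and the transmit path is
     * recorded. */
    if (send)
        toy_icmp_send(q, head_id);
}
```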

[1]
[ INFO: possible circular locking dependency detected ]
4.10.0+ #29 Not tainted
-------------------------------------------------------
modprobe/12392 is trying to acquire lock:
 (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock include/linux/spinlock.h:299 [inline]
 (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock include/linux/netdevice.h:3486 [inline]
 (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180

but task is already holding lock:
 (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline]
 (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&q->lock)->rlock){+.-...}:
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669
       ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713
       packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459
       deliver_skb net/core/dev.c:1834 [inline]
       dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890
       xmit_one net/core/dev.c:2903 [inline]
       dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923
       sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308
       neigh_output include/net/neighbour.h:478 [inline]
       ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228
       ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672
       ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545
       ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
       __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
       SYSC_sendmmsg net/socket.c:2106 [inline]
       SyS_sendmmsg+0x35/0x60 net/socket.c:2101
       do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
       return_from_SYSCALL_64+0x0/0x7a

-> #0 (_xmit_ETHER#2){+.-...}:
       check_prev_add kernel/locking/lockdep.c:1830 [inline]
       check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       __netif_tx_lock include/linux/netdevice.h:3486 [inline]
       sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_hh_output include/net/neighbour.h:468 [inline]
       neigh_output include/net/neighbour.h:476 [inline]
       ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
       ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
       icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
       ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
       call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:657 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
       __read_once_size include/linux/compiler.h:254 [inline]
       atomic_read arch/x86/include/asm/atomic.h:26 [inline]
       rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
       __rcu_is_watching kernel/rcu/tree.c:1133 [inline]
       rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
       rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
       radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
       filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
       do_fault_around mm/memory.c:3231 [inline]
       do_read_fault mm/memory.c:3265 [inline]
       do_fault+0xbd5/0x2080 mm/memory.c:3370
       handle_pte_fault mm/memory.c:3600 [inline]
       __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
       handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
       __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
       do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
       page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&q->lock)->rlock);
                               lock(_xmit_ETHER#2);
                               lock(&(&q->lock)->rlock);
  lock(_xmit_ETHER#2);

 *** DEADLOCK ***

10 locks held by modprobe/12392:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>]
__do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336
 #1:  (rcu_read_lock){......}, at: [<ffffffff8188cab6>]
filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324
 #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
spin_lock include/linux/spinlock.h:299 [inline]
 #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
pte_alloc_one_map mm/memory.c:2944 [inline]
 #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072
 #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
lockdep_copy_map include/linux/lockdep.h:175 [inline]
 #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258
 #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
include/linux/spinlock.h:299 [inline]
 #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
 #5:  (rcu_read_lock){......}, at: [<ffffffff8389a633>]
ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216
 #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock
include/linux/spinlock.h:309 [inline]
 #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock
net/ipv4/icmp.c:219 [inline]
 #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>]
icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681
 #7:  (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>]
ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198
 #8:  (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>]
__dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324
 #9:  (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at:
[<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423

stack backtrace:
CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
 check_prev_add kernel/locking/lockdep.c:1830 [inline]
 check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
 validate_chain kernel/locking/lockdep.c:2267 [inline]
 __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:299 [inline]
 __netif_tx_lock include/linux/netdevice.h:3486 [inline]
 sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
 __dev_xmit_skb net/core/dev.c:3092 [inline]
 __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
 neigh_hh_output include/net/neighbour.h:468 [inline]
 neigh_output include/net/neighbour.h:476 [inline]
 ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:246 [inline]
 ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
 dst_output include/net/dst.h:486 [inline]
 ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
 expire_timers kernel/time/timer.c:1307 [inline]
 __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:657 [inline]
 smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10
RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c
RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000
R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25
R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000
 </IRQ>
 rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
 filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
 do_fault_around mm/memory.c:3231 [inline]
 do_read_fault mm/memory.c:3265 [inline]
 do_fault+0xbd5/0x2080 mm/memory.c:3370
 handle_pte_fault mm/memory.c:3600 [inline]
 __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
RIP: 0033:0x7f83172f2786
RSP: 002b:00007fffe859ae80 EFLAGS: 00010293
RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970
RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000
R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040
R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
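
The "Possible unsafe locking scenario" above is a lock-order inversion: the existing chain takes `_xmit_ETHER#2` and then `q->lock`, while `ip_expire()` holds `q->lock` and then reaches `_xmit_ETHER#2`. A minimal userspace sketch of that kind of ordering check, assuming a tiny fixed set of lock classes. `lock_graph`, `record_order()` and `creates_cycle()` are hypothetical names for illustration, not lockdep's actual API:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* edge[a][b] is true when lock class b was acquired while a was held. */
struct lock_graph {
    bool edge[MAX_CLASSES][MAX_CLASSES];
};

static void lock_graph_init(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to,
                      bool *visited)
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (g->edge[from][next] && !visited[next] &&
            reachable(g, next, to, visited))
            return true;
    }
    return false;
}

/* Would acquiring 'next' while holding 'held' close a cycle? */
static bool creates_cycle(const struct lock_graph *g, int held, int next)
{
    bool visited[MAX_CLASSES] = { false };

    return reachable(g, next, held, visited);
}

/*
 * Record the dependency "held -> next". Returns false (and records
 * nothing) when the new dependency would invert an ordering that is
 * already in the graph.
 */
static bool record_order(struct lock_graph *g, int held, int next)
{
    assert(held >= 0 && held < MAX_CLASSES);
    assert(next >= 0 && next < MAX_CLASSES);
    if (creates_cycle(g, held, next))
        return false;
    g->edge[held][next] = true;
    return true;
}
```

With class 0 standing in for `q->lock` and class 1 for `_xmit_ETHER#2`, recording 1 -> 0 succeeds and a later attempt to record 0 -> 1 is rejected, which mirrors the report above.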

anholt pushed a commit that referenced this issue May 4, 2017

net: ipconfig: fix ic_close_devs() use-after-free
Our chosen ic_dev may be anywhere in our list of ic_devs, and we may
free it before attempting to close others. When we compare d->dev and
ic_dev->dev, we're potentially dereferencing memory returned to the
allocator. This causes KASAN to scream for each subsequent ic_dev we
check.

As there's a 1-1 mapping between ic_devs and netdevs, we can instead
compare d and ic_dev directly, which implicitly handles the !ic_dev
case, and avoids the use-after-free. The ic_dev pointer may be stale,
but we will not dereference it.
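
A minimal userspace sketch of the idea described above, assuming simplified stand-in types (`ic_device_stub`, `net_device_stub` and `close_devs()` are hypothetical, not the kernel's definitions). The kernel fix compares `d` and `ic_dev` directly; the sketch compares integer copies of the addresses so that it stays well-defined in ISO C even after the chosen entry has been freed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a netdev: only tracks whether it is still up. */
struct net_device_stub {
    bool up;
};

/* Stand-in for struct ic_device: one list entry per netdev. */
struct ic_device_stub {
    struct ic_device_stub *next;
    struct net_device_stub *dev;
};

/*
 * Close every device except the chosen one and free all list entries.
 * Returns the number of devices that were closed.
 */
static int close_devs(struct ic_device_stub *first,
                      const struct ic_device_stub *chosen)
{
    /* Capture the identity of the chosen entry before anything is freed. */
    uintptr_t chosen_id = (uintptr_t)chosen;
    struct ic_device_stub *d, *next;
    int closed = 0;

    for (d = first; d; d = next) {
        next = d->next;
        /*
         * Compare the entries themselves, never chosen->dev: the chosen
         * entry may already have been freed by an earlier iteration.
         * A NULL 'chosen' simply never matches.
         */
        if ((uintptr_t)d != chosen_id) {
            d->dev->up = false;
            closed++;
        }
        free(d);
    }
    return closed;
}
```

Passing the first list entry as `chosen` exercises the problematic ordering: it is freed first and then only compared against, not dereferenced.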

Original splat:

[    6.487446] ==================================================================
[    6.494693] BUG: KASAN: use-after-free in ic_close_devs+0xc4/0x154 at addr ffff800367efa708
[    6.503013] Read of size 8 by task swapper/0/1
[    6.507452] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc3-00002-gda42158 #8
[    6.514993] Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27 2016
[    6.523138] Call trace:
[    6.525590] [<ffff200008094778>] dump_backtrace+0x0/0x570
[    6.530976] [<ffff200008094d08>] show_stack+0x20/0x30
[    6.536017] [<ffff200008bee928>] dump_stack+0x120/0x188
[    6.541231] [<ffff20000856d5e4>] kasan_object_err+0x24/0xa0
[    6.546790] [<ffff20000856d924>] kasan_report_error+0x244/0x738
[    6.552695] [<ffff20000856dfec>] __asan_report_load8_noabort+0x54/0x80
[    6.559204] [<ffff20000aae86ac>] ic_close_devs+0xc4/0x154
[    6.564590] [<ffff20000aaedbac>] ip_auto_config+0x2ed4/0x2f1c
[    6.570321] [<ffff200008084b04>] do_one_initcall+0xcc/0x370
[    6.575882] [<ffff20000aa31de8>] kernel_init_freeable+0x5f8/0x6c4
[    6.581959] [<ffff20000a16df00>] kernel_init+0x18/0x190
[    6.587171] [<ffff200008084710>] ret_from_fork+0x10/0x40
[    6.592468] Object at ffff800367efa700, in cache kmalloc-128 size: 128
[    6.598969] Allocated:
[    6.601324] PID = 1
[    6.603427]  save_stack_trace_tsk+0x0/0x418
[    6.607603]  save_stack_trace+0x20/0x30
[    6.611430]  kasan_kmalloc+0xd8/0x188
[    6.615087]  ip_auto_config+0x8c4/0x2f1c
[    6.619002]  do_one_initcall+0xcc/0x370
[    6.622832]  kernel_init_freeable+0x5f8/0x6c4
[    6.627178]  kernel_init+0x18/0x190
[    6.630660]  ret_from_fork+0x10/0x40
[    6.634223] Freed:
[    6.636233] PID = 1
[    6.638334]  save_stack_trace_tsk+0x0/0x418
[    6.642510]  save_stack_trace+0x20/0x30
[    6.646337]  kasan_slab_free+0x88/0x178
[    6.650167]  kfree+0xb8/0x478
[    6.653131]  ic_close_devs+0x130/0x154
[    6.656875]  ip_auto_config+0x2ed4/0x2f1c
[    6.660875]  do_one_initcall+0xcc/0x370
[    6.664705]  kernel_init_freeable+0x5f8/0x6c4
[    6.669051]  kernel_init+0x18/0x190
[    6.672534]  ret_from_fork+0x10/0x40
[    6.676098] Memory state around the buggy address:
[    6.680880]  ffff800367efa600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.688078]  ffff800367efa680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    6.695276] >ffff800367efa700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    6.702469]                       ^
[    6.705952]  ffff800367efa780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    6.713149]  ffff800367efa800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    6.720343] ==================================================================
[    6.727536] Disabling lock debugging due to kernel taint

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

anholt pushed a commit that referenced this issue Jun 8, 2017

KVM: x86: Fix load damaged SSEx MXCSR register
commit a575813 upstream.

Reported by syzkaller:

   BUG: unable to handle kernel paging request at ffffffffc07f6a2e
   IP: report_bug+0x94/0x120
   PGD 348e12067
   P4D 348e12067
   PUD 348e14067
   PMD 3cbd84067
   PTE 80000003f7e87161

   Oops: 0003 [#1] SMP
   CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G           OE   4.11.0+ #8
   task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
   RIP: 0010:report_bug+0x94/0x120
   RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
    do_trap+0x156/0x170
    do_error_trap+0xa3/0x170
    ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
    ? mark_held_locks+0x79/0xa0
    ? retint_kernel+0x10/0x10
    ? trace_hardirqs_off_thunk+0x1a/0x1c
    do_invalid_op+0x20/0x30
    invalid_op+0x1e/0x30
   RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
    ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
    kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
    kvm_vcpu_ioctl+0x384/0x780 [kvm]
    ? kvm_vcpu_ioctl+0x384/0x780 [kvm]
    ? sched_clock+0x13/0x20
    ? __do_page_fault+0x2a0/0x550
    do_vfs_ioctl+0xa4/0x700
    ? up_read+0x1f/0x40
    ? __do_page_fault+0x2a0/0x550
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x23/0xc2

The SDM states that "The MXCSR has several reserved bits, and attempting to write
a 1 to any of these bits will cause a general-protection exception(#GP) to be
generated". The syzkaller forks' testcase overwrites the xsave area with random
values and sets reserved bits of the MXCSR register. The damaged guest MXCSR
value is then restored to the SSEx MXCSR register before vmentry. This patch
fixes it by detecting when userspace sets reserved MXCSR bits and bailing out
immediately.
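
A minimal sketch of such a check, assuming the MXCSR field sits at byte offset 24 of the legacy FXSAVE/XSAVE region and assuming a writable-bit mask of 0x0000ffbf. `check_guest_mxcsr()` and `ASSUMED_MXCSR_MASK` are hypothetical names; real code should use the mask reported by the CPU rather than a constant:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of MXCSR inside the legacy FXSAVE/XSAVE region. */
#define MXCSR_OFFSET 24

/* Assumption: mask of MXCSR bits that may be set without raising #GP. */
#define ASSUMED_MXCSR_MASK 0x0000ffbfu

/*
 * Reject a userspace-supplied save area whose MXCSR has bits set outside
 * valid_mask. Returns 0 when the value is acceptable, -EINVAL otherwise.
 */
static int check_guest_mxcsr(const uint8_t *xsave_region, size_t len,
                             uint32_t valid_mask)
{
    uint32_t mxcsr;

    assert(xsave_region != NULL);
    if (len < MXCSR_OFFSET + sizeof(mxcsr))
        return -EINVAL;

    /* memcpy avoids unaligned or type-punned reads of the raw buffer. */
    memcpy(&mxcsr, xsave_region + MXCSR_OFFSET, sizeof(mxcsr));
    if (mxcsr & ~valid_mask)
        return -EINVAL;
    return 0;
}
```

Rejecting the buffer up front means the bad value never reaches the point where it would be loaded into the register.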

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Jun 15, 2017

KVM: async_pf: avoid async pf injection when in guest mode
commit 9bc1f09 upstream.

 INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  __schedule+0x3cd/0xb30
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30

This is triggered by running both win7 and win2016 on L1 KVM simultaneously
and then putting memory pressure on L1. I can observe this hang on L1 when
at least ~70% of the swap area is occupied on L0.

This happens because an async pf that should be injected into L1 is injected
into L2 instead. The L2 guest starts receiving page faults with a bogus %cr2
(actually the apf token from the host), and the L1 guest starts accumulating
tasks stuck in D state in kvm_async_pf_task_wait() because the PAGE_READY
async_pfs are missing.

This patch fixes the hang by doing async pf only when executing the L1 guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Aug 2, 2017

vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
[ Upstream commit bd00fdf ]

The recently added mediated VFIO driver doesn't know about powerpc iommu.
It thus doesn't register a struct iommu_table_group in the iommu group
upon device creation. The iommu_data pointer hence remains null.

This causes a kernel oops when userspace tries to set the iommu type of a
container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU.

[   82.585440] mtty mtty: MDEV: Registered
[   87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 10
[   87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 10
[  116.297184] Unable to handle kernel paging request for data at address 0x00000030
[  116.297389] Faulting instruction address: 0xd000000007870524
[  116.297465] Oops: Kernel access of bad area, sig: 11 [#1]
[  116.297611] SMP NR_CPUS=2048
[  116.297611] NUMA
[  116.297627] PowerNV
...
[  116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 4.10.0-rc5-mdev-test #8
[  116.297993] task: c000000e7718b680 task.stack: c000000e77214000
[  116.298025] NIP: d000000007870524 LR: d000000007870518 CTR: 0000000000000000
[  116.298064] REGS: c000000e77217990 TRAP: 0300   Not tainted  (4.10.0-rc5-mdev-test)
[  116.298103] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  116.298107]   CR: 84004444  XER: 00000000
[  116.298154] CFAR: c00000000000888c DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1
               GPR00: d000000007870518 c000000e77217c10 d00000000787b0ed c000000eed2103c0
               GPR04: 0000000000000000 0000000000000000 c000000eed2103e0 0000000f24320000
               GPR08: 0000000000000104 0000000000000001 0000000000000000 d0000000078729b0
               GPR12: c00000000025b7e0 c00000000fe08400 0000000000000001 000001002d31d100
               GPR16: 000001002c22c850 00003ffff315c750 0000000043145680 0000000043141bc0
               GPR20: ffffffffffffffed fffffffffffff000 0000000020003b65 d000000007706018
               GPR24: c000000f16cf0d98 d000000007706000 c000000003f42980 c000000003f42980
               GPR28: c000000f1575ac00 c000000003f429c8 0000000000000000 c000000eed2103c0
[  116.298504] NIP [d000000007870524] tce_iommu_attach_group+0x10c/0x360 [vfio_iommu_spapr_tce]
[  116.298555] LR [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce]
[  116.298601] Call Trace:
[  116.298610] [c000000e77217c10] [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable)
[  116.298671] [c000000e77217cb0] [d0000000077033a0] vfio_fops_unl_ioctl+0x278/0x3e0 [vfio]
[  116.298713] [c000000e77217d40] [c0000000002a3ebc] do_vfs_ioctl+0xcc/0x8b0
[  116.298745] [c000000e77217de0] [c0000000002a4700] SyS_ioctl+0x60/0xc0
[  116.298782] [c000000e77217e30] [c00000000000b220] system_call+0x38/0xfc
[  116.298812] Instruction dump:
[  116.298828] 7d3f4b78 409effc8 3d220000 e9298020 3c800140 38a00018 608480c0 e8690028
[  116.298869] 4800249d e8410018 7c7f1b79 41820230 <e93e0030> 2fa90000 419e0114 e9090020
[  116.298914] ---[ end trace 1e10b0ced08b9120 ]---

This patch fixes the oops.
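
A minimal sketch of the guard the subject line describes, using stand-in types. `iommu_group_stub`, `table_group_stub` and `attach_group()` are hypothetical, and the `-ENODEV` return value is an assumption of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the per-group powerpc table data. */
struct table_group_stub {
    int max_levels;
};

/* Stand-in for an iommu group; iommu_data stays NULL for mdev devices. */
struct iommu_group_stub {
    void *iommu_data;
};

/*
 * Fail the attach early when the group carries no table data, instead of
 * dereferencing a null pointer further down.
 */
static int attach_group(const struct iommu_group_stub *group, int *levels_out)
{
    const struct table_group_stub *table_group;

    assert(group != NULL && levels_out != NULL);
    table_group = group->iommu_data;
    if (!table_group)
        return -ENODEV;

    /* Only reached with valid data; this read is what used to oops. */
    *levels_out = table_group->max_levels;
    return 0;
}
```

The faulting address 0x00000030 in the log above is consistent with a field read through a null base pointer, which is the access the early return avoids.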

Reported-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Aug 2, 2017

scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state
commit f9279c9 upstream.

The addition of the STARGET_REMOVE state had the side effect of
introducing a race condition that can cause a crash.

scsi_target_reap_ref_release() checks the starget->state to
see if it is still in STARGET_CREATED, and if so, skips calling
transport_remove_device() and device_del(), because the starget->state
is only set to STARGET_RUNNING after scsi_target_add() has called
device_add() and transport_add_device().

However, if an rport loss occurs while a target is being scanned,
it can happen that scsi_remove_target() will be called while the
starget is still in the STARGET_CREATED state.  In this case, the
starget->state will be set to STARGET_REMOVE, and as a result,
scsi_target_reap_ref_release() will take the wrong path.  The end
result is a panic:

[ 1255.356653] Oops: 0000 [#1] SMP
[ 1255.360154] Modules linked in: x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel ghash_clmulni_i
[ 1255.393234] CPU: 5 PID: 149 Comm: kworker/u96:4 Tainted: G        W       4.11.0+ #8
[ 1255.401879] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
[ 1255.410327] Workqueue: scsi_wq_6 fc_scsi_scan_rport [scsi_transport_fc]
[ 1255.417720] task: ffff88060ca8c8c0 task.stack: ffffc900048a8000
[ 1255.424331] RIP: 0010:kernfs_find_ns+0x13/0xc0
[ 1255.429287] RSP: 0018:ffffc900048abbf0 EFLAGS: 00010246
[ 1255.435123] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1255.443083] RDX: 0000000000000000 RSI: ffffffff8188d659 RDI: 0000000000000000
[ 1255.451043] RBP: ffffc900048abc10 R08: 0000000000000000 R09: 0000012433fe0025
[ 1255.459005] R10: 0000000025e5a4b5 R11: 0000000025e5a4b5 R12: ffffffff8188d659
[ 1255.466972] R13: 0000000000000000 R14: ffff8805f55e5088 R15: 0000000000000000
[ 1255.474931] FS:  0000000000000000(0000) GS:ffff880616b40000(0000) knlGS:0000000000000000
[ 1255.483959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1255.490370] CR2: 0000000000000068 CR3: 0000000001c09000 CR4: 00000000000406e0
[ 1255.498332] Call Trace:
[ 1255.501058]  kernfs_find_and_get_ns+0x31/0x60
[ 1255.505916]  sysfs_unmerge_group+0x1d/0x60
[ 1255.510498]  dpm_sysfs_remove+0x22/0x60
[ 1255.514783]  device_del+0xf4/0x2e0
[ 1255.518577]  ? device_remove_file+0x19/0x20
[ 1255.523241]  attribute_container_class_device_del+0x1a/0x20
[ 1255.529457]  transport_remove_classdev+0x4e/0x60
[ 1255.534607]  ? transport_add_class_device+0x40/0x40
[ 1255.540046]  attribute_container_device_trigger+0xb0/0xc0
[ 1255.546069]  transport_remove_device+0x15/0x20
[ 1255.551025]  scsi_target_reap_ref_release+0x25/0x40
[ 1255.556467]  scsi_target_reap+0x2e/0x40
[ 1255.560744]  __scsi_scan_target+0xaa/0x5b0
[ 1255.565312]  scsi_scan_target+0xec/0x100
[ 1255.569689]  fc_scsi_scan_rport+0xb1/0xc0 [scsi_transport_fc]
[ 1255.576099]  process_one_work+0x14b/0x390
[ 1255.580569]  worker_thread+0x4b/0x390
[ 1255.584651]  kthread+0x109/0x140
[ 1255.588251]  ? rescuer_thread+0x330/0x330
[ 1255.592730]  ? kthread_park+0x60/0x60
[ 1255.596815]  ret_from_fork+0x29/0x40
[ 1255.600801] Code: 24 08 48 83 42 40 01 5b 41 5c 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 1255.621876] RIP: kernfs_find_ns+0x13/0xc0 RSP: ffffc900048abbf0
[ 1255.628479] CR2: 0000000000000068
[ 1255.632756] ---[ end trace 34a69ba0477d036f ]---

Fix this by adding another scsi_target state STARGET_CREATED_REMOVE
to distinguish this case.

Fixes: f05795d ("scsi: Add intermediate STARGET_REMOVE state to scsi_target_state")
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Aug 2, 2017

ipmi: use rcu lock around call to intf->handlers->sender()
commit cdea465 upstream.

A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproducible
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 #9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace.

It crashed because the area pointed to by the RIP, which was extracted from
the timer element, had already been removed during the shutdown process.

The function is smi_timeout().

We think ffff8810588b5a00 in RDX is the struct smi_info parameter.

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately the top of this area has already been destroyed by someone.
But we think this is struct smi_info for two reasons:
 1) The addresses stored between ffff8810588b5a70 and ffff8810588b5a80
  are inside ipmi_si_intf.c; see crash> module ffff88085779d2c0

 2) We've found the area which points to it.
  It is at offset 0x68 of ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

 If we regard it as struct ipmi_smi in the shutdown process,
 it looks consistent.

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This was first introduced in 7ea0ed2 ("ipmi: Make the
message handler easier to use for SMI interfaces"),
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
@anholt

Owner

anholt commented Nov 9, 2017

Everything is upstreamed now (other than the DT, where it is unclear whether it can be upstreamed), so I'm calling this one done. Some patches still need to flow downstream.

@anholt anholt closed this Nov 9, 2017

anholt pushed a commit that referenced this issue Nov 13, 2017

ipv6: reset fn->rr_ptr when replacing route
[ Upstream commit 383143f ]

syzkaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
 ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
 ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
 [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
 [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
 [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
 [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
 [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
 [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
 [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
 [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
 [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
 [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
 [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
 [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
 [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
 [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
 [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
 [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
 [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
 [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
 [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
 [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
 [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
 [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384

The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().
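
The fix described above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct route`, `struct node`, and `replace_route()` are hypothetical stand-ins for the route entry, the fib6 node (with its route list and fn->rr_ptr), and the replace path in fib6_add_rt2node().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's route entry and fib6 node. */
struct route {
	struct route *next;
	int id;
};

struct node {
	struct route *leaf;    /* head of the route list */
	struct route *rr_ptr;  /* round-robin cursor into that list */
};

/*
 * Replace 'old' with 'repl' in the node's list. Returns 0 on success,
 * -1 if 'old' is not on the list. The caller may free 'old' afterwards,
 * so no pointer to it may remain in the node.
 */
static int replace_route(struct node *fn, struct route *old,
			 struct route *repl)
{
	struct route **pp;

	for (pp = &fn->leaf; *pp; pp = &(*pp)->next) {
		if (*pp != old)
			continue;
		repl->next = old->next;
		*pp = repl;
		/* The step the commit adds: drop the stale cursor. */
		if (fn->rr_ptr == old)
			fn->rr_ptr = NULL;
		old->next = NULL;
		return 0;
	}
	return -1;
}
```

Without the `rr_ptr` reset, a later lookup in this model would follow the cursor into a route that its owner has already released, which is the pattern the KASAN report above shows.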

Fixes: 2759647 ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Nov 21, 2017

perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACE
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to a NULL
pointer deref when he ran it on a data file with namespace events.  It was
because build_id__mark_dso_hit_ops lacks the namespace event
handler and perf_tool__fill_defaults() didn't set it.

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000000000 in ?? ()
  Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x
  +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x
  +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x
  (gdb) where
  #0  0x0000000000000000 in ?? ()
  #1  0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18,
      evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888,
      tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287
  #2  0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>,
      sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340
  #3  perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470,
      file_offset=file_offset@entry=1136) at util/session.c:1522
  #4  0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>,
      data_offset=<optimized out>, session=0x0) at util/session.c:1899
  #5  perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953
  #6  0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>)
      at builtin-buildid-list.c:83
  #7  cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115
  #8  0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0)
      at perf.c:296
  #9  0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348
  #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392
  #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536
  (gdb)

Fix it by adding a stub event handler for the namespace event.
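
The idea of a default stub can be sketched with a reduced callback table. Everything here is a hypothetical model and not perf's code: `struct tool`, `stub_handler()`, `tool_fill_defaults()`, and `deliver_namespaces()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced model of a tool's callback table. */
struct tool {
	int (*sample)(int event);
	int (*namespaces)(int event);
};

/* Default stub: accept the event and do nothing. */
static int stub_handler(int event)
{
	(void)event;
	return 0;
}

/* Fill every callback the tool left unset with the stub. */
static void tool_fill_defaults(struct tool *t)
{
	if (t->sample == NULL)
		t->sample = stub_handler;
	if (t->namespaces == NULL)
		t->namespaces = stub_handler;
}

/* Dispatch a namespaces event; safe only after tool_fill_defaults(). */
static int deliver_namespaces(struct tool *t, int event)
{
	assert(t->namespaces != NULL);
	return t->namespaces(event);
}
```

A table that skips the fill step leaves `namespaces` as NULL, and dispatching through it jumps to address 0, which matches frame #0 in the backtrace above.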

Committer testing:

To clarify further: plain 'perf buildid-list' does not end up in a
SEGFAULT when processing a perf.data file with namespace info:

  # perf record -a --namespaces sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ]
  # perf buildid-list | wc -l
  38
  # perf buildid-list | head -5
  e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux
  874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko
  ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko
  5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko
  f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so
  #

It is only when one asks to check which of those entries actually had
samples, i.e. when we use either -H or --with-hits, that we process
all the PERF_RECORD_ events. Since tools/perf/builtin-buildid-list.c
neither explicitly set a perf_tool.namespaces() callback nor had the
default stub set, processing a PERF_RECORD_NAMESPACE record ends in a
SEGFAULT:

  # perf buildid-list -H
  Segmentation fault (core dumped)
  ^C
  #

Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: f3b3614 ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

lategoodbye pushed a commit that referenced this issue Jan 6, 2018

ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
An additional 'ip' will be pushed to the stack, for restoring the
DACR later, if CONFIG_CPU_SW_DOMAIN_PAN is defined.

However, the fixup still gets the err_ptr by adding #8*4 to sp, so
the code area pointed to by the LR will be overwritten, or the kernel
will crash if CONFIG_DEBUG_RODATA is enabled.

This patch fixes the stack mismatch.

Fixes: a5e090a ("ARM: software-based priviledged-no-access support")
Signed-off-by: Lvqiang Huang <Lvqiang.Huang@spreadtrum.com>
Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

anholt pushed a commit that referenced this issue Jan 17, 2018

x86/intel_rdt: Fix potential deadlock during resctrl mount
[ Upstream commit 87943db ]

Sai reported a warning during some MBA tests:

[  236.755559] ======================================================
[  236.762443] WARNING: possible circular locking dependency detected
[  236.769328] 4.14.0-rc4-yocto-standard #8 Not tainted
[  236.774857] ------------------------------------------------------
[  236.781738] mount/10091 is trying to acquire lock:
[  236.787071]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8117f892>] static_key_enable+0x12/0x30
[  236.797058]
               but task is already holding lock:
[  236.803552]  (&type->s_umount_key#37/1){+.+.}, at: [<ffffffff81208b2f>] sget_userns+0x32f/0x520
[  236.813247]
               which lock already depends on the new lock.

[  236.822353]
               the existing dependency chain (in reverse order) is:
[  236.830686]
               -> #4 (&type->s_umount_key#37/1){+.+.}:
[  236.837756]        __lock_acquire+0x1100/0x11a0
[  236.842799]        lock_acquire+0xdf/0x1d0
[  236.847363]        down_write_nested+0x46/0x80
[  236.852310]        sget_userns+0x32f/0x520
[  236.856873]        kernfs_mount_ns+0x7e/0x1f0
[  236.861728]        rdt_mount+0x30c/0x440
[  236.866096]        mount_fs+0x38/0x150
[  236.870262]        vfs_kern_mount+0x67/0x150
[  236.875015]        do_mount+0x1df/0xd50
[  236.879286]        SyS_mount+0x95/0xe0
[  236.883464]        entry_SYSCALL_64_fastpath+0x18/0xad
[  236.889183]
               -> #3 (rdtgroup_mutex){+.+.}:
[  236.895292]        __lock_acquire+0x1100/0x11a0
[  236.900337]        lock_acquire+0xdf/0x1d0
[  236.904899]        __mutex_lock+0x80/0x8f0
[  236.909459]        mutex_lock_nested+0x1b/0x20
[  236.914407]        intel_rdt_online_cpu+0x3b/0x4a0
[  236.919745]        cpuhp_invoke_callback+0xce/0xb80
[  236.925177]        cpuhp_thread_fun+0x1c5/0x230
[  236.930222]        smpboot_thread_fn+0x11a/0x1e0
[  236.935362]        kthread+0x152/0x190
[  236.939536]        ret_from_fork+0x27/0x40
[  236.944097]
               -> #2 (cpuhp_state-up){+.+.}:
[  236.950199]        __lock_acquire+0x1100/0x11a0
[  236.955241]        lock_acquire+0xdf/0x1d0
[  236.959800]        cpuhp_issue_call+0x12e/0x1c0
[  236.964845]        __cpuhp_setup_state_cpuslocked+0x13b/0x2f0
[  236.971242]        __cpuhp_setup_state+0xa7/0x120
[  236.976483]        page_writeback_init+0x43/0x67
[  236.981623]        pagecache_init+0x38/0x3b
[  236.986281]        start_kernel+0x3c6/0x41a
[  236.990931]        x86_64_start_reservations+0x2a/0x2c
[  236.996650]        x86_64_start_kernel+0x72/0x75
[  237.001793]        verify_cpu+0x0/0xfb
[  237.005966]
               -> #1 (cpuhp_state_mutex){+.+.}:
[  237.012364]        __lock_acquire+0x1100/0x11a0
[  237.017408]        lock_acquire+0xdf/0x1d0
[  237.021969]        __mutex_lock+0x80/0x8f0
[  237.026527]        mutex_lock_nested+0x1b/0x20
[  237.031475]        __cpuhp_setup_state_cpuslocked+0x54/0x2f0
[  237.037777]        __cpuhp_setup_state+0xa7/0x120
[  237.043013]        page_alloc_init+0x28/0x30
[  237.047769]        start_kernel+0x148/0x41a
[  237.052425]        x86_64_start_reservations+0x2a/0x2c
[  237.058145]        x86_64_start_kernel+0x72/0x75
[  237.063284]        verify_cpu+0x0/0xfb
[  237.067456]
               -> #0 (cpu_hotplug_lock.rw_sem){++++}:
[  237.074436]        check_prev_add+0x401/0x800
[  237.079286]        __lock_acquire+0x1100/0x11a0
[  237.084330]        lock_acquire+0xdf/0x1d0
[  237.088890]        cpus_read_lock+0x42/0x90
[  237.093548]        static_key_enable+0x12/0x30
[  237.098496]        rdt_mount+0x406/0x440
[  237.102862]        mount_fs+0x38/0x150
[  237.107035]        vfs_kern_mount+0x67/0x150
[  237.111787]        do_mount+0x1df/0xd50
[  237.116058]        SyS_mount+0x95/0xe0
[  237.120233]        entry_SYSCALL_64_fastpath+0x18/0xad
[  237.125952]
               other info that might help us debug this:

[  237.134867] Chain exists of:
                 cpu_hotplug_lock.rw_sem --> rdtgroup_mutex --> &type->s_umount_key#37/1

[  237.148425]  Possible unsafe locking scenario:

[  237.155015]        CPU0                    CPU1
[  237.160057]        ----                    ----
[  237.165100]   lock(&type->s_umount_key#37/1);
[  237.169952]                                lock(rdtgroup_mutex);
[  237.176641]                                lock(&type->s_umount_key#37/1);
[  237.184287]   lock(cpu_hotplug_lock.rw_sem);
[  237.189041]
                *** DEADLOCK ***

When the resctrl filesystem is mounted the locks must be acquired in the
same order as was done when the cpus came online:

     cpu_hotplug_lock before rdtgroup_mutex.

This also requires switching the static_branch_enable() calls to the
_cpuslocked variant because the cpu hotplug lock is now already held.
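
The ordering rule can be illustrated with a rank-based check. This is a minimal sketch, not how lockdep or the resctrl code works: the ranks, `struct lock_ctx`, and `take_lock()` are hypothetical, and release handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum lock_rank {
	RANK_NONE = 0,
	RANK_CPU_HOTPLUG = 1,
	RANK_RDTGROUP = 2,
};

struct lock_ctx {
	enum lock_rank highest_held;
};

/* Returns true if taking 'rank' now respects the global order. */
static bool take_lock(struct lock_ctx *ctx, enum lock_rank rank)
{
	if (rank <= ctx->highest_held)
		return false;        /* out of order: potential deadlock */
	ctx->highest_held = rank;
	return true;
}
```

In this model, taking the hotplug lock and then rdtgroup_mutex is accepted, while the reverse sequence is rejected at the second step, mirroring the order the commit message requires at mount time.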

[ tglx: Switched to cpus_read_[un]lock ]

Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: https://lkml.kernel.org/r/9c41b91bc2f47d9e95b62b213ecdb45623c47a9f.1508490116.git.reinette.chatre@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Feb 22, 2018

RDS: IB: Fix null pointer issue
Scenario:
1. A port goes down and failover is performed
2. The ap process issues a bind syscall, which reaches rds_bind

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_handler is null */
                                      id_priv->id.event_handler

Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

anholt pushed a commit that referenced this issue Mar 9, 2018

mm, swap, frontswap: fix THP swap if frontswap enabled
commit 7ba7166 upstream.

It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follows:

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is bd4c82c ("mm,
THP, swap: delay splitting THP after swapped out").

The root cause is as follows:

When the pages are written to the swap device during swap-out in
swap_writepage(), zswap (frontswap) tries to compress the pages to
improve performance.  But zswap (frontswap) treats a THP as a normal
page, so only the head page is saved.  After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.

This is fixed by refusing to save the page in the frontswap store functions
if the page is a THP, so that the THP will be swapped out to the swap
device.
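
The refusal can be sketched as a guard at the top of a backend store hook. This is a user-space model under assumptions, not the zswap code: `struct page_desc`, `is_huge()`, and `backend_store()` are hypothetical names, and a page count above one stands in for a THP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page descriptor: nr_pages > 1 stands for a THP. */
struct page_desc {
	unsigned int nr_pages;
};

static bool is_huge(const struct page_desc *p)
{
	return p->nr_pages > 1;
}

/*
 * Backend store hook. Returns 0 if the page was taken by the backend,
 * -1 if it was refused and must go to the real swap device instead.
 */
static int backend_store(const struct page_desc *p, unsigned int *stored)
{
	if (is_huge(p))
		return -1;   /* would only save the head page: refuse */
	*stored += 1;
	return 0;
}
```

Because the guard lives in the backend hook rather than in a shared entry point, a backend that later learns to handle huge pages could drop the check on its own.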

Another choice is to split THP if frontswap is enabled.  But enabling
frontswap turns out not to be flexible.  For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.

Frontswap has multiple backends.  To make it easy for one backend to
enable THP support, the THP checking is put in the backend frontswap store
functions instead of the general interfaces.

Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>	[4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Mar 9, 2018

ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
[ Upstream commit 36b0cb8 ]

An additional 'ip' will be pushed to the stack, for restoring the
DACR later, if CONFIG_CPU_SW_DOMAIN_PAN is defined.

However, the fixup still gets the err_ptr by adding #8*4 to sp, so
the code area pointed to by the LR will be overwritten, or the kernel
will crash if CONFIG_DEBUG_RODATA is enabled.

This patch fixes the stack mismatch.

Fixes: a5e090a ("ARM: software-based priviledged-no-access support")
Signed-off-by: Lvqiang Huang <Lvqiang.Huang@spreadtrum.com>
Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

stschake pushed a commit to stschake/linux that referenced this issue Mar 11, 2018

mm, swap, frontswap: fix THP swap if frontswap enabled
It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follows:

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is bd4c82c ("mm,
THP, swap: delay splitting THP after swapped out").

The root cause is as follows:

When the pages are written to the swap device during swap-out in
swap_writepage(), zswap (frontswap) tries to compress the pages to
improve performance.  But zswap (frontswap) treats a THP as a normal
page, so only the head page is saved.  After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.

This is fixed by refusing to save the page in the frontswap store functions
if the page is a THP, so that the THP will be swapped out to the swap
device.

Another choice is to split THP if frontswap is enabled.  But enabling
frontswap turns out not to be flexible.  For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.

Frontswap has multiple backends.  To make it easy for one backend to
enable THP support, the THP checking is put in the backend frontswap store
functions instead of the general interfaces.

Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>	[4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

stschake pushed a commit to stschake/linux that referenced this issue Mar 11, 2018

bcache: fix kcrashes with fio in RAID5 backend dev
The kernel crashed when running fio on a bcache device with a RAID5
backend; the call trace is below:
[  440.012034] kernel BUG at block/blk-ioc.c:146!
[  440.012696] invalid opcode: 0000 [#1] SMP NOPTI
[  440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
[  440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
[  440.028615] RIP: 0010:put_io_context+0x8b/0x90
[  440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
[  440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240
[  440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0
[  440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700
[  440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000
[  440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0
[  440.035239] FS:  0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000
[  440.060190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0
[  440.110498] Call Trace:
[  440.135443]  bio_disassociate_task+0x1b/0x60
[  440.160355]  bio_free+0x1b/0x60
[  440.184666]  bio_put+0x23/0x30
[  440.208272]  search_free+0x23/0x40 [bcache]
[  440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
[  440.254468]  closure_put+0xb6/0xd0 [bcache]
[  440.277087]  request_endio+0x30/0x40 [bcache]
[  440.298703]  bio_endio+0xa1/0x120
[  440.319644]  handle_stripe+0x418/0x2270 [raid456]
[  440.340614]  ? load_balance+0x17b/0x9c0
[  440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
[  440.380675]  ? __release_stripe+0x15/0x20 [raid456]
[  440.400132]  raid5d+0x3ed/0x5d0 [raid456]
[  440.419193]  ? schedule+0x36/0x80
[  440.437932]  ? schedule_timeout+0x1d2/0x2f0
[  440.456136]  md_thread+0x122/0x150
[  440.473687]  ? wait_woken+0x80/0x80
[  440.491411]  kthread+0x102/0x140
[  440.508636]  ? find_pers+0x70/0x70
[  440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
[  440.541791]  ret_from_fork+0x35/0x40
[  440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
[  440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
[  440.628575] ---[ end trace a1fd79d85643a73e ]---

All the crashes happened when a bypass IO came in. In that scenario
s->iop.bio points to s->orig_bio. In search_free(), s->orig_bio is
finished by calling bio_complete(); after that, s->iop.bio is
invalid, so the kernel crashes when calling bio_put(). Maybe the upper
layer is at fault, since a bio should not be freed before we call bio_put(),
but we had better call bio_put() first, before calling bio_complete() to
notify the upper layer that this bio has ended.

This patch moves bio_complete() after bio_put() to avoid the kernel crash.
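
Why the order matters can be shown with a toy model. It is a sketch under a strong assumption taken from the paragraph above, namely that completion lets the upper layer release the bio; `struct fake_bio` and all four functions are hypothetical and do not mirror the bcache or block-layer API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bio with a reference count and a "freed" marker. */
struct fake_bio {
	int refs;
	bool freed;
};

/* Returns false if called on an already freed bio (a use-after-free). */
static bool fake_bio_put(struct fake_bio *b)
{
	if (b->freed)
		return false;
	b->refs--;
	return true;
}

/* Assumed behaviour: the upper layer ends the I/O and releases the bio. */
static void fake_complete(struct fake_bio *b)
{
	b->freed = true;
}

/* Order after the patch: drop our reference, then signal completion. */
static bool finish_put_first(struct fake_bio *b)
{
	bool ok = fake_bio_put(b);

	fake_complete(b);
	return ok;
}

/* Order before the patch: completion first, then the put. */
static bool finish_complete_first(struct fake_bio *b)
{
	fake_complete(b);
	return fake_bio_put(b);
}
```

In this model only `finish_complete_first()` touches the bio after it was released, which corresponds to the bio_put() frame in the call trace above.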

[mlyle: fixed commit subject for character limits]

Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Tested-by: Matthias Ferdinand <bcache@mfedv.net>
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

stschake pushed a commit to stschake/linux that referenced this issue Mar 16, 2018

bpf: allow xadd only on aligned memory
The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers the following crash, both via the interpreter and the JIT:

  [  830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
  [...]
  [  830.916161] Internal error: Oops: 96000021 [#1] SMP
  [  830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
  [  830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
  [  830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
  [  831.003793] pc : __ll_sc_atomic_add+0x4/0x18
  [  831.008055] lr : ___bpf_prog_run+0x1198/0x1588
  [  831.012485] sp : ffff00001ccabc20
  [  831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
  [  831.021087] x27: 0000000000000001 x26: 0000000000000000
  [  831.026387] x25: 000000c168d9db98 x24: 0000000000000000
  [  831.031686] x23: ffff000008203878 x22: ffff000009488000
  [  831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
  [  831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
  [  831.047585] x17: 0000000000000000 x16: 0000000000000000
  [  831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
  [  831.058184] x13: 0000000000000000 x12: 0000000000000000
  [  831.063484] x11: 0000000000000001 x10: 0000000000000000
  [  831.068783] x9 : 0000000000000000 x8 : 0000000000000000
  [  831.074083] x7 : 0000000000000000 x6 : 000580d428000000
  [  831.079383] x5 : 0000000000000018 x4 : 0000000000000000
  [  831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
  [  831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
  [  831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
  [  831.102577] Call trace:
  [  831.105012]  __ll_sc_atomic_add+0x4/0x18
  [  831.108923]  __bpf_prog_run32+0x4c/0x70
  [  831.112748]  bpf_test_run+0x78/0xf8
  [  831.116224]  bpf_prog_test_run_xdp+0xb4/0x120
  [  831.120567]  SyS_bpf+0x77c/0x1110
  [  831.123873]  el0_svc_naked+0x30/0x34
  [  831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)

The reason is that the memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.

xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. The only place where xadd makes sense is
in fact map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.
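
A strict alignment test of this kind can be sketched in a few lines. The helper below is a hypothetical illustration, not the verifier's code: `xadd_access_ok()` and its parameters are assumed names, and it only models "address must be a multiple of the access size" for 4- and 8-byte atomic adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: an atomic add of 'size' bytes (4 or 8) at 'addr'
 * is accepted only if the address is a multiple of the access size.
 */
static bool xadd_access_ok(uintptr_t addr, unsigned int size)
{
	if (size != 4 && size != 8)
		return false;
	return (addr & (size - 1)) == 0;
}
```

An address whose low bits are ...03, like the faulting address in the oops above, fails this test for both sizes.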

Fixes: 17a5267 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

stschake pushed a commit to stschake/linux that referenced this issue Mar 16, 2018

Jiri Olsa Arnaldo Carvalho de Melo
perf record: Fix crash in pipe mode
Currently we can crash perf record when running in pipe mode, like:

  $ perf record ls | perf report
  # To display the perf.data header info, please use --header/--header-only options.
  #
  perf: Segmentation fault
  Error:
  The - file has no samples!

The callstack of the crash is:

    0x0000000000515242 in perf_event__synthesize_event_update_name
  3513            ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
  (gdb) bt
  #0  0x0000000000515242 in perf_event__synthesize_event_update_name
  #1  0x00000000005158a4 in perf_event__synthesize_extra_attr
  #2  0x0000000000443347 in record__synthesize
  #3  0x00000000004438e3 in __cmd_record
  #4  0x000000000044514e in cmd_record
  #5  0x00000000004cbc95 in run_builtin
  #6  0x00000000004cbf02 in handle_internal_command
  #7  0x00000000004cc054 in run_argv
  #8  0x00000000004cc422 in main

The reason for the crash is that the evsel does not have its ids array
allocated and the pipe's synthesize code tries to access it.

We don't force evsel ids allocation when we have a single event, because
it's not needed. However, we need it when we are in pipe mode, even for
a single event, as a key for the evsel update event.

Fix this by forcing evsel ids allocation even for a single event when
we are in pipe mode.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180302161354.30192-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

stschake pushed a commit to stschake/linux that referenced this issue Apr 11, 2018

fsl/fman: avoid sleeping in atomic context while adding an address
__dev_mc_add grabs an address spinlock, so use a non-sleeping (atomic)
allocation in kmalloc.

/ # ifconfig eth0 inet 192.168.0.111
[   89.331622] BUG: sleeping function called from invalid context at mm/slab.h:420
[   89.339002] in_atomic(): 1, irqs_disabled(): 0, pid: 1035, name: ifconfig
[   89.345799] 2 locks held by ifconfig/1035:
[   89.349908]  #0:  (rtnl_mutex){+.+.}, at: [<(ptrval)>] devinet_ioctl+0xc0/0x8a0
[   89.357258]  #1:  (_xmit_ETHER){+...}, at: [<(ptrval)>] __dev_mc_add+0x28/0x80
[   89.364520] CPU: 1 PID: 1035 Comm: ifconfig Not tainted 4.16.0-rc3-dirty #8
[   89.371464] Call Trace:
[   89.373908] [e959db60] [c066f948] dump_stack+0xa4/0xfc (unreliable)
[   89.380177] [e959db80] [c00671d8] ___might_sleep+0x248/0x280
[   89.385833] [e959dba0] [c01aec34] kmem_cache_alloc_trace+0x174/0x320
[   89.392179] [e959dbd0] [c04ab920] dtsec_add_hash_mac_address+0x130/0x240
[   89.398874] [e959dc00] [c04a9d74] set_multi+0x174/0x1b0
[   89.404093] [e959dc30] [c04afb68] dpaa_set_rx_mode+0x68/0xe0
[   89.409745] [e959dc40] [c057baf8] __dev_mc_add+0x58/0x80
[   89.415052] [e959dc60] [c060fd64] igmp_group_added+0x164/0x190
[   89.420878] [e959dca0] [c060ffa8] ip_mc_inc_group+0x218/0x460
[   89.426617] [e959dce0] [c06120fc] ip_mc_up+0x3c/0x190
[   89.431662] [e959dd10] [c0607270] inetdev_event+0x250/0x620
[   89.437227] [e959dd50] [c005f190] notifier_call_chain+0x80/0xf0
[   89.443138] [e959dd80] [c0573a74] __dev_notify_flags+0x54/0xf0
[   89.448964] [e959dda0] [c05743f8] dev_change_flags+0x48/0x60
[   89.454615] [e959ddc0] [c0606744] devinet_ioctl+0x544/0x8a0
[   89.460180] [e959de10] [c060987c] inet_ioctl+0x9c/0x1f0
[   89.465400] [e959de80] [c05479a8] sock_ioctl+0x168/0x460
[   89.470708] [e959ded0] [c01cf3ec] do_vfs_ioctl+0xac/0x8c0
[   89.476099] [e959df20] [c01cfc40] SyS_ioctl+0x40/0xc0
[   89.481147] [e959df40] [c0011318] ret_from_syscall+0x0/0x3c
[   89.486715] --- interrupt: c01 at 0x1006943c
[   89.486715]     LR = 0x100c45ec
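
The failure mode in the trace above can be modelled in userspace roughly as follows. This is only a sketch: `model_lock`, `model_unlock`, `model_alloc`, and the `alloc_mode` values are hypothetical names invented for illustration, not kernel APIs. It counts a violation whenever an allocation that is allowed to sleep is requested while a spinlock-like lock is held.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocation modes: one may sleep, one must not. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

static int lock_depth;   /* > 0 models "in atomic context" */
static int violations;   /* sleeping allocations requested under a lock */

static void model_lock(void)   { lock_depth++; }
static void model_unlock(void) { lock_depth--; }

/* Models the "might sleep" check: a sleeping allocation is only
 * legal when no spinlock-like lock is held. */
static void *model_alloc(size_t size, enum alloc_mode mode)
{
    if (mode == ALLOC_MAY_SLEEP && lock_depth > 0)
        violations++;
    return malloc(size);
}
```

In this model, switching the allocation made under the lock from `ALLOC_MAY_SLEEP` to `ALLOC_ATOMIC` is what stops the violation count from growing.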

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stschake pushed a commit to stschake/linux that referenced this issue Apr 11, 2018

btrfs: Fix NULL pointer exception in find_bio_stripe
On detaching of a disk which is a part of a RAID6 filesystem, the
following kernel OOPS may happen:

[63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
[63122.719584] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo
[63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, flush 1, corrupt 0, gen 0
[63122.803516] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo
[63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, flush 1, corrupt 0, gen 0
[63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo
[63122.935338] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs]
[63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0
[63122.971202] Oops: 0000 [#1] SMP
[63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 4.14.2-16-scst34x+ #8
[63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs]
[63123.007595] task: ffff880036ea4040 task.stack: ffffc90006384000
[63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs]
[63123.007968] RSP: 0018:ffffc90006387ad8 EFLAGS: 00010287
[63123.008140] RAX: 0000000000000002 RBX: ffff88004beaa0b8 RCX: ffff8800b2bd5690
[63123.008359] RDX: 0000000000000000 RSI: ffff88007bb43500 RDI: ffff88004beaa000
[63123.008621] RBP: ffffc90006387ae8 R08: 0000000099100000 R09: ffff8800b2bd5600
[63123.008840] R10: 0000000000000004 R11: 0000000000010000 R12: ffff88007bb43500
[63123.009059] R13: 00000000fffffffb R14: ffff880036fc5180 R15: 0000000000000004
[63123.009278] FS: 0000000000000000(0000) GS:ffff8800b7000000(0000) knlGS:0000000000000000
[63123.009564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63123.009748] CR2: 0000000000000080 CR3: 00000000b0866000 CR4: 00000000000406f0
[63123.009969] Call Trace:
[63123.010085] raid_write_end_io+0x7e/0x80 [btrfs]
[63123.010251] bio_endio+0xa1/0x120
[63123.010378] generic_make_request+0x218/0x270
[63123.010921] submit_bio+0x66/0x130
[63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs]
[63123.011245] full_stripe_write+0x96/0xc0 [btrfs]
[63123.011428] raid56_parity_write+0x117/0x170 [btrfs]
[63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs]
[63123.011759] ? ___cache_free+0x1c5/0x300
[63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs]
[63123.012087] run_one_async_done+0x9c/0xc0 [btrfs]
[63123.012257] normal_work_helper+0x19e/0x300 [btrfs]
[63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs]
[63123.012656] process_one_work+0x14d/0x350
[63123.012888] worker_thread+0x4d/0x3a0
[63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20
[63123.013192] kthread+0x109/0x140
[63123.013315] ? process_scheduled_works+0x40/0x40
[63123.013472] ? kthread_stop+0x110/0x110
[63123.013610] ret_from_fork+0x25/0x30
[63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: ffffc90006387ad8
[63123.014678] CR2: 0000000000000080
[63123.016590] ---[ end trace a295ea7259c17880 ]---

This is reproducible in a loop in which a series of writes is followed by
a SCSI device delete command. The test may take up to a few minutes.

Fixes: 74d4699 ("block: replace bi_bdev with a gendisk pointer and partitions index")
[ no signed-off-by provided ]
Author: Dmitriy Gorokh <Dmitriy.Gorokh@wdc.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

anholt pushed a commit that referenced this issue Apr 17, 2018

bpf: allow xadd only on aligned memory
[ upstream commit ca36960 ]

The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers via interpreter but also JIT the following crash:

  [  830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
  [...]
  [  830.916161] Internal error: Oops: 96000021 [#1] SMP
  [  830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
  [  830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
  [  830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
  [  831.003793] pc : __ll_sc_atomic_add+0x4/0x18
  [  831.008055] lr : ___bpf_prog_run+0x1198/0x1588
  [  831.012485] sp : ffff00001ccabc20
  [  831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
  [  831.021087] x27: 0000000000000001 x26: 0000000000000000
  [  831.026387] x25: 000000c168d9db98 x24: 0000000000000000
  [  831.031686] x23: ffff000008203878 x22: ffff000009488000
  [  831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
  [  831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
  [  831.047585] x17: 0000000000000000 x16: 0000000000000000
  [  831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
  [  831.058184] x13: 0000000000000000 x12: 0000000000000000
  [  831.063484] x11: 0000000000000001 x10: 0000000000000000
  [  831.068783] x9 : 0000000000000000 x8 : 0000000000000000
  [  831.074083] x7 : 0000000000000000 x6 : 000580d428000000
  [  831.079383] x5 : 0000000000000018 x4 : 0000000000000000
  [  831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
  [  831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
  [  831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
  [  831.102577] Call trace:
  [  831.105012]  __ll_sc_atomic_add+0x4/0x18
  [  831.108923]  __bpf_prog_run32+0x4c/0x70
  [  831.112748]  bpf_test_run+0x78/0xf8
  [  831.116224]  bpf_prog_test_run_xdp+0xb4/0x120
  [  831.120567]  SyS_bpf+0x77c/0x1110
  [  831.123873]  el0_svc_naked+0x30/0x34
  [  831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)

The reason is that memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.

xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. In fact, the only place where xadd makes sense
is map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.
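
A strict alignment check of the kind described can be sketched as below. The function name `access_is_aligned` is hypothetical and this is not the verifier's actual code; it only shows the arithmetic, assuming the access size is a power of two (4 or 8 bytes for xadd).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical strict-alignment check: an access of `size` bytes at
 * address or offset `off` is allowed only if `off` is a multiple of
 * `size`. `size` is assumed to be a power of two. */
static bool access_is_aligned(uint64_t off, uint32_t size)
{
    return (off & (uint64_t)(size - 1)) == 0;
}
```

Applied to the faulting address in the trace above (x1 = ffff8097d7ed6703), a 4-byte access fails this check because the low two bits are not zero.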

Fixes: 17a5267 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Apr 17, 2018

btrfs: Fix NULL pointer exception in find_bio_stripe
commit 047fdea upstream.

On detaching of a disk which is a part of a RAID6 filesystem, the
following kernel OOPS may happen:

[63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
[63122.719584] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo
[63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, flush 1, corrupt 0, gen 0
[63122.803516] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo
[63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, flush 1, corrupt 0, gen 0
[63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo
[63122.935338] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs]
[63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0
[63122.971202] Oops: 0000 [#1] SMP
[63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 4.14.2-16-scst34x+ #8
[63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs]
[63123.007595] task: ffff880036ea4040 task.stack: ffffc90006384000
[63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs]
[63123.007968] RSP: 0018:ffffc90006387ad8 EFLAGS: 00010287
[63123.008140] RAX: 0000000000000002 RBX: ffff88004beaa0b8 RCX: ffff8800b2bd5690
[63123.008359] RDX: 0000000000000000 RSI: ffff88007bb43500 RDI: ffff88004beaa000
[63123.008621] RBP: ffffc90006387ae8 R08: 0000000099100000 R09: ffff8800b2bd5600
[63123.008840] R10: 0000000000000004 R11: 0000000000010000 R12: ffff88007bb43500
[63123.009059] R13: 00000000fffffffb R14: ffff880036fc5180 R15: 0000000000000004
[63123.009278] FS: 0000000000000000(0000) GS:ffff8800b7000000(0000) knlGS:0000000000000000
[63123.009564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63123.009748] CR2: 0000000000000080 CR3: 00000000b0866000 CR4: 00000000000406f0
[63123.009969] Call Trace:
[63123.010085] raid_write_end_io+0x7e/0x80 [btrfs]
[63123.010251] bio_endio+0xa1/0x120
[63123.010378] generic_make_request+0x218/0x270
[63123.010921] submit_bio+0x66/0x130
[63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs]
[63123.011245] full_stripe_write+0x96/0xc0 [btrfs]
[63123.011428] raid56_parity_write+0x117/0x170 [btrfs]
[63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs]
[63123.011759] ? ___cache_free+0x1c5/0x300
[63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs]
[63123.012087] run_one_async_done+0x9c/0xc0 [btrfs]
[63123.012257] normal_work_helper+0x19e/0x300 [btrfs]
[63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs]
[63123.012656] process_one_work+0x14d/0x350
[63123.012888] worker_thread+0x4d/0x3a0
[63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20
[63123.013192] kthread+0x109/0x140
[63123.013315] ? process_scheduled_works+0x40/0x40
[63123.013472] ? kthread_stop+0x110/0x110
[63123.013610] ret_from_fork+0x25/0x30
[63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: ffffc90006387ad8
[63123.014678] CR2: 0000000000000080
[63123.016590] ---[ end trace a295ea7259c17880 ]---

This is reproducible in a loop in which a series of writes is followed by
a SCSI device delete command. The test may take up to a few minutes.

Fixes: 74d4699 ("block: replace bi_bdev with a gendisk pointer and partitions index")
[ no signed-off-by provided ]
Author: Dmitriy Gorokh <Dmitriy.Gorokh@wdc.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Apr 17, 2018

scsi: libiscsi: Allow sd_shutdown on bad transport
[ Upstream commit d754941 ]

If, for any reason, userland shuts down iscsi transport interfaces
before proper logouts - like when logging in to LUNs manually, without
logging out on server shutdown, or when automated scripts can't
umount/logout from logged LUNs - kernel will hang forever on its
sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
still existent paths.

PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c

This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, would tell the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session state is
back to logged in state. Unfortunately, during server shutdown, this
might never happen again.

Another option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session state to be logged in again.

The best option in such a case is to tell upper layers that the command was
handled during the transport layer error handler helper, marking it as
DID_NO_CONNECT, which will allow completion and inform about the
problem.

After the session was marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.
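
The decision described above can be sketched as a small userspace model. Every type and function name here (`session_state`, `timeout_action`, `on_cmd_timeout`, `struct cmd`) is a hypothetical stand-in, and the logged-in path is deliberately not modelled. The only values taken from the SCSI midlayer are `DID_NO_CONNECT` (0x01) and its position in the host byte (bits 16 to 23) of the result word.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
enum session_state  { SESSION_LOGGED_IN, SESSION_FAILED };
enum timeout_action { TIMEOUT_OTHER, TIMEOUT_HANDLED };

#define DID_NO_CONNECT 0x01u  /* host byte code for "could not connect" */

struct cmd {
    uint32_t result;  /* host byte lives in bits 16..23 */
};

/* When the session is not logged in, complete the command with
 * DID_NO_CONNECT and report it as handled instead of re-arming the
 * timer. The logged-in path is not modelled here. */
static enum timeout_action on_cmd_timeout(enum session_state state,
                                          struct cmd *c)
{
    if (state != SESSION_LOGGED_IN) {
        c->result = DID_NO_CONNECT << 16;
        return TIMEOUT_HANDLED;
    }
    return TIMEOUT_OTHER;
}
```

Reporting the command as handled lets the waiter in the backtrace complete with an error rather than waiting on a timer that keeps being reset.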

Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue May 7, 2018

RDS: IB: Fix null pointer issue
[ Upstream commit 2c0aa08 ]

Scenario:
1. A port goes down and failover occurs
2. The application calls the rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX: ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI: ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8   R9: 0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12: ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_handler is null */
                                      id_priv->id.event_handler
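
The race in the diagram comes down to a callback pointer that is NULL when another thread invokes it, which matches the `RIP: 0000000000000000` in the first backtrace. The sketch below models that in userspace with hypothetical names (`conn_id`, `create_id`, `deliver_event`, `record_event`). It shows two generic ways to avoid the jump to address 0: register a real handler at creation time, or check the pointer before calling. The text above does not say which approach the patch takes.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a connection id and its callback. */
struct conn_id;
typedef int (*event_handler_t)(struct conn_id *id, int event);

struct conn_id {
    event_handler_t event_handler;
    int last_event;
};

/* A minimal handler, so the pointer need not be NULL. */
static int record_event(struct conn_id *id, int event)
{
    id->last_event = event;
    return 0;
}

static void create_id(struct conn_id *id, event_handler_t handler)
{
    id->event_handler = handler;
    id->last_event = -1;
}

/* Models the worker that finds the id on a shared list and calls its
 * handler. Returns -1 rather than calling through a NULL pointer. */
static int deliver_event(struct conn_id *id, int event)
{
    if (id->event_handler == NULL)
        return -1;
    return id->event_handler(id, event);
}
```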

Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

anholt pushed a commit that referenced this issue Jun 22, 2018

mm/ksm.c: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm()
On our armv8a server (QDF2400), I noticed many WARN_ONs caused by
rmap_item->address not being PAGE_SIZE aligned under memory pressure
tests (start 20 guests and run memhog in the host).

  WARNING: CPU: 4 PID: 4641 at virt/kvm/arm/mmu.c:1826 kvm_age_hva_handler+0xc0/0xc8
  CPU: 4 PID: 4641 Comm: memhog Tainted: G        W 4.17.0-rc3+ #8
  Call trace:
   kvm_age_hva_handler+0xc0/0xc8
   handle_hva_to_gpa+0xa8/0xe0
   kvm_age_hva+0x4c/0xe8
   kvm_mmu_notifier_clear_flush_young+0x54/0x98
   __mmu_notifier_clear_flush_young+0x6c/0xa0
   page_referenced_one+0x154/0x1d8
   rmap_walk_ksm+0x12c/0x1d0
   rmap_walk+0x94/0xa0
   page_referenced+0x194/0x1b0
   shrink_page_list+0x674/0xc28
   shrink_inactive_list+0x26c/0x5b8
   shrink_node_memcg+0x35c/0x620
   shrink_node+0x100/0x430
   do_try_to_free_pages+0xe0/0x3a8
   try_to_free_pages+0xe4/0x230
   __alloc_pages_nodemask+0x564/0xdc0
   alloc_pages_vma+0x90/0x228
   do_anonymous_page+0xc8/0x4d0
   __handle_mm_fault+0x4a0/0x508
   handle_mm_fault+0xf8/0x1b0
   do_page_fault+0x218/0x4b8
   do_translation_fault+0x90/0xa0
   do_mem_abort+0x68/0xf0
   el0_da+0x24/0x28

In rmap_walk_ksm, the rmap_item->address might still have the
STABLE_FLAG, then the start and end in handle_hva_to_gpa might not be
PAGE_SIZE aligned.  Thus it will cause exceptions in handle_hva_to_gpa
on arm64.

This patch fixes it by ignoring (not removing) the low bits of address
when doing rmap_walk_ksm.
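
A minimal userspace sketch of "ignoring, not removing" the low bits follows. The page size (4 KiB), the flag value (0x200), and the helper name `rmap_walk_addr` are assumptions for illustration; this is not the patch itself.

```c
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages and one metadata flag
 * kept in the low bits of the stored address. */
#define PAGE_SIZE   UINT64_C(4096)
#define PAGE_MASK   (~(PAGE_SIZE - 1))
#define STABLE_FLAG UINT64_C(0x200)

/* Hypothetical helper: the address to pass on to the rmap walk.
 * The stored value keeps its flag bits; only the returned copy is
 * masked down to a page boundary. */
static uint64_t rmap_walk_addr(uint64_t stored_address)
{
    return stored_address & PAGE_MASK;
}
```

Because the mask is applied to a copy, the flag remains available to any other code that reads the stored address.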

IMO, it should be backported to the stable tree.  The storm of WARN_ONs is
very easy for me to reproduce.  Beyond that, I also saw a panic (not
reproducible) as follows:

  page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0
  flags: 0x1fffc00000000000()
  raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000
  raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000
  page dumped because: nonzero _refcount
  CPU: 29 PID: 18323 Comm: qemu-kvm Tainted: G W 4.14.15-5.hxt.aarch64 #1
  Hardware name: <snip for confidential issues>
  Call trace:
    dump_backtrace+0x0/0x22c
    show_stack+0x24/0x2c
    dump_stack+0x8c/0xb0
    bad_page+0xf4/0x154
    free_pages_check_bad+0x90/0x9c
    free_pcppages_bulk+0x464/0x518
    free_hot_cold_page+0x22c/0x300
    __put_page+0x54/0x60
    unmap_stage2_range+0x170/0x2b4
    kvm_unmap_hva_handler+0x30/0x40
    handle_hva_to_gpa+0xb0/0xec
    kvm_unmap_hva_range+0x5c/0xd0

I even injected a fault on purpose in kvm_unmap_hva_range by setting
size=size-0x200, and the call trace is similar to the one above.  So I think
the panic has the same root cause as the WARN_ON.

Andrea said:

: It looks a straightforward safe fix, on x86 hva_to_gfn_memslot would
: zap those bits and hide the misalignment caused by the low metadata
: bits being erroneously left set in the address, but the arm code
: notices when that's the last page in the memslot and the hva_end is
: getting aligned and the size is below one page.
:
: I think the problem triggers in the addr += PAGE_SIZE of
: unmap_stage2_ptes that never matches end because end is aligned but
: addr is not.
:
: 	} while (pte++, addr += PAGE_SIZE, addr != end);
:
: x86 again only works on hva_start/hva_end after converting it to
: gfn_start/end and that being in pfn units the bits are zapped before
: they risk to cause trouble.
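
The termination problem Andrea describes can be reproduced with a small userspace model of the quoted loop condition. The function name `count_pages` and the 4 KiB page size are assumptions for illustration. Unlike the quoted loop, the model stops and reports -1 once `addr` has stepped past `end`, so it can be run safely.

```c
#include <stdint.h>

#define PAGE_SIZE UINT64_C(4096)  /* assumed page size */

/* Hypothetical model of a walk of the form
 *   do { ... } while (addr += PAGE_SIZE, addr != end);
 * Returns the number of pages visited, or -1 if `addr` stepped past
 * `end` without ever being equal to it, a point at which the quoted
 * loop would not stop. */
static long count_pages(uint64_t addr, uint64_t end)
{
    long visited = 0;

    do {
        visited++;
        addr += PAGE_SIZE;
        if (addr > end)
            return -1;  /* overshoot: the equality test was skipped over */
    } while (addr != end);

    return visited;
}
```

With an aligned start such as `count_pages(0x1000, 0x3000)` the walk visits two pages and stops. With a start that still carries low flag bits, such as `0x1200`, `addr` never equals the aligned `end` and the model returns -1.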

Jia He said:

: I've tested this myself on an arm64 server (QDF2400, 46 cpus, 96G mem).  Without
: this patch, the WARN_ON is very easy to reproduce.  After this patch, I
: have run the same benchmark for a whole day without any WARN_ONs

Link: http://lkml.kernel.org/r/1525403506-6750-1-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Jia He <hejianet@gmail.com>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>