Linux 4.14 breakage tracking issue #31640

Closed
12 of 13 tasks
orivej opened this issue Nov 14, 2017 · 36 comments

@orivej
Contributor

orivej commented Nov 14, 2017

Here is the list of packages that broke after the default kernel was switched to 4.14 (bfe9c92); the switch was reverted to 4.9 in eb85eb5. The two legacy packages can probably be restricted to older kernels, and the others should be fixed.

@NeQuissimus
Member

I need help in #31037, that should take care of VirtualBox

@NeQuissimus
Member

I have been trying to get amdgpu-pro fixed for a little while now, but I think it requires an update from AMD's side for kernel 4.14.x.

Or maybe it's just that I don't know make too well. The build errors out like so:

  CC [M]  /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkfd/kfd_crat.o
  CC [M]  /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkfd/kfd_rdma.o
  CC [M]  /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkfd/kfd_peerdirect.o
  CC [M]  /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkfd/kfd_ipc.o
  CC [M]  /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkfd/kfd_debugfs.o
  LD [M]  /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkfd/amdkfd.o
make[2]: *** [/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/Makefile:1503: _module_/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261] Error 2
make[1]: *** [Makefile:146: sub-make] Error 2
make: *** [Makefile:24: __sub-make] Error 2
make: Leaving directory '/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/build'
note: keeping build directory ‘/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-1’
builder for ‘/nix/store/z5q0c0q0gqd02skgxh94wgiccpx3jsc8-amdgpu-pro-17.40-4.14.drv’ failed with exit code 2
error: build of ‘/nix/store/z5q0c0q0gqd02skgxh94wgiccpx3jsc8-amdgpu-pro-17.40-4.14.drv’ failed

@orivej
Contributor Author

orivej commented Nov 14, 2017

Re. make: what you have posted is Error 2, a secondary error. The actual cause of the failure is an Error 1 somewhere above it in the log.

@MP2E

MP2E commented Nov 14, 2017

Working on mwprocapture now. It compiles on my local tree but doesn't work yet, as vfs_write and vfs_read are no longer exported.
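
For readers unfamiliar with this kind of breakage, here is a minimal userspace sketch of a version-gated shim. It assumes, to the best of my understanding of the 4.14 headers, that kernel_write() takes the file position by pointer from 4.14 on, while older kernels took it by value. Everything named mock_*, compat_write and demo_* is hypothetical and exists only so the sketch compiles outside the kernel; it is not taken from the mwprocapture patch or from the ZFS change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro of this era,
 * redefined here so the sketch builds in userspace. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: a real module gets LINUX_VERSION_CODE from the kernel
 * headers it is built against; here it defaults to 4.14.0. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 14, 0)
#endif

/* Hypothetical stand-ins for the kernel's loff_t and struct file. */
typedef long long mock_loff_t;
struct mock_file { char data[64]; };

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
/* Newer call shape: position passed by pointer and advanced by the callee. */
long mock_kernel_write(struct mock_file *f, const void *buf,
                       size_t count, mock_loff_t *pos)
{
    if (*pos < 0 || (size_t)*pos + count > sizeof(f->data))
        return -1;
    memcpy(f->data + *pos, buf, count);
    *pos += (mock_loff_t)count;
    return (long)count;
}
#else
/* Older call shape: position passed by value, so the caller tracks it. */
long mock_kernel_write(struct mock_file *f, const char *buf,
                       size_t count, mock_loff_t pos)
{
    if (pos < 0 || (size_t)pos + count > sizeof(f->data))
        return -1;
    memcpy(f->data + pos, buf, count);
    return (long)count;
}
#endif

/* Compat wrapper: the driver always uses this one call shape. */
long compat_write(struct mock_file *f, const void *buf,
                  size_t count, mock_loff_t *pos)
{
    assert(f != NULL && buf != NULL && pos != NULL);
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
    return mock_kernel_write(f, buf, count, pos);
#else
    long n = mock_kernel_write(f, (const char *)buf, count, *pos);
    if (n > 0)
        *pos += n; /* advance the position ourselves on older kernels */
    return n;
#endif
}

struct mock_file demo_file;

/* Writes "ab" then "cd" through the wrapper; returns the final position. */
long demo_write_twice(void)
{
    mock_loff_t pos = 0;
    compat_write(&demo_file, "ab", 2, &pos);
    compat_write(&demo_file, "cd", 2, &pos);
    return (long)pos;
}
```

The point of the wrapper is that the rest of the driver never sees the version check. Whether this is the right approach for a given module depends on what it actually did with vfs_read and vfs_write.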

Also, with PR #31148 nvidia_x11 builds and works successfully with kernel 4.14.0, on my end

@NeQuissimus
Member

NeQuissimus commented Nov 14, 2017

@orivej Ah, I had to scroll WAY up :)
It looks bad (to the untrained eye :D)...

/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:271:29: error: incompatible type for argument 7 of 'drm_universal_plane_init'
      formats, format_count, type, name);
                             ^~~~
In file included from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_crtc.h:45:0,
                 from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drmP.h:69,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:6,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:
/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_plane.h:548:5: note: expected 'const uint64_t * {aka const long long unsigned int *}' but argument is of type 'enum drm_plane_type'
 int drm_universal_plane_init(struct drm_device *dev,
     ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:0:
/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:271:35: error: incompatible type for argument 8 of 'drm_universal_plane_init'
      formats, format_count, type, name);
                                   ^~~~
In file included from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_crtc.h:45:0,
                 from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drmP.h:69,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:6,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:
/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_plane.h:548:5: note: expected 'enum drm_plane_type' but argument is of type 'const char *'
 int drm_universal_plane_init(struct drm_device *dev,
     ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:0:
/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:270:10: error: too few arguments to function 'drm_universal_plane_init'
   return drm_universal_plane_init(dev, plane, possible_crtcs, funcs,
          ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_crtc.h:45:0,
                 from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drmP.h:69,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:6,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:
/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_plane.h:548:5: note: declared here
 int drm_universal_plane_init(struct drm_device *dev,
     ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:0:
/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h: In function 'kcl_drm_calc_vbltimestamp_from_scanoutpos':
/tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:332:9: error: too many arguments to function 'drm_calc_vbltimestamp_from_scanoutpos'
  return drm_calc_vbltimestamp_from_scanoutpos(dev, pipe, max_error, vblank_time,
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drmP.h:83:0,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/include/kcl/kcl_drm.h:6,
                 from /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.c:1:
/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/include/drm/drm_vblank.h:173:6: note: declared here
 bool drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev,
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[4]: *** [/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/scripts/Makefile.build:314: /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl/kcl_drm.o] Error 1
make[3]: *** [/nix/store/prk1nwg5s1619jbrpp67v5jr3xpv7347-linux-4.14-dev/lib/modules/4.14.0/source/scripts/Makefile.build:573: /tmp/nix-build-amdgpu-pro-17.40-4.14.drv-0/amdgpu-pro-17.40-492261/root/usr/src/amdgpu-17.40-492261/amd/amdkcl] Error 2

All these "too many arguments" and "too few arguments" errors sound like an API mismatch to me.
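
To illustrate what such a mismatch looks like and how out-of-tree drivers usually absorb it, here is a userspace sketch. The log above says that argument 7 is now expected to be a `const uint64_t *` and argument 8 an `enum drm_plane_type`, which suggests a new pointer parameter was inserted before `type`. The sketch mocks that shape; the mock_* and demo_* names, compat_plane_init, and the choice to pass NULL for the new argument are all assumptions made for illustration, not AMD's code or the real DRM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro of this era,
 * redefined here so the sketch builds in userspace. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: a real module gets LINUX_VERSION_CODE from the kernel
 * headers it is built against; here it defaults to 4.14.0. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 14, 0)
#endif

/* Hypothetical stand-in for enum drm_plane_type. */
enum mock_plane_type { MOCK_PLANE_OVERLAY, MOCK_PLANE_PRIMARY, MOCK_PLANE_CURSOR };

/* Records what the mock last received, so the wrapper can be checked. */
enum mock_plane_type last_type;

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
/* Newer shape: an extra const uint64_t * sits before type. */
int mock_plane_init(const uint32_t *formats, unsigned int format_count,
                    const uint64_t *format_modifiers,
                    enum mock_plane_type type, const char *name)
{
    (void)format_modifiers;
    (void)name;
    if (formats == NULL || format_count == 0)
        return -1;
    last_type = type;
    return 0;
}
#else
/* Older shape: type directly follows format_count. */
int mock_plane_init(const uint32_t *formats, unsigned int format_count,
                    enum mock_plane_type type, const char *name)
{
    (void)name;
    if (formats == NULL || format_count == 0)
        return -1;
    last_type = type;
    return 0;
}
#endif

/* Compat wrapper: keeps the old call shape for the rest of the driver. */
int compat_plane_init(const uint32_t *formats, unsigned int format_count,
                      enum mock_plane_type type, const char *name)
{
    assert(name != NULL);
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
    /* Assumption of this sketch: NULL is acceptable for the new argument. */
    return mock_plane_init(formats, format_count, NULL, type, name);
#else
    return mock_plane_init(formats, format_count, type, name);
#endif
}

/* Example call through the wrapper with two placeholder format codes. */
int demo_init_primary(void)
{
    static const uint32_t formats[] = { 1u, 2u };
    return compat_plane_init(formats, 2, MOCK_PLANE_PRIMARY, "primary");
}
```

Without the version guard, the old four-argument tail (`formats, format_count, type, name`) lands in the wrong parameter slots, which is consistent with the "incompatible type" and "too few arguments" diagnostics in the log.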

@NeQuissimus
Member

NeQuissimus commented Nov 14, 2017

Maybe we should just undo declaring 4.14 the new default for the moment, so we unblock the channel?!

It's pretty late where I am, somebody please do so? :) thx!

@MP2E

MP2E commented Nov 14, 2017

Fixed mwprocapture as of ce8dea6 !

(Linking fully works, and I recorded video/did a quick test stream)

@NeQuissimus no problem, I will set the attribute back to version 4.9 and it can be set back when the dust settles a bit!

@orivej
Contributor Author

orivej commented Nov 14, 2017

@MP2E This may or may not be quite correct, see what ZFS developers did: https://github.com/zfsonlinux/spl/pull/666/files#diff-fc21f15a0f248a8ac8af4d3a06d86a89

MP2E pushed a commit that referenced this issue Nov 14, 2017
partially reverts bfe9c92 by author's request, since many modules
are currently broken. 4.14 will be the default kernel when the dust
settles

track github issue #31640
@orivej
Contributor Author

orivej commented Nov 14, 2017

@NeQuissimus I think so, let's fix this issue and retry.

@MP2E

MP2E commented Nov 14, 2017

Thanks for the pointer. After looking through the source a lot more carefully, it seems that it works as-is, but I noticed that one of the types was wrong, so I fixed it and updated the patch.

@fpletz fpletz added this to the 18.03 milestone Nov 14, 2017
@NeQuissimus
Member

Virtualbox will work with 4.14 once 5.2.1 is out: https://www.virtualbox.org/ticket/17080

@Baughn
Contributor

Baughn commented Nov 15, 2017

Worth noting that Linux 4.9 doesn't work properly with Threadripper CPUs, specifically throwing a lot of bus errors due to a problematic ASPM implementation.

4.13.11 seems to be fine. Maybe we could use that?

@adisbladis
Member

zfsUnstable resolved in #31729

@vcunat
Member

vcunat commented Nov 16, 2017

@Baughn: 4.13 is going away very soon. On the other hand, GKH will be maintaining 4.14 for years.

@Baughn
Contributor

Baughn commented Nov 23, 2017

@vcunat Yeah, I didn't mean permanently. :)

For the moment I've pinned my workstation to 4.13, but I'll be tracking the latest kernel for a while anyway; there are hardware features that won't be supported until at least 4.15, such as the CPU temperature sensors.

I realize my use case isn't common. Luckily it's a single-line fix for me. :P

@aszlig
Member

aszlig commented Nov 24, 2017

VirtualBox done in e5c24ab.

aszlig added a commit that referenced this issue Nov 24, 2017
Upstream changes without issue IDs:

 * User interface: various improvements for high resolution screens
 * User interface: added functionality to duplicate optical and floppy
                   images
 * User interface: various improvements for the virtual media manager
 * VMM: fixed emulation so that Plan 9 guests can start once more (5.1.0
        regression)
 * Storage: fixed regression breaking iSCSI
 * Audio: added HDA support for more exotic guests (e.g. Haiku)
 * Serial: fixed hanging I/O when using named pipes on Windows (5.2.0
           regression)
 * Serial: fixed broken communication with certain devices on Linux
           hosts
 * USB/OHCI: improved behavior so that the controller state after a VM
             reset is closer to the initial state after VM start
 * EFI: fixed HFS+ driver which in rare cases failed to access most
        files on a volume
 * Shared clipboard: fixed hang with OS X host and Linux guest
 * Linux hosts: fixed kernel module compilation and start failures with
                Linux kernel 4.14
 * X11 hosts: better handle WM_CLASS setting
 * Linux guests: fixed kernel module compilation and other problems with
                 Linux kernel 4.14
 * Linux guests: fixed various 5.2.0 regressions
 * Bridged networking: fixed duplicate EtherType in VLAN/priority tags
                       on Linux (5.2.0 regression)

The full changelog including issue IDs can be found at:

https://www.virtualbox.org/wiki/Changelog

Aside from just bumping the version number I also had to strip 3 levels
of the paths included in the guest-additions patches, because the
version was hardcoded in there and the patches still apply as-is.

I've re-added the stripped path using patchFlags and the -d option of
the patch utility.

Tested this by running all of the tests in the "virtualbox" NixOS VM
test module, here is the URL to the finished evaluation on my Hydra:

https://headcounter.org/hydra/eval/380191

Signed-off-by: aszlig <aszlig@nix.build>
Cc: @NeQuissimus, @orivej, @etu, @vcunat
Issue: #31640
Issue: #31037
adisbladis added a commit that referenced this issue Dec 4, 2017
This reverts commit b39ab30.

There are some show stopper issues in the 4.14 kernel that are still
not resolved.

#31640
vcunat added a commit that referenced this issue Dec 5, 2017
vcunat added a commit that referenced this issue Dec 5, 2017
vcunat pushed a commit that referenced this issue Dec 6, 2017
This reverts commit b39ab30.

There are some show stopper issues in the 4.14 kernel that are still
not resolved.

#31640
(cherry picked from commit 74857c9)
andir added a commit to andir/nixpkgs that referenced this issue Feb 8, 2018
This change is done for completeness in regards to [1] since we already
support a newer "stable" version.

nvidia_x11_beta now compiles on both 4.14 and 4.15.

[1] NixOS#31640
vdemeester pushed a commit to vdemeester/nixpkgs that referenced this issue Feb 8, 2018
This change is done for completeness in regards to [1] since we already
support a newer "stable" version.

nvidia_x11_beta now compiles on both 4.14 and 4.15.

[1] NixOS#31640
@vcunat
Member

vcunat commented Feb 13, 2018

Do you think anything is blocking update of default to 4.14 (now)? For amdgpu-pro we can add an assertion advising users to switch to 4.9 manually. Nixpkgs 18.03 is to be forked in about two weeks.

@NeQuissimus
Member

I've been running 4.14.x and 4.15.x for quite a while now and have had no issues (albeit with an Intel GPU).

@vcunat
Member

vcunat commented Feb 13, 2018

For the record, during the last two weeks I've been using 4.14 only. It was mostly on a pure Intel system (Skylake-S) without any problems, and also on Ryzen + Nvidia with just those Ryzen bugs (probably).

@fpletz
Member

fpletz commented Feb 13, 2018

We have not experienced problems with current 4.14 kernels on servers, VMs and laptops so far. As 4.14 also has long-term support, I also think we should default to that version for 18.03.

@vcunat
Member

vcunat commented Feb 22, 2018

Yes, we keep all kernels that are supported upstream, i.e. six longterm branches now, of which two are to be dropped soon. 4.14 is the default in master now.

@adisbladis
Member

adisbladis commented Feb 22, 2018

It should be ok to change default to 4.14 as long as 4.9 and 4.4 are still around

4.9 is supported until January 2019 and 4.4 until February 2022 so I see no reason we should drop these. :)

@fpletz fpletz removed their assignment Feb 23, 2018
@matthewbauer matthewbauer modified the milestones: 18.03, 18.09 Apr 17, 2018
@Ekleog
Member

Ekleog commented Oct 10, 2018

(triage) 4.14 is currently the default on 18.03, so I assume it has been for quite a while. If no issue has been reported… maybe this issue should be closed?

@fpletz
Member

fpletz commented Oct 11, 2018

Yeah, we can close this. Just checked the remaining issues: the amdgpu PR was merged and the Intel 3D issues were fixed upstream.

@fpletz fpletz closed this as completed Oct 11, 2018
@vcunat
Member

vcunat commented Oct 11, 2018

amdgpu-pro bump was merged. It doesn't build now with 4.14, but that's probably just a minor libelf issue. On the whole it seems like it has few users if any.

@vcunat
Member

vcunat commented Oct 11, 2018

Hmm, no, the obvious fix isn't enough:

--- a/pkgs/os-specific/linux/amdgpu-pro/default.nix
+++ b/pkgs/os-specific/linux/amdgpu-pro/default.nix
@@ -164,7 +164,7 @@ in stdenv.mkDerivation rec {
     done
   '';
 
-  buildInputs = [
+  nativeBuildInputs = kernel.moduleBuildDependencies ++ [
     patchelf
     perl
   ];

It fixes the ORC problem but then fails later on some incompatibilities. Apparently the kernel (or libdrm or something) is too new for that driver version.
