[20.09][21.03] PRIME render offload doesn't work on latest builds #98942

Closed
Alderaeney opened this issue Sep 27, 2020 · 30 comments · Fixed by #170695

Comments

@Alderaeney

Issue description

PRIME render offload stopped working after the latest update on both the 20.09 channel and the unstable channel. I've found that changing the bootloader and removing nvidia-drm.modeset=1 sometimes makes it work and sometimes doesn't, but the cause may be something else I changed.

xrandr --listproviders

Providers: number : 1
Provider 0: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 4 associated providers: 0 name:modesetting

nvidia-offload glxinfo

name of display: :1
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  152 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  39
  Current serial number in output stream:  40

nvidia-smi

No devices were found

Steps to reproduce

1.- Install 20.09 or 21.03
2.- Run nixos-rebuild boot --upgrade
3.- Reboot

Technical details

configuration.nix

{ config, pkgs, ... }:

let
  nvidia-offload = pkgs.writeShellScriptBin "nvidia-offload" ''
    export __NV_PRIME_RENDER_OFFLOAD=1
    export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
    export __GLX_VENDOR_LIBRARY_NAME=nvidia
    export __VK_LAYER_NV_optimus=NVIDIA_only
    exec -a "$0" "$@"
  '';
in
{
. . .
  # Xserver configuration
  services.xserver = {
    enable = true;

    # Xserver keyboard configuration
    layout = "es";
    xkbOptions = "eurosign:e";

    # Use libinput for trackpad support
    libinput.enable = true;

    # Wacom tablet support
    wacom.enable = true;

    # Use nvidia drivers
    videoDrivers = [ "nvidia" ];

    # Gnome3 desktop configuration
    displayManager = {
      gdm = {
        enable = true;
        wayland = false;
      };
    };
    desktopManager.gnome3.enable = true;
  };

  hardware.nvidia = {
    powerManagement.enable = true;
    prime = {
      offload.enable = true;
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:1:0:0";
    };
  };

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

  • system: "x86_64-linux"
  • host os: Linux 5.8.10, NixOS, 21.03pre244416.daaa0e33505 (Okapi)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.3.7
  • channels(root): "nixos-21.03pre244416.daaa0e33505"
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos
@eadwu
Member

eadwu commented Sep 27, 2020

journalctl output?

GPU?

I'm not really sure how to go about making it more resilient. I personally have a 1050 Ti and have never had any problems (though I never use stable since I follow master, so that might be where the issues come from).

It looks like you're missing NVIDIA-G0 from your providers, so at first glance this might be an issue with glamor?

@Alderaeney
Author

Sorry, I have a 1050 4GB. I've been using this same configuration.nix for some months and it never failed to work, so I'm assuming this is an issue with a recent update. Here are the journalctl and journalctl -xe outputs:
https://gist.github.com/Alderaeney/e52282c609f8d1d0cc5f88e38bc07336
https://gist.github.com/Alderaeney/d31759eb6588672306a22988733a2d88

@kevincox
Contributor

kevincox commented Oct 8, 2020

I see the following in the log:

sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (II) NVIDIA GLX Module  450.66  Wed Aug 12 19:41:37 UTC 2020
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (II) NVIDIA: The X server supports PRIME Render Offload.
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0):     README for additional information.
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(G0): Failing initialization of X screen

I also see that if I enable modesetting, but I have the same issue if I don't. It seems like the GPU is available and online but GLX is not set up. For example, nvidia-smi can see the GPU:

% sudo nvidia-smi
Wed Oct  7 20:04:40 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    On   | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P8    N/A /  N/A |      3MiB /  2004MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1039      G   ...-xorg-server-1.20.8/bin/X        3MiB |
+-----------------------------------------------------------------------------+

But if I pass __GLX_VENDOR_LIBRARY_NAME=nvidia I get:

% __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo
name of display: :0
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  156 (NV-GLX)
  Minor opcode of failed request:  6 ()
  Value in failed request:  0x0
  Serial number of failed request:  86
  Current serial number in output stream:  86

@bryanasdev000
Member

@kevincox just got it working at #90152 (comment).

@lightdiscord
Member

lightdiscord commented Oct 20, 2020

I had the same issue with gdm; switching to lightdm fixed it. (I'm currently looking into how to fix it for gdm.)
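
For anyone who wants to try the same workaround, a minimal sketch of the display manager switch might look like the fragment below. It assumes the services.xserver.displayManager.* option names already used elsewhere in this thread, and it has not been verified against this particular bug; reports from other commenters are mixed.

```nix
{ ... }:
{
  # Sketch only: swap GDM for LightDM and keep the rest of the
  # configuration (videoDrivers, hardware.nvidia.prime, ...) unchanged.
  services.xserver.displayManager.gdm.enable = false;
  services.xserver.displayManager.lightdm.enable = true;
}
```

After a nixos-rebuild and a reboot, nvidia-offload glxinfo is a quick way to check whether the NVIDIA GPU is now picked up.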

@kevincox
Contributor

Weird, it works fine for me with gdm.

@Alderaeney
Author

I've tested the solution from #90152 and it doesn't work for me. The issue I'm having is that gdm fails to initialize the GPU and consequently can't configure PRIME render offload. I'm pretty sure this was caused by an update to something (quite probably gdm), because it worked fine with the same configuration in the days before.

This is basically the root of the issue:

sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (II) NVIDIA GLX Module  450.66  Wed Aug 12 19:41:37 UTC 2020
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (II) NVIDIA: The X server supports PRIME Render Offload.
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0):     README for additional information.
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
sep 27 21:19:51 link-gl63-8rc /nix/store/2ngdc7ln6zsg5bajxqi4fjiwv8fa0cf7-gdm-3.34.1/libexec/gdm-x-session[1864]: (EE) NVIDIA(G0): Failing initialization of X screen

@bryanasdev000
Member

@Alderaeney this may be relevant:

https://forums.developer.nvidia.com/t/nvidia-driver-not-yet-supported-for-linux-kernel-5-9/157263

I am without my notebook for now, but as soon as I can I will try to test PRIME on unstable.

@ishan9299

@bryanasdev000 @Alderaeney after checking the logs: there is some information about this error in the NVIDIA docs.
http://download.nvidia.com/XFree86/Linux-x86_64/440.31/README/commonproblems.html
Will try some stuff and give an update.

@ishan9299

@bryanasdev000 I had to enable nvidiaPersistenced to get it working.

❯ nvidia-smi
Sat Mar 13 10:24:04 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce 930MX       On   | 00000000:01:00.0 Off |                  N/A |
| N/A   59C    P8    N/A /  N/A |      3MiB /  2004MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2097      G   ...xorg-server-1.20.10/bin/X        2MiB |
+-----------------------------------------------------------------------------+

I am also using gdm currently.
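
For reference, a minimal sketch of what enabling this might look like in configuration.nix. It assumes the hardware.nvidia.nvidiaPersistenced option of the NixOS NVIDIA module, which runs nvidia-persistenced so the GPU stays initialized; whether it helps on other hardware is untested here.

```nix
{ ... }:
{
  # Sketch only: keep the NVIDIA GPU initialized via nvidia-persistenced.
  # The existing hardware.nvidia.prime settings stay as they are.
  hardware.nvidia.nvidiaPersistenced = true;
}
```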

@bryanasdev000
Member

Awesome! I didn't have a lot of time to work on it these days but I will test it as soon as possible on my laptop.

Interestingly, having to force the GPU to stay on means that at some point GDM is failing to discover/activate the GPU...

@jcdickinson
Contributor

Here's the curious thing. With wayland (not x11) everything works, except the NVIDIA settings app, with GDM. Things do not work with lightdm. nvidia-offload glxgears displays the gears, nvidia-offload glxinfo lists the NVIDIA GPU (while plain glxinfo lists the other GPU). Given NVIDIA shenanigans, this will probably change.

I have an AMD (Renoir) and NVIDIA laptop.

My config:

 { config, pkgs, ... }:

let
  nvidia-offload = pkgs.writeShellScriptBin "nvidia-offload" ''
    export __NV_PRIME_RENDER_OFFLOAD=1
    export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
    export __GLX_VENDOR_LIBRARY_NAME=nvidia
    export __VK_LAYER_NV_optimus=NVIDIA_only
    exec -a "$0" "$@"
  '';
in
{
  system.autoUpgrade.channel = "https://nixos.org/channels/nixos-21.05/";
  nixpkgs.config.allowUnfree = true;
  
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Use the systemd-boot EFI boot loader.
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  networking.hostName = "jono-laptop"; # Define your hostname.
  # networking.wireless.enable = true;  # Enables wireless support via wpa_supplicant.

  # Set your time zone.
  time.timeZone = "America/New_York";

  # The global useDHCP flag is deprecated, therefore explicitly set to false here.
  # Per-interface useDHCP will be mandatory in the future, so this generated config
  # replicates the default behaviour.
  networking.useDHCP = false;
  networking.interfaces.eno1.useDHCP = true;
  networking.interfaces.wlp4s0.useDHCP = true;

  # Configure network proxy if necessary
  # networking.proxy.default = "http://user:password@proxy:port/";
  # networking.proxy.noProxy = "127.0.0.1,localhost,internal.domain";

  # Select internationalisation properties.
  # i18n.defaultLocale = "en_US.UTF-8";
  # console = {
  #   font = "Lat2-Terminus16";
  #   keyMap = "us";
  # };

  # Enable the X11 windowing system.
  services.xserver.enable = true;
  services.xserver.displayManager.gdm.enable = true;
  services.xserver.displayManager.gdm.wayland = true;
  services.xserver.displayManager.gdm.nvidiaWayland = true;
  services.xserver.desktopManager.gnome.enable = true;
  services.xserver.videoDrivers = [ "modeset" "nvidia" ];
  hardware.nvidia = {
    powerManagement.enable = true;
    modesetting.enable = true;
    prime = {
      offload.enable = true;
      amdgpuBusId = "PCI:5:0:0";
      nvidiaBusId = "PCI:1:0:0";
    };
  };
  

  # Configure keymap in X11
  services.xserver.layout = "us";
  # services.xserver.xkbOptions = "eurosign:e";

  # Enable CUPS to print documents.
  # services.printing.enable = true;

  # Enable sound.
  sound.enable = true;
  hardware.pulseaudio.enable = true;

  # Enable touchpad support (enabled default in most desktopManager).
  services.xserver.libinput.enable = true;

  # Define a user account. Don't forget to set a password with ‘passwd’.
  users.users.jono = {
     isNormalUser = true;
     extraGroups = [
      "video"
      "networkmanager"
      "wheel"
     ]; # Enable ‘sudo’ for the user.
  };
  
  environment.systemPackages = with pkgs;[
    nvidia-offload
  ];
  services.udev.packages = with pkgs; [ gnome3.gnome-settings-daemon ];

  # Some programs need SUID wrappers, can be configured further or are
  # started in user sessions.
  # programs.mtr.enable = true;
  programs.gnupg.agent = {
     enable = true;
     enableSSHSupport = true;
  };

  # List services that you want to enable:

  # Enable the OpenSSH daemon.
  services.openssh.enable = true;

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  # networking.firewall.enable = false;

  # This value determines the NixOS release from which the default
  # settings for stateful data, like file locations and database versions
  # on your system were taken. It‘s perfectly fine and recommended to leave
  # this value at the release version of the first install of this system.
  # Before changing this value read the documentation for this option
  # (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
  system.stateVersion = "21.05"; # Did you read the comment?

}

@bryanasdev000
Member

Curious, I got access to my PRIME laptop again and did a quick test with GDM and LightDM testing only X11.

In GDM I have the following error:

[image: screenshot of the error, not preserved]

I was using gdm only with services.xserver.displayManager.gdm.enable = true;.

In LightDM it works perfectly.

I did these tests in the same round without rebooting.

I did this to see if other DMs had the error: gkr-pam: unable to locate daemon control file.

I'm using AwesomeWM (X11 Only) with NixOS Hardware settings for NVIDIA (https://github.com/NixOS/nixos-hardware/blob/master/common/gpu/nvidia.nix).

@cidkidnix
Contributor

Same for me: my AMD (Renoir) laptop with an NVIDIA GPU works fine on Wayland with roughly the same config.

@bryanasdev000
Member

Somewhat related #139354

@vasishath

I don't know if it's related, but I have faced this issue on my laptop as well. The reason was that the X server started up way too fast before the nvidia GPU was even ready. I fixed it by following this article on ArchWiki which adds a systemd service which makes the login manager wait for nvidia GPU to be ready.
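
A rough NixOS sketch of that idea, under stated assumptions: the unit name wait-for-nvidia, the polled path /dev/nvidiactl, and the 15 second budget are illustrative choices, not taken from the ArchWiki article, and the fragment is untested on the hardware discussed in this thread.

```nix
{ pkgs, ... }:
let
  # Hypothetical helper: poll for the NVIDIA control device node.
  waitForNvidia = pkgs.writeShellScript "wait-for-nvidia" ''
    # Up to 30 tries, 0.5 s apart (about 15 seconds in total).
    for _ in {1..30}; do
      if [ -e /dev/nvidiactl ]; then
        exit 0
      fi
      ${pkgs.coreutils}/bin/sleep 0.5
    done
    # Do not block the login screen forever if the node never appears.
    echo "wait-for-nvidia: /dev/nvidiactl did not appear, continuing" >&2
    exit 0
  '';
in
{
  systemd.services.wait-for-nvidia = {
    description = "Wait for the NVIDIA device node before the display manager starts";
    # Pulled in by, and ordered before, the display manager unit.
    wantedBy = [ "display-manager.service" ];
    before = [ "display-manager.service" ];
    serviceConfig = {
      Type = "oneshot";
      ExecStart = "${waitForNvidia}";
    };
  };
}
```

The script always exits 0, so a missing GPU delays the display manager by the polling budget at most instead of preventing login.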

@bryanasdev000
Member

Interesting... It may be necessary to add in the NVIDIA module as well.

To confirm:

  • Are you using GDM?
  • Did you add udev rules manually too?

@vasishath

No, I am using SDDM. It worked fine on GDM since it runs on Wayland, and by the time you log in to the desktop the GPU is ready anyway.

And yes I added these rules manually.

@PaulGrandperrin
Contributor

Hi, after countless configuration attempts and reboots, I have found a few things that might help understand the issue:

I have a Dell Inc. XPS 15 9560, with:

  • Mesa Intel® HD Graphics 630 (KBL GT2) , connected to the integrated and external display
  • NVIDIA GeForce GTX 1050/PCIe/SSE2, not connected to any display (muxless)

I know this hardware can work well with primus because it does on Ubuntu, both in offload and sync modes.

On NixOS, neither sync nor offload works for me; however, I found a few things.

With this conf:

services.xserver.enable = true;
services.xserver.displayManager = {
    gdm = {
      enable = true;
      wayland = true;
   };
  defaultSession = "gnome"; # means gnome-wayland
};
services.xserver.desktopManager.gnome.enable = true;
services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia = {
    prime = {
      offload.enable = true;
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:1:0:0";
    };
  };

The desktop starts in X11 mode even though I explicitly asked for wayland.
Both nvidia-offload and nvidia-smi fail with errors, and the nvidia kernel module is not even loaded.
Loading the nvidia module doesn't solve the issue; the only reliable way I found to make nvidia-offload work is to run sudo nvidia-smi once.
After that, nvidia-smi runs fine without sudo, but nvidia-offload still does not work.
To make it work, you need to run systemctl restart display-manager.service.
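
One way to automate that manual sequence could be a oneshot unit that runs nvidia-smi as root before the display manager starts. This is only a sketch: the unit name nvidia-smi-early is hypothetical, the /run/current-system/sw/bin path assumes nvidia-smi is on the system profile (as it is when it can be run from a shell), and whether it removes the need for the restart is untested.

```nix
{ ... }:
{
  systemd.services.nvidia-smi-early = {
    description = "Run nvidia-smi once as root before the display manager starts";
    # Pulled in by, and ordered before, the display manager unit.
    wantedBy = [ "display-manager.service" ];
    before = [ "display-manager.service" ];
    serviceConfig = {
      Type = "oneshot";
      # The leading "-" tells systemd to ignore a non-zero exit status,
      # so a failing nvidia-smi does not mark the unit as failed.
      ExecStart = "-/run/current-system/sw/bin/nvidia-smi";
    };
  };
}
```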

However wayland still doesn't work... and it should!

First, wayland doesn't fail or anything, it's just that something, somewhere tells gnome-shell explicitly to run with X11.
To prove that wayland can indeed work with this conf, just launch gnome-shell directly from a tty:

dbus-run-session -- gnome-shell --display-server --wayland

You'll get a minimal session, but:

  • wayland is working! checked in gnome-settings, and with firefox-wayland in about:support
  • nvidia-offload is working too! checked with nvidia-offload glxgears -info

Now I would love to be able to understand and fix display-manager to make it stop forcing gnome-shell to run with X11.
Here is what I understand so far:

  • lightdm has the same issue
  • $XDG_SESSION_TYPE is explicitly set to "X11" somewhere, and it then disables gnome-wayland because of the ExecCondition line in /etc/systemd/user/org.gnome.Shell@wayland.service:
[Unit]
Description=GNOME Shell on Wayland
# On wayland, force a session shutdown
OnFailure=org.gnome.Shell-disable-extensions.service gnome-session-shutdown.target
OnFailureJobMode=replace-irreversibly
CollectMode=inactive-or-failed
RefuseManualStart=on
RefuseManualStop=on

After=gnome-session-manager.target

Requisite=gnome-session-initialized.target
PartOf=gnome-session-initialized.target
Before=gnome-session-initialized.target

[Service]
Slice=session.slice
Type=notify
# NOTE: This can be replaced with ConditionEnvironment=XDG_SESSION_TYPE=%I in
#       the [Unit] section with systemd >= 246. Also, the current solution is
#       kind of painful as systemd had a bug where it retries the condition.
# Only start if the template instance matches the session type.
ExecCondition=/bin/sh -c 'test "$XDG_SESSION_TYPE" = "%I" || exit 2'
ExecStart=/nix/store/xkmhcrv4gbgzikzcjmc7ca2cc8gfhn4i-gnome-shell-41.1/bin/gnome-shell
# Exit code 1 means we are probably *not* dealing with an extension failure
SuccessExitStatus=1

# unset some environment variables that were set by the shell and won't work now that the shell is gone
ExecStopPost=-/bin/sh -c 'test "$SERVICE_RESULT" != "exec-condition" && systemctl --user unset-environment GNOME_SETUP_DISPLAY WAYLAND_DISPLAY DISPLAY XAUTHORITY'

# On wayland we cannot restart
Restart=no
# Kill any stubborn child processes after this long
TimeoutStopSec=5
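
To make the effect of that ExecCondition line concrete, here is a minimal shell sketch of the same test. The function name session_condition is hypothetical, and its argument stands in for the template instance (%I). As far as I understand systemd, an ExecCondition exit code between 1 and 254 means the unit is skipped rather than marked as failed, which would explain why nothing visibly fails.

```shell
#!/usr/bin/env bash
# Sketch of the check performed by the unit's ExecCondition line:
#   /bin/sh -c 'test "$XDG_SESSION_TYPE" = "%I" || exit 2'
# "session_condition" is a hypothetical helper; $1 plays the role of %I.
session_condition() {
  local instance="$1"
  # Run in a subshell so that "exit 2" only ends the check, not the caller.
  ( test "${XDG_SESSION_TYPE:-}" = "$instance" || exit 2 )
}
```

For example, running session_condition wayland; echo $? in a session where XDG_SESSION_TYPE is x11 prints 2, so the wayland instance of the unit would be skipped.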

@PaulGrandperrin
Contributor

I'm still trying to figure out where this XDG_SESSION_TYPE is set to "x11".

  • I have searched through my nix store and found nothing interesting: rg -i "XDG_SESSION_TYPE" --binary /nix/store/
  • searched through nixpkgs and found nothing interesting too
  • searched through many packages' source code, like gnome-shell, gnome-session, gdm... and nothing

The only thing I found is this: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/x11/display-managers/set-session.py#L70

But even monkeypatching this to force wayland doesn't change anything

@PaulGrandperrin
Contributor

PaulGrandperrin commented Dec 3, 2021

My previous investigations might be incorrect because I assumed the choice between wayland and x11 was constant when no changes were made to the system.

This is apparently not true on my system, with this conf:

services.xserver.enable = true;
services.xserver.displayManager = {
    gdm = {
      enable = true;
      wayland = true;
   };
  defaultSession = "gnome"; # means gnome-wayland
};
services.xserver.desktopManager.gnome.enable = true;

I get:

  • wayland when booting, then after systemctl restart display-manager.service
  • x11, then after systemctl restart display-manager.service
  • wayland..
  • x11
  • wayland
  • x11
  • ...

So this is not random. After an

  • even number of display-manager launches, I get wayland
  • odd number of display-manager launches, I get x11

Am I the only one being cursed by this?

EDIT:
doing systemctl stop display-manager.service && sleep 2 && systemctl start display-manager.service instead of restarting solves this issue...

@PaulGrandperrin
Contributor

To get back to the first issue, here's my method to get nvidia offloading:

Start a session with a conf without nvidia, like:

services.xserver.enable = true;
services.xserver.displayManager = {
    gdm = {
      enable = true;
      wayland = true;
   };
  defaultSession = "gnome"; # means gnome-wayland
};
services.xserver.desktopManager.gnome.enable = true;

when logged in, change the conf to add the nvidia drivers, like so:

services.xserver.enable = true;
services.xserver.displayManager = {
    gdm = {
      enable = true;
      wayland = true;
   };
  defaultSession = "gnome"; # means gnome-wayland
};
services.xserver.desktopManager.gnome.enable = true;
services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia = {
    prime = {
      offload.enable = true;
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:1:0:0";
    };
  };

then run:

sudo nixos-rebuild test
sudo modprobe -r nouveau
sudo nvidia-smi
nvidia-offload glxgears -info

@PaulGrandperrin
Contributor

Hello, I understood a few more things today:

  • I straced nvidia-smi to understand what made it succeed or fail as a user, and why running it as root helped
strace -f -e trace=file nvidia-smi 2>&1 | grep nvidia0
stat("/dev/nvidia0", 0x7fff3503e2b0)    = -1 ENOENT (No such file or directory)
mknodat(AT_FDCWD, "/dev/nvidia0", S_IFCHR|0666, makedev(0xc3, 0)) = -1 EACCES (Permission denied)
stat("/dev/nvidia0", 0x7fff3503e2e0)    = -1 ENOENT (No such file or directory)
  • It appears that nvidia-smi tries to open /dev/nvidia0, and if this node is not present:
    • if not root: fail
    • if root: mknod it
sudo strace -f -e trace=file nvidia-smi 2>&1 | grep nvidia0
[sudo] password for paulg: 
stat("/dev/nvidia0", 0x7fffebdc7d00)    = -1 ENOENT (No such file or directory)
mknodat(AT_FDCWD, "/dev/nvidia0", S_IFCHR|0666, makedev(0xc3, 0)) = 0
chmod("/dev/nvidia0", 0666)             = 0
chown("/dev/nvidia0", 0, 0)             = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 4
stat("/dev/nvidia0", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 7
stat("/dev/nvidia0", {st_mode=S_IFCHR|0666, st_rdev=makedev(0xc3, 0), ...}) = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR|O_CLOEXEC) = 8

Other programs using the nvidia libs will also try to open /dev/nvidia0.

strace -f -e trace=file nvidia-offload glxgears -info 2>&1 | grep nvidia0
stat("/dev/nvidia0", 0x7ffd0df9ed80)    = -1 ENOENT (No such file or directory)
mknodat(AT_FDCWD, "/dev/nvidia0", S_IFCHR|0666, makedev(0xc3, 0)) = -1 EACCES (Permission denied)
stat("/dev/nvidia0", 0x7ffd0df9edb0)    = -1 ENOENT (No such file or directory)
stat("/dev/nvidia0", 0x7ffd0df9eda0)    = -1 ENOENT (No such file or directory)
mknodat(AT_FDCWD, "/dev/nvidia0", S_IFCHR|0666, makedev(0xc3, 0)) = -1 EACCES (Permission denied)
stat("/dev/nvidia0", 0x7ffd0df9edd0)    = -1 ENOENT (No such file or directory)

/dev/nvidia0 is supposed to be created through this udev rule: https://github.com/NixOS/nixpkgs/blob/nixos-21.11/nixos/modules/hardware/video/nvidia.nix#L360, but:

  • sometimes it works and nvidia0 is created
  • sometimes it doesn't and nothing is created
    • if the card has id 0 in /sys/class/drm/card*, then creating the node manually or with sudo nvidia-smi will make nvidia-offload work
    • if the card has another id, I don't know how to make things work; just reboot and hope that the id will be 0 the next time
  • sometimes nvidia1 is created and creating nvidia0 manually will not help.. again, reboot and hope for the best
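
For the case where creating the node manually helps, here is a small sketch of what sudo nvidia-smi did in the strace output above. The major number 195 is 0xc3 from the makedev(0xc3, 0) call. The helper names are hypothetical, and using the device index as the minor number is an assumption based on that single trace.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the mknodat() call seen in the strace output.
NVIDIA_MAJOR=195   # 0xc3, taken from makedev(0xc3, 0)

# Print the mknod command for a given device index (hypothetical helper).
nvidia_mknod_cmd() {
  local minor="${1:-0}"
  printf 'mknod -m 0666 /dev/nvidia%s c %s %s\n' "$minor" "$NVIDIA_MAJOR" "$minor"
}

# Create /dev/nvidiaN if it is missing. Only root can do this, which matches
# the EACCES seen when nvidia-smi runs as a normal user.
ensure_nvidia_node() {
  local minor="${1:-0}"
  local node="/dev/nvidia${minor}"
  if [ -c "$node" ]; then
    echo "$node already exists"
    return 0
  fi
  if [ "$(id -u)" -ne 0 ]; then
    echo "not root, run as root: $(nvidia_mknod_cmd "$minor")"
    return 1
  fi
  mknod -m 0666 "$node" c "$NVIDIA_MAJOR" "$minor"
}
```

Calling ensure_nvidia_node 0 as root should have the same effect on /dev/nvidia0 as the first sudo nvidia-smi run. It does not address the cases above where the card gets another id.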

@PaulGrandperrin
Contributor

New findings:

  • so for PRIME to work we need /dev/nvidia0, which means the nvidia card must be detected before the intel one in /sys/class/drm
  • when hardware.nvidia.powerManagement.enable=false
    • I get card0 => nvidia and card1 => intel
  • when hardware.nvidia.powerManagement.enable=true
    • I get card0 => intel and card1 => nvidia

It feels cursed but that's what I found!
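
To compare the ordering between boots, a sketch like the following prints which kernel driver backs each DRM card. The function name list_drm_drivers is hypothetical, and it assumes the usual sysfs layout in which /sys/class/drm/cardN/device/driver is a symlink to the driver directory.

```shell
#!/usr/bin/env bash
# Sketch: print "cardN => driver" for every DRM card.
# The optional argument overrides the sysfs root (useful for testing).
list_drm_drivers() {
  local root="${1:-/sys/class/drm}"
  local card name driver
  for card in "$root"/card[0-9]*; do
    name="$(basename "$card")"
    # Skip connector entries such as card0-eDP-1.
    case "$name" in *-*) continue ;; esac
    # Skip cards without a bound driver (and an unmatched glob).
    [ -e "$card/device/driver" ] || continue
    driver="$(basename "$(readlink -f "$card/device/driver")")"
    printf '%s => %s\n' "$name" "$driver"
  done
}
```

Running list_drm_drivers once with hardware.nvidia.powerManagement.enable set and once without should show whether the card0/card1 assignment really flips on a given machine.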

@ishan9299

@nrdxp
Contributor

nrdxp commented Dec 29, 2021

I just updated to the latest nixos-unstable today with a Turing GPU, and have both power management and fine-grained power management enabled.

The card will completely power off when not in use in this circumstance and become invisible to the kernel unless you also enable the nvidia persistence daemon, which I found in the Nvidia docs. After doing this, offload works fine for X11, so unless I'm missing something, I think we can probably close this.

Actually, it might be better to reopen #90152 since the only thing not working is wayland native applications. I can offload xwayland programs just fine though, although Unigine Superposition still thought it was using the Intel card in this case, but based on the benchmark score, it was definitely using the Nvidia card.

This is probably an upstream issue though, and we'll probably have to wait for a driver level fix before we can offload wayland apps unfortunately.
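
To see whether the card is currently powered down, the kernel's runtime power management state can be read from sysfs. This is a minimal sketch: the function name gpu_runtime_status is hypothetical, and the default device path assumes the GPU sits at PCI 0000:01:00.0, which corresponds to the nvidiaBusId = "PCI:1:0:0" used earlier in this thread and may differ on other machines.

```shell
#!/usr/bin/env bash
# Sketch: report the runtime power state of a PCI device.
# The default path is an assumption (bus id PCI:1:0:0 from this thread).
gpu_runtime_status() {
  local dev="${1:-/sys/bus/pci/devices/0000:01:00.0}"
  if [ -r "$dev/power/runtime_status" ]; then
    # Typical values include "active" and "suspended".
    cat "$dev/power/runtime_status"
  else
    echo "unknown"
  fi
}
```

If this reports suspended while nvidia-smi finds no devices, that would be consistent with the power management behaviour described above.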

@bryanasdev000
Member

bryanasdev000 commented Dec 29, 2021

NICE!

I haven't used my PRIME setup for some time now (I'm on an NVIDIA desktop for now), but I think we just need to update the docs, especially with a note about the Wayland thing, and we are ready to go :P

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/hdmi-output-not-working-on-nvidia-laptop/16911/1

@Ericson2314
Member

New findings:

* so for prime to work we need to have `/dev/nvidia0`, which means, we need to have the nvidia card being detected before the intel one in `/sys/class/drm`

* when `hardware.nvidia.powerManagement.enable=false`
  
  * I get card0 => nvidia and card1 => intel

* when `hardware.nvidia.powerManagement.enable=true`
  
  * I get card0 => intel and card1 => nvidia

It feels cursed but that's what I found!

powerManagement didn't seem to affect the order for me. Do we know how DRM decides which order to use?

@balacij

balacij commented Jan 7, 2022

I agree with everything that @nrdxp said. W.r.t. Wayland, I am getting very good and consistent results with everything working perfectly except for nvidia-settings (which just completely fails to open). However, to even boot into Wayland, I also had to add:

environment.variables = {
...
    MUTTER_ALLOW_HYBRID_GPUS = "1";
...
};

or else it would always crash on load and fall back to X11.

I've found that if you log in to a Wayland GNOME session and it has / and /boot mounted, it's likely that the Wayland session "failed" to load and fell back to X11. I'm not sure why it decides to mount those two, however. By the way, it appears that GNOME on Wayland does not yet fully support systems with multiple GPUs (including hybrid GPUs). However, we can get by just fine if the display is connected to only a single GPU (e.g., the integrated GPU) by enabling that MUTTER_ALLOW_HYBRID_GPUS flag (but, again, we forgo nvidia-settings).
