Failing to find a directory #148

Open
mrbenjadmin opened this issue Jul 11, 2020 · 25 comments

Comments

@mrbenjadmin

I'm on Debian 10 and I've followed the installation instructions to the best of my ability; however, I keep getting the same output regardless of whether the command is run as root or with a program specified:

Removing Nvidia bus from the kernel
tee: '/sys/bus/pci/devices/0000:01:00.0/remove': No such file or directory
1
Enabling powersave for the PCIe controller
auto

I'm not entirely sure what this means, but I would greatly appreciate any assistance.

P.S. I haven't posted on GitHub before so my apologies if I've messed up somewhere, or if this is the wrong place to put this.

@michelesr
Contributor

michelesr commented Jul 12, 2020

That probably means the bus IDs aren't correct. You'll have to change them in /etc/default/nvidia-xrun. Follow the readme instructions, specifically the last sentence in that paragraph.

Also read this thread.

@mrbenjadmin
Author

I ran lshw as root again and compared it to the config file at /etc/default/nvidia-xrun, and they seem to be the same values, 01:00.0 for the graphics card, and 00:01.0 for the PCIe controller.
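One thing worth ruling out when comparing IDs: lspci prints short IDs such as 01:00.0, while the entries under /sys/bus/pci/devices carry a PCI domain prefix (0000:01:00.0). A minimal sketch of that conversion, assuming the card sits in PCI domain 0000; to_sysfs_id is a hypothetical helper name and is not part of nvidia-xrun:

```shell
# Convert a bus ID as shown by lspci/lshw into the domain-qualified form
# used under /sys/bus/pci/devices.
# to_sysfs_id is a hypothetical helper, not part of nvidia-xrun.
to_sysfs_id() {
    id=${1#pci@}    # lshw prefixes its bus info with "pci@"; drop it if present
    case "$id" in
        # Already domain:bus:device.function, e.g. 0000:01:00.0
        ????:??:??.?) printf '%s\n' "$id" ;;
        # bus:device.function only; ASSUMPTION: the PCI domain is 0000
        ??:??.?) printf '0000:%s\n' "$id" ;;
        *) printf 'unrecognised bus ID: %s\n' "$id" >&2; return 1 ;;
    esac
}
```

For example, to_sysfs_id 01:00.0 gives the value that DEVICE_BUS_ID would need to hold under that assumption.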

It's rather odd that I've gotten the same issue twice, considering I completely reinstalled the OS since my first post. Maybe a package that I'm missing?

Thanks for responding so quickly by the way, I really appreciate it.

@michelesr
Contributor

michelesr commented Jul 12, 2020

The /sys/bus/pci/devices/ entries are created by the kernel itself, and you shouldn't require additional packages, so I don't know exactly what's happening in your system.

Can you see the bus entries in that directory? Nvidia-xrun will first attempt to use remove (e.g. /sys/bus/pci/devices/0000:01:00.0/remove) to de-register the card from the system, so that programs like GNOME Shell or Xorg won't be able to load the nvidia module, which prevents the card's controller from being put into power-saving mode. Then it will set power/control on the PCI controller to auto (this is what powertop does, e.g. with --auto-tune or when you toggle power saving manually in the TUI), and that should effectively turn off the controller and therefore the card.
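To make those two steps concrete, here is a small sketch that only prints the sysfs writes described above, without running them. The helper name power_off_commands is hypothetical, and the IDs passed to it are examples that must match your own system:

```shell
# Print (do not run) the two sysfs writes described above.
# power_off_commands is a hypothetical helper, not part of nvidia-xrun.
power_off_commands() {
    device_id=$1      # e.g. 0000:01:00.0 (the graphics card)
    controller_id=$2  # e.g. 0000:00:01.0 (the PCIe controller)
    sysfs_dir=/sys/bus/pci/devices
    # 1. De-register the card so nothing can load the nvidia module for it
    printf 'sudo tee %s/%s/remove <<<1\n' "$sysfs_dir" "$device_id"
    # 2. Hand power management of the controller over to the kernel
    printf 'sudo tee %s/%s/power/control <<<auto\n' "$sysfs_dir" "$controller_id"
}
```

If the first path does not exist, tee fails with "No such file or directory", which is the error reported at the top of this issue.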

@mrbenjadmin
Author

Alright, I checked the /sys/bus/pci/devices/ directory and I can see quite a few folders including a 0000:00:01.0 folder for the PCIe bus, but I can't find a 0000:01:00.0 folder for the actual GPU.

@michelesr
Contributor

Does lspci | grep -i nvidia show the card? If not, a previous run of nvidia-xrun might have already removed the card from the system, and with it the entry in /sys/bus/pci/devices. The systemd service of nvidia-xrun does the same at boot if enabled.

If that's the case, assuming the bus ids are set properly in the config file, nvidia-xrun should restore the card at the next run by triggering a PCI rescan in the kernel. You can trigger the rescan manually using this command:

sudo tee /sys/bus/pci/rescan <<<1

Then you should be able to see the card again.
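If you'd rather only rescan when the card is actually missing, a check along these lines might help. This is a sketch, not something nvidia-xrun ships; needs_rescan is a hypothetical helper, and the optional second argument exists only so the check can be pointed at a different directory for testing:

```shell
# needs_rescan BUS_ID [SYSFS_ROOT]
# Succeeds (exit status 0) when the device entry is absent, i.e. when a
# rescan is worth trying. Hypothetical helper, not part of nvidia-xrun.
needs_rescan() {
    bus_id=$1
    sysfs_root=${2:-/sys/bus/pci}
    [ ! -e "$sysfs_root/devices/$bus_id" ]
}
```

In bash it could be combined with the command above as: needs_rescan 0000:01:00.0 && sudo tee /sys/bus/pci/rescan <<<1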

I appreciate this might seem confusing so I'll try to break it down for you:

  • when the card is not in use (e.g. you're running the desktop on the iGPU), you want the card not to be present, because some programs might attempt to load the nvidia module if they detect the card, and while the module is loaded the card (and the controller) will stay on and consume power all the time

  • when you actually need the card and run nvidia-xrun, the card will be added to the system again and the nvidia module will be loaded

  • at the end of the nvidia-xrun session, the module will be unloaded and the card will be removed again, so it won't appear in the bus entries or in the output of lshw/lspci

This is the default behavior of nvidia-xrun, and it can be tweaked using the config file. For example, you might choose not to remove the card if you're confident that the nvidia module won't be loaded by mistake, but that's not recommended if you have GNOME Shell or Xorg using the modesetting driver. (I'm not sure how Wayland compositors handle this TBH, but since NVIDIA is not supporting Wayland, maybe they won't try to load the module in Wayland sessions.)

@mrbenjadmin
Author

Alrighty, I did a PCI rescan and my graphics card is now visible in lspci. What should I do next?

@michelesr
Contributor

michelesr commented Jul 12, 2020

Just double-check that the IDs are correctly set in the config file, then try to run a command with nvidia-xrun and check that it's working properly. Post the output here so that I can take a look.

@mrbenjadmin
Author

mrbenjadmin commented Jul 12, 2020

Alright, I ran it as root, trying to start lutris, and this was the output:

Removing Nvidia bus from the kernel
1
Enabling powersave for the PCIe controller
auto

The program didn't appear to start during this.

@michelesr
Contributor

Are you running that command from a Linux virtual terminal (tty) or in a terminal emulator within a desktop environment? For this to work, especially if you're using the modeset option in the module (which is the default for nvidia-xrun), you have to log out of your current graphical session and run nvidia-xrun from a Linux virtual terminal (e.g. CTRL+ALT+F2). The common use case is to run nvidia-xrun without arguments to start the X server, which runs the X init script located at $XDG_CONFIG_HOME/X11/nvidia-xinitrc. That script has to contain a line such as exec gnome-session, or whatever you need to start the desktop environment.

If you're trying to use it as you would use optirun then it won't work, AFAIK.
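A quick way to tell the two apart is the tty command: on a Linux virtual terminal it prints a device such as /dev/tty2, while terminal emulators normally show a pseudo-terminal such as /dev/pts/0. A minimal sketch of that check; is_virtual_console is a hypothetical helper, not part of nvidia-xrun:

```shell
# is_virtual_console TTY_NAME
# Succeeds when TTY_NAME looks like a Linux virtual console (/dev/tty1,
# /dev/tty2, ...). Pseudo-terminals (/dev/pts/N) and serial lines
# (/dev/ttyS0) are rejected. Hypothetical helper, not part of nvidia-xrun.
is_virtual_console() {
    case "$1" in
        /dev/tty[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: is_virtual_console "$(tty)" || echo "log out and switch to a virtual terminal first"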

@mrbenjadmin
Author

I somehow didn't catch that I had to run it in a tty so thank you for pointing that out lol

I've now tried running openbox-session through nvidia-xrun as root on a free tty but I still seem to be getting that exact same output without a trace of openbox starting up.

@michelesr
Contributor

Can you please run it from a graphical terminal emulator with the -d flag, which performs a dry run, and post the whole output here?

nvidia-xrun -d

This should print all the commands that nvidia-xrun will execute instead of actually executing them.
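For anyone curious how a dry-run mode like this typically works in a shell script, here is a generic sketch. It is not the actual nvidia-xrun implementation; the DRY_RUN variable and the run wrapper are names made up for this illustration, and only the output prefix is borrowed from what nvidia-xrun prints:

```shell
# Generic dry-run wrapper (illustrative; NOT nvidia-xrun's actual code).
# DRY_RUN and run are hypothetical names chosen for this sketch.
DRY_RUN=${DRY_RUN:-0}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # Show the command instead of executing it
        printf '>>Dry run. Command: %s\n' "$*"
    else
        # Execute the command exactly as given
        "$@"
    fi
}
```

Every privileged step is then written as run some-command args, so a single flag switches the whole script between printing and executing.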

@mrbenjadmin
Author

Upon running nvidia-xrun -d:

Removing Nvidia bus from the kernel
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:01:00.0/remove <<<1
Enabling powersave for the PCIe controller
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<auto

@michelesr
Contributor

That doesn't look right to me: it looks like nvidia-xrun is only trying to detach the graphics card rather than actually executing a command, as if TURN_OFF_GPU_ONLY were set to 1 (see https://github.com/Witko/nvidia-xrun/blob/master/nvidia-xrun#L78).

Is that environment variable set, by any chance? Try this:

env TURN_OFF_GPU_ONLY=0 nvidia-xrun -d

And post the output here please.

@michelesr
Contributor

Also please post the output of:

cat /etc/default/nvidia-xrun

@mrbenjadmin
Author

TURN_OFF_GPU_ONLY is currently set to 1 because when I tried to run nvidia-xrun as root earlier, it told me that it must be set to 1 in order to run the command with sudo.

When I tried running env TURN_OFF_GPU_ONLY=0 nvidia-xrun -d the output it gave me was the exact same as the output given in my last post.

I also ran cat /etc/default/nvidia-xrun and this was the output:

# When enabled, nvidia-xrun will turn the card on before attempting to load the
# modules and running the command, and turn it off after the commands exits and
# the modules gets unloaded. If order for this to work, CONTROLLER_BUS_ID and
# DEVICE_BUS_ID must be set correctly. IDs can be found by by inspecting the
# output of lshw.
ENABLE_PM=1
# When PM is enabled, remove the card from the system after the command exists
# and modules unload: the card will be readded in the next nvidia-xrun
# execution before loading the nvidia module again. This is recommended as Xorg
# and some other programs tend to load the nvidia module if they detect a
# nvidia card in the system, and when the module is loaded the card can't save
# power.
REMOVE_DEVICE=1
# Bus ID of the PCI express controller
CONTROLLER_BUS_ID=0000:00:01.0
# Bus ID of the graphic card
DEVICE_BUS_ID=0000:01:00.0
# Seconds to wait before turning on the card after PCI devices rescan
BUS_RESCAN_WAIT_SEC=1
# Ordered list of modules to load before running the command
MODULES_LOAD=(nvidia nvidia_uvm nvidia_modeset "nvidia_drm modeset=1")
# Ordered list of modules to unload after the command exits
MODULES_UNLOAD=(nvidia_drm nvidia_modeset nvidia_uvm nvidia)
TURN_OFF_GPU_ONLY=1

@michelesr
Contributor

Please remove TURN_OFF_GPU_ONLY=1 from your config file. That option exists only so the systemd service can disable the nvidia card at boot. Nvidia-xrun has to be run as a normal user, as it will use sudo to elevate to superuser privileges.

Remove that from the config, then run nvidia-xrun -d again and check the output.
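Since overriding the variable on the command line made no difference earlier, the value in the config file appears to take precedence over the environment (an inference from the output in this thread, not something confirmed against the script). A small sketch for checking what a config file sets; config_value is a hypothetical helper, not part of nvidia-xrun:

```shell
# config_value FILE NAME
# Print the last value assigned to NAME in FILE, or nothing if NAME is
# never assigned. Commented-out lines are ignored.
# Hypothetical helper, not part of nvidia-xrun.
config_value() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}
```

Usage: config_value /etc/default/nvidia-xrun TURN_OFF_GPU_ONLY should print nothing once the line has been removed.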

@mrbenjadmin
Author

Alrighty, a much longer output from nvidia-xrun -d this time:

Couldn't get a file descriptor referring to the console
Turning the PCIe controller on to allow card rescan
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<on
Waiting 1 second
>>Dry run. Command: sleep 1
Rescanning PCI devices
>>Dry run. Command: sudo tee /sys/bus/pci/rescan <<<1
Waiting 1 second for rescan
>>Dry run. Command: sleep 1
Turning the card on
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:01:00.0/power/control <<<on
Loading module nvidia
>>Dry run. Command: sudo modprobe nvidia
Loading module nvidia_uvm
>>Dry run. Command: sudo modprobe nvidia_uvm
Loading module nvidia_modeset
>>Dry run. Command: sudo modprobe nvidia_modeset
Loading module nvidia_drm modeset=1
>>Dry run. Command: sudo modprobe nvidia_drm modeset=1
>>Dry run. Command: xinit /etc/X11/xinit/nvidia-xinitrc "" -- :1 vt -nolisten tcp -br -config nvidia-xorg.conf -configdir nvidia-xorg.conf.d
Unloading module nvidia_drm
>>Dry run. Command: sudo modprobe -r nvidia_drm
Unloading module nvidia_modeset
>>Dry run. Command: sudo modprobe -r nvidia_modeset
Unloading module nvidia_uvm
>>Dry run. Command: sudo modprobe -r nvidia_uvm
Unloading module nvidia
>>Dry run. Command: sudo modprobe -r nvidia
Removing Nvidia bus from the kernel
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:01:00.0/remove <<<1
Enabling powersave for the PCIe controller
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<auto

Though now when I run sudo nvidia-xrun, it spits out the following:
This script must not be run as root unless TURN_OFF_GPU_ONLY=1 is set

@michelesr
Contributor

You don't have to run it with sudo, as sudo will be used internally.

The output looks sane now. Log out of your graphical session, open a virtual tty and run:

nvidia-xrun 

And it should work this time. If it doesn't, check that the nvidia-xinitrc file is properly configured to run your desktop environment, e.g.:

exec openbox-session
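If the per-user file doesn't exist yet, something like the following could create it. This is a sketch that assumes the $XDG_CONFIG_HOME/X11/nvidia-xinitrc location mentioned earlier in this thread, with the usual ~/.config fallback when XDG_CONFIG_HOME is unset; write_user_xinitrc is a hypothetical helper, and it deliberately leaves an existing file untouched:

```shell
# write_user_xinitrc SESSION_COMMAND [CONFIG_HOME]
# Create X11/nvidia-xinitrc under CONFIG_HOME with a single exec line,
# unless the file already exists.
# Hypothetical helper, not part of nvidia-xrun.
write_user_xinitrc() {
    session=$1
    config_home=${2:-${XDG_CONFIG_HOME:-$HOME/.config}}
    target=$config_home/X11/nvidia-xinitrc
    if [ -e "$target" ]; then
        echo "keeping existing $target" >&2
        return 0
    fi
    mkdir -p "$config_home/X11" &&
        printf 'exec %s\n' "$session" > "$target"
}
```

Usage: write_user_xinitrc openbox-session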

@mrbenjadmin
Author

It appears that there might be an issue with the elevation as there are quite a few mentions of operations not being permitted, here is the output:

Couldn't get a file descriptor referring to the console
Turning the PCIe controller on to allow card rescan
on
Waiting 1 second
Rescanning PCI devices
1
Waiting 1 second for rescan
Turning the card on
on
Loading module nvidia
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
Loading module nvidia_uvm
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
modprobe: FATAL: Module nvidia-current-uvm not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
Loading module nvidia_modeset
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_modeset
modprobe: ERROR: could not insert 'nvidia_modeset': Operation not permitted
Loading module nvidia_drm modeset=1
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_modeset
modprobe: ERROR: could not insert 'nvidia_modeset': Operation not permitted
modprobe: FATAL: Module nvidia-current-drm not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_drm
modprobe: ERROR: could not insert 'nvidia_drm': Operation not permitted
/usr/bin/nvidia-xrun: line 17: xinit: command not found
Unloading module nvidia_drm
Unloading module nvidia_modeset
Unloading module nvidia_uvm
Unloading module nvidia
Removing Nvidia bus from the kernel
1
Enabling powersave for the PCIe controller
auto

@michelesr
Contributor

Couldn't get a file descriptor referring to the console

Are you running this from a linux virtual terminal?

Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64

Did you install the nvidia drivers? I'm not sure about this specific issue but you might want to look at the existing issues on this project about Debian and NVIDIA drivers

xinit: command not found

You need to install this, try sudo apt install -y xinit
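To catch this kind of gap before starting a session, a small pre-flight check might be useful. A sketch only; missing_commands is a hypothetical helper and the list of commands to pass is up to you:

```shell
# missing_commands NAME...
# Print each NAME that cannot be found on PATH; the exit status is
# non-zero if at least one is missing.
# Hypothetical helper, not part of nvidia-xrun.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

Usage: missing_commands xinit modprobe sudo. For the driver itself, modinfo nvidia-current should show whether the module named in the modprobe error is installed for the running kernel.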

@mrbenjadmin
Author

Are you running this from a linux virtual terminal?

I had run that command in a tty before this, but to be able to copy the output I ran it in a virtual terminal.

Did you install the nvidia drivers? I'm not sure about this specific issue but you might want to look at the existing issues on this project about Debian and NVIDIA drivers

Yes, I installed the newest available nvidia drivers from the Debian backports according to this article from the Debian website: https://wiki.debian.org/NvidiaGraphicsDrivers#Version_440.82_.28via_buster-backports.29

You need to install this, try sudo apt install -y xinit

Alrighty, done.

@michelesr
Contributor

Not sure how to help with the drivers on Debian; maybe check #44.

@mrbenjadmin
Author

I'm going to try installing NVIDIA's proprietary drivers from their website instead of the ones from the Debian backports, and hopefully that will work.

Thanks a ton for helping me so far, you're a life-saver lol

@mrbenjadmin
Author

Alright, I tried installing NVIDIA's drivers from their website, but the installer gave me a ton of warnings against installing things that weren't meant to be used with Debian, and didn't allow me to install them.

So at this point I'm assuming this won't work for me, and I'll likely need to switch back to Windows until either a fix comes for xrun or an alternative program pops up for Debian.

@DrWaleedAYousef

I have the same problem on Arch Linux. I tried all of the above and nothing works. It always gives the message:

tee: '/sys/bus/pci/devices/0000:01:00.0/remove': No such file or directory

I spent more than a day trying this out.
