Cannot build tuxclocker 1.5 with nvidia support due to dependency on flatpak #89

Open
pallaswept opened this issue Jan 29, 2024 · 12 comments

Comments

@pallaswept

pallaswept commented Jan 29, 2024

You already know all the details on this one. The flatpak workaround to retrieve the library a) won't build on RPM-based distros (fedora, opensuse, etc.) because there's no network during the build phase, and b) isn't present on a lot of people's PCs due to its tendency for high disk space requirements... and we can't build the library at all on FOSS distros (ubuntu, fedora, opensuse, etc.) due to their licensing restrictions. So, as discussed over here, the way to deal with this is the not-actually-a-Rube-Goldberg-machine approach of using a wrapper library... but instead, we got a workaround that doesn't work.

Disappointed to see 1.5 release before this is fixed :( I do understand that this nvidia support is something of a thorn in your side, and that's why I said to you way back here

building each plugin is optional

Writing them in the first place is the part that creates a load on you.

So, I don't want you to feel like I'm ignorant of the difficulty this whole situation creates for you. I was concerned about minimising your workload a long, long time ago.... But also, it kinda sucks to be in a situation where all this work is being done to support everything else, while many nvidia users are now stuck on an old release, or at best, the current release without support for their GPU, for the foreseeable future.

I tried to discuss with you that this was going to be a problem across various threads and have put in time to try and create workarounds, but you never reply when I mention the problems or alternative workarounds. It would be nice to hear some kind of feedback from you as to a real solution to this whole nvidia annoyance -insert linus nvidia meme.gif- 😆

Edit: it didn't put my nvidia meme joke in there because I enclosed it in <> tags, sorry about that, I wasn't laughing at you, I was laughing at the hidden joke :)

@Lurkki14
Owner

I don't see how not having Flatpak prevents from building anything compared to 1.4. Only thing related to Flatpak that was added is for the prebuilt binaries.

@Lurkki14
Owner

Lurkki14 commented Jan 29, 2024

Also one way to build NVIDIA support without NVML being present on the host would be to dynamically load it, but that'd require changes to the code and Arch and Nix at least seem to manage fine without that.

@pallaswept
Author

I don't see how not having Flatpak prevents from building anything compared to 1.4. Only thing related to Flatpak that was added is for the prebuilt binaries.

It's that we don't have a method to obtain the pre-built binaries.... The current workaround for this problem is to use flatpak, but that doesn't work because of the online nature of the tool (assuming it is even present, which is often not the case) :(

Also one way to build NVIDIA support without NVML being present on the host would be to dynamically load it,

You may recall that was my initial suggestion, early on in the discussion of #65, but as yet none of us (neither you, nor I, nor @tujhen) has been able to find a way to make that happen.

but that'd require changes to the code

Exactly - the changes we need to resolve #65, and now also, this issue. This issue is really an extension of #65, but I raised the flatpak limitation there, and there was no response, and now there is this new release, so, a new issue arises.

Dynamically loading would be just fine and solve all of these problems, and using a wrapper library would have the same effect, of not requiring the libraries to be present during the build. I really don't mind what method is taken, but yes, some changes to the code need to be made, so that the application fully works.

and Arch and Nix at least seem to manage fine without that.

I would presume that is a result of their more lax requirements of both security and licensing during the builds - they will allow non-OSI-compliant packages (so, the library binaries can be present during the build), and also provide internet access during the build (so, flatpak could possibly work, for example). Either of which would be enough to solve the problem for more restrictive/secure packaging requirements in other, more common, distros.

In another project I help with (packaging, testing, and some minor contributions), which also shares similar needs for access to Nvidia hardware, and faces the same challenges you face here (you are NOT alone my friend! This Nvidia situation sucks for many application developers), I learned that even COPR has somewhat relaxed requirements (for example it allows internet access during builds, where the official fedora repos do not), so possibly it might even work OK, there, too.... but for a real fedora package, so it could be included for all fedora users 'out of the box' (obviously a desirable outcome, and the ultimate end-goal of any project), as with openSUSE, those strict (and safer) requirements apply.

Regardless of the means to the end, be it dynamic loading, a 'wrapper' lib, or any other means not yet considered, I'm sure you would agree that it would be good to see us in a situation where all distributions, with any variety of the Nvidia drivers, can run the latest version of this outstanding tool, in all its glory, including its Nvidia control features. Sadly, the present Flatpak workaround does not reach that goal.

I know it's not an easy task, and I assure you that if I can get through my extensive list of matters to attend to, I will personally contribute a solution - but it is likely that I would take an unreasonable time frame, given the long list of bugs I'm currently tracking, and my severe health problems preventing me from being as productive as most people (In short terms: I'm disabled, so I am slow, and I find lots of bugs because I was a test engineer before I was a cripple, so I'm slow and I have a long way to go) .... and so I have to take a more realistic path to a solution, in asking that you help us with this - or some other kind individual if someone else wanted to contribute, which would be lovely!

I really don't want to create more work for you. I just want to be able to really use all of the features of this great tool! I was first to package it for openSUSE, because I wanted to share it with as many people as I could, because I think it's great - so obviously, I am willing to do work to try and help.... but this job, adding the required changes to the code, is one I can't really help with, at least, not soon enough to be useful, so, on behalf of all of us on these various troublesome distros (which combined, include most users), who are using Nvidia GPUs (which again, means most users), I ask for your help.

@Lurkki14
Owner

I would presume that is a result of their more lax requirements of both security and licensing during the builds

Nix doesn't allow internet access during build, and also doesn't distribute non-free binaries. The way the Nix package works is that the non-free parts are built on the user's computer.

I know it's not an easy task

It's not especially hard, it'd just be a Meson flag, and a couple of ifdefs where the library functions would be dlopen'ed. But to me this sounds like a packaging and not a code problem. Can rpm not build packages locally?
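For illustration, a minimal C sketch of what the dlopen route could look like. This is not TuxClocker code: the macro `NVML_DLOPEN` (standing in for whatever the Meson flag would define), the struct `nvml_api` and the helper `nvml_load` are all hypothetical names. The only real identifiers are the NVML entry point `nvmlInit_v2` and the soname `libnvidia-ml.so.1`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical: a Meson option would pass -DNVML_DLOPEN to the compiler. */
#define NVML_DLOPEN 1

/* Local copy of the prototype. NVML returns an enum where 0 is NVML_SUCCESS,
 * so int is used here to avoid needing nvml.h in this sketch. */
typedef int (*nvml_init_fn)(void);

struct nvml_api {
    void *handle;      /* dlopen handle, NULL when the library is absent */
    nvml_init_fn init; /* resolved nvmlInit_v2, NULL when unresolved */
};

/* Try to load the library by name at run time.
 * Returns 0 on success, -1 when the library or the symbol is missing,
 * so the caller can disable the plugin instead of failing. */
static int nvml_load(struct nvml_api *api, const char *soname)
{
    assert(api != NULL && soname != NULL);
    api->handle = NULL;
    api->init = NULL;
#ifdef NVML_DLOPEN
    api->handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (api->handle == NULL)
        return -1;
    /* POSIX idiom for storing the void* from dlsym in a function pointer. */
    *(void **)(&api->init) = dlsym(api->handle, "nvmlInit_v2");
    if (api->init == NULL) {
        dlclose(api->handle);
        api->handle = NULL;
        return -1;
    }
    return 0;
#else
    /* A linked build would call nvmlInit_v2() directly here; left out so
     * the sketch compiles on a machine without NVML. */
    (void)soname;
    return -1;
#endif
}
```

A caller would do something like `nvml_load(&api, "libnvidia-ml.so.1")` and skip the NVIDIA plugin when it returns -1. Compiling this needs only `<dlfcn.h>` (plus `-ldl` on older glibc), not the NVML library itself.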

@pallaswept
Author

pallaswept commented Jan 29, 2024

The way the Nix package works is that the non-free parts are built on the user's computer.
Can rpm not build packages locally?

The intention here is to have the application available via the package manager of the distro, out of the box, so that when someone wants to add this application, they can just use the normal, built-in tools.. Accordingly,

to me this sounds like a packaging and not a code problem.

Perhaps that is a result of your perspective being the guy who writes the code but doesn't care much for packaging. It's really a combination of both. But the solution must lie in changing the code, because changing the packaging cannot result in a solution.

Consider that no amount of packaging can solve this problem - the only workarounds which have existed so far, avoid packaging entirely: the nix package you mentioned which avoids the packaging of these parts and builds them locally instead; running scripts after installation to fetch necessary libraries as per the flatpak workaround; running scripts after installation to modify linking behaviour (as discussed in #65); or the openSUSE package which simply doesn't package the nvidia module... The only packaging solution that's ever worked was mine, and it was breaking TOS, so I had to remove it, so it didn't really work at all.

But if the code is changed, then the software can be packaged, and it is no longer either a code or a packaging problem, it's not even a problem any more.

Obviously, as a developer, you are not averse to building software locally on your PC, but most users, for good reason, don't want to do that - either in order to maintain a system whose contents are entirely managed by the package manager, to serve the goal of reducing maintenance/having to fix bugs - or for reasons of UX; think about a person who is switching to linux from Windows - or they just have more modern expectations of UX, and just want to have a nice GUI to click on for everything .... after all, those users who prefer a nice UX with a nice GUI are exactly the reason that this tool even exists, otherwise we could just use the terminal to do everything tuxclocker does. It seems a little contradictory, to provide a tool which provides a nice GUI for overclocking your PC, but won't allow you to use your distro's nice GUI to install or uninstall it.

As a person whose perspective is all of the above - I write code, I package apps, I want a nice friendly UX clicky GUI as much as is possible, and I want my system under complete control of the package manager to avoid maintenance problems - I know that if it were me in control of fixing this, I'd be fixing it in code. As I've mentioned, I do not mean to be selfish about this. I am willing to put in work myself, and that should be clear, as I have done so already: I was first to package it, and I had to do work, including contributing code, to get bugs fixed in the packaging appliance for openSUSE to make that happen, because I wanted it to get the recognition it deserves by being easily accessible, and because I wanted to maintain a system under control of package management. I did try to fix this issue by means of packaging and was met with the threat of being banned from the platform. As I've said, if this problem still exists, I will assist by contributing that code as soon as I'm able, but because of my limitations in capacity and my excessive load, I can't do that in a reasonably useful time-frame. That is why I'm asking you (or someone else who might want to contribute, to help you out) to provide the code that is needed to reach a true solution that allows the full use of the software, under full control of the package management system.

I feel like if we combine all of the time that you and I and @tujhen have spent discussing this, it would have taken less of your stress and time, to just implement the wrapper (edit: or dynamic linking or whatever fix) and be done with it. I really want this to be as easy on you as possible - but I do want something that really works, too.

@Lurkki14
Owner

The intention here is to have the application available via the package manager of the distro, out of the box

Yep, that's what happens with the Nix package, you just install it, nothing extra needed and the non-free parts are built automatically.

@Lurkki14
Owner

The exact way it works is that Nix sees that tuxclocker depends on a non-free package (the NVIDIA plugin) which is not available prebuilt, so it's built locally, while the other components can be fetched from the servers. Is this not something RPM supports?

@pallaswept
Author

Is this not something RPM supports?

TLDR yes it does, but this is not the same - it is a nonfree package, but it IS available prebuilt, and it is already installed, so there is no need to build it locally, because it's already been built (remotely) and copied onto the machine. It's already there, so it doesn't need to be built - and it wouldn't help, anyway, because we can't even get that far. We can't build tuxclocker at all, with the nvidia module.

RPM can run scripts post-build, but the whole point here is to avoid any kind of local building and post-installation hacks, and provide the entire package, within the package, so that the entirety of the software is under control of the package manager. This is the intended method of installing software on the system - by not building anything locally, but by building it on the packaging server and installing it on the target system by extracting the built binary from the package archive. That way, the package manager is in complete control of all of the contents of the entire filesystem (with the exception of user data like documents, photos, configuration files....that kind of thing). This avoids having problems with the system having files which are not designed to be there but somehow, are. Building the nvidia driver locally, kinda breaks this whole paradigm, especially when it's done while simultaneously having installed the driver via the package manager, because now we just have two of the driver, and one of them is in the wrong place...

Normally, for a use-case such as tuxclocker's, the RPM spec file would contain a requires directive, to ensure that on the target system, the package manager installs the necessary library as a dependency of tuxclocker, and it does so successfully, provided that a package which provides this package is available in one of the system's configured repositories - and if the user is using an Nvidia GPU, they will have configured such a repository, in order to install the driver, and the library will already be present. So, there's no need to build the nonfree driver, given that there is a repository allowing it to be installed, as the file exists already, having already been built on the packaging server, and extracted from that package's archive, onto the target machine. - in other words, we don't need to build it after installation like the nix package, because it was already built, and it's already there.

The problem is that because of something about the way TuxClocker is written, it not only requires the library be present at runtime on the target system (ie, a requires directive, which achieves what nix would achieve by building it locally), it also requires the library be present at buildtime of the application, on the packaging server (ie, a buildrequires directive). Apparently (and I say 'apparently' because I have not had the opportunity to confirm it, but it certainly seems this way) this is because it's not dynamically loading the library at runtime, it's statically linking it at buildtime. This buildrequires directive means that the nonfree library would have to be installed in the build environment, and the build environment, unlike the end-user's system, does not have a repository containing the nonfree library. So the library cannot be installed in the build environment, and TuxClocker can't be built with the nvidia features.

Of interest, the open-source version of the proprietary nvidia driver (as in, nvidia-driver-open-source-*, not nouveau) is free, and is available to the build environment, and uses the same header file, and tuxclocker will build against it, with the open-source nvidia driver installed in the build environment - but that package will not function properly when installed on the target machine with the closed-source version of the library installed, which again suggests that the application is statically linking the library at buildtime, rather than just creating entry points to dynamically load it at runtime.

But, the application, even if successfully built against the closed-source library, won't work properly without the library present at runtime, either - so it needs the lib at buildtime and at runtime, it seems as though it is both statically linking it, AND dynamically loading it. It's....weird. Something's wrong.

As we've mentioned, the simplest way to deal with this is to dynamically load the library at run time, which means that library won't need to be present during the build - provided that the headers for the library are present, they will provide sufficient data for the built tuxclocker binary to create entry points into the library at runtime - and those headers are free (as in freedom), and not binary files, so there's no problem including those in the source, which you already do - and then the library being present at runtime will be sufficient for the tuxclocker module to load the nvml library at runtime.... but something about the way tuxclocker works, means it's not working out that way, and the build requires the actual libraries, not just their headers, be present, and the tuxclocker module also requires the library be present. I really do wish I had the time to look into exactly why it's doing that, but I can't say for sure why that is until I've looked at it.
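To illustrate the point about headers being enough to create entry points, here is a minimal C sketch. It uses `cos` from libm purely as a stand-in for an NVML function, so that it can run on a machine with no NVIDIA driver installed. The helper name `call_unary` and the typedef `unary_fn` are hypothetical, and the soname `libm.so.6` is an assumption that holds on glibc systems.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Signature taken from a header (here: double cos(double) from <math.h>).
 * Only this declaration is needed at build time, not the library file. */
typedef double (*unary_fn)(double);

/* Load `soname` at run time, look up `symbol`, and call it with `x`.
 * Returns 0 and stores the result in *out on success, or -1 when the
 * library or the symbol is unavailable. */
static int call_unary(const char *soname, const char *symbol,
                      double x, double *out)
{
    assert(soname != NULL && symbol != NULL && out != NULL);

    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    unary_fn fn = NULL;
    /* POSIX idiom for storing the void* from dlsym in a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = fn(x);
    dlclose(handle);
    return 0;
}
```

Nothing here is linked against libm at build time; the library is only looked up when `call_unary("libm.so.6", "cos", 0.0, &result)` runs. The same pattern would apply to NVML, with the function signatures taken from its header.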

@Lurkki14
Owner

Lurkki14 commented Jan 30, 2024

it seems as though it is both statically linking it, AND dynamically loading it

No, its existence is just checked at build time so we know it exists and where, as with any other build process using dynamic libraries

@pallaswept
Author

pallaswept commented Jan 30, 2024

it seems as though it is both statically linking it, AND dynamically loading it

No, its existence is just checked at build time so we know it exists and where, as with any other build process using dynamic libraries

If that's the case, then we could just build it by touching a file with the correct name and path, so that its existence is confirmed, and not have to actually install the library during the build process, and then it will dynamically link at runtime to the file at that location, which would be the real library, and everything will just work, right?

@pallaswept
Author

pallaswept commented Jan 30, 2024

as with any other build process using dynamic libraries

Maybe this is something more specific to your distro. I must have like, hundreds, maybe thousands of packages here, which specify a library as a runtime requirement, but not as a build requirement, and so the presence and location of the library is never confirmed at build time, but is expected at runtime. Just as an example, practically the entirety of KDE builds like this, on opensuse and fedora at least, so you can imagine how many packages we're talking about, here. It's a lot 😆

Maybe it would be easier to just not do that whole 'confirmation of presence and location' process by requiring actually installing the library during the build, even though the build never uses the library beyond locating a file name/path, because it dynamically links and doesn't statically link.... and instead do like all these other packages do?

Normally, they use the distro's standardised location for libraries (eg on openSUSE it's /usr/lib64 for 64 bit libs and /usr/lib for 32bit libs; I'm pretty sure it's the same on fedora, or at least similar). For the packaging process, that path is specified by means of a macro, so you just use %{_libdir} (on opensuse or fedora it's the same macro) and the packaging system figures out the location. The file name of the library (eg libnvidia-ml.so) is known, and where needed on a specific system, that file is actually a symlink to some other file (eg libnvidia-ml.so.1, which might then also be a symlink, eg to libnvidia-ml.so.545.29.06), which takes care of different versions of the libraries being present.

This is how it's done on practically every package installed on my system, and from what you've just said, it sounds like it could work here, too.... Sounds like it could be a really easy fix!

Edit: FWIW, I'm very rusty on deb packaging, it's been a long long time, but I do remember, the theory is the same there, too. You just depend on the package at runtime, but not at build time, and at build time, you use a variable to replace the path, and use the known library file name, which at runtime, is usually a symlink to the version-specific library.

@pallaswept
Author

pallaswept commented Feb 6, 2024

it seems as though it is both statically linking it, AND dynamically loading it

No, its existence is just checked at build time so we know it exists and where, as with any other build process using dynamic libraries

If that's the case, then we could just build it by touching a file with the correct name and path, so that its existence is confirmed, and not have to actually install the library during the build process, and then it will dynamically link at runtime to the file at that location, which would be the real library, and everything will just work, right?

So, I went ahead and tried it - I just modified meson to not test for the presence of the libs, and go ahead with building the plugin anyway....and it failed when it tried to static link the libs.
