Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing native modules for mac and windows build #152

Closed
3 tasks done
tegefaulkes opened this issue Mar 5, 2024 · 29 comments · Fixed by #154
Closed
3 tasks done

Fix missing native modules for mac and windows build #152

tegefaulkes opened this issue Mar 5, 2024 · 29 comments · Fixed by #154
Assignees
Labels
development Standard development

Comments

@tegefaulkes
Copy link
Contributor

tegefaulkes commented Mar 5, 2024

Specification

Right now the mac and windows build outputs are missing the proper native modules. So far as I can tell, all of the required modules are configured to be included by pkg. However they are unavailable to be included during the build process.

This needs to be fixed before the windows and mac builds can be used.

The main problem is that the build process only has the Linux specific optional dependencies available when we are packaging. I'm still exploring the exact reason for this.

Tasks

  1. Fix the build process to include the platform specific native modules in the output.
  2. Test the windows build on windows to make sure it's working.
  3. Test the mac build on macos to make sure it's working.
@tegefaulkes tegefaulkes added the development Standard development label Mar 5, 2024
@tegefaulkes tegefaulkes self-assigned this Mar 5, 2024
@tegefaulkes
Copy link
Contributor Author

Right now I'm not sure why the proper platform specific optional dependencies are not being included. I'm still digging into that aspect.

@tegefaulkes
Copy link
Contributor Author

I checked out a commit from before the flakes change and it has the same issue. So It's not a flake related issue.

@tegefaulkes
Copy link
Contributor Author

Ok, so here is the core of the problem and what I've found out.

Some facts:

  1. We have native modules packages such as @matrixai/quic and @matrixai/exec. This packages don't directly contain the native module but includes optional dependencies to sub packages that do.
  2. These sub packages named like @matrixai/quic-darwin-x64 or @matrixai/quic-win-x64 depending on OS and CPU support. These have platform support define inside their package.json for the specific architecture they support.
  3. When a npm package is included in the package.json as optional then it will only be installed if the platform is supported. If it was not optional then installing will fail.

So given these facts, from what I can tell. No matter what target platform we're building for. Only the optional dependencies that work on the linux platform are included. This means that for the mac and windows platform they're missing their platform specific native modules. And we only know this during runtime.

The solution is obvious, we need to make sure that the target platform native modules are available in the build environment to be included in the packaged executable. This is where I'm running into some stumbling blocks. Here is a general summary.

  1. I checked and the prefetchNpmDeps does list out all of the native module packages we'd expect. Including all of the windows and macos ones we need.
  2. I don't think buildNpmPackage has a way of overriding the os or cpu or forcing certain optional dependencies to be included.
  3. buildNpmPackage documentaion mentions npmDeps as a way to manually specify packages but never explains it in detail so I'm unsure of how it actually works.
  4. npm install and possibly npm ci has an --os and --cpu flag you can set which supposedly overrides the os and cpu when deciding what packages to include. But I don't think it's supported by our current version of npm and I haven't looked into it to test it fully yet.

Right now I think there just isn't good support for having optional dependencies that only support a specific platform. I can't find a clean work around to force include them in the build process. The only solution I can find that comes close is to include all of them and then prune the ones we don't need after installing. But that means removing the os and cpu limitation on the packages.

Right now what works is to add all of the optional native modules to the devDependencies and install with npm install --force. But this means we have to always include the --force flag.

@tegefaulkes
Copy link
Contributor Author

So In my mind right now, the preliminary solution is to

  1. Remove the os and cpu limitations to all of the native modules so then can be installed on all platforms, or at the minimum just linux.
  2. Include all of the platform specific modules in the depencies.
  3. Add a post-install script to prune the native modules we don't need for the target platform, default target to current platform.

@CMCDragonkai
Copy link
Member

Yes I was aware of this in the beginning when the native packaging was changed from js-db to the new style. The new style was intended to be more optimal and the current behaviour is the correct behaviour. It reduces the bloat.

However the question is about building targetting different platforms and thus requiring the installation of additional optional dependencies.

The solution is simple. In the ci, just directly run npm install for those specific dependencies.

The problem is because while nix understands the difference between build platform and host platform and target platform, npm does not understand this. So we just need to do a little trick in the CI job to force install those dependencies at that point of building the Mac or Windows binary.

On the other hand - a long term solution might be to use the Mac and windows vms builder to build the PKG. In fact this is how the actual native objects/libraries are done, and so one can argue the same should be done in Mac and windows ci jobs.

This is further supported by the need of doing code signing which may require keeping the entire build pipeline platform specific.

@CMCDragonkai
Copy link
Member

For now just npm install the optional package directly as an unsaved npm dependency (probably don't need saving, but if you do, you will need to do it before the npmDepsHash calculation).

@tegefaulkes
Copy link
Contributor Author

I'm trying to go down the npm install the missing deps without saving them route but I'm running into other problems. I thought I could just install it with npm install @matrixai/quic-darwin-x64 --force. This will run but it will not install that package into the node_modules`. It seems to refuse to do this unless the package is included in the non optional dependencies.

And we can't do anything a modifies the npmDepsHash during the build since the flake sand-boxing explicitly dislikes that. I'm starting to think the only option is to include all of the native modules as normal dependencies regardless of platform and just prune the ones we don't need. The only downside with this is that it's a little more complicated and we have to download native modules for all platforms. Or we can just not prune any of it. It's only a few extra MB anyway.

This means we'll need to remove the platform and cpu restrictions for the native sub-packages.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Mar 7, 2024

So we know what the problem is in detail now, and we know how to fix it. But doing so is going to involve a general revamp of the CI build process to make use of platform specific runners to do the pkg phase of the build for each platform.

The following changes need to be made to the CI jobs

  • 1. integration:builds job needs to be split up. Rather than building everything it will do 3 things,
    • a. Build the bundled code in dist and keep that as an artifact.
    • b. Build the docker image
    • c. the linux, win and mac outputs will no longer be built this way
  • 2. the integration testing jobs will now handle the packaging with pkg just before the tests are run on the output. It will provide the packaged executables as artefacts
  • 3. And since the win and mac outputs don't work I'll remove them from the flake.nix

@tegefaulkes
Copy link
Contributor Author

For the final stage packaging we won't be using nix anymore. So it won't technically be a deterministic output. The main difference is the name of the final output will change.

@tegefaulkes
Copy link
Contributor Author

I got the macos build and test working but there's an oddity. When building we have the arm64 native modules available. But this build is labelled as the x64 build. Come to think of it, do we have a x64 version mac runner we can use? If we need platform specific runner how to we make the x64 build? Should we just combine the native x64 and arm64 native modules into the one build? Something to look into.

We may have to explore setting up our own runners on the office mac and windows machines.

The windows job is still failing for now. There seems to be a problem with running the build script in windows. I need to explore the problem more.

@CMCDragonkai
Copy link
Member

Long term I expect native integration into nix on the macos platform too with only Windows as the outlier.

@tegefaulkes
Copy link
Contributor Author

Given we only have an arm64 runner for darwin I think we're pretty much forced to package a combined executable for darwin, we may just have to package a single darwin executable that includes the x64 and arm64 compiled native modules for darwin. This could be done in two ways.

  1. Remove the cpu restriction from the darwin native builds.
  2. Update the native builds to include a x64+arm64 build.

@tegefaulkes
Copy link
Contributor Author

I've gotten both the windows and mac build to work. The mac build still needs some polish. It's only building for arm64 right now, I need to include some way to enable x64, possibly by combining the arm64 and x64 into one executable.

On reflection, It may still be possible to do cross compiling if we build and bundle with node20 and pkg with node21 targeting node20. I'll look into that after clean up.

@tegefaulkes
Copy link
Contributor Author

I'm able to override the buildNpmPackage to use nodejs_21 and so far it solves most of our problems. But there are some minor details to work out with including the bundled application and some dependency conflicts with node21. I think the rest of this can be solved with the work-spaces feature, so we can have an application workspace and a packaging workspace with stripped down dependencies.

If that doesn't work or we decide not to go with that then we still need to solve how to build both version of darwin without having a x64 mac runner.

@CMCDragonkai
Copy link
Member

Did gitlab entirely deprecate the x64 runner? Anyway the arm64 runner is capable of doing anything the x64 was doing.

@tegefaulkes
Copy link
Contributor Author

Not sure, That would explain why it was x64 before and arm64 now. In any case for our purposes we don't have a x64 darwin runner to build a pure x64 executable in.

But I have good news, I found a way to get it working in nix again. So we can build all the executables in nix again. Turns out it was possible to override the node version buildNpmPackage was working. So with a combination of npm workspaces to isolate dependencies and utilizing two stages of buildNpmPackage it is possible to build for a target platform with all the required native modules.

@tegefaulkes
Copy link
Contributor Author

On review, It may be possible to do this all in node20 just using the workspaces. I'll explore this further.

@tegefaulkes
Copy link
Contributor Author

The nix building method that I fixed up works nominally. But the man and win runner fails to run their respective builds. The mac job is killing it so I'm assuming that's a code signing problem. The windows has a completely inscrutable error failed to run: Class not registeredAt line:300. For now I'm going to branch these changes off for future reference and return to the previous method.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Mar 13, 2024

I think for now I won't worry about getting the x64 mac version working. Or at the very least it's separate enough to be addressed last.

The leaves the windows build to fix. It seems to be failing to load fd-lock when running. There are two possibilities for this.

  1. The native module is not being included in the packaged output.
  2. The native module doesn't exist when being built.

Looking into it.

@CMCDragonkai
Copy link
Member

Not sure, That would explain why it was x64 before and arm64 now. In any case for our purposes we don't have a x64 darwin runner to build a pure x64 executable in.

But I have good news, I found a way to get it working in nix again. So we can build all the executables in nix again. Turns out it was possible to override the node version buildNpmPackage was working. So with a combination of npm workspaces to isolate dependencies and utilizing two stages of buildNpmPackage it is possible to build for a target platform with all the required native modules.

Yes I believe gitlab completely removed the x64 runner. However the arm64 runner is capable of doing both. In that case I think you can build a "fat binary" which means the executable supports both.

@CMCDragonkai
Copy link
Member

I think for now I won't worry about getting the x64 mac version working. Or at the very least it's separate enough to be addressed last.

The leaves the windows build to fix. It seems to be failing to load fd-lock when running. There are two possibilities for this.

  1. The native module is not being included in the packaged output.
  2. The native module doesn't exist when being built.

Looking into it.

The fd-lock issue is known. We had plans to create our own js-file-lock native library - we can do it like js-exec. You can search the issues and do it.

@tegefaulkes
Copy link
Contributor Author

When I include fd-lock in the pkg assets in package.json that seems to fix it. So it was available but not included when packaging. This seems to be a problem with the win build generally. It seems I need to add all of the native modules to assets to get the win build to work.

@tegefaulkes
Copy link
Contributor Author

Windows job is passing now.

@tegefaulkes
Copy link
Contributor Author

I'm looking into getting js-quic and js-exec to build a dwarwin module for x64+arm64 but I'm not sure it's possible? I can't seem to find a way to do it.

As an alternative I'm just going to remove the cpu restriction from the generated prebuild packages. While not combining both architectures it will ensure that all architectures for a platform are included.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 14, 2024 via email

@tegefaulkes
Copy link
Contributor Author

I did a little research and it seems simple enough to do with lipo (https://ss64.com/mac/lipo.html) when creating the darwin binaries in the mac runner.

That said, what I have is a workable solution that's creating an arm64 only darwin executable for Polykey-cli. So I'll consolidate what I have already and merge a stage 1. Then I'll solve the problem of creating the universal binary and follow up with a stage 2 fix.

@tegefaulkes
Copy link
Contributor Author

Pretty much done for this stage. I'll make a PR for review.

@tegefaulkes
Copy link
Contributor Author

Fixed up the macos native dependencies. We now have a universal darwin binary that should work.

@CMCDragonkai
Copy link
Member

ETA on the release page for the working mac binary - non-codesigned of course so that @CryptoTotalWar can try it, he's working on the docs now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development
Development

Successfully merging a pull request may close this issue.

2 participants