Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph: Fix build with GCC 13 by using fmt_8 -> fmt_9. Fixes #281027 #281858

Merged
merged 2 commits into from Jan 24, 2024

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Jan 18, 2024

Description of changes

Fixes #281027

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@nh2 nh2 mentioned this pull request Jan 18, 2024
@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

My local build is currently running, should be done in a few minutes.

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

NixOS tests are hanging, but that's unlikely to be caused by this change alone (formatting library upgrade).

Hanging at:

monA # set pool 2 size to 2
(finished: must succeed: ceph osd pool set single-node-other-test size 2, in 1.02 seconds)
monA: waiting for success: ceph -s | grep 'HEALTH_OK'

Further up in the output, a couple ceph-mgr Python stack traces (not sure if relevant for this error):

monA # [   34.993567] ceph-mgr[1090]:   File "/nix/store/0szj6m127spn9mrrxmxkja5z1cvypilv-ceph-18.2.0-lib/lib/ceph/mgr/restful/__init__.py", line 1, in <module>
monA # [   34.994444] ceph-mgr[1090]:     from .module import Module
monA # [   34.994804] ceph-mgr[1090]:   File "/nix/store/0szj6m127spn9mrrxmxkja5z1cvypilv-ceph-18.2.0-lib/lib/ceph/mgr/restful/module.py", line 21, in <module>
monA # [   34.995599] ceph-mgr[1090]:     from OpenSSL import crypto
monA # [   34.995974] ceph-mgr[1090]:   File "/nix/store/5fz9vxyx7psy54265w00wr0bsk5dvzw5-python3-3.10.13-env/lib/python3.10/site-packages/OpenSSL/__init__.py", line 8, in <module>
monA # [   34.997171] ceph-mgr[1090]:     from OpenSSL import SSL, crypto
monA # [   34.997549] ceph-mgr[1090]:   File "/nix/store/5fz9vxyx7psy54265w00wr0bsk5dvzw5-python3-3.10.13-env/lib/python3.10/site-packages/OpenSSL/SSL.py", line 10, in <module>
monA # [   34.998506] ceph-mgr[1090]:     from OpenSSL._util import (
monA # [   35.000145] ceph-mgr[1090]:   File "/nix/store/5fz9vxyx7psy54265w00wr0bsk5dvzw5-python3-3.10.13-env/lib/python3.10/site-packages/OpenSSL/_util.py", line 6, in <module>
monA # [   35.000984] ceph-mgr[1090]:     from cryptography.hazmat.bindings.openssl.binding import Binding
monA # [   35.001547] ceph-mgr[1090]:   File "/nix/store/5fz9vxyx7psy54265w00wr0bsk5dvzw5-python3-3.10.13-env/lib/python3.10/site-packages/cryptography/hazmat/bindings/openssl/binding.py", line 15, in <module>
monA # [   35.002552] ceph-mgr[1090]:     from cryptography.exceptions import InternalError
monA # [   35.003054] ceph-mgr[1090]:   File "/nix/store/5fz9vxyx7psy54265w00wr0bsk5dvzw5-python3-3.10.13-env/lib/python3.10/site-packages/cryptography/exceptions.py", line 9, in <module>
monA # [   35.003929] ceph-mgr[1090]:     from cryptography.hazmat.bindings._rust import exceptions as rust_exceptions
monA # [   35.004526] ceph-mgr[1090]: ImportError: PyO3 modules may only be initialized once per interpreter process
monA # [   35.005155] ceph-mgr[1090]: 2024-01-18T20:22:50.998+0000 7f27426ba9c0 -1 mgr[py] Class not found in module 'restful'
monA # [   35.005772] ceph-mgr[1090]: 2024-01-18T20:22:50.998+0000 7f27426ba9c0 -1 mgr[py] Error loading module 'restful': (2) No such file or directory
monA # [   35.006543] ceph-mgr[1090]: 2024-01-18T20:22:50.998+0000 7f27426ba9c0 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: k8sevents, dashboard, cephadm, restful

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

NixOS tests are hanging

@lejonet or anybody who knows:

How does one debug these tests? E.g. because of

networking = networkConfig;

One cannot start the VM with

QEMU_NET_OPTS="hostfwd=tcp:127.0.0.1:2222-:22" $(NIX_PATH=nixpkgs=. nix-build --no-out-link -A ceph.tests.ceph-single-node.driverInteractive)/bin/nixos-test-driver

to get SSH into the machine to investigate ceph status while the test is running, because the normal QEMU networking seems to be disabled.

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

I found that this will help connect in conveniently if the networking = ... is commented out; but of course the test requires that networking setup.

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

How does one debug these tests?

I resorted now to just delete the wait_until_succeeds() line that's hanging, and all lines below, so that I can use the interactive test driver to monA.shell_interact() and run ceph status inside.

It gives:

    health: HEALTH_WARN
            Module 'restful' has failed dependency: PyO3 modules may only be initialized once per interpreter process

So indeed the PyO3 exceptions mentioned above make the test hang.

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

Related: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/FB7XE6WYDK3EBJYPABSPX5B2LEILWWJA/

But the "just upgrade" advice is not useful; on Ceph 18.2.1 (#281474) the same issue happens.

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

This is more useful: https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371/page-5#post-610330

(In this thread is also the person who posted the above ceph-users link.)

From there:

And a very useful summary:

So this effectively breaks the mgr.

@nh2
Copy link
Contributor Author

nh2 commented Jan 18, 2024

We likely want to merge this anyway, because it fixes the build of Ceph and packages that depend on it.


But the tests still fail because the Python versions recently upgraded in nixpkgs master introduce this issue.

Maybe we can add back some older Python libraries that Ceph can use until this is fixed?

@deepfire
Copy link
Contributor

@nh2, so should we track down whoever maintains these python libraries and ask them to add older pins?

@deepfire
Copy link
Contributor

Also, no-one was tagged for review -- who is aware of this PR even?

@qubitnano
Copy link
Contributor

FYI builds ok with aur fmt10 patch

diff --git a/pkgs/tools/filesystems/ceph/default.nix b/pkgs/tools/filesystems/ceph/default.nix
index f38cd4be880c..4625c7e3eeae 100644
--- a/pkgs/tools/filesystems/ceph/default.nix
+++ b/pkgs/tools/filesystems/ceph/default.nix
@@ -4,6 +4,7 @@
 , fetchurl
 , fetchFromGitHub
 , fetchPypi
+, fetchpatch
 
 # Build time
 , cmake
@@ -294,6 +295,13 @@ in rec {
       optLibedit
     ];
 
+    patches = [
+      (fetchpatch {
+        url = "https://aur.archlinux.org/cgit/aur.git/plain/ceph-18.2.0-fmt10-fixes.patch?h=ceph";
+        hash = "sha256-j9xycE3NR54C82tbpMnySMEFNqcTZO31H+kQCQhhOKg=";
+      })
+    ];
+
     pythonPath = [ ceph-python-env "${placeholder "out"}/${ceph-python-env.sitePackages}" ];
 
     # replace /sbin and /bin based paths with direct nix store paths
diff --git a/pkgs/top-level/all-packages.nix b/pkgs/top-level/all-packages.nix
index 8667c3d9cd2b..c4316b02f0c7 100644
--- a/pkgs/top-level/all-packages.nix
+++ b/pkgs/top-level/all-packages.nix
@@ -6817,7 +6817,6 @@ with pkgs;
   libceph = ceph.lib;
   inherit (callPackages ../tools/filesystems/ceph {
     lua = lua5_4;
-    fmt = fmt_8;
   })
     ceph
     ceph-client;

@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

@nh2, so should we track down whoever maintains these python libraries and ask them to add older pins?

I am currently trying if this works: Adding pins for cryptography_40 and thus required pyopenssl 23.1.1.

Will report in a few minutes.

Also, no-one was tagged for review -- who is aware of this PR even?

Ideally the people from #281027, I posted there.

@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

FYI builds ok with aur fmt10 patch

@qubitnano Thanks for checking, that's great to know!

I think while we have fmt_9 available we probably want to use that, reducing the need to patch. I'll link your post into a comment so we'll find it easily once we must do fmt_10.

@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

I am currently trying if this works: Adding pins for cryptography_40 and thus required pyopenssl 23.1.1.

Will report in a few minutes.

That seems to indeed get rid of the mentioned Python issues.

But this one still remains:

monA # [   24.891030] ceph-mgr[1097]: 2024-01-19T12:22:38.905+0000 7f04360b89c0 -1 mgr[py] Traceback (most recent call last):
monA # [   24.891675] ceph-mgr[1097]:   File "/nix/store/dsm04fps9pkhjlz885lk63bgisr4m1wv-ceph-18.2.1-lib/lib/ceph/mgr/k8sevents/__init__.py", line 1, in <module>
monA # [   24.892588] ceph-mgr[1097]:     from .module import Module
monA # [   24.893266] ceph-mgr[1097]:   File "/nix/store/dsm04fps9pkhjlz885lk63bgisr4m1wv-ceph-18.2.1-lib/lib/ceph/mgr/k8sevents/module.py", line 70, in <module>
monA # [   24.894867] ceph-mgr[1097]:     from kubernetes.client.models.v1_event import V1Event
monA # [   24.896086] ceph-mgr[1097]: ModuleNotFoundError: No module named 'kubernetes.client.models.v1_event'

Looks like this:

But I don't understand how this worked so far, given that this is 2 years old, and nixpkgs already had the Python package kubernetes > 11 for 2 years.

Edit:

A related isssue is #241482.

So probably this has been an issue for a while, now ceph -s surfaces it and thus the test hangs on the non-healthy status.

@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

I could fix the kurbernetes problem by adding back an old version of that library to nixpkgs as well.

nh2 added a commit to nh2/nixpkgs that referenced this pull request Jan 19, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.
@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

Tests are passing now on Ceph 18.2.1.

Now switching back to 18.2.0, as that upgrade is separate. Will notify here when it's done.

Please somebody review my addition of the old Python packages.

@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

Now switching back to 18.2.0, as that upgrade is separate. Will notify here when it's done.

It's working.

But the ofborg eval failed:

https://gist.github.com/GrahamcOfBorg/8b4e2ab113b7b1f5b634299c41402fea

       error: evaluation aborted with the following error message: 'lib.customisation.callPackageWith: Function called without required argument "requests_oauthlib" at /var/lib/ofborg/checkout/1/repo/38dca4e3aa6bca43ea96d2fcc04e8229/mr-est/ofborg-evaluator-2/pkgs/development/python-modules/kubernetes/18.nix:13, did you mean "requests-oauthlib"?'

I can see in

pkgs/top-level/python-aliases.nix:  requests_oauthlib = requests-oauthlib; # added 2022-02-12

How comes this alias seems to work in my local build, but not for ofborg?

nh2 added a commit to nh2/nixpkgs that referenced this pull request Jan 19, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.
@nh2
Copy link
Contributor Author

nh2 commented Jan 19, 2024

How comes this alias seems to work in my local build, but not for ofborg?

@lilyinstarlight Explained:

it intentionally does not (by setting config.allowAliases = false for nixpkgs eval) because aliases should not be used within nixpkgs

I fixed that now by using requests_oauthlib -> requests-oauthlib instead in my revival of python-modules/kubernetes/18.nix.

nh2 added a commit to nh2/nixpkgs that referenced this pull request Jan 19, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.2.1 at least,
and it does not hurt to fix it now already in the way the Ceph docs say.
nh2 added a commit to nh2/nixpkgs that referenced this pull request Jan 23, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.2.1 at least,
and it does not hurt to fix it now already in the way the Ceph docs say.

Also revert "nixosTests.ceph-single-node: remove dashboard check"

This reverts commit 41b27d7.
nh2 added a commit to nh2/nixpkgs that referenced this pull request Jan 23, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.2.1 at least,
and it does not hurt to fix it now already in the way the Ceph docs say.

Also revert "nixosTests.ceph-single-node: remove dashboard check"

This reverts commit 41b27d7.
@nh2
Copy link
Contributor Author

nh2 commented Jan 23, 2024

@dotlambda Please take another look!

@nh2 nh2 requested a review from dotlambda January 23, 2024 15:45
Copy link
Member

@dotlambda dotlambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I think you can split out the test changes into something like nixos/tests/ceph: re-enable dashboard tests.

@nh2
Copy link
Contributor Author

nh2 commented Jan 24, 2024

LGTM, though I think you can split out the test changes into something like nixos/tests/ceph: re-enable dashboard tests.

They were disabled only because of the cryptography issue (commit 41b27d7), so it makes sense that the commit that fixes the cryptography issue also re-enables those tests.

@nh2
Copy link
Contributor Author

nh2 commented Jan 24, 2024

Now newly ofborg complains

       error: attribute 'fetchPypi' missing

       at /ofborg/checkout/2/repo/38dca4e3aa6bca43ea96d2fcc04e8229/mr-est/ofborg-evaluator-0/pkgs/tools/filesystems/ceph/default.nix:183:15:

          182|       cryptography-vectors = (self.callPackage ../../../development/python-modules/cryptography/vectors.nix { }).overridePythonAttrs (old: {
          183|         src = self.fetchPypi {
             |               ^
          184|           pname = "cryptography_vectors";

Not sure what that is yet.

@nh2
Copy link
Contributor Author

nh2 commented Jan 24, 2024

Not sure what that is yet.

Aha, commit ffbf6f2 python3.pkgs.fetchPypi: deprecate in favor of top-level fetchPypi.

I force-pushed a fix.

Comment on lines 182 to 207
cryptography-vectors = (self.callPackage ../../../development/python-modules/cryptography/vectors.nix { }).overridePythonAttrs (old: {
src = fetchPypi {
pname = "cryptography_vectors";
version = cryptographyOverrideVersion;
hash = "sha256-hGBwa1tdDOSoVXHKM4nPiPcAu2oMYTPcn+D1ovW9oEE=";
};
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just disable tests in cryptography instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuperSandro2000 Is there a drawback of running the tests? Given that they are not slow, it seems nice to check that the version/src override doesn't add any bugs.

Copy link
Contributor Author

@nh2 nh2 Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a drawback of running the tests?

Ah, I see, because it requires the overridability of cryptography-vectors which you want to avoid in #281858 (comment).

OK, I've disabled tests as requested in any case with the latest force-push of this PR.

@@ -166,7 +169,73 @@ let

# Watch out for python <> boost compatibility
python = python310.override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already dropped python 3.10 support, so the situation here will only get much worse over time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ceph 19 will address this, with Python 3.13 support:

Comment on lines -27 to -29
let
cryptography-vectors = callPackage ./vectors.nix { };
in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please revert this. I do not want people to use this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.2.1 at least,
and it does not hurt to fix it now already in the way the Ceph docs say.

Also revert "nixosTests.ceph-single-node: remove dashboard check"

This reverts commit 41b27d7.
@nh2
Copy link
Contributor Author

nh2 commented Jan 24, 2024

After rebasing on master I am getting a failure now for /nix/store/f15inffyd3cvj0m3wi9v1lp2m59adbfd-python3.10-anyio-4.1.0.drv:

=================================== FAILURES ===================================
______________ test_cancelscope_propagation_when_abandoned[trio] _______________
[gw31] linux -- Python 3.10.13 /nix/store/8np01kr8qpm49m2gzjbwcjr68f162b9x-python3-3.10.13/bin/python3.10
/nix/store/r4mhvx3r2pigizgch91dvx4s7kf10baw-python3.10-pytest-7.4.3/lib/python3.10/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
/nix/store/r4mhvx3r2pigizgch91dvx4s7kf10baw-python3.10-pytest-7.4.3/lib/python3.10/site-packages/_pytest/runner.py:262: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
/nix/store/wmx47qf64ymxicnjiw05p2y54r998h5i-python3.10-pluggy-1.3.0/lib/python3.10/site-packages/pluggy/_hooks.py:493: in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
/nix/store/wmx47qf64ymxicnjiw05p2y54r998h5i-python3.10-pluggy-1.3.0/lib/python3.10/site-packages/pluggy/_manager.py:115: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
/nix/store/r4mhvx3r2pigizgch91dvx4s7kf10baw-python3.10-pytest-7.4.3/lib/python3.10/site-packages/_pytest/threadexception.py:83: in pytest_runtest_call
    yield from thread_exception_runtest_hook()
/nix/store/r4mhvx3r2pigizgch91dvx4s7kf10baw-python3.10-pytest-7.4.3/lib/python3.10/site-packages/_pytest/threadexception.py:73: in thread_exception_runtest_hook
    warnings.warn(pytest.PytestUnhandledThreadExceptionWarning(msg))
E   pytest.PytestUnhandledThreadExceptionWarning: Exception in thread AnyIO worker thread
E   
E   Traceback (most recent call last):
E     File "/nix/store/8np01kr8qpm49m2gzjbwcjr68f162b9x-python3-3.10.13/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
E       self.run()
E     File "/nix/store/1fdn4d018vfn7ygwaiy451iyp9va54nh-python3.10-anyio-4.1.0/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 830, in run
E       self.loop.call_soon_threadsafe(
E     File "uvloop/loop.pyx", line 1288, in uvloop.loop.Loop.call_soon_threadsafe
E     File "uvloop/loop.pyx", line 671, in uvloop.loop.Loop._append_ready_handle
E     File "uvloop/loop.pyx", line 703, in uvloop.loop.Loop._check_closed
E   RuntimeError: Event loop is closed

But it is a flaky test, re-running nix-store -r /nix/store/f15inffyd3cvj0m3wi9v1lp2m59adbfd-python3.10-anyio-4.1.0.drv makes it pass.

So not related to this PR.

@nh2
Copy link
Contributor Author

nh2 commented Jan 24, 2024

Ceph tests are passing again.

@nh2
Copy link
Contributor Author

nh2 commented Jan 24, 2024

I believe I addressed all feedback, merging.

@nh2 nh2 merged commit 987dc94 into NixOS:master Jan 24, 2024
23 of 24 checks passed
jonboh pushed a commit to jonboh/nixpkgs that referenced this pull request Jan 24, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.2.1 at least,
and it does not hurt to fix it now already in the way the Ceph docs say.

Also revert "nixosTests.ceph-single-node: remove dashboard check"

This reverts commit 41b27d7.
khaneliman pushed a commit to khaneliman/nixpkgs that referenced this pull request Jan 25, 2024
…1858.

Fixes NixOS#241482.

Also fix test putting cluster in unhealthy `POOL_APP_NOT_ENABLED` state;
this seems to be the default state with Ceph 18.2.1 at least,
and it does not hurt to fix it now already in the way the Ceph docs say.

Also revert "nixosTests.ceph-single-node: remove dashboard check"

This reverts commit 41b27d7.
@nh2 nh2 mentioned this pull request Feb 5, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build failure: ceph
8 participants