Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgresql: split -lib and -dev outputs cleanly #294504

Open
wants to merge 6 commits into
base: staging
Choose a base branch
from

Conversation

wolfgangwalther
Copy link
Contributor

@wolfgangwalther wolfgangwalther commented Mar 9, 2024

Description of changes

TLDR: This splits outputs for the postgresql package cleanly, using a minimum of patching and/or nuking of references. Instead, dead references to other store paths are removed at compile and link time. The minimal -lib output is then added as a top-level libpq package.

This builds on my previous work in #292993 and #293996 and contains commits from those two branches. Please ignore the first ~20 commits here and provide feedback for those in the two other PRs. This PR starts with treewide: prepare pg_config moving to dev. This commit (by @jtojnar) and the 4 commits after (by @SuperSandro2000) were cherry-picked from #179962. Compared to #179962, the approach taken in this PR is simpler and requires far less patching of postgresql source code. In fact, the overall size of patches is reduced compared to before and fewer references needed to be removed afterwards.

While the most work here involved splitting the -dev output, a clean -lib output comes with that naturally. Thus, this PR replaces #179962 and #273175. The cleaned up lib output was discussed in #61580 (comment) (@thoughtpolice). With this output available, it's easy to create a top-level attribute libpq pointing to the latest version's lib output automatically as suggested by @danbst in #61580 (comment), which should resolve #61580 (@nh2). A true separate libpq package as proposed in #234470 (@szlend) would then not be necessary. This PR will be followed-up by another PR to build postgresql v16+ based on meson instead of autotools as mentioned in #292993 (comment). With a bit of patching (of which I have a working PoC already), this will allow us to build postgresql (at least libpq, but possibly even more) in pkgsStatic - which is probably the biggest thing that didn't work so far and why a separate libpq package was asked for in the first place.

After the split, the outputs contain the following:

  • default out:
    • All binaries, except pg_config and ecpg which are in the dev output. The default output thus includes both server-side and client-side binaries.
    • All contrib/core extension related files: server-libraries, extension control files etc.
  • lib:
    • libpq
    • libecpg, including libecpg_compat and libpgtypes
    • theoretically the /share/locale folders, if there were any built, which is not the case right now
  • doc and man as before
  • dev:
    • all header files in /include (both server and client)
    • the pgxs infrastructure to build extensions
    • static support libraries libpgport*.a, libpgcommon*.a and libpgfeutils.a
    • pkg-config files
    • ecpg and pg_config binaries as mentioned above

pg_config outputs all paths correctly. Example:

BINDIR = /nix/store/cc6n22zq8xg30d04km0l5ablzn3c8sw6-postgresql-13.14/bin
DOCDIR = /nix/store/i3wz2pp3xdxr36ydawhp4zvgfg96v1xm-postgresql-13.14-doc/share/doc/postgresql
HTMLDIR = /nix/store/i3wz2pp3xdxr36ydawhp4zvgfg96v1xm-postgresql-13.14-doc/share/doc/postgresql
INCLUDEDIR = /nix/store/0mn3a190bwza34dpgnaiwmhm2b9v30xp-postgresql-13.14-dev/include
PKGINCLUDEDIR = /nix/store/0mn3a190bwza34dpgnaiwmhm2b9v30xp-postgresql-13.14-dev/include
INCLUDEDIR-SERVER = /nix/store/0mn3a190bwza34dpgnaiwmhm2b9v30xp-postgresql-13.14-dev/include/server
LIBDIR = /nix/store/1785qgnghmk01k9q4x5bysywb6alrsfs-postgresql-13.14-lib/lib
PKGLIBDIR = /nix/store/cc6n22zq8xg30d04km0l5ablzn3c8sw6-postgresql-13.14/lib
LOCALEDIR = /nix/store/1785qgnghmk01k9q4x5bysywb6alrsfs-postgresql-13.14-lib/share/locale
MANDIR = /nix/store/ad6qwl677y7x9bcrrxps4sfvchdjk03f-postgresql-13.14-man/share/man
SHAREDIR = /nix/store/cc6n22zq8xg30d04km0l5ablzn3c8sw6-postgresql-13.14/share/postgresql
SYSCONFDIR = /etc/postgresql
PGXS = /nix/store/0mn3a190bwza34dpgnaiwmhm2b9v30xp-postgresql-13.14-dev/lib/pgxs/src/makefiles/pgxs.mk

One thing that is still missing right now, is that the -dev/lib folder must be added to the pkg-config file, so that libpgcommon, libpgport etc. can be found. It's still unclear to me whether we need to add this to some other parts of the pg_config output, too, to make it possible for pg_config-based builds to actually find those static libraries. This should work, see comment below. I think this question was also raised by @szlend in #273175 (comment).

Maintainer ping, if not mentioned above already: @globin @marsam @ivan @Ma27

Things done

I built all 5 versions (12-16) with and without JIT support, with glibc (default) and via pkgsMusl. I also built .tests for the glibc variants (musl didn't work for me for unrelated reasons). I built all extensions with various versions (which resulted in #294457). This works very well, so far.

What I did not build, yet, is any downstream packages actually depending on postgresql and the new -dev output. This will be the next step, but I would like to put this PR out for feedback already.

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@Ma27 Ma27 marked this pull request as draft March 9, 2024 16:52
@Ma27
Copy link
Member

Ma27 commented Mar 9, 2024

Converting to a draft until the base PRs are merged.

@Ma27
Copy link
Member

Ma27 commented Mar 9, 2024

Compared to #179962, the approach taken in this PR is simpler and requires far less patching of postgresql source code. In fact, the overall size of patches is reduced compared to before and fewer references needed to be removed afterwards.

Yeah, from a first glance I prefer this PR over the alternatives.
But I think the refactorings are still a good idea, so I'll focus on them for now.

@wolfgangwalther wolfgangwalther mentioned this pull request Mar 10, 2024
13 tasks
@wolfgangwalther
Copy link
Contributor Author

With this output available, it's easy to create a top-level attribute libpq pointing to the latest version's lib output automatically

I don't think this libpq -> postgresql_latest.lib top-level attribute will actually work. IIUC, when passing postgresql as a buildInput, it is actually using lib.getDev behind the scenes, to use the dev output of the postgresql package. But this won't work with this kind of libpq package - because lib.getDev postgresql.lib does not return the -dev output, but still only the -lib output. Without any of the headers, pkg-config files and static libraries... linking against libpq will certainly fail.

Of course the fix is simple - just make libpq point to postgresql_latest directly and not the lib output, which will have everything we actually wanted to, now that the outputs are split cleanly.

@wolfgangwalther
Copy link
Contributor Author

One thing that is still missing right now, is that the -dev/lib folder must be added to the pkg-config file, so that libpgcommon, libpgport etc. can be found.

I wonder how this would work before this PR anyway. Currently, those libraries are in the default output, but neither pg_config nor the pkg-config files have any mention of the default output's /lib directory.

When I remove those libraries entirely, then postgresql.pkgs.pg_auto_failover fails to build, because of missing libpgport and libpgcommon. But, even after moving those libs to the -dev output, the build succeeds. The linker flags look like this, though:

...
-L /nix/store/fnqpsn19nlyh45inikngy896fzf92d4z-postgresql-15.6/lib
-L /nix/store/gp2rvnn2xmm6k9iws48f7kcjgjd1wzrw-postgresql-15.6-lib/lib
-L/nix/store/70y8lpggw1jxmkgcmxfpp769z1y2cnpb-libxml2-2.12.5/lib
-L/nix/store/5wzi64qpd6i1zn3bsg9vm0l2l2n85m25-lz4-1.9.4/lib
-L/nix/store/9l0rfic1ax0ywx63qb97wyhycpcmy1cx-zstd-1.5.5/lib
...

No mention of the -dev output anywhere, but it still works fine.

Without knowing exactly what the magic behind the scenes does here, I conclude that this will work just fine without pg_config / pkg-config returning another -dev/lib folder.

@szlend
Copy link
Member

szlend commented Mar 16, 2024

I don't think this libpq -> postgresql_latest.lib top-level attribute will actually work. IIUC, when passing postgresql as a buildInput, it is actually using lib.getDev behind the scenes, to use the dev output of the postgresql package. But this won't work with this kind of libpq package - because lib.getDev postgresql.lib does not return the -dev output, but still only the -lib output. Without any of the headers, pkg-config files and static libraries... linking against libpq will certainly fail.

Yep, I came to the same conclusion a while back.

Of course the fix is simple - just make libpq point to postgresql_latest directly and not the lib output, which will have everything we actually wanted to, now that the outputs are split cleanly.

I don't think this is a great solution either. I would find it pretty surprising if installing libpq included the entire postgresql server. In that case I'd rather just not have a libpq package ta all.

Alternatively I could see libpq be a variant of the posgresql package with a few overrides that make it skip building the server bits. But iirc the package maintainers were against this idea.

One thing that is still missing right now, is that the -dev/lib folder must be added to the pkg-config file, so that libpgcommon, libpgport etc. can be found.

I wonder how this would work before this PR anyway. Currently, those libraries are in the default output, but neither pg_config nor the pkg-config files have any mention of the default output's /lib directory.

This is one of the reason I still think a separate libpq package is the right move. This is just not solvable in any reasonable way with a single package with multiple outputs. pg_config can only refer to one lib, and there are two of them. It's as simple as that.

@tie
Copy link
Member

tie commented Jun 24, 2024

👍 for avoiding wrappers with paths-for-split-outputs.patch.

@wolfgangwalther
Copy link
Contributor Author

Note to myself to investigate: #322025 (comment) mentions that we still have -dev references via CONFIGURE_ARGS in the default output here. This was not supposed to happen, IIRC.

@tie
Copy link
Member

tie commented Jun 24, 2024

Note to myself to investigate: #322025 (comment) mentions that we still have -dev references via CONFIGURE_ARGS in the default output here. This was not supposed to happen, IIRC.

In particular, in bin/postgres for CONFIGURE_ARGS and other recorded build information.

$ rg --binary -- -dev
bin/postgres: binary file matches (found "\0" byte around offset 7)

This is for select pg_config() and select * from pg_config (see src/backend/utils/misc/pg_config.c), and otherwise CONFIGURE_ARGS do not have any meaningful use case. Perhaps we can actually nuke the references in CONFIGURE_ARGS as the linked PR does? And also # CPPFLAGS += -DVAL_.

@tie
Copy link
Member

tie commented Jun 24, 2024

The following patch based on #322025 avoids these references in the output. That is, it removes references from CONFIGURE_ARGS and does not record VAL_{CC,CPPFLAGS,…}.

diff --git a/pkgs/servers/sql/postgresql/generic.nix b/pkgs/servers/sql/postgresql/generic.nix
index 2bc8e922205a..01a75227f245 100644
--- a/pkgs/servers/sql/postgresql/generic.nix
+++ b/pkgs/servers/sql/postgresql/generic.nix
@@ -7,6 +7,8 @@ let
       , pkg-config, libxml2, tzdata, libkrb5, substituteAll, darwin
       , linux-pam
       , removeReferencesTo
+      , writeScript
+      , nukeReferences
 
       # This is important to obtain a version of `libpq` that does not depend on systemd.
       , systemdSupport ? lib.meta.availableOn stdenv.hostPlatform systemd && !stdenv.hostPlatform.isStatic
@@ -108,6 +110,7 @@ let
       makeWrapper
       pkg-config
       removeReferencesTo
+      nukeReferences
     ]
       ++ lib.optionals jitSupport [ llvmPackages.llvm.dev ];
 
@@ -175,6 +178,31 @@ let
       substituteInPlace "src/Makefile.global.in" --subst-var out
       # Hardcode the path to pgxs so pg_config returns the path in $dev
       substituteInPlace "src/common/config_info.c" --subst-var dev
+
+      # Do not record CC and related build information. See code reference:
+      # https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/common/Makefile;h=3d83299432b948b08e18f8371d514ba70c026845;hb=HEAD#l34
+      substituteInPlace src/common/Makefile \
+        --replace-fail \
+        "override CPPFLAGS += -DVAL_" \
+        "# override CPPFLAGS += -DVAL_"
+
+      # Record arguments without store references.
+      substituteInPlace configure \
+        --replace-fail \
+        '#define CONFIGURE_ARGS "$ac_configure_args"' \
+        '#define CONFIGURE_ARGS "$fakeConfigureArgs"'
+      # PostgreSQL compares configure script timestamps.
+      touch -r configure.${if atLeast "14" then "ac" else "in"} configure
+    '';
+
+    # Avoid implicit references in the recorded configure arguments.
+    configureScript = writeScript "postgresql-configure-wrapper" ''
+      printf %s "''${*@Q}" >fake-configure-args
+      nuke-refs fake-configure-args
+      fakeConfigureArgs=$(<fake-configure-args)
+      rm fake-configure-args
+      export fakeConfigureArgs
+      exec ./configure "$@"
     '';
 
     postInstall =

This is seems to be a supported behavior, see

@wolfgangwalther
Copy link
Contributor Author

we still have -dev references via CONFIGURE_ARGS in the default output here.

First of all, I added a disallowedRequisites = zlib.dev to the out and lib outputs to catch this case.

This is for select pg_config() and select * from pg_config (see src/backend/utils/misc/pg_config.c), and otherwise CONFIGURE_ARGS do not have any meaningful use case. Perhaps we can actually nuke the references in CONFIGURE_ARGS as the linked PR does? And also # CPPFLAGS += -DVAL_.

My thoughts about this:

  • We need to remove those references from the pg_config system information view. There's no way around that, if we want to reduce closure size.
  • If possible, it would be great if we could keep those references in the pg_config binary, which resides in the -dev output anyway.

The only way I found to achieve both is to just empty the pg_config view with a patch entirely. It will just return nothing now. That's not too bad, because 90% of it's contents would be nuked path references anyway, so quite useless. This feels like a cleaner approach overall: We can't provide a useful pg_config view, so let's not provide any output at all.

This also made me realize that the postgres binary contains some more references to the -dev output, not only via the pg_config view. Thus, I adjusted the comment for the remove-references-to accordingly.

I left this change as a fixup commit in 02e2ecd for you to have a look at. @Ma27 @tie WDYT about this approach?

@tie
Copy link
Member

tie commented Jul 7, 2024

The only way I found to achieve both is to just empty the pg_config view with a patch entirely. It will just return nothing now. That's not too bad, because 90% of it's contents would be nuked path references anyway, so quite useless. This feels like a cleaner approach overall: We can't provide a useful pg_config view, so let's not provide any output at all.

This sounds reasonable, at least I’m not aware of any pg_config view use cases.

Haven’t checked whether fixup commit 02e2ecd still retains $NIX_CC and $NIX_BINTOOLS references when CC, CXX, LD and other variables are set to an absolute path (see PR #314920 that I’ve been working on for some time).

@wolfgangwalther
Copy link
Contributor Author

Haven’t checked whether fixup commit 02e2ecd still retains $NIX_CC and $NIX_BINTOOLS references when CC, CXX, LD

AFAICT from the postgres source, those VAL_xxx variables are only used for the pg_config view and binaries, so this should be good.

@Ma27
Copy link
Member

Ma27 commented Jul 11, 2024

Eh sorry, I'm not sure if I read about this PR being unblocked and forgot or if I just missed it. Anyways, apologies! But hey, this has been a pleasant surprise at least ;-)

I left this change as a fixup commit in 02e2ecd for you to have a look at. @Ma27 @tie WDYT about this approach?

I'm a little torn on this. On one hand I like it (and it solves the output issue!), OTOH this seems a little... unexpected?
But given that I'm not aware of any actual use for this, I'm willing to accept that change as long as it's well-documented in both the manual and the release-notes.

IIRC, @Ma27 you added the JIT-stuff and removed the dependency on clang-wrapper from those files explicitly

I can try it out. I mean, the jit test should explode already if that's the case (but to be sure, I can probably dogfood the change on one of my private systems as soon as I've reviewed it).

@Ma27
Copy link
Member

Ma27 commented Jul 26, 2024

Change is looking pretty promising now. Letting it run on my systems to see if I hit any regressions. Other than that, after the enxt iteration this is hopefully good to go.

@reckenrode
Copy link
Contributor

#307880 has landed in master. The Darwin LTO issues should be resolved now. If not, please ping me, so I can investigate.

Dynamic modules are technically libraries, but are not used by other packages.
Instead they are loaded by PostgreSQL itself, e.g. as extensions. Those just
increased the size of the lib output without benefit.

This removes the NIX_PGLIBDIR hack introduced in NixOS#17838 and instead makes sure
that pg_config always returns the correct PGLIBDIR. The test for postgis
introduced in the same PR is still passing with the change.
This library is used by other packages, so should be in the lib output.

By removing unused sections, libecpg will not contain any references to other
outputs and thus does not increase the closure for the lib output anymore.
This will also help massively when splitting a dev output later.

As a side-effect, this also unbreaks pkgsMusl.postgresql_12_jit and
pkgsMusl.postgresql_13_jit. For, at least to me, unknown reasons, those build
fine now.
This splits a dev output to make the default output not depend on any
build dependencies anymore. This also avoids removing references from
pgxs' Makefile this way, which should, at least theoretically, be good
to build extensions via pgxs, making sure they use the same tooling.

ecpg is the "embedded SQL C preprocessor", which is certainly a dev
tool.

Most important, for closure size anyway, is to move pg_config to the dev
output, since it retains paths to all the other outputs.

The only thing with references to the dev output remaining is then the
postgres binary itself. It contains all the output paths, because it
shows those in the pg_config system view. There is no other way than
to nuke those references to avoid circular dependencies between outputs
- and blowing up closure size again.
The !isDarwin condition seems to have been put in place only because of the
use of patchelf, which in turn seemed to be necessary because of nuke-refs.

Replacing the big nuke with the more granular remove-references-to allows
enabling this for darwin, too.
This prevents silently ignoring errors from "find".
glibc is not available on darwin anyway and isGnu is just shorter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants