Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv/linux/default.nix: add gcc rebuild during bootstrap #209063

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
109 changes: 67 additions & 42 deletions pkgs/stdenv/linux/default.nix
Expand Up @@ -9,14 +9,11 @@
# is used to build all other packages (including the bootstrapFiles).
#
# Goals of the bootstrap process:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trofi, did you try splitting gcc's stage1 and stage2 into separate derivations? That would achieve the goal without having to build gcc twice. But it is a larger project than simply adding an extra bootstrap stage, so if there is a deadline or urgent need to deal with this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to explore --disable-bootstrap for gcc builds when stdenv.cc.cc and built gcc are of the same versions. But there are extra caveats to consider that might break non-bootstrap form of it. That should shave off 0.5 of gcc build time.

Splitting gcc's build system into a multiple derivations is a very big project. Looking at a smaller scale binutils / libbfd split nixpkgs has I would say it's a maintenance nightmare. Testing it in bootstrap is another level of footgun. Adding support for it upstream would make me less nervous :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already suspected that messing with gcc's stages could help, perhaps skipping some of the stages when we "build twice" anyway, but I don't know gcc's build process really so won't be able to help much. I certainly wouldn't attempt such big projects inside this PR; better leave them for later.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what the big rush is.

This PR makes it more difficult to fix the problem the right way later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "big hurry" is that we are encountering more and more packages that don't build on aarch64, which is stuck on GCC 9 by default due to the libgcc_s issue.

Copy link

@ghost ghost Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright after seven minute-long git blames I found these issues:

The commit message for this PR needs to give this background.

More info:

I'm still investigating this. Pretty sure we do not need to overhaul the entire stdenv fixpoint, on every platform in order to fix this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time believing that this is the reason for the urgency.

If this was the reason for such a rush, we could simply update the bootstrap-files instead.

That's not a long-term solution, but the copy-and-pasting business in this PR, and imposing the rebuild costs on all platforms (instead of aarch64) isn't a long-term solution either.

Rebuilding the aarch64 bootstrap-files is an easy-to-revert change. This PR is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR makes it more difficult to fix the problem the right way later on.

Why do you think it will make things more difficult to fix?

You can just remove one extra stage (and maybe shuffle around a few thing you don't want to rebuild) and add your steps. In fact if you were to change gcc outputs you could just throw all the stages out and add new ones. It's trivial if you know you want to achieve.

I'll cite the things to roll back here in full:

  # 2nd stdenv that contains our own rebuilt binutils and is used for
  # compiling our own gcc.
  #
  # resulting stage2 stdenv:
  # - coreutils, glibc: from bootstrapFiles
  # - binutils, gcc: from nixpkgs, built by bootstrapFiles toolchain
  (prevStage: stageFun prevStage {
    name = "bootstrap-stage2";

    overrides = self: super: {
      inherit (prevStage)
        ccWrapperStdenv
        bash bison gnum4 coreutils gnugrep
        patchelf perl texinfo xz;

      ${localSystem.libc} = getLibc prevStage;

      gcc-unwrapped =
        let makeStaticLibrariesAndMark = pkg:
              lib.makeOverridable (pkg.override { stdenv = self.makeStaticLibraries self.stdenv; })
                .overrideAttrs (a: { pname = "${a.pname}-stage2"; });
        in super.gcc-unwrapped.override {
        # Link GCC statically against GMP etc.  This makes sense because
        # these builds of the libraries are only used by GCC, so it
        # reduces the size of the stdenv closure.
        gmp = makeStaticLibrariesAndMark super.gmp;
        mpfr = makeStaticLibrariesAndMark super.mpfr;
        libmpc = makeStaticLibrariesAndMark super.libmpc;
        isl = makeStaticLibrariesAndMark super.isl_0_20;
        # Use a deterministically built compiler
        # see https://github.com/NixOS/nixpkgs/issues/108475 for context
        reproducibleBuild = true;
        profiledCompiler = false;
      };
    };

    # `libtool` comes with obsolete config.sub/config.guess that don't recognize Risc-V.
    extraNativeBuildInputs =
      lib.optional (localSystem.isRiscV) prevStage.updateAutotoolsGnuConfigScriptsHook;
  })

39 lines of well isolated change.

The other changes do not need to be rolled back and are quality-of-life improvements. They are prerequisites for this change to work though.

If this was the reason for such a rush, we could simply #207135 (comment) instead.

Every time new person encountered libgcc_s.so.1 mismatch it was proposed to update bootstrapTools. It never happened. I remember it ongoing for at least 6 months. Looking at gcc versions I think we are broken for a year.

Is there an issue or link for this?

There are many issues:

  • One of the more comprehensive ones is linked under this PR: Bootstrap Reform and aarch64-linux's GCC Upgrade #208412
  • Another is that nixpkgs glibc is built using bootstrapTools's gcc. If we don't upgrade gcc on a regular basis for all targets (and we don't, and I don't see us doing it) it's a very annoying property to keep in mind.

Rebuilding the aarch64 bootstrap-files is an easy-to-revert change. This PR is not.

Mechanically they are equally easy to revert. Breakage-wise they are both hard to revert as dependencies rely on environment correctness.

Copy link

@ghost ghost Jan 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other changes do not need to be rolled back and are quality-of-life improvements.

Great! Please break those out into a standalone PR and let's get them merged. I have no problems with any of those. I would be happy to immediately approve a PR with those three commits.

39 lines of well isolated change.

No, they're not well-isolated. Also, 6e67eab is more than 39 lines of changes.

It's very easy for external code to become dependent upon the existence of the extra stage due to laziness and evaluation order. Since you're making this change globally across all non-aarch64 platforms nobody will be using the pre-PR codepath, so we will never know if it breaks. This is my main objection.

If 6e67eab is truly "well isolated" it should be easy to make them conditional and enabled only on aarch64. This way we will be constantly testing our ability to revert this change (every x86_64 rebuild will verify that). If you did that -- make all the changes in 6e67eab conditional on isAarch64 -- I would would be a lot less worried about this approach.

My smaller concern is the fact that you're copy-and-pasting large blocks of code (which you admit you don't understand!) instead of at the very least factoring out the common expression. I mean... come on, do you want to work on a codebase where people do stuff like that? Nixpkgs is a pleasure to work on because it doesn't have any of that. Maybe in the "leaf packages", but definitely not in stdenv or lib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Please break those out into a standalone PR and let's get them merged.

They were broken out. The list is at #208412 (comment)

No, they're not well-isolated.

Can you be more specific? What are the dependencies that will require you to change things you won't be able to do perform or easily diagnose?

Also, 6e67eab is more than 39 lines of changes.

39 lines needs a roll back. I can make it 39 lines and reintroduce back trivial rebuilds for xz and friends. They are trivial fixes though, I don't think it's necessary, but I can make redundancy removal a separate PR as well.

It's very easy for external code to become dependent upon the existence of the extra stage due to laziness and evaluation order.

Can you give more specific of dependency you have in mind and why it will be hard to fix? In any case I will give my help and will spend the time to fix those issues in the future.

If 6e67eab is truly "well isolated" it should be easy to make them conditional and enabled only on aarch64.

I want every arch to benefit from libgcc_s.so.1 to match gcc version and have glibc built by non-bootstrap compiler. I very specifically sent out #208478 to get a consensus that state of today's glibc is a violation of the principles. This PR fixes it. aarch64 linakge fix is a natural outcome of it.

My smaller concern is the fact that you're copy-and-pasting large blocks of code (which you admit you don't understand!) instead of at the very least factoring out the common expression.

I don't understand the implications of removing static library overrides and would like to clean them up separately. I have vague understanding why they are there. I'll reiterate again: I will spend time to clean them up when/if this PR is merged. Switching from shared to static is not a small task as dynamic linking during stdenv bootstrap is very fragile due to incorrect library lookup paths. I expect to spend a lot of time on it. Moving libstdc++.so (the part if this PR) is a clear example of that. It will need to be done for every gcc library it uses. I think this PR is big enough on it's own.

I mean... come on, do you want to work on a codebase where people do stuff like that?

I am not just willing to work on that. I want to improve it in very short term future. I don't think that trivial copy if a gcc attribute is the problem for those who work on the bootstrap of linux for hacking. I want to think this PR shows it very clearly.

Nixpkgs is a pleasure to work on because it doesn't have any of that. Maybe in the "leaf packages", but definitely not in stdenv or lib.

I would like to think this PR would not inhibit anyone to work on bootstrap stdenv and would not make it less straightforward to reason about. If I read between the lines this PR makes something harder for you to work on. Can you, maybe, elaborate? Or is your concern only extra overhead it pulls in?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ericson2314 should weigh in on this PR.

# 1. final stdenv must not reference any of the bootstrap files.
# 2. final stdenv must not contain any of the bootstrap files
# (the only current violation is libgcc_s.so in glibc).
# 1. final stdenv must not reference any of the bootstrap packages.
# 2. final stdenv must not contain any of the bootstrap files.
# 3. final stdenv must not contain any of the files directly
# generated by the bootstrap code generators (assembler, linker,
# compiler). The only current violations are: libgcc_s.so in glibc,
# the lib{mpfr,mpc,gmp,isl} which are statically linked
# into the final gcc).
# compiler).
#
# These goals ensure that final packages and final stdenv are built
# exclusively using nixpkgs package definitions and don't depend
Expand Down Expand Up @@ -109,12 +106,13 @@ let
'';


# The bootstrap process proceeds in several steps.


# Create a standard environment by downloading pre-built binaries of
# coreutils, GCC, etc.

# The bootstrap process proceeds in several steps. Our high-level plan
# is the following:
#
# - build binutils with bootstrapTools
# - build gcc with bootstrapTools
# - build glibc with bootstrapTools + gcc from nixpkgs (embed libgcc_s.so.1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trofi, is there a motivation to rush this through?

Upthread, @K900 suggested a reason for urgency, but I doubt that is the motivation for reasons explained here and the fact that there is no mention of aarch64 or -moutline-atomics in the commit message.

Are ARMv8.1+ atomics the justification for all the technical debt that this PR incurs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about ARMv8.1+ atomics per se. What's happening here is:

  • GCC 11 adds multiple symbols related to aarch64 atomics to libgcc_s, which our bootstrap GCC 9 libgcc_s does not have
  • which means applications built with GCC 11 expect those symbols to be there, and crash when they aren't
  • which means we can't use GCC 11 on aarch64 without patching things to link the correct libgcc_s
  • which is starting to hold back a lot of things that need C++17

For some examples of that, see #209113 (rustc), #201485 (llvm), https://github.com/NixOS/nixpkgs/blob/45ccf597070d2321be4984d2f653e75f5099e4ae/pkgs/desktops/plasma-5/kwin/0001-Revert-x11-Refactor-output-updates.patch (kwin).

"Just update bootstrap-tools" also comes with its own issues as seen in #207135.

I think it's worth building towards a correct solution here, so we don't have to deal with that again.

# - build gcc with nixpkgs toolchain
Comment on lines +113 to +115
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building gcc twice is going to slow down staging rebuilds... there is no parallelism at this point in the build graph because everything depends on stdenv. @vcunat are you okay with this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I've been following this. It seems worth building gcc twice, as it should address quite a lot of issues. The slow-down is mainly for development of stdenv-rebuilding stuff (and bisection); on Hydra there's way more weight on the total amount of work over all packages and that won't change noticeably.

We can instead try making the bootstrap process a bit less slow than in the first version of this PR, partially inside this PR (or the same staging-next iteration).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on Hydra there's way more weight on the total amount of work over all packages

Oh, I didn't mean it would require adding more machines to the Hydra pool. The issue isn't the total number of CPU cycles for a rebuild stdenv. The issue is the total number of wall-clock hours needed to rebuild stdenv, regardless of how many builders you have.

This PR adds several serial stages to the bootstrap (derivations that can't be built in parallel with anything else), and some of those derivations (cough cough gettext) are effectively single-threaded builds that don't even make meaningful use of multiple cores.

It seems worth building gcc twice,

Except that we don't need to do that to get any of the benefits...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that for Hydra the wallclock hours for stdenv don't seem so important because it always needs to build also very many other packages after that. (The total time to rebuild everything is long anyway and very often there's lots of work on another branch that can be done in parallel.)

I don't see rush. I mean, the author did write upfront that it would probably take weeks to merge. I perceived split stages as a further improvement of this, if someone can do that, but maybe it's my lack of knowledge of gcc's build process. Either way, I believe it would certainly be nice to have at least some solution to the issues within a couple months, because aarch64-linux is accumulating hacks in the meantime (old gcc version, recently rustc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems worth building gcc twice,
Except that #208478 (comment) to get any of the benefits...

I think you are arguing for an implementation detail. I would say the comment would not even change if we were to implement your suggestion. You better use latest gcc to build glibc. You better use latest glibc to build gcc. Building gcc as is (this PR) or shredding it in parts by making changes in gcc's build system (AFAIU that's the alternative you propose) does not change this high-level layout. Does it?

Copy link
Contributor

@Shawn8901 Shawn8901 Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am absolute no expert on the bootstrap topic, but I am currently playing a bit around on compiling my system with x86-64-v3. As the boostrapTools version is v9 it does not know those architectures. When building on the default generic we may not require a second rebuild on GCC (as the gcc9 may already build a newer GCC which fits the flags), i one would require features of a newer GCC to build the final GCC, we would require a second GCC rebuild right?

Building x86-64-vX is still a WIP and quite controversial discussion on that PR but on my opinion it would be nice to get it working and this PR does do a lot of nice changes (where I at least guess I understand more about bootstrap process) why I thought leaving the question here may be a good place

If I am wrong or so please just ignore that, I am absolute no expert and still learning on the topic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the boostrapTools version is v9 it does not know those architectures

You mean boostrapTools is on gcc-8 and it might not be enough to generate the code for -v3? It depends. If gcc-8 supports -mavx2, -mmovbe and friends that might be good enough. Worth checking by trying to build a simple program.

But generally yes, you need to make sure not to pass unsupported flags to gcc.

Another note: it should be fine to mix code from a default ISA implied by gcc and fresh v3 set that adds extra instructions on top. No need to do anything beyond the bootstrap .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcc-8 seems to recognize at least those 2 I tried:

$ nix-shell -E 'with import <nixpkgs> {}; pkgs.mkShell.override { stdenv = pkgs.gcc8Stdenv; } {}'

$ gcc -v
Using built-in specs.
COLLECT_GCC=/nix/store/m8hkk78wvnqa0pr5162b0vx26rl5lvvw-gcc-8.5.0/bin/gcc
COLLECT_LTO_WRAPPER=/nix/store/m8hkk78wvnqa0pr5162b0vx26rl5lvvw-gcc-8.5.0/libexec/gcc/x86_64-unknown-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: 
Thread model: posix
gcc version 8.5.0 (GCC) 

$ printf "int main() {}" | gcc -x c -c - -o /dev/null -mavx2 -mmovbe

$ echo $?
0

Copy link
Contributor

@Shawn8901 Shawn8901 Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I meant initially the arch flags (there is something like -march=x86-64-v3 on newer compiler), setting single flags would already work on the old GCC, that's right as this GCC already supports CPUs which are v3s. Current I try to filter out those flags in the bootstrap process.
what my thought was, if the bootstraped GCC is kept without recompiling from a nixpkgs GCC, that should result in a different GCC when doing a recompile, as recompiling then is from a nixpkgs GCC (if I am right on that) and this was basically the question/input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had no idea -march=x86-64-v3 was a thing. Yeah, you either need to rebuild gcc again if you already filtered them out or use CFLAGS_FOR_TARGET (and friends) to only affect the final result. It is certainly using flags for final libraries. Worth double-checking if the build process uses the same for gcc executable itself.


# Download and unpack the bootstrap tools (coreutils, GCC, Glibc, ...).
bootstrapTools = import (if localSystem.libc == "musl" then ./bootstrap-tools-musl else ./bootstrap-tools) {
Expand Down Expand Up @@ -284,16 +282,56 @@ in
};
})


# 2nd stdenv that contains our own rebuilt binutils and is used for
# compiling our own Glibc.
# compiling our own gcc.
#
# resulting stage2 stdenv:
# - coreutils, glibc, gcc: from bootstrapFiles
# - binutils: from nixpkgs, built by bootstrapFiles toolchain
# - coreutils, glibc: from bootstrapFiles
# - binutils, gcc: from nixpkgs, built by bootstrapFiles toolchain
(prevStage: stageFun prevStage {
name = "bootstrap-stage2";

overrides = self: super: {
inherit (prevStage)
ccWrapperStdenv gettext
coreutils gnugrep
perl gnum4 bison;

${localSystem.libc} = getLibc prevStage;

gcc-unwrapped =
let makeStaticLibrariesAndMark = pkg:
lib.makeOverridable (pkg.override { stdenv = self.makeStaticLibraries self.stdenv; })
.overrideAttrs (a: { pname = "${a.pname}-stage2"; });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing a whole lot of packages being rebuilt an extra time with this PR -- binutils, gettext, and others.

Please mark the extra builds with -stageX in their pname like above so we can tell them apart, and so we can understand the rebuild impact of this PR. Right now we don't have a good handle on it.

For example:

# before this PR
$ nix-store -q --requisites $(nix-instantiate . -A stdenv --argstr system powerpc64le-linux) | grep binutils-2.39.drv | wc -l
3
# with this PR
$ nix-store -q --requisites $(nix-instantiate . -A stdenv --argstr system powerpc64le-linux) | grep binutils-2.39.drv | wc -l
4

Copy link

@ghost ghost Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this is one of the reasons why we don't build gcc twice. Everything depends on gcc, so an extra rebuild of gcc means an extra rebuild of all of its dependencies.

I think avoiding this mass-rebuild may have been the motivation for the lib{mpfr,mpc,gmp,isl}.a hack, which has been part of stdenv for over a decade.

Copy link

@ghost ghost Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List of packages with more than one build after the change:

$ nix-store --query ... | uniq -c | ...

Using uniq -c here is misleading. The fact that the lists in the first comment of this PR are roughly the same length is misleading; it's the numbers in the first column that matter. Try uniq -D instead:

nix-store -q --requisites $(nix-instantiate . -A stdenv) \
| cut -b45- \
| sort \
| grep '\.drv$' \
| uniq -D

It also matters a lot which packages are rebuilt. The bootstrap-stageX-stdenv-linux.drv derivations are just linkFarms; they are basically instantaneous. But gettext takes forever (very poor internal build parallelism) and is a serialization point in the build graph.

Based on a diff of the output of the command above before and after, it looks like we've added:

  • two more rebuilds of gcc (only one expected -- update the PR title?)
  • one more rebuild of binutils
  • one more rebuild of gettext 🤮
  • two more rebuilds of gmp-with-cxx
  • two more rebuilds of mpfr
  • one more build of libxcrypt
  • one more build of zlib
  • two more rebuilds of which

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using uniq -c here is misleading.

No, uniq -c removes identical duplicate store paths when target of an arrow is the same store path with multiple inputs. Store hash is removed after the uniq. It should tell the correct numbers of rebuilds (maybe slightly outdated as I added extra xz rebuild since).

two more rebuilds of gcc (only one expected -- update the PR title?)

No, I still think it introduces one extra gcc build. Building it twice is not very trivial from standpoint of used references. You can check which exact stdenv triggers the builds to make sure:

$ nix-store --query --graph $(nix-instantiate -A stdenv) |& grep -P 'stdenv.* -> .*gcc-11'

"3bxxwn8hbz725i081fqlylqljyp5pk8m-bootstrap-stage4-stdenv-linux.drv" -> "1ikpv8jbf5z18hp0gkiw27234ifvgmbk-gcc-11.3.0.drv" [color = "red"];
"xck2dhlvi35c0lnc762z57sh1lv22n2d-bootstrap-stage2-stdenv-linux.drv" -> "imhrq50yi7mkx4wgbv6sifjyhbi1yymn-gcc-11.3.0.drv" [color = "red"];

Tl;DR: one in stage2 (new addition) and one in stage4 (pre-existing one).

Or you can look up gcc nodes in slightly filtered .svg file to ease visual inspection:

$ nix-store --query --graph $(nix-instantiate -A stdenv) |
    grep -vP '[.](sh|tar|bash|patch|c|diff)|bash52-0|wrapper|hook|expand-response-params' |
    dot -Tsvg > /tmp/a.svg; firefox /tmp/a.svg

I still claim we have only 2 gcc builds here: https://trofi.github.io/posts.data/275-nixpkgs-bootstrap-deep-dive/08-nixpkgs-bootstrap-tree.svg.

in super.gcc-unwrapped.override {
# Link GCC statically against GMP etc. This makes sense because
# these builds of the libraries are only used by GCC, so it
# reduces the size of the stdenv closure.
gmp = makeStaticLibrariesAndMark super.gmp;
mpfr = makeStaticLibrariesAndMark super.mpfr;
libmpc = makeStaticLibrariesAndMark super.libmpc;
isl = makeStaticLibrariesAndMark super.isl_0_20;
# Use a deterministically built compiler
# see https://github.com/NixOS/nixpkgs/issues/108475 for context
reproducibleBuild = true;
profiledCompiler = false;
};
Comment on lines +303 to +318
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let makeStaticLibrariesAndMark = pkg:
lib.makeOverridable (pkg.override { stdenv = self.makeStaticLibraries self.stdenv; })
.overrideAttrs (a: { pname = "${a.pname}-stage2"; });
in super.gcc-unwrapped.override {
# Link GCC statically against GMP etc. This makes sense because
# these builds of the libraries are only used by GCC, so it
# reduces the size of the stdenv closure.
gmp = makeStaticLibrariesAndMark super.gmp;
mpfr = makeStaticLibrariesAndMark super.mpfr;
libmpc = makeStaticLibrariesAndMark super.libmpc;
isl = makeStaticLibrariesAndMark super.isl_0_20;
# Use a deterministically built compiler
# see https://github.com/NixOS/nixpkgs/issues/108475 for context
reproducibleBuild = true;
profiledCompiler = false;
};
super.gcc-unwrapped.override {
# Use a deterministically built compiler
# see https://github.com/NixOS/nixpkgs/issues/108475 for context
reproducibleBuild = true;
profiledCompiler = false;
};

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for this hack if we're building gcc twice -- you should remove it from stage3 as well (github won't let me make a ```suggestion for that one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the reason for building static libraries in the first place, thus cargo-culted it as is in both places. My vague understanding is that we do it to not confuse nixpkgs variants of the libraries with bootstrapTools variants of the same libraries.

I would like to clean it up separately (probably outside this PR, as it's already quite big) and remove the need for static builds entirely. I think we can do it in a similar fashion we sorted libstdc++.so clash in this PR: remove unversioned .so files completely and possibly move out versioned .so or .a files to a private directory. But I need to make sure those libraries are not used for linking anywhere before gcc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the reason for building static libraries in the first place

Yeah I know; it took me a long time to understand it.

I really appreciate the work you did in #208478 to document some of this stuff, but I feel like this PR is very very rushed, without a full understanding of the effects it will have.

I would like to clean it up separately (probably outside this PR, as it's already quite big)

What's the big hurry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate the work you did in #208478 to document some of this stuff, but I feel like this PR is very very rushed, without a full understanding of the effects it will have.

I'm quite confident in final results this PR produces. I'm not confident changing initial gcc build process WRT libraries presented in bootstrapTools. I needs a lot of staring at the library search paths in many packages. This PR does not change that. Thus it's a lot easier to reason about the change. Changing it separately is a lot easier for me than glob everything together in a single PR.

I would like to clean it up separately (probably outside this PR, as it's already quite big)
What's the big hurry?

I am in no hurry. We either do it or not.

Copy link

@ghost ghost Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm rebuilding #209459 plus rebootstrap=stdenv.isAarch64 and gcc=gcc11.

I believe it should fix the aarch64-gcc10+ problem with zero overhead on non-aarch64 platforms and (most important) no technical debt.

It is almost finished on powerpc64le with no problems. My arm builders are slow so that will take overnight.

};

# `libtool` comes with obsolete config.sub/config.guess that don't recognize Risc-V.
extraNativeBuildInputs =
lib.optional (localSystem.isRiscV) prevStage.updateAutotoolsGnuConfigScriptsHook;
})

# 3rd stdenv that contains our own rebuilt binutils and gcc and is
# used for compiling our own Glibc.
#
# resulting stage3 stdenv:
# - coreutils: from bootstrapFiles
# - glibc, binutils, gcc: from nixpkgs, built by bootstrapFiles toolchain
(prevStage: stageFun prevStage {
name = "bootstrap-stage3";

overrides = self: super: {
inherit (prevStage)
ccWrapperStdenv gettext
Expand Down Expand Up @@ -357,15 +395,16 @@ in
})


# Construct a third stdenv identical to the 2nd, except that this
# Construct a fourth stdenv identical to the 3rd, except that this
# one uses the rebuilt Glibc from stage2. It still uses the recent
# binutils and rest of the bootstrap tools, including GCC.
#
# resulting stage3 stdenv:
# resulting stage4 stdenv:
# - coreutils, gcc: from bootstrapFiles
# - glibc, binutils: from nixpkgs, built by bootstrapFiles toolchain
# - binutils, gcc: from nixpkgs, built by bootstrapFiles toolchain
# - glibc: from nixpkgs, built by nixpkgs toolchain
(prevStage: stageFun prevStage {
name = "bootstrap-stage3";
name = "bootstrap-stage4";

overrides = self: super: rec {
inherit (prevStage)
Expand All @@ -376,7 +415,7 @@ in
gcc-unwrapped =
let makeStaticLibrariesAndMark = pkg:
lib.makeOverridable (pkg.override { stdenv = self.makeStaticLibraries self.stdenv; })
.overrideAttrs (a: { pname = "${a.pname}-stage3"; });
.overrideAttrs (a: { pname = "${a.pname}-stage4"; });
in super.gcc-unwrapped.override {
# Link GCC statically against GMP etc. This makes sense because
# these builds of the libraries are only used by GCC, so it
Expand All @@ -398,21 +437,15 @@ in
})


# Construct a fourth stdenv that uses the new GCC. But coreutils is
# Construct a fifth stdenv that uses the new GCC. But coreutils is
# still from the bootstrap tools.
#
# resulting stage4 stdenv:
# resulting stage5 stdenv:
# - coreutils: from bootstrapFiles
# - glibc, binutils: from nixpkgs, built by bootstrapFiles toolchain
# - gcc: from nixpkgs, built by bootstrapFiles toolchain. Can assume
# it has almost no code from bootstrapTools as gcc bootstraps
# internally. The only exceptions are crt files from glibc
# built by bootstrapTools used to link executables and libraries,
# and the bootstrapTools-built, statically-linked
# lib{mpfr,mpc,gmp,isl}.a which are linked into the final gcc
# (see commit cfde88976ba4cddd01b1bb28b40afd12ea93a11d).
# - binutils: from nixpkgs, built by bootstrapFiles toolchain
# - glibc, gcc: from nixpkgs, built by nixpkgs toolchain
(prevStage: stageFun prevStage {
name = "bootstrap-stage4";
name = "bootstrap-stage5";

overrides = self: super: {
# Zlib has to be inherited and not rebuilt in this stage,
Expand All @@ -433,7 +466,7 @@ in
# force gmp to rebuild so we have the option of dynamically linking
# libgmp without creating a reference path from:
# stage5.gcc -> stage4.coreutils -> stage3.glibc -> bootstrap
gmp = lib.makeOverridable (super.gmp.override { stdenv = self.stdenv; }).overrideAttrs (a: { pname = "${a.pname}-stage4"; });
gmp = lib.makeOverridable (super.gmp.override { stdenv = self.stdenv; }).overrideAttrs (a: { pname = "${a.pname}-stage5"; });

# To allow users' overrides inhibit dependencies too heavy for
# bootstrap, like guile: https://github.com/NixOS/nixpkgs/issues/181188
Expand Down Expand Up @@ -469,15 +502,7 @@ in
# binutils built.
#
# resulting stage5 (final) stdenv:
# - coreutils, binutils: from nixpkgs, built by nixpkgs toolchain
# - glibc: from nixpkgs, built by bootstrapFiles toolchain
# - gcc: from nixpkgs, built by bootstrapFiles toolchain. Can assume
# it has almost no code from bootstrapTools as gcc bootstraps
# internally. The only exceptions are crt files from glibc
# built by bootstrapTools used to link executables and libraries,
# and the bootstrapTools-built, statically-linked
# lib{mpfr,mpc,gmp,isl}.a which are linked into the final gcc
# (see commit cfde88976ba4cddd01b1bb28b40afd12ea93a11d).
# - coreutils, binutils, glibc, gcc: from nixpkgs, built by nixpkgs toolchain
(prevStage: {
inherit config overlays;
stdenv = import ../generic rec {
Expand Down