Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: support RHEL/CentOS 6 #13102

Merged
merged 2 commits into from Feb 3, 2017
Merged

build: support RHEL/CentOS 6 #13102

merged 2 commits into from Feb 3, 2017

Conversation

tamird
Copy link
Contributor

@tamird tamird commented Jan 23, 2017

  • cockroachdb/builder is now based on ubuntu xenial
  • cockroachdb/builder targets RHEL/CentOS 6 versions of linux and glibc
  • cockroachdb/builder is upgraded to Go 1.7.5
  • bootstrap-debian is mostly absorbed into cockroachdb/builder
  • STATIC=1 is replaced with TYPE=release
  • a new Makefile option (type=static) produces fully static binaries by
    linking to musl instead of glibc
  • cockroach version now reports the build type (release/static)

This change is Reviewable

@petermattis
Copy link
Collaborator

Nice that you got this working, though I'm mildly anxious about merging this. We have a significant body of C++ code that uses libc/libpthread that I worry this will introduce instability. I'm open to hearing the other side if you think this concern is misplaced.

I think we should run a musl-based build on one or more of the test clusters for a decent amount of time. Can you pull out the non-contentious parts of this change into a separate PR?

@tamird tamird mentioned this pull request Jan 24, 2017
@a-robinson
Copy link
Contributor

So would this effectively switch all our release builds to use musl?

Like @petermattis, I'm very wary of switching over to musl for any sort of wide usage. Being able to support it is great, but how many other widely used projects default to it? We'll at the very least get a handful of questions about DNS resolver behavior, and likely more issues that'll take up our time. Perhaps we should talk about this tomorrow.

@bdarnell
Copy link
Member

I'm also wary of switching to musl all at once, although if we try to offer the two builds in parallel will we get a meaningful amount of testing of the musl build?

At a minimum a change like this should be sequenced after the rocksdb 5.0/DeleteRange upgrade.

We'll at the very least get a handful of questions about DNS resolver behavior

I think that if we get even a handful of questions about DNS then we should stick with a glibc build as our primary offering and offer a musl build only as an alternative for rhel/centos/alpine users. The theory behind switching is that musl has caught up with glibc's resolver in the ways that matter (the specific issue that caused problems for us last time we tried this, gliderlabs/docker-alpine#8, has been fixed) and deployments that need other glibc-specific features are rare enough that we can ask those users to build from source (note that this is speculation with no real data behind it. I'm much more concerned about RHEL users than users of alpine or other musl-based distros, so I'd be most comfortable with an outcome in which we ship builds for an old version of glibc instead of static musl-based builds. However, getting the combination of an old version of glibc with new versions of compilers and other dependencies appears to require quite a lot of yak shaving).


Reviewed 4 of 4 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 13 of 13 files at r6.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Makefile, line 67 at r6 (raw file):

# and link against it instead.
TAGS += musl
LDFLAGS += -extldflags '-static'

Those quotes don't do anything, do they?


build/Dockerfile, line 20 at r6 (raw file):

    bzip2 \
    curl \
    fontconfig \

While you're adding comments, why do we need fontconfig?


scripts/gceworker.sh, line 25 at r6 (raw file):

    sleep 20 # avoid SSH timeout on copy-files

    gcloud compute copy-files "$(dirname "${0}")" "${name}:scripts"

Both gceworker and azworker need to be updated to copy the parallel-build patches from their new locations.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Jan 24, 2017

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Makefile, line 67 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Those quotes don't do anything, do they?

Probably not; removed.


build/Dockerfile, line 20 at r6 (raw file):
There's already a comment covering that

bzip2 and fontconfig are used by phantomjs-prebuilt to test the UI.


scripts/gceworker.sh, line 25 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Both gceworker and azworker need to be updated to copy the parallel-build patches from their new locations.

Instead, could we just bake everything we need into the builder image? Unlike before, the builder now has a go toolchain which includes the parallel-build patches.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 10 of 11 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


build/Dockerfile, line 1 at r7 (raw file):

FROM alpine:3.5

Perhaps this should be a Dockerfile-musl along with a separate builder image. Seems like a few more additions to this PR and you could make the musl builds an option so we could get this PR merged without changing the existing static builds.


scripts/gceworker.sh, line 25 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Instead, could we just bake everything we need into the builder image? Unlike before, the builder now has a go toolchain which includes the parallel-build patches.

And use the builder image on the {gce,az}worker? I don't think we have docker set up on the worker images, and using docker in that fashion feels like it would be a pain.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Jan 24, 2017

Review status: 10 of 11 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


build/Dockerfile, line 1 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Perhaps this should be a Dockerfile-musl along with a separate builder image. Seems like a few more additions to this PR and you could make the musl builds an option so we could get this PR merged without changing the existing static builds.

Yeah, I can do that. I'd rather not, but I will.


scripts/gceworker.sh, line 25 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

And use the builder image on the {gce,az}worker? I don't think we have docker set up on the worker images, and using docker in that fashion feels like it would be a pain.

We definitely have docker on the workers; see bootstrap-debian. I used that workflow heavily in preparing this PR.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 10 of 11 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


build/Dockerfile, line 1 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Yeah, I can do that. I'd rather not, but I will.

Well, the alternative is to hold off on merging this PR for a while.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Jan 30, 2017

This has run into an unexpected snag; because TC workers have some cached build artifacts that have been linked against debian:jessie's glibc, building CR with this new image causes missing symbol errors when those artifacts are linked with ubuntu:xenial's glibc. Normally, the solution to this would be to use a new installsuffix, but our TC workers are also currently suffering from disk space issues which would be exacerbated by the addition of a new installsuffix.

Also, we still need a reasonable way of toggling between musl and glibc distributions.

@tamird
Copy link
Contributor Author

tamird commented Feb 1, 2017

This PR needs a new title and description, but is otherwise ready to be re-reviewed. The new approach produces binaries dynamically linked to glibc (but otherwise static), with a separate make option to produce fully static binaries with musl.

@bdarnell @petermattis @mberhault

@bdarnell
Copy link
Member

bdarnell commented Feb 1, 2017

Reviewed 2 of 13 files at r6, 2 of 15 files at r8, 2 of 9 files at r9, 19 of 19 files at r10.
Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


Makefile, line 23 at r10 (raw file):

# Allow setting of go build flags from the command line.
GOFLAGS :=
# Possible values:

Comment that both RELEASE and STATIC builds only work within the builder image. Comments about why these particular versions were chosen (even if it's just "these are the oldest versions available with the tools we're using") would probably be helpful the next time we change any of this.

Nit: I slightly prefer lowercase names for the values of TYPE.


Makefile, line 66 at r10 (raw file):

export CC  = /x-tools/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-gcc
export CXX = /x-tools/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-g++
override LDFLAGS += -extldflags "-static-libgcc -static-libstdc++"

We should probably also set a -X github.com/cockroachdb/cockroach/pkg/build.type=release flag so that the build type can show up in cockroach version.


Makefile, line 67 at r10 (raw file):

export CXX = /x-tools/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-g++
override LDFLAGS += -extldflags "-static-libgcc -static-libstdc++"
override GOFLAGS += -installsuffix glibc

Nit: release instead of glibc?


build/builder.sh, line 115 at r10 (raw file):

vols="${vols} --volume=${gopath0}/pkg/docker_amd64:/go/pkg/linux_amd64"
vols="${vols} --volume=${gopath0}/pkg/docker_amd64_glibc:/go/pkg/linux_amd64_glibc"
vols="${vols} --volume=${gopath0}/pkg/docker_amd64_gmusl:/go/pkg/linux_amd64_musl"

Is gmusl a typo?


build/Dockerfile, line 78 at r10 (raw file):

 && gpg --list-keys \
 && gpg --recv-keys 35B871D1 \
 && gpg --verify crosstool-ng-1.22.0.tar.bz2.sig \

Yay!


build/Dockerfile, line 98 at r10 (raw file):

RUN groupadd -r roachbuild && useradd -m -r -g roachbuild roachbuild
RUN mkdir /x-tools && chown roachbuild:roachbuild /x-tools
USER roachbuild

Why make a new user just for this part?


build/Dockerfile, line 110 at r10 (raw file):

USER root

RUN apt-get autoremove -y g++

Why do we need this?


build/Dockerfile, line 118 at r10 (raw file):

 && mkdir llvm/projects/libcxx    && curl -sfSL http://releases.llvm.org/3.9.1/libcxx-3.9.1.src.tar.xz    | tar --strip-components=1 -C llvm/projects/libcxx -xJ \
 && mkdir llvm/projects/libcxxabi && curl -sfSL http://releases.llvm.org/3.9.1/libcxxabi-3.9.1.src.tar.xz | tar --strip-components=1 -C llvm/projects/libcxxabi -xJ \
 && mkdir libcxx_msan && (cd libcxx_msan && cmake ../llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_SANITIZER=Memory && make cxx -j16)

-j$(nproc)


build/Dockerfile, line 123 at r10 (raw file):

COPY parallelbuilds-go1.7.patch /
RUN curl -fsSL https://golang.org/dl/go1.7.5.src.tar.gz -o golang.tar.gz \

Mention the upgrade to Go 1.7.5 in the commit message.


scripts/azworker.sh, line 14 at r10 (raw file):

USER="${USER-$(id -un)}"
CLUSTER="azworker-${USER}"
NAME=${AZWORKER_NAME-${CLUSTER}-azworker}

Remove the -azworker; it's already included in ${CLUSTER}


scripts/with-msan, line 9 at r10 (raw file):

#
# MemorySanitizer is only available on linux; this script assumes the machine has
# been set up with bootstrap-debian.sh.

Change this comment to say that it assumes it's running in the builder container.


scripts/with-msan, line 12 at r10 (raw file):


export CC=clang
export CXX=clang++

With these lines gone, isn't it using gcc instead of clang?


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Feb 1, 2017

Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


Makefile, line 23 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Comment that both RELEASE and STATIC builds only work within the builder image. Comments about why these particular versions were chosen (even if it's just "these are the oldest versions available with the tools we're using") would probably be helpful the next time we change any of this.

Nit: I slightly prefer lowercase names for the values of TYPE.

Done.


Makefile, line 66 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We should probably also set a -X github.com/cockroachdb/cockroach/pkg/build.type=release flag so that the build type can show up in cockroach version.

Done.


Makefile, line 67 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Nit: release instead of glibc?

Done.


build/builder.sh, line 115 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Is gmusl a typo?

Done.


build/Dockerfile, line 98 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why make a new user just for this part?

crosstool-ng/crosstool-ng@1cabb74

Added a comment.


build/Dockerfile, line 110 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why do we need this?

To avoid having two toolchains (clang is also present). As mentioned in the mega comment above, g++ is for bootstrapping the cross compilation toolchains.


build/Dockerfile, line 118 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

-j$(nproc)

Done.


build/Dockerfile, line 123 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Mention the upgrade to Go 1.7.5 in the commit message.

Done (will squash it into the resulting commit).


scripts/azworker.sh, line 14 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove the -azworker; it's already included in ${CLUSTER}

Done.


scripts/with-msan, line 9 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Change this comment to say that it assumes it's running in the builder container.

Done.


scripts/with-msan, line 12 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

With these lines gone, isn't it using gcc instead of clang?

gcc is no longer available in the build container, so clang is used instead.

I'd consider going the other way - restoring these lines and defaulting the build container to target glibc 2.12, but the cross toolchain is terribly slow for some reason.

Defaulting to the glibc 2.12 toolchain also has the benefit of producing correct ./configure results when updating c-* projects. Perhaps we should change to that until slow compilation becomes a problem. What do you think?


Comments from Reviewable

@bdarnell
Copy link
Member

bdarnell commented Feb 1, 2017

Reviewed 2 of 13 files at r6, 1 of 9 files at r9, 2 of 19 files at r10, 1 of 19 files at r11, 2 of 7 files at r12, 9 of 17 files at r13, 15 of 15 files at r14.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


build/Dockerfile, line 110 at r10 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

To avoid having two toolchains (clang is also present). As mentioned in the mega comment above, g++ is for bootstrapping the cross compilation toolchains.

Oh, I see. So release/musl builds differ from the default configuration in which compiler they use, not just which version of libc they target. I'm wary of switching the default from gcc to clang like this.


scripts/with-msan, line 12 at r10 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

gcc is no longer available in the build container, so clang is used instead.

I'd consider going the other way - restoring these lines and defaulting the build container to target glibc 2.12, but the cross toolchain is terribly slow for some reason.

Defaulting to the glibc 2.12 toolchain also has the benefit of producing correct ./configure results when updating c-* projects. Perhaps we should change to that until slow compilation becomes a problem. What do you think?

Ah. Probably still good to be explicit about the compiler, though.

How slow is "terribly slow"? We go out of our way to speed up the builds for {gce,az}workers with the parallel cgo patch, so we don't want to slow things down too much. If the crosstool gcc is significantly slower than the one we get from apt, then I'd rather have that available too for default builds (even though it is a fourth toolchain and a third build of gcc).


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Feb 1, 2017

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


build/Dockerfile, line 110 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Oh, I see. So release/musl builds differ from the default configuration in which compiler they use, not just which version of libc they target. I'm wary of switching the default from gcc to clang like this.

What's the concern? We're already shipping OSX binaries with clang.


scripts/with-msan, line 12 at r10 (raw file):

Ah. Probably still good to be explicit about the compiler, though.

Sure, done.

How slow is "terribly slow"? We go out of our way to speed up the builds for {gce,az}workers with the parallel cgo patch, so we don't want to slow things down too much. If the crosstool gcc is significantly slower than the one we get from apt, then I'd rather have that available too for default builds (even though it is a fourth toolchain and a third build of gcc).

Well, it is available (though it's currently clang, not gcc), but the question is should we use it by default in the build container; doing so requires special handling for default builds; not doing so requires special handling for dependency upgrades.


Comments from Reviewable

@bdarnell
Copy link
Member

bdarnell commented Feb 1, 2017

Review status: 23 of 24 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


build/Dockerfile, line 110 at r10 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

What's the concern? We're already shipping OSX binaries with clang.

Just fear of change, and of differences between regular and release builds. The fact that jemalloc has options referring to "libgcc" and "gcc intrinsics" give me pause (both of these options at least compile with clang, but do they work? I don't think we've tried heap profiling on a clang/linux build).


scripts/with-msan, line 12 at r10 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Ah. Probably still good to be explicit about the compiler, though.

Sure, done.

How slow is "terribly slow"? We go out of our way to speed up the builds for {gce,az}workers with the parallel cgo patch, so we don't want to slow things down too much. If the crosstool gcc is significantly slower than the one we get from apt, then I'd rather have that available too for default builds (even though it is a fourth toolchain and a third build of gcc).

Well, it is available (though it's currently clang, not gcc), but the question is should we use it by default in the build container; doing so requires special handling for default builds; not doing so requires special handling for dependency upgrades.

What do you mean by special handling here? I'd prefer to minimize the difference between the default and release builds (ideally, the only difference would be the use of static vs dynamic linking)


Comments from Reviewable

@bdarnell
Copy link
Member

bdarnell commented Feb 2, 2017

Review status: 20 of 25 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


build/x86-linux-gnu.config, line 385 at r15 (raw file):

CT_CC_GCC_HAS_LIBQUADMATH=y
CT_CC_GCC_HAS_LIBSANITIZER=y
CT_CC_GCC_VERSION="4.9.3"

Wait, so this is using gcc 4.9.3? How does that work? Our contributing.md file says that we don't support gcc (more specifically, g++) prior to 6.0.


Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2017

Review status: 20 of 25 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


build/Dockerfile, line 110 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Just fear of change, and of differences between regular and release builds. The fact that jemalloc has options referring to "libgcc" and "gcc intrinsics" give me pause (both of these options at least compile with clang, but do they work? I don't think we've tried heap profiling on a clang/linux build).

OK; I think we agree that clang should be for msan only, and we should use the release toolchain in all other cases.


build/x86-linux-gnu.config, line 385 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Wait, so this is using gcc 4.9.3? How does that work? Our contributing.md file says that we don't support gcc (more specifically, g++) prior to 6.0.

4.9.3 is "known good" - see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48891


scripts/with-msan, line 12 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What do you mean by special handling here? I'd prefer to minimize the difference between the default and release builds (ideally, the only difference would be the use of static vs dynamic linking)

Yeah, I agree. The only problem is that the cross toolchain is terrible (empirically) slow. Perhaps I'm building an unoptimized gcc? Ideas welcome, but I'll dig more this afternoon.


Comments from Reviewable

@tamird tamird changed the title build: revive fully static builds by using alpine build: support RHEL/CentOS 6 Feb 2, 2017
@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2017

FYI the builder image is now 1.864 GB :(

Set `GOCACHEPATH` to mount somewhere other than GOPATH.
- cockroachdb/builder is now based on ubuntu xenial
- cockroachdb/builder targets RHEL/CentOS 6 versions of linux and glibc
- cockroachdb/builder is upgraded to Go 1.7.5
- bootstrap-debian is mostly absorbed into cockroachdb/builder
- STATIC=1 is replaced with TYPE=release
- a new Makefile option (type=static) produces fully static binaries by
  linking to musl instead of glibc
- `cockroach version` now reports the build type (release/static)
@tamird
Copy link
Contributor Author

tamird commented Feb 3, 2017

Still looking for an LGTM here. @bdarnell?

@tamird
Copy link
Contributor Author

tamird commented Feb 3, 2017

Review status: 0 of 28 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


build/Dockerfile, line 1 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Well, the alternative is to hold off on merging this PR for a while.

By the way, this is now implemented.


Comments from Reviewable

@bdarnell
Copy link
Member

bdarnell commented Feb 3, 2017

:lgtm:


Reviewed 2 of 13 files at r6, 1 of 9 files at r9, 2 of 19 files at r10, 1 of 7 files at r12, 2 of 17 files at r13, 5 of 5 files at r15, 1 of 1 files at r16, 3 of 3 files at r17, 1 of 1 files at r18, 31 of 31 files at r20.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tamird tamird merged commit f56f38b into cockroachdb:master Feb 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants