Update Erlang version + some other modernizations#90

Merged
kocolosk merged 2 commits into main from packaging-updates
Feb 10, 2022

Conversation

Member

@kocolosk kocolosk commented Feb 7, 2022

Overview

This PR uses the new multi-architecture container images that we have in CI, relying on the --platform flag to select the appropriate architecture at runtime. It also upgrades to Debian 11 for the base OS that we use for multi-platform packaging by default (and the one we use for Docker images), and bumps the underlying Erlang version to Erlang 23.

Updating to Erlang 23 should fix #86.

I'm planning to upload new packages that incorporate these changes and bump the package version number / release number appropriately. We don't currently have packaging version numbers parameterized in the code so I figured I'd just create a "3.2.1" branch and a 3.2.1-1 tag with the package numbers bumped off the side of the main branch for now.

The bugs seem to have been worked out here, so we can use multi-platform
container images like the ones we've generated for CI and still select
the runtime architecture we want if we need to build packages via QEMU.
docker run \
--mount type=bind,src=${SCRIPTPATH},dst=/home/jenkins/couchdb-pkg \
-u 0 -w /home/jenkins/couchdb-pkg \
--platform linux/amd64 \
Member Author

Omitting this seems to cause Docker to pick up whatever image it might have in the cache and run it, even if it's not the native architecture.
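
One way to avoid relying on whatever image happens to be cached is to always pass an explicit platform. The following is a minimal sketch, not part of this PR: `docker_platform` is a hypothetical helper, and the mapping from `uname -m` machine names to platform strings covers only the architectures mentioned in this thread.

```shell
#!/bin/sh
# docker_platform (hypothetical helper): translate a `uname -m` machine name
# into the platform string accepted by `docker run --platform`.
docker_platform() {
    case "$1" in
        x86_64)        echo "linux/amd64" ;;
        aarch64|arm64) echo "linux/arm64" ;;
        ppc64le)       echo "linux/ppc64le" ;;
        *)             echo "unsupported machine: $1" >&2; return 1 ;;
    esac
}

# Print the flag that would be passed for the current host; the actual
# docker invocation is left out so the sketch has no side effects.
if PLATFORM="$(docker_platform "$(uname -m)")"; then
    echo "--platform ${PLATFORM}"
fi
```

When building for a non-native architecture via QEMU, the machine name would be supplied explicitly instead of taken from `uname -m`.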

-XPLAT_BASE="debian-buster"
-XPLAT_ARCHES="arm64v8 ppc64le"
+XPLAT_BASE="debian-bullseye"
+XPLAT_ARCHES="arm64 ppc64le"
Member Author

I might be able to revert this and change the architecture name in the underlying image, e.g. https://hub.docker.com/layers/apache/couchdbci-debian/bullseye-erlang-23/images/sha256-d72059db579e0497a44076c22ef3c32755ad49f0c2766c78f7550fff36e9fb4f?context=explore. But it's also not clear to me what the "right" answer is, e.g. I see lots of "arm64/v8" in Docker Hub. Regardless, it seems to be recognized as an ARM image just fine.

Member Author

This StackOverflow answer provides some extra detail:

https://stackoverflow.com/a/70889505/4797770

It looks like containerd will happily normalize arm64/v8 as arm64, but arm64v8 is treated as a separate architecture entirely.

I also found that while we would use arm64v8 in the name of the single-arch images that we had been publishing, the architecture was actually labeled as arm64.

Bottom line: because we are identifying the image to use via the --platform flag instead of the prefix in the single-arch image name, we need to use arm64 or arm64/v8 and not arm64v8 here.
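
The naming behavior described above can be sketched as a small shell function. This is illustrative only and is not containerd's implementation: `normalize_arch` is a hypothetical helper that encodes just the two observations from this thread (arm64/v8 is treated as arm64, while arm64v8 is left alone and so would not match).

```shell
#!/bin/sh
# normalize_arch (hypothetical helper): mimic the normalization described
# in the comment above. Only the cases discussed in this thread are handled.
normalize_arch() {
    case "$1" in
        arm64/v8) echo "arm64" ;;   # the v8 variant collapses to plain arm64
        *)        echo "$1" ;;      # everything else, including arm64v8, is unchanged
    esac
}

# Build the --platform values for the architectures listed in the diff.
XPLAT_ARCHES="arm64 ppc64le"
for arch in ${XPLAT_ARCHES}; do
    echo "--platform linux/$(normalize_arch "${arch}")"
done
```

Under these assumptions, `arm64` and `arm64/v8` both end up selecting the same image, whereas `arm64v8` would be passed through as a distinct, unrecognized architecture.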

Contributor

That makes sense, thanks for digging in and figuring it out.

Contributor

nickva commented Feb 7, 2022

We don't currently have packaging version numbers parameterized in the code

Good point that we don't have package- or Docker-specific versioning. Something like 3.2.1-1 might work. Technically, the Docker image might have its own -1 if we updated any Docker-specific things (like here apache/couchdb-docker#215 (comment)), and the binary packages might get their own if, say, we bumped Erlang versions or updated pre/post install scripts. The RPMs could have a -1 but the DEBs might not.

I noticed Ubuntu has a similar issue with its package versioning, and they have a scheme like A.B.C-XubuntuY, where A.B.C is the base version of the package, X is the Debian version, and Y is the Ubuntu-specific one.

... create a "3.2.1" branch and a 3.2.1-1 tag with the package numbers bumped off the side of the main branch for now.

Just to clarify, that would be 3.2.1 branch off of the 3.2.1 tag not off of the current 3.x? Something like:

3.x
   ...
   3.2.1 [branch][tag]
      ...
      3.2.1-1 [tag]

I hope that, at least after apache/couchdb#3889 is reviewed and merged, and with all the improvements like the sharded index server, replicator target re-creation handling, and change feed rewind fixes, we'll have a good reason to roll out a 3.3 as well.

Member Author

kocolosk commented Feb 7, 2022

Hi @nickva, yes, I hacked in metadata updates at d59bd6c; I only meant that we don't currently allow the package versioning to be driven via environment variables. The convention seems to be that Debian packages can have an optional version 0, while RPMs start at release 1, so the packages I have ready to publish are e.g.

couchdb_3.2.1-1~bullseye_amd64.deb

and

couchdb-3.2.1-2.el7.x86_64.rpm

and

couchdb_3.2.1-1~bionic_amd64.deb

That last one doesn't seem to quite follow the Ubuntu scheme exactly but maybe that's OK.
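
For reference, the pieces of the .deb names above can be pulled apart with plain shell parameter expansion. This is a rough sketch rather than anything used by couchdb-pkg: `parse_deb` is a hypothetical helper, and it assumes the usual name_version_arch.deb layout with the package revision following the last hyphen of the version.

```shell
#!/bin/sh
# parse_deb (hypothetical helper): split a name_version_arch.deb file name
# into package name, upstream version, package revision, and architecture.
parse_deb() {
    base="${1%.deb}"            # strip the extension
    name="${base%%_*}"          # text before the first underscore
    arch="${base##*_}"          # text after the last underscore
    version="${base#*_}"        # drop the leading "name_"
    version="${version%_*}"     # drop the trailing "_arch"
    upstream="${version%-*}"    # everything before the last hyphen
    revision="${version##*-}"   # everything after the last hyphen
    echo "${name} upstream=${upstream} revision=${revision} arch=${arch}"
}

parse_deb "couchdb_3.2.1-1~bullseye_amd64.deb"
parse_deb "couchdb_3.2.1-1~bionic_amd64.deb"
```

Both names split into upstream version 3.2.1 plus a revision (`1~bullseye` and `1~bionic`), so the revision can be bumped for a repackaging without touching the upstream version.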

Yes, I agree with the notion of being able to publish multiple package versions for a given CouchDB release, e.g. with updated Erlang builds. We haven't relied on this much to date.

I'm not as familiar with specific guidance on the Docker side. It's complicated by the fact that a CI pipeline that rebuilds images on a regular basis, e.g. to pick up security updates in the underlying OS, could produce significantly more frequent updates. A 1:1 mapping between the container image tag and the CouchDB package could make sense, although you'd run into a little bit of friction: I'd expect you'd also want to tag an image built from an updated 3.2.1-1 package with the 3.2.1 tag, since unlike apt and yum, Docker won't automatically pull the 3.2.1-1 image for you when you ask for 3.2.1.

Just to clarify, that would be 3.2.1 branch off of the 3.2.1 tag not off of the current 3.x? Something like:

Yes, exactly.

Member Author

kocolosk commented Feb 7, 2022

And by the way I was referring to the couchdb-pkg repo as the location for the 3.2.1 branch and 3.2.1-1 tag, not the couchdb repo itself, in case that wasn't clear.

Contributor

nickva commented Feb 7, 2022

And by the way I was referring to the couchdb-pkg repo as the location for the 3.2.1 branch and 3.2.1-1 tag, not the couchdb repo itself, in case that wasn't clear.

You're right, I did get confused there. Thanks for clarifying.

Contributor

@nickva nickva left a comment

The approach looks good: creating a 3.2.1 branch and a 3.2.1-1 tag.

The one thing I am a bit worried about is bumping Erlang from 20 to 23 for the 3.2.1 release.

There is the case of mixed clusters during upgrade, where some nodes are on 20 and some will be on 23. From what I remember, the OTP team guarantees only up to 2 versions of backward compatibility (though I imagine they also wouldn't go out of their way to break it on purpose). So everything might work perfectly, but unlike the previous major upgrade from 17 to 20, we haven't tested it in production at Cloudant yet. Initial experiments in a test environment haven't revealed anything worrying so far. So, maybe that's good enough?

Another aspect is the different performance characteristics resulting from jumping 3 versions ahead. Hopefully everything would be a bit faster. However, when we upgraded from 17 to 20, the upgrade wasn't an automatic performance win. We had to disable some performance-gathering metrics and tweak other OTP build options to get up to par with 17. One specific update I can think of for the current jump is the rewritten disk I/O subsystem in 21. Again, ideally it should be faster and better, but I don't have any data to support that and am slightly worried that it may have introduced regressions.

Member Author

kocolosk commented Feb 8, 2022

https://stackoverflow.com/questions/65071806/how-do-i-run-a-cluster-of-nodes-with-both-otp-23-1-and-r15b03-1 goes into a bit of detail on the distribution protocol compatibility and doesn't indicate any issues for a mixed 20/23 cluster, although admittedly they just checked connectivity and not much in the way of richer RPCs.

I'm not sure what to do about the performance QA piece. We do need to upgrade; Erlang 20 hasn't seen a patch release in nearly two years and has known bugs that are preventing replication with one of the more popular CouchDB-compatible endpoints out there. I suspect our user community would be better off running newer releases instead of waiting for Cloudant to successfully upgrade its estate, but demonstrating that with any level of rigor is not a small project.

If you're saying, "pin each CouchDB version to a single Erlang major version for convenience packages", I do think that makes some sense. Even if we did adopt that policy (which I floated on dev@), one counter I have in this case is that CouchDB 3.2.1 is kinda/sorta not fully released yet from a packaging perspective, e.g. we never published any blog post on it or updated the homepage.

Contributor

nickva commented Feb 8, 2022

If you're saying, "pin each CouchDB version to a single Erlang major version for convenience packages", I do think that makes some sense. Even if we did adopt that policy (which I floated on dev@), one counter I have in this case is that CouchDB 3.2.1 is kinda/sorta not fully released yet from a packaging perspective, e.g. we never published any blog post on it or updated the homepage.

I guess, or we'd decide release by release based on benefits vs. risk. I was thinking from the point of view of a user who updated to 3.2.1-0 and set up their repo to auto-upgrade; then they go to 3.2.1-1, and now they might get slightly different performance characteristics or even breakage. However, in this case, as we saw, there are already serious breakages, so after thinking more on it, we're probably fine updating.

I suspect our user community would be better off running newer releases instead of waiting for Cloudant to successfully upgrade its estate, but demonstrating that with any level of rigor is not a small project.

I guess, going by the point that we haven't actually officially announced 3.2.1 and only published the binary package (my fault for jumping the gun there), I can see just going with 23 and updating. It strengthens the case for that at least.

It looks like Cloudant is in the testing process and running benchmarks as we speak. We can always come back and make another package, pick another version, or update some default vm.args flags or something.

Contributor

@nickva nickva left a comment

After discussing and deliberating in the comments, I think it would be OK to go ahead.

+1 from me at least

@kocolosk kocolosk merged commit 7f1e10a into main Feb 10, 2022
@kocolosk kocolosk deleted the packaging-updates branch February 10, 2022 14:05

Development

Successfully merging this pull request may close these issues.

CouchDB failed to replicate over TLS due to outdated Erlang SSL version in package

2 participants