#9 Use a C* build from the source as the target cluster image #34

grighetto · 2020-10-22T02:45:33Z


absurdfarce

Seems reasonable for what it does, but I have some serious concerns about bringing all this code into the code base rather than referencing external sources. I think I have an even larger concern about the increase in work that we're now asking the k8s cluster to do; we're now adding logic to download, build and package C* into Docker containers. This cost is paid every time we run the Helm chart (unless there's a bypass or some kind of opt-out logic I missed). I'm very concerned that this general approach just won't scale over time as more and more features are added.

absurdfarce · 2020-10-25T04:08:08Z

helm/adelphi/templates/cassandra-image-builder.yaml

+      args:
+        - git clone https://github.com/apache/cassandra.git cassandra;
+          cd cassandra;
+          git checkout {{`{{inputs.parameters.commit_hash}}`}};


Nit: in the case of, say, "cassandra-4.0-beta2" (which looks to be the default given values.yaml) this represents a tag rather than a commit hash. "tree-ish" is probably the right name here (since that's ultimately what's being passed to "git checkout") but that name... isn't great. This name could be a branch, it could be a tag, or it could indeed be a commit. Maybe... "git_identifier" or something?

That said, the term "tree-ish" is reasonably well understood, at least within the git community. So maybe we just go with that... ?

I guess I lean a bit more towards "git_identifier"... but I could be swayed either way.

@absurdfarce I gave this some more thought as well; I wasn't really happy with "hash" either, for the same reasons you mentioned.
Tree-ish is a bit confusing to me actually (it makes me think of a path) even though that's what I see in the "git checkout" command help. Strictly speaking, it seems like "commit" and "tree" are two different object types in git: https://matthew-brett.github.io/curious-git/git_object_types.html.

Having said that, I'm also leaning towards "git_identifier" or maybe even "git_checkout" to be self-explanatory (either way, this will be documented).
Let me know what you think.

I do think "git_identifier" is the way to go here. We're talking about something in the git log so "checkout" doesn't really seem right since that's not really what the user is specifying with that value. That value specifies a tag or a specific commit, something like that... and "identifier" seems general enough to account for all of those options.
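To illustrate why "commit_hash" is too narrow a name: `git checkout` accepts a tag, a branch name, or a commit hash in the same position, so the parameter really holds any git identifier. The sketch below uses a throwaway local repository as a stand-in for the apache/cassandra clone; the tag `cassandra-4.0-beta2` and the branch `dev` are illustrative names only.

```shell
#!/bin/sh
# Sketch: the same "git_identifier" slot can hold a tag, a branch, or a commit hash.
# A throwaway local repository stands in for the apache/cassandra clone.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "sketch@example.com"
git config user.name "sketch"

echo "v1" > version.txt
git add version.txt
git commit -q -m "first commit"
first_commit=$(git rev-parse HEAD)

# Illustrative names only: a release-style tag and a branch, both at the first commit.
git tag cassandra-4.0-beta2
git branch dev

echo "v2" > version.txt
git commit -q -am "second commit"

# Each identifier resolves to the first commit when checked out.
for git_identifier in cassandra-4.0-beta2 dev "$first_commit"; do
  git checkout -q "$git_identifier"
  echo "$git_identifier -> $(git rev-parse HEAD)"
done
```

All three lines of output show the same commit hash, which is why a generic name fits the parameter better than one tied to a single kind of reference.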

absurdfarce · 2020-10-25T04:14:18Z

helm/adelphi/templates/cassandra-dockerfile-configmap.yaml

+  name: cassandra-dockerfile-configmap
+  namespace: {{ .Values.namespace }}
+data:
+  Dockerfile: |


The remainder of this file represents an implicit import of the logic in https://github.com/docker-library/cassandra/blob/b6554329fe112243d16861b441067227eedcbdf9/4.0/Dockerfile and https://github.com/docker-library/cassandra/blob/c03e9828c699b8c22bcb17f82356ae90123d1d10/4.0/docker-entrypoint.sh into the project. I'm not sure this is ideal for the following reasons:

Fixes/enhancements to these files over time won't be automatically picked up and re-used here

These files aren't easily identifiable in the context in which they serve, which makes them harder to find and review. If we had a file named "Dockerfile" in a directory named "cassandra-4.0" the intent is pretty clear, but as it stands we have to get all of this from a config map.

Is there some reason we can't grab these files directly (from GitHub, via git, or via some other mechanism) and then build the YAML programmatically to include them here? I'm just concerned we're setting ourselves up for a maintenance nightmare by bringing this stuff in like this.
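As a rough illustration of the alternative raised here (keeping the Dockerfile as a standalone file and generating the config map YAML from it), the sketch below indents a file into the `data:` section of a ConfigMap. The `render_configmap` helper and the placeholder Dockerfile content are hypothetical; only the ConfigMap name and the `{{ .Values.namespace }}` placeholder are taken from the template under review.

```shell
#!/bin/sh
# Hypothetical generator: wrap a standalone Dockerfile into a ConfigMap manifest
# so the Dockerfile can live (and be reviewed) as its own file.
set -eu

# render_configmap NAME FILE: print a ConfigMap whose data key is FILE's basename.
render_configmap() {
  name=$1
  file=$2
  key=$(basename "$file")
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $name
  namespace: {{ .Values.namespace }}
data:
  $key: |
EOF
  # Indent every line of the file by four spaces to form the YAML block scalar.
  sed 's/^/    /' "$file"
}

workdir=$(mktemp -d)
# Placeholder Dockerfile content, not the real Adelphi Dockerfile.
printf 'FROM placeholder-base-image\nRUN echo placeholder\n' > "$workdir/Dockerfile"
render_configmap cassandra-dockerfile-configmap "$workdir/Dockerfile" \
  > "$workdir/cassandra-dockerfile-configmap.yaml"
cat "$workdir/cassandra-dockerfile-configmap.yaml"
```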

The Dockerfile in Adelphi is a modified version of the one you linked in docker-library. The original one downloads a prebuilt tarball from the Apache site, while ours copies it from the volume where we build C* from source.

Usually these files are "templates" that we modify to our needs; the DataStax Management API follows a similar approach: https://github.com/datastax/management-api-for-apache-cassandra/blob/master/upstream-4.0/Dockerfile

Adelphi is all about building images and orchestrating containers, so I think it makes sense to keep those things under our control instead of having them change under our feet.
Also, that Dockerfile is pretty straightforward and looks stable; it hasn't changed much in docker-library since older versions of C*.

Another example of this approach is the CRD manifests for C* Operator and Argo.
We can't import them as subcharts in Helm (because they're CRDs), and they are also tweaked to change RoleBindings and stuff like that. The recommendation I heard from Jim Dickinson (from the C* Operator project) is that these files are mostly templates that have to be modified depending on one's needs.

absurdfarce · 2020-10-25T04:16:49Z

helm/adelphi/templates/workflow-adelphi.yaml

@@ -1,19 +1,44 @@
 apiVersion: argoproj.io/v1alpha1
 kind: Workflow
 metadata:
-  name: workflow-nosqlbench-cassdiff-{{ .Release.Revision }}
+  name: workflow-adelphi-{{ .Release.Revision }}


absurdfarce · 2020-10-25T04:18:38Z

helm/adelphi/templates/cassandra-image-builder.yaml

+      args:
+      - --dockerfile=/dockerfile/Dockerfile
+      - --context=dir:///workspace
+      - {{`--destination={{inputs.parameters.registry_ip}}:30000/cassandra-quality/cassandra`}}


Don't we want this to be "adelphi" rather than "cassandra-quality"?

Fixed in the follow-up PR #35

absurdfarce · 2020-10-25T04:19:04Z

helm/adelphi/templates/cassandra-mgmt-image-builder.yaml

+      args:
+      - --dockerfile=/generated/Dockerfile
+      - --context=dir:///build
+      - {{`--destination={{inputs.parameters.registry_ip}}:30000/cassandra-quality/management-api-for-apache-cassandra`}}


Same here: s/cassandra-quality/adelphi/

Fixed in the follow-up PR #35
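Since both image builders now push under the same `adelphi` prefix, one way to keep the two `--destination` values from drifting apart again is to derive them from a single prefix. A minimal sketch: the `destination` helper and the fallback registry IP are hypothetical, while the port `30000` and the two image names come from the templates above.

```shell
#!/bin/sh
# Hypothetical helper: build kaniko --destination values from one shared prefix.
set -eu

registry_ip=${REGISTRY_IP:-10.0.0.1}   # placeholder; the workflow supplies the real value
registry_port=30000
image_prefix=adelphi

destination() {
  printf '%s\n' "--destination=${registry_ip}:${registry_port}/${image_prefix}/$1"
}

cassandra_dest=$(destination cassandra)
mgmt_dest=$(destination management-api-for-apache-cassandra)
printf '%s\n' "$cassandra_dest" "$mgmt_dest"
```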

grighetto · 2020-10-25T18:06:53Z

> I have some serious concerns about bringing all this code into the code base rather than referencing external sources

I tried to address that concern in the comments above, but simply put the idea is that IMO Adelphi should own the image building process and we can't use the referenced resources directly because they do other things that we don't want to do. We have modified versions.

> I think I have an even larger concern about the increase in work that we're now asking the k8s cluster to do; we're now adding logic to download, build and package C* into Docker containers. This cost is paid every time we run the Helm chart

For serious testing, most people will run this on some cloud environment. In my GKE test, it took 4 minutes to build C* from source and 3 minutes to build the C* Management API. I think that's mostly negligible for workload generation tests that will run for maybe 30 minutes on each anonymized schema (so it could be a few hours total in CI).
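To put that overhead in proportion: the one-off build cost is 4 + 3 = 7 minutes, against roughly 30 minutes per anonymized schema. The schema count below is an assumed example chosen to match "a few hours" of CI, not a measured figure, and the sketch assumes one build per workflow run.

```shell
#!/bin/sh
# Back-of-the-envelope overhead of building images in-cluster.
# Build times and per-schema duration are from the comment above;
# the number of schemas is an assumed example.
set -eu

cassandra_build_min=4
mgmt_api_build_min=3
minutes_per_schema=30
schemas=6   # assumption: three hours of tests at 30 minutes each

build_min=$((cassandra_build_min + mgmt_api_build_min))
test_min=$((minutes_per_schema * schemas))
total_min=$((build_min + test_min))
# Integer percentage, rounded down.
overhead_pct=$((100 * build_min / total_min))

echo "build: ${build_min} min, tests: ${test_min} min, overhead: ${overhead_pct}%"
```

With six schemas the build adds 7 minutes to a 187-minute run, i.e. under 4% of the total.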

The image has to come from somewhere before running the workflow, and it has to be built from source since we'll want to test trunk or a dev branch. As follow-up work we can let the user specify a pre-built tarball for local testing (let's investigate that in #32), but I still think there's value in building C* in the k8s cluster for real testing, for the following reasons:
1- It guarantees reproducibility (we know what's inside the tarball and we'll have a git hash).
2- It makes it easy to test with different JDK versions: Adelphi should be able to test C* compiled with JDK 8 and run with both JDK 8 and 11, as well as compiled and run with JDK 11.
3- The management of the custom image is transparent; otherwise each cloud vendor has its own way to publish an image. Also, the image is destroyed when the k8s cluster is destroyed, so we don't have to worry about cleaning things up.
4- We make no assumptions about the user's environment. We want this to be as simple as possible to execute; if everything is self-contained inside the workflow/k8s, we don't have to worry about the user's operating system, git, Java version, Ant, etc.
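Point 1 depends on recording the exact commit that was built, because a branch (and, less often, a tag) can move after the build. A minimal sketch of resolving the requested identifier to an immutable commit hash right after checkout; the local repository and the `dev` branch stand in for the Cassandra clone, and writing the hash to `built-commit.txt` is a hypothetical convention, not something taken from this PR.

```shell
#!/bin/sh
# Sketch: pin a moving identifier (here a branch) to the commit actually built.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "sketch@example.com"
git config user.name "sketch"
git commit -q --allow-empty -m "first"
git branch dev

git_identifier=dev
git checkout -q "$git_identifier"
# rev-parse turns a tag, branch, or abbreviated hash into the full commit hash.
built_commit=$(git rev-parse --verify "HEAD^{commit}")
echo "$built_commit" > built-commit.txt   # hypothetical record kept next to the build

# The branch moves on later, but the recorded hash still names what was built.
git commit -q --allow-empty -m "second"
echo "requested: $git_identifier, built: $built_commit, dev now: $(git rev-parse dev)"
```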

@absurdfarce Let me know if that clarifies things a bit or if you still have any questions or suggestions for improvement.


grighetto · 2020-10-28T02:00:45Z

@absurdfarce I just fixed a conflict with the base branch (it turns out I had also added the same -DskipTests option on the same line you did).
Please let me know if you have any other feedback; otherwise we should merge this.

absurdfarce

After discussing this with @grighetto recently, the plan is to move towards Docker images (where possible) for C* versions, falling back to source builds only when necessary. Approving this PR to get source builds in place; we'll add the logic around using the image (if available) later.
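The image-first plan can be sketched as a small decision step: use a published image when the requested identifier looks like a release tag, otherwise build from source. This is only an assumed heuristic, not the logic tracked in the follow-up tickets. The `cassandra-<version>` pattern matches the `cassandra-4.0-beta2` default mentioned earlier, while the mapping to an image named `cassandra:<version>` and the example hash `abc1234` are assumptions.

```shell
#!/bin/sh
# Hypothetical decision step: prefer a prebuilt image for release-style tags,
# fall back to a source build for branches and commit hashes.
set -eu

# choose_image_source GIT_IDENTIFIER: prints "image <name>" or "source <identifier>".
choose_image_source() {
  case "$1" in
    cassandra-[0-9]*)
      # Assumed mapping: tag "cassandra-4.0-beta2" -> image "cassandra:4.0-beta2".
      echo "image cassandra:${1#cassandra-}"
      ;;
    *)
      echo "source $1"
      ;;
  esac
}

choose_image_source cassandra-4.0-beta2
choose_image_source trunk
choose_image_source abc1234
```

A pattern match alone is fragile: the Apache Cassandra repository also uses names like `cassandra-4.0` for release branches, so a real implementation would likely need to confirm the identifier is a tag and that the corresponding image actually exists before skipping the source build.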

grighetto · 2020-10-28T17:42:53Z

Thanks @absurdfarce.

For future reference, these are the follow-up improvement tickets:

#32
#39

grighetto added 11 commits October 8, 2020 02:39

Templates for building a custom C* image from the source code on the …

27a00a6

…fly based on a git hash.

Fixed Cassandra build command.

e9178cf

Added templates for building the C* Management API image that allows …

1253433

…for overriding the C* image used by the Operator.

Dynamically add the registry URL in the Dockerfile of the C* Manageme…

7bd043a

…nt API.

Set serverImage in the target cluster definition to use the custom …

1bff67e

…C* image.

Added C* build steps to the main workflow.

d975c0e

Merge remote-tracking branch 'origin/master' into 9-custom-sidecar-fo…

f18ed93

…r-overriding-cass-version

Moved new templates to the adelphi folder after merging with master.

915001d

Renamed main workflow manifest.

d8ff60e

Set a static version for adoptopenjdk image in the C* Dockerimage (…

2d17af2

…some recent change upstream had broken the build); Added missing entry-point script.

#9 Build off cassandra-4.0-beta2 tag and use it as the default vers…

ee3b8be

…ion for the target cluster; Fixed the `jvm-server-options` param in the target CassandraDatacenter template.

grighetto requested a review from absurdfarce October 22, 2020 02:45

grighetto mentioned this pull request Oct 22, 2020

#6 Fixes for running on GKE #35

Merged

absurdfarce reviewed Oct 25, 2020


Merge remote-tracking branch 'origin/master' into 9-custom-sidecar-fo…

8db1f3e

…r-overriding-cass-version # Conflicts: # helm/adelphi/templates/spark-image-builder.yaml

absurdfarce approved these changes Oct 28, 2020


grighetto merged commit 664e0c6 into master Oct 28, 2020

grighetto mentioned this pull request Oct 30, 2020

Rename commit_hash param to git_identifier #41

Closed

grighetto linked an issue Nov 18, 2020 that may be closed by this pull request

Create new sidecar image to override C* version #9

Closed

grighetto added the medium label Nov 18, 2020
