Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#9 Use a C* build from the source as the target cluster image #34

Merged
merged 12 commits into from
Oct 28, 2020

Conversation

grighetto
Copy link
Collaborator

@grighetto grighetto commented Oct 22, 2020

┆Issue is synchronized with this Asana task by Unito

Copy link
Collaborator

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable for what it does, but I have some serious concerns about bringing all this code into the code base rather than referencing external sources. I think I have an even larger concern about the increase in work that we're now asking the k8s cluster to do; we're now adding logic to download, build and package C* into Docker containers. This cost is paid every time we run the Helm chart (unless there's a bypass or some kind of opt-out logic I missed). I'm very concerned that this general approach just won't scale over time as more and more features are added.

args:
- git clone https://github.com/apache/cassandra.git cassandra;
cd cassandra;
git checkout {{`{{inputs.parameters.commit_hash}}`}};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in the case of, say, "cassandra-4.0-beta2" (which looks to be the default given values.yaml) this represents a tag rather than a commit hash. "tree-ish" is probably the right name here (since that's ultimately what's being passed to "git checkout") but that name... isn't great. This name could be a branch, it could be a tag, or it could indeed be a commit. Maybe... "git_identifier" or something?

That said, the term "tree-ish" is reasonably well understood, at least within the git community. So maybe we just go with that... ?

I guess I lean a bit more towards "git_identifier"... but I could be swayed either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@absurdfarce I gave this some more thought as well, I wasn't really happy with "hash" either for the same reasons you mentioned.
Tree-ish is a bit confusing to me actually (it makes me think of a path) even though that's what I see in the "git checkout" command help. Strictly speaking, it seems like "commit" and "tree" are two different object types in git: https://matthew-brett.github.io/curious-git/git_object_types.html.

Having said that, I'm also leaning towards "git_identifier" or maybe even "git_checkout" to be self-explanatory (either way, this will be documented).
Let me know what you think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think "git_identifier" is the way to go here. We're talking about something in the git log so "checkout" doesn't really seem right since that's not really what the user is specifying with that value. That value specifies a tag or a specific commit, something like that... and "identifier" seems general enough to account for all of those options.

name: cassandra-dockerfile-configmap
namespace: {{ .Values.namespace }}
data:
Dockerfile: |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remainder of this file represents an implicit import of the logic in https://github.com/docker-library/cassandra/blob/b6554329fe112243d16861b441067227eedcbdf9/4.0/Dockerfile and https://github.com/docker-library/cassandra/blob/c03e9828c699b8c22bcb17f82356ae90123d1d10/4.0/docker-entrypoint.sh into the project. I'm not sure this is ideal for the following reasons:

  • Fixes/enhancements to these files over time won't be automatically picked up and re-used here
  • These files aren't easily identifiable in the context in which they serve, which makes them harder to find and review. If we had a file named "Dockerfile" in a directory named "cassandra-4.0" the intent is pretty clear, but as it stands we have to get all of this from a config map.

Is there some reason we can't grab these files directly (from github, via git, or via some other mechanism) and then build the YAML programmatically to include them here? I'm just concerned we're seeing ourselves up for a maintenance nightmare by bringing this stuff in in this way.

Copy link
Collaborator Author

@grighetto grighetto Oct 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Dockerfile in Adelphi is a modified version from the one you linked in docker-library. The original own downloads a prebuilt tarball from the Apache site, while ours copies it from the volume where we buid C* from the source.

Usually these files are "templates" that we modify to our needs, the DataStax Management API follows a similar approach: https://github.com/datastax/management-api-for-apache-cassandra/blob/master/upstream-4.0/Dockerfile

Adelphi is all about building images and orchestrating containers, so I think it makes sense to have those things under our control instead of having it change under our feet.
Also, that Dockerfile is pretty straightforward and looks stable, it hasn't changed much in docker-library since older versions of C*.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example of this approach are the CRDs manifests for C* Operator and Argo.
We can't import them as subcharts in Helm (because they're CRDs) and they are also tweaked for changing RoleBindings and stuff like that. The recommendation I heard from Jim Dickinson (from the C* Operator project) is that these files are mostly templates that have to be modified depending on one's needs.

@@ -1,19 +1,44 @@
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
name: workflow-nosqlbench-cassdiff-{{ .Release.Revision }}
name: workflow-adelphi-{{ .Release.Revision }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

args:
- --dockerfile=/dockerfile/Dockerfile
- --context=dir:///workspace
- {{`--destination={{inputs.parameters.registry_ip}}:30000/cassandra-quality/cassandra`}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want this to be "adelphi" rather than "cassandra-quality"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the follow-up PR #35

args:
- --dockerfile=/generated/Dockerfile
- --context=dir:///build
- {{`--destination={{inputs.parameters.registry_ip}}:30000/cassandra-quality/management-api-for-apache-cassandra`}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: s/cassandra-quality/adelphi/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the follow-up PR #35

@grighetto
Copy link
Collaborator Author

grighetto commented Oct 25, 2020

I have some serious concerns about bringing all this code into the code base rather than referencing external sources

I tried to address that concern in the comments above, but simply put the idea is that IMO Adelphi should own the image building process and we can't use the referenced resources directly because they do other things that we don't want to do. We have modified versions.

I think I have an even larger concern about the increase in work that we're now asking the k8s cluster to do; we're now adding logic to download, build and package C* into Docker containers. This cost is paid every time we run the Helm chart

For serious testing, most people will run this on some cloud environment. In my GKE test, it took 4 minutes to build C* from the source and 3 minutes to build the C* Management API, I think that's mostly negligible for workload generation tests that will run for maybe 30min on each anonymized schema (so it could be a few hours total in CI).

The image has to come from somewhere before running the workflow and it has to be built from the source since we'll want to test trunk or a dev branch. In a follow-up work we can let the user specify a pre-built tarball for local testing (let's investigate that in #32), but I still think there's value in building C* in the k8s cluster for real testing for the following reasons:
1- it guarantees reproducibility (we know what's inside the tarball and we'll have a git hash)
2- makes it easy to test with different JDK versions - Adelphi should be able to test C* compiled with JDK 8 and run with both JDK 8 and 11; and compiled and run with JDK 11.
3- the management of the custom image is transparent, otherwise each cloud vendor has its way to publish an image. Also, the image is destroyed when the k8s cluster is destroyed, we don't have to worry about cleaning things up.
4- we make no assumptions about the user environment. We want this to be as simple as possible to execute, if everything is self-contained inside the workflow/k8s, we don't have to worry about the user's operating system, git, java version, ant, etc.

@absurdfarce Let me know if that clarifies things a bit or if you still have any questions or suggestions for improvement.

…r-overriding-cass-version

# Conflicts:
#	helm/adelphi/templates/spark-image-builder.yaml
@grighetto
Copy link
Collaborator Author

@absurdfarce I just fixed a conflict with the base branch (it turns out I had also added that same -DskipTests option at the same line you did).
Please, let me know if you have any other feedback, otherwise we should merge this.

Copy link
Collaborator

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing this with @grighetto recently the plan is to move towards Docker images (where possible) for C* versions falling back to source builds only when necessary. Approving this PR to get source builds in place; we'll add the logic around using the image (if available) later.

@grighetto
Copy link
Collaborator Author

Thanks @absurdfarce.

For future reference, these are the follow-up improvement tickets:

#32
#39

@grighetto grighetto merged commit 664e0c6 into master Oct 28, 2020
@grighetto grighetto linked an issue Nov 18, 2020 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create new sidecar image to override C* version
2 participants