-
Notifications
You must be signed in to change notification settings - Fork 470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[kong] revamp database wait init containers #285
Conversation
Add a new waitForDatabase value, default false. When disabled, wait-for-db and wait-for-postgres init containers do not run. When using the Postgres subchart, Kong Pods will crash loop for a while before Postgres comes online, but should still eventually enter Running. This change provides compatibility with clusters that use a mesh proxy to authenticate requests to the database.
Default to using the Kong image for migration job database check init containers. The standard Kong images provide bash and can run our bash-based check.
When we added jobAnnotations, the initial migrations job didn't have an existing annotations block, and we forgot to add it. Annotations were unintentionally placed in the labels block.
Prune some outdated Cassandra information. We haven't supported a Cassandra sub-chart in quite some time (0.26.0 in Nov 2019) and have no plans to re-add it. I don't see any reason to call out this old functionality in the README still. Break out the Postgres sub-chart info into a new subsection and flesh it out with some additional info about what it is, how to enable it, and what it does. Add our recommendation not to use it outside temporary environments.
Kuma supports Jobs since 1.0.7, therefore no need to disable the sidecar injection for it. Signed-off-by: Nikolay Nikolaev <nikolay.nikolaev@konghq.com>
Flip the database wait toggle to enable it by default. We want to preserve the existing init wait behavior for most users, while still providing a viable configuration for mesh users. As this default may not work for them, add an FAQ describing the issue, remediation, and side effects.
charts/kong/templates/_helpers.tpl
Outdated
@@ -365,10 +365,12 @@ The name of the service used for the ingress controller's validation webhook | |||
emptyDir: {} | |||
- name: {{ template "kong.fullname" . }}-tmp | |||
emptyDir: {} | |||
{{- if (and (eq .Values.env.database "postgres") .Values.waitImage.enabled) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic, as I understand it, is about "database being present", not about "database being of type postgres". Therefore, != off
rather than == "postgres"
feels more appropriate to me. Please change the logic to the former unless there's a compelling reason not to do so.
(applies to other similar checks "if database is postgres" in this PR as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a reason it's like this, though the check is a bit off. These waits, for the initial migrations, are only useful if Postgres being online or not is influenced by the chart lifecycle. This is only true if you use the sub-chart and Postgres spawns alongside the migration Jobs.
If you manage Postgres or Cassandra independently (all Cassandra instances are managed independently--we have no sub-chart for it), the chart lifecycle has no influence on whether the database is available or not. It either is, and migrations can start immediately, or it isn't, and they cannot. No amount of waiting helps if an independently-managed database isn't online; it makes sense to just let the failure happen immediately so users know they need to take action to make the database reachable. e5012be is more correct for these, since it changes the check from "any Postgres" to "only sub-chart Postgres".
The Deployment waits, however, are to see if migrations have completed. Those apply to any database and are always determined by the chart lifecycle. I did, however, copy the wrong logic when adding the waitImage.enabled
for them 🤦 e4b534e fixes that.
charts/kong/templates/_helpers.tpl
Outdated
{{- if .Values.waitImage.unifiedRepoTag }} | ||
image: "{{ .Values.waitImage.unifiedRepoTag }}" | ||
{{- else }} | ||
{{- else if .Values.waitImage.repository }} | ||
image: "{{ .Values.waitImage.repository }}:{{ .Values.waitImage.tag }}" | ||
{{- else }} {{/* default to the Kong image */}} | ||
{{- if .Values.image.unifiedRepoTag }} | ||
image: "{{ .Values.image.unifiedRepoTag }}" | ||
{{- else }} | ||
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}" | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
ladder seems to be adding some duplicated and rather complex logic. Could this be avoided by something similar to the below:
- implementing a helper template (called e.g.
getRepoTag
) that takes an "image" object as scope, and returns either.unifiedRepoTag
or a{{.repository}}:{{.tag}}
- using the
default
function, something like this:image: {{ default (include "getRepoTag" .Values.Image) (include "getRepoTag" .Values.Image) }}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good idea. This gets repeated in several locations and doesn't need to be. The tests for it were also incorrect. 3f64601
Check .Values.postgresql.enabled instead of .Values.env.database = "postgres" for migration job waits and associated volume mounts. These checks will now run only if the sub-chart is enabled, not for any release that uses Postgres. When Postgres is spawned by the sub-chart, it starts alongside migration jobs and needs some time to come online, so migrations cannot start immediately. If Postgres is not managed by this chart, it is either online or not when migrations happen, and no amount of waiting will allow it to come online. If it is not online already, the Postgres administrator will need to take manual action outside the chart lifecycle in order for migrations to proceed.
Co-authored-by: Michał Flendrich <michal@flendrich.pro>
eb56a7d
to
cf31f24
Compare
Replace per-template logic for selecting either repository+tag image or unifiedRepoTag with a helper. Fix CI values that used an outdated "image.single" value and were in fact using defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Before merging, please fix the error in the PR description (labels -> annotations).
Thank you to everyone involved in resolving this, seemingly small, yet blocking issue. |
@rainest Thanks for working through this multiple times! |
What this PR does / why we need it:
The main rationale for these changes is similar to the proposal to remove these waits entirely. Based on discussion in that PR, we favor this approach instead.
Special notes for your reviewer:
Supersedes #283 (similar goal, different approach) and #278 (cherry-picks that commit and rewords the title to follow this repo's format).
Checklist
[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]
next
branch and targetsnext
, notmain
[kong]
)