Conversation

@migmartri migmartri commented Aug 31, 2023

Instead of running the migration as a separate job, which had some issues (#309, #271), we now run the migration as an init container.

I also double-checked that our backend, Postgres, can safely handle concurrent migration requests from different replicas using advisory locking, as stated in the Atlas upstream documentation.

--- /tmp/before.yaml	2023-08-31 17:28:50.707396512 +0200
+++ /tmp/after.yaml	2023-08-31 17:31:46.650255138 +0200
@@ -763,6 +763,17 @@
       serviceAccountName: foo-chainloop-controlplane
       securityContext:
         null
+      initContainers:
+        - name: migrate
+          image: "ghcr.io/chainloop-dev/chainloop/control-plane-migrations:v0.16.2"
+          imagePullPolicy: 
+          args:
+            - migrate
+            - apply
+            - --url
+            - "postgres://chainloop:chainlooppwd@foo-postgresql:5432/chainloop-cp?sslmode=disable"
+            - --dir
+            - file:///migrations
       containers:
         
         - name: chainloop
@@ -1192,46 +1203,3 @@
       volumeMounts:
   volumes:
   restartPolicy: Never
----
-# Source: chainloop/templates/controlplane/migrate-job.yaml
-apiVersion: batch/v1
-kind: Job
-metadata:
-  name: foo-chainloop-controlplane-migrate
-  labels:
-    app.kubernetes.io/name: chainloop
-    helm.sh/chart: chainloop-1.13.0
-    app.kubernetes.io/instance: foo
-    app.kubernetes.io/managed-by: Helm
-    app.kubernetes.io/part-of: chainloop
-    app.kubernetes.io/component: controlplane-migration
-  annotations:
-    "helm.sh/hook": post-install,pre-upgrade
-    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
-spec:
-  template:
-    metadata:
-      labels:
-        app.kubernetes.io/name: chainloop
-        helm.sh/chart: chainloop-1.13.0
-        app.kubernetes.io/instance: foo
-        app.kubernetes.io/managed-by: Helm
-        app.kubernetes.io/part-of: chainloop
-        app.kubernetes.io/component: controlplane-migration
-    spec:
-      restartPolicy: OnFailure
-      serviceAccountName: foo-chainloop-controlplane
-      securityContext:
-        null
-      containers:
-        
-        - name: migrate
-          image: "ghcr.io/chainloop-dev/chainloop/control-plane-migrations:v0.16.2"
-          imagePullPolicy: 
-          args:
-            - migrate
-            - apply
-            - --url
-            - "postgres://chainloop:chainlooppwd@foo-postgresql:5432/chainloop-cp?sslmode=disable"
-            - --dir
-            - file:///migrations

Closes #309 and #271
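
For context on why this is safe with several replicas: Kubernetes runs init containers to completion before the app containers start, so each control-plane pod only comes up against an already-migrated schema, and, per the Atlas documentation referenced above, concurrent `migrate apply` runs against Postgres serialize on an advisory lock. A stripped-down sketch of the pattern (placeholder names and credentials, not the chart's actual template):

```yaml
# Illustrative only: two replicas, each running the same migration init container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controlplane-example
spec:
  # Both pods may run "migrate apply" concurrently; per the description above,
  # Atlas guards the run with a Postgres advisory lock.
  replicas: 2
  selector:
    matchLabels:
      app: controlplane-example
  template:
    metadata:
      labels:
        app: controlplane-example
    spec:
      initContainers:
        # Must exit successfully before the app container below starts.
        - name: migrate
          image: "ghcr.io/chainloop-dev/chainloop/control-plane-migrations:v0.16.2"
          args:
            - migrate
            - apply
            - --url
            - "postgres://user:pwd@postgres:5432/chainloop-cp?sslmode=disable"
            - --dir
            - file:///migrations
      containers:
        # Starts only once the migration init container has completed.
        - name: controlplane
          image: "example.com/chainloop-controlplane:placeholder" # placeholder app image
```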

@migmartri migmartri requested review from danlishka and zaibon August 31, 2023 15:43

zaibon commented Aug 31, 2023

I think this will be an issue when using the sql-proxy. Since the migration is run as an init container, the sql-proxy sidecar is not yet started, so the migration will not be able to reach the database.

migmartri commented Aug 31, 2023

> I think this will be an issue when using the sql-proxy. Since the migration is run as an init container, the sql-proxy sidecar is not yet started, so the migration will not be able to reach the database.

oh, that's a good point, let me think...

This proxy makes things so tricky :(

@migmartri
Member Author

There might be another way of running the proxy. Maybe we can run the proxy as its own deployment, and then talk to it through its internal DNS name. What do you think?

What I mean is:

1 - Deploy the proxy as its own deployment in the same namespace
2 - Configure the connection string to use the service name of that newly exposed deployment instead of localhost
3 - See if it works

If we do that, we could keep either the job approach or the init container approach; either should work, I think.

I like the init container approach; it makes the setup more predictable, since the control plane will not work if the DB settings are not properly set.
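
A rough sketch of what steps 1 and 2 could look like; the proxy image and resource names below are placeholders (this assumes something along the lines of the Cloud SQL Auth Proxy, and the real chart would template all of it):

```yaml
# Illustrative only: the SQL proxy as its own Deployment plus a Service,
# instead of a sidecar inside every control-plane pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo-sql-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo-sql-proxy
  template:
    metadata:
      labels:
        app: foo-sql-proxy
    spec:
      containers:
        - name: sql-proxy
          # Placeholder image: whichever proxy is in use, configured to listen
          # on 0.0.0.0:5432 so other pods in the namespace can reach it.
          image: "example.com/sql-proxy:placeholder"
          ports:
            - containerPort: 5432
---
# Stable in-cluster DNS name for the proxy.
apiVersion: v1
kind: Service
metadata:
  name: foo-sql-proxy
spec:
  selector:
    app: foo-sql-proxy
  ports:
    - port: 5432
      targetPort: 5432
```

Both the migration init container and the control plane would then point their connection string at the Service name, e.g. postgres://user:pwd@foo-sql-proxy:5432/chainloop-cp, instead of at localhost.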

zaibon commented Aug 31, 2023

I think the approach of having a single deployment for the SQL proxy is the right one. There is no need for the sql-proxy to run multiple times, once in each deployment of the control plane.

> I like the init container approach; it makes the setup more predictable, since the control plane will not work if the DB settings are not properly set.

Would it be possible that the migration code changes somehow, then a new control-plane pod starts and runs the migration while other control-plane pods are still alive, making the pods that did not restart incompatible with the latest version of the database schema?

@migmartri
Member Author

> Would it be possible that the migration code changes somehow, then a new control-plane pod starts and runs the migration while other control-plane pods are still alive, making the pods that did not restart incompatible with the latest version of the database schema?

Yes, I guess that's indeed a possibility. But I do not see how it differs from having a job that modifies the DB schema while we are still running an old version of the code, as we currently do, no?

In any case, to me schema changes need to be non-destructive and take into account both the rollout process and possible rollbacks. We can achieve that now through the review process, since the migrations are part of the code.

What I could see, though, is an old version of the migration code trying to update a DB that is already in a newer state. The good news is that in that case migrate apply will just skip, since there is nothing new to apply.

Does it make sense?

BTW, I'll hold off on merging this code until we figure out how to proceed re: the proxy. Thanks for raising the issue.

zaibon commented Sep 1, 2023

> Yes, I guess that's indeed a possibility. But I do not see how it differs from having a job that modifies the DB schema while we are still running an old version of the code, as we currently do, no?

Indeed this could also happen with the current setup.

> Does it make sense?

Yes it does. I think we can go ahead with your proposal.

@migmartri
Member Author

> Yes it does. I think we can go ahead with your proposal.

Perfect. Let me know if updating the Chart to run the proxy as its own component is something you could give a try, since you have a GKE setup; otherwise it will take me more time. Happy either way.

Thanks

@migmartri migmartri marked this pull request as draft September 5, 2023 08:33

zaibon commented Sep 5, 2023

> Perfect. Let me know if updating the Chart to run the proxy as its own component is something you could give a try, since you have a GKE setup

Yes, I can work on that and test it on GKE.

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
@migmartri migmartri marked this pull request as ready for review September 11, 2023 12:26
@migmartri migmartri merged commit 4ca2a82 into chainloop-dev:main Sep 14, 2023
@migmartri migmartri deleted the fix-migration branch September 14, 2023 09:08