Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(deploy): use recreate strategy and 1 replica #499

Merged
merged 1 commit into from
Nov 22, 2022

Conversation

ebaron
Copy link
Member

@ebaron ebaron commented Nov 21, 2022

There are two bugs that occur due to replicas contending for access to the credentials database, and replicas on different nodes competing for the PVC itself. Cryostat is currently meant to work as a single replica, but during rolling updates there could be two simultaneously.

To avoid these issues, we're changing the deployment strategy from the default of RollingUpdate to Recreate. As a result there will be some brief downtime in response to upgrades or configuration changes. The other change is the operator managing the Replicas field of the main Cryostat deployment. This is to ensure that the replica count stays at 1 to avoid the above bugs.

To test:

  1. make deploy create_cryostat_cr OPERATOR_IMG=....
  2. kubectl edit cryostat cryostat-sample. Change some field such as setting reportOptions.replicas to 1.
  3. The Cryostat deployment should scale down to 0 and then back up to 1, reflecting this new configuration.
  4. Try manually scaling up/down the Cryostat deployment. After a brief time, the operator should revert the deployment to 1 replica.

Fixes: #487, #491

@ebaron ebaron requested a review from tthvo November 21, 2022 21:32
@ebaron ebaron linked an issue Nov 21, 2022 that may be closed by this pull request
@tthvo
Copy link
Member

tthvo commented Nov 22, 2022

kubectl edit cryostat cryostat-sample. Change some field such as setting reportOptions.replicas to 1.

About this, I am bit unsure what causes the main deployment to rollout new pods if this does not change its spec? Setting reportOptions.replicas just change the report deployment, right?

@tthvo
Copy link
Member

tthvo commented Nov 22, 2022

I also got a bit of an issue when creating a report replica as pod seems to in CrashLoopBackOff with logs show:

exec /deployments/run-java.sh: permission denied

I am using Cluster Bot with OCP 4.11.1

@ebaron
Copy link
Member Author

ebaron commented Nov 22, 2022

kubectl edit cryostat cryostat-sample. Change some field such as setting reportOptions.replicas to 1.

About this, I am bit unsure what causes the main deployment to rollout new pods if this does not change its spec? Setting reportOptions.replicas just change the report deployment, right?

This ends up setting the CRYOSTAT_REPORT_GENERATOR environment variable with the URL to the reports service:

if specs.ReportsURL != nil {
reportsEnvs := []corev1.EnvVar{
{
Name: "CRYOSTAT_REPORT_GENERATOR",
Value: specs.ReportsURL.String(),
},
}
envs = append(envs, reportsEnvs...)
.

So there is a subtle config change on the main deployment as well.

@ebaron
Copy link
Member Author

ebaron commented Nov 22, 2022

I also got a bit of an issue when creating a report replica as pod seems to in CrashLoopBackOff with logs show:

exec /deployments/run-java.sh: permission denied

I am using Cluster Bot with OCP 4.11.1

Oh this again. We've encountered this with some of our sample apps. It's a matter of updating our Dockerfiles. You should still be able to test this PR though. Instead of the reports replicas, you could try switching enableCertManager.

@tthvo
Copy link
Member

tthvo commented Nov 22, 2022

kubectl edit cryostat cryostat-sample. Change some field such as setting reportOptions.replicas to 1.

About this, I am bit unsure what causes the main deployment to rollout new pods if this does not change its spec? Setting reportOptions.replicas just change the report deployment, right?

This ends up setting the CRYOSTAT_REPORT_GENERATOR environment variable with the URL to the reports service:

if specs.ReportsURL != nil {
reportsEnvs := []corev1.EnvVar{
{
Name: "CRYOSTAT_REPORT_GENERATOR",
Value: specs.ReportsURL.String(),
},
}
envs = append(envs, reportsEnvs...)

.

So there is a subtle config change on the main deployment as well.

Ah niceee! I didnt notice before. Make sense! Thanks :D

Copy link
Member

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this again. We've encountered this with some of our sample apps. It's a matter of updating our Dockerfiles. You should still be able to test this PR though. Instead of the reports replicas, you could try switching enableCertManager.

Nice, so its unrelated then. And looks great to me and deployments are recreated as expected. And thanks a lot for explaining those above :D

@ebaron ebaron merged commit ef81336 into cryostatio:main Nov 22, 2022
mergify bot pushed a commit that referenced this pull request Nov 22, 2022
ebaron added a commit that referenced this pull request Nov 22, 2022
(cherry picked from commit ef81336)

Co-authored-by: Elliott Baron <ebaron@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Default PVC can cause rolling updates to fail Redeployment fails due to H2 DB file contention
2 participants