[BEAM-14169] Add Credentials rotation cron job for clusters #17383
Can one of the admins verify this patch?
R: @kileys
R: @kennknowles
This needs to be cloned by me and have the seed job run, yes?
Yes, please.
Are you able to validate this now? The code looks fine, but I am not sure how to get another pair of eyes on the functionality.
Sorry Kenneth, I mixed up the names. I just realized I asked the same person for the seed job both in the chat and here. My apologies for the confusion.
I found that Kerry Donny-Clark was in charge of the manual rotation for BEAM-13763. Maybe we could add him as an additional reviewer?
R: @kerrydc
What is the next step on this PR?
After Kerry or someone else reviews the approach, the next step is to seed the job on a Jenkins worker to test the functionality, so that it can be added as a permanent cron job and finally merged.
--start-credential-rotation --zone=us-central1-a --quiet''')

//Rebuilding the nodes
shell('''gcloud container clusters upgrade metrics \
What happens if these commands close with an error state?
If anything goes wrong after line 50, GCP will automatically complete the process after 7 days (automatic completion).
If something fails during the first two commands (--start-credential-rotation), the rotation will not start and the cluster will keep its previous credentials. We will add a condition so that the following steps run only if the start-credential-rotation step succeeded. I could also add an email alert in case of failure.
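The condition described above can be sketched as a small shell function that stops at the first failed step. This is only an illustration, not the code in the PR: the function name `rotate_cluster` is hypothetical, and because the rest of the `upgrade` command is not visible in the excerpt, the sketch assumes it takes only the zone and `--quiet`.

```shell
# Sketch: rotate credentials for one cluster, stopping at the first failure.
# rotate_cluster is a hypothetical helper; it is not part of the PR.
rotate_cluster() {
  cluster="$1"
  zone="$2"

  # Step 1: start the rotation. If this fails, the rotation never started
  # and the cluster keeps its previous credentials, so stop here.
  gcloud container clusters update "$cluster" \
    --start-credential-rotation --zone="$zone" --quiet || return 1

  # Step 2: rebuild the nodes so they pick up the new credentials.
  # (Assumption: any extra flags used by the real job are omitted.)
  gcloud container clusters upgrade "$cluster" \
    --zone="$zone" --quiet || return 2

  # Step 3: complete the rotation. If this step is never reached, the
  # automatic completion mentioned above is the fallback.
  gcloud container clusters update "$cluster" \
    --complete-credential-rotation --zone="$zone" --quiet || return 3
}
```

A call such as `rotate_cluster metrics us-central1-a` returns 1, 2, or 3 depending on which step failed, which makes it easy to tell from the exit status whether the rotation was ever started.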
Please add the condition and email alert. I also recommend making each cluster independent, with separate alerts in case of failure. So, try to start rotation for io-datastores, complete upgrade and update, then do the same for metrics.
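One way to picture this request is a loop that handles each cluster on its own and raises a separate alert per failure. The sketch below is an assumption-laden illustration in plain shell, not the approach taken in the PR (which uses separate Jenkins jobs): `rotate_independently` and the two callback names passed to it are hypothetical.

```shell
# Sketch: process each cluster independently, alerting per failed cluster.
# $1 = name of a function that rotates one cluster (hypothetical callback)
# $2 = name of a function that sends an alert for one cluster (hypothetical)
# remaining arguments = cluster names
rotate_independently() {
  rotate_fn="$1"
  alert_fn="$2"
  shift 2

  status=0
  for cluster in "$@"; do
    # A failure in one cluster triggers its own alert but does not
    # prevent the next cluster from being attempted.
    if ! "$rotate_fn" "$cluster"; then
      "$alert_fn" "$cluster"
      status=1
    fi
  done
  return "$status"
}
```

With this shape, a failure while rotating io-datastores would still let metrics be attempted, and only the failing cluster would generate an alert.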
Sure. I checked the Jenkins configuration, and it uses the -xe flag when executing DSL jobs, so if a command returns an error the script stops and the job is marked as failed. I split the rotation into one job per cluster, each with its own email notification to dev, pointing to JOB_URL and JENKINS_URL for further details.
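The effect of the -xe flag can be reproduced outside Jenkins with a short sketch. It assumes the shell step is run as `sh -xe`, as described above; the wrapper name `run_like_jenkins_step` is hypothetical and exists only for illustration.

```shell
# Sketch: run a script body the way a shell step with -xe would.
#   -x  traces each command to stderr before running it
#   -e  exits immediately when a command returns a non-zero status
# run_like_jenkins_step is a hypothetical helper, not a Jenkins API.
run_like_jenkins_step() {
  sh -xe -c "$1"
}
```

For example, `run_like_jenkins_step 'echo first; false; echo second'` prints `first`, stops at `false`, and returns a non-zero status, so `second` is never printed. That is the behavior that keeps later gcloud commands from running once an earlier one has failed.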
* The rotation for each cluster is now performed in a separate job.
* Because Jenkins already sets the -xe flag when executing DSL jobs, a command that returns an error stops the job and marks it as failed.
* Email notifications to dev were added using the 'publishers' method, as in other jobs in the project.
A new cron job was created to rotate cluster credentials automatically. The cron job runs every 2 months.
A maintenance window was also set to avoid disruption in the clusters.