Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-14169] Add Credentials rotation cron job for clusters #17383

Merged
merged 6 commits into from
May 9, 2022
Merged

[BEAM-14169] Add Credentials rotation cron job for clusters #17383

merged 6 commits into from
May 9, 2022

Conversation

elink21
Copy link
Contributor

@elink21 elink21 commented Apr 18, 2022

A new cron job was created to rotate cluster credentials automatically, the cron job will be executed each 2 months.

Also a maintenance window was set to avoid disruption in the clusters.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@asf-ci
Copy link

asf-ci commented Apr 18, 2022

Can one of the admins verify this patch?

2 similar comments
@asf-ci
Copy link

asf-ci commented Apr 18, 2022

Can one of the admins verify this patch?

@asf-ci
Copy link

asf-ci commented Apr 18, 2022

Can one of the admins verify this patch?

@github-actions github-actions bot added the infra label Apr 18, 2022
@elink21 elink21 changed the title Add cron groovy job for credentials rotation [BEAM-14169] Add Credentials rotation cron job for clusters Apr 18, 2022
@elink21
Copy link
Contributor Author

elink21 commented Apr 18, 2022

R: @kileys
Hi Kiley, can you review this PR please?
Also could you please run a seed job to test the new credentials rotation? Thanks in advance.

@kileys
Copy link
Contributor

kileys commented Apr 18, 2022

R: @kennknowles

@kennknowles kennknowles self-requested a review April 18, 2022 16:54
@elink21
Copy link
Contributor Author

elink21 commented Apr 20, 2022

R: @kennknowles
Hi Kenn, could you review this PR please? Thanks in advance

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be cloned by me and have the seed job run, yes?

.test-infra/jenkins/job_ClusterCredentialsRotation.groovy Outdated Show resolved Hide resolved
@elink21
Copy link
Contributor Author

elink21 commented Apr 20, 2022

This needs to be cloned by me and have the seed job run, yes?

Yes. please.

@kennknowles
Copy link
Member

Are you able to validate this now? The code looks fine but I am not certain how I can add another pair of eyes to checking on the functionality.

@elink21
Copy link
Contributor Author

elink21 commented Apr 21, 2022

Sorry Kenneth, I got confused with the names, now I realized I asked for the seed job with the same person in the chat and here, my apologies for the confusion.
We have tested this against replicas of the actual clusters and also we executed each step manually on the real ones, but I agree that extra validation is required , do you know who can I ask for this?.
Thanks in advance.

@elink21
Copy link
Contributor Author

elink21 commented Apr 22, 2022

I found that Kerry Donny-Clark was in charge of the manual rotation for BEAM-13763, maybe we could add him as an additional reviewer?

@kennknowles
Copy link
Member

R: @kerrydc

@aaltay
Copy link
Member

aaltay commented May 5, 2022

What is the next step on this PR?

@elink21
Copy link
Contributor Author

elink21 commented May 5, 2022

What is the next step on this PR?

After Kerry or someone else review the approach, the next step is to seed the job in a Jenkins Worker to test the functionality so it can be added as a permanent cron job and finally merge it.

.test-infra/jenkins/job_ClusterCredentialsRotation.groovy Outdated Show resolved Hide resolved
.test-infra/jenkins/job_ClusterCredentialsRotation.groovy Outdated Show resolved Hide resolved
--start-credential-rotation --zone=us-central1-a --quiet''')

//Rebuilding the nodes
shell('''gcloud container clusters upgrade metrics \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if these commands close with an error state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything goes wrong after line 50, GCP will automatically complete the process after 7 days (Automatic completion).

If something fails during the first 2 commands (--start-credential-rotation), the rotation will not start and the cluster will keep the previous credentials. We will add a condition to only execute the following steps if the start-credential-rotation part was successful. I could also add an email alert in case of failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the condition and email alert. I also recommend making each cluster independent, with separate alerts in case of failure. So, try to start rotation for io-datastores, complete upgrade and update, then do the same for metrics.

Copy link
Contributor Author

@elink21 elink21 May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I checked Jenkins configuration and it has the -xe flag when executing DSL Jobs, so if a command returns an error the script stops and it will be marked as a fail, I separated the rotation with one job per cluster with separated email notifications to dev, pointing to JOB_URL and JENKINS_URL for further details.

--start-credential-rotation --zone=us-central1-a --quiet''')

//Rebuilding the nodes
shell('''gcloud container clusters upgrade metrics \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the condition and email alert. I also recommend making each cluster independent, with separate alerts in case of failure. So, try to start rotation for io-datastores, complete upgrade and update, then do the same for metrics.

Elias Segundo added 2 commits May 9, 2022 10:16
* Now the rotations for each cluster are performed on different jobs.
* As Jenkins already has the -xe flag set while executing DSL jobs, if a command returns an error then the job will be stopped and marked as fail.
* Email notifications to dev were added using the 'publishers' method, as in others jobs in the project.
@kennknowles kennknowles merged commit 85d9824 into apache:master May 9, 2022
@damccorm damccorm mentioned this pull request Aug 31, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants