Adds zombie removal tool #13718

Merged
merged 6 commits into from
Jun 13, 2022
Changes from 5 commits
6 changes: 6 additions & 0 deletions .github/workflows/terminate-zombie-build-instances.yml
@@ -34,3 +34,9 @@ jobs:

# See https://docs.aws.amazon.com/cli/latest/reference/ec2/terminate-instances.html for terminate command.
echo $to_terminate | jq '.[] | .InstanceId' | xargs --no-run-if-empty --max-args=1 aws ec2 terminate-instances --instance-ids

steps:
- name: List and Terminate GH actions runners Older than One Day
Contributor

I don't see logic to filter out runners older than a day. Am I missing something?

Contributor Author

That may have been an aspirational "day" from when I started writing the script. I'll update the comment.

We are deleting anything that is not "online", but in an upcoming PR I'm changing that to == offline, which seems to be the end state of these runners. Happy to add day logic, though.

Contributor

No need. I wanted to make sure I understood the logic. I think filtering for offline makes sense. I believe the action status goes online -> offline. I would double-check that runners don't start in offline.

env:
GITHUB_PAT: ${{ secrets.OCTAVIA_PAT }}
run: ./tools/bin/gh_action_zombie_killer
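
The status filter discussed in the review thread can be sketched in isolation. This is a minimal illustration, not the script itself: `offline_runner_ids` is a hypothetical helper name and `sample_response` is made-up data shaped like the "list self-hosted runners" response (a `runners` array whose entries carry `id` and `status`). It selects only runners whose status is exactly "offline", rather than everything that is not "online".

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a runners-list JSON document on stdin and
# prints the id of every runner whose status is exactly "offline".
offline_runner_ids() {
  jq '.runners[] | select(.status == "offline") | .id'
}

# Made-up sample data for illustration only.
sample_response='{"total_count":3,"runners":[
  {"id":11,"name":"runner-a","status":"online","busy":true},
  {"id":12,"name":"runner-b","status":"offline","busy":false},
  {"id":13,"name":"runner-c","status":"offline","busy":false}]}'

# Only run the demo when jq is installed.
if command -v jq >/dev/null 2>&1; then
  echo "$sample_response" | offline_runner_ids
fi
```

With the sample data, only the two offline runners (ids 12 and 13) are selected. No age check is involved, which matches the thread: the "older than one day" wording in the step name is not backed by any logic.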
77 changes: 77 additions & 0 deletions tools/bin/gh_action_zombie_killer
@@ -0,0 +1,77 @@
#!/usr/bin/env bash

# ------------- Import some defaults for the shell

# Source shell defaults
# $0 is the currently running program (this file)
this_file_directory=$(dirname "$0")
relative_path_to_defaults=$this_file_directory/../shell_defaults

# if a file exists there, source it. otherwise complain
if test -f $relative_path_to_defaults; then
# source and '.' are the same program
source $relative_path_to_defaults
else
echo -e "\033[31m\nFAILED TO SOURCE TEST RUNNING OPTIONS.\033[39m"
echo -e "\033[31mTried $relative_path_to_defaults\033[39m"
exit 1
fi

echo "To run locally use GITHUB_PAT=\$YOUR_PAT_HERE before running"
token=$GITHUB_PAT
org=airbytehq
# Left unquoted in the for loop below, this string is word-split on whitespace, so it acts as a list of repos
repo_list="airbyte airbyte-cloud"


for repo in $repo_list; do
# Start the while loop to check for all runners
runner_for_page_count=1
page_count=0
all_runner_ids=""
# keep paging through until we find them all
while test $runner_for_page_count -gt 0; do
page_count=$(($page_count+1))
set +o xtrace
# API for endpoint:
# https://docs.github.com/en/rest/actions/self-hosted-runners#list-self-hosted-runners-for-a-repository
runner_response=$(curl \
--silent \
--header "Accept: application/vnd.github.v3+json" \
--header "Authorization: token $token" \
--request GET "https://api.github.com/repos/$org/$repo/actions/runners?page=$page_count&per_page=100")
runner_response_wc=$(echo $runner_response | wc -w)
# Print short responses in full: they are usually errors (for example auth failures) rather than runner lists
if test $runner_response_wc -lt 100; then
echo -e "$blue_text""\$runner_response is \n\n$runner_response\n\n""$default_text"
fi

runner_ids_for_page=$(echo $runner_response | \
jq '.runners[] | select(.status=="offline") | .id')

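# Count every runner on the page, not only the offline ones, so paging does not stop early on a page that has no offline runners
runner_for_page_count=$(echo $runner_response | jq '.runners | length')
runner_for_page_count=${runner_for_page_count:-0}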
echo -e "$blue_text""jq returned $runner_for_page_count runners for page $page_count""$default_text"
all_runner_ids="$runner_ids_for_page $all_runner_ids"
all_runner_ids_count=$(echo $all_runner_ids | wc -w)
echo -e "$blue_text""Total count is now $all_runner_ids_count""$default_text"
done

echo -e "$blue_text""Total ids returned: $all_runner_ids_count""$default_text"
# DELETE THEM ALL!
cursor=0
for this_runner in $all_runner_ids; do
cursor=$(($cursor+1))
echo -e "$blue_text""Removing $cursor / $all_runner_ids_count""$default_text"
# API for endpoint:
# https://docs.github.com/en/rest/actions/self-hosted-runners#delete-a-self-hosted-runner-from-a-repository
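# --fail makes curl exit non-zero on HTTP error responses, so the FAIL branch below can actually trigger
curl \
--silent \
--fail \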
--request DELETE \
--header "Accept: application/vnd.github.v3+json" \
--header "Authorization: token $token" \
https://api.github.com/repos/$org/$repo/actions/runners/$this_runner && \
echo -e "$blue_text""OK ID $this_runner""$default_text" || \
echo -e "$red_text""FAIL! ID $this_runner""$default_text"
done

done
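
The paging pattern used above can be sketched without network access. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical stub standing in for the curl call, and it returns made-up data as one `<id> <status>` pair per line instead of JSON. The loop stops only when a page comes back empty, so a page that happens to contain no offline runners does not end paging early.

```shell
#!/usr/bin/env bash

# Hypothetical stub for the API call: prints "<id> <status>" lines for a page.
# Page 1 deliberately contains no offline runners.
fetch_page() {
  case "$1" in
    1) printf '%s\n' "101 online" "102 online" ;;
    2) printf '%s\n' "201 offline" "202 online" ;;
    *) : ;;  # any later page is empty
  esac
}

# Walk pages until one is empty, collecting the ids of offline runners.
collect_offline_ids() {
  local page=0 page_lines id status ids=""
  while :; do
    page=$((page + 1))
    page_lines=$(fetch_page "$page")
    # Stop on an empty page, not on a page with zero offline runners.
    if [ -z "$page_lines" ]; then
      break
    fi
    while read -r id status; do
      if [ "$status" = "offline" ]; then
        ids="$ids $id"
      fi
    done <<< "$page_lines"
  done
  # Unquoted on purpose: word splitting trims the leading space.
  echo $ids
}

collect_offline_ids   # prints: 201
```

Keeping the stop condition separate from the filter is the design point here: the number of matching runners on a page says nothing about whether more pages exist.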