fix: Fix ListTrainingJobs throttling for E2E tests #634

Merged: 5 commits, Apr 21, 2023
35 changes: 28 additions & 7 deletions .github/workflows/end-to-end-tests.yml
@@ -28,15 +28,36 @@ jobs:
           role-to-assume: ${{ secrets.PROD_AWS_INTEG_TEST_ROLE_ARN }}
           role-session-name: integtestsession
           aws-region: ${{ env.AWS_DEFAULT_REGION }}
+      - name: Install boto3
+        run: python -m pip install boto3
       - name: Stop all left-over training jobs
         run: |
-          aws sagemaker list-training-jobs --status-equals InProgress > running_jobs.json
-          jq -c '.[][]["TrainingJobName"]' running_jobs.json | while read i; do
-            jobName=`echo $i | cut -d "\"" -f 2`
-            echo "stopping training job $jobName"
-            aws sagemaker stop-training-job --training-job-name $jobName
-            sleep 5
-          done
+          import boto3
+          from time import sleep
+
+          def get_in_progress_training_jobs(sagemaker):
+              in_progress_jobs = []
+              paginator = sagemaker.get_paginator('list_training_jobs')
+
+              for page in paginator.paginate(StatusEquals='InProgress'):
+                  in_progress_jobs.extend(page['TrainingJobSummaries'])
+                  sleep(2)
+
+              return in_progress_jobs
+
+          def stop_training_jobs(sagemaker, in_progress_jobs):
+              for job in in_progress_jobs:
+                  job_name = job['TrainingJobName']
+                  print(f'Stopping training job: {job_name}')
+                  sagemaker.stop_training_job(TrainingJobName=job_name)
+                  sleep(1)
+
+          sagemaker = boto3.client('sagemaker')
+          in_progress_jobs = get_in_progress_training_jobs(sagemaker)
+          stop_training_jobs(sagemaker, in_progress_jobs)
+
+        shell: python


# Longer-running code in examples/, may need BB repository
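
For reviewers who want to exercise the new cleanup logic without AWS credentials, here is a minimal sketch of the same two functions run against a stub client. `FakeSageMaker` and `FakePaginator` are hypothetical stand-ins written for this example and are not part of boto3. The `pause` and `sleep` parameters are also assumptions, added so the delays can be replaced in a test; the workflow itself hard-codes `sleep(2)` and `sleep(1)`.

```python
import time


class FakePaginator:
    """Hypothetical stand-in for a boto3 paginator; yields fixed pages."""

    def __init__(self, pages):
        self._pages = pages
        self.last_kwargs = None

    def paginate(self, **kwargs):
        # The real paginator filters server-side; the stub only records the filter.
        self.last_kwargs = kwargs
        yield from self._pages


class FakeSageMaker:
    """Hypothetical stub exposing only the two client calls the script uses."""

    def __init__(self, pages):
        self.paginator = FakePaginator(pages)
        self.stopped = []

    def get_paginator(self, operation_name):
        assert operation_name == "list_training_jobs"
        return self.paginator

    def stop_training_job(self, TrainingJobName):
        self.stopped.append(TrainingJobName)


def get_in_progress_training_jobs(sagemaker, pause=2.0, sleep=time.sleep):
    """Collect all InProgress job summaries, pausing after each page."""
    jobs = []
    paginator = sagemaker.get_paginator("list_training_jobs")
    for page in paginator.paginate(StatusEquals="InProgress"):
        jobs.extend(page["TrainingJobSummaries"])
        sleep(pause)  # spread out ListTrainingJobs calls
    return jobs


def stop_training_jobs(sagemaker, jobs, pause=1.0, sleep=time.sleep):
    """Stop each job in turn and return the names that were stopped."""
    stopped = []
    for job in jobs:
        name = job["TrainingJobName"]
        sagemaker.stop_training_job(TrainingJobName=name)
        stopped.append(name)
        sleep(pause)  # spread out StopTrainingJob calls
    return stopped
```

Passing a no-op or recording callable as `sleep` keeps a local run instantaneous while still showing how many pauses the script would take: one per listed page and one per stopped job.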
