-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ci image versions #129
base: master
Are you sure you want to change the base?
Conversation
eb4754b
to
5892e42
Compare
@@ -152,7 +152,7 @@ jobs: | |||
filters: | | |||
other_than_docs: | |||
- '!(docs/**)**' | |||
- name: Test Tabular Cloud | |||
- name: Test Text Cloud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
@@ -395,6 +395,10 @@ def _get_image_uri(self, framework_version: str, instance_type: str, custom_imag | |||
image_scope="training", | |||
instance_type=instance_type, | |||
) | |||
if instance_type == "gpu": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for testing, I created a few images with the correct ray version. I can remove them before merging.
@@ -44,3 +44,4 @@ head_node_type: head | |||
initialization_commands: | |||
# To get access to DLC image | |||
- aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com | |||
- aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this region support with us-east-1
image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user uses images before 0.8.0
, I think they are hosted in us-east-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, shall we host all images in the same region say "us-east-1"?
tests/conftest.py
Outdated
gpu_training_image = "369469875935.dkr.ecr.us-east-1.amazonaws.com/autogluon-nightly-training:gpu-latest" | ||
cpu_inference_image = "369469875935.dkr.ecr.us-east-1.amazonaws.com/autogluon-nightly-inference:cpu-latest" | ||
gpu_inference_image = "369469875935.dkr.ecr.us-east-1.amazonaws.com/autogluon-nightly-inference:gpu-latest" | ||
cpu_training_image = "369469875935.dkr.ecr.us-east-1.amazonaws.com/weisu:cpu-latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to use weisu's image instead of fixing the nightly image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is for testing. I created those just now.
backend_kwargs={ | ||
"initialization_commands": [ | ||
"aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com", | ||
"aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
ae4a28d
to
7d9fda5
Compare
Job PR-129-7d9fda5 is done. |
7d9fda5
to
4ea70d5
Compare
Job PR-129-4ea70d5 is done. |
9247508
to
737a840
Compare
Job PR-129-737a840 is done. |
737a840
to
1d1811f
Compare
@tonyhoo @prateekdesai04 CI is green after limiting the |
@@ -395,6 +395,10 @@ def _get_image_uri(self, framework_version: str, instance_type: str, custom_imag | |||
image_scope="training", | |||
instance_type=instance_type, | |||
) | |||
if "g4dn" in instance_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU image shall be applicable to all GPU instances. We shall update the logics here to be more robust for all GPU instance types:
def is_gpu_instance(instance_type):
# A list of instance types that have GPUs
gpu_instances = [
'g2', 'g3', 'g4', 'p2', 'p3', 'p4', 'p4d', 'p5', 'p3dn', 'g4dn', 'g5', 'g5g'
]
return any(instance_type.startswith(prefix) for prefix in gpu_instances)
def select_image():
instance_type = get_instance_type()
if is_gpu_instance(instance_type):
return "369469875935.dkr.ecr.us-east-1.amazonaws.com/weisu:gpu-latest"
else:
return "369469875935.dkr.ecr.us-east-1.amazonaws.com/weisu:cpu-latest"
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will actually remove the whole commit 6cc23a7. This is to override the logic of getting the versioned DLC images using the sagemaker api for testing my custom images
@@ -6,7 +6,7 @@ max_workers: 1 | |||
|
|||
docker: | |||
# The image uri will be dynamically replaced by cloud predictor | |||
image: "763104351884.dkr.ecr.us-east-1.amazonaws.com/autogluon-training:0.7.0-cpu-py39-ubuntu20.04" | |||
image: "763104351884.dkr.ecr.us-west-2.amazonaws.com/autogluon-training:1.1.0-cpu-py310-ubuntu20.04" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to host the image to us-east-1
if not there already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the image is already present in east-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -39,6 +39,10 @@ def __init__( | |||
image_scope="training", | |||
instance_type=instance_type, | |||
) | |||
if "g4dn" in instance_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, let's consolidate the image selection logic in a common function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, I will drop the commit.
1d1811f
to
4b9c762
Compare
Issue #, if available:
Description of changes:
This PR fixes a few bugs in CI.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.