compute clusters with MSI created #2024
infra/bootstrap.sh
Outdated
echo_title "Ensuring GPU compute"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "gpu-cluster" 0 20 "Standard_NC6"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "automl-gpu-cluster" 0 4 "STANDARD_NC6"

echo_title "Ensuring GPU compute with MSI"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute_msi "gpu-cluster-msi" 0 20 "Standard_NC6" "uaimevnet"
Just curious, why do we need a new dedicated 20-node GPU cluster?
I am trying to replicate cpu-cluster and gpu-cluster, which are used in all AutoML pipelines, so I chose the exact configuration that they have.
infra/bootstrap.sh
Outdated
@@ -102,11 +102,17 @@ if [[ ! -z "${RUN_BOOTSTRAP:-}" ]]; then
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "automl-cpu-cluster" 0 4 "Standard_DS3_v2"
# Larger CPU cluster for Dask and Spark examples
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "cpu-cluster-lg" 0 4 "Standard_DS15_v2"

echo_title "Ensuring CPU compute with MSI"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute_msi "cpu-cluster-msi" 0 20 "Standard_DS3_v2" "uaimevnet"
Any reason for choosing 20 nodes for the CPU clusters?
I am trying to replicate cpu-cluster and gpu-cluster, which are used in all AutoML pipelines, so I chose the exact configuration that they have.
We will end up with out-of-quota errors, since we do not have unlimited GPU quota in the subscription. Please reconfigure the quota numbers based on what these new clusters will be used for.
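To make the quota concern concrete, here is a rough sketch of the worst-case arithmetic for the GPU clusters in this diff. It assumes the two numeric arguments to `ensure_aml_compute` / `ensure_aml_compute_msi` are the minimum and maximum node counts, that a Standard_NC6 node counts as 6 vCPUs, and that quota is tracked in vCPUs per VM family. The helper name `worst_case_vcpus` is hypothetical and is not part of `sdk_helpers.sh`.

```sh
# Hypothetical helper: sum the max node counts and multiply by vCPUs per node.
# Usage: worst_case_vcpus <vcpus_per_node> <max_nodes>...
worst_case_vcpus() (
  vcpus_per_node=$1
  shift
  total_nodes=0
  for n in "$@"; do
    total_nodes=$(( total_nodes + n ))
  done
  echo $(( total_nodes * vcpus_per_node ))
)

# Assumption: Standard_NC6 = 6 vCPUs per node.
# Max nodes from the diff: gpu-cluster (20), automl-gpu-cluster (4), gpu-cluster-msi (20).
nc6_total=$(worst_case_vcpus 6 20 4 20)
# The new gpu-cluster-msi on its own.
nc6_added=$(worst_case_vcpus 6 20)

echo "worst-case Standard_NC6 vCPUs, all three clusters: $nc6_total"
echo "of which added by gpu-cluster-msi: $nc6_added"
```

Under those assumptions, the three clusters could request up to 264 vCPUs at full scale-out, 120 of them from the new MSI cluster. Comparing that against the subscription's actual limit for the family would show whether a maximum of 20 is affordable.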
56cb1fd to 06c3638
Description
Creates compute clusters with an MSI assigned, so that we can obtain an MLClient on those computes when required.
Currently we have no such computes, so AutoML components that require MLClient, such as registration and deployment, are failing.
Checklist