compute clusters with MSI created #2024

Open. Wants to merge 2 commits into base: main.


HrishikeshGeedMS
Collaborator

Description

Create compute clusters with a managed identity (MSI) assigned, so that an MLClient can be obtained on those computes when required.
We currently have no such computes, so AutoML components that require an MLClient, such as registration and deployment, are failing.

Checklist

  1. Added two lines to infra/bootstrap.sh (lines 107 and 114).
  2. Added a new function to infra/sdk_helpers.sh that creates computes with MSI (lines 176 to 205).
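The new helper itself is not reproduced on this page. As a rough, hypothetical sketch only, an idempotent ensure_aml_compute_msi taking the same five arguments as the bootstrap calls (name, min nodes, max nodes, VM size, identity name) could look like the block below. It assumes the Azure CLI `ml` (v2) extension, RESOURCE_GROUP and WORKSPACE variables, and a user-assigned identity living in the workspace's resource group; the AZ override exists only so the Azure calls can be stubbed and is not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an idempotent ensure_aml_compute_msi helper. The PR's
# real implementation is not shown on this page. Assumes RESOURCE_GROUP and
# WORKSPACE are set and the Azure CLI has the `ml` (v2) extension installed.
set -euo pipefail

# Command used to reach Azure; overridable (hypothetical) so it can be stubbed.
AZ="${AZ:-az}"

# Usage: ensure_aml_compute_msi <name> <min_nodes> <max_nodes> <vm_size> <identity_name>
ensure_aml_compute_msi() {
  local name="$1" min="$2" max="$3" size="$4" identity_name="$5"
  local identity_id

  # Nothing to do if a compute with this name already exists.
  if "$AZ" ml compute show --name "$name" \
      --resource-group "$RESOURCE_GROUP" --workspace-name "$WORKSPACE" \
      >/dev/null 2>&1; then
    echo "Compute $name already exists"
    return 0
  fi

  # Look up the ARM resource ID of the user-assigned identity by name.
  # Assumption: the identity is in the same resource group as the workspace.
  identity_id="$("$AZ" identity show --name "$identity_name" \
      --resource-group "$RESOURCE_GROUP" --query id --output tsv)"

  # Create the cluster with that identity attached.
  "$AZ" ml compute create --name "$name" --type amlcompute --size "$size" \
    --min-instances "$min" --max-instances "$max" \
    --identity-type UserAssigned --user-assigned-identities "$identity_id" \
    --resource-group "$RESOURCE_GROUP" --workspace-name "$WORKSPACE"
}

# Dispatch the same way as the existing calls: sdk_helpers.sh <function> <args...>
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  "$@"
fi
```

Checking with `compute show` first keeps the bootstrap safe to rerun, which matters because the script is invoked on every RUN_BOOTSTRAP pass.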


echo_title "Ensuring GPU compute"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "gpu-cluster" 0 20 "Standard_NC6"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "automl-gpu-cluster" 0 4 "STANDARD_NC6"

echo_title "Ensuring GPU compute with MSI"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute_msi "gpu-cluster-msi" 0 20 "Standard_NC6" "uaimevnet"
Contributor


Just curious, why do we need a new dedicated 20-node GPU cluster?

Collaborator Author


I am trying to replicate cpu-cluster/gpu-cluster, which are used in all AutoML pipelines, so I chose the exact configuration they have.

@@ -102,11 +102,17 @@ if [[ ! -z "${RUN_BOOTSTRAP:-}" ]]; then
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "automl-cpu-cluster" 0 4 "Standard_DS3_v2"
# Larger CPU cluster for Dask and Spark examples
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute "cpu-cluster-lg" 0 4 "Standard_DS15_v2"

echo_title "Ensuring CPU compute with MSI"
"$SCRIPT_DIR"/sdk_helpers.sh ensure_aml_compute_msi "cpu-cluster-msi" 0 20 "Standard_DS3_v2" "uaimevnet"
Contributor


Any reason for choosing 20 nodes for the CPU clusters?

Collaborator Author


I am trying to replicate cpu-cluster/gpu-cluster, which are used in all AutoML pipelines, so I chose the exact configuration they have.

Contributor


We will end up with an out-of-quota error, since we do not have unlimited GPU quota in the subscription. Please reconfigure the max node counts based on what these new clusters will be used for.
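To put numbers on the quota concern: Azure ML counts dedicated quota per VM family in vCPUs, and a Standard_NC6 node has 6 vCPUs. The sketch below sums the worst-case NC-family demand if the three GPU clusters in this PR's diff (gpu-cluster, automl-gpu-cluster, gpu-cluster-msi) all scale to their max node counts. It treats "STANDARD_NC6" and "Standard_NC6" as the same size, and since the subscription's actual limit is not stated here, it only reports demand.

```shell
#!/usr/bin/env bash
# Worst-case vCPU demand if every GPU cluster scales to its max node count.
# vCPU count is Azure's published size; the quota limit itself is unknown.
set -euo pipefail

# vCPUs per VM size (Standard_NC6 = 6 vCPUs).
vcpus_for() {
  case "$1" in
    Standard_NC6) echo 6 ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# name:max_nodes:size, from the ensure_aml_compute* calls in this PR's diff.
GPU_CLUSTERS=(
  "gpu-cluster:20:Standard_NC6"
  "automl-gpu-cluster:4:Standard_NC6"
  "gpu-cluster-msi:20:Standard_NC6"
)

gpu_quota_report() {
  local total=0 entry name max size per need
  for entry in "${GPU_CLUSTERS[@]}"; do
    IFS=: read -r name max size <<< "$entry"
    per="$(vcpus_for "$size")"
    need=$(( max * per ))
    echo "$name: $max x $per = $need vCPUs"
    total=$(( total + need ))
  done
  echo "Total NC-family vCPUs at max scale: $total"
}

gpu_quota_report
```

With these values the existing clusters need at most 144 NC-family vCPUs, and adding gpu-cluster-msi at 20 nodes raises the worst case to 264.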


2 participants