Impossible to configure shm_size when launching a CommandJob with AzureML SDK v2 #6571

Closed

tmignot63 opened this issue Jul 27, 2023 · 4 comments

Labels

Auto-Assign bug customer-reported CXP Attention extension/ml Machine Learning question

tmignot63 commented Jul 27, 2023

Describe the bug

I get a validation error when I try to launch a CommandJob with a custom shm_size.

Related command

az ml job create --file file.yaml

Errors

Configured default 't-bs-mf-explore-iris2-phd' for arg resource_group_name
Configured default 'aml-dcy-int-iris2-phd' for arg workspace_name
Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.
Command ran in 2.107 seconds (init: 0.441, invoke: 1.666)
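
For readers unfamiliar with this kind of message: "Unknown field." is what a strict schema reports when a key is not declared in the schema version doing the validation. The sketch below mimics that behaviour with the standard library only. It is not marshmallow and not the real Azure ML schema; `KNOWN_FIELDS` and `find_unknown_fields` are hypothetical names, and the field list is an illustrative subset chosen to leave out `shm_size`, as an older schema might.

```python
# Illustrative only: a strict validator that reports undeclared keys
# in the same nested shape as the error printed by the CLI.
# KNOWN_FIELDS is a made-up subset, not the actual CommandJobSchema.
KNOWN_FIELDS = {
    "resources": {"instance_count": None, "instance_type": None},
    "command": None,
    "environment": None,
    "compute": None,
}


def find_unknown_fields(data, schema):
    """Return a nested dict of 'Unknown field.' errors for undeclared keys."""
    errors = {}
    for key, value in data.items():
        if key not in schema:
            errors[key] = ["Unknown field."]
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            # Recurse into nested sections such as 'resources'.
            nested = find_unknown_fields(value, schema[key])
            if nested:
                errors[key] = nested
    return errors


job = {"resources": {"shm_size": "28g"}, "command": "python train.py"}
print(find_unknown_fields(job, KNOWN_FIELDS))
```

Under this reading, the YAML itself can be valid while the installed schema simply predates the field.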

Issue script & Debug output

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 143, in load_from_dict
return schema(context=context).load(data, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
return self._do_load(
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 909, in _do_load
raise exc
marshmallow.exceptions.ValidationError: {'resources': {'shm_size': ['Unknown field.']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 60, in ml_job_create
job = load_job(path=file, params_override=params_override)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 74, in load_job
return load_common(Job, path, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 59, in load_common
return cls._load(data=yaml_dict, yaml_path=path, params_override=params_override, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/job.py", line 235, in _load
return job_type._load_from_dict(
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/command_job.py", line 166, in _load_from_dict
loaded_data = load_from_dict(CommandJobSchema, data, context, additional_message, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 146, in load_from_dict
raise ValidationError(decorate_validation_error(schema, pretty_error, additional_message))
marshmallow.exceptions.ValidationError: Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
cli: None
cli: Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 143, in load_from_dict
return schema(context=context).load(data, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
return self._do_load(
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 909, in _do_load
raise exc
marshmallow.exceptions.ValidationError: {'resources': {'shm_size': ['Unknown field.']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 60, in ml_job_create
job = load_job(path=file, params_override=params_override)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 74, in load_job
return load_common(Job, path, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 59, in load_common
return cls._load(data=yaml_dict, yaml_path=path, params_override=params_override, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/job.py", line 235, in _load
return job_type._load_from_dict(
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/command_job.py", line 166, in _load_from_dict
loaded_data = load_from_dict(CommandJobSchema, data, context, additional_message, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 146, in load_from_dict
raise ValidationError(decorate_validation_error(schema, pretty_error, additional_message))
marshmallow.exceptions.ValidationError: Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code

cli.azure.cli.core.azclierror: Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 143, in load_from_dict
return schema(context=context).load(data, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 722, in load
return self._do_load(
File "/anaconda/envs/torch12/lib/python3.8/site-packages/marshmallow/schema.py", line 909, in _do_load
raise exc
marshmallow.exceptions.ValidationError: {'resources': {'shm_size': ['Unknown field.']}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 60, in ml_job_create
job = load_job(path=file, params_override=params_override)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 74, in load_job
return load_common(Job, path, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_load_functions.py", line 59, in load_common
return cls._load(data=yaml_dict, yaml_path=path, params_override=params_override, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/job.py", line 235, in _load
return job_type._load_from_dict(
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_job/command_job.py", line 166, in _load_from_dict
loaded_data = load_from_dict(CommandJobSchema, data, context, additional_message, **kwargs)
File "/opt/az/extensions/ml/azext_mlv2/manual/vendored_curated_sdk/azure/ai/ml/entities/_util.py", line 146, in load_from_dict
raise ValidationError(decorate_validation_error(schema, pretty_error, additional_message))
marshmallow.exceptions.ValidationError: Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/anaconda/envs/torch12/lib/python3.8/site-packages/knack/cli.py", line 233, in invoke
cmd_result = self.invocation.execute(args)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
raise ex
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
results.append(self._run_job(expanded_arg, cmd_copy))
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
result = cmd_copy(params)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
return self.handler(*args, **kwargs)
File "/anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
return op(**command_args)
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/job.py", line 77, in ml_job_create
log_and_raise_error(err, debug)
File "/opt/az/extensions/ml/azext_mlv2/manual/custom/raise_error.py", line 117, in log_and_raise_error
raise cli_error
knack.util.CLIError: Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.

cli.azure.cli.core.azclierror: Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.
az_command_data_logger: Met error <class 'marshmallow.exceptions.ValidationError'>:Validation for CommandJobSchema failed:

{
"resources": {
"shm_size": [
"Unknown field."
]
}
}

If you are trying to configure a job that is not of type command, please specify the correct job type in the 'type' property.
For a more detailed breakdown of the CommandJob schema, please see: https://aka.ms/ml-cli-v2-job-command-yaml-reference.
The easiest way to author a specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning
To set up: https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
Please check log in debug mode for more details.
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x7fd1e33cb040>]
az_command_data_logger: exit code: 1
cli.main: Command ran in 1.322 seconds (init: 0.568, invoke: 0.753)
telemetry.main: Begin splitting cli events and extra events, total events: 1
telemetry.client: Accumulated 0 events. Flush the clients.
telemetry.main: Finish splitting cli events and extra events, cli events: 1
telemetry.save: Save telemetry record of length 4340 in cache
telemetry.check: Returns Positive.
telemetry.main: Begin creating telemetry upload process.
telemetry.process: Creating upload process: "/anaconda/envs/torch12/bin/python /anaconda/envs/torch12/lib/python3.8/site-packages/azure/cli/telemetry/__init__.py /home/azureuser/.azure"
telemetry.process: Return from creating process
telemetry.main: Finish creating telemetry upload process.

Expected behavior

The job should be launched

Environment Summary

azure-cli 2.50.0

core 2.50.0
telemetry 1.0.8

Extensions:
ml 2.5.0

Dependencies:
msal 1.22.0
azure-mgmt-resource 23.1.0b2

Python location '/anaconda/envs/torch12/bin/python'
Extensions directory '/opt/az/extensions'

Python (Linux) 3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0]

Legal docs and information: aka.ms/AzureCliLegal

Additional context

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json

experiment_name: model_run_semantic_v2
display_name: DUAT_model_run

resources:
  shm_size: 28g

environment_variables:
  DATASET_MOUNT_CACHE_SIZE: '50 GB'
  DATASET_MOUNT_FILE_CACHE_PRUNE_TARGET: '0'

code: ../../../../ # Relative to this YAML file; points to the repository root directory.
command: >-
  python pipelines/scripts/04_model_run_segmentation.py 
  --input_data ${{inputs.input_data}}
  --input_labels ${{inputs.input_labels}}
  --input_model_root ${{inputs.input_model_root}}
  --pretrained_model_path ${{inputs.pretrained_model_path}}
  --label_version ${{inputs.label_version1}}
  --n_channels ${{inputs.n_channels}} 
  --zones_camera ${{inputs.zones_cameras}} 
  --separate_all_cameras ${{inputs.separate_all_cameras}} 
  --log_dir ${{outputs.output_dir}} 
  --alpha_fce ${{inputs.alpha_fce}}
  --input_height ${{inputs.input_height}}
  --input_width ${{inputs.input_width}}
  --detection_mode ${{inputs.detection_mode}}
  --include_metadata ${{inputs.include_metadata}}
  --model_seg_type ${{inputs.model_seg_type}}
  --lb_model_run_name ${{inputs.lb_model_run_name}}
  --tyre_type ${{inputs.tyre_type}}
  --usines ${{inputs.usine1}}
  --dataset_mode ${{inputs.dataset_mode}}
  --thresholds_model_run ${{inputs.thresholds_model_run}}
  --input_model_segmentation_path ${{inputs.input_model_segmentation_path}}
  --lb_model_name ${{inputs.lb_model_name}}

inputs:
  input_data: 
    type: uri_folder
    path: azureml://datastores/ara_b2b_qualif_data/paths/
    mode: ro_mount

  input_labels: 
    type: uri_folder
    path: azureml://datastores/ara_b2b_qualif_label/paths/
    mode: ro_mount

  input_model_root: 
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/
    mode: ro_mount

outputs:
  output_dir:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/semantic_seg
    mode: rw_mount

environment: azureml:torch@latest
compute: azureml:T4-TC-light-illimited

I have removed personal arguments.
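
As a side note on the `shm_size: 28g` value above: a quick local check of the string can rule out a formatting problem before blaming the schema. This is a minimal sketch, assuming the value follows a `<number><unit>` pattern with a positive number and a unit of `b`, `k`, `m`, or `g`, and assuming binary multiples (1 k = 1024 bytes). `parse_shm_size` is a hypothetical helper, not part of the Azure ML CLI or SDK.

```python
import re

# Assumed multipliers; binary multiples are an assumption of this sketch.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_shm_size(value: str) -> int:
    """Convert a string such as '28g' to a byte count, or raise ValueError."""
    match = re.fullmatch(r"(\d+)([bkmg])", value.strip())
    if match is None:
        raise ValueError(f"expected <number><unit> with unit b/k/m/g, got {value!r}")
    number = int(match.group(1))
    if number <= 0:
        raise ValueError("shm_size must be greater than 0")
    return number * _UNITS[match.group(2)]


print(parse_shm_size("28g"))
```

If the value parses cleanly, as `28g` does here, the "Unknown field." error points at the schema version rather than at the value.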

tmignot63 added the bug label

Collaborator

yonzhan commented Jul 27, 2023

Thank you for opening this issue, we will look into it.

microsoft-github-policy-service bot added question customer-reported Auto-Assign CXP Attention Machine Learning extension/ml labels

navba-MSFT self-assigned this

Contributor

navba-MSFT commented Jul 28, 2023

@tmignot63 Thanks for reaching out to us and reporting this issue. Could you please update your ml extension by running the command below and check whether that helps?

az extension update -n ml

Awaiting your reply.
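
To confirm which ml extension version is installed before and after the update, the JSON printed by `az version` can be inspected. A minimal sketch, assuming that output contains an `extensions` object keyed by extension name; `ml_extension_version` is a hypothetical helper, and the sample JSON is built from the versions listed in the Environment Summary above.

```python
import json


def ml_extension_version(az_version_json: str):
    """Return the installed 'ml' extension version, or None if it is absent."""
    info = json.loads(az_version_json)
    return info.get("extensions", {}).get("ml")


# Sample input reconstructed from the Environment Summary in this issue.
sample = (
    '{"azure-cli": "2.50.0", "azure-cli-core": "2.50.0", '
    '"azure-cli-telemetry": "1.0.8", "extensions": {"ml": "2.5.0"}}'
)
print(ml_extension_version(sample))
```

In practice the input string could come from `subprocess.run(["az", "version", "--output", "json"], capture_output=True, text=True).stdout`. Comparing the value before and after `az extension update -n ml` shows whether the update actually took effect.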

navba-MSFT added the needs-author-feedback label

Author

tmignot63 commented Jul 28, 2023 via email

Hello, it's working now with the update. Many thanks.

microsoft-github-policy-service bot added needs-team-attention and removed needs-author-feedback labels

Contributor

navba-MSFT commented Jul 28, 2023

@tmignot63 Thanks for getting back. We will now proceed with closure of this GitHub issue. If you need any further assistance on this issue in future, please feel free to reopen this thread. We would be happy to help.

navba-MSFT closed this as completed

navba-MSFT removed the needs-team-attention label
