Pass a DataPath PipelineParameter from AzureDatafactory to MachineLearningExecutePipeline Activity? #216

Closed
SriramAvatar opened this issue Sep 10, 2020 · 1 comment


@SriramAvatar

  • I am trying to read a file from Blob Storage, load it into pandas, and write it back to Blob Storage.
  • I have an Azure Machine Learning pipeline with a PythonScriptStep that takes two PipelineParameters, both DataPaths, as shown below.
from azureml.core import Workspace, Datastore, Experiment
from azureml.data.datapath import DataPath, DataPathComputeBinding
from azureml.pipeline.core import PipelineParameter

# ws = Workspace.from_config()  # workspace handle used below

datastore = Datastore(ws, "SampleStore")
in_raw_path_default = 'somefolder/raw/alerts/2020/08/03/default_in.csv'
in_cleaned_path_default = 'somefolder/cleaned/alerts/2020/08/03/default_out.csv'

in_raw_datapath = DataPath(datastore=datastore, path_on_datastore=in_raw_path_default)
in_raw_path_pipelineparam = PipelineParameter(name="inrawpath", default_value=in_raw_datapath)
raw_datapath_input = (in_raw_path_pipelineparam, DataPathComputeBinding(mode='mount'))

in_cleaned_datapath = DataPath(datastore=datastore, path_on_datastore=in_cleaned_path_default)
in_cleaned_path_pipelineparam = PipelineParameter(name="incleanedpath", default_value=in_cleaned_datapath)
cleaned_datapath_input = (in_cleaned_path_pipelineparam, DataPathComputeBinding(mode='mount'))

from azureml.pipeline.steps import PythonScriptStep

source_directory = script_folder + '/pipeline_Steps'
dataprep_step = PythonScriptStep(
    script_name="SimpleTest.py", 
    arguments=["--input_data", raw_datapath_input, "--cleaned_data", cleaned_datapath_input],
    inputs=[raw_datapath_input, cleaned_datapath_input],    
    compute_target=default_compute, 
    source_directory=source_directory,
    runconfig=run_config,
    allow_reuse=True
)

from azureml.pipeline.core import Pipeline
pipeline_test = Pipeline(workspace=ws, steps=[dataprep_step])

Triggering this Azure ML pipeline from Azure Data Factory:
The above code works fine when the ML pipeline is executed from the Azure ML workspace / Pipeline SDK as follows.

test_raw_path = DataPath(datastore=datastore, path_on_datastore='samplefolder/raw/alerts/2017/05/31/test.csv')
test_cleaned_path = DataPath(datastore=datastore, path_on_datastore='samplefolder/cleaned/alerts/2020/09/03')
pipeline_run_msalerts = Experiment(ws, 'SampleExperiment').submit(
    pipeline_test,
    pipeline_parameters={"inrawpath": test_raw_path, "incleanedpath": test_cleaned_path})

I am trying to trigger the same Azure ML pipeline from an Azure Data Factory Machine Learning Execute Pipeline activity, as follows.
[screenshots of the Machine Learning Execute Pipeline activity configuration omitted]

I receive the following error.

Current directory:  /mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/workspaceblobstore/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade
Preparing to call script [ SimpleTest.py ] 
with arguments:
 ['--input_data', '/mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/raw/alerts/2020/08/03/default_in.csv',
 '--cleaned_data', '/mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/cleaned/alerts/2020/08/03/default_out.csv']
After variable expansion, calling script [ SimpleTest.py ] with arguments:
 ['--input_data', '/mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/raw/alerts/2020/08/03/default_in.csv',
 '--cleaned_data', '/mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/cleaned/alerts/2020/08/03/default_out.csv']

Script type = None
Argument 1: /mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/raw/alerts/2020/08/03/default_in.csv
Argument 2: /mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/cleaned/alerts/2020/08/03/default_out.csv
.......................
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/batch/tasks/shared/LS_root/jobs/myazuremlworkspace/azureml/d8ee11ea-5838-46e5-a8ce-da2fbff5aade/mounts/SampleStore/somefolder/raw/alerts/2020/08/03/default_in.csv'

I think this is because the input parameters are passed as plain strings instead of DataPaths, so the run fell back to the default DataPath values (note the `default_in.csv` path in the error above). How can I pass a DataPath from Azure Data Factory to an Azure ML pipeline?

@anderl80

Just sharing some related discussion https://docs.microsoft.com/en-us/answers/questions/91785/azure-data-factory-how-to-pass-datapath-as-a-param.html
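One workaround discussed there, sketched here under assumptions: since Data Factory passes pipeline parameters as plain strings, you can mount the datastore root as a fixed input on the step and pass only the relative path as a string PipelineParameter; the step script then joins the mount point with the relative path. A minimal script-side sketch (the `--data_root` / `--input_rel_path` argument names are hypothetical, and the invocation is simulated for illustration):

```python
# Hypothetical SimpleTest.py sketch: the mounted datastore root arrives via
# --data_root and the ADF-supplied relative path via --input_rel_path;
# joining them recovers the full file path on the compute target.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--data_root")       # mount point of the datastore root
parser.add_argument("--input_rel_path")  # plain-string PipelineParameter from ADF

# Simulated invocation; in a real run these come from the step's arguments list.
args = parser.parse_args([
    "--data_root", "/mnt/store",
    "--input_rel_path", "somefolder/raw/alerts/2020/08/03/default_in.csv",
])

full_path = os.path.join(args.data_root, args.input_rel_path)
print(full_path)  # /mnt/store/somefolder/raw/alerts/2020/08/03/default_in.csv
```

Because the parameter is now an ordinary string, Data Factory can override it without any DataPath serialization on the activity side.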

@fhljys fhljys closed this as completed Mar 10, 2022