Ambiguity on PythonScriptStep's allow_reuse parameter #298
Comments
By default, when a step's script (specified with script_name), its inputs, and its parameters all remain the same, the output of the previous run of that step is reused instead of running the step again. When a step is reused, the job is not submitted to the compute target; instead, the results from the previous run are immediately available to subsequent steps.

### allow_reuse Flag
### regenerate_outputs Flag
### hash_paths Parameter
|
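The reuse rule described in the comment above can be pictured as a fingerprint comparison. The following is only a minimal sketch of that idea, not Azure ML's actual implementation: the helper names `step_fingerprint` and `should_reuse` are hypothetical, and hashing a JSON payload is an assumption made for illustration.

```python
import hashlib
import json


def step_fingerprint(script_name, inputs, params):
    """Hash the settings that, per the comment above, determine reuse.

    Hypothetical helper: the real service computes its own fingerprint.
    """
    payload = json.dumps(
        {"script_name": script_name, "inputs": sorted(inputs), "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_reuse(previous_fingerprint, current_fingerprint, allow_reuse=True):
    """Reuse only when reuse is allowed and the fingerprint is unchanged."""
    return allow_reuse and previous_fingerprint == current_fingerprint


first = step_fingerprint("train.py", ["prepared_data"], {"epochs": 10})
second = step_fingerprint("train.py", ["prepared_data"], {"epochs": 10})
changed = step_fingerprint("train.py", ["prepared_data"], {"epochs": 20})

print(should_reuse(first, second))                     # unchanged settings
print(should_reuse(first, changed))                    # a parameter changed
print(should_reuse(first, second, allow_reuse=False))  # reuse switched off
```

In this toy model, setting `allow_reuse=False` forces a rerun even when nothing else has changed, which mirrors how the flag is described in this thread.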
@sanpil OK, this is great information. Thank you. If I do the following, will the step be reused the second time I run the pipeline?
If it does not re-run, I would make the case that this case should be reflected in the |
In the above scenario, if you make a change to |
hello, |
If the data is in a datastore, we would not be able to detect the data change. If the data is uploaded as part of the snapshot (under source_directory), which is not recommended, then the hash will change and trigger a rerun. |
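To make the distinction in the reply above concrete, here is a rough sketch of a snapshot-style hash over a source directory. It assumes the hash covers only files under the directory; `snapshot_hash` is a hypothetical helper, and the local `datastore` folder merely stands in for remote storage. It is not the hashing scheme Azure ML actually uses.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_hash(source_directory):
    """Hash the relative path and contents of every file under the directory."""
    root = Path(source_directory)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    datastore = Path(tmp) / "datastore"  # stand-in for remote storage
    src.mkdir()
    datastore.mkdir()
    (src / "train.py").write_text("print('train')\n")
    (datastore / "data.csv").write_text("a,b\n1,2\n")

    before = snapshot_hash(src)

    # Data changes outside the source directory: the hash cannot see it.
    (datastore / "data.csv").write_text("a,b\n3,4\n")
    after_datastore_change = snapshot_hash(src)

    # Data placed inside the source directory: the hash changes.
    (src / "data.csv").write_text("a,b\n1,2\n")
    after_snapshot_change = snapshot_hash(src)

print(before == after_datastore_change)
print(before == after_snapshot_change)
```

Under these assumptions, editing data in the datastore leaves the hash untouched, so a step would still be considered reusable, while adding or editing a file inside the source directory produces a different hash.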
On this same note, it's confusing that the wording and explanation change between docs. The main how-to guides, and even the comments here, say that if the script changes, the pipeline will not reuse the previous results. The actual behavior seems to be that if the snapshot changes, the previous results will not be reused, as stated in the remarks section: https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py#remarks This behavior makes sense, but the documentation is inconsistent, which makes it confusing. |
I'm curious: if I modify the underlying script of a PythonScriptStep and then run the pipeline again, will the pipeline pick up the outputs of the already finished steps (reuse) and rerun only the steps whose script was modified, and the steps after them? For instance,
my_pipeline = Pipeline(workspace=ws, steps=[split_data_step, train_step])
Would AML reuse the outputs from the successfully finished split_data_step and rerun only train_step?
|
@yychenca, yes it will! To me, this is the killer feature of Azure ML pipelines. The important thing to note here is that you must have unique Feel free to reply back if you aren't seeing the intended behavior. |
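The question and answer above can be sketched as a small propagation rule. This is a toy model under two assumptions: a step reruns when its own snapshot changed, and a step also reruns when any upstream step reran (because its inputs are then new). `steps_to_rerun` is a hypothetical helper, not part of the Azure ML SDK.

```python
def steps_to_rerun(steps, changed):
    """Return the names of steps that would rerun, in execution order.

    steps:   list of (name, upstream_names) tuples in execution order.
    changed: set of step names whose own snapshot changed.
    """
    rerun = set()
    for name, upstream in steps:
        # Rerun if this step changed or if anything it depends on reran.
        if name in changed or any(u in rerun for u in upstream):
            rerun.add(name)
    return [name for name, _ in steps if name in rerun]


pipeline = [("split_data_step", []), ("train_step", ["split_data_step"])]

print(steps_to_rerun(pipeline, {"train_step"}))       # ['train_step']
print(steps_to_rerun(pipeline, {"split_data_step"}))  # ['split_data_step', 'train_step']
print(steps_to_rerun(pipeline, set()))                # []
```

Note that which steps count as "changed" depends on what each step's snapshot contains. Given the snapshot behavior described earlier in this thread, two steps that share one source directory would presumably both count as changed when either script is edited.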
This is awesome! I just found the same suggestion from #734 (comment) Will give it a try and report back my findings. Thanks! |
What exactly does "settings/inputs" mean in this scenario?
For example, if allow_reuse = True, will a new run be generated if I change the script_name parameter or script_arg?
I might be wrong, but I think that in both cases a new run will not be generated. This is rather frustrating when developing a pipeline with many steps...