
ParallelRunStep for R scripts #16564

@yeamusic21

Description


Is your feature request related to a problem? Please describe.
Currently, we deploy Python ML models for distributed batch scoring with ParallelRunStep and then call the deployed pipeline from Data Factory. Some of the data scientists I work with build their models in R, but ParallelRunStep can only run Python scripts, not R scripts. Because of this, we are evaluating other deployment options, although we would prefer to keep building the pipeline in Python while writing the scoring code in R.

Describe the solution you'd like
Currently, we build ML pipelines in ML Studio with the Python SDK and ParallelRunStep. We want to keep using the Python SDK to build our pipelines, but it would be great if ParallelRunStep could run an R script instead of only Python scripts.

Describe alternatives you've considered
Azure Batch. Azure Databricks + SparkR. Azure Databricks + Sparklyr.

Additional context
For example, it would be great if the entry_script in ParallelRunConfig could optionally be an R script. The R batch scoring script would then follow the same format as a Python scoring script (an init function, a run function, and so on). It would also be great if we could specify the R version, for example:

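from azureml.pipeline.steps import ParallelRunConfig

# env (an azureml Environment) and compute_target (an existing compute
# cluster) are assumed to be defined earlier in the pipeline script.
env.r_version = '3.4.3'  # proposed: pin the R version for the environment

parallel_run_config = ParallelRunConfig(
    environment=env,
    entry_script="batch_scoring.R",  # proposed: R entry script; today only Python is accepted
    source_directory=".",
    output_action="append_row",  # append each run() result to one output file
    mini_batch_size="20",
    error_threshold=1,
    compute_target=compute_target,
    process_count_per_node=2,
    node_count=1
)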
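Until an R entry script is supported, one possible stopgap is a thin Python entry script that keeps the init()/run() contract and hands the actual scoring to R. The sketch below is not an SDK feature, only an illustration under stated assumptions: the input is a FileDataset (so mini_batch is a list of file paths), Rscript is on the PATH of the environment image, and a hypothetical batch_scoring.R accepts file paths as command-line arguments and prints one result line per file. R_SCRIPT and R_COMMAND are illustrative names.

```python
import subprocess
from typing import List, Sequence

# Hypothetical R scoring script shipped in source_directory.
R_SCRIPT = "batch_scoring.R"
# Assumes Rscript is on PATH inside the environment image.
R_COMMAND: List[str] = ["Rscript", R_SCRIPT]


def init() -> None:
    """Called once per worker process before any mini-batch is scored."""
    # Nothing to load in Python; the R script is assumed to load its own model.


def run(mini_batch: Sequence[str]) -> List[str]:
    """Score one mini-batch of input file paths by delegating to R."""
    if not mini_batch:
        return []
    # One R process per mini-batch; every file path is passed as an argument.
    completed = subprocess.run(
        R_COMMAND + [str(p) for p in mini_batch],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if R exits non-zero
    )
    # Assumes the R script prints exactly one result line per input file.
    lines = completed.stdout.splitlines()
    if len(lines) != len(mini_batch):
        raise RuntimeError(
            f"expected {len(mini_batch)} result lines from R, got {len(lines)}"
        )
    return lines
```

Launching one R process per mini-batch rather than per file spreads R start-up and model-loading cost over mini_batch_size files. The one-line-per-file check is there because, as we understand the ParallelRunStep documentation, each element returned by run() is counted as one successfully processed input.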

Metadata


Assignees

No one assigned

    Labels

    Client, ML-Inference, Machine Learning, Service Attention, customer-reported, question

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
