[azureml python sdk v2] access files in URI_FOLDER output after job has finished? #1891

@movingabout

I have a training job that persists some files in a URI_FOLDER output.
How can I access those files through the v2 SDK API after the job has finished?

1. job setup

The output is set up like this in the command:

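from azure.ai.ml import Output, command
from azure.ai.ml.constants import AssetTypes

job = command(
    # ...
    outputs=dict(
        outputs=Output(type=AssetTypes.URI_FOLDER, mode='rw_mount'),
    ),
    # adjacent string literals are concatenated implicitly, so no trailing "+" is needed
    command="python training_script.py "
            "--outputs_dir ${{outputs.outputs}} "
            # ...other arguments...
)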

This seems to work fine: the corresponding folder is mounted correctly and is accessible in the training script.

2. training script

In the training script, I persist a dataframe like this:

parser.add_argument("--outputs_dir", dest="outputs_dir", default=DEFAULT_MODEL_DIR)
# ...
some_dataframe.to_csv(os.path.join(args.outputs_dir, 'some_dataframe.csv'), index=True)

This works fine.
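
For reference, the persistence step above can be reduced to a self-contained sketch. It is only an illustration, not the actual training script: it swaps the pandas `to_csv` call for the standard-library `csv` module so that it runs without extra dependencies, and the value of `DEFAULT_MODEL_DIR` and the helper names `parse_args` and `write_rows` are hypothetical.

```python
import argparse
import csv
import os

# Hypothetical default; the real script defines its own DEFAULT_MODEL_DIR.
DEFAULT_MODEL_DIR = "./outputs"


def parse_args(argv=None):
    """Parse --outputs_dir the same way the training script does."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outputs_dir", dest="outputs_dir", default=DEFAULT_MODEL_DIR)
    return parser.parse_args(argv)


def write_rows(outputs_dir, rows, filename="some_dataframe.csv"):
    """Write rows (a list of lists) as CSV into outputs_dir and return the file path."""
    # The mounted output folder normally exists already; creating it is harmless.
    os.makedirs(outputs_dir, exist_ok=True)
    path = os.path.join(outputs_dir, filename)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path
```

Inside the job, `write_rows(parse_args().outputs_dir, rows)` would write into the folder that `${{outputs.outputs}}` resolves to.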

3. resulting dataset

After the job has finished, the outputs are available as a dataset.
This is what is shown in Azure ML Studio in the "Overview" tab for job ivory_octopus_yd6by49kxf:
[screenshot of the "Overview" tab]

The dataset is successfully stored in the workspaceblobstore datastore. I checked it in the Azure ML Studio and it looks fine.

4. accessing the persisted data

After the job has finished, I access the run using an MlflowClient():

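import mlflow
from mlflow.tracking import MlflowClient

# ml_client is an already-authenticated azure.ai.ml.MLClient
MLFLOW_TRACKING_URI = ml_client.workspaces.get(name=ml_client.workspace_name).mlflow_tracking_uri
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow_client = MlflowClient()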

mlflow_run = mlflow_client.get_run("ivory_octopus_yd6by49kxf")

or

run = ml_client.jobs.get('ivory_octopus_yd6by49kxf')
# returns NodeOutput class

How can I programmatically list / get / download the outputs connected to the job?

Thanks!
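
One approach that may work is sketched below. It assumes the installed azure-ai-ml version exposes `ml_client.jobs.download` with `name`, `output_name` and `download_path` keyword arguments. The helper name `download_job_output` is hypothetical, and the folder layout created under `download_path` is not assumed: the sketch simply walks the directory and reports what it finds.

```python
import os


def download_job_output(ml_client, job_name, output_name, download_path):
    """Download one named output of a finished job and list the local files.

    Assumes ml_client is an azure.ai.ml.MLClient whose jobs.download method
    accepts name, output_name and download_path keyword arguments.
    """
    ml_client.jobs.download(
        name=job_name,
        output_name=output_name,
        download_path=download_path,
    )
    # Collect every downloaded file, relative to download_path.
    found = []
    for root, _dirs, files in os.walk(download_path):
        for fname in files:
            found.append(os.path.relpath(os.path.join(root, fname), download_path))
    return sorted(found)
```

For the job above, the call would be `download_job_output(ml_client, "ivory_octopus_yd6by49kxf", "outputs", "./job_outputs")`, where `"outputs"` is the key used in the `outputs=dict(...)` of the job setup and `./job_outputs` is an arbitrary local folder.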
