Description
I have a training job that persists some files in a `URI_FOLDER` output.
How can I access those through the v2 SDK API after the job has finished?
1. job setup
The output is set up like this in the command:
```python
job = command(
    # ...
    outputs=dict(
        outputs=Output(type=AssetTypes.URI_FOLDER, mode='rw_mount'),
    ),
    command="python training_script.py "
            "--outputs_dir ${{outputs.outputs}} "
            # ...other arguments...
)
```
This seems to work fine; the corresponding folder is mounted correctly and accessible from the training script.
2. training script
In the training script, I persist a dataframe like this:
```python
parser.add_argument("--outputs_dir", dest="outputs_dir", default=DEFAULT_MODEL_DIR)
# ...
some_dataframe.to_csv(os.path.join(args.outputs_dir, 'some_dataframe.csv'), index=True)
```
This works fine.
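As a local sanity check of the write path above, here is a minimal stdlib-only sketch (using `csv` as a stand-in for pandas' `to_csv`, and a temp directory as a stand-in for the mounted `URI_FOLDER`):

```python
import argparse
import csv
import os
import tempfile

# Stand-in for the mounted URI_FOLDER output directory.
tmpdir = tempfile.mkdtemp()

parser = argparse.ArgumentParser()
parser.add_argument("--outputs_dir", dest="outputs_dir", default=tmpdir)
args = parser.parse_args(["--outputs_dir", tmpdir])

# Write a small table where the real script calls some_dataframe.to_csv(...).
out_path = os.path.join(args.outputs_dir, "some_dataframe.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["index", "value"])
    writer.writerow([0, 42])

print(os.path.exists(out_path))
```

In the real job, `--outputs_dir` receives the mount path that `${{outputs.outputs}}` resolves to, so anything written there is uploaded when the job completes.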
3. resulting dataset
After the job has finished, the outputs are available as a dataset.
This is what is shown in Azure ML Studio in the "Overview" tab for job ivory_octopus_yd6by49kxf:

The dataset is successfully stored in the `workspaceblobstore` datastore; I checked it in Azure ML Studio and it looks fine.
4. accessing the persisted data
After the job has finished, I can access the run using an `MlflowClient`:
```python
MLFLOW_TRACKING_URI = ml_client.workspaces.get(name=ml_client.workspace_name).mlflow_tracking_uri
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow_client = MlflowClient()
mlflow_run = mlflow_client.get_run("ivory_octopus_yd6by49kxf")
```
or directly through the v2 SDK:
```python
run = ml_client.jobs.get('ivory_octopus_yd6by49kxf')
# returns NodeOutput class
```
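For reference, a sketch of what I'd expect the download call to look like; this assumes `MLClient.jobs.download` accepts an `output_name` argument and that named outputs default to an `azureml/<job_name>/<output_name>/` prefix in `workspaceblobstore` (both are my assumptions, not verified):

```python
# Hypothetical sketch only; names and paths below are assumptions.

def default_output_blob_path(job_name: str, output_name: str) -> str:
    # Named outputs seem to land in workspaceblobstore under this prefix
    # (assumption, based on what Azure ML Studio shows for the dataset).
    return f"azureml/{job_name}/{output_name}/"

# Assumed download call (requires an authenticated MLClient; not executed here):
# ml_client.jobs.download(
#     name="ivory_octopus_yd6by49kxf",
#     output_name="outputs",           # the key from outputs=dict(outputs=...)
#     download_path="./job_outputs",
# )

print(default_output_blob_path("ivory_octopus_yd6by49kxf", "outputs"))
```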
How can I programmatically list / get / download the outputs connected to the job?
Thanks!