Allow training jobs to have outputs and Property Files the same way processing jobs have #4267

@lorenzwalthert

Describe the feature you'd like

During training, the main output artefact is the model, and metrics are best logged to an experiment tracking tool. However, the training process can generate other information that one wants to access later. According to the docs, three output directories are uploaded to S3:

  • /opt/ml/model: This directory should contain the trained model.
  • /opt/ml/output/data: Any data you want to upload
  • /opt/ml/output/failure: Diagnostic data for failed jobs.
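
To make the request concrete, here is a minimal sketch of how a script-mode training script might write such extra information into the output data directory. The function name `write_extra_outputs` and the file name `evaluation.json` are hypothetical, and the `SM_OUTPUT_DATA_DIR` environment variable is assumed to be set by the training toolkit; the sketch falls back to the documented path otherwise.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_extra_outputs(metrics: dict, output_dir: Optional[str] = None) -> Path:
    """Write a JSON report into the training job's output data directory."""
    # Assumption: SM_OUTPUT_DATA_DIR points at the output data directory when
    # running in script mode; otherwise use the path listed in the docs.
    target = Path(
        output_dir or os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")
    )
    target.mkdir(parents=True, exist_ok=True)

    # "evaluation.json" is an illustrative file name, not a SageMaker convention.
    report = target / "evaluation.json"
    report.write_text(json.dumps(metrics, indent=2))
    return report
```

Anything written this way is uploaded with the job, but a later pipeline step has no first-class way to reference the individual file.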

The first two directories are uploaded as compressed archives, although according to the API and SDK docs, compression can apparently be disabled for the model.
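
As an illustration of the extra work compression causes, this is a rough sketch of what a downstream step has to do to get a single file out of the archive once it has been downloaded. The helper name `read_member` is hypothetical, and the archive is assumed to be a gzipped tarball.

```python
import io
import tarfile


def read_member(archive_bytes: bytes, member_name: str) -> bytes:
    """Return the contents of one file inside a gzipped tar archive."""
    # Assumption: the downloaded output archive is a .tar.gz held in memory.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        # extractfile raises KeyError for unknown names and returns None
        # for members that are not regular files (e.g. directories).
        extracted = archive.extractfile(member_name)
        if extracted is None:
            raise FileNotFoundError(
                f"{member_name} is not a regular file in the archive"
            )
        return extracted.read()
```

With an uncompressed, per-file output (as ProcessingOutput provides), none of this unpacking would be needed.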

I would like to declare training outputs the same way I can use ProcessingOutput in processing jobs and, as a next step, Property Files as well. Their absence creates friction in the architecture of our pipelines, because a workaround is needed whenever a subsequent pipeline step has to access a training output other than the model.

How would this feature be used? Please describe.

Developing SageMaker Pipelines in which outputs from a script-mode training job are passed to subsequent steps, as is already possible for processing jobs.

Describe alternatives you've considered

  • Uploading data as part of the script you run in training: In a pipeline, there is no way to refer to that data and mark it as a dependency in the DAG. Hence, subsequent steps may fail.
  • Using the data directory: The output appears to be compressed, which is not ideal if I want to access individual files. If the location the directory is written to is deterministic and depends only on the name of the training job, constructing the path with sagemaker.workflow.functions.Join() might solve the DAG problem.
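
A sketch of the second alternative, in plain Python so the path logic is easy to check. It assumes the archive lands at `<output_path>/<training_job_name>/output/output.tar.gz`; that layout is an assumption to verify against an actual job, and the helper name `expected_output_data_uri` is hypothetical.

```python
def expected_output_data_uri(output_path: str, training_job_name: str) -> str:
    """Build the S3 URI where the output data archive is assumed to be written."""
    # Assumed layout: <output_path>/<training_job_name>/output/output.tar.gz
    parts = [output_path.rstrip("/"), training_job_name, "output", "output.tar.gz"]
    return "/".join(parts)


uri = expected_output_data_uri("s3://my-bucket/prefix/", "my-training-job")
print(uri)
```

Inside a pipeline definition, the same parts could presumably be passed to sagemaker.workflow.functions.Join(on="/", values=[...]), with the job name taken from the training step's properties so that the DAG dependency is recorded. That variant is untested here.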
