Description
Describe the feature you'd like
During training, the main output artefact is the model, and metrics are best logged to an experiment tracking tool, but the training process can also generate other information that one wants to access later. According to the docs, there are three output directories that are uploaded to S3:
- /opt/ml/model: This directory should contain the trained model.
- /opt/ml/output/data: Any data you want to upload.
- /opt/ml/output/failure: Diagnostic data for failed jobs.
The first two directories are compressed on upload, although it seems, according to the API and SDK docs, that compression can be disabled for the model.
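For context, anything the training script writes to /opt/ml/output/data is what ends up in that compressed archive. A minimal sketch of a script-mode entry point, assuming the standard SM_OUTPUT_DATA_DIR environment variable set by the SageMaker training toolkit (the file name and content are only illustrative):

```python
# Inside the training script (script mode). Files written to
# SM_OUTPUT_DATA_DIR (/opt/ml/output/data) are uploaded by SageMaker
# as output.tar.gz next to the model artefact.
import json
import os

output_dir = os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")
os.makedirs(output_dir, exist_ok=True)

# Example payload: anything that is not the model itself, e.g. the list
# of feature columns used for training (hypothetical content).
with open(os.path.join(output_dir, "feature_columns.json"), "w") as f:
    json.dump(["age", "income", "tenure"], f)
```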
I would like to use a training output the same way I can use ProcessingOutput in processing jobs, and, as a next step, also property files (see the sketch below). The fact that these don't exist creates friction in the architecture of our pipelines, since a workaround is needed whenever a subsequent pipeline step needs to access training output other than the model.
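To make the ask concrete, here is how processing outputs can be declared and consumed today, followed by a hypothetical equivalent for training. The TrainingOutput class and the outputs argument sketched in the comments do not exist in the SDK; they only illustrate the requested feature:

```python
from sagemaker.processing import ProcessingOutput

# Existing API: a processing job declares named outputs ...
outputs = [
    ProcessingOutput(
        output_name="metrics",
        source="/opt/ml/processing/metrics",
    )
]
# ... and a later pipeline step can reference them via step properties:
# step_process.properties.ProcessingOutputConfig.Outputs["metrics"].S3Output.S3Uri

# Requested (hypothetical) API: something analogous for training jobs, e.g.
#   estimator.fit(..., outputs=[TrainingOutput(output_name="extras",
#                                              source="/opt/ml/output/data")])
# so that a subsequent step could reference the output by name instead of
# reconstructing S3 paths by hand.
```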
How would this feature be used? Please describe.
Developing SageMaker Pipelines and passing outputs from a script-mode training job to subsequent steps, the same way it works for processing jobs.
Describe alternatives you've considered
- Upload data as part of the script you run in training: in a pipeline, there is no way to refer to that data and mark it as a dependency in the DAG, so subsequent steps may fail.
- Using the data directory: it seems to be compressed, which is not ideal if I want to access individual files. If the location the directory is written to is deterministic and depends on the name of the training job, constructing the path with sagemaker.workflow.functions.Join() might solve the DAG problem (see the sketch after this list).
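A sketch of that Join() workaround, assuming step_train is the pipeline's TrainingStep (defined elsewhere), that the estimator's output_path is a known prefix (a placeholder here), and the default job-name-based layout under that prefix:

```python
from sagemaker.workflow.functions import Join

# SageMaker uploads /opt/ml/output/data as
#   <output_path>/<training-job-name>/output/output.tar.gz
# so the URI can be assembled from the training step's runtime properties.
output_data_uri = Join(
    on="/",
    values=[
        "s3://my-bucket/my-prefix",             # placeholder: the estimator's output_path
        step_train.properties.TrainingJobName,  # resolved at pipeline execution time
        "output",
        "output.tar.gz",
    ],
)

# Passing output_data_uri to a later step (e.g. as a ProcessingInput source)
# also makes the dependency on the training step explicit in the DAG, but the
# archive still has to be unpacked before individual files can be read.
```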