metadata storage available to all pipeline steps (read+write) #3169

@jonathanhillwebsite

Description

Describe the feature you'd like
Each step in the Pipeline will produce some sort of metadata, and this metadata may be needed in a later step. It would be a lot easier to pass a JSON file to each sequential step in the pipeline to store metadata. E.g., Number of samples [per class] in the processed training data (which would later be required in calculating number_of_steps in a tensorflow model) or can impact the decisions made in future steps (e.g. a conditional statement that says if eval_accuracy > n && samples_per_class > s).

How would this feature be used? Please describe.
Much like how data is passed between states in AWS Step Functions, each sequential step would have access to the metadata produced by past steps.

Describe alternatives you've considered
I've created a file called metadata.json that each step appends its data to in its source code and then pushes back to S3; however, this could potentially cause issues due to S3's eventually-consistent updates.
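For reference, the workaround looks roughly like the sketch below: a read-modify-write of a metadata.json object in S3. The bucket/key names are placeholders, boto3 is assumed to be available in the step's runtime, and the comments call out exactly why this is fragile — the read may return a stale object, and two concurrent steps can clobber each other's writes:

```python
import json

# Illustrative names only; not part of any existing pipeline API.
BUCKET, KEY = "my-pipeline-bucket", "runs/metadata.json"


def merge_metadata(existing: dict, update: dict) -> dict:
    """Merge one step's new metadata into the existing document."""
    merged = dict(existing)
    merged.update(update)
    return merged


def append_step_metadata(update: dict) -> None:
    """Read-modify-write of metadata.json in S3.

    Not atomic: the GET may observe a stale version, and two steps
    writing concurrently will silently overwrite each other.
    """
    import boto3  # assumed available in the step's environment

    s3 = boto3.client("s3")
    try:
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
        existing = json.loads(body)
    except s3.exceptions.NoSuchKey:
        existing = {}
    merged = merge_metadata(existing, update)
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(merged).encode())
```

A first-class metadata store managed by the pipeline itself would avoid pushing this consistency problem onto every user.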

Additional context
Add any other context or screenshots about the feature request here.
