
Add input_mappings and output_mappings attributes #367

Merged: 10 commits merged Mar 1, 2024


@gabrielmbmb (Member) commented Feb 29, 2024

Description

This PR adds two new arguments:

  • input_mappings: an argument for all kinds of steps that maps the columns/keys of the step's input data to the input names the step requires:

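    from typing import List

    # Step, StepInput and StepOutput are distilabel types; their import path
    # on the core-refactor branch is not shown in this PR.

    class GenerateResponse(Step):
        @property
        def inputs(self) -> List[str]:
            return ["instruction"]

        @property
        def outputs(self) -> List[str]:
            return ["response"]

        def process(self, inputs: StepInput) -> StepOutput:
            # Add a "response" key to every row and yield the whole batch
            for item in inputs:
                item["response"] = "-------> " + item["instruction"]
            yield inputs

    # The step expects "instruction", but the data provides it as "prompt"
    step = GenerateResponse(name="generate_response", input_mappings={"instruction": "prompt"})
    # `process_applying_mappings` will replace every `prompt` key with `instruction`
    step.process_applying_mappings([{"prompt": "Is `distilabel` awesome?"}])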
  • output_mappings: an argument for Step and GlobalStep that renames the outputs of the step:

    from typing import List

    # Step, StepInput and StepOutput are distilabel types (import path not shown)

    class GenerateResponse(Step):
        @property
        def inputs(self) -> List[str]:
            return ["instruction"]

        @property
        def outputs(self) -> List[str]:
            return ["response"]

        def process(self, inputs: StepInput) -> StepOutput:
            # Add a "response" key to every row and yield the whole batch
            for item in inputs:
                item["response"] = "-------> " + item["instruction"]
            yield inputs

    # The step produces "response", but downstream it should be called "generation"
    step = GenerateResponse(name="generate_response", output_mappings={"response": "generation"})
    # `process_applying_mappings` will replace every output key `response` with `generation`
    step.process_applying_mappings([{"instruction": "Is `distilabel` awesome?"}])
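To make the two directions concrete, here is a minimal plain-Python sketch of what applying both mappings around a process call could look like. It is not distilabel's process_applying_mappings: the helpers rename_keys and process_with_mappings are hypothetical, and the sketch assumes input_mappings is {step input name: data column} and output_mappings is {step output name: new name}, as in the examples above. It also leaves the renamed input columns as they are after processing; the PR does not say whether the real implementation renames them back.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def rename_keys(row: Row, renames: Dict[str, str]) -> Row:
    """Return a copy of `row` with each key found in `renames` replaced by its value."""
    return {renames.get(key, key): value for key, value in row.items()}


def process_with_mappings(
    process: Callable[[List[Row]], Iterator[List[Row]]],
    rows: List[Row],
    input_mappings: Dict[str, str],
    output_mappings: Dict[str, str],
) -> Iterator[List[Row]]:
    # input_mappings is {step_input_name: data_column}, so invert it to rename
    # the data columns to the names the step expects.
    to_step_names = {column: name for name, column in input_mappings.items()}
    renamed_inputs = [rename_keys(row, to_step_names) for row in rows]
    # output_mappings is {step_output_name: new_name}; apply it to every batch
    # the step yields.
    for batch in process(renamed_inputs):
        yield [rename_keys(row, output_mappings) for row in batch]


def generate_response(inputs: List[Row]) -> Iterator[List[Row]]:
    # Plain-function stand-in for GenerateResponse.process from the examples
    for item in inputs:
        item["response"] = "-------> " + item["instruction"]
    yield inputs


batches = list(
    process_with_mappings(
        generate_response,
        [{"prompt": "Is `distilabel` awesome?"}],
        input_mappings={"instruction": "prompt"},
        output_mappings={"response": "generation"},
    )
)
print(batches)
```

The input side needs the inverted dictionary because its keys are the step's names while the data carries the mapped names; the output side can be applied as-is.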

@gabrielmbmb added the enhancement (New feature or request) label Feb 29, 2024
@gabrielmbmb added this to the 1.0.0 milestone Feb 29, 2024
@gabrielmbmb self-assigned this Feb 29, 2024
@gabrielmbmb changed the base branch from main to core-refactor February 29, 2024 17:05
@alvarobartt changed the title from "Add inputs_mapping and outputs_mapping attributes to Step" to "Add inputs_mapping and outputs_mapping attributes to _Step" Mar 1, 2024
@alvarobartt (Member) left a comment

Hi there! LGTM overall. Just keep in mind that the DAG validation will need to use the mappings when they are set, so a get_inputs function may need to be implemented. Besides that, I would rename outputs_mappings to output_mappings, to be consistent with input_mappings and to avoid pluralizing both words, which IMO is confusing. Additionally, I would like to see what the user workflow would be for a step that already has a predefined set of inputs.
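A rough sketch of the get_inputs idea, assuming a simplified step that only carries its names and mappings: get_inputs returns the column names the step will actually read once input_mappings is applied, and a validation pass compares them with the (renamed) outputs of the upstream step. StepSpec, get_outputs and validate_edge are hypothetical names for illustration, and checking only the direct predecessor is a simplification of real DAG validation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepSpec:
    """Hypothetical, simplified step: only its name, columns and mappings."""

    name: str
    inputs: List[str]
    outputs: List[str]
    input_mappings: Dict[str, str] = field(default_factory=dict)
    output_mappings: Dict[str, str] = field(default_factory=dict)

    def get_inputs(self) -> List[str]:
        # Columns actually read from upstream data once input_mappings is applied
        return [self.input_mappings.get(name, name) for name in self.inputs]

    def get_outputs(self) -> List[str]:
        # Columns actually written once output_mappings is applied
        return [self.output_mappings.get(name, name) for name in self.outputs]


def validate_edge(upstream: StepSpec, downstream: StepSpec) -> None:
    """Raise if `downstream` needs a column that `upstream` does not produce."""
    missing = set(downstream.get_inputs()) - set(upstream.get_outputs())
    if missing:
        raise ValueError(
            f"Step '{downstream.name}' requires {sorted(missing)}, "
            f"but '{upstream.name}' only produces {upstream.get_outputs()}"
        )


loader = StepSpec(name="load_data", inputs=[], outputs=["prompt"])
generate = StepSpec(
    name="generate_response",
    inputs=["instruction"],
    outputs=["response"],
    input_mappings={"instruction": "prompt"},
)
validate_edge(loader, generate)  # passes: "instruction" is read from "prompt"
print(generate.get_inputs())  # ['prompt']
```

Without the mapping, the same generate step would fail validation, because load_data never produces an "instruction" column.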

src/distilabel/pipeline/step/base.py (outdated, resolved)

@alvarobartt (Member) commented Mar 1, 2024

Additionally, I would like to see what the user workflow would be for a step that already has a predefined set of inputs.

Would this be a user workflow for defining the correct {input,output}_mappings?

from ... import CustomTask

# inputs/outputs are properties, so read them from an instance
task = CustomTask(name="custom_task")
print(task.inputs)
# ["question", "options"]
print(task.outputs)
# ["answer"]

input_mappings = {"question": "Q", "options": "O"}
output_mappings = {"answer": "A"}

i.e., being able to explore the defaults before applying the mappings? Also, the mappings should be disabled when a step has no inputs / outputs, right? Are we handling that too?
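One way to handle the last two questions, sketched under the assumption that mapping keys must always be declared input/output names: a step that declares no inputs (or no outputs) then automatically rejects any input_mappings (or output_mappings), and the user builds the mappings from the defaults they inspected first. validate_mappings is a hypothetical helper, not part of this PR.

```python
from typing import Dict, List


def validate_mappings(
    step_name: str,
    inputs: List[str],
    outputs: List[str],
    input_mappings: Dict[str, str],
    output_mappings: Dict[str, str],
) -> None:
    """Reject mappings whose keys are not declared inputs/outputs of the step.

    A step without inputs (e.g. a generator) therefore accepts no
    input_mappings, and a step without outputs accepts no output_mappings.
    """
    for kind, mappings, declared in (
        ("input", input_mappings, inputs),
        ("output", output_mappings, outputs),
    ):
        unknown = set(mappings) - set(declared)
        if unknown:
            raise ValueError(
                f"Step '{step_name}' got {kind}_mappings for {sorted(unknown)}, "
                f"but its declared {kind}s are {declared}"
            )


# Keys come from the defaults the user inspected first (inputs / outputs)
validate_mappings(
    "custom_task",
    inputs=["question", "options"],
    outputs=["answer"],
    input_mappings={"question": "Q", "options": "O"},
    output_mappings={"answer": "A"},
)
```

Tying the check to the declared names means "no inputs" needs no special case: any input_mappings key is unknown by definition.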

@gabrielmbmb changed the title from "Add inputs_mapping and outputs_mapping attributes to _Step" to "Add input_mappings and output_mappings attributes to _Step" Mar 1, 2024
@gabrielmbmb changed the title from "Add input_mappings and output_mappings attributes to _Step" to "Add input_mappings and output_mappings attributes" Mar 1, 2024
@gabrielmbmb marked this pull request as ready for review March 1, 2024 16:32
src/distilabel/pipeline/step/base.py (outdated, resolved)
@gabrielmbmb merged commit bd31f69 into core-refactor Mar 1, 2024
4 checks passed
@gabrielmbmb deleted the input_output_step_mapping branch March 1, 2024 18:00
Labels
enhancement New feature or request

2 participants