Add validation mode for mapper #59

Merged
rt-phb merged 2 commits into master from implement-data-type-validation-in-spooq-mapper on Jun 13, 2024

Breaka84 (Owner) commented Jun 5, 2024

Adds a new mode for the Mapper transformer, rename_and_validate:

All built-in and custom transformations (except renaming) and all casts are disabled. The Mapper only renames the columns and validates that the output data type is the same as the input data type. The transformation will fail if any spooq or custom transformations (except as_is) are defined!
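
To illustrate the intended behavior, here is a minimal sketch in plain Python. It is not the code from this PR: the function `rename_and_validate`, the dict-based schemas, and the string type names are all hypothetical stand-ins for Spark DataFrames and data types, and a missing source column simply raises instead of honoring `missing_column_handling`.

```python
# Hypothetical, simplified model of the "rename_and_validate" idea.
# Schemas are plain dicts of column name -> type name; this is NOT Spooq's code.


def rename_and_validate(input_schema, mapping):
    """Return the renamed schema, or raise if an entry would change a type.

    mapping: list of (target_name, source_name, expected_type) tuples, where
    expected_type is either a type name or the literal "as_is".
    """
    output_schema = {}
    for target_name, source_name, expected_type in mapping:
        if source_name not in input_schema:
            # Simplification: the real Mapper has configurable handling here.
            raise KeyError(f"Source column {source_name!r} is missing")
        actual_type = input_schema[source_name]
        # "as_is" accepts any input type; everything else must match exactly.
        if expected_type != "as_is" and expected_type != actual_type:
            raise ValueError(
                f"Column {source_name!r} has type {actual_type!r}, "
                f"but the mapping expects {expected_type!r}"
            )
        # No cast is ever applied: the output keeps the input type.
        output_schema[target_name] = actual_type
    return output_schema


input_schema = {"col_a": "string", "col_b": "integer"}
mapping = [
    ("name", "col_a", "string"),  # rename only, types match
    ("count", "col_b", "as_is"),  # as_is keeps whatever type the input has
]
print(rename_and_validate(input_schema, mapping))
# {'name': 'string', 'count': 'integer'}
```

A mapping entry such as `("name", "col_a", "integer")` would raise a `ValueError` in this sketch, mirroring the "fail instead of cast" behavior described above.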

Breaka84 added the test-it label (Triggers github test action) Jun 5, 2024
Breaka84 requested a review from rt-phb June 5, 2024 06:13
]
# fmt: on

def test_matching_mapping_with_casting(self, input_df, matching_mapping):
Collaborator

The data types match, so this should not trigger any casting, right? Maybe rename it to test_matching_mapping_without_validation. Some tests use the _with_validation suffix and some do not. Since all but two tests apply validation, I would remove the suffix, as the class name already indicates that it is about validation.

spooq/transformer/mapper.py (resolved)
column_to_nullify = "col_d"
mapping_ = deepcopy(matching_mapping)
mapping_.append((column_to_nullify, column_to_nullify, T.StringType()))
mapped_df = Mapper(mapping_, mode="rename_and_validate", missing_column_handling="nullify").transform(input_df)
Collaborator

Did you think about making the mode an enum, like:

Suggested change
- mapped_df = Mapper(mapping_, mode="rename_and_validate", missing_column_handling="nullify").transform(input_df)
+ mapped_df = Mapper(mapping_, mode=MapperMode.RENAME_AND_VALIDATE, missing_column_handling="nullify").transform(input_df)

Same as above, no need to implement this now.
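
For reference, one possible shape of such an enum is sketched below. Only the name `MapperMode.RENAME_AND_VALIDATE` comes from the suggestion above; the other members, their string values, and the helper `normalize_mode` are assumptions made for illustration and are not part of this PR. Mixing in `str` keeps existing calls that pass plain strings working.

```python
from enum import Enum


class MapperMode(str, Enum):
    """Hypothetical enum for the Mapper's mode parameter."""

    # Assumed pre-existing modes; the names and values are not confirmed here.
    REPLACE = "replace"
    APPEND = "append"
    PREPEND = "prepend"
    # Mode added by this PR.
    RENAME_AND_VALIDATE = "rename_and_validate"


def normalize_mode(mode):
    """Accept either a MapperMode member or its plain string value."""
    try:
        return MapperMode(mode)
    except ValueError:
        allowed = ", ".join(m.value for m in MapperMode)
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of: {allowed}"
        ) from None


# Both spellings resolve to the same member.
print(normalize_mode("rename_and_validate") is MapperMode.RENAME_AND_VALIDATE)
print(normalize_mode(MapperMode.APPEND) is MapperMode.APPEND)
```

With this design a typo such as `mode="rename_and_validat"` fails early with a message listing the allowed values, rather than silently falling through to a default branch.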

rt-phb merged commit c64b104 into master Jun 13, 2024
rt-phb deleted the implement-data-type-validation-in-spooq-mapper branch June 13, 2024 08:59