Support for multiple transformers #82

Closed
kevinwallimann opened this issue Jan 31, 2020 · 0 comments · Fixed by #144

kevinwallimann commented Jan 31, 2020

Currently, only one transformer can be specified.

A future use case will likely require multiple chainable transformers.

The configuration parameter component.transformer could take a comma-separated list instead of a single value, with the order of the list specifying the order of execution. However, that approach would not allow the same transformer to be used more than once.

Instead, transformers should be configured like this:

component.transformer.id.1=[id]
component.transformer.class.[id]=za.co.absa.hyperdrive.ingestor.my.transformer
transformer.[id].property.a = "value a"
transformer.[id].property.b = "value b"

component.transformer.id.2=csst
component.transformer.class.csst=za.co.absa.hyperdrive.ingestor.implementation.transformer.column.selection.ColumnSelectorStreamTransformer
transformer.csst.columns.to.select=*

component.transformer.id.3=csst2
component.transformer.class.csst2=za.co.absa.hyperdrive.ingestor.implementation.transformer.column.selection.ColumnSelectorStreamTransformer
transformer.csst2.columns.to.select="special_column"

Why the two prefixes component.transformer.class and transformer? Keeping them separate prevents name conflicts.

At runtime, each transformer would receive only its own config subset. In the above example, the transformer registered under [id] gets property.a => "value a", property.b => "value b" in the transform method, instead of the full configuration as is currently the case. If cross-component configuration is necessary, the HyperdriveContext may be used.
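
As an illustration of the config subsetting described above, here is a minimal sketch. It assumes the configuration is available as a plain Map[String, String]; the object name TransformerConfigSubset and the method subsetFor are hypothetical and not part of Hyperdrive.

```scala
object TransformerConfigSubset {
  // Hypothetical helper (not Hyperdrive API): keeps only the keys under
  // "transformer.<id>." and strips that prefix from them.
  def subsetFor(id: String, config: Map[String, String]): Map[String, String] = {
    // The trailing dot keeps the id "csst" from matching keys of "csst2".
    val prefix = s"transformer.$id."
    config.collect {
      case (key, value) if key.startsWith(prefix) => key.stripPrefix(prefix) -> value
    }
  }
}
```

With the example configuration, subsetFor("csst", config) would contain only columns.to.select, without the transformer.csst prefix. Note that ids containing dots could still overlap under this scheme (e.g. an id "column" and an id "column.selector"), so a real implementation might want to validate ids.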

The order of the transformers is determined by the number after component.transformer.id. An error is thrown if it is not an integer. The number may be negative, and order numbers need not be consecutive, e.g. no error is thrown if one transformer has component.transformer.id.2 and the other component.transformer.id.-1.
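
The ordering rule could be sketched roughly as follows. This again assumes a plain Map[String, String] configuration; TransformerOrdering and orderedIds are hypothetical names, and the choice of IllegalArgumentException is an assumption, since the issue only says that an error is thrown.

```scala
import scala.util.Try

object TransformerOrdering {
  private val IdKeyPrefix = "component.transformer.id."

  // Hypothetical helper (not Hyperdrive API): returns the transformer ids
  // sorted by the integer suffix of their "component.transformer.id.<n>" key.
  // Throws IllegalArgumentException if a suffix is not an integer.
  def orderedIds(config: Map[String, String]): Seq[String] = {
    val numbered = config.toSeq.collect {
      case (key, id) if key.startsWith(IdKeyPrefix) =>
        val suffix = key.stripPrefix(IdKeyPrefix)
        val order = Try(suffix.toInt).getOrElse(
          throw new IllegalArgumentException(s"Order '$suffix' in key '$key' is not an integer"))
        order -> id
    }
    // Negative and non-consecutive order numbers are fine; only the relative order matters.
    numbered.sortBy(_._1).map(_._2)
  }
}
```

Sorting by the parsed integer rather than by the key string matters here, because a string sort would place "10" before "2".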

Tasks

  • StreamTransformerAbstractFactory.build should return a list of transformers.
  • build should call the apply method of the companion object only with a configuration subset using the id (e.g. csst in the above example)
  • SparkIngestor.ingest should accept a list of transformers and loop through them (fold)
  • Update tests

Note

  • As a by-product, transformers will be optional. If no transformer is specified, the list of transformers will be empty, so the dataframe will be passed directly from the decoder (reader from v4.0.0) to the writer.
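
The fold over the list of transformers, including the empty-list case, might look like the sketch below. To keep it self-contained, a Seq[String] of column names stands in for the Spark dataframe, and the StreamTransformer trait shown is a simplified stand-in, not Hyperdrive's actual interface. TransformerChain and applyAll are hypothetical names.

```scala
object TransformerChain {
  // Stand-in for the Spark dataframe (assumption, for illustration only).
  type Columns = Seq[String]

  // Simplified stand-in for a stream transformer (not the real Hyperdrive trait).
  trait StreamTransformer {
    def transform(data: Columns): Columns
  }

  // Applies the transformers left to right.
  // With an empty list, the input is returned unchanged.
  def applyAll(input: Columns, transformers: Seq[StreamTransformer]): Columns =
    transformers.foldLeft(input)((data, transformer) => transformer.transform(data))
}
```

Because foldLeft returns its start value for an empty sequence, no special case is needed to make transformers optional.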

How to migrate
ColumnSelectorStreamTransformer

  • ConfigurationsKeys.ColumnSelectorStreamTransformerKeys.KEY_COLUMNS_TO_SELECT should be "columns.to.select"
  • Existing configuration in the Trigger DB
  1. Replace
"component.transformer=za.co.absa.hyperdrive.ingestor.implementation.transformer.column.selection.ColumnSelectorStreamTransformer"

with

"component.transformer.id.1=column.selector", "component.transformer.class.column.selector=za.co.absa.hyperdrive.ingestor.implementation.transformer.column.selection.ColumnSelectorStreamTransformer"
  2. Replace
transformer.columns.to.select=

with

transformer.column.selector.columns.to.select=

Alternatively, the whole column selector transformer config can be removed if all jobs select all columns anyway.
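
For illustration, the two replacements above could be expressed as a small function over a key-value view of a job's configuration. How the Trigger DB actually stores these entries is not shown in this issue, so the Map[String, String] representation is an assumption, and ColumnSelectorMigration and migrate are hypothetical names.

```scala
object ColumnSelectorMigration {
  private val OldClassKey = "component.transformer"
  private val OldColumnsKey = "transformer.columns.to.select"

  // Hypothetical helper: rewrites the two legacy keys to the new id-based keys.
  // All other entries are passed through untouched.
  def migrate(config: Map[String, String], id: String = "column.selector"): Map[String, String] =
    config.flatMap {
      case (OldClassKey, className) =>
        Map(
          "component.transformer.id.1" -> id,
          s"component.transformer.class.$id" -> className)
      case (OldColumnsKey, columns) =>
        Map(s"transformer.$id.columns.to.select" -> columns)
      case other => Map(other)
    }
}
```

The default id column.selector matches the one used in the replacement strings above; any other unique id would work as long as both new keys use it consistently.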

HyperConformance

  • In za.co.absa.enceladus.conformance.HyperConformanceAttributes, search for s"$keysPrefix. and replace it with " (this drops the prefix from the key definitions)
  • Existing configuration in the Trigger DB
    Replace
"component.transformer=za.co.absa.enceladus.conformance.HyperConformance"

with

"component.transformer.id.1=hyperconformance","component.transformer.class.hyperconformance=za.co.absa.enceladus.conformance.HyperConformance"

Transformer-specific configuration already happens to be correct.
