feat: Multi-yield transformers #396

Merged
merged 4 commits into from Jan 29, 2021

Spacerat
Contributor

@Spacerat Spacerat commented Oct 28, 2020

Summary of Changes

This makes it possible for transformers to yield multiple records. This allows transformers to enrich records by yielding additional data.

For example, if my company has a convention for documenting owners in table descriptions using @owner: name, a transformer in the table extraction pipeline could yield TableOwner (as well as the original record).
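As a rough illustration of the `@owner: name` example above, here is a minimal sketch of a transformer that yields both the original record and a derived owner record. The `Table` and `Owner` dataclasses, the `OwnerFromDescriptionTransformer` class, and the sample table key are hypothetical stand-ins so the snippet runs on its own; they are not the library's actual models or base class.

```python
import re
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass
class Table:
    # Hypothetical stand-in for a table metadata record.
    key: str
    description: str


@dataclass
class Owner:
    # Hypothetical stand-in for a table owner record.
    table_key: str
    owners: List[str]


# Matches the "@owner: name" convention described above.
OWNER_PATTERN = re.compile(r"@owner:\s*(\S+)")


class OwnerFromDescriptionTransformer:
    """Hypothetical transformer: passes the record through, then yields any owners found."""

    def transform(self, record: Any) -> Iterator[Any]:
        # Always yield the original record first so downstream steps still see it.
        yield record
        if isinstance(record, Table):
            owners = OWNER_PATTERN.findall(record.description)
            if owners:
                # Yield one extra record carrying the enrichment.
                yield Owner(table_key=record.key, owners=owners)


table = Table(key="db://cluster.schema/orders", description="Order facts. @owner: alice")
results = list(OwnerFromDescriptionTransformer().transform(table))
```

With this input, `results` holds the original `Table` followed by one `Owner` record; a description without an `@owner:` tag would yield only the original record.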

A small knock-on effect of this approach: since ChainedTransformer now always returns an iterator, wherever code was expecting a ChainedTransformer to return a single value, I had to wrap the call with next(..., None).
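To make the `next(..., None)` point concrete, here is a small sketch of a chain whose result is always an iterator. `chain_transform` and the toy transformers are hypothetical and only mimic the behaviour described; this is not the actual ChainedTransformer code.

```python
from typing import Any, Callable, Iterable, Iterator, List

# Hypothetical transformer shape: one record in, zero or more records out.
TransformFn = Callable[[Any], Iterable[Any]]


def chain_transform(transformers: List[TransformFn], record: Any) -> Iterator[Any]:
    """Feed every record produced by one step into the next step."""
    records: List[Any] = [record]
    for transform in transformers:
        # Flatten the outputs of this step into the input of the next one.
        records = [out for rec in records for out in transform(rec)]
    yield from records


def double(n: int) -> Iterator[int]:
    yield n * 2


def with_negative(n: int) -> Iterator[int]:
    yield n
    yield -n


def drop_all(n: int) -> Iterator[int]:
    # Yields nothing, i.e. filters the record out.
    return iter(())


# Callers that want a single value take the first item, with None as the fallback.
first = next(chain_transform([double, with_negative], 3), None)
everything = list(chain_transform([double, with_negative], 3))
nothing = next(chain_transform([drop_all], 3), None)
```

Here `first` is `6`, `everything` is `[6, -6]`, and `nothing` is `None`, which is why the `None` default matters when a chain filters out its input.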

Tests

I added an integration test which exercises the task and chained transformer changes end-to-end. This is pretty much copied from the test I wrote for this in my own codebase. I know it doesn't quite fit the existing convention, but I do think end-to-end tests like this can be useful anyway. I'd be interested to hear thoughts.

Documentation

I updated the README.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@Spacerat Spacerat changed the title [feat] Multi-yield transformers feat: Multi-yield transformers Oct 28, 2020
@Spacerat Spacerat force-pushed the multi-yield-transformers branch 2 times, most recently from b8eaba9 to cc3037d Compare October 28, 2020 23:35
Signed-off-by: Joseph Atkins-Turkish <jatkins-turkish@brex.com>
Signed-off-by: Joseph Atkins-Turkish <jatkins-turkish@brex.com>
@Spacerat Spacerat marked this pull request as ready for review October 29, 2020 00:28
@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Nov 5, 2020
@feng-tao
Member

feng-tao commented Nov 7, 2020

Will take a look at this PR early next week.

Joseph Atkins-Turkish added 2 commits January 7, 2021 11:54
Signed-off-by: Joseph Atkins-Turkish <jatkins-turkish@brex.com>
@Spacerat
Contributor Author

Spacerat commented Jan 7, 2021

@feng-tao I just fixed the conflicts and sorted the imports so this passes CI again. Anything else I can do to help get it in?

@feng-tao
Member

Sorry for the wait, will look at this this week.

Member

@feng-tao feng-tao left a comment

lgtm overall

@feng-tao feng-tao merged commit 49ae0ed into amundsen-io:master Jan 29, 2021
Wonong pushed a commit to Wonong/amundsendatabuilder that referenced this pull request Mar 4, 2021
* Implement multi-yield transformers

Signed-off-by: Joseph Atkins-Turkish <jatkins-turkish@brex.com>

* add license

Signed-off-by: Joseph Atkins-Turkish <jatkins-turkish@brex.com>

* isort

Signed-off-by: Joseph Atkins-Turkish <jatkins-turkish@brex.com>