explodeAnnotations logic added for Finisher #645

vkocaman · 2019-09-25T11:47:56Z

adding explodeAnnotations param to finisher so that it can be used not just at the end but whenever it's required across the pipelines.

Description

explodedCol is added as a param and if it is not specified, it will be empty as a default and ignored as it's already not in the flled names.
So, any string that's not inside fieldnames will be ignored and will not trigger anything.
This will also override the previous steps and the only fields returned will be the exploded struct fields.

Motivation and Context

sometimes we need to export the processed output to csv or somewhere.. so it would make Finisher an intermediary solution that can be used across the pipeline.

How Has This Been Tested?

local test applied but I didn't update the test scripts in repo. We may also need to specify if the type of explodedCol is Array. Otherwise it would fail.

Screenshots (if appropriate):

Types of changes

Bug fix (non-breaking change which fixes an issue)
Code improvements with no or little impact
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING page.
I have added tests to cover my changes.
All new and existing tests passed.

maziyarpanahi · 2019-09-26T04:12:29Z

Could you please add unit tests in both Python and Scala covering few scenarios? What if user sets output as an array and explodeCol? What if user has multiple inputCols and one column inside explodeCol?
In any case, No param should make the others obsolete or redundant. Please make sure this param doesn’t do that otherwise this should be outside annotators and up to the user.

vkocaman · 2019-09-26T09:33:24Z

@maziyarpanahi actually, adding this param will make the others obsolete or redundant as each array type outputCol would have different number of items inside and we cannot explode more than col at a time (we can, but we shouldn't let).. so, Finisher wouldn't be a perfect fit for this feature.. lets either have another annotator for exploding certain cols or instead emphasise this issue in articles so that users could be aware of..

maziyarpanahi · 2019-09-26T09:43:14Z

@maziyarpanahi actually, adding this param will make the others obsolete or redundant as each array type outputCol would have different number of items inside and we cannot explode more than col at a time (we can, but we shouldn't let).. so, Finisher wouldn't be a perfect fit for this feature.. lets either have another annotator for exploding certain cols or instead emphasise this issue in articles so that users could be aware of..

That's a perfect assessment. Let's first add the explode function in Spark SQL into our documentation for those who may not be aware of it. We can make an example in the Finisher section that if anyone wishes to export a CSV out of a column they can do so by simply using select(explode(col...)) after Finisher. Then a quick article with end to end.

As an example, I couldn't make sentence embeddings work in WordEmbeddings and BertEmbeddings without being anti-pattern so I created a new annotator which is clean and has lots of opportunities. However, now I like to output the results not just in array but in Vectors format so it can be accepted by Spark ML, and Finisher is not a good fit for that. Let's talk more about this and maybe we can introduce two more annotators one to finish for the outside of Apache Spark (your use case) and one to finish for Spark ML where it needs Vectors for features (my use case).

vkocaman · 2019-09-26T09:53:11Z

Sounds great.. let me add a use case in documentation using the explode function in Spark SQL.. and looks like we'll need a more capable ColumnTransformer annotator soon.. with params like inputCols, transformType, etc..

maziyarpanahi · 2019-09-26T10:40:03Z

Very nice, please make a pull request for the documentation and I’ll review the styling super fast. 👍

vkocaman · 2019-09-26T10:54:48Z

ok I just did.. please review

vkocaman added 2 commits September 25, 2019 13:38

explodeAnnotations for Finisher

d92e3ca

explodeAnnotation logic added for Finisher

0a2e34a

vkocaman requested a review from saif-ellafi September 25, 2019 21:56

maziyarpanahi self-requested a review September 26, 2019 09:35

maziyarpanahi closed this Sep 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

explodeAnnotations logic added for Finisher #645

explodeAnnotations logic added for Finisher #645

Uh oh!

vkocaman commented Sep 25, 2019 •

edited

Loading

Uh oh!

maziyarpanahi commented Sep 26, 2019

Uh oh!

vkocaman commented Sep 26, 2019

Uh oh!

maziyarpanahi commented Sep 26, 2019

Uh oh!

vkocaman commented Sep 26, 2019

Uh oh!

maziyarpanahi commented Sep 26, 2019

Uh oh!

vkocaman commented Sep 26, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

explodeAnnotations logic added for Finisher #645

explodeAnnotations logic added for Finisher #645

Uh oh!

Conversation

vkocaman commented Sep 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

Checklist:

Uh oh!

maziyarpanahi commented Sep 26, 2019

Uh oh!

vkocaman commented Sep 26, 2019

Uh oh!

maziyarpanahi commented Sep 26, 2019

Uh oh!

vkocaman commented Sep 26, 2019

Uh oh!

maziyarpanahi commented Sep 26, 2019

Uh oh!

vkocaman commented Sep 26, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vkocaman commented Sep 25, 2019 •

edited

Loading