Write mode "Ignore" is incorrectly treated as "Overwrite" #52

Closed
wajda opened this issue Oct 22, 2018 · 2 comments

wajda commented Oct 22, 2018

Shouldn't we skip a write event completely if the target exists and the mode is "Ignore"?

wajda added this to the 0.4.next milestone Dec 6, 2018
wajda added the bug label Dec 6, 2018
wajda modified the milestones: 0.4.next, 0.3.next Dec 6, 2018
wajda self-assigned this Dec 19, 2018

wajda commented Dec 21, 2018

Since the Spline listener is triggered only after the result has been written to the destination, we cannot really check whether the destination existed before the write.
Luckily, along with the executed physical plan, Spark 2.3+ also provides the associated metrics, including numOutputRows. What we can do is look at the number of rows produced by all the terminal nodes: if it equals zero, it can only mean the plan wasn't actually executed, so we can skip the lineage as well.

wajda commented Jan 2, 2019

Correcting myself: we are interested in numFiles on the write node only. Zero changed files means that the result was ignored.
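
A minimal sketch of that check, in plain Python with no Spark dependency. It assumes the write node's metrics have already been read into a dict keyed by metric name. The function names and the dict shape are hypothetical, used only for illustration, and are not Spline's actual code. Limiting the check to the "Ignore" mode and keeping the lineage when the metric is missing are also assumptions of this sketch.

```python
from typing import Mapping, Optional


def write_was_ignored(write_metrics: Mapping[str, int]) -> Optional[bool]:
    """Tell whether the write node reported zero written files.

    Returns None when the numFiles metric is absent, so the caller can
    decide how to treat the unknown case.
    """
    num_files = write_metrics.get("numFiles")
    if num_files is None:
        return None
    return num_files == 0


def should_capture_lineage(save_mode: str, write_metrics: Mapping[str, int]) -> bool:
    """Decide whether a write event should produce lineage.

    Assumption: only the "Ignore" save mode can silently skip a write,
    so every other mode always captures lineage.
    """
    if save_mode.lower() != "ignore":
        return True
    # Skip lineage only when we positively know nothing was written.
    return write_was_ignored(write_metrics) is not True
```

With this, `should_capture_lineage("Ignore", {"numFiles": 0})` skips the event, while the same mode with a non-zero numFiles, or any other mode, still captures it.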
