[SPARK-22217] [SQL] ParquetFileFormat to support arbitrary OutputCommitters #19448
What changes were proposed in this pull request?
This enables output committers which don't write to the filesystem the way
Before a committer which isn't a subclass of
(It could downgrade, of course, but raising an exception makes it clear there won't be a summary. It also makes the behaviour testable.)
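To illustrate the fail-fast choice described above, here is a minimal sketch in Python. It is not the code in this patch: the class names and the `check_committer` helper are hypothetical stand-ins, and the only assumption taken from the description is that asking for a summary with a committer that cannot produce one raises an exception rather than silently downgrading.

```python
# Toy model of the fail-fast check. All names here are hypothetical
# stand-ins and are not Spark, Parquet, or Hadoop APIs.


class OutputCommitter:
    """Stand-in for a generic output committer."""


class SummaryCapableCommitter(OutputCommitter):
    """Stand-in for a committer that can write a summary."""


class ObjectStoreCommitter(OutputCommitter):
    """Stand-in for a committer that does not write via the filesystem."""


def check_committer(committer_class, summary_requested):
    """Return committer_class if it is usable, otherwise raise.

    Any OutputCommitter subclass is accepted. If a summary is requested
    and the committer cannot write one, raise instead of downgrading,
    so the caller knows there will be no summary.
    """
    if not issubclass(committer_class, OutputCommitter):
        raise TypeError(f"{committer_class.__name__} is not an OutputCommitter")
    if summary_requested and not issubclass(committer_class, SummaryCapableCommitter):
        raise ValueError(
            f"summary requested, but {committer_class.__name__} cannot write one"
        )
    return committer_class
```

Because the mismatch surfaces as an exception, a test can assert on it directly. A silent downgrade would only be observable by inspecting the written output.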
How was this patch tested?
The patch includes a test suite,
All tests are happy.
LGTM too, just a few tiny nits while double checking.
pushed a commit
this pull request
Oct 13, 2017
How come fixing the behaviour as documented is not a bug fix? I think that basically means we don't backport fixes for things not working as documented for other internal configurations.
This does not extend the functionalities. This fixes functionalities to work as documented and expected, and I call it a bugfix.
Thanks for reviewing this and getting it in. Personally, I had it in the "improvement" category rather than bug fix. If it wasn't for that line in the docs, there'd be no ambiguity about improvement vs. fix, and there is always a lower-risk way to fix a doc/code mismatch: change the docs.
But I'm grateful for it being in; with the backport to branch-2, Ryan should be able to use it semi-immediately.
@HyukjinKwon branch-2.2 is a maintenance branch. I am not sure it is appropriate to merge this change to branch-2.2 since it is not really a bug fix. If the doc is not accurate, we should fix the doc. For a maintenance branch, we need to be very careful about what we merge, and we should always avoid unnecessary changes.
I have a lot of sympathy for the argument that infrastructure software shouldn't have too many backports and that those should be generally bug fixes. But, if I were working on a Spark distribution at a vendor, this is something I would definitely include because it's such a useful feature. I think that by not backporting this, we're just pushing that work downstream. Plus, the risk to adding this is low: the main behavior change is that users can specify a previously-banned committer for Parquet writes. Is it a bug fix? Probably not. But it fixes a big blocker.
I am not really worried about this particular change. It's already merged and it seems a small and safe change. I am not planning to revert it.
But, in general, let's avoid merging changes that are not bug fixes to a maintenance branch. If there is an exception, it will be better to make it clear earlier.