[SPARK] Simplify shouldProcess check in Spark3 streaming source #3268
There are a few corner cases that don't appear to be handled. For example if somebody runs a But given that the Spark streaming source is presently only able to handle a limited set of use cases, I don't think complicating it for edge cases is the best idea. If anybody would like to discuss my findings, I'd be happy to do so, as I spent some time looking into CDC in other systems and how it can be efficiently applied to Spark streaming (which has no built-in notion of CDC or of deletes in general). There's a bit more of a breakdown in the other PR mentioned above.

Merged. Thanks, @kbendick!
Replaces a somewhat complicated set of calls to Preconditions.checkState in the Spark 3 MicroBatchStream `shouldProcess` method with a simpler switch statement for readability.

Also places the most common checks first (instead of doing a string comparison check twice, against both `"DELETE"` and `"APPEND"`).

This method was a bit confusing for me on first read. Given that the APPEND check is the most common case, processing it first and reducing the number of string comparison checks will likely be more performant. It is also more readable, in my opinion.
I was going to also allow skipping OVERWRITE if users set a flag, but there is already open discussion around this and some older PRs. So I closed my original PR in favor of just simplifying the check and continuing the discussion elsewhere: #3267
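
For readers who want a picture of the general shape, here is a minimal, self-contained sketch of a switch-based check with the most common case first. It is not the code from this PR: the class name `ShouldProcessSketch`, the operation string values, and the choice of which operations are skipped versus rejected are all assumptions made for illustration.

```java
public class ShouldProcessSketch {
    // Hypothetical operation names, assumed stand-ins for the real constants.
    static final String APPEND = "append";
    static final String REPLACE = "replace";
    static final String DELETE = "delete";
    static final String OVERWRITE = "overwrite";

    // Returns true if a snapshot with this operation should be read by the stream.
    static boolean shouldProcess(String operation) {
        switch (operation) {
            case APPEND:
                // Most common case, so it is listed first.
                return true;
            case REPLACE:
                // Assumption: a rewrite of existing data adds no new rows, so skip it.
                return false;
            case DELETE:
            case OVERWRITE:
                // Assumption: these are rejected rather than silently skipped.
                throw new IllegalStateException(
                    "Unsupported snapshot operation: " + operation);
            default:
                throw new IllegalStateException(
                    "Unknown snapshot operation: " + operation);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldProcess(APPEND));
        System.out.println(shouldProcess(REPLACE));
    }
}
```

Each operation gets exactly one branch, so a reader can see at a glance which snapshots are read, skipped, or rejected, and any unrecognized operation fails loudly in the `default` branch.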