copy over the script to enable pyspark as well #4
Closed
prabodh1194 wants to merge 1 commit into apache:comet-upstream from
Author: opened accidentally.
parthchandra pushed a commit to parthchandra/datafusion-comet that referenced this pull request (Dec 20, 2024)
schenksj added a commit to schenksj/datafusion-comet that referenced this pull request (Apr 12, 2026)
Fixes all 14 previously-deferred review findings:

apache#4 Case-sensitivity in DV column detection: isDeltaDvFilterPattern and findAndStripDeltaScanBelow now use equalsIgnoreCase for the __delta_internal_is_row_deleted column name match.

apache#8 S3 key documentation: added comment in JNI documenting the Hadoop-style key names that storageOptions carries and how extract_storage_config maps them.

apache#10 Proto comment inaccuracy: updated reserved field number comments to describe purpose rather than referencing (now-stale) phase numbers. Added field numbering strategy note on DeltaScanCommon.

apache#11 Module quarantine docs: updated delta/mod.rs doc comment to note that create_object_store returns Arc<dyn ObjectStore> from object_store_kernel 0.12, and that it never escapes the module. Updated public API listing to match current exports.

apache#12 Optimizer rule double-init: added synchronized double-checked locking on the CometDeltaDvConfigRule to prevent concurrent threads from racing on the config set.

apache#14 Incomplete partition type support: castPartitionString now throws IllegalArgumentException for unsupported types (STRUCT, ARRAY, MAP, etc.) instead of silently converting to UTF8String.

apache#6 DV materialization clarity: added comment explaining why .unwrap_or_default() is safe (get_row_indexes returns Ok(None) only if has_vector() lied, which kernel guarantees doesn't happen; Err propagates via ?).

apache#17 Consistent JNI null handling: extracted read_string_array helper for reading Java String[] into Vec<String>, consolidating the null-check + iteration pattern.

apache#18-19 Proto field ordering: added numbering strategy comment to DeltaScanCommon and DeltaScanTask messages.

apache#20 Memory note: added comment about potential driver OOM on extremely large tables (millions of files) with suggestion for future streaming/chunked processing.

Tests: succeeded 35, failed 0, canceled 0, ignored 0, pending 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
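For readers unfamiliar with the "synchronized double-checked locking" mentioned in the optimizer-rule finding, here is a minimal Java sketch of the general pattern. It is not the code from the referenced commit: the class `OnceInitializer`, its methods, and the counter standing in for the config write are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a rule that must set a config value exactly once.
// This is an illustration of the pattern only, not Comet's actual class.
final class OnceInitializer {
    // volatile, so the lock-free fast-path read observes the write made under the lock.
    private volatile boolean initialized = false;

    // Counts how many times the guarded body ran (placeholder for the real config write).
    private final AtomicInteger initCount = new AtomicInteger();

    void ensureInitialized() {
        if (!initialized) {                      // first check, without taking the lock
            synchronized (this) {
                if (!initialized) {              // second check, while holding the lock
                    initCount.incrementAndGet(); // assumed one-time side effect
                    initialized = true;
                }
            }
        }
    }

    int initCount() {
        return initCount.get();
    }
}
```

The first check keeps the common already-initialized path free of locking, while the second check ensures that threads which raced past the first one do not repeat the side effect.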
let's run pyspark as well using the comet engine