
copy over the script to enable pyspark as well #4

Closed
prabodh1194 wants to merge 1 commit into apache:comet-upstream from prabodh1194:add_pyspark

Conversation

@prabodh1194

Let's run PySpark as well using the Comet engine.

@prabodh1194
Author

opened accidentally.

@prabodh1194 prabodh1194 closed this Feb 9, 2024
parthchandra pushed a commit to parthchandra/datafusion-comet that referenced this pull request Dec 20, 2024
schenksj added a commit to schenksj/datafusion-comet that referenced this pull request Apr 12, 2026
Fixes all 14 previously-deferred review findings:

apache#4  Case-sensitivity in DV column detection: isDeltaDvFilterPattern
    and findAndStripDeltaScanBelow now use equalsIgnoreCase for the
    __delta_internal_is_row_deleted column name match.
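
To illustrate the case-insensitive match this item describes, here is a minimal Java sketch. Only the column name and the use of equalsIgnoreCase come from the commit message; the class DvColumnMatch and the method isRowDeletedColumn are hypothetical stand-ins, not the actual isDeltaDvFilterPattern or findAndStripDeltaScanBelow code.

```java
import java.util.List;

// Hypothetical helper: only the column name and equalsIgnoreCase are taken
// from the commit message; everything else is illustrative.
final class DvColumnMatch {
    static final String IS_ROW_DELETED = "__delta_internal_is_row_deleted";

    // Returns true when `name` refers to the DV marker column, ignoring case.
    // Calling equalsIgnoreCase on the constant also makes a null name safe.
    static boolean isRowDeletedColumn(String name) {
        return IS_ROW_DELETED.equalsIgnoreCase(name);
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "__delta_internal_is_row_deleted",
            "__DELTA_INTERNAL_IS_ROW_DELETED",
            "id");
        for (String name : names) {
            System.out.println(name + " -> " + isRowDeletedColumn(name));
        }
    }
}
```

A plain equals comparison would miss the upper-case spelling, which is presumably the kind of mismatch the fix guards against.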

apache#8  S3 key documentation: added comment in JNI documenting the
    Hadoop-style key names that storageOptions carries and how
    extract_storage_config maps them.

apache#10 Proto comment inaccuracy: updated reserved field number comments
    to describe purpose rather than referencing (now-stale) phase
    numbers. Added field numbering strategy note on DeltaScanCommon.

apache#11 Module quarantine docs: updated delta/mod.rs doc comment to note
    that create_object_store returns Arc<dyn ObjectStore> from
    object_store_kernel 0.12, and that it never escapes the module.
    Updated public API listing to match current exports.

apache#12 Optimizer rule double-init: added synchronized double-checked
    locking on the CometDeltaDvConfigRule to prevent concurrent
    threads from racing on the config set.
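
The double-checked locking pattern named in this item can be sketched in Java as follows. This is a generic illustration under assumptions: the class OnceInitializer, its fields, and the counter standing in for the config write are all hypothetical, and the real CometDeltaDvConfigRule is not shown on this page.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a rule that must apply a config exactly once.
final class OnceInitializer {
    // volatile so a thread that skips the lock still sees the completed write.
    private volatile boolean initialized = false;
    private final Object lock = new Object();
    // Counts how many times the guarded section ran (stand-in for the config set).
    private final AtomicInteger initCount = new AtomicInteger();

    void ensureInitialized() {
        if (!initialized) {                 // first check, without the lock
            synchronized (lock) {
                if (!initialized) {         // second check, under the lock
                    initCount.incrementAndGet();
                    initialized = true;
                }
            }
        }
    }

    int initCount() {
        return initCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OnceInitializer rule = new OnceInitializer();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(rule::ensureInitialized);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("initializations: " + rule.initCount());
    }
}
```

The unlocked first check keeps the common path cheap once initialization is done, while the second check under the lock ensures only one thread runs the guarded section.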

apache#14 Incomplete partition type support: castPartitionString now throws
    IllegalArgumentException for unsupported types (STRUCT, ARRAY,
    MAP, etc.) instead of silently converting to UTF8String.
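
A minimal Java sketch of the fail-fast behaviour described in this item is shown below. The PartitionType enum, the set of supported types, and the method signature are assumptions made for illustration; the commit message only states that castPartitionString throws IllegalArgumentException for unsupported types such as STRUCT, ARRAY, and MAP.

```java
// Hypothetical sketch: the enum and the supported-type list are assumptions.
final class PartitionCast {
    enum PartitionType { STRING, INT, LONG, BOOLEAN, STRUCT, ARRAY, MAP }

    // Converts a raw partition value to a typed value, rejecting types that
    // cannot be represented instead of silently falling back to a string.
    static Object castPartitionString(String raw, PartitionType type) {
        if (raw == null) {
            return null; // a null partition value stays null
        }
        switch (type) {
            case STRING:
                return raw;
            case INT:
                return Integer.parseInt(raw);
            case LONG:
                return Long.parseLong(raw);
            case BOOLEAN:
                return Boolean.parseBoolean(raw);
            default:
                throw new IllegalArgumentException(
                    "Unsupported partition column type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(castPartitionString("42", PartitionType.INT));
        try {
            castPartitionString("{}", PartitionType.MAP);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing on the default branch surfaces an unsupported type at the point of conversion rather than letting a mistyped value flow further into the scan.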

apache#6  DV materialization clarity: added comment explaining why
    .unwrap_or_default() is safe (get_row_indexes returns Ok(None)
    only if has_vector() lied, which kernel guarantees doesn't happen;
    Err propagates via ?).

apache#17 Consistent JNI null handling: extracted read_string_array helper
    for reading Java String[] into Vec<String>, consolidating the
    null-check + iteration pattern.

apache#18-19 Proto field ordering: added numbering strategy comment to
    DeltaScanCommon and DeltaScanTask messages.

apache#20 Memory note: added comment about potential driver OOM on
    extremely large tables (millions of files) with suggestion for
    future streaming/chunked processing.

Tests: succeeded 35, failed 0, canceled 0, ignored 0, pending 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>