
[SPARK-51634][SQL] Support TIME in off-heap column vectors #50428

Closed
MaxGekk wants to merge 1 commit into apache:master from MaxGekk:time-parquet-tests


MaxGekk (Member) commented Mar 27, 2025

What changes were proposed in this pull request?

In this PR, I propose to modify OffHeapColumnVector.java to support the new TIME data type, and to move a test from ParquetIOSuite to ParquetFileFormatSuite so that writing and reading TIME values in Parquet is tested for both the V1 and V2 datasources.
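The description does not include the diff, so the following is only a rough illustration of why an off-heap vector needs an explicit case for a new type: storage is reserved per type, and a type with no case fails before any row is read. It is a minimal Java sketch assuming that TIME is stored as one 64-bit long per row (as LONG and TIMESTAMP are). The class OffHeapVectorSketch, its string type tags, and the use of ByteBuffer.allocateDirect for off-heap memory are hypothetical simplifications, not Spark's actual OffHeapColumnVector code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical, simplified stand-in for an off-heap column vector. It is NOT
// Spark's OffHeapColumnVector; it only shows per-type storage reservation.
final class OffHeapVectorSketch {

    // Bytes reserved per row for each hypothetical type tag.
    static int bytesPerValue(String typeName) {
        switch (typeName) {
            case "INT":
            case "DATE":
            case "FLOAT":
                return 4;
            case "LONG":
            case "DOUBLE":
            case "TIMESTAMP":
            case "TIMESTAMP_NTZ":
            case "TIME": // assumption: TIME is backed by a 64-bit long per row
                return 8;
            default:
                // Without a case for the type, reservation fails up front,
                // analogous to the read failure reported in this PR.
                throw new IllegalArgumentException("Unhandled type: " + typeName);
        }
    }

    private final int width;       // bytes per row for this column
    private final ByteBuffer data; // direct buffer: memory outside the Java heap

    OffHeapVectorSketch(String typeName, int capacity) {
        this.width = bytesPerValue(typeName);
        this.data = ByteBuffer.allocateDirect(capacity * width)
                              .order(ByteOrder.nativeOrder());
    }

    // Stores a 64-bit value; only valid for 8-byte-wide columns.
    void putLong(int rowId, long value) {
        if (width != 8) throw new IllegalStateException("not a 64-bit column");
        data.putLong(rowId * width, value);
    }

    // Reads back a 64-bit value written by putLong.
    long getLong(int rowId) {
        if (width != 8) throw new IllegalStateException("not a 64-bit column");
        return data.getLong(rowId * width);
    }
}
```

For instance, new OffHeapVectorSketch("TIME", 1024) reserves 8 KiB, while a type tag with no case throws before any row is written. Reusing the 8-byte path keeps TIME on the same read/write code as the other long-backed types.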

Why are the changes needed?

To fix the following failure when reading Parquet data that contains TIME values with off-heap column vectors enabled:

scala> spark.conf.set("spark.sql.columnVector.offheap.enabled", true)
scala> spark.read.parquet("/Users/maxim.gekk/tmp/test5").show()
org.apache.spark.SparkException: [FAILED_READ_FILE.NO_HINT] Encountered error while reading file file:///Users/maxim.gekk/tmp/test5/part-00000-855fbee0-1e15-460e-99ea-edf614e2415b-c000.snappy.parquet.  SQLSTATE: KD001

Does this PR introduce any user-facing change?

No.

How was this patch tested?

By running the modified test suites:

$ build/sbt "test:testOnly *ParquetFileFormatV1Suite"
$ build/sbt "test:testOnly *ParquetFileFormatV2Suite"

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Mar 27, 2025

test("Write and read back TIME values") {
MaxGekk (Member, Author) commented Mar 27, 2025


Moved it here from ParquetIOSuite so that it runs against both the V1 and V2 Parquet datasources without duplicating the test.
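The layout this comment relies on puts the shared test in an abstract suite, while the concrete V1 and V2 suites differ only in configuration, so one test body runs against both datasource paths. Below is a minimal, framework-free Java sketch of that layout. All class and method names are hypothetical, and selecting V1 versus V2 through the spark.sql.sources.useV1SourceList setting is an assumption about how such suites are commonly parameterized in Spark, not something stated in this PR.

```java
import java.util.Arrays;

// Hypothetical sketch: one abstract suite holds the test body, and two
// concrete suites run it against different Parquet datasource paths.
// Names are illustrative, not Spark's real test classes.
abstract class ParquetFormatSuiteSketch {

    // Value each concrete suite would set for spark.sql.sources.useV1SourceList
    // (assumed: "parquet" selects the V1 path, "" leaves Parquet on V2).
    abstract String useV1SourceList();

    // Parquet uses V1 only if "parquet" appears in the comma-separated list.
    String datasourceVersion() {
        boolean v1 = Arrays.asList(useV1SourceList().split(",")).contains("parquet");
        return v1 ? "V1" : "V2";
    }

    // Shared test body: written once, inherited and run by every subclass.
    String writeAndReadBackTimeValues() {
        return "Write and read back TIME values (" + datasourceVersion() + ")";
    }
}

final class ParquetFormatV1SuiteSketch extends ParquetFormatSuiteSketch {
    @Override
    String useV1SourceList() { return "parquet"; }
}

final class ParquetFormatV2SuiteSketch extends ParquetFormatSuiteSketch {
    @Override
    String useV1SourceList() { return ""; }
}
```

Any test added to the abstract class runs once per concrete subclass, which is the duplication that moving the test out of ParquetIOSuite avoids.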

MaxGekk changed the title [WIP][SQL] Support TIME in off-heap column vectors [WIP][SPARK-51634][SQL] Support TIME in off-heap column vectors Mar 27, 2025
MaxGekk changed the title [WIP][SPARK-51634][SQL] Support TIME in off-heap column vectors [SPARK-51634][SQL] Support TIME in off-heap column vectors Mar 27, 2025
MaxGekk marked this pull request as ready for review March 27, 2025 09:03
MaxGekk (Member, Author) commented Mar 27, 2025

Merging to master. Thank you, @sarutak and @yaooqinn, for the review.

MaxGekk closed this in 9fe78e3 Mar 27, 2025
beliefer (Contributor) left a comment


Late LGTM.


4 participants