Contributor

@vinodkc vinodkc commented Nov 23, 2025

What changes were proposed in this pull request?

This PR adds CSV serialization and deserialization support for Spark's TIME type.

Why are the changes needed?

TIME type currently lacks CSV support, preventing users from:

  • Reading/writing CSV files with TIME columns
  • Using from_csv() and to_csv() functions with TIME type
  • Integrating TIME data with external CSV-based systems

Does this PR introduce any user-facing change?

Yes.
Users can now:

  • Read CSV with TIME: spark.read.schema("time TIME(6)").csv("data.csv")
  • Write CSV with TIME: df.write.csv("output.csv") → 14:30:45.123456
  • Use from_csv/to_csv:
    SELECT from_csv('14:30:45.123456', 'time TIME(6)');
    SELECT to_csv(named_struct('time', TIME'14:30:45'));
  • Custom format: spark.read.option("timeFormat", "HH-mm-ss.SSSSSS").csv("data.csv")
  • New option: timeFormat - controls TIME formatting/parsing (default: HH:mm:ss with fractional seconds)
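
To make the two patterns above concrete, here is a minimal sketch of the same round trip using only the Python standard library. It is a stand-in for illustration, not Spark's parser or the code in this PR: the `PATTERNS` mapping and the `parse_time`, `format_time`, `write_csv`, and `read_csv` helpers are hypothetical names, and only the two patterns quoted in this description are covered.

```python
import csv
import io
from datetime import datetime, time

# Hypothetical mapping from the two Spark-style patterns quoted above to
# Python strptime/strftime directives. This is NOT Spark's implementation.
PATTERNS = {
    "HH:mm:ss.SSSSSS": "%H:%M:%S.%f",
    "HH-mm-ss.SSSSSS": "%H-%M-%S.%f",
}


def parse_time(text: str, time_format: str = "HH:mm:ss.SSSSSS") -> time:
    """Parse one CSV field into a time-of-day value using the given pattern."""
    return datetime.strptime(text, PATTERNS[time_format]).time()


def format_time(value: time, time_format: str = "HH:mm:ss.SSSSSS") -> str:
    """Render a time-of-day value as a CSV field using the given pattern."""
    return value.strftime(PATTERNS[time_format])


def write_csv(values, time_format: str) -> str:
    """Write one time value per row and return the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for value in values:
        writer.writerow([format_time(value, time_format)])
    return buf.getvalue()


def read_csv(text: str, time_format: str):
    """Read the first column of each row back into time values."""
    return [parse_time(row[0], time_format) for row in csv.reader(io.StringIO(text))]


if __name__ == "__main__":
    original = [time(14, 30, 45, 123456)]
    text = write_csv(original, "HH-mm-ss.SSSSSS")
    print(text, end="")
    print(read_csv(text, "HH-mm-ss.SSSSSS") == original)
```

The point of the sketch is that the same pattern has to be supplied on both the write and the read side for values to round-trip, which is the role the timeFormat option plays in the CSV reader and writer described here.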

How was this patch tested?

Added new test cases in CsvExpressionsSuite, CsvFunctionsSuite, SQL tests (csv-functions.sql), and CsvSuite.

Was this patch authored or co-authored using generative AI tooling?

Yes.
Generated-by: Claude 3.5 Sonnet

AI assistance was used for:

  • Code pattern analysis and design discussions
  • Implementation guidance following Spark conventions
  • Test case generation and organization
  • Documentation and examples

@github-actions github-actions bot added the SQL label Nov 23, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @vinodkc.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to master for Apache Spark 4.2.0.

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…for TIME type

Closes apache#53175 from vinodkc/br_time_csv_read_write.

Authored-by: vinodkc <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>