@pan3793
Member

@pan3793 pan3793 commented Jan 15, 2026

What changes were proposed in this pull request?

This PR enhances JavaUtils.byteStringAs to parse input strings with the binary suffixes Ki, KiB, Mi, MiB, and so on, so that users can set byte type configurations to values such as 2GiB.

Why are the changes needed?

Strictly speaking, 1KB = 1000B and 1KiB = 1024B, but Spark currently accepts only 1K or 1KB and interprets both as 1KiB.

I'm not intending to "correct" this behavior, but I think Spark should at least accept 1Ki or 1KiB as input. Users who are familiar with K8s often complain about this, because suffixes such as Mi and GiB are widely used in the K8s ecosystem.
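To illustrate the intended semantics, here is a minimal sketch of a parser that treats K, KB, Ki, and KiB as the same power-of-two unit. It is not the code from this PR: the class ByteSuffixSketch and the method parseToBytes are hypothetical names, and the sketch assumes that a value without a suffix is a plain byte count, whereas JavaUtils.byteStringAs converts to a caller-supplied unit.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT Spark's JavaUtils implementation.
final class ByteSuffixSketch {
    // Digits, then an optional unit prefix (k, m, g, t, p),
    // then an optional "i", "b", or "ib" tail.
    private static final Pattern SIZE =
        Pattern.compile("([0-9]+)([kmgtp]?)(i?b?)");

    // Parses strings such as "1k", "1kb", "1Ki", or "2GiB" into a byte count.
    // Every prefix is binary (powers of 1024), matching the behavior the PR
    // description attributes to Spark for K and KB.
    static long parseToBytes(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        Matcher m = SIZE.matcher(s);
        if (!m.matches()) {
            throw new NumberFormatException("Invalid byte string: " + text);
        }
        String prefix = m.group(2);
        String tail = m.group(3);
        // "1i" and "1ib" are meaningless: the "i" needs a unit prefix.
        if (prefix.isEmpty() && tail.startsWith("i")) {
            throw new NumberFormatException("Invalid byte string: " + text);
        }
        long value = Long.parseLong(m.group(1));
        // k -> 2^10, m -> 2^20, g -> 2^30, t -> 2^40, p -> 2^50
        int exponent = prefix.isEmpty() ? 0 : "kmgtp".indexOf(prefix) + 1;
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(value, 1L << (10 * exponent));
    }
}
```

Under these assumptions, parseToBytes("1KB"), parseToBytes("1Ki"), and parseToBytes("1KiB") all return 1024, which is why accepting the new suffixes does not change the meaning of any existing configuration value.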

Does this PR introduce any user-facing change?

Yes. Users can now use 1Ki, 2MiB, etc. as values of byte type configurations.

How was this patch tested?

Unit tests are added.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

JIRA Issue Information

=== Improvement SPARK-55051 ===
Summary: Byte string accepts KiB, MiB, GiB, TiB, PiB
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the CORE label Jan 15, 2026
@pan3793
Member Author

pan3793 commented Jan 16, 2026

cc @dongjoon-hyun @LuciferYang

@LuciferYang
Contributor

Merged into master. Thanks @pan3793 and @peter-toth
