[SPARK-43847][PYTHON] Throw structured error when reading Protobuf descriptor file fails #56174

Open
brijrajk wants to merge 1 commit into apache:master from brijrajk:SPARK-43847-protobuf-structured-error

@brijrajk
Contributor

What changes were proposed in this pull request?

_read_descriptor_set_file() in pyspark.sql.protobuf.functions now raises
PySparkRuntimeError with error class PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND
instead of propagating a raw Python OSError / FileNotFoundError when the
descriptor file cannot be read.

The error class and its message already existed on the Scala side
(common/utils/src/main/resources/error/error-conditions.json); this PR adds
the matching entry to python/pyspark/errors/error-conditions.json and wires
it up in the Python code.
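
As a rough illustration of the wrapping described above, here is a minimal, self-contained sketch. It is not the code in this PR: `StructuredRuntimeError` is a hypothetical stand-in for `PySparkRuntimeError`, `ERROR_CONDITIONS` stands in for the JSON registry, and `read_descriptor_set_file` is an assumed shape for the private helper. Only the error class name, the `filePath` parameter, and the message text come from this description.

```python
from typing import Dict

# Assumption: a one-entry stand-in for python/pyspark/errors/error-conditions.json.
# The template text is the message quoted in this PR description.
ERROR_CONDITIONS: Dict[str, str] = {
    "PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND": (
        "Error reading Protobuf descriptor file at path: <filePath>."
    ),
}


class StructuredRuntimeError(RuntimeError):
    """Hypothetical stand-in for pyspark.errors.PySparkRuntimeError."""

    def __init__(self, error_class: str, message_parameters: Dict[str, str]) -> None:
        self.error_class = error_class
        self.message_parameters = message_parameters
        # Fill each <name> placeholder in the registered template.
        message = ERROR_CONDITIONS[error_class]
        for name, value in message_parameters.items():
            message = message.replace(f"<{name}>", value)
        super().__init__(f"[{error_class}] {message}")


def read_descriptor_set_file(file_path: str) -> bytes:
    """Return the raw descriptor bytes, or raise a structured error."""
    try:
        with open(file_path, "rb") as f:
            return f.read()
    except OSError as e:
        # FileNotFoundError and PermissionError are both OSError subclasses.
        # "from e" keeps the original OS error reachable via __cause__.
        raise StructuredRuntimeError(
            "PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND",
            {"filePath": file_path},
        ) from e
```

Chaining with `from e` is a design choice in this sketch: the original errno and OS message stay visible in the traceback for debugging, while callers see only the structured error type.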

Why are the changes needed?

Before this fix, users who pass a wrong or missing descriptor file path get a
bare Python OS error with no Spark context:

FileNotFoundError: [Errno 2] No such file or directory: '/path/to/file.desc'

After this fix they get a structured Spark error consistent with the rest of
the PySpark error framework:

PySparkRuntimeError: [PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND]
Error reading Protobuf descriptor file at path: /path/to/file.desc.

Does this PR introduce any user-facing change?

Yes. Users who catch OSError / FileNotFoundError around from_protobuf /
to_protobuf calls that supply a descFilePath will now need to catch
PySparkRuntimeError instead (or the common base PySparkException). The
new message carries the Spark error class and still includes the file path.
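
For callers that must run against PySpark versions both before and after this change, one option is to handle both exception families during the transition. The sketch below is an assumption about how user code might do that, not part of this PR: `run_or_none` is a hypothetical helper, and the fallback class exists only so the example runs where PySpark is not installed.

```python
from typing import Callable, Optional, TypeVar

try:
    from pyspark.errors import PySparkException
except ImportError:
    # Assumption: fallback so the sketch also runs without PySpark installed.
    class PySparkException(Exception):  # type: ignore[no-redef]
        """Stand-in for pyspark.errors.PySparkException."""


T = TypeVar("T")


def run_or_none(action: Callable[[], T]) -> Optional[T]:
    """Run `action`; return None if it fails in either the old or new way."""
    try:
        return action()
    except OSError:
        # Behavior before this PR: a raw FileNotFoundError / OSError.
        return None
    except PySparkException:
        # Behavior after this PR: PySparkRuntimeError, which derives from
        # PySparkException. Note this also swallows unrelated Spark errors.
        return None
```

In real code, `action` would typically be a lambda wrapping a `from_protobuf` or `to_protobuf` call that passes `descFilePath`. Because catching `PySparkException` is broad, narrowing to `PySparkRuntimeError` is preferable once only post-change versions need to be supported.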

How was this patch tested?

Added python/pyspark/sql/tests/test_protobuf.py with
test_read_descriptor_set_file_not_found, which asserts that a missing path
raises PySparkRuntimeError with the correct error class and filePath
message parameter.

Run locally with:

PYTHONPATH=python python3 -m unittest pyspark.sql.tests.test_protobuf -v

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

…scriptor file fails

Raise `PySparkRuntimeError` with error class `PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND`
in `_read_descriptor_set_file()` instead of propagating a raw Python `OSError`,
matching the structured error already defined on the Scala side.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@brijrajk force-pushed the SPARK-43847-protobuf-structured-error branch from f64f232 to be7a373 on May 28, 2026 05:05

@brijrajk
Contributor Author

Could a committer please review this? When the descriptor file path is invalid, it raises a structured PySparkRuntimeError with error class PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND instead of a bare FileNotFoundError, which is consistent with the rest of the PySpark error framework (SPARK-43847).

cc @HyukjinKwon @LuciferYang
