[SPARK-49798][DOCS] Fix inaccurate documentation of RuntimeConfig.get#56154
[SPARK-49798][DOCS] Fix inaccurate documentation of RuntimeConfig.get#56154brijrajk wants to merge 1 commit into
Conversation
5bce46c to
255eed3
Compare
|
Could a committer please review this documentation fix? It clarifies the Scaladoc and PySpark docstring for |
|
cc @cloud-fan |
cloud-fan
left a comment
There was a problem hiding this comment.
0 blocking, 1 non-blocking, 0 nits.
Docs-only PR clarifying RuntimeConfig.get/getOption Scaladoc + the PySpark docstring. I verified each reworded behavioral claim against SQLConf.getConfString and confirmed the doctest values (partitionOverwriteMode default STATIC, DYNAMIC a valid override) — all accurate. One non-blocking nit on a self-contradicting example header.
Correctness (1)
- python/pyspark/sql/conf.py:95: example header says "raises an exception" but the example supplies a
defaultand returns it — see inline
|
|
||
| Examples | ||
| -------- | ||
| A key with no built-in default raises an exception when not explicitly set: |
There was a problem hiding this comment.
This header contradicts the example beneath it: spark.conf.get("non-existent-key", "my_default") supplies a default and returns 'my_default', so nothing is raised — the example shows the default-provided case, not the exception case. Since the whole point of this PR is to remove misleading doc prose, worth rewording so it doesn't introduce a fresh one:
| A key with no built-in default raises an exception when not explicitly set: | |
| A key with no built-in default returns the provided ``default`` when not explicitly set: |
There was a problem hiding this comment.
Good catch, fixed! The example was showing the default fallback path, not the exception path — the header now correctly reads "A key with no built-in default returns the provided default when not explicitly set".
The existing Scaladoc and PySpark docstring for RuntimeConfig.get, get(key, default), and getOption do not accurately describe the interaction between the user-supplied default and the key's built-in (ConfigEntry) default value. Changes: - get(key): clarify that if the key is not explicitly set, its built-in default is returned (not just "default if possible") - get(key, default): clarify that the user-supplied default is returned instead of the key's built-in default when the key is not explicitly set - getOption(key): clarify that Some(builtInDefault) is returned when the key is not explicitly set but has a built-in default - PySpark RuntimeConfig.get: remove the misleading "assuming it is set" phrase; document both calling forms and add a proper Raises section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
255eed3 to
c395299
Compare
|
thanks, merging to master/4.x/4.2 (doc only) |
### What changes were proposed in this pull request? Clarify the Scaladoc and PySpark docstring for three public `RuntimeConfig` methods whose descriptions did not accurately explain the interaction between an explicitly-set value and the key's built-in (`ConfigEntry`) default value. **`get(key: String)` / `get(key)`** - Before: *"If the key is not set yet, return its default value if possible"* - After: *"If the key is not explicitly set, return its built-in default value if one exists"* **`get(key: String, default: String)` / `get(key, default)`** - Before: *"If the key is not set yet, return the user given `default`"* — silent about overriding the ConfigEntry default - After: explicitly states the user-supplied `default` is returned *instead of* the key's built-in default value **`getOption(key: String)`** - Before: *"return its default value if possible, otherwise `None`"* — vague - After: *"return `Some` of its built-in default value if one exists, otherwise `None`"* **PySpark `RuntimeConfig.get()`** - Removed the misleading phrase *"assuming it is set"* - Rewrote the description to explain both calling forms separately - Added a `Raises` section documenting `SparkNoSuchElementException` - Fixed the `Returns` section to follow NumPy docstring format - Added four illustrative doctest examples using `spark.sql.sources.partitionOverwriteMode` that demonstrate the built-in-default behaviour --- ### Why are the changes needed? The documentation was misleading in a subtle but important way. Users reading `get(key)` would think it throws whenever the key is *unset*, but in fact it returns the ConfigEntry's built-in default silently. Similarly, users reading `get(key, default)` had no way to know the supplied default overrides the built-in default — not just fills a gap when no default exists at all. A concrete example (from SPARK-49798): ```python spark.conf.unset("spark.sql.session.timeZone") # Old doc implied this would throw — it does NOT; returns built-in default "Etc/UTC" spark.conf.get("spark.sql.session.timeZone") # => "Etc/UTC" # Old doc did not mention this ignores the built-in default "Etc/UTC" spark.conf.get("spark.sql.session.timeZone", "Europe/Berlin") # => "Europe/Berlin" ``` --- ### Does this PR introduce _any_ user-facing change? No. --- ### How was this patch tested? No new tests added. The described behaviour is already fully covered by existing tests: - **Scala:** `RuntimeConfigSuite` — *"set and get a config with defaultValue"* tests all three cases (`get(key)` returning built-in default, `getOption(key)` returning `Some(builtInDefault)`, and `get(key, userDefault)` overriding the built-in default) - **Python:** `pyspark/sql/tests/test_conf.py` — `test_conf` (lines 38–45) asserts the same behaviour using `spark.sql.sources.partitionOverwriteMode` Four new doctest examples were added to `pyspark/sql/conf.py`. They are executable via `python python/pyspark/sql/conf.py` and assert the same cases the unit tests cover. --- ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Sonnet 4.6 Closes #56154 from brijrajk/SPARK-49798-fix-runtimeconfig-docs. Authored-by: BRIJ RAJ KISHORE <22271048+brijrajk@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit d1ef991) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? Clarify the Scaladoc and PySpark docstring for three public `RuntimeConfig` methods whose descriptions did not accurately explain the interaction between an explicitly-set value and the key's built-in (`ConfigEntry`) default value. **`get(key: String)` / `get(key)`** - Before: *"If the key is not set yet, return its default value if possible"* - After: *"If the key is not explicitly set, return its built-in default value if one exists"* **`get(key: String, default: String)` / `get(key, default)`** - Before: *"If the key is not set yet, return the user given `default`"* — silent about overriding the ConfigEntry default - After: explicitly states the user-supplied `default` is returned *instead of* the key's built-in default value **`getOption(key: String)`** - Before: *"return its default value if possible, otherwise `None`"* — vague - After: *"return `Some` of its built-in default value if one exists, otherwise `None`"* **PySpark `RuntimeConfig.get()`** - Removed the misleading phrase *"assuming it is set"* - Rewrote the description to explain both calling forms separately - Added a `Raises` section documenting `SparkNoSuchElementException` - Fixed the `Returns` section to follow NumPy docstring format - Added four illustrative doctest examples using `spark.sql.sources.partitionOverwriteMode` that demonstrate the built-in-default behaviour --- ### Why are the changes needed? The documentation was misleading in a subtle but important way. Users reading `get(key)` would think it throws whenever the key is *unset*, but in fact it returns the ConfigEntry's built-in default silently. Similarly, users reading `get(key, default)` had no way to know the supplied default overrides the built-in default — not just fills a gap when no default exists at all. A concrete example (from SPARK-49798): ```python spark.conf.unset("spark.sql.session.timeZone") # Old doc implied this would throw — it does NOT; returns built-in default "Etc/UTC" spark.conf.get("spark.sql.session.timeZone") # => "Etc/UTC" # Old doc did not mention this ignores the built-in default "Etc/UTC" spark.conf.get("spark.sql.session.timeZone", "Europe/Berlin") # => "Europe/Berlin" ``` --- ### Does this PR introduce _any_ user-facing change? No. --- ### How was this patch tested? No new tests added. The described behaviour is already fully covered by existing tests: - **Scala:** `RuntimeConfigSuite` — *"set and get a config with defaultValue"* tests all three cases (`get(key)` returning built-in default, `getOption(key)` returning `Some(builtInDefault)`, and `get(key, userDefault)` overriding the built-in default) - **Python:** `pyspark/sql/tests/test_conf.py` — `test_conf` (lines 38–45) asserts the same behaviour using `spark.sql.sources.partitionOverwriteMode` Four new doctest examples were added to `pyspark/sql/conf.py`. They are executable via `python python/pyspark/sql/conf.py` and assert the same cases the unit tests cover. --- ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Sonnet 4.6 Closes #56154 from brijrajk/SPARK-49798-fix-runtimeconfig-docs. Authored-by: BRIJ RAJ KISHORE <22271048+brijrajk@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit d1ef991) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Clarify the Scaladoc and PySpark docstring for three public
RuntimeConfigmethods whose descriptions did not accurately explain the interaction between an explicitly-set value and the key's built-in (ConfigEntry) default value.get(key: String)/get(key)get(key: String, default: String)/get(key, default)default" — silent about overriding the ConfigEntry defaultdefaultis returned instead of the key's built-in default valuegetOption(key: String)None" — vagueSomeof its built-in default value if one exists, otherwiseNone"PySpark
RuntimeConfig.get()Raisessection documentingSparkNoSuchElementExceptionReturnssection to follow NumPy docstring formatspark.sql.sources.partitionOverwriteModethat demonstrate the built-in-default behaviourWhy are the changes needed?
The documentation was misleading in a subtle but important way. Users reading
get(key)would think it throws whenever the key is unset, but in fact it returns the ConfigEntry's built-in default silently. Similarly, users readingget(key, default)had no way to know the supplied default overrides the built-in default — not just fills a gap when no default exists at all.A concrete example (from SPARK-49798):
Does this PR introduce any user-facing change?
No.
How was this patch tested?
No new tests added. The described behaviour is already fully covered by existing tests:
RuntimeConfigSuite— "set and get a config with defaultValue" tests all three cases (get(key)returning built-in default,getOption(key)returningSome(builtInDefault), andget(key, userDefault)overriding the built-in default)pyspark/sql/tests/test_conf.py—test_conf(lines 38–45) asserts the same behaviour usingspark.sql.sources.partitionOverwriteModeFour new doctest examples were added to
pyspark/sql/conf.py. They are executable viapython python/pyspark/sql/conf.pyand assert the same cases the unit tests cover.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6