Skip to content

[SPARK-49798][DOCS] Fix inaccurate documentation of RuntimeConfig.get#56154

Closed
brijrajk wants to merge 1 commit into
apache:masterfrom
brijrajk:SPARK-49798-fix-runtimeconfig-docs
Closed

[SPARK-49798][DOCS] Fix inaccurate documentation of RuntimeConfig.get#56154
brijrajk wants to merge 1 commit into
apache:masterfrom
brijrajk:SPARK-49798-fix-runtimeconfig-docs

Conversation

@brijrajk
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Clarify the Scaladoc and PySpark docstring for three public RuntimeConfig methods whose descriptions did not accurately explain the interaction between an explicitly-set value and the key's built-in (ConfigEntry) default value.

get(key: String) / get(key)

  • Before: "If the key is not set yet, return its default value if possible"
  • After: "If the key is not explicitly set, return its built-in default value if one exists"

get(key: String, default: String) / get(key, default)

  • Before: "If the key is not set yet, return the user given default" — silent about overriding the ConfigEntry default
  • After: explicitly states the user-supplied default is returned instead of the key's built-in default value

getOption(key: String)

  • Before: "return its default value if possible, otherwise None" — vague
  • After: "return Some of its built-in default value if one exists, otherwise None"

PySpark RuntimeConfig.get()

  • Removed the misleading phrase "assuming it is set"
  • Rewrote the description to explain both calling forms separately
  • Added a Raises section documenting SparkNoSuchElementException
  • Fixed the Returns section to follow NumPy docstring format
  • Added four illustrative doctest examples using spark.sql.sources.partitionOverwriteMode that demonstrate the built-in-default behaviour

Why are the changes needed?

The documentation was misleading in a subtle but important way. Users reading get(key) would think it throws whenever the key is unset, but in fact it returns the ConfigEntry's built-in default silently. Similarly, users reading get(key, default) had no way to know the supplied default overrides the built-in default — not just fills a gap when no default exists at all.

A concrete example (from SPARK-49798):

spark.conf.unset("spark.sql.session.timeZone")

# Old doc implied this would throw — it does NOT; returns built-in default "Etc/UTC"
spark.conf.get("spark.sql.session.timeZone")
# => "Etc/UTC"

# Old doc did not mention this ignores the built-in default "Etc/UTC"
spark.conf.get("spark.sql.session.timeZone", "Europe/Berlin")
# => "Europe/Berlin"

Does this PR introduce any user-facing change?

No.


How was this patch tested?

No new tests added. The described behaviour is already fully covered by existing tests:

  • Scala: RuntimeConfigSuite"set and get a config with defaultValue" tests all three cases (get(key) returning built-in default, getOption(key) returning Some(builtInDefault), and get(key, userDefault) overriding the built-in default)
  • Python: pyspark/sql/tests/test_conf.pytest_conf (lines 38–45) asserts the same behaviour using spark.sql.sources.partitionOverwriteMode

Four new doctest examples were added to pyspark/sql/conf.py. They are executable via python python/pyspark/sql/conf.py and assert the same cases the unit tests cover.


Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

@brijrajk brijrajk force-pushed the SPARK-49798-fix-runtimeconfig-docs branch 3 times, most recently from 5bce46c to 255eed3 Compare May 29, 2026 03:45
@brijrajk
Copy link
Copy Markdown
Contributor Author

Could a committer please review this documentation fix? It clarifies the Scaladoc and PySpark docstring for RuntimeConfig.get / getOption to accurately describe the interaction between explicitly-set values and built-in ConfigEntry defaults — a subtle distinction that was previously misleading (SPARK-49798).

cc @HyukjinKwon @zhengruifeng

@zhengruifeng
Copy link
Copy Markdown
Contributor

cc @cloud-fan

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 1 non-blocking, 0 nits.
Docs-only PR clarifying RuntimeConfig.get/getOption Scaladoc + the PySpark docstring. I verified each reworded behavioral claim against SQLConf.getConfString and confirmed the doctest values (partitionOverwriteMode default STATIC, DYNAMIC a valid override) — all accurate. One non-blocking nit on a self-contradicting example header.

Correctness (1)

  • python/pyspark/sql/conf.py:95: example header says "raises an exception" but the example supplies a default and returns it — see inline

Comment thread python/pyspark/sql/conf.py Outdated

Examples
--------
A key with no built-in default raises an exception when not explicitly set:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header contradicts the example beneath it: spark.conf.get("non-existent-key", "my_default") supplies a default and returns 'my_default', so nothing is raised — the example shows the default-provided case, not the exception case. Since the whole point of this PR is to remove misleading doc prose, worth rewording so it doesn't introduce a fresh one:

Suggested change
A key with no built-in default raises an exception when not explicitly set:
A key with no built-in default returns the provided ``default`` when not explicitly set:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed! The example was showing the default fallback path, not the exception path — the header now correctly reads "A key with no built-in default returns the provided default when not explicitly set".

The existing Scaladoc and PySpark docstring for RuntimeConfig.get,
get(key, default), and getOption do not accurately describe the
interaction between the user-supplied default and the key's built-in
(ConfigEntry) default value.

Changes:
- get(key): clarify that if the key is not explicitly set, its
  built-in default is returned (not just "default if possible")
- get(key, default): clarify that the user-supplied default is returned
  instead of the key's built-in default when the key is not explicitly set
- getOption(key): clarify that Some(builtInDefault) is returned when the
  key is not explicitly set but has a built-in default
- PySpark RuntimeConfig.get: remove the misleading "assuming it is set"
  phrase; document both calling forms and add a proper Raises section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brijrajk brijrajk force-pushed the SPARK-49798-fix-runtimeconfig-docs branch from 255eed3 to c395299 Compare June 4, 2026 05:11
@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master/4.x/4.2 (doc only)

@cloud-fan cloud-fan closed this in d1ef991 Jun 4, 2026
cloud-fan pushed a commit that referenced this pull request Jun 4, 2026
### What changes were proposed in this pull request?

Clarify the Scaladoc and PySpark docstring for three public `RuntimeConfig` methods whose descriptions did not accurately explain the interaction between an explicitly-set value and the key's built-in (`ConfigEntry`) default value.

**`get(key: String)` / `get(key)`**
- Before: *"If the key is not set yet, return its default value if possible"*
- After: *"If the key is not explicitly set, return its built-in default value if one exists"*

**`get(key: String, default: String)` / `get(key, default)`**
- Before: *"If the key is not set yet, return the user given `default`"* — silent about overriding the ConfigEntry default
- After: explicitly states the user-supplied `default` is returned *instead of* the key's built-in default value

**`getOption(key: String)`**
- Before: *"return its default value if possible, otherwise `None`"* — vague
- After: *"return `Some` of its built-in default value if one exists, otherwise `None`"*

**PySpark `RuntimeConfig.get()`**
- Removed the misleading phrase *"assuming it is set"*
- Rewrote the description to explain both calling forms separately
- Added a `Raises` section documenting `SparkNoSuchElementException`
- Fixed the `Returns` section to follow NumPy docstring format
- Added four illustrative doctest examples using `spark.sql.sources.partitionOverwriteMode` that demonstrate the built-in-default behaviour

---

### Why are the changes needed?

The documentation was misleading in a subtle but important way. Users reading `get(key)` would think it throws whenever the key is *unset*, but in fact it returns the ConfigEntry's built-in default silently. Similarly, users reading `get(key, default)` had no way to know the supplied default overrides the built-in default — not just fills a gap when no default exists at all.

A concrete example (from SPARK-49798):

```python
spark.conf.unset("spark.sql.session.timeZone")

# Old doc implied this would throw — it does NOT; returns built-in default "Etc/UTC"
spark.conf.get("spark.sql.session.timeZone")
# => "Etc/UTC"

# Old doc did not mention this ignores the built-in default "Etc/UTC"
spark.conf.get("spark.sql.session.timeZone", "Europe/Berlin")
# => "Europe/Berlin"
```

---

### Does this PR introduce _any_ user-facing change?

No.

---

### How was this patch tested?

No new tests added. The described behaviour is already fully covered by existing tests:

- **Scala:** `RuntimeConfigSuite` — *"set and get a config with defaultValue"* tests all three cases (`get(key)` returning built-in default, `getOption(key)` returning `Some(builtInDefault)`, and `get(key, userDefault)` overriding the built-in default)
- **Python:** `pyspark/sql/tests/test_conf.py` — `test_conf` (lines 38–45) asserts the same behaviour using `spark.sql.sources.partitionOverwriteMode`

Four new doctest examples were added to `pyspark/sql/conf.py`. They are executable via `python python/pyspark/sql/conf.py` and assert the same cases the unit tests cover.

---

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

Closes #56154 from brijrajk/SPARK-49798-fix-runtimeconfig-docs.

Authored-by: BRIJ RAJ KISHORE <22271048+brijrajk@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d1ef991)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jun 4, 2026
### What changes were proposed in this pull request?

Clarify the Scaladoc and PySpark docstring for three public `RuntimeConfig` methods whose descriptions did not accurately explain the interaction between an explicitly-set value and the key's built-in (`ConfigEntry`) default value.

**`get(key: String)` / `get(key)`**
- Before: *"If the key is not set yet, return its default value if possible"*
- After: *"If the key is not explicitly set, return its built-in default value if one exists"*

**`get(key: String, default: String)` / `get(key, default)`**
- Before: *"If the key is not set yet, return the user given `default`"* — silent about overriding the ConfigEntry default
- After: explicitly states the user-supplied `default` is returned *instead of* the key's built-in default value

**`getOption(key: String)`**
- Before: *"return its default value if possible, otherwise `None`"* — vague
- After: *"return `Some` of its built-in default value if one exists, otherwise `None`"*

**PySpark `RuntimeConfig.get()`**
- Removed the misleading phrase *"assuming it is set"*
- Rewrote the description to explain both calling forms separately
- Added a `Raises` section documenting `SparkNoSuchElementException`
- Fixed the `Returns` section to follow NumPy docstring format
- Added four illustrative doctest examples using `spark.sql.sources.partitionOverwriteMode` that demonstrate the built-in-default behaviour

---

### Why are the changes needed?

The documentation was misleading in a subtle but important way. Users reading `get(key)` would think it throws whenever the key is *unset*, but in fact it returns the ConfigEntry's built-in default silently. Similarly, users reading `get(key, default)` had no way to know the supplied default overrides the built-in default — not just fills a gap when no default exists at all.

A concrete example (from SPARK-49798):

```python
spark.conf.unset("spark.sql.session.timeZone")

# Old doc implied this would throw — it does NOT; returns built-in default "Etc/UTC"
spark.conf.get("spark.sql.session.timeZone")
# => "Etc/UTC"

# Old doc did not mention this ignores the built-in default "Etc/UTC"
spark.conf.get("spark.sql.session.timeZone", "Europe/Berlin")
# => "Europe/Berlin"
```

---

### Does this PR introduce _any_ user-facing change?

No.

---

### How was this patch tested?

No new tests added. The described behaviour is already fully covered by existing tests:

- **Scala:** `RuntimeConfigSuite` — *"set and get a config with defaultValue"* tests all three cases (`get(key)` returning built-in default, `getOption(key)` returning `Some(builtInDefault)`, and `get(key, userDefault)` overriding the built-in default)
- **Python:** `pyspark/sql/tests/test_conf.py` — `test_conf` (lines 38–45) asserts the same behaviour using `spark.sql.sources.partitionOverwriteMode`

Four new doctest examples were added to `pyspark/sql/conf.py`. They are executable via `python python/pyspark/sql/conf.py` and assert the same cases the unit tests cover.

---

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

Closes #56154 from brijrajk/SPARK-49798-fix-runtimeconfig-docs.

Authored-by: BRIJ RAJ KISHORE <22271048+brijrajk@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d1ef991)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants