[SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation #44756
Conversation
@beliefer - Assuming you prefer the approach here to the one in #44755, is there anything else I can do to move this proposal forward? Are you interested in shepherding this work? There will be many follow-up PRs to incrementally migrate all of our config documentation to use this framework. The result will be that our config documentation is more accurate, more usable, and easier to maintain.
### What changes were proposed in this pull request?

- Assign all config tables in the documentation to the new CSS class `spark-config`.
- Migrate the config table in `docs/sql-ref-ansi-compliance.md` from Markdown to HTML and assign it to this new CSS class as well.
- Limit the width of the config tables to the width of the main content, and force words to break and wrap if necessary.
- Remove a styling workaround for the documentation for `spark.worker.resourcesFile` that is no longer needed thanks to these changes.
- Remove some `.global` CSS rules that, due to their [specificity][specificity], interfere with our ability to assign simple rules that apply directly to elements.

[specificity]: https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity

### Why are the changes needed?

Many configs and config defaults have very long names that normally cannot wrap. This causes tables to overflow the viewport. An egregious example of this is `spark.scheduler.listenerbus.eventqueue.executorManagement.capacity`, which has a default of `spark.scheduler.listenerbus.eventqueue.capacity`. This change will force these long strings to break and wrap, which will keep the table widths limited to the width of the overall content.

Because we are hard-coding the column widths, some tables will look slightly worse with this new layout due to extra whitespace. I couldn't figure out a practical way to prevent that while also solving the main problem of table overflow. In #44755 or #44756 (whichever approach gets accepted), these config tables will be generated automatically. This will give us the opportunity to improve the styling further by setting the column width dynamically based on the content. (This should be possible in CSS, but table styling in CSS is limited and we cannot use properties like `max-width`.) We will also be able to insert [word break opportunities][wbo] so that config names wrap in a more visually pleasing manner.

[wbo]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr

### Does this PR introduce _any_ user-facing change?

Yes, it changes the presentation of tables, especially config tables, in the main documentation.

### How was this patch tested?

I built the docs and compared them visually across `master` (left) and this branch (right).

`sql-ref-ansi-compliance.html`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/1ad54764-e942-491a-8060-a23cdc2b1d3f" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/5e623d71-1f8b-41ca-b0bc-7166e9a2de7e" />

`configuration.html#scheduling`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/7a14c2d6-b9e6-4114-8902-96c80bebfc87" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/27dec813-f5a0-49de-a74d-f98b5c0f1606" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/76cfb54d-dad7-46e8-a29d-1f0d91962ce1" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/21457627-a7a4-4ede-8b8b-fcffabe1e24c" />

`configuration.html#barrier-execution-mode`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/e9b4118b-85eb-4ee5-9195-7e69e94e1008" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/2f3a1f0e-3d67-4352-b884-94b41c0dd6ea" />

`spark-standalone.html`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/c1923b0b-b57c-45c3-aff9-b8db1c0b39f6" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/6abb45e1-10e0-4a31-b1a3-91ef3a0478d1" />

`structured-streaming-kafka-integration.html#configuration`:

<img width="200px" src="https://github.com/apache/spark/assets/1039369/8505cda1-46fc-4b61-be22-16362bbf00fc" /> <img width="200px" src="https://github.com/apache/spark/assets/1039369/a5e351af-161a-442b-95a9-2501ec7934c9" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44955 from nchammas/table-styling.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
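The table-wrapping behavior described above can be sketched in CSS along these lines. The `spark-config` class name comes from the PR, but the specific rules here are illustrative, not the exact stylesheet changes:

```css
/* Illustrative sketch only. Constrain config tables to the content width
   and force long config names like
   spark.scheduler.listenerbus.eventqueue.executorManagement.capacity
   to break and wrap instead of overflowing the viewport. */
table.spark-config {
  width: 100%;
  table-layout: fixed; /* hard-coded column widths, as noted above */
}

table.spark-config th,
table.spark-config td {
  overflow-wrap: break-word; /* break within long unbreakable strings */
}
```

With `table-layout: fixed`, column widths are set from the first row rather than the content, which is why some tables gain extra whitespace; `overflow-wrap: break-word` then handles strings too long for their column.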
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Closing this in favor of #44755.
What changes were proposed in this pull request?
Enable Spark configs to be assigned to documentation groups via a dedicated YAML file. These groups will be used by a new Python script to automatically build config tables for display in our documentation.
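The dedicated YAML file might look something like the sketch below. The group names and file layout are hypothetical (the PR defines the actual format); the config names are real Spark configs, used here only for illustration:

```yaml
# Hypothetical sketch of the group-assignment file.
# Each key is a documentation group; each entry is a Spark config name.
sql-tuning-caching-data:
  - spark.sql.inMemoryColumnarStorage.compressed
  - spark.sql.inMemoryColumnarStorage.batchSize
sql-performance-tuning:
  - spark.sql.autoBroadcastJoinThreshold
  - spark.sql.files.openCostInBytes
```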
Instead of having to maintain large blocks of HTML tables throughout our documentation, config tables can simply be included by referencing a config group by name. For example, `sql-tuning-caching-data` would be the name of a config group defined in the YAML file, from which the HTML table would be automatically generated for inclusion in our documentation. This approach covers both SQL and non-SQL config docs and, if accepted, will replace `sql/gen-sql-config-docs.py`.

This proposal is an alternative to #44755 (and #44300) that does not require modifying `ConfigEntry` or `ConfigBuilder` to add a new field. Instead, the groups are defined completely outside of Spark's core.

Why are the changes needed?
Using this approach we can accomplish several goals at once:

- … (`spark.sql.files.openCostInBytes`)
- … (`spark.sql.autoBroadcastJoinThreshold`)

Does this PR introduce any user-facing change?
No, not for users.
For contributors, the documentation build will change so that config tables are generated only when Spark is built. For fast documentation builds -- like what we currently have via `SKIP_API=1` -- Spark won't be built, and placeholders will be inserted in place of every config table. This will be addressed in a future PR.

How was this patch tested?
I manually ran the new script to generate config tables and confirmed the following:
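A minimal sketch of what such a generation script could do is shown below. The real script in the PR reads the groups from the YAML file and pulls defaults and descriptions from Spark's config metadata; here the sample data is inlined, and all names besides the `spark-config` CSS class and the real Spark config key are illustrative:

```python
import html

# Hypothetical inlined sample: group name -> list of
# (config name, default, meaning). The actual script would load the
# group assignments from the YAML file and the config metadata from Spark.
GROUPS = {
    "sql-tuning-caching-data": [
        (
            "spark.sql.inMemoryColumnarStorage.compressed",
            "true",
            "Compress each column based on statistics of the data.",
        ),
    ],
}


def generate_table(group: str) -> str:
    """Render one config group as an HTML table with the spark-config class."""
    rows = []
    for name, default, meaning in GROUPS[group]:
        rows.append(
            "<tr><td><code>{}</code></td><td>{}</td><td>{}</td></tr>".format(
                html.escape(name), html.escape(default), html.escape(meaning)
            )
        )
    return (
        '<table class="spark-config">\n'
        "<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>"
    )


if __name__ == "__main__":
    print(generate_table("sql-tuning-caching-data"))
```

Generating the tables at build time like this keeps the rendered HTML out of version control while letting every page include a group by name.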
Was this patch authored or co-authored using generative AI tooling?
No.