Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46935][DOCS] Consolidate error documentation #44971

Closed
wants to merge 22 commits into from

Conversation

nchammas
Copy link
Contributor

@nchammas nchammas commented Feb 1, 2024

What changes were proposed in this pull request?

Consolidate all error documentation into a single page of error states and conditions. Each condition and sub-condition will have a link anchor that allows for direct references.

Here are some examples:

sql-error-conditions.html#cannot-update-field
sql-error-conditions.html#cannot-update-field-array-type
sql-error-conditions.html#cannot-update-field-interval-type
sql-error-conditions.html#cannot-update-field-map-type

The table is styled to make it easier to read:

  • Sub-conditions are indented relative to their parent condition.
  • Long condition names like UDTF_INVALID_ALIAS_IN_REQUESTED_ORDERING_STRING_FROM_ANALYZE_METHOD wrap in a visually pleasing manner, without interfering with how search engines will index those names or preventing users from copy-pasting the whole name as they would normally.

The new documentation is generated dynamically via docs/util/build-error-docs.py as part of the documentation build. No generated files will be tracked in git.

TODO:

  • Figure out what, if anything, we will do about the links we are breaking by deleting the old error pages.
  • Make sure we are using the correct terminology once SPARK-46810 is resolved.
  • Add opening prose for the main error conditions page.
  • Update the main Jekyll build to generate the new documentation on the fly.

Why are the changes needed?

The current error documentation has several problems that make it difficult to maintain and difficult to use:

  1. The current documentation consists of many Markdown files that are both programmatically generated and checked in to git. This combination leads to problems like the checked in files getting out of sync with the source they are generated from, and leads to unnecessary code churn and problems like the one that precipitated [SPARK-46807][DOCS] Add automation notice to generated SQL error class docs #44847.

  2. The individual pages like docs/sql-error-conditions-cannot-load-state-store-error-class.md are sparse and don't add much value as standalone pages. They also clutter the top-level namespace. There are around 60 of these individual pages, one for each error condition that has sub-conditions.

    The number of pages to manage leads to other, more subtle problems as well. Though there are 60 individual pages, only 22 of them are captured in docs/_data/menu-sql.yaml.

  3. The current process to generate this documentation is awkward. We are using a) a Spark Scala test that requires b) a special environment variable to be set so that c) we read some JSON, in order to d) generate Markdown files, which in turn e) get compiled into HTML.

    Fundamentally, we are just generating HTML from JSON. It seems inappropriate to couple this process to the Spark build.

Does this PR introduce any user-facing change?

Yes, it overhauls the error documentation. Unless we add redirects, several links that work against the Spark 3.5 documentation will no longer work against the Spark 4.0 documentation.

One developer-facing change included in this PR is that the main documentation build with Jekyll now depends on our Python development environment. This is in order to generate the error documentation, since that depends on our ability to render Markdown. Previously, only the SQL and Python API docs depended on the Python environment.

How was this patch tested?

I built the docs and reviewed the results in my browser.




Was this patch authored or co-authored using generative AI tooling?

No.

@nchammas
Copy link
Contributor Author

nchammas commented Feb 1, 2024

@MaxGekk @itholic @srielau - I consider this PR blocked by SPARK-46810, but I want to share it with you nonetheless to get your feedback.

@nchammas nchammas changed the title [SPARK-46935] Consolidate error documentation [SPARK-46935][DOCS] Consolidate error documentation Feb 1, 2024
@srielau
Copy link
Contributor

srielau commented Feb 1, 2024

@MaxGekk @itholic @srielau - I consider this PR blocked by SPARK-46810, but I want to share it with you nonetheless to get your feedback.
The problem with a table is that both the error conditions and the descriptions can be lengthy.
I do think we want to keep the option of "custom crafted" detail messages (although that does not conflict with what you propose).
Either way I agree that anything that makes it more automateable to maintain the page is a good thing.

@nchammas
Copy link
Contributor Author

nchammas commented Feb 1, 2024

The problem with a table is that both the error conditions and the descriptions can be lengthy.

The generated table renders very nicely for me. I spent some time on styling the table appropriately specifically to solve this problem.

Take a look at the screenshots in the PR description. They show how long error names and descriptions soft wrap nicely.

What are some examples of error conditions you think won't display well in the proposed documentation? I am happy to share additional screenshots to show you how they render.

@nchammas
Copy link
Contributor Author

nchammas commented Feb 5, 2024

@MaxGekk @itholic - What do you think of consolidating the error documentation as demonstrated here?

@srielau - Do you still think we need standalone pages for each error condition? Curious to know if the screenshots I posted changed your mind. If you're still concerned about lengthy error names or messages, please share some examples so I can check how they are rendered.

@srielau
Copy link
Contributor

srielau commented Feb 5, 2024

@MaxGekk @itholic - What do you think of consolidating the error documentation as demonstrated here?

@srielau - Do you still think we need standalone pages for each error condition? Curious to know if the screenshots I posted changed your mind. If you're still concerned about lengthy error names or messages, please share some examples so I can check how they are rendered.

I'm fien with that proposal. I'd like to keep the option open for hand crafted long-form error docs. Any thoughts?

@nchammas
Copy link
Contributor Author

nchammas commented Feb 6, 2024

I'd like to keep the option open for hand crafted long-form error docs. Any thoughts?

I'm fine with that. If we find in the future that some error condition merits a dedicated page, then what I suggest we do is the following:

  • Create a hand-crafted error page with the extended prose we want.

  • Add a link from the error condition's message in the JSON document to that standalone page.

    We should express that link using Markdown. build-error-docs.py compiles any Markdown in the message into clean HTML.

The result will be a nice HTML link from the description of the error condition in the master table to whatever error page we need.

@nchammas
Copy link
Contributor Author

The discussion on SPARK-46810 is not moving forward, unfortunately.

I'd like to get this in as it is a non-trivial maintenance win. If SPARK-46810 gets going again, we can always rename things on the consolidated error page as necessary, and it will be much easier to do so.

@srielau @MaxGekk - If I resolve the merge conflicts, can we move this PR forward?

@srielau
Copy link
Contributor

srielau commented Mar 12, 2024

The discussion on SPARK-46810 is not moving forward, unfortunately.

I'd like to get this in as it is a non-trivial maintenance win. If SPARK-46810 gets going again, we can always rename things on the consolidated error page as necessary, and it will be much easier to do so.

@srielau @MaxGekk - If I resolve the merge conflicts, can we move this PR forward?

+1

@nchammas
Copy link
Contributor Author

The discussion on SPARK-46810 is not moving forward, unfortunately.

This discussion has since been resolved in favor of Option 1, described on the ticket.

I will update this PR to resolve conflicts. I still need, however, a committer to give the go-ahead for this restructuring of our error documentation.

@cloud-fan
Copy link
Contributor

let's fix conflicts and move forward, thanks for the work!

return table_html


if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when do we run this script? Now the generated doc pages are not committed in the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is the main goal of this PR: to remove auto-generated content from git, plus all the associated friction it causes.

One of my TODOs for this PR is to invoke this script from the Jekyll build. In other words, no-one will need to remember to run anything separately. They just run bundle exec jekyll build as they normally do.

@nchammas nchammas marked this pull request as ready for review April 17, 2024 15:23
@nchammas
Copy link
Contributor Author

This is ready for review. The only two things I'll call out specifically for additional attention are this open TODO:

Figure out what, if anything, we will do about the links we are breaking by deleting the old error pages.

And this minor change to our workflow for building docs:

One developer-facing change included in this PR is that the main documentation build with Jekyll now depends on our Python development environment. This is in order to generate the error documentation, since that depends on our ability to render Markdown. Previously, only the SQL and Python API docs depended on the Python environment.

@cloud-fan
Copy link
Contributor

And this minor change to our workflow for building docs:

I think this is fine. During release process, we need to generate all docs so Python development environment is already needed.

cc @HyukjinKwon as well.

@github-actions github-actions bot added the INFRA label Apr 18, 2024
@github-actions github-actions bot removed the INFRA label Apr 18, 2024
@github-actions github-actions bot added the INFRA label Apr 18, 2024
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 76fb196 Apr 19, 2024
@nchammas nchammas deleted the SPARK-46935-revamp-error-docs branch April 19, 2024 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants