Automate build and publish of the user guide #5500

Closed
2 tasks
Tracked by #3058
alamb opened this issue Mar 7, 2023 · 7 comments · Fixed by #5670
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

Comments

@alamb
Contributor

alamb commented Mar 7, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The main DataFusion documentation site at https://arrow.apache.org/datafusion is good because:

  1. It looks good and follows best practices for open source project documentation
  2. It is clearly associated with the overall arrow.apache.org site

The documentation source is at https://github.com/apache/arrow-datafusion/tree/main/docs.

https://arrow.apache.org/datafusion is updated (typically by @andygrove) when new releases of arrow-datafusion are published to crates.io (for example, apache/arrow-site#313)

However, the current setup has a few notable issues:

  1. It lags behind the repo because it is only updated at each release (every 2 weeks at the time of writing)
  2. The content of the landing page in the repo (README): https://github.com/apache/arrow-datafusion has diverged from the user guide (I believe because developers want the latest content, and the only way to see it without building the docs locally is README.md)
  3. The manual update process is somewhat cumbersome

I think if the user guide were updated more promptly, people would be more likely to contribute to it as well.

Describe the solution you'd like
I would like some mechanism to see the latest, rendered version of the user guide as a webpage.

  • On every commit to main, the site would be updated with the latest version of the user guide
  • The main README.md page in arrow-datafusion would redirect to the hosted site

Bonus points for

Describe alternatives you've considered
Perhaps we could make a GitHub Pages site (https://pages.github.com/)?

Additional context

@alamb
Contributor Author

alamb commented Mar 8, 2023

Leaving a note to myself (and maybe others) from @martin-g to check out how the apache-datafusion-python module works

Maybe we can make it like the apache-datafusion-python module

@martin-g
Member

Just to make sure I understand correctly: The DataFusion team prefers commits to main to update the main site immediately, right ?

@alamb
Contributor Author

alamb commented Mar 17, 2023

Just to make sure I understand correctly: The DataFusion team prefers commits to main to update the main site immediately, right ?

This is my preference. Any thoughts @andygrove @Dandandan or @thinkharderdev ?

Eventually, perhaps when we have longer-term stable versions of DataFusion, hosting snapshots of the docs for older releases might be useful. But I think the most important first step is to get the most up-to-date docs published.

martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 17, 2023
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Member

The docs build fails for me with:

./build.sh
Running Sphinx v6.1.3
making output directory... done
[autosummary] generating autosummary for: contributor-guide/communication.md, contributor-guide/index.md, contributor-guide/quarterly_roadmap.md, contributor-guide/roadmap.md, contributor-guide/specification/index.rst, contributor-guide/specification/invariants.md, contributor-guide/specification/output-field-name-semantic.md, index.rst, user-guide/cli.md, user-guide/configs.md, ..., user-guide/sql/aggregate_functions.md, user-guide/sql/data_types.md, user-guide/sql/ddl.md, user-guide/sql/explain.md, user-guide/sql/index.rst, user-guide/sql/information_schema.md, user-guide/sql/scalar_functions.md, user-guide/sql/select.md, user-guide/sql/sql_status.md, user-guide/sql/subqueries.md
myst v1.0.0: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions={'tasklist'}, disable_syntax=[], all_links_external=False, url_schemes=('http', 'https', 'mailto', 'ftp'), ref_domains=None, fence_as_directive=set(), number_code_blocks=[], title_to_header=False, heading_anchors=3, heading_slug_func=None, html_meta={}, footnote_transition=True, words_per_minute=200, substitutions={}, linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', enable_checkboxes=False, suppress_warnings=[], highlight_code_blocks=True)
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 26 source files that are out of date
updating environment: [new config] 26 added, 0 changed, 0 removed
reading sources... [100%] user-guide/sql/subqueries
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [  3%] contributor-guide/communication
Theme error:
An error happened in rendering the page contributor-guide/communication.
Reason: UndefinedError("'logo' is undefined")
make: *** [Makefile:38: html] Error 2

https://github.com/martin-g/arrow-datafusion/actions/runs/4448158011/jobs/7810591628?pr=1

Do I need to do something more than https://github.com/martin-g/arrow-datafusion/pull/1/files#diff-d54d69dbb27e75dae25cb4b2384310cb57707e419377cf572d5cb0ecc1f16877R31-R43 ?

@martin-g
Member

I've temporarily removed the usage of logo to be able to build: martin-g@a3a3107

martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 17, 2023
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@thinkharderdev
Contributor

Just to make sure I understand correctly: The DataFusion team prefers commits to main to update the main site immediately, right ?

This is my preference. Any thoughts @andygrove @Dandandan or @thinkharderdev ?

Eventually, perhaps when we have longer-term stable versions of DataFusion, hosting snapshots of the docs for older releases might be useful. But I think the most important first step is to get the most up-to-date docs published.

Agreed

martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 21, 2023
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 22, 2023
…support

Suggested-by @kou at apache#5670 (comment)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
alamb pushed a commit that referenced this issue Mar 22, 2023
* Fixes #5500 - Add a Github Actions workflow that builds the docs

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Change target branch to "main"

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>

* Use rsync to copy the new content

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>

* Issue #5500 - Add a new line at the bottom of .asf.yaml

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Issue #5500 - Add .nojekyll to explicitly disable Github Pages support

Suggested-by @kou at #5670 (comment)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

---------

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
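The commit messages above mention copying the newly built content with rsync and adding a `.nojekyll` file. A minimal Python sketch of that publish step follows, using only the standard library in place of rsync. The function name `publish_docs` and both directory arguments are hypothetical. Removing stale files first is an assumption, loosely modelled on `rsync --delete`; the thread does not say which rsync options the real workflow uses.

```python
import shutil
from pathlib import Path


def publish_docs(build_dir: Path, site_dir: Path) -> None:
    """Mirror freshly built HTML into the publish directory.

    Both paths are placeholders; the real workflow's layout is not shown
    in this thread.
    """
    site_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: clear out stale content so deleted pages do not linger.
    for entry in site_dir.iterdir():
        if entry.name == ".git":
            continue  # keep version-control metadata in place
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Copy the new build over the (now empty) publish directory.
    shutil.copytree(build_dir, site_dir, dirs_exist_ok=True)

    # Marker file that tells GitHub Pages to skip Jekyll processing.
    (site_dir / ".nojekyll").touch()
```

Calling `publish_docs(Path("docs/build/html"), Path("site"))` would be one way to use it, with both paths again being illustrative only.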
@alamb
Contributor Author

alamb commented Mar 22, 2023

I am happy to report I have tested this and confirmed that the process is working as expected!

Here is an example PR #5684

And the content has appeared at https://arrow.apache.org/datafusion/user-guide/introduction.html 🎉


kou pushed a commit to apache/arrow-site that referenced this issue Mar 22, 2023
…thon` (#337)

Per @kou 's suggestion
#336 (comment)

We are now serving datafusion content from the datafusion-repo -- see
apache/datafusion#5500