GH-36200: [CI][Docs] Avoid "No space left on device" #36230

Merged: kou merged 7 commits into apache:main from ci-docs-space on Jun 29, 2023

kou
Member

@kou kou commented Jun 22, 2023

Rationale for this change

Our build requires a lot of disk space.

What changes are included in this PR?

Remove unused files.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@kou
Member Author

kou commented Jun 22, 2023

@github-actions crossbow submit preview-docs -g linux

@github-actions

⚠️ GitHub issue #36200 has been automatically assigned in GitHub to PR creator.

github-actions bot added the "awaiting committer review" label on Jun 22, 2023
@github-actions

This comment was marked as outdated.

@kou
Member Author

kou commented Jun 22, 2023

@github-actions crossbow submit preview-docs

@github-actions

This comment was marked as outdated.

@kou
Member Author

kou commented Jun 22, 2023

@github-actions crossbow submit preview-docs

@github-actions

This comment was marked as outdated.

@@ -72,6 +82,7 @@ jobs:
# 376MB
sudo rm -rf /opt/hostedtoolcache/node || :
df -h
df -h
Member

Why twice?

Member Author

Oh, it's garbage.

runs-on: ubuntu-latest
{{ macros.github_set_env(env) }}
steps:
{{ macros.github_checkout_arrow(fetch_depth=fetch_depth|default(1))|indent }}
{{ macros.github_install_archery()|indent }}

- name: Free up disk space
Member

Is it possible to add some kind of macro for this step instead of repeating it in two different files?

Member Author

Yes. But the debug option approach is better.
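
The repeated "Free up disk space" step discussed in this thread could, in principle, be factored into one shell helper. The following is a minimal sketch, not the macro used in the PR: `remove_unused` and the `SUDO` variable are hypothetical names, and only the `/opt/hostedtoolcache/node` path and the `df -h` calls come from the diff above.

```shell
# Hypothetical helper (not part of the PR): remove paths that the build
# does not need. Failures are ignored, like the `|| :` in the diff above.
# Set SUDO=sudo on runners where the paths are owned by root.
remove_unused() {
  for path in "$@"; do
    echo "Removing ${path}"
    ${SUDO:-} rm -rf "${path}" || :
  done
}

# Possible usage on a CI runner (left commented out on purpose):
#   df -h
#   SUDO=sudo remove_unused /opt/hostedtoolcache/node
#   df -h
```

Keeping the `df -h` calls outside the helper makes it easy to compare free space before and after the cleanup.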

@pitrou
Member

pitrou commented Jun 22, 2023

@kou The C++ build directory takes more than 8GB in this build, which is insane (partly due to building bundled gRPC and google-cloud-cpp with static libraries).

This can be trimmed down significantly by reducing the size of debug information (which isn't very useful on CI anyway). If I do:

export ARROW_C_FLAGS_DEBUG=-g1
export ARROW_CXX_FLAGS_DEBUG=-g1

then the size of the build directory goes down from 8GB to 5GB...

We should probably do so on all gcc-based builds.
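
One way to check the effect of the reduced debug information is to measure the build directory before and after changing the flags. A minimal sketch, assuming a POSIX `du`; the helper name `dir_size_kb` and the `cpp/build` path in the usage comment are illustrative and not taken from the PR.

```shell
# Debug flags from the comment above: keep minimal debug info (-g1).
export ARROW_C_FLAGS_DEBUG=-g1
export ARROW_CXX_FLAGS_DEBUG=-g1

# Hypothetical helper: print the size of a directory in 1024-byte units.
dir_size_kb() {
  du -sk "$1" | cut -f1
}

# Possible usage after a build (the path is an assumption):
#   dir_size_kb cpp/build
```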

Member Author

@kou kou left a comment

Wow! I didn't notice that the large size was caused by the debug option. I'll use that approach.

github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Jun 22, 2023
@pitrou
Member

pitrou commented Jun 22, 2023

Wow! I didn't notice that the large size was caused by the debug option. I'll use that approach.

Can we find a way to do that on all debug CI builds (except if MSVC is used, probably)?

@pitrou
Member

pitrou commented Jun 22, 2023

Also, it might make compilation caching more efficient (since the cached files may be smaller)...

github-actions bot added the "Component: C++" and "awaiting change review" labels and removed the "awaiting changes" label on Jun 23, 2023
@kou
Member Author

kou commented Jun 23, 2023

Can we find a way to do that on all debug CI builds (except if MSVC is used, probably)?

We can detect whether we are running on GitHub Actions by checking for GITHUB_ACTIONS=true. So we can use -g1 by default on GitHub Actions.

But... we can't use -g1 for GDB plugin tests...
https://github.com/apache/arrow/actions/runs/5353229013/jobs/9708905231?pr=36230#step:6:6529

_______________________________ test_arrays_heap _______________________________

gdb_arrow = <pyarrow.tests.test_gdb.GdbSession object at 0x7f460d9a1910>

    def test_arrays_heap(gdb_arrow):
        # Null
>       check_heap_repr(
            gdb_arrow, "heap_null_array",
            "arrow::NullArray of length 2, offset 0, null count 2")

opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/tests/test_gdb.py:770: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

gdb = <pyarrow.tests.test_gdb.GdbSession object at 0x7f460d9a1910>
expr = 'heap_null_array'
expected = 'arrow::NullArray of length 2, offset 0, null count 2'

    def check_heap_repr(gdb, expr, expected):
        """
        Check printing a heap-located value, given its address.
        """
        s = gdb.print_value(f"*{expr}")
        # GDB may prefix the value with an address or type specification
        if s != expected:
>           assert s.endswith(f" {expected}")
E           AssertionError: assert False
E            +  where False = <built-in method endswith of str object at 0x55e0685ce330>(' arrow::NullArray of length 2, offset 0, null count 2')
E            +    where <built-in method endswith of str object at 0x55e0685ce330> = '(std::__shared_ptr_access<arrow::Array, (__gnu_cxx::_Lock_policy)2, false, false>::element_type &) @0x55d33eb43a70: {...ields>}, _M_ptr = 0x55d33eb54bc0, _M_refcount = {_M_pi = 0x55d33eb54bb0}}, <No data fields>}, null_bitmap_data_ = 0x0}'.endswith

opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/tests/test_gdb.py:245: AssertionError
----------------------------- Captured stdout call -----------------------------
p *heap_null_array
$36 = (std::__shared_ptr_access<arrow::Array, (__gnu_cxx::_Lock_policy)2, false, false>::element_type &) @0x55d33eb43a70: {_vptr.Array = 0x7fa4501a6ff8 <vtable for arrow::NullArray+16>, data_ = {<std::__shared_ptr<arrow::ArrayData, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<arrow::ArrayData, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x55d33eb54bc0, _M_refcount = {_M_pi = 0x55d33eb54bb0}}, <No data fields>}, null_bitmap_data_ = 0x0}
(gdb) 
----------------------------- Captured stderr call -----------------------------
Python Exception <class 'gdb.error'>: There is no member named id_.

Hmm. We may need to use -g1 only for bundled dependencies...

@kou
Member Author

kou commented Jun 23, 2023

Or we just don't use -g1 for Python jobs. Our GDB plugin's tests are only run in Python jobs.
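
The two ideas above (enable `-g1` only on GitHub Actions, and skip it for jobs that run the GDB plugin tests) could be combined roughly as follows. This is a sketch under assumptions, not the change made in the PR: `GITHUB_ACTIONS` is set to `true` by GitHub Actions, but the `should_reduce_debug_info` function and the `JOB_KIND` variable are hypothetical.

```shell
# Hypothetical helper: succeed only when reduced debug info is acceptable.
#   $1: value of GITHUB_ACTIONS ("true" on GitHub Actions)
#   $2: kind of job; "python" jobs run the GDB plugin tests and need full
#       debug info (JOB_KIND is an assumed variable, not an Arrow one)
should_reduce_debug_info() {
  [ "$1" = "true" ] && [ "$2" != "python" ]
}

if should_reduce_debug_info "${GITHUB_ACTIONS:-false}" "${JOB_KIND:-cpp}"; then
  export ARROW_C_FLAGS_DEBUG=-g1
  export ARROW_CXX_FLAGS_DEBUG=-g1
fi
```

When the condition does not hold, the flags are left untouched, so the build keeps whatever debug settings it would otherwise use.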

@kou
Member Author

kou commented Jun 25, 2023

@github-actions crossbow submit -g linux preview-docs

@github-actions

Revision: 3c78e39

Submitted crossbow builds: ursacomputing/crossbow @ actions-1328ed330f

Task Status
almalinux-8-amd64 Github Actions
almalinux-8-arm64 Github Actions
almalinux-9-amd64 Github Actions
almalinux-9-arm64 Github Actions
amazon-linux-2-amd64 Github Actions
amazon-linux-2-arm64 Github Actions
amazon-linux-2023-amd64 Github Actions
amazon-linux-2023-arm64 Github Actions
centos-7-amd64 Github Actions
centos-8-stream-amd64 Github Actions
centos-8-stream-arm64 Github Actions
centos-9-stream-amd64 Github Actions
centos-9-stream-arm64 Github Actions
debian-bookworm-amd64 Github Actions
debian-bookworm-arm64 Github Actions
debian-bullseye-amd64 Github Actions
debian-bullseye-arm64 Github Actions
preview-docs Github Actions
ubuntu-focal-amd64 Github Actions
ubuntu-focal-arm64 Github Actions
ubuntu-jammy-amd64 Github Actions
ubuntu-jammy-arm64 Github Actions
ubuntu-lunar-amd64 Github Actions
ubuntu-lunar-arm64 Github Actions

@kou
Member Author

kou commented Jun 26, 2023

The preview-docs job failure isn't "No space left on device".
It's an R documentation generation failure.
@thisisnic Could you check this failure?

https://github.com/ursacomputing/crossbow/actions/runs/5368074498/jobs/9738676612#step:6:10165

-- Building function reference -------------------------------------------------
Error: 
! in callr subprocess.
Caused by error in `map2(.x, vec_index(.x), .f, ...)`:
! In index: 4.
---
Standard output:
== Building pkgdown site =======================================================
Reading from: '/arrow/r'
Writing to:   '/arrow/r/docs'
-- Initialising site -----------------------------------------------------------
Copying '../../usr/local/lib/R/site-library/pkgdown/BS5/assets/link.svg' to 'link.svg'
Copying '../../usr/local/lib/R/site-library/pkgdown/BS5/assets/pkgdown.js' to 'pkgdown.js'
Copying 'pkgdown/extra.js' to 'extra.js'
Copying 'pkgdown/assets/versions.html' to 'versions.html'
Copying 'pkgdown/assets/versions.json' to 'versions.json'
Copying 'pkgdown/favicon/apple-touch-icon-120x120.png' to 'apple-touch-icon-120x120.png'
Copying 'pkgdown/favicon/apple-touch-icon-152x152.png' to 'apple-touch-icon-152x152.png'
Copying 'pkgdown/favicon/apple-touch-icon-180x180.png' to 'apple-touch-icon-180x180.png'
Copying 'pkgdown/favicon/apple-touch-icon-60x60.png' to 'apple-touch-icon-60x60.png'
Copying 'pkgdown/favicon/apple-touch-icon-76x76.png' to 'apple-touch-icon-76x76.png'
Copying 'pkgdown/favicon/apple-touch-icon.png' to 'apple-touch-icon.png'
Copying 'pkgdown/favicon/favicon-16x16.png' to 'favicon-16x16.png'
Copying 'pkgdown/favicon/favicon-32x32.png' to 'favicon-32x32.png'
Copying 'pkgdown/favicon/favicon.ico' to 'favicon.ico'
-- Building home ---------------------------------------------------------------
Writing 'authors.html'
Reading 'PACKAGING.md'
Writing 'PACKAGING.html'
Reading 'STYLE.md'
Writing 'STYLE.html'
Writing '404.html'
-- Building function reference -------------------------------------------------
---
Backtrace:
1. pkgdown::build_site(install = FALSE)
2. pkgdown:::build_site_external(pkg = pkg, examples = examples, run_dont_run = run_d...
3. callr::r(function(..., cli_colors, pkgdown_internet) { ...
4. callr:::get_result(output = out, options)
5. callr:::throw(callr_remote_error(remerr, output), parent = fix_msg(remerr[[3]]))
---
Subprocess backtrace:
 1. pkgdown::build_site(...)
 2. pkgdown:::build_site_local(pkg = pkg, examples = examples, run_dont_run = run_dont...
 3. pkgdown::build_reference(pkg, lazy = lazy, examples = examples, run_dont_run = ru...
 4. pkgdown::build_reference_index(pkg)
 5. pkgdown::render_page(pkg, "reference-index", data = data_reference_index(pkg), ...
 6. pkgdown:::render_page_html(pkg, name = name, data = data, depth = depth)
 7. utils::modifyList(data_template(pkg, depth = depth), data)
 8. base::stopifnot(is.list(x), is.list(val))
 9. pkgdown:::data_reference_index(pkg)
10. meta %>% purrr::imap(data_reference_index_rows, pkg = pkg) %>% ...
11. base::unlist(., recursive = FALSE)
12. purrr::compact(.)
13. purrr::discard(.x, function(x) is_empty(.f(x)))
14. purrr:::where_if(.x, .p, ...)
15. purrr:::map_(.x, .p, ..., .type = "logical", .purrr_error_call = .purrr_error_call)
16. purrr:::vctrs_vec_compat(.x, .purrr_user_env)
17. purrr::imap(., data_reference_index_rows, pkg = pkg)
18. purrr::map2(.x, vec_index(.x), .f, ...)
19. purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
20. purrr:::with_indexed_errors(i = i, names = names, error_call = .purrr_error_call...
21. base::withCallingHandlers(expr, error = function(cnd) { ...
22. purrr:::call_with_cleanup(map2_impl, environment(), .type, .progress, ...
23. local .f(.x[[i]], .y[[i]], ...)
24. pkgdown:::section_topics(section$contents, pkg$topics, pkg$src_path)
25. topics[select_topics(match_strings, topics), , ]
26. `[.tbl_df`(topics, select_topics(match_strings, topics), , )
27. pkgdown:::select_topics(match_strings, topics)
28. purrr::map(match_strings, match_eval, env = match_env(topics))
29. purrr:::map_("list", .x, .f, ..., .progress = .progress)
30. purrr:::with_indexed_errors(i = i, names = names, error_call = .purrr_error_call...
31. base::withCallingHandlers(expr, error = function(cnd) { ...
32. purrr:::call_with_cleanup(map_impl, environment(), .type, .progress, ...
33. local .f(.x[[i]], ...)
34. base::tryCatch(eval(expr, env), error = function(e) { ...
35. base::tryCatchList(expr, classes, parentenv, handlers)
36. base::tryCatchOne(expr, names, parentenv, handlers[[1L]])
37. value[[3L]](cond)
38. pkgdown:::topic_must("be a known selector function", string, parent = e)
39. rlang::abort(c(paste0("In '_pkgdown.yml', topic must ", message), x = paste0("N...
40. | rlang:::signal_abort(cnd, .file)
41. | base::signalCondition(cnd)
42. (function (cnd) ...
43. cli::cli_abort(message, location = i, name = name, parent = cnd, ...
44. | rlang::abort(message, ..., call = call, use_cli_format = TRUE, ...
45. | rlang:::signal_abort(cnd, .file)
46. | base::signalCondition(cnd)
47. (function (cnd) ...
48. cli::cli_abort(message, location = i, name = name, parent = cnd, ...
49. | rlang::abort(message, ..., call = call, use_cli_format = TRUE, ...
50. | rlang:::signal_abort(cnd, .file)
51. | base::signalCondition(cnd)
52. global (function (e) ...
Execution halted
1

@thisisnic
Member

@kou I'm having issues trying to add a fix commit to your branch; here's a PR: kou#13

@kou
Member Author

kou commented Jun 26, 2023

Thanks!
(You can push to this branch directly. :-)

@kou
Member Author

kou commented Jun 26, 2023

@github-actions crossbow submit preview-docs

@github-actions

Revision: 60bd466

Submitted crossbow builds: ursacomputing/crossbow @ actions-8c79fd8f2e

Task Status
preview-docs Github Actions

@kou
Member Author

kou commented Jun 26, 2023

@thisisnic Sorry. Could you also check this?

https://github.com/apache/arrow/actions/runs/5374927859/jobs/9750784797?pr=36230#step:4:9

Error! Scalar-class
schema-class missing from ./r/_pkgdown.yml

(You can push a fix to this branch directly.)

@thisisnic
Member

thisisnic commented Jun 26, 2023

@kou The failing step is due to a technicality in how we check for missing sections in the docs. Since we implemented that check in 2021, the pkgdown package has started doing this check itself, and its method is better than the one I implemented for our CI. I've opened #36300 to remove our check, so once that has passed CI and been merged, you'll need to rebase on it. [Edit: merged now]

@kou
Member Author

kou commented Jun 26, 2023

Thanks! Rebased.

@kou
Member Author

kou commented Jun 28, 2023

The "R / AMD64 Ubuntu 20.04 R 4.2 Force-Tests true" failure is caused by #36346. So I want to merge this.

If nobody objects, I'll merge this tomorrow.

@kou kou merged commit 63b8091 into apache:main Jun 29, 2023
44 of 49 checks passed
@kou kou deleted the ci-docs-space branch June 29, 2023 03:53
@kou kou removed the "awaiting change review" label Jun 29, 2023
@kou
Member Author

kou commented Jun 29, 2023

Merged.

@conbench-apache-arrow

Conbench analyzed the 5 benchmark runs on commit 63b8091d.

There were 7 benchmark results indicating a performance regression:

The full Conbench report has more details.

Successfully merging this pull request may close these issues.

[CI][Docs] Complete Documentation builds fail with No space left on device
3 participants