Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: address some integration test bloat of disk usage for development #1552

Merged
merged 4 commits into from
Jul 22, 2023

Conversation

rtyler
Copy link
Member

@rtyler rtyler commented Jul 21, 2023

Description

This change attempts to further reduce our build/link time by changing RUSTFLAGS for CI and consolidating some integration tests into singular files, while moving more unit-style tests (those which just rely on tests/data/) into unit tests.

The previous use of "debuginfo=1" would still result in significant file sizes. When building integration tests locally a setting of "1" leads to a total target/ directory size of 3.1GB, whereas "line-tables-only" results in a total clean build target/ directory size of 2.1GB.

The "line-tables-only" feature provides:

Generates the minimal amount of debug info for backtraces with filename/line
number info, but not anything else, i.e. no variable or function parameter info.

Setting this to "0" does reduce the size further, but for panics and stacktraces in CI, we need some level of information to be present in order for the backtraces to be useful.

Related Issue(s)

Closes #1550

Documentation

The line-tables-only value for debuginfo was introduced in Rust 1.70

@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Jul 21, 2023
@github-actions
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@rtyler rtyler changed the title Issue target disk bloat 1550 chore: address some integration test bloat of disk usage for development Jul 21, 2023
Unfortunately `cargo +nightly udeps --tests --no-default-features --features
integration_test,datafusion,s3-native-tls` did not produce more unused
dependencies which means there is fewer cruft to chop out of these dependency
trees at the moment.

Related to delta-io#1550

Sponsored-by: Databricks In
Since each of these unit tests are quite large, removing just two of them would
result in a 1GB+ reduction in disk space utilization. This pull request removes
a few of the sets of tests which rely on tests/data/ but do not require any
non-default features.

Sponsored-by: Databricks Inc
The previous use of "1" would still result in significant file sizes. When
building integration tests locally a setting of "1" leads to a total target/
directory size of 3.1GB, whereas "line-tables-only" results in a total clean
build target/ directory size of 2.1GB.

The "line-tables-only" feature provides:

> Generates the minimal amount of debug info for backtraces with filename/line
> number info, but not anything else, i.e. no variable or function parameter info.

And was introduced in 1.70:
    <https://blog.rust-lang.org/2023/06/01/Rust-1.70.0.html#named-levels-of-debug-information>

Setting this to "0" does reduce the size further, but for panics and stacktraces
in CI, we need some level of information to be present in order for the
backtraces to be useful.

Sponsored-by: Databricks Inc
@rtyler rtyler force-pushed the issue-target-disk-bloat-1550 branch from 01df558 to 20da846 Compare July 21, 2023 06:34
@rtyler rtyler marked this pull request as ready for review July 21, 2023 06:57
@rtyler rtyler enabled auto-merge (rebase) July 21, 2023 07:07
@omkar-foss
Copy link
Contributor

omkar-foss commented Jul 21, 2023

Thanks for this PR @rtyler. As it has optimized not just on disk usage, but also significantly reduced RAM consumption and total test runtime. Peace for local development especially for newcomers to this project (like myself).

My test run comparisons as below for reference:
Test Run Instance: Ubuntu 22.04, 8GB RAM, 4 vCPUs, SSD

  • Current main (HEAD 1bb6c3d) with clean target, old flags:
    ~8GB RAM consumption (lots of swapping), ~15 mins runtime, ~19GB size of target after completion.
  • This PR (HEAD 20da846) with clean target, updated flags in this PR:
    ~6.5 RAM consumption (reduced swapping), ~8 mins runtime, ~6GB size of target after completion.

Below is the new test command that I've constructed from this PR's CI yaml, for reference:

RUSTFLAGS="-C debuginfo=line-tables-only" CARGO_INCREMENTAL=0 RUST_BACKTRACE=1 CARGO_NET_GIT_FETCH_WITH_CLI="true" cargo test -p deltalake --no-default-features --features integration_test,s3-native-tls,datafusion

@rtyler rtyler disabled auto-merge July 21, 2023 14:12
… project

Based on [this
comment](delta-io#1552 (comment))
I believe this will be a valuable setting for making local development cycles
faster and easier for contributors to run.
@rtyler rtyler enabled auto-merge (rebase) July 21, 2023 15:11
Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this. LGTM!

@rtyler rtyler merged commit 60409b8 into delta-io:main Jul 22, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Excessive integration test sizes causing builds to fail
3 participants