Attempt to run parallel tests (with del)#14817

Closed
potiuk wants to merge 3 commits into
apache:masterfrom
potiuk:attempt_to_run_parallel_tests_del

@potiuk
Member

@potiuk potiuk commented Mar 16, 2021



The parallel tests from apache#14531 created a good opportunity to
reproduce some of the race conditions that cause some of the
scheduler job tests to be flaky.

This change is an attempt to fix three of the flaky tests
there by removing side effects between tests. The previous
implementation did not take into account that scheduler job
processes might still be running when the test finishes and
the tests could have unintended side effects - especially
when they were run on a busy machine.

This PR adds a mechanism that stops all running
SchedulerJob processes in tearDown before cleaning
the database.
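
A minimal sketch of the tearDown ordering described above, not the code in
this PR. It assumes the test tracks every child process it starts;
`start_fake_scheduler`, `stop_all`, `clean_database` and the test class are
hypothetical names, and a sleeping child process stands in for a real
scheduler job.

```python
import subprocess
import sys
import unittest


def start_fake_scheduler():
    """Start a long-running child process standing in for a scheduler job."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def stop_all(processes, timeout=5):
    """Terminate every process that is still running and wait for it to exit."""
    for proc in processes:
        if proc.poll() is None:  # None means the process has not exited yet
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if terminate was ignored
                proc.wait()


class FakeSchedulerJobTest(unittest.TestCase):
    def setUp(self):
        # Track every process the test starts so tearDown can stop them all.
        self.processes = []

    def tearDown(self):
        # Order matters: stop the processes first so nothing touches the
        # database while, or after, it is being cleaned.
        stop_all(self.processes)
        self.clean_database()

    def clean_database(self):
        """Placeholder for the real cleanup, which this sketch does not model."""

    def test_scheduler_process_starts(self):
        proc = start_fake_scheduler()
        self.processes.append(proc)
        self.assertIsNone(proc.poll())
```

Stopping the processes before the cleanup is the point of the sketch: a
process that outlives its test can otherwise write rows that the next test
then observes.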

Fixes: apache#14778
Fixes: apache#14773
Fixes: apache#14772
Fixes: apache#14771
Fixes: apache#11571
Fixes: apache#12861
Fixes: apache#11676
Fixes: apache#11454
Fixes: apache#11442
Fixes: apache#11441
@potiuk potiuk requested review from XD-DENG, ashb and kaxil as code owners March 16, 2021 01:17
@boring-cyborg boring-cyborg Bot added area:dev-tools area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Mar 16, 2021
@potiuk potiuk force-pushed the attempt_to_run_parallel_tests_del branch from 86c9462 to 921803b Compare March 16, 2021 01:22
@potiuk potiuk added the full tests needed We need to run full set of tests for this PR to merge label Mar 16, 2021
@potiuk potiuk force-pushed the attempt_to_run_parallel_tests_del branch from 921803b to 6f3ca26 Compare March 16, 2021 01:29
@potiuk potiuk force-pushed the attempt_to_run_parallel_tests_del branch from 6f3ca26 to 6acc553 Compare March 16, 2021 02:32
@potiuk potiuk requested a review from turbaszek as a code owner March 16, 2021 02:32
@potiuk potiuk force-pushed the attempt_to_run_parallel_tests_del branch 2 times, most recently from a7400a7 to 62bdc72 Compare March 16, 2021 03:14
@github-actions
Contributor

The Workflow run is cancelling this PR. Building image for the PR has been cancelled

This is by far the biggest improvement in test execution time
we can get now that we are using self-hosted runners.

This change drives down the time of executing all tests on
self-hosted runners from ~ 50 minutes to ~ 13 minutes. The gain
comes from the heavy parallelisation we can implement for different
test types and from the fact that our machines for self-hosted
runners are far more capable: they have more CPU and more memory,
and we are using tmpfs for everything.

This change will also drive the cost of our self-hosted runners
down. Since we have auto-scaling infrastructure, we will simply need
the machines to run tests for a far shorter time. Since the number
of test jobs we run on those self-hosted runners is substantial
(10 jobs), we are going to save ~ 6 build hours per PR or merged
commit!
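
The ~ 6 build hours figure can be checked with a few lines of arithmetic.
This sketch assumes that each of the 10 jobs drops from ~ 50 to ~ 13 minutes,
which the text implies but does not state per job; the constant and function
names are illustrative.

```python
JOBS_PER_BUILD = 10   # test jobs run on self-hosted runners per PR/commit
MINUTES_BEFORE = 50   # approximate duration of one job before the change
MINUTES_AFTER = 13    # approximate duration of one job after the change


def saved_build_hours(jobs, minutes_before, minutes_after):
    """Return the build hours saved across all jobs of one build."""
    return jobs * (minutes_before - minutes_after) / 60


saved = saved_build_hours(JOBS_PER_BUILD, MINUTES_BEFORE, MINUTES_AFTER)
print(f"{saved:.1f} build hours saved")  # prints "6.2 build hours saved"
```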

This also allows developers to use the power of their
development machines - when you use
`./scripts/ci/testing/ci_run_airflow_testing.sh`, the script
detects how many CPU cores are available and runs as many
test types in parallel as you have cores.
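
A rough Python sketch of that idea follows. The real logic lives in the
shell script and is not reproduced here; `max_parallel_test_types`,
`run_in_parallel` and the `run_one` callback are hypothetical names, and
threads stand in for whatever the script actually launches.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def max_parallel_test_types(test_types, cpu_count=None):
    """Run at most one test type per core, and never fewer than one."""
    cores = cpu_count or os.cpu_count() or 1
    return max(1, min(cores, len(test_types)))


def run_in_parallel(test_types, run_one, cpu_count=None):
    """Run `run_one(test_type)` for each type and map each type to its result."""
    workers = max_parallel_test_types(test_types, cpu_count)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(test_types, pool.map(run_one, test_types)))
```

For example, with three test types on a two-core machine,
`max_parallel_test_types(["Core", "API", "Providers"], cpu_count=2)` returns
2, so the third type waits until a worker is free.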

Integration tests require more memory because they run all the
integrations, so if less than ~ 32 GB of RAM is available to
Docker, the integration tests are run sequentially at the end.
This drives stability up for machines with lower memory.
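
The memory rule can be sketched as a small planning function. This is an
illustration under assumptions: the name `plan_test_runs`, the
`"Integration"` label and the exact comparison against 32 GB are
hypothetical, since the text only gives an approximate threshold.

```python
def plan_test_runs(test_types, docker_memory_gb, threshold_gb=32,
                   integration_type="Integration"):
    """Split test types into (parallel, sequential_at_end) lists.

    With enough memory everything runs in parallel. Otherwise the
    integration tests are held back and run sequentially at the end.
    """
    if docker_memory_gb >= threshold_gb:
        return list(test_types), []
    parallel = [t for t in test_types if t != integration_type]
    sequential = [t for t in test_types if t == integration_type]
    return parallel, sequential
```

With 16 GB available, `plan_test_runs(["Core", "Integration", "Providers"], 16)`
returns `(["Core", "Providers"], ["Integration"])`.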

On one personal PC (64GB RAM, 8 CPUs/16 cores, fast SSD) the full
test suite execution time went down from 30 minutes to 5 minutes.

Progress information is printed every 10 seconds while
either parallel or sequential tests are run, and the full output is
shown at the end - failed tests are marked in red groups, and successful
ones in green groups. This makes it easier to see and analyse errors.
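
One way to picture the final summary is the sketch below. It is not the
script's actual output format: the ANSI colour codes, the `=== ... ===`
headers and the name `format_summary` are assumptions made for
illustration.

```python
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def format_summary(results):
    """Build summary lines from {test_type: (succeeded, output)}.

    Each test type gets a coloured header (green for success, red for
    failure) followed by its captured output.
    """
    lines = []
    for test_type, (succeeded, output) in results.items():
        colour = GREEN if succeeded else RED
        status = "OK" if succeeded else "FAILED"
        lines.append(f"{colour}=== {test_type}: {status} ==={RESET}")
        lines.extend(output.splitlines())
    return lines
```

Printing the result with `print("\n".join(format_summary(results)))` shows
one coloured header per test type, followed by that type's output.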
@potiuk potiuk force-pushed the attempt_to_run_parallel_tests_del branch from 62bdc72 to 7418aea Compare March 16, 2021 09:31
@potiuk potiuk closed this Mar 16, 2021
@potiuk potiuk deleted the attempt_to_run_parallel_tests_del branch April 3, 2021 20:09
Labels

area:dev-tools area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues full tests needed We need to run full set of tests for this PR to merge

1 participant