Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run helm chart tests in parallel #15706

Merged
merged 2 commits into from
May 6, 2021
Merged

Conversation

jedcunningham
Copy link
Member

The helm chart tests are pretty slow when run sequentially. Modifying
them so they can be run in parallel saves a lot of time, from 10 minutes
to 3 minutes on my machine with 8 cores.

The only test that needed modification was test_pod_template_file.py,
as it temporarily moves a file into the templates directory
which was causing other tests to fail as they weren't expecting any
objects from that temporary file. This is resolved by giving the
pod_template_file test an isolated chart directory it can modify.

helm dep update also doesn't work when it is called in parallel, so
the fixture responsible for running it now ensures we only run it one at
a time.

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label May 6, 2021
@potiuk
Copy link
Member

potiuk commented May 6, 2021

FYI. @jedcunningham -> this is already possibe: https://github.com/apache/airflow/blob/master/TESTING.rst#running-full-airflow-test-suite-in-parallel What happens there is the whole "chart" is copied to a separate directory and helm tests are run in parallel. This is done in CI but you can also run the script to run helm tests in parallel

@potiuk
Copy link
Member

potiuk commented May 6, 2021

export TEST_TYPES="Helm"
./scripts/ci/testing/ci_run_airflow_testing.sh

@github-actions
Copy link

github-actions bot commented May 6, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@@ -484,6 +484,7 @@ def get_sphinx_theme_version() -> str:
'click~=7.1',
'coverage',
'docutils',
'filelock',
Copy link
Contributor

@ephraimbuddy ephraimbuddy May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the py-filelock project is no longer maintained, the last commit to the project was in 2019. Apart from that, this LGTM

Copy link
Member

@potiuk potiuk May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that matters @ephraimbuddy ? This is rather simple library and I do not expect too many changes (300 lines of code). Looks like it is not updated because it simply does its job well :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right @potiuk :)

@potiuk
Copy link
Member

potiuk commented May 6, 2021

Correction to my previous comment-> what I wrote was correct for Kubernetes tests not Helm tests (this is where we copy all the charts and run them in parallell. For Helm tests it runs well in parallell, because they are run in a separate docker containers where sources are baked into image, rather than mounted. I see why you'd want to run them locally though.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great for local iteration speed :)

@github-actions
Copy link

github-actions bot commented May 6, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label May 6, 2021
@jedcunningham
Copy link
Member Author

I see why you'd want to run them locally though.

Yeah, this was driven by me getting tired of the tests running slow locally.

For Helm tests it runs well in parallell, because they are run in a separate docker containers where sources are baked into image, rather than mounted.

(not sure how to deep link further): https://github.com/apache/airflow/pull/15627/checks?check_run_id=2518755958

Looks like even with self hosted runners these are running sequentially, based on the time?

@mik-laj
Copy link
Member

mik-laj commented May 6, 2021

I think we still need to set a flag to CI script to activate parallelism. See:

if [[ "${TEST_TYPE}" != "Helm" ]]; then
EXTRA_PYTEST_ARGS+=(
"--with-db-init"
)
fi

We should add the following change.

if [[ "${TEST_TYPE}" == "Helm" ]]; then
    # Enable parallelism
    EXTRA_PYTEST_ARGS+=(
        "-n" "auto"
    )
else
    EXTRA_PYTEST_ARGS+=(
        "--with-db-init"
    )
fi

The helm chart tests are pretty slow when run sequentially. Modifying
them so they can be run in parallel saves a lot of time, from 10 minutes
to 3 minutes on my machine with 8 cores.

The only test that needed modification was `test_pod_template_file.py`,
as it temporarily moves a file into the templates directory
which was causing other tests to fail as they weren't expecting any
objects from that temporary file. This is resolved by giving the
pod_template_file test an isolated chart directory it can modify.

`helm dep update` also doesn't work when it is called in parallel, so
the fixture responsible for running it now ensures we only run it one at
a time.
@jedcunningham
Copy link
Member Author

@mik-laj, do you want me to do that as part of this PR?

@mik-laj
Copy link
Member

mik-laj commented May 6, 2021

@jedcunningham , Yes. We have a limited budget for CI and we try to optimize it. See: https://lists.apache.org/thread.html/r7d712327f985536b33c7791ddb7f443730f0c88a1e2e2c50538411fa%40%3Cdev.airflow.apache.org%3E

@potiuk
Copy link
Member

potiuk commented May 6, 2021

Looks like even with self hosted runners these are running sequentially, based on the time?

You are right. I've forgotten that the helm charts were run as separate process. We had far less number of unit tests before for helm charts and they run much faster.

After your recent addition they started to run much longer - they were running much quicker before. Good call. Yep. The change proposed by @mik-laj should work (we already have pytest-xdist added).

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
@mik-laj mik-laj changed the title Allow helm chart tests to run in parallel Run helm chart tests in parallel May 6, 2021
@mik-laj
Copy link
Member

mik-laj commented May 6, 2021

Some static checks failed, but it looks unrelated to this change. I am merging this change to test on a self-hosted runner.

@mik-laj mik-laj merged commit bdb76be into apache:master May 6, 2021
@jedcunningham jedcunningham deleted the helm_tests_parallel branch May 6, 2021 21:07
@jedcunningham
Copy link
Member Author

Looks like it was faster on an actions runner too.

@potiuk
Copy link
Member

potiuk commented May 6, 2021

Looks like it was faster on an actions runner too.

Yep. There are 2 CPUs there. Most of our tests are already auto-detecting the number of CPUs and I sped them up this way (unfortunately a lot of our regular Airflow tests (unlike the helm tests) cannot be run in parallel with pytest-xdist because they are using shared database and rely on the DB state. So I had to parallelize them 'per test type'.

@potiuk
Copy link
Member

potiuk commented May 6, 2021

@mik-laj - I think you merged it too early - there were static-checks and build-docs failing @jedcunningham - can you please fix them ? Otherwise master is broken.

@potiuk
Copy link
Member

potiuk commented May 6, 2021

AH. I see the comment now.

@mik-laj
Copy link
Member

mik-laj commented May 6, 2021

@potiuk it is unrelated to this change. We have these problem on maaster too. See: https://github.com/apache/airflow/runs/2521146207

@potiuk
Copy link
Member

potiuk commented May 6, 2021

So we already have broken master . Do we know why?

@mik-laj
Copy link
Member

mik-laj commented May 6, 2021

@potiuk It may be related to this PR. #15681 this build doesn't check all files.

Lint OpenAPI using openapi-spec-validator............................(no files to check)Skipped
Lint dockerfile......................................................(no files to check)Skipped
Check order of dependencies in setup.cfg and setup.py................(no files to check)Skipped
Checks setup extra packages..........................................(no files to check)Skipped
Update output of breeze command in BREEZE.rst........................(no files to check)Skipped
Update mounts in the local yml file..................................(no files to check)Skipped
Update setup.cfg file with all licenses..............................(no files to check)Skipped
Build cross-dependencies for providers packages......................(no files to check)Skipped
Update extras in documentation.......................................(no files to check)Skipped
Check for pydevd debug statements accidentally left..................(no files to check)Skipped

@potiuk
Copy link
Member

potiuk commented May 6, 2021

Reverting it then: #15707

@potiuk
Copy link
Member

potiuk commented May 6, 2021

I reverted it just in case but I think there were two problems. The change was run on the AMI in Ohio that @ashb is testing (Runner name: 'Airflow Runner 85'. @ashb -> FYI: https://github.com/apache/airflow/runs/2521146207 seems that jq replacing "Experimental" in docker had problem in the AMI.

https://github.com/apache/airflow/runs/2521146207#step:10:188

cognifloyd pushed a commit to cognifloyd/stackstorm-k8s that referenced this pull request Feb 16, 2022
* Allow helm chart tests to run in parallel

The helm chart tests are pretty slow when run sequentially. Modifying
them so they can be run in parallel saves a lot of time, from 10 minutes
to 3 minutes on my machine with 8 cores.

The only test that needed modification was `test_pod_template_file.py`,
as it temporarily moves a file into the templates directory
which was causing other tests to fail as they weren't expecting any
objects from that temporary file. This is resolved by giving the
pod_template_file test an isolated chart directory it can modify.

`helm dep update` also doesn't work when it is called in parallel, so
the fixture responsible for running it now ensures we only run it one at
a time.

* Enable parallelism for helm unit tests in CI

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

Partial Commit Extracted From: https://github.com/apache/airflow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:helm-chart Airflow Helm Chart full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants