
Attempt to reduce flakiness of PythonVirtualeEnv test_airflow_context #17486

Merged

Conversation

potiuk (Member) commented Aug 7, 2021

We have a global limit (60 seconds) for individual test execution, but 'test_airflow_context' of the Python Virtualenv tests might take longer when tests are run in parallel, because it uses dill serialization, which serializes a lot of data from the task's context.

We now give the test 120 seconds to complete.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

boring-cyborg bot added the area:core-operators label (Operators, Sensors and hooks within Core Airflow) on Aug 7, 2021
potiuk changed the title from "Attempt to reduce flakiness of PythonVirtualeEnv tests" to "Attempt to reduce flakiness of PythonVirtualEnv tests" on Aug 7, 2021
potiuk changed the title from "Attempt to reduce flakiness of PythonVirtualEnv tests" to "Attempt to reduce flakiness of PythonVirtualenv tests" on Aug 7, 2021
github-actions bot commented Aug 7, 2021

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

github-actions bot added the full tests needed label (We need to run full set of tests for this PR to merge) on Aug 7, 2021
uranusjr (Member) commented Aug 7, 2021

I wonder why this takes so long in the first place; virtualenv creation should generally be lightning fast. Probably because virtualenv is trying to update its embedded wheels (pip, setuptools, and wheel) automatically? We should try to disable this by passing --no-periodic-update.

https://virtualenv.pypa.io/en/stable/user_guide.html#embed-wheels-for-distributions
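A minimal sketch of passing that flag when driving virtualenv from Python (virtualenv 20.x exposes cli_run; the target path here is just an example):

```python
from virtualenv import cli_run

# Create a virtualenv with the periodic auto-update of the embedded
# pip/setuptools/wheel disabled, which can otherwise add network time
# to environment creation.
cli_run(["--no-periodic-update", "/tmp/example-venv"])
```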

potiuk (Member Author) commented Aug 7, 2021

Yeah, I am not really sure why this is happening either. Actually, I am a bit puzzled here. It seems one of those timeouts happened AGAIN, and again at 60 seconds (which I was sure had been changed to 120 by now). So I am a bit lost here :)

https://github.com/apache/airflow/pull/17486/checks?check_run_id=3269777720#step:6:10094

potiuk (Member Author) commented Aug 7, 2021

It's only "test_airflow_context" that has the problem. I limited the increased timeout to that test only; let's see. I am not sure why applying the timeout at the class level did not work - maybe it's not well supported.

potiuk force-pushed the reduce-flakines-on-individual-test-execution branch from b5c18da to 8b58e9d on August 7, 2021 16:39
potiuk (Member Author) commented Aug 7, 2021

Ah right, the execution_timeout marker is indeed only applicable at the function level, so it had no chance to work on the class.

Still, I am not sure it's the virtualenv creation; when I test it locally, "test_airflow_context" takes ~20 seconds to run, so I can easily imagine that when several tests run in parallel, something gets into contention and makes the test run for quite a bit longer than 20 seconds. I think in this case it is pickling the whole context that takes that much time. It looks like this is the only test case that has both:

use_dill=True, system_site_packages=True

I do not know the intrinsic details of what's going on in this case, but it sounds plausible that this combination is much slower.
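For context, a hedged sketch of what a task with that combination could look like using Airflow's PythonVirtualenvOperator (the callable, task_id, and comments are illustrative, not lifted from the actual test):

```python
from airflow.operators.python import PythonVirtualenvOperator

def callable_in_venv():
    # Runs inside the freshly created virtualenv; with use_dill=True the
    # operator serializes the data passed in with dill instead of pickle.
    print("running in the virtualenv")

task = PythonVirtualenvOperator(
    task_id="airflow_context_example",  # hypothetical task id
    python_callable=callable_in_venv,
    use_dill=True,               # dill can serialize much more of the context
    system_site_packages=True,   # reuse host site-packages, incl. airflow
)
```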

potiuk (Member Author) commented Aug 7, 2021

Yeah. It seems we save a lot of the context in serialized form in a file - including all the Airflow system classes that come in via dag, macros, etc. - and restore them when we run the task. That might take a long time when dill is used, I guess.
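A minimal sketch of that save/restore round trip (the file path and the context dict are stand-ins, not the operator's real internals):

```python
import dill

# Stand-in for the task context; the real one also carries dag, macros,
# task instance, etc., which is what makes dill serialization expensive.
context = {"ds": "2021-08-07", "params": {"retries": 3}}

# Serialize the context to a file for the virtualenv process...
with open("/tmp/context.bin", "wb") as f:
    dill.dump(context, f)

# ...and restore it before running the task in the new interpreter.
with open("/tmp/context.bin", "rb") as f:
    restored = dill.load(f)

assert restored["ds"] == "2021-08-07"
```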

potiuk (Member Author) commented Aug 7, 2021

I updated the PR description to reflect this and will update the commit message too if it works now (I hope it will).

potiuk force-pushed the reduce-flakines-on-individual-test-execution branch from 8b58e9d to d134210 on August 7, 2021 17:39
potiuk (Member Author) commented Aug 7, 2021

Ehhh. It seems we also needed to apply --timeout-order. The default order in pytest-timeouts is that the command line overrides markers (?). Pretty counter-intuitive. https://pytest-timeouts.readthedocs.io/en/latest/usage/#timeout-order

So whatever we applied as a marker was not effective, because the command-line 60 overrode it (!).
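For reference, a sketch of the resulting function-level marker (assuming the pytest-timeouts plugin, which provides execution_timeout; the test body is elided):

```python
import pytest

# The marker only works at function level (see above), so it goes on the
# single slow test rather than on the test class. Without the timeout-order
# change, the command-line 60-second timeout would still override it.
@pytest.mark.execution_timeout(120)
def test_airflow_context():
    ...
```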

potiuk changed the title from "Attempt to reduce flakiness of PythonVirtualenv tests" to "Attempt to reduce flakiness of PythonVirtualeEnv test_airflow_context" on Aug 7, 2021
potiuk force-pushed the reduce-flakines-on-individual-test-execution branch from d134210 to 5cc2fc9 on August 7, 2021 19:31
potiuk merged commit 793afaa into apache:main on Aug 7, 2021
potiuk deleted the reduce-flakines-on-individual-test-execution branch on August 7, 2021 20:47
potiuk added a commit that referenced this pull request on Aug 7, 2021: …#17486) (cherry picked from commit 793afaa)
jhtimmins pushed a commit that referenced this pull request on Aug 11, 2021: …#17486) (cherry picked from commit 793afaa)
kaxil pushed a commit that referenced this pull request on Aug 17, 2021: …#17486) (cherry picked from commit 793afaa)
jhtimmins pushed a commit that referenced this pull request on Aug 17, 2021: …#17486) (cherry picked from commit 793afaa)