Skip to content

Comments

Remove @pytest.mark.quarantined from stable quarantine tests#28767

Merged
potiuk merged 1 commit intoapache:mainfrom
Taragolis:remove-quarantined-marker
Jan 11, 2023
Merged

Remove @pytest.mark.quarantined from stable quarantine tests#28767
potiuk merged 1 commit intoapache:mainfrom
Taragolis:remove-quarantined-marker

Conversation

@Taragolis
Copy link
Contributor

@Taragolis Taragolis commented Jan 6, 2023

related: #28658

Currently all tests pass either local run or quarantined pipeline in CI. I've just remove flag from all test for check in all possible backends.

Most of the issues in the past related to MySQL 5.7 which almost reach EOL, however quarantined pipeline run only with SQLite backend.

Update
After couple runs we found that only tests/executors/test_celery_executor.py::TestLocalTaskJob::test_process_sigterm_works_with_retries is still flaky

@Taragolis Taragolis added the full tests needed We need to run full set of tests for this PR to merge label Jan 6, 2023
@Taragolis Taragolis requested review from eladkal and potiuk January 6, 2023 14:27
@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jan 6, 2023
@potiuk
Copy link
Member

potiuk commented Jan 6, 2023

Cool. I think we shoudl do it a "bit" gradually (and let them run for a few days after merge). This will be helpful to narrow down the problem in case some of the tests introduce side-effect, to be able to determine which ones it is more easily.

@potiuk potiuk force-pushed the remove-quarantined-marker branch from 4f16be5 to 3a24f2c Compare January 7, 2023 08:14
@potiuk
Copy link
Member

potiuk commented Jan 7, 2023

Rebased them to re-run and re-check if they are repeatably stable as well.

@Taragolis Taragolis force-pushed the remove-quarantined-marker branch from 3a24f2c to e636766 Compare January 7, 2023 12:32
@Taragolis Taragolis changed the title POC: Remove @pytest.mark.quarantined from tests Remove @pytest.mark.quarantined from remaining quarantine tests Jan 7, 2023
@Taragolis Taragolis marked this pull request as ready for review January 7, 2023 12:32
@potiuk potiuk added the use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) label Jan 7, 2023
@potiuk potiuk force-pushed the remove-quarantined-marker branch from e636766 to 9e4a669 Compare January 7, 2023 23:09
@potiuk
Copy link
Member

potiuk commented Jan 7, 2023

And rebased them with public runners too.

@potiuk
Copy link
Member

potiuk commented Jan 7, 2023

@Taragolis -> https://github.com/apache/airflow/actions/runs/3864325055/jobs/6587088003 - seems that at least test_process_sigterm_works_with_retries is flaky on public runners ...

@Taragolis
Copy link
Contributor Author

@potiuk I hope it just flaky test and not something wrong with callback mechanism.
Let keep it for a while and have a look how often it failed - just for some statistic, I will exclude this test anyway and try to figure out what the actual nature of this failure.

@Taragolis
Copy link
Contributor Author

Taragolis commented Jan 8, 2023

I can't reproduce this error locally but I have an idea what could possible wrong

Potentially callback might acquire lock

retry_callback_called = Value("i", 0)
def retry_callback(context):
with retry_callback_called.get_lock():
retry_callback_called.value += 1
assert context["dag_run"].dag_id == "test_mark_failure_2"

After it checked in test case

with timeout(10):
job1.run()
assert retry_callback_called.value == 1

However locally I do not have this behaviour. I only could emulate it by adding time.sleep(5) before acquire lock in callback.

@Taragolis Taragolis force-pushed the remove-quarantined-marker branch from 9e4a669 to 82bcd48 Compare January 9, 2023 12:32
@Taragolis
Copy link
Contributor Author

And ... this test failed again 😞

@potiuk
Copy link
Member

potiuk commented Jan 9, 2023

Yeah. I think we could merge the others and leave that one as the last - I might also be able to take a look at it shortly (this is kinda what I was expecting and that's why I wanted to do it gradually) :)

@Taragolis
Copy link
Contributor Author

Yeah, let me revert quarantined mark for this test.

I just think about what we tried to test here:

  1. Is on_retry_callback actually called
    or
  2. Is task retried, e.g. change state to up_for_retry

@Taragolis Taragolis force-pushed the remove-quarantined-marker branch from 82bcd48 to b9ff72c Compare January 9, 2023 18:09
@Taragolis Taragolis changed the title Remove @pytest.mark.quarantined from remaining quarantine tests Remove @pytest.mark.quarantined from stable quarantine tests Jan 9, 2023
@Taragolis Taragolis force-pushed the remove-quarantined-marker branch from b9ff72c to 58a6fb6 Compare January 9, 2023 21:48
@potiuk potiuk force-pushed the remove-quarantined-marker branch from 58a6fb6 to 1c8c520 Compare January 10, 2023 07:17
@potiuk
Copy link
Member

potiuk commented Jan 10, 2023

One more rebase just in case...

@Taragolis Taragolis force-pushed the remove-quarantined-marker branch from 1c8c520 to fadb23d Compare January 11, 2023 14:39
@Taragolis
Copy link
Contributor Author

@potiuk looks like this tests stable. We would know more about stability after we merge it 🤣

@potiuk potiuk merged commit 8771899 into apache:main Jan 11, 2023
@potiuk
Copy link
Member

potiuk commented Jan 11, 2023

Let's

@Taragolis Taragolis deleted the remove-quarantined-marker branch January 14, 2023 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Scheduler including HA (high availability) scheduler full tests needed We need to run full set of tests for this PR to merge use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants