
Fix CI #3064

Closed

wants to merge 3 commits into from

Conversation

@Hocuri (Collaborator) commented Feb 7, 2022

#skip-changelog

The tests are flaky (sometimes they just run through, though; I repeatedly hit re-run: the first two runs succeeded, and in the third run six tests failed at once):

https://pipelines.actions.githubusercontent.com/0Cs9EsF2mwtr7icbiRvpCNrfoVF185ozSjOeo0Ale4nDoOrShu/_apis/pipelines/1/runs/8174/signedlogcontent/9?urlExpires=2022-02-07T20%3A40%3A06.1888390Z&urlSigningMethod=HMACV1&urlSignature=%2FO5JxNmTDQpKnBGVfL8wDKg7TVaN1ypDfmhQdepPyw4%3D

2022-02-07T18:52:10.6629468Z =================================== FAILURES ===================================
2022-02-07T18:52:10.6629891Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6630709Z [gw3] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6631453Z worker 'gw3' crashed while running 'tests/test_account.py::TestOnlineAccount::test_qr_join_chat'
2022-02-07T18:52:10.6632137Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6632975Z [gw1] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6633914Z worker 'gw1' crashed while running 'tests/test_account.py::TestOnlineAccount::test_immediate_autodelete'
2022-02-07T18:52:10.6634430Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6635222Z [gw2] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6635938Z worker 'gw2' crashed while running 'tests/test_account.py::TestOnlineAccount::test_set_get_contact_avatar'
2022-02-07T18:52:10.6636604Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6637196Z [gw4] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6638069Z worker 'gw4' crashed while running 'tests/test_account.py::TestOnlineAccount::test_group_quote'
2022-02-07T18:52:10.6638554Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6639338Z [gw5] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6640290Z worker 'gw5' crashed while running 'tests/test_account.py::TestOnlineAccount::test_name_changes'
2022-02-07T18:52:10.6640948Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6641561Z [gw6] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6642430Z worker 'gw6' crashed while running 'tests/test_account.py::TestOnlineAccount::test_fetch_existing[True]'
2022-02-07T18:52:10.6642919Z ____________________________ tests/test_account.py _____________________________
2022-02-07T18:52:10.6643704Z [gw0] linux -- Python 3.9.10 /home/runner/work/deltachat-core-rust/deltachat-core-rust/python/.tox/py3/bin/python
2022-02-07T18:52:10.6644454Z worker 'gw0' crashed while running 'tests/test_account.py::TestGroupStressTests::test_synchronize_member_list_on_group_rejoin'

@Hocuri (Collaborator, Author) commented Feb 7, 2022

I think I know at least a big part of the cause.

Consider these flaky tests:

    import time
    from random import random

    class TestFlaky:
        def test_flaky_timeout(self):
            # hangs in ~50% of runs until pytest-timeout kills it
            if random() < 0.5:
                time.sleep(20000)

        def test_flaky_error(self):
            # fails with an ordinary exception in ~50% of runs
            if random() < 0.5:
                raise Exception()

If you run with -n2 --reruns 2 --reruns-delay 5 -v -rsXx --ignored --strict-tls -k test_flaky (which is similar to the CI), test_flaky_error() is rerun when it fails, but test_flaky_timeout() is not: for some reason, rerunning doesn't work for timeouts.

Now, the thing is: before #3059, we had arbitrary timeouts that raised an error after roughly 30 seconds, which effectively turned timeouts into errors. So a quick and easy solution would be to revert #3059.
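
For illustration, here is a minimal sketch of such a manual timeout (not the actual helper that #3059 removed; the name wait_for, its arguments, and the 30-second default are made up). Polling with a deadline and then raising turns a hang into an ordinary, rerunnable failure instead of a hard timeout:

    import time

    def wait_for(condition, timeout=30, poll=0.5):
        """Poll condition() until it returns a truthy value or timeout seconds pass."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if condition():
                return
            time.sleep(poll)
        # Raising here makes the test fail like any other failure, which
        # pytest-rerunfailures can retry; a plain hang would instead run
        # into pytest-timeout without a rerun.
        raise TimeoutError("condition not met within {} seconds".format(timeout))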

IIRC the tests were already flaky and failing sometimes before - plus, in an ideal world, it wouldn't be necessary to re-run them at all. But we should probably try to get re-running to work first.

As for why tests are failing in the first place... from the log, it seems like there are a lot of different causes:

  • A message can't be sent in test_set_get_contact_avatar() because of src/smtp.rs:479: Failed to send message over SMTP: SMTP connection failure: SMTP failed to connect: permanent: 5.7.8 Error: authentication failed: .
  • In test_qr_join_chat(), ac2 says that it sends one of the securejoin messages, but then it never arrives at ac1. Either ac2 didn't properly send it, or it got lost somewhere, or ac1 didn't notice it.
  • In test_immediate_autodelete(), imap2.idle_check() doesn't notice the message that was sent. ac2 does receive the message according to the logs, so it seems like the message arrived but idle_check() didn't work.
  • test_group_quote(): Excerpt from the log:
2022-02-07T18:49:04.8011781Z 59.98 [events-ac2] DC_EVENT_INFO data1=0 data2=src/imap/idle.rs:73: Idle has NewData ResponseData { raw: 4096, response: MailboxData(Exists(1)) }
2022-02-07T18:49:04.8012329Z 60.20 [events-ac2] DC_EVENT_INFO data1=0 data2=src/job.rs:753: loading job for Imap-thread
2022-02-07T18:49:04.8012825Z 60.43 [events-ac2] DC_EVENT_INFO data1=0 data2=src/imap.rs:823: 1 mails read from "INBOX".
2022-02-07T18:49:04.8013306Z 61.05 [events-ac2] DC_EVENT_IMAP_MESSAGE_MOVED data1=0 data2=IMAP messages 15 moved to DeltaChat
2022-02-07T18:49:04.8013827Z 61.05 [events-ac2] DC_EVENT_INFO data1=0 data2=src/imap/scan_folders.rs:28: Starting full folder scan
2022-02-07T18:49:04.8014321Z 61.46 [events-ac2] DC_EVENT_INFO data1=0 data2=src/imap.rs:823: 0 mails read from "INBOX".
2022-02-07T18:49:04.8014815Z 61.69 [events-ac2] DC_EVENT_CONNECTIVITY_CHANGED data1=0 data2=0
2022-02-07T18:49:04.8015318Z 61.69 [events-ac2] DC_EVENT_INFO data1=0 data2=src/imap.rs:1624: ignoring unsolicited response Recent(1)
2022-02-07T18:49:04.8016127Z 61.69 [events-ac2] DC_EVENT_INFO data1=0 data2=src/imap.rs:1624: ignoring unsolicited response Other(ResponseData { raw: 4096, response: Data { status: Ok, code: Some(CopyUid(1644259495, [Uid(15)], [Uid(4)])), information: Some("Moved UIDs.") } })

Seems like the mail is found on the server and then moved to the DeltaChat folder - this happens because get_many_online_accounts() has move=True by default (while the more frequently used get_online_configuring_account() has move=False; maybe it's good that different settings are used for the different tests, see the sketch after this list). But then the mail is somehow not found in the DeltaChat folder.

@link2xt do you know if maybe the server doesn't send an Exists when idle-ing on the destination server if it already sends a CopyUid?

  • There are about 4 more timeouts in the log which I didn't analyze.
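
On the move=True/move=False difference mentioned above, here is a minimal sketch (not the actual test fixture) of configuring a single account so that messages stay in INBOX. It assumes the deltachat Python bindings (Account, set_config) and the mvbox_move config key; the helper name and its arguments are made up for illustration:

    import deltachat

    def make_account_without_move(db_path, addr, password):
        # Hypothetical helper: create an account with move-to-DeltaChat-folder
        # disabled, so a test watching INBOX sees the message there.
        account = deltachat.Account(db_path)
        account.set_config("addr", addr)
        account.set_config("mail_pw", password)
        account.set_config("mvbox_move", "0")  # "1" (the default) enables moving
        return account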

@link2xt (Collaborator) commented Feb 8, 2022

There is a related issue, pytest-dev/pytest-rerunfailures#99, suggesting the -o timeout_func_only=true option.

Another possibly related issue: pytest-dev/pytest-rerunfailures#157

@link2xt (Collaborator) commented Feb 8, 2022

maybe the server doesn't send an Exists when idle-ing on the destination server if it already sends a CopyUid?

CopyUid is sent on the Inbox connection, but the Movebox connection should still be interrupted with an Exists, because the message could even have been moved by a client on another device.

@Hocuri (Collaborator, Author) commented Feb 8, 2022

suggesting -o timeout_func_only=true option

I just tried this out (my code from above with -n2 --reruns 2 --reruns-delay 5 -v -rsXx --ignored --strict-tls -k test_flaky -o timeout_func_only=true), but it didn't help.

@link2xt (Collaborator) commented Feb 8, 2022

Works for me with

-timeout_method = thread
+timeout_func_only = True

in tox.ini.

timeout_method has been set to thread since the introduction of rerunfailures and is probably not needed: ffc525a
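
As a way to check this locally, here is a hedged minimal reproduction (the file name, the 5-second timeout, and the flag values are made up for illustration; pytest.mark.timeout comes from pytest-timeout, --reruns from pytest-rerunfailures, and -n from pytest-xdist):

    # test_timeout_rerun_repro.py
    import time

    import pytest

    @pytest.mark.timeout(5)
    def test_always_times_out():
        # always exceeds the 5-second timeout
        time.sleep(60)

Running e.g. pytest -n2 --reruns 2 -v test_timeout_rerun_repro.py should show RERUN entries before the final FAILED if rerunning of timed-out tests works; if the xdist worker is killed instead, no reruns happen.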

Hocuri changed the title from "[WIP] Try to fix CI..." to "Fix CI" on Feb 8, 2022
@Hocuri (Collaborator, Author) commented Feb 8, 2022

Nice, thanks, works for me too!

@link2xt (Collaborator) commented Feb 8, 2022

I pushed it as 6c6d47c with some description in the commit message.

link2xt closed this Feb 8, 2022
link2xt deleted the hocuri/try-to-fix-ci branch February 8, 2022 20:49
@link2xt (Collaborator) commented Feb 8, 2022

Hopefully this is even more robust than before, because it will rerun all timeout-related failures rather than only our own manual timeouts.

@link2xt (Collaborator) commented Feb 8, 2022

We are still affected by pytest-dev/pytest-rerunfailures#157, though: when a worker crashes, it is replaced, but the replacement forgets to do the reruns.
