Skip to content

TESTS: make intgcheck is not always passing in the internal CI (enumeration tests) #4489

@sssd-bot

Description

@sssd-bot

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/3463

  • Created at 2017-08-07 17:54:53 by mzidek
  • Closed at 2019-12-11 17:16:19 as Fixed
  • Assigned to nobody

The enumeration tests in make intgcheck are sometimes failing (last two weeks quite often).

It happens mostly on the Debian machine, but other machines can fail too. Lukas mentioned that it also happened while running the tests locally.

It may be bug in the enumeration tests or in the SSSD's enumeration code.

Comments


Comment from lslebodn at 2017-08-07 19:16:22

Failures in tests are not acceptable. Therefore changing priority to blocker.


Comment from lslebodn at 2017-08-07 19:16:25

Metadata Update from @lslebodn:

  • Issue priority set to: blocker

Comment from lslebodn at 2017-08-07 20:22:13

There is no reason for triage.
Failure in test (which is far from rare) is automatically a blocker for next release.


Comment from lslebodn at 2017-08-07 20:22:14

Metadata Update from @lslebodn:

  • Issue set to the milestone: SSSD 1.15.4

Comment from fidencio at 2017-08-08 22:31:23

I've set up a Debian testing machine on my personal server and could reproduce the issue within a few runs of make intgcheck-run INTGCHECK_PYTEST_ARGS="-k test_add_remove_user".

At the first look, the problem seems to occur because the time we wait between adding the user to LDAP and testing that the operation is reflected on SSSD may not be enough.

I've tweaked the test a little bit, as shown below, and left the test running till it fails ...

root@testing:~/sssd/x86_64# git diff
diff --git a/src/tests/intg/test_enumeration.py b/src/tests/intg/test_enumeration.py
index 47772dea2..32e666e4b 100644
--- a/src/tests/intg/test_enumeration.py
+++ b/src/tests/intg/test_enumeration.py
@@ -406,7 +406,7 @@ def user_and_groups_rfc2307_bis(request, ldap_conn):
 def test_add_remove_user(ldap_conn, blank_rfc2307):
     """Test user addition and removal are reflected by SSSD"""
     e = ldap_ent.user(ldap_conn.ds_inst.base_dn, "user", 2001, 2000)
-    time.sleep(INTERACTIVE_TIMEOUT/2)
+    time.sleep(INTERACTIVE_TIMEOUT)
     # Add the user
     ent.assert_passwd(ent.contains_only())
     ldap_conn.add_s(*e)

I'll update the status here whenever it fails or after a day or two of non-failure runs.


Comment from fidencio at 2017-08-09 07:30:32

It's been running for the last 8+ hours without any failure.


Comment from lslebodn at 2017-08-09 09:02:57

Changing hardcoded timeout without explanation "why" is not sufficient solution. (because you might hide a bug)

The first problem is that current value in test is not described. So we cannot easily say whether issue is in sssd or in test.

Example of description in other place:
https://pagure.io/SSSD/sssd/blob/master/f/src/tests/intg/test_secrets.py#_428


Comment from mzidek at 2017-08-09 18:20:14

Here is closed PR with attempt to fix this issue by modifying timeout/sleep values. It may be of some value for the person that will end up looking into this:

#345


Comment from lslebodn at 2017-08-14 12:50:23

Temporary workaround which should be reverted before 1.15.4
master:


Comment from jhrozek at 2017-10-19 21:02:16

Unfortunately we haven't been able to figure out the root cause and at the same time we must release 1.16.0 no later than tomorrow (Oct-20). Therefore moving to 1.16.1


Comment from jhrozek at 2017-10-19 21:02:51

Metadata Update from @jhrozek:

  • Issue set to the milestone: SSSD 1.16.1 (was: SSSD 1.15.4)

Comment from jhrozek at 2017-12-15 18:13:30

Since the commit that removed the tests was reverted in 26803ff I'm closing this ticket.

@lslebodn please feel free to reopen this ticket, i would have asked you before closing it in the first place normally, but since we're about to release the new tarball and you're away, I'm closing without asking :)


Comment from jhrozek at 2017-12-15 18:14:02

Metadata Update from @jhrozek:

  • Issue close_status updated to: Fixed
  • Issue status updated to: Closed (was: Open)

Comment from lslebodn at 2017-12-21 17:37:36

Since the commit that removed the tests was reverted in 26803ff I'm closing this ticket.

I do not think that this ticket is fixed by commit 26803ff.
Therefore re-openning.


Comment from lslebodn at 2017-12-21 17:37:47

Metadata Update from @lslebodn:

  • Issue set to the milestone: SSSD Continuous integration (was: SSSD 1.16.1)
  • Issue status updated to: Open (was: Closed)

Comment from atikhonov at 2019-12-05 16:34:16

For the reference: #956


Comment from lslebodn at 2019-12-05 23:56:20

For the reference: #956

Testing periodic tasks (which is also enumeration) is quite tricky.
Because ti is not easy to find out when the 1st enumeration was triggered.
And testing should be ideally done in the middle between two periodic tasks.
INTERACTIVE_TIMEOUT/2 is quite confusing and should be probably changed to some constant e.g. (INITIAL_DELAY).

Ideal would be to have initial delay flexible instead of hard-coded. cause different architectures/distributions might trigger tasks slightly differently. Tests passed without any issue with {0,1,2,3,4} as initial delay delay on bare metal machine. But virtual machines are more problematic IIRC. (And i could see issues sometime on arm32)

my 2c


Comment from atikhonov at 2019-12-06 17:45:14

Testing periodic tasks (which is also enumeration) is quite tricky.
Because ti is not easy to find out when the 1st enumeration was triggered.
And testing should be ideally done in the middle between two periodic tasks.

Why is it necessary to do in the middle?

Why checking after (ENUMERATION_PERIOD + DELAY),
where ENUMERATION_PERIOD = https://github.com/SSSD/sssd/blob/master/src/tests/intg/test_enumeration.py#L151 , and DELAY is probably equal to ENUMERATION_PERIOD,
wont work?

INTERACTIVE_TIMEOUT/2 is quite confusing and should be probably changed to some constant e.g. (INITIAL_DELAY).

If I understood Michal's explanation correctly (probably I didn't), for most of the tests this is not "initial" but merely periodic task.
I mean, test should expect that next task will only be run in ENUMERATION_PERIOD + DELAY in a worst case.


Comment from mzidek at 2019-12-11 16:25:47

Commit 116b144b relates to this ticket


Comment from mzidek at 2019-12-11 17:16:20

Some new patches have landed in the master. Closing this for now. We can open a new issue or re-open this one if the problem still persists.

master:
116b144
3477f2c


Comment from mzidek at 2019-12-11 17:16:20

Metadata Update from @mzidek:

  • Issue close_status updated to: Fixed
  • Issue status updated to: Closed (was: Open)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Closed: FixedIssue was closed as fixed.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions