@lucab
Contributor

@lucab lucab commented Jun 20, 2016

Hard-sleep instead of waiting for a journal event.
This fixes a race due to waiting for any events but enumerating only
matching ones.

Closes #180

@lucab
Contributor Author

lucab commented Jun 20, 2016

Taken together, #178, #179, and #181 let the whole test suite run on Travis without failures.

@jonboulle
Contributor

@lucab how about just putting them up in a single PR?

@lucab
Contributor Author

lucab commented Jun 22, 2016

@jonboulle #179 is some "creative" Travis usage which I'm not sure is OK. The other two could have been merged into one, yes, but this last one was unplanned at first, as I was expecting #180 to be harder to track down.

@jonboulle
Contributor

@lucab is that like "creative accounting"? ;-).
What are your concerns with that approach exactly? Longer term, we could consider migrating this to our Jenkins infrastructure for more flexibility with different systemds.

@lucab
Contributor Author

lucab commented Jun 22, 2016

@jonboulle exactly 😄

I expanded the discussion in that PR.

I won't fold the other two testsuite-fixing PRs into this one now, but I'll avoid wasting resources next time. Relevant to this specific PR, there is an ongoing thread on the systemd mailing list: https://lists.freedesktop.org/archives/systemd-devel/2016-June/036946.html

    if r < 0 {
        t.Fatalf("Error waiting to journal")
    }
    time.Sleep(time.Duration(1) * time.Second)

Can you expand on why a hard sleep is better than waiting for the journal? This seems counter-intuitive to me, as time.Sleep is asking for flaky tests later on.

Contributor Author

@lucab lucab Jul 5, 2016

From the mailing-list thread (quoting myself for lack of a better source):

sd_journal_wait() will trigger on *any* events, while sd_journal_get_data() 
will apply the filter and find no matching entries.

In our context, Wait() will return as soon as there is a new message available in the log, even if it doesn't match the filter, in which case Next() will return EOF (if the expected entry is not yet available).
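
A minimal sketch of that race, using a hypothetical in-memory `fakeJournal` rather than the real sdjournal bindings. The type, its fields, and the match strings are invented for illustration; only the Wait-ignores-filter, Next-applies-filter behaviour is taken from the description above.

```go
package main

import (
	"fmt"
	"io"
)

// entry is a simplified journal record; a real journal entry has many fields.
type entry struct{ field, msg string }

// fakeJournal is a hypothetical stand-in for a journal reader.
type fakeJournal struct {
	entries []entry
	match   string // only entries whose field equals match are enumerated
	cursor  int    // index of the next unread entry
	seen    int    // number of entries Wait has already reported
}

// Wait reports whether anything was appended since the last call.
// It deliberately ignores the match filter.
func (j *fakeJournal) Wait() bool {
	changed := len(j.entries) > j.seen
	j.seen = len(j.entries)
	return changed
}

// Next advances to the next entry that passes the filter, or returns io.EOF.
func (j *fakeJournal) Next() (string, error) {
	for j.cursor < len(j.entries) {
		e := j.entries[j.cursor]
		j.cursor++
		if e.field == j.match {
			return e.msg, nil
		}
	}
	return "", io.EOF
}

func main() {
	j := &fakeJournal{match: "TEST_ID=42"}
	// An unrelated service logs first; our own entry has not landed yet.
	j.entries = append(j.entries, entry{"UNIT=other.service", "noise"})

	fmt.Println("woken:", j.Wait()) // woken: true (Wait fires on any event)
	_, err := j.Next()
	fmt.Println("next:", err) // next: EOF (nothing matches the filter yet)
}
```

A test that calls Wait once and then Next once would fail here even though nothing is actually broken: the expected entry simply arrives later.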

This sleep could be replaced with something like a Wait/Next loop with a retry counter, but that would not eliminate the flakiness of the test: it would then depend on the magic number of retries and also on how many events happen before our expected entry.
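
To illustrate why a retry counter only moves the problem, here is a rough model under a simplifying assumption: every Wait wake-up corresponds to exactly one journal event, matching or not. The function name `findWithRetries` and the event labels are hypothetical, and no real journal is involved.

```go
package main

import "fmt"

// findWithRetries models a Wait-then-Next loop capped at maxRetries.
// arrivals lists journal events in arrival order; each loop iteration
// consumes one event, whether or not it is the one we want.
func findWithRetries(arrivals []string, want string, maxRetries int) bool {
	for attempt := 0; attempt < maxRetries && attempt < len(arrivals); attempt++ {
		if arrivals[attempt] == want {
			return true
		}
	}
	return false
}

func main() {
	quiet := []string{"mine"}
	busy := []string{"noise", "noise", "noise", "mine"}

	fmt.Println(findWithRetries(quiet, "mine", 3)) // true
	fmt.Println(findWithRetries(busy, "mine", 3))  // false: the retry budget ran out
}
```

The same retry budget passes on a quiet host and fails on a busy one, so the outcome depends on unrelated journal traffic.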

We could loop over sd_journal_wait()/sd_journal_get_data() until we eventually get our message matching the field, but I agree this could end up in an endless loop.

Not really happy about the time.Sleep at all, so let's observe this in subsequent CI builds.
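
For reference, one way to keep the loop-until-match idea from running forever is to bound it with a deadline instead of a retry count. This is only a sketch and not what this PR does: `pollUntil`, `errTimeout`, and the `try` callback are hypothetical names, and `try` stands in for "Next() returned an entry matching the filter".

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the deadline passes without a match.
var errTimeout = errors.New("timed out waiting for matching entry")

// pollUntil calls try repeatedly, pausing interval between attempts,
// until try reports success or timeout has elapsed.
func pollUntil(timeout, interval time.Duration, try func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if try() {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Succeeds on the third attempt, however many unrelated events occurred.
	found := func() bool { calls++; return calls >= 3 }
	fmt.Println(pollUntil(time.Second, 10*time.Millisecond, found)) // <nil>

	never := func() bool { return false }
	fmt.Println(pollUntil(50*time.Millisecond, 10*time.Millisecond, never)) // timed out waiting for matching entry
}
```

The timeout is still a magic number, but it bounds wall-clock time rather than the number of unrelated events, and the test returns as soon as the entry shows up.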

Contributor Author

Me neither, but I think sdjournal would need to grow a C API for filtered event-triggering to properly address this use case. May I go ahead and merge this as-is for now?

Successfully merging this pull request may close these issues.

sdjournal: possible race in Match/Send/Wait/Next flow

3 participants