Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logs from different streams can arrive out of order #401

Merged
merged 2 commits into from
Dec 14, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 13 additions & 6 deletions lib/kubernetes-deploy/container_logs.rb
Original file line number Diff line number Diff line change
Expand Up @@ -62,16 +62,23 @@ def rfc3339_timestamp(time)

def deduplicate(logs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this deserves a new name now? sort_and_deduplicate? sort_and_uniq?

deduped = []
check_for_duplicate = true

logs.each do |line|
timestamp, msg = split_timestamped_line(line)
next if check_for_duplicate && likely_duplicate?(timestamp)
check_for_duplicate = false # logs are ordered, so once we've seen a new one, assume all subsequent logs are new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

womp womp lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

parsed_lines = logs.map { |line| split_timestamped_line(line) }
sorted_lines = parsed_lines.sort do |a, b|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about something like

sort do |(t1, _), (t2, _)|

then you can avoid the [0] below (and the name t makes it bit more obvious that it's the timestamp that's being sorted by).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'ld probably go so far as to call the timestamp1 and timestamp2

if a[0].nil? || b[0].nil?
0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Not sure I'm following this logic. 0 here means the two elements are treated as equal, right? Why is that what we want? I would have expected something like

if a[0].nil?
  -1
elsif b[0].nil?
  +1
else
  a[0] <=> b[0]
end

or maybe something like

(a[0] || 0) <=> (b[0] || 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that considering them equal was equivalent to leaving them alone, but that's completely wrong. It totally messes up the sort. Since the actual known case where the timestamp is missing is one where that defective line is the first, I think the second suggestion (assuming no timestamp means earliest) is a good approach.

else
a[0] <=> b[0]
end
end

sorted_lines.each do |line|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted_lines.each do |timestamp, msg|

saves you the next two lines

timestamp = line[0]
msg = line[1]
next if likely_duplicate?(timestamp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. likely_duplicate? means that timestamp <= @last_timestamp. But how can this ever be true now that we've sorted the logs by timestamp? Or is it possibly true because of a previous call to this method? i.e. an older value of @last_timestamp, not one that was set during this method call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it possibly true because of a previous call to this method

Exactly. We fetch logs (and then process them here) repeatedly, and since the API does not support subsecond granularity, we inevitably sometimes get the same log lines in more than one batch.

@last_timestamp = timestamp if timestamp
deduped << msg
end

deduped
end

Expand Down
37 changes: 37 additions & 0 deletions test/unit/container_logs_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,43 @@ def test_deduplication_works_when_exact_same_batch_is_returned_more_than_once
assert_logs_match("Line 2", 1)
end

def test_deduplication_works_when_last_line_is_out_of_order
regression_data = <<~STRING
2018-12-13T12:17:23.727605598Z Line 1
2018-12-13T12:17:23.727696012Z Line 2
2018-12-13T12:17:23.728538913Z Line 3
2018-12-13T12:17:23.7287293Z Line 4
2018-12-13T12:17:23.729694842Z Line 5
2018-12-13T12:17:23.731259592Z Line 7
2018-12-13T12:17:23.73127007Z Line 8
2018-12-13T12:17:23.731273672Z Line 9
2018-12-13T12:17:23.731276862Z Line 10
2018-12-13T12:17:23.731284069Z Line 11
2018-12-13T12:17:23.731287054Z Line 12
2018-12-13T12:17:23.731289959Z Line 13
2018-12-13T12:17:23.731292814Z Line 14
2018-12-13T12:17:23.731295298Z Line 15
2018-12-13T12:17:23.731297747Z Line 16
2018-12-13T12:17:23.731297748Z Line 17
2018-12-13T12:17:23.729851532Z Line 6
STRING

KubernetesDeploy::Kubectl.any_instance.stubs(:run)
.returns([regression_data, "", ""]).times(12)

12.times do
@logs.sync
@logs.print_latest
end

expected_lines = generate_log_messages(1..17)

expected_lines.each do |line|
assert_logs_match(/#{line}$/, 1) # no duplicates
end
assert_logs_match_all(expected_lines, in_order: true) # sorted correctly
end

private

def generate_log_messages(range)
Expand Down