v1.8: hubble/relay: flush old flows when the buffer drain timeout is reached #13877

Merged
merged 2 commits into from Nov 6, 2020

rolinh
Member

@rolinh rolinh commented Nov 3, 2020

This is a manual backport of #13776 to v1.8. The protobuf version used in v1.8 has a timestamp type without the AsTime() method, so the upstream change fails to compile there. This PR works around the problem with a helper that converts a timestamp.Timestamp to a time.Time object (the helper is covered by unit tests).

Ref: #13875 cc @tklauser


Once this PR is merged, you can update the PR labels via:

$ for pr in 13776; do contrib/backporting/set-labels.py $pr done 1.8; done

[ upstream commit 195351a ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
[ upstream commit 7d56068 ]

Draining only 1 flow every time the buffer drain timeout is reached
proves to be insufficient for certain cases. The worst case typically
happens for a request with historic data, some filters and follow-mode.
Using the Hubble CLI, this means requests like this one:

    hubble observe --last=100 --follow --pod kube-system/coredns-f9fd979d6-66mfg

Assume that 250 flows matching the request are stored in the Hubble
instances' ring buffers and that new flows matching the filter criteria
are infrequent. Before this patch, such a request would forward 150
flows, then drain and forward a single flow every time the drain timeout
is reached. For a buffer size of 100, it would therefore take at least
100 seconds (with the default buffer drain timeout of 1s) to drain the
buffer, unless enough new flows are received in the meantime to fill the
buffer again.

The new code drains all flows in the queue whose timestamp is older
than `now-bufferDrainTimeout`. For comparison, this is the old
behavior:

    ...
    QUEUE FULL draining flow 149
    QUEUE FULL draining flow 150
    TIMEOUT REACHED
    DRAINED 1 flow
    TIMEOUT REACHED
    DRAINED 1 flow
    TIMEOUT REACHED
    DRAINED 1 flow
    ...

And this is the new behavior:

    ...
    QUEUE FULL draining flow 149
    QUEUE FULL draining flow 150
    TIMEOUT REACHED
    DRAINED 100 flow
    TIMEOUT REACHED
    DRAINED 0 flow
    TIMEOUT REACHED
    DRAINED 0 flow
    ...

If new flows are received, but too few to fill the buffer within the
`bufferDrainTimeout` time window, the behavior would look something
like this:

    ...
    QUEUE FULL draining flow 149
    QUEUE FULL draining flow 150
    TIMEOUT REACHED
    DRAINED 100 flow
    TIMEOUT REACHED
    DRAINED 34 flow
    TIMEOUT REACHED
    DRAINED 72 flow
    ...

In effect, this means that a time window of `bufferDrainTimeout`
(1s by default) is used to sort flows before forwarding them to the
client. This dramatically improves the query experience for requests
in follow-mode.

Suggested-by: Tom Payne <tom@isovalent.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
@rolinh rolinh added the kind/backports and backport/1.8 labels Nov 3, 2020
@rolinh rolinh requested a review from a team as a code owner November 3, 2020 13:27
@rolinh rolinh requested a review from jrfastab November 3, 2020 13:27
@tklauser
Member

tklauser commented Nov 3, 2020

test-backport-1.8

Contributor

@jrfastab jrfastab left a comment

LGTM

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge label Nov 6, 2020
@jrfastab jrfastab merged commit f844457 into v1.8 Nov 6, 2020
@jrfastab jrfastab deleted the pr/1.8-hubble-relay-buffer-drain-backport branch November 6, 2020 20:27
@aanm aanm mentioned this pull request Dec 4, 2020
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
3 participants