Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble/recorder: Extend the API to allow stopping a recording automatically #16473

Merged
merged 3 commits into from Aug 10, 2021

Conversation

gandro
Copy link
Member

@gandro gandro commented Jun 8, 2021

This implements a feature to stop a recording as soon as one of
three optional stop conditions are hit. If a stop condition is hit, the
RecordingStoppedResponse is sent to the client the same way
it would if the client requested an explicit stop request.

The StartRecording API message is extended with a new (optional) StopCondition field:

// StopCondition defines one or more conditions which cause the recording to
// stop after they have been hit. Stop conditions are ignored if they are
// absent or zero-valued. If multiple conditions are defined, the recording
// stops after the first one is hit.
message StopCondition {
    // bytes_captured stops the recording after at least this many bytes have
    // been captured. Note: The resulting file might be slightly larger due
    // to added pcap headers.
    uint64 bytes_captured = 1;
    // bytes_captured stops the recording after at least this many packets have
    // been captured.
    uint64 packets_captured = 2;
    // nanoseconds_elapsed stops the recording after this duration has elapsed.
    uint64 nanoseconds_elapsed = 3;
}

We ensure that the stop conditions are hit as early as possible,
to avoid writing more bytes or packets than necessary.
While this can also be implemented client-side by observing the
statistics of the running recording and issuing an explicit stop
request, it can be useful to implement some basic stop conditions
server-side as a fail-safe in case the client becomes unresponsive but
is still connected.

@gandro gandro added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay labels Jun 8, 2021
@stale

This comment has been minimized.

@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 11, 2021
@gandro gandro removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 12, 2021
@gandro gandro force-pushed the pr/gandro/hubble-recorder-stop-condition branch from 1ec77a5 to 426d55c Compare July 28, 2021 10:22
@gandro gandro marked this pull request as ready for review July 28, 2021 10:22
@gandro gandro requested a review from a team as a code owner July 28, 2021 10:22
@gandro gandro requested review from a team, nathanjsweet and glibsm July 28, 2021 10:22
@gandro
Copy link
Member Author

gandro commented Jul 28, 2021

test-me-please

Comment on lines 63 to 69
// bytes_captured stops the recording after at least this many bytes have
// been captured. Note: The resulting file might be slightly larger due
// to added pcap headers.
uint64 bytes_captured = 1;
// bytes_captured stops the recording after at least this many packets have
// been captured.
uint64 packets_captured = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think that bytes_captured_count and packets_captured_count would be better names because the current ones sound a bit like they'd be boolean values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also num_bytes and num_captured could work or even bytes/packets one word

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will change!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with packets/bytes_captured_count. The reason I prefer to actually include captured is to make clear that it refers to these values here:

message RecordingStatistics {
// bytes_captured is the total amount of bytes captured in the recording
uint64 bytes_captured = 1;
// packets_captured is the total amount of packets captured the recording
uint64 packets_captured = 2;
// packets_lost is the total amount of packets matching the filter during
// the recording, but never written to the sink because it was overloaded.
uint64 packets_lost = 3;
// bytes_lost is the total amount of bytes matching the filter during
// the recording, but never written to the sink because it was overloaded.
uint64 bytes_lost = 4;
}

We might want to add other filters in the futures, e.g. bytes_lost_count or maybe even bytes_written_count (for the actual bytes written to disk, which includes things like pcap headers)

api/v1/recorder/recorder.proto Outdated Show resolved Hide resolved
Comment on lines 63 to 69
// bytes_captured stops the recording after at least this many bytes have
// been captured. Note: The resulting file might be slightly larger due
// to added pcap headers.
uint64 bytes_captured = 1;
// bytes_captured stops the recording after at least this many packets have
// been captured.
uint64 packets_captured = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also num_bytes and num_captured could work or even bytes/packets one word

pkg/hubble/recorder/sink/dispatch.go Show resolved Hide resolved
@gandro
Copy link
Member Author

gandro commented Jul 28, 2021

@aanm
Copy link
Member

aanm commented Jul 28, 2021

/mlh new-flake Cilium-PR-K8s-1.19-kernel-5.4

👍 created #17010

This defines an additional set of conditions which can be submitted when
a recording is started to automatically stop it after any of the
conditions have been hit.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This implements a feature to stop a recording as soon as one of the
three stop conditions are hit. We push the conditions into the sink, to
ensure they are hit as early as possible, e.g. to avoid writing more
bytes or packets than necessary.

While this can also be implemented client-side by observing the
statistics of the running recording and issuing an explicit stop
request, it can be useful to implement some basic stop conditions
server-side as a fail-safe in case the client becomes unresponsive but
is still connected.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This ensures the stop timer is not started before it can receive any
data.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/hubble-recorder-stop-condition branch from 426d55c to 427caea Compare July 28, 2021 16:43
@gandro gandro requested review from glibsm and rolinh July 28, 2021 16:44
@gandro
Copy link
Member Author

gandro commented Aug 9, 2021

test-me-please

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 9, 2021
@gandro gandro merged commit 3574300 into cilium:master Aug 10, 2021
gandro added a commit to cilium/hubble that referenced this pull request Aug 11, 2021
This pulls in the Hubble API definitions from
cilium/cilium#16473

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to cilium/hubble that referenced this pull request Aug 11, 2021
This pulls in the Hubble API definitions from
cilium/cilium#16473

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
michi-covalent pushed a commit to cilium/hubble that referenced this pull request Aug 11, 2021
This pulls in the Hubble API definitions from
cilium/cilium#16473

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants