Introduce the ability to specify workload type for GetPosixStreamSignals #391

Merged
merged 23 commits into from
Feb 14, 2023

Conversation

JeroenSoeters
@JeroenSoeters JeroenSoeters commented Feb 10, 2023

This PR allows for specifying a workload when retrieving the POSIX signals stream. Right now only 1 type of workload is supported (Cell) and only a single Cell can be specified.

To achieve this, the eBPF probe now surfaces the cgroup_id, and the observe_service keeps a cache for looking up the cgroup_path for a given cgroup_id. With the cgroup_path, we can select only the signals relevant to a specific cell.
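The lookup-and-filter flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SignalEvent`, `CgroupCache`, and `signals_for_cell` are hypothetical names, the example paths are made up, and treating nested cgroups as part of the cell is an assumption.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// A POSIX signal event as the eBPF probe might surface it (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SignalEvent {
    pub cgroup_id: u64,
    pub pid: i32,
    pub signal: i32,
}

/// Hypothetical cache that maps a cgroup_id to its cgroup_path.
#[derive(Debug, Default)]
pub struct CgroupCache {
    paths: HashMap<u64, PathBuf>,
}

impl CgroupCache {
    /// Records (or overwrites) the path for a cgroup id.
    pub fn insert(&mut self, cgroup_id: u64, path: impl Into<PathBuf>) {
        let _ = self.paths.insert(cgroup_id, path.into());
    }

    /// Looks up the path for a cgroup id, if it is known.
    pub fn get(&self, cgroup_id: u64) -> Option<&Path> {
        self.paths.get(&cgroup_id).map(PathBuf::as_path)
    }
}

/// Keeps only the events whose cgroup path is the cell's path or nested under it.
/// Events with an unknown cgroup_id are dropped (an assumption of this sketch).
pub fn signals_for_cell(
    cache: &CgroupCache,
    cell_path: &Path,
    events: &[SignalEvent],
) -> Vec<SignalEvent> {
    events
        .iter()
        .filter(|e| {
            cache
                .get(e.cgroup_id)
                // Path::starts_with compares whole path components,
                // so "cell-ab" does not match the prefix "cell-a".
                .map_or(false, |p| p.starts_with(cell_path))
        })
        .cloned()
        .collect()
}
```

Comparing paths by component rather than by string prefix avoids accidentally matching a sibling cell whose name merely starts with the same characters.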

message GetPosixSignalsStreamRequest {
/// The scope of this request. If no scope is specified, a stream of all POSIX
/// signals on the host will be returned.
oneof workload {
Contributor

this might generate nicer code as an enum and an opaque string "id". oneof is pretty nasty, especially in some languages.

Contributor Author

We could introduce a "Workload" message with an enum and a string "id" and have this repeated. I need to think through what this means in terms of implementation details for this specific endpoint, though, as it could get pretty tricky (and not very intuitive) to aggregate POSIX signals for a mix of cells, containers, VMs, etc.
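For illustration, the enum-plus-opaque-id shape could look roughly like this once it reaches Rust. These are hand-written stand-ins, not generated code: the `Container` and `Vm` variants, the field names, and the `describe_scope` helper are assumptions made for this sketch, and only Cell is supported in this PR.

```rust
/// Hypothetical workload kinds. Only `Cell` is supported in this PR;
/// the other variants are placeholders for the discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkloadType {
    Cell,
    Container,
    Vm,
}

/// A workload identified by its kind plus an opaque string id.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Workload {
    pub workload_type: WorkloadType,
    pub id: String,
}

/// Request with an optional single workload; `None` means "whole host".
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct GetPosixSignalsStreamRequest {
    pub workload: Option<Workload>,
}

/// Describes which signals a request would select.
pub fn describe_scope(req: &GetPosixSignalsStreamRequest) -> String {
    match &req.workload {
        None => "all signals on the host".to_string(),
        Some(w) => format!("signals for {:?} '{}'", w.workload_type, w.id),
    }
}
```

With this shape a caller matches on one optional struct instead of a per-variant oneof wrapper, at the cost of the id being untyped.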

Contributor Author

If we start with the ability to stream signals for a single workload, we can potentially layer the ability to capture signals from multiple workloads on top in the future.

Contributor Author

What do you think about how the API looks now @dmah42? I will create a separate issue to discuss observing multiple workloads.

Contributor Author

Also, if we want to avoid using oneof b/c of how this gets compiled in different languages, should we capture this in the stdlib docs as a "general rule"? https://aurae.io/stdlib/

Contributor

we should probably point to https://google.aip.dev/ and specifically https://google.aip.dev/search?q=oneof which doesn't say avoid it, but it does provide some trade-offs.

message GetPosixSignalsStreamRequest {
/// The workload to which the response will be scoped. If no workload is
/// specified, a stream of all POSIX signals on the host will be returned.
Workload workload = 1;
Contributor

i personally prefer this (with notes) but i think it should be repeated.

Contributor Author

Should we iterate on the API later? Making this repeated warrants some discussion imo. For example, what would the response look like in these cases? For a single workload, the response is just a stream of signals. If you could specify multiple cells/containers etc., we would need to return tuples (id, signal). If we allow a mix of workload types, we need to return 3-tuples (workload_type, id, signal) for the user to make sense of the response. Maybe something for an RFC?
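The three response shapes mentioned above can be written down as Rust types to make the trade-off concrete. This is only a sketch of the idea under discussion: every name here (`Signal`, `WorkloadType`, the item aliases, `group_by_workload`) is hypothetical and none of it is part of the API in this PR.

```rust
use std::collections::BTreeMap;

/// A single observed signal (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Signal {
    pub signal: i32,
    pub pid: i32,
}

/// Hypothetical workload kinds for the mixed-type case.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum WorkloadType {
    Cell,
    Container,
}

/// Single workload: the stream items are just signals.
pub type SingleWorkloadItem = Signal;
/// Several workloads of one type: each item carries the workload id.
pub type MultiWorkloadItem = (String, Signal);
/// Mixed workload types: each item carries the type and the id.
pub type MixedWorkloadItem = (WorkloadType, String, Signal);

/// Client-side demultiplexing of a mixed stream into one bucket per workload.
pub fn group_by_workload(
    items: &[MixedWorkloadItem],
) -> BTreeMap<(WorkloadType, String), Vec<Signal>> {
    let mut grouped = BTreeMap::new();
    for (workload_type, id, signal) in items {
        grouped
            .entry((*workload_type, id.clone()))
            .or_insert_with(Vec::new)
            .push(*signal);
    }
    grouped
}
```

The key has to include the workload type as well as the id, since a cell and a container could share the same id.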

@JeroenSoeters JeroenSoeters marked this pull request as ready for review February 13, 2023 04:47
@future-highway future-highway left a comment


Just leaving comments. Didn't do a full review.

@@ -0,0 +1 @@
mod observe;
Contributor

Setup lints

// Lint groups: https://doc.rust-lang.org/rustc/lints/groups.html
#![warn(future_incompatible, nonstandard_style, unused)]
#![warn(
    improper_ctypes,
    non_shorthand_field_patterns,
    no_mangle_generic_items,
    unconditional_recursion,
    unused_comparisons,
    while_true
)]
#![warn(
    missing_debug_implementations,
    missing_docs,
    trivial_casts,
    trivial_numeric_casts,
    unused_extern_crates,
    unused_import_braces,
    unused_results
)]
#![warn(clippy::unwrap_used)]

@krisnova krisnova merged commit 86e9528 into aurae-runtime:main Feb 14, 2023