Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add oom monitor for rustshim #229

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

aa624545345
Copy link
Contributor

Registering and reporting cgroup OOM events for Rust shim。 #227

Copy link
Member

@Burning1020 Burning1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your pr, some of the methods already exist, so please change your code, thanks.

crates/runc-shim/src/container.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Show resolved Hide resolved
crates/runc-shim/src/task.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/task.rs Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
@mxpv mxpv added the C-runc-shim Runc shim label Jan 3, 2024
@aa624545345 aa624545345 force-pushed the cgroup-memory-oom-monitor branch 2 times, most recently from 9d1ccc9 to 7833296 Compare January 4, 2024 09:16
@aa624545345
Copy link
Contributor Author

@mxpv @Burning1020 Hello, are there any further plans for this submission?

crates/runc-shim/src/cgroup_memory.rs Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Show resolved Hide resolved
crates/runc-shim/src/task.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
Copy link
Member

@mxpv mxpv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions.
Also I'd love to see some unit tests if possible.

crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/cgroup_memory.rs Outdated Show resolved Hide resolved
@Burning1020
Copy link
Member

Left a few suggestions. Also I'd love to see some unit tests if possible.

Here I suggest the UT could be:

  1. mock a remote publisher
  2. create a temp memory cgroup and set the limit, like 10K
  3. keep watching this cgroup path
  4. fork a child process that would apply for high memory, like 11K
match fork().unwrap() {
            ForkResult::Parent { child } => {
                let pid = child.as_raw() as u64;
                cg.add_task_by_tgid(CgroupPid::from(pid)).unwrap(); // add child to cgroup
                waitpid(Some(child), None).unwrap();
            }
            ForkResult::Child => {
                std::thread::sleep(Duration::from_millis(100));
                let mut v: Vec<u8> = vec![];
                v.resize(11 * 1024 * 1024, 0);
                exit(0);
            }
        };
  1. make sure OOM event received
  2. reasource cleanup

@aa624545345
Copy link
Contributor Author

Left a few suggestions. Also I'd love to see some unit tests if possible.

Here I suggest the UT could be:

  1. mock a remote publisher
  2. create a temp memory cgroup and set the limit, like 10K
  3. keep watching this cgroup path
  4. fork a child process that would apply for high memory, like 11K
match fork().unwrap() {
            ForkResult::Parent { child } => {
                let pid = child.as_raw() as u64;
                cg.add_task_by_tgid(CgroupPid::from(pid)).unwrap(); // add child to cgroup
                waitpid(Some(child), None).unwrap();
            }
            ForkResult::Child => {
                std::thread::sleep(Duration::from_millis(100));
                let mut v: Vec<u8> = vec![];
                v.resize(11 * 1024 * 1024, 0);
                exit(0);
            }
        };
  1. make sure OOM event received
  2. reasource cleanup

Thanks for your comments.

@aa624545345
Copy link
Contributor Author

@Burning1020 @mxpv Hello, there are two points that i]m not quite sure about this PR:

  1. This commit only works on cgroup v1, i think it is necessary to add detection for thel cgroup version of the rutime enviroment.(Impelemention of cgroup v2 canbe supported in a future update).
  2. Regarding the mentioned UT tests, should i commit it in another PR?

@Burning1020
Copy link
Member

@Burning1020 @mxpv Hello, there are two points that i]m not quite sure about this PR:

  1. This commit only works on cgroup v1, i think it is necessary to add detection for thel cgroup version of the rutime enviroment.(Impelemention of cgroup v2 canbe supported in a future update).
  2. Regarding the mentioned UT tests, should i commit it in another PR?
  1. Yes, I agree, cgroup v2 can be supported in future.
  2. It's better to keep UT with your code in the same PR.

@aa624545345 aa624545345 force-pushed the cgroup-memory-oom-monitor branch 4 times, most recently from 796f337 to e6cc1b4 Compare February 1, 2024 09:34
@codecov-commenter
Copy link

codecov-commenter commented Feb 1, 2024

Codecov Report

Attention: 154 lines in your changes are missing coverage. Please review.

❗ No coverage uploaded for pull request base (main@9ac1f26). Click here to learn what that means.

Files Patch % Lines
crates/runc-shim/src/cgroup_memory.rs 3.07% 126 Missing ⚠️
crates/runc-shim/src/task.rs 0.00% 28 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #229   +/-   ##
=======================================
  Coverage        ?   36.82%           
=======================================
  Files           ?       56           
  Lines           ?     5241           
  Branches        ?        0           
=======================================
  Hits            ?     1930           
  Misses          ?     3311           
  Partials        ?        0           
Flag Coverage Δ
unittests 36.82% <2.53%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aa624545345
Copy link
Contributor Author

aa624545345 commented Feb 4, 2024

@Burning1020 @mxpv @kzys hello, commit contains ut is pushed.

@aa624545345
Copy link
Contributor Author

@mxpv @Burning1020 Revised according to the feedback, please review, thanks.

@mxpv
Copy link
Member

mxpv commented Feb 22, 2024

CI needs some work.
Other than that, looks good.

Copy link
Member

@Burning1020 Burning1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@aa624545345 aa624545345 force-pushed the cgroup-memory-oom-monitor branch 3 times, most recently from ffb2407 to dbd3a4e Compare February 26, 2024 10:49
@aa624545345
Copy link
Contributor Author

aa624545345 commented Feb 27, 2024

@Burning1020 @mxpv
To fix CI, make the following modifications:

  1. Use fmt to format code;
  2. Add conditional compilation #[cfg(target_os = "linux")];
  3. Handling this error scenario, when registering oom monitor, trying to query the cgroup information of the container's main process, if the container has already exited, get_path_from_cgorup can't find the cgroup information of this container's main process, then this line will return the error directly:
let path_from_cgorup = cgroup_memory::get_path_from_cgorup(resp.pid).await?;

Modify the logic of registring the oom monitor to, during the construction of oom monitor, when error happens, not return, but instead log the error msgs.

#[cfg(target_os = "linux")]
async fn monitor_oom(id: &String, pid: u32, tx: EventSender) -> Result<()> {
    if !is_cgroup2_unified_mode() {
        let path_from_cgorup = cgroup_memory::get_path_from_cgorup(pid).await?;
        let (mount_root, mount_point) =
            cgroup_memory::get_existing_cgroup_mem_path(path_from_cgorup).await?;

        let mem_cgroup_path = mount_point + &mount_root;
        let rx = cgroup_memory::register_memory_event(
            id,
            Path::new(&mem_cgroup_path),
            "memory.oom_control",
        )
        .await
        .map_err(other_error!(e, "register_memory_event failed:"))?;

        run_oom_monitor(rx, id.to_string(), tx);
    }
    Ok(())
}
async fn start(&self, _ctx: &TtrpcContext, req: StartRequest) -> TtrpcResult<StartResponse> {
......
            #[cfg(target_os = "linux")]
            if let Err(e) = monitor_oom(&req.id, resp.pid, self.tx.clone()).await {
                error!("monitor_oom failed: {:?}.", e);
            }
......
}

@mxpv mxpv added this pull request to the merge queue Mar 6, 2024
Merged via the queue into containerd:main with commit 21973bd Mar 6, 2024
18 checks passed
@aa624545345
Copy link
Contributor Author

@mxpv @Burning1020 Thx for the review, is cgroup2 version of oom monitor needed?

@mxpv
Copy link
Member

mxpv commented Mar 12, 2024

@aa624545345 what would be nice to have!

Burning1020 added a commit to Burning1020/rust-extensions that referenced this pull request May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-runc-shim Runc shim
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants