Update omrsysinfo_cgroup_subsystem_iterator functions for cgroup v2 #6494

Merged: 1 commit merged on May 17, 2022

Conversation

EricYangIBM
Contributor

Update cgroup iterator functions to retrieve metrics from cgroup v2
files.

Issue: #1281
Signed-off-by: Eric Yang <eric.yang@ibm.com>

@EricYangIBM
Contributor Author

@babsingh iterator for v2 memory subsystem is done and working. Can you take an early look?

@babsingh
Contributor

> Can you take an early look?

Yes, I will do one by the end of the day.

Contributor

@babsingh babsingh left a comment

First review pass.

port/unix/omrsysinfo.c: 6 resolved review threads (outdated)
Contributor

@babsingh babsingh left a comment

Minor formatting suggestions. I pinged @0xdaryl on Slack to get #6465 and #6479 merged so that this PR can be rebased. More feedback on the functional behaviour to follow.

port/unix/omrsysinfo.c: 7 resolved review threads (outdated)
Contributor

@babsingh babsingh left a comment

For verification, can you provide the Cgroup Information Section from a javacore for cgroup v1 and v2?

Examples in https://blog.openj9.org/2019/04/22/cgroup-metrics-now-available-in-javacore/.

port/unix/omrsysinfo.c: 3 resolved review threads (2 outdated)
@EricYangIBM
Contributor Author

Cgroup v1

Host

2CICGRPINFO    subsystem : cpu
2CICGRPINFO    cgroup name : /user.slice
3CICGRPINFO        CPU Period : 100000 microseconds
3CICGRPINFO        CPU Quota : Not Set
3CICGRPINFO        CPU Shares : 1024
3CICGRPINFO        Period intervals elapsed count : 0
3CICGRPINFO        Throttled count : 0
3CICGRPINFO        Total throttle time : 0 nanoseconds
2CICGRPINFO    subsystem : memory
2CICGRPINFO    cgroup name : /user.slice/user-0.slice/session-1.scope
3CICGRPINFO        Memory Limit : Not Set
3CICGRPINFO        Memory + Swap Limit : Not Set
3CICGRPINFO        Memory Usage : 1309958144 bytes
3CICGRPINFO        Memory + Swap Usage : 1309958144 bytes
3CICGRPINFO        Memory Max Usage : 1309958144 bytes
3CICGRPINFO        Memory + Swap Max Usage : 1309958144 bytes
3CICGRPINFO        Memory limit exceeded count : 0
3CICGRPINFO        Memory + Swap limit exceeded count : 0
3CICGRPINFO        OOM Killer Disabled : 0
3CICGRPINFO        Under OOM : 0
2CICGRPINFO    subsystem : cpuset
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU exclusive : 1
3CICGRPINFO        Mem exclusive : 1
3CICGRPINFO        CPUs : 0-7
3CICGRPINFO        Mems : 0

Container no limits

2CICGRPINFO    subsystem : cpu
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU Period : 100000 microseconds
3CICGRPINFO        CPU Quota : Not Set
3CICGRPINFO        CPU Shares : 1024
3CICGRPINFO        Period intervals elapsed count : 0
3CICGRPINFO        Throttled count : 0
3CICGRPINFO        Total throttle time : 0 nanoseconds
2CICGRPINFO    subsystem : memory
2CICGRPINFO    cgroup name : /
3CICGRPINFO        Memory Limit : Not Set
3CICGRPINFO        Memory + Swap Limit : Not Set
3CICGRPINFO        Memory Usage : 35131392 bytes
3CICGRPINFO        Memory + Swap Usage : 35131392 bytes
3CICGRPINFO        Memory Max Usage : 35135488 bytes
3CICGRPINFO        Memory + Swap Max Usage : 35135488 bytes
3CICGRPINFO        Memory limit exceeded count : 0
3CICGRPINFO        Memory + Swap limit exceeded count : 0
3CICGRPINFO        OOM Killer Disabled : 0
3CICGRPINFO        Under OOM : 0
2CICGRPINFO    subsystem : cpuset
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU exclusive : 0
3CICGRPINFO        Mem exclusive : 0
3CICGRPINFO        CPUs : 0-7
3CICGRPINFO        Mems : 0

Container with limits: docker run -v /root/docker_hostdir:/root/hostdir -it -m 4GB --cpu-period=1000 --cpu-quota=10000 --cpuset-cpus=1,2,4-5 --rm openj9

2CICGRPINFO    subsystem : cpu
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU Period : 1000 microseconds
3CICGRPINFO        CPU Quota : 10000 microseconds
3CICGRPINFO        CPU Shares : 1024
3CICGRPINFO        Period intervals elapsed count : 92
3CICGRPINFO        Throttled count : 0
3CICGRPINFO        Total throttle time : 0 nanoseconds
2CICGRPINFO    subsystem : memory
2CICGRPINFO    cgroup name : /
3CICGRPINFO        Memory Limit : 4294967296 bytes
3CICGRPINFO        Memory + Swap Limit : 8589934592 bytes
3CICGRPINFO        Memory Usage : 24539136 bytes
3CICGRPINFO        Memory + Swap Usage : 24539136 bytes
3CICGRPINFO        Memory Max Usage : 24543232 bytes
3CICGRPINFO        Memory + Swap Max Usage : 24543232 bytes
3CICGRPINFO        Memory limit exceeded count : 0
3CICGRPINFO        Memory + Swap limit exceeded count : 0
3CICGRPINFO        OOM Killer Disabled : 0
3CICGRPINFO        Under OOM : 0
2CICGRPINFO    subsystem : cpuset
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU exclusive : 0
3CICGRPINFO        Mem exclusive : 0
3CICGRPINFO        CPUs : 1-2,4-5
3CICGRPINFO        Mems : 0
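
The CPUs value above uses the cpuset list format (comma-separated ids and inclusive ranges), which is why `--cpuset-cpus=1,2,4-5` is reported as `1-2,4-5`. As a minimal sketch, and not the code from this PR, the number of CPUs in such a list could be counted as follows. `count_cpus_in_list` is a hypothetical helper name, and the input is assumed to have been read from the file already.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of omrsysinfo.c): count the CPUs named in a
 * cpuset list such as "1-2,4-5". Returns -1 if the text is malformed. */
int
count_cpus_in_list(const char *list)
{
	int count = 0;
	const char *cursor = list;

	assert(NULL != list);

	/* Stop at the end of the string or at a trailing newline. */
	while (('\0' != *cursor) && ('\n' != *cursor)) {
		char *end = NULL;
		long first = strtol(cursor, &end, 10);
		long last = first;

		if ((end == cursor) || (first < 0)) {
			return -1; /* no digits found, or a negative id */
		}
		cursor = end;
		if ('-' == *cursor) {
			/* A range "first-last", inclusive on both ends. */
			cursor += 1;
			last = strtol(cursor, &end, 10);
			if ((end == cursor) || (last < first)) {
				return -1;
			}
			cursor = end;
		}
		count += (int)(last - first + 1);
		if (',' == *cursor) {
			cursor += 1;
		} else if (('\0' != *cursor) && ('\n' != *cursor)) {
			return -1; /* unexpected separator */
		}
	}
	return count;
}
```

For the list reported above, `count_cpus_in_list("1-2,4-5")` yields 4, and the host value `0-7` yields 8.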

Cgroup v2

Host

2CICGRPINFO    subsystem : memory
2CICGRPINFO    cgroup name : /user.slice/user-0.slice/session-876.scope
3CICGRPINFO        Memory Limit : Not Set
3CICGRPINFO        Swap Limit : Not Set
3CICGRPINFO        Memory Usage : 1619886080 bytes
3CICGRPINFO        Swap Usage : 64802816 bytes
3CICGRPINFO        Approached memory limit count : 1246
3CICGRPINFO        Reached memory limit count : 0
3CICGRPINFO        Approached swap limit count : 0
3CICGRPINFO        Swap alloc failed count : 0

Container no limits

2CICGRPINFO    subsystem : memory
2CICGRPINFO    cgroup name : /
3CICGRPINFO        Memory Limit : Not Set
3CICGRPINFO        Swap Limit : Not Set
3CICGRPINFO        Memory Usage : 2601267200 bytes
3CICGRPINFO        Swap Usage : 0 bytes
3CICGRPINFO        Approached memory limit count : 0
3CICGRPINFO        Reached memory limit count : 0
3CICGRPINFO        Approached swap limit count : 0
3CICGRPINFO        Swap alloc failed count : 0
2CICGRPINFO    subsystem : cpuset
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPUs : Not Set
3CICGRPINFO        Mems : Not Set
3CICGRPINFO        Effective CPUs : 0-7
3CICGRPINFO        Effective Mems : 0
2CICGRPINFO    subsystem : cpu
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU Quota : Not Set
3CICGRPINFO        CPU Period : 100000 microseconds
3CICGRPINFO        CPU Weight relative to procs in same cgroup : 100
3CICGRPINFO        Period intervals elapsed count : 0
3CICGRPINFO        Throttled count : 0
3CICGRPINFO        Total throttle time : 0 microseconds

Container with limits: docker run -v /root/docker_hostdir:/root/hostdir -it -m 4GB --cpu-period=1000 --cpu-quota=10000 --cpuset-cpus=1,2,4-5 --rm openj9

2CICGRPINFO    subsystem : memory
2CICGRPINFO    cgroup name : /
3CICGRPINFO        Memory Limit : 4294967296 bytes
3CICGRPINFO        Swap Limit : 4294967296 bytes
3CICGRPINFO        Memory Usage : 22908928 bytes
3CICGRPINFO        Swap Usage : 0 bytes
3CICGRPINFO        Approached memory limit count : 0
3CICGRPINFO        Reached memory limit count : 0
3CICGRPINFO        Approached swap limit count : 0
3CICGRPINFO        Swap alloc failed count : 0
2CICGRPINFO    subsystem : cpuset
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPUs : 1-2,4-5
3CICGRPINFO        Mems : Not Set
3CICGRPINFO        Effective CPUs : 1-2,4-5
3CICGRPINFO        Effective Mems : 0
2CICGRPINFO    subsystem : cpu
2CICGRPINFO    cgroup name : /
3CICGRPINFO        CPU Quota : 10000 microseconds
3CICGRPINFO        CPU Period : 1000 microseconds
3CICGRPINFO        CPU Weight relative to procs in same cgroup : 100
3CICGRPINFO        Period intervals elapsed count : 103
3CICGRPINFO        Throttled count : 1
3CICGRPINFO        Total throttle time : 224 microseconds
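
In cgroup v2 the CPU quota and period are exposed together in the `cpu.max` file as `$MAX $PERIOD`, where the literal `max` means that no quota is set. That would correspond to the `Not Set` quota in the no-limits output above. The following is a minimal sketch of parsing that format, assuming the file content has already been read into a string. `parse_cpu_max` is a hypothetical helper, not the function used in omrsysinfo.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of omrsysinfo.c): parse the content of a
 * cgroup v2 cpu.max file, e.g. "10000 1000" or "max 100000".
 * On success returns 0; *quota is -1 when the quota field is "max". */
int
parse_cpu_max(const char *text, long *quota, long *period)
{
	char quotaField[32];
	long periodValue = 0;

	assert((NULL != text) && (NULL != quota) && (NULL != period));

	/* The file holds two whitespace-separated fields: quota and period. */
	if (2 != sscanf(text, "%31s %ld", quotaField, &periodValue)) {
		return -1;
	}
	if (0 == strcmp(quotaField, "max")) {
		*quota = -1; /* no quota configured */
	} else {
		char *end = NULL;
		long quotaValue = strtol(quotaField, &end, 10);

		if ((end == quotaField) || ('\0' != *end)) {
			return -1; /* quota field is not a plain number */
		}
		*quota = quotaValue;
	}
	*period = periodValue;
	return 0;
}
```

With the limits used above, the input `10000 1000` gives a quota of 10000 and a period of 1000, both in microseconds.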

@babsingh
Contributor

re #6494 (comment):

In the cgroup v2 Host output, why are the cpu and cpuset subsystems not shown?

@EricYangIBM
Contributor Author

This is due to differences in the v2 hierarchy. In v2, if a subsystem is not listed in a cgroup's cgroup.controllers file, the subsystem is not active for that cgroup, so no limits are set for it.
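
A cgroup's `cgroup.controllers` file holds a space-separated list of the controllers available to it. As a rough sketch of the check described here, assuming the file content has already been read into a string (`controller_listed` is a hypothetical helper, not the PR's code):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of omrsysinfo.c): return 1 if `name` appears
 * as a whole token in the content of a cgroup.controllers file, else 0. */
int
controller_listed(const char *controllers, const char *name)
{
	size_t nameLength = 0;
	const char *cursor = controllers;

	assert((NULL != controllers) && (NULL != name));

	nameLength = strlen(name);
	if (0 == nameLength) {
		return 0;
	}

	while ('\0' != *cursor) {
		size_t tokenLength = 0;

		/* Skip separators, then measure the next token. */
		cursor += strspn(cursor, " \t\n");
		tokenLength = strcspn(cursor, " \t\n");

		/* Compare whole tokens so that "cpu" does not match "cpuset". */
		if ((tokenLength == nameLength) && (0 == strncmp(cursor, name, nameLength))) {
			return 1;
		}
		cursor += tokenLength;
	}
	return 0;
}
```

A content string such as `memory pids` would then report `cpu` and `cpuset` as not listed, consistent with the host output showing only the memory subsystem.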

@babsingh
Contributor

OSX failures: https://github.com/eclipse/omr/pull/6494/checks?check_run_id=6395271683

These failures are in the socket library tests and are probably caused by network issues.

31: [  FAILED  ] PortSockTest.poll_functionality_basic (1001 ms)
31: /Users/runner/work/1/s/fvtest/porttest/omrsockTest.cpp:51: Failure
31:       Expected: privateOmrPortLibrary->sock_bind(privateOmrPortLibrary, *serverSocket, serverSockAddr)
31:       Which is: -506
31: To be equal to: 0
31: /Users/runner/work/1/s/fvtest/porttest/omrsockTest.cpp:1259: Failure
31:       Expected: privateOmrPortLibrary->sock_accept(privateOmrPortLibrary, serverSocket, &connectedServerSockAddr, &connectedServerSocket)
31:       Which is: -20
31: To be equal to: 0
31: [  FAILED  ] PortSockTest.poll_functionality_many_sockets (1 ms)
31: [----------] 19 tests from PortSockTest (1006 ms total)
31: 
31: [==========] 236 tests from 20 test cases ran. (91357 ms total)
31: [  PASSED  ] 234 tests.
31: [  FAILED  ] 2 tests, listed below:
31: [  FAILED  ] PortSockTest.poll_functionality_basic
31: [  FAILED  ] PortSockTest.poll_functionality_many_sockets

@babsingh
Contributor

jenkins build all

@babsingh
Contributor

Compilation error:

08:42:34  /home/jenkins/workspace/Build/port/unix/omrsysinfo.c:6835:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
08:42:34      for (int32_t i = 0; i < state->fileMetricCounter; i++) {
08:42:34      ^

@babsingh
Contributor

jenkins build all

@babsingh
Contributor

jenkins build all

@babsingh
Contributor

@EricYangIBM The OSX failures happen consistently. Can you confirm that the OSX failures reported in #6494 (comment) are not caused by this PR?

@EricYangIBM EricYangIBM force-pushed the iteratorCgroup branch 2 times, most recently from fd0f8c5 to 4e84385 on May 12, 2022 17:00
@EricYangIBM
Contributor Author

Rebasing didn't change anything, but reverting the commit stopped the failures. All of the commit's changes are inside Linux-only #ifdef blocks, so I don't know how this PR could cause the macOS failures.

@EricYangIBM
Contributor Author

It seems that the most recent builds passed.

@babsingh
Contributor

jenkins build all

@babsingh
Contributor

babsingh commented May 13, 2022

Restarting the zOS job since it timed out in finding a node.

@babsingh
Contributor

jenkins build zos

@EricYangIBM
Contributor Author

All zos nodes are offline

@babsingh
Contributor

> All zos nodes are offline

@jdekonin @AdamBrousseau Were the zOS machines turned off intentionally, or are we experiencing an outage? Is there a timeline for when they will be back online?

@jdekonin
Contributor

If there was an outage, I wasn't advised, or maybe I missed the email. Both appear to be online now, though.

@babsingh
Contributor

jenkins build zos

Contributor

@dsouzai dsouzai left a comment

Overall looks fine; just had a few questions.

port/unix/omrsysinfo.c: 6 resolved review threads
Update cgroup iterator functions to retrieve metrics from cgroup v2
files.

Issue: eclipse-omr#1281
Signed-off-by: Eric Yang <eric.yang@ibm.com>
@dsouzai
Contributor

dsouzai commented May 16, 2022

jenkins build all

@mpirvu
Contributor

mpirvu commented May 17, 2022

Tests have passed.
@EricYangIBM is this the last PR that is needed to have cgroup v2 support in OpenJ9? Thanks

@EricYangIBM
Contributor Author

Yes, all that remains after this is related to testing (both v1 and v2)

@dsouzai dsouzai merged commit 5e51a12 into eclipse-omr:master May 17, 2022
@mpirvu
Contributor

mpirvu commented May 17, 2022

Great work! Thanks!

@EricYangIBM EricYangIBM deleted the iteratorCgroup branch May 17, 2022 14:20