Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for cgroup subsystem availability #6520

Merged
merged 1 commit into from
Jun 6, 2022

Conversation

EricYangIBM
Copy link
Contributor

@EricYangIBM EricYangIBM commented May 17, 2022

Add tests for omrsysinfo_cgroup_is_system_available, get_available_subsystems,
are_subsystems_available, get_enabled_subsystems, enable_subsystems, and
are_subsystems_enabled. Also add test for omrsysinfo_get_cgroup_subsystem_list.

Issue: #1281
Signed-off-by: Eric Yang eric.yang@ibm.com

@EricYangIBM
Copy link
Contributor Author

@babsingh Most of the test code is copied from the implementation. If I were to continue to add tests this way there will be a lot of test code, especially for the iterator functions where I would have to copy over the structs as well. However there is no way to check if the api returns correct values other than reading from the cgroup and cgroup metric files. Would it be better to just check that the values returned by the api make sense rather than checking if they are the same as the values in the cgroup files?

@babsingh
Copy link
Contributor

@EricYangIBM Copying code from the actual implementation is not the only path. In the tests, we just need an alternate implementation to derive the values from cgroup and cgroup metric files for verification purposes. Since this is test code, we can incorporate more expensive approaches such as using C++ libraries and third-party libraries to achieve our goals. This would reduce the amount of work to derive the values from cgroup and cgroup metric files.

Example libraries which can be used for verification:

Copying code from the actual implementation also has a drawback: if our actual implementation has a bug it won't be caught in our tests.

I will leave it up to you on which libraries to incorporate in the test code. We want to choose an approach which will minimize test code and our effort. For cgroup, we only need something that works on all Linux platforms (x, p, z, arm, arch). But, it will be preferable to use libraries which will work on all OMR supported platforms. This will allow your test examples to be used by others in the future due to their platform agnostic property.

Would it be better to just check that the values returned by the api make sense rather than checking if they are the same as the values in the cgroup files?

This won't offer complete verification. There is a probability that the values returned by the API may seem valid (make sense) but they can still be incorrect.

@EricYangIBM
Copy link
Contributor Author

I haven't been able to find any third party libraries to match our api. If we use the c++ standard libraries the test code will still have the same logic as our api's and will have similar complexity and amount of code.

@babsingh
Copy link
Contributor

babsingh commented May 18, 2022

Check that the values returned by the api make sense rather than checking if they are the same as the values in the cgroup files

@EricYangIBM Can you provide more details on how the values will be checked? We can get a second opinion if this approach will be sufficient.

@pshipton @tajila for a second opinion and ideas for easily achieving complete verification.

Note: Values generated by some of the cgroup API can be used by autoscalers which rely upon information about system resources (CPU, memory) to make decisions. Having incorrect values will lead to bad decisions in terms of perf, or potentially fatal decisions if the resource limit is exceeded.

@EricYangIBM
Copy link
Contributor Author

Maybe something like https://github.com/eclipse/omr/blob/974dee8597698debd9babcd12af9d4aa4b207cb4/fvtest/porttest/si.cpp#L1245? It checks that no errors are returned and that the memory limit is >0 and the usage is < limit. There will be no way to check some of the metrics e.g. Throttled count, oom_kill_disable (a boolean).
Also the subsystem availability functions cannot be tested this way either, since a return value of 0 is also valid. I would only be able to check for (>=0 && <=7) for the bits of the three subsystems. Perhaps I could convert the code currently in this PR to c++ to check for available subsystems?

@babsingh
Copy link
Contributor

We can prioritize metric values related to perf (CPU, memory) with stricter verification. We can minimize work by using a set of generic functions which will read a specified file and apply a regex to derive the required value.

Others can have simple bound checks as the sysinfo_testMemoryInfo example.

For the no way to check metrics, we can add a note suggesting why they can't be verified.

Perhaps I could convert the code currently in this PR to c++ to check for available subsystems?

Sure. All cgroup tests will be consistent i.e. written in C++. It will be an alternate impl. The drawback of reusing the current impl won't apply. It should be less code. Plus, you will get C++ experience.

@EricYangIBM EricYangIBM force-pushed the cgroup_test_availability branch 2 times, most recently from 191eba3 to 4ede54e Compare May 24, 2022 17:05
@EricYangIBM
Copy link
Contributor Author

Added tests

@EricYangIBM EricYangIBM marked this pull request as ready for review May 25, 2022 15:32
@babsingh
Copy link
Contributor

babsingh commented May 25, 2022

Added tests

@EricYangIBM Is this PR good to be reviewed? I see pull request as ready for review 4 hours ago. I will review it.

@EricYangIBM
Copy link
Contributor Author

Yes

@babsingh babsingh self-assigned this May 25, 2022
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
@babsingh
Copy link
Contributor

jenkins build all

Copy link
Contributor

@babsingh babsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are failing: https://ci.eclipse.org/omr/job/PullRequest-linux_x86/2795/consoleFull

14:18:25  6: [  FAILED  ] CgroupTest.sysinfo_cgroup_get_available_subsystems
14:18:25  6: [  FAILED  ] CgroupTest.sysinfo_cgroup_are_subsystems_available
14:18:25  6: [  FAILED  ] CgroupTest.sysinfo_cgroup_enable

Infra changes, for running tests on cgroup v1/v2 systems, are still WIP: #6525.

For the time being, can you locally verify if these tests pass on both cgroup v1 and v2 systems?

Also, the commits can be squashed.

fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
fvtest/porttest/si.cpp Outdated Show resolved Hide resolved
@EricYangIBM
Copy link
Contributor Author

jenkins build linux

@EricYangIBM
Copy link
Contributor Author

jenkins build xlinux

@babsingh
Copy link
Contributor

jenkins build linux

@babsingh
Copy link
Contributor

jenkins build all

@babsingh
Copy link
Contributor

@EricYangIBM Not everyone can launch PR builds. Message me on Slack, and I will launch them for you.

@EricYangIBM
Copy link
Contributor Author

I think the error is in the regex instantiation: std::regex v1Regex(R"(^[0-9]+:([^:]*):(.+)$)");
I found this https://stackoverflow.com/questions/15671536/why-does-this-c11-stdregex-example-throw-a-regex-error-exception
Is the gcc version for the launched PR builds machines different than that of the automatic PR builds machines?

@babsingh
Copy link
Contributor

All the machines, where the tests are failing, have:

g++ (Ubuntu 4.8.5-4ubuntu8~16.04.1) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc (Ubuntu 4.8.5-4ubuntu8~16.04.1) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This gcc/g++ version is pretty old. We need a C++11 compliant compiler (gcc 4.8 is NOT!) to use std::regex.

@jdekonin @AdamBrousseau We were updating machines to use a newer Ubuntu OS and newer gcc/g++. What's the state of this work (expected timeline)? Are we planning to adopt gcc/g++-11?

fyi - @0xdaryl @tajila @pshipton

@babsingh
Copy link
Contributor

We have two machines:

These machines have gcc/g++-9. We can temporarily hack the build script to force xlinux PR builds on these machines.

CHANGE the below line: https://github.com/eclipse/omr/blob/dc315bf5cbe9bd629da73dcb186c77e0753ba3a1/buildenv/jenkins/omrbuild.groovy#L307

TO run tests on the cgroup v1 machine:

node("${SPECS[params.BUILDSPEC].label} && cgroup.v1") { 

TO run tests on the cgroup v2 machine:

node("${SPECS[params.BUILDSPEC].label} && cgroup.v2") { 

@babsingh
Copy link
Contributor

jenkins build all

@babsingh
Copy link
Contributor

jenkins build x32linux

@babsingh
Copy link
Contributor

fyi @jdekonin The following failures were seen on ub20-x64-omr1 and ub20-x64-omr2 with -DOMR_ENV_DATA32=ON (in linux_x86). I had to apt-get install gcc-multilib g++-multilib on these machines to resolve these failures.

Failures

11:20:25  /usr/lib/ccache/c++   -DJ9X86 -DLINUX -D_FILE_OFFSET_BITS=64 -D_X86_  -pthread -fno-strict-aliasing -m32 -msse2 -std=c++0x -fPIC   -Werror -Wall -MD -MT tools/tracegen/CMakeFiles/trace.dir/ArgParser.cpp.o -MF tools/tracegen/CMakeFiles/trace.dir/ArgParser.cpp.o.d -o tools/tracegen/CMakeFiles/trace.dir/ArgParser.cpp.o -c /home/jenkins/workspace/Build/tools/tracegen/ArgParser.cpp
11:20:25  In file included from /usr/include/c++/9/stdlib.h:36,
11:20:25                   from /home/jenkins/workspace/Build/tools/tracegen/ArgParser.cpp:23:
11:20:25  /usr/include/c++/9/cstdlib:41:10: fatal error: bits/c++config.h: No such file or directory
11:20:25     41 | #include <bits/c++config.h>
11:20:25        | 
11:20:25  compilation terminated.
11:20:25  [3/844] Building CXX object tools/tracegen/CMakeFiles/trace.dir/FileReader.cpp.o
11:20:25  FAILED: tools/tracegen/CMakeFiles/trace.dir/FileReader.cpp.o

Failing build: https://ci.eclipse.org/omr/job/PullRequest-linux_x86/2798/consoleFull

Ref: https://stackoverflow.com/questions/4643197/missing-include-bits-cconfig-h-when-cross-compiling-64-bit-program-on-32-bit

Resolved build (after installing gcc-multilib g++-multilib): https://ci.eclipse.org/omr/job/PullRequest-linux_x86/2799/consoleFull

There are other failures seen for the resolved build:

12:10:26  7: [----------] 9 tests from TraceLifecycleTest
12:10:26  7: /home/jenkins/workspace/Build/fvtest/rastest/traceLifecycleTest.cpp:544 OMR_Thread_Init(&testVM->omrVM, NULL, &vmthread, "attachDetachHelper") failed, rc=10 (OMR_ERROR_NOT_AVAILABLE)
...
12:10:26  7: [----------] 1 test from RASTraceTest
12:10:26  7: omr_trc_startup: failed to set trace options, rc=9
12:10:26  7: omr_trc_startup error, rc=9

@jdekonin
Copy link
Contributor

Thanks @babsingh. I've updated ub20-x64-omr3 -> ub20-x64-omr6 as well. Those are pending a jiro PR merge that should go in over the weekend.

@EricYangIBM
Copy link
Contributor Author

@babsingh the test failure is in porttest

12:10:26  6: [----------] 19 tests from PortVmemTest
12:10:26   6/18 Test  #6: porttest .........................***Exception: Other 26.21 sec

Is this the same failure you saw in #6525 (comment)?

@babsingh
Copy link
Contributor

Is this the same failure you saw in #6525 (comment)?

Current test output, from https://ci.eclipse.org/omr/job/PullRequest-linux_x86/2799/consoleFull, does not indicate the failing test. So, verifying via the PR build in #6533; it has verbose output enabled.

Also, std::regex was fully implemented in gcc-4.9: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631#c17. Can we protect the tests from running on an unsupported compiler? Example: https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html. GTest already has this defined as GTEST_GCC_VER_; try to reuse it in the test.

#define GCC_VERSION (__GNUC__ * 10000 \
                     + __GNUC_MINOR__ * 100 \
                     + __GNUC_PATCHLEVEL__)
…
/* Test for GCC >= 4.9.0 */
#if GCC_VERSION >= 40900

@babsingh
Copy link
Contributor

jenkins build all

@EricYangIBM EricYangIBM force-pushed the cgroup_test_availability branch 3 times, most recently from 3fe8478 to db999a8 Compare May 30, 2022 19:01
Copy link
Contributor

@babsingh babsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • omrsysinfo_get_cgroup_subsystem_list looks good to me.
  • Both commits mention: add test for omrsysinfo_get_cgroup_subsystem_list. This should be fixed.

reportTestExit(OMRPORTLIB, testName);
return;
}
#endif /* !defined(LINUX) || (GTEST_GCC_VER_ >= 40900) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should send a message to the OMR user that the Cgroup tests are disabled. Once all the machines in the OMR build-farm have newer compilers, we should attempt to change this to an #error.

Suggested change
#endif /* !defined(LINUX) || (GTEST_GCC_VER_ >= 40900) */
#else /* !defined(LINUX) || (GTEST_GCC_VER_ >= 40900) */
#warning "Cgroup tests are disabled due to an unsupported compiler."
#endif /* !defined(LINUX) || (GTEST_GCC_VER_ >= 40900) */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing locally:

error: #warning "Cgroup tests are disabled due to an unsupported compiler." [-Werror=cpp]
 #warning "Cgroup tests are disabled due to an unsupported compiler."

Won't the builds fail with this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref: https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Warnings-and-Errors.html

Warnings report other unusual conditions in your code that may indicate a problem, although compilation can (and does) proceed. Warning messages also report the source file name and line number, but include the text ‘warning:’ to distinguish them from error messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are compiling with -Werror, which converts all warnings to errors.

Copy link
Contributor

@babsingh babsingh May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work?

#pragma message("Cgroup tests are disabled due to an unsupported compiler.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me locally but may not be supported on older gcc. Pushed the changes, can you try running the builds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what happened with the windows build but the ppc failures also appear in other PRs

Add tests for omrsysinfo_cgroup_is_system_available, get_available_subsystems,
are_subsystems_available, get_enabled_subsystems, enable_subsystems, and
are_subsystems_enabled. Also add test for omrsysinfo_get_cgroup_subsystem_list.

Issue: eclipse-omr#1281
Signed-off-by: Eric Yang <eric.yang@ibm.com>
@babsingh
Copy link
Contributor

jenkins build all

@babsingh
Copy link
Contributor

x86 and PPC are known failures:

Window's build crashed very quickly (3.68 sec). Probably related to infra. Let me see if it goes away with a rerun.

@babsingh
Copy link
Contributor

jenkins build win

Copy link
Contributor

@babsingh babsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @0xdaryl Passing the PR to you for review/merge.

@0xdaryl 0xdaryl self-assigned this Jun 6, 2022
@0xdaryl 0xdaryl merged commit f5001bb into eclipse-omr:master Jun 6, 2022
@EricYangIBM EricYangIBM deleted the cgroup_test_availability branch June 7, 2022 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants