
Adding CPU Measurements and Unit Tests #14470

Open · wants to merge 45 commits into base: master

Conversation

@ecngtng (Contributor) commented Dec 23, 2021

This commit reports CPU user time and CPU system time, which are
already available but not printed in any log.

Part one

The solution adds CPU metrics after a build as seen below:

INFO: 2 processes: 1 internal, 1 linux-sandbox.
INFO: Total action wall time 70.05s
INFO: Critical path 70.08s (setup 0.00s, action wall time 70.05s)
INFO: Elapsed time 71.58s (preparation 1.45s, execution 70.13s)
INFO: CPU time 5.31s (user 0.00s, system 0.01s, bazel 5.29s)
INFO: Build completed successfully, 2 total actions

If any of the values is not collected, "???" is printed instead:

INFO: 415 processes: 1 internal, 414 processwrapper-sandbox.
INFO: Total action wall time 127.84s
INFO: Critical path 2.58s (setup 0.00s, action wall time 2.51s)
INFO: Elapsed time 38.37s (preparation 3.98s, execution 34.39s)
INFO: CPU time ???s (user 79.45s, system 22.39s, bazel ???s)
INFO: Build completed successfully, 415 total actions

Note:
The values are only presented if the following flag is set:
--experimental_stats_summary

The following flags must be set to get the measurements collected:
--experimental_collect_local_sandbox_action_metrics
--experimental_profile_cpu_usage (set by default)

Part two

A unit test is added in runtime-tests.
Test command:
bazel test --experimental_stats_summary //src/test/java/com/google/devtools/build/lib:runtime-tests

@ecngtng (Contributor Author) commented Dec 27, 2021

Hello, I couldn't reproduce the issue locally. Does anyone know where to find the logs below?
FAILED: //src/test/java/com/google/devtools/build/lib:runtime-tests (Summary)
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test.log
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test_attempts/attempt_1.log
/private/var/tmp/_bazel_buildkite/5e1880647af908bfbf43bdcc50066287/execroot/io_bazel/bazel-out/darwin-fastbuild/testlogs/src/test/java/com/google/devtools/build/lib/runtime-tests/test_attempts/attempt_2.log


Note:
Because the Reporter class is declared 'final', it cannot be mocked in the unit test.
We did not find a better way to solve this issue,
so this commit removes the 'final' keyword from Reporter.
@ecngtng (Contributor Author) commented Jan 4, 2022

Here is the answer to my question above:

  1. Choose the failed test suite by clicking the "Details" link in GitHub. That takes you to https://buildkite.com
  2. Choose the failed test suite again on buildkite.com.
  3. Switch the tab from "Log" to "Artifacts" for the suite on buildkite.com to access the log files.

@ecngtng (Contributor Author) commented Jan 4, 2022

The new commit fixes the issue on macOS. It occurs because macOS uses ',' to separate the integer and fractional parts, instead of '.' as on other OSes.

E.g.: on macOS it's '9,00'; on other OSes it's '9.00'.
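This locale behavior is easy to reproduce with plain `String.format` (a minimal sketch, independent of the PR's code; the locale choices are illustrative):

```java
import java.util.Locale;

public class DecimalSeparatorDemo {
    // String.format without an explicit Locale uses Locale.getDefault(),
    // so "%.2f" renders with '.' or ',' depending on the JVM's locale.
    public static String format(Locale locale, double seconds) {
        return String.format(locale, "%.2f", seconds);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 9.0));      // 9.00
        System.out.println(format(Locale.GERMANY, 9.0)); // 9,00
    }
}
```

This suggests the separator is a property of the locale, not of the OS itself.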

@aiuto aiuto requested a review from meisterT January 19, 2022 04:38
@aiuto aiuto added team-Performance Issues for Performance teams untriaged labels Jan 19, 2022
@larsrc-google (Contributor):

@ecngtng It's entirely possible that the difference in punctuation is not due to being on Mac per se, but due to macOS applying different i18n settings.

@ecngtng (Contributor Author) commented Jan 24, 2022

@larsrc-google So in this case, is it reasonable to select the punctuation based on keywords in the OS name? Or is there a better way?

@ulrfa (Contributor) commented Jan 24, 2022

Distinguishing on OS might cause problems for users in other countries with different default Locale.

One alternative could be to get the JVM's decimal separator for the current Locale.getDefault(), via DecimalFormatSymbols.getInstance().getDecimalSeparator(), and use that as the condition.

Another alternative could be to use String.format also when constructing the expected result in the test case.
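The first alternative could be sketched as follows (class and method names here are mine, not from the PR):

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SeparatorProbe {
    // Asks the JVM which decimal separator the given locale uses;
    // calling this with Locale.getDefault() matches the suggestion above.
    public static char separatorFor(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    public static void main(String[] args) {
        System.out.println(separatorFor(Locale.getDefault()));
    }
}
```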

Change distinguishing from OS to getDecimalSeparator().
@ecngtng (Contributor Author) commented Jan 25, 2022

@ulrfa @larsrc-google Thanks for your good comments. The code has been updated accordingly.

@meisterT (Member):

Is this the right place to expose it? Most users likely don't care about CPU time.

The idea behind --experimental_stats_summary is to provide the most useful information in a more concise way.

If a user cares, they can always look it up in the BEP and the trace profile.

@ulrfa (Contributor) commented Jan 26, 2022

Thank you @meisterT for reviewing!

AFAIK, BEP provides CPU time only for:

  • The JVM process.

and the --profile json file contains CPU usage only for:

  • The JVM Process.
  • All processes running on local host (including other builds and other concurrently executing processes).

I assume the metrics above are relevant mostly to developers of the Bazel codebase itself. But this PR provides CPU time for:

  • The JVM process. (“bazel”)
  • All subprocesses spawned in sandboxes for the particular build. (“user” and “system”)

I think the CPU time, for the JVM and subprocesses combined, is one of the most relevant metrics for a user that would like to grasp the overall resource consumption of a build.

In the future, I imagine storing each subprocess's CPU time per action in the profile JSON file, and then computing a summary grouped by mnemonic, to see how much CPU each tool in the build chain consumes. But I think this PR is a good start and makes the information conveniently available.
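The per-mnemonic summary described above could, in principle, be computed like this (a hypothetical sketch; the input layout and all names are mine, not part of the PR):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MnemonicCpuSummary {
    // Sums subprocess CPU milliseconds per action mnemonic.
    // Input arrays are parallel: action i has mnemonics[i] and cpuMillis[i].
    public static Map<String, Long> byMnemonic(String[] mnemonics, long[] cpuMillis) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (int i = 0; i < mnemonics.length; i++) {
            totals.merge(mnemonics[i], cpuMillis[i], Long::sum);
        }
        return totals;
    }
}
```

For example, three actions with mnemonics Javac, Javac, CppCompile and CPU times 100, 50, 30 ms would yield {Javac=150, CppCompile=30}.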

Perhaps --experimental_stats_summary is not suitable for this. @meisterT, do you have ideas about other ways to control verbosity?

@ecngtng (Contributor Author) commented Feb 10, 2022

@meisterT What are your thoughts on @ulrfa's response? :)

@larsrc-google larsrc-google left a comment


I started a review, then realized a general problem: action CPU time is readily available for standalone/sandboxed actions. For singleplex worker actions, it could be found by taking the process's CPU time before and after making a request (I don't think we do that by default, but we could). But multiplex worker actions simply don't have a well-defined notion of per-action CPU time (actually even the singleplex case would paper over some things, like GC between requests). Since even a single action without CPU time makes the whole total unknown, it will probably end up being unknown more often than not.

The way around this would be to measure non-worker actions the way you currently do, but measure worker actions by taking the CPU time at start and end (of build or of worker).
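The before/after measurement suggested here reduces to a simple delta pattern (a sketch under the assumption that some monotonically increasing CPU-time counter for the worker is readable; the supplier is a stand-in, not a real Bazel API):

```java
import java.util.function.LongSupplier;

public class CpuDelta {
    // Samples a CPU-time counter before and after running the request
    // and returns the difference, i.e. the CPU time attributed to it.
    public static long measureNanos(LongSupplier cpuTimeNanos, Runnable request) {
        long before = cpuTimeNanos.getAsLong();
        request.run();
        return cpuTimeNanos.getAsLong() - before;
    }
}
```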

@@ -60,6 +63,10 @@
private long executionEndMillis;
private SpawnStats spawnStats;
private Path profilePath;
private static final long UNKNOWN = -1;
Contributor:

Could you rename this to "UNKNOWN_CPU_TIME" to be a bit clearer?

Optional<Duration> cpuUserTimeForActionsDuration = event.getActionResult().cumulativeCommandExecutionUserTime();

if(cpuUserTimeForActionsDuration.isPresent() && (cpuUserTimeForActions != UNKNOWN)) {
cpuUserTimeForActions = cpuUserTimeForActions + cpuUserTimeForActionsDuration.get().toMillis();
Contributor:

You can use += here and below.
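Applied to the quoted snippet, the suggestion could look like the sketch below. Class and method names are mine, and the empty-to-unknown transition is my assumption about the PR's intent (it matches the "???" output shown in the description):

```java
import java.time.Duration;
import java.util.Optional;

public class CpuAccumulator {
    static final long UNKNOWN_CPU_TIME = -1; // sentinel, renamed per the review

    private long cpuUserTimeForActions = 0;

    // Accumulates with += as suggested; once any sample is missing,
    // the total becomes (and stays) unknown.
    public void add(Optional<Duration> userTime) {
        if (!userTime.isPresent()) {
            cpuUserTimeForActions = UNKNOWN_CPU_TIME;
        } else if (cpuUserTimeForActions != UNKNOWN_CPU_TIME) {
            cpuUserTimeForActions += userTime.get().toMillis();
        }
    }

    public long millis() {
        return cpuUserTimeForActions;
    }
}
```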

}
Optional<Duration> cpuUserTimeForActionsDuration = event.getActionResult().cumulativeCommandExecutionUserTime();

if(cpuUserTimeForActionsDuration.isPresent() && (cpuUserTimeForActions != UNKNOWN)) {
Contributor:

Nit: Unnecessary extra parentheses.

}
}

private static long sumCpuTimes(long a, long b, long c) {
Contributor:

Since you're hardcoding this to three parameters, you might as well name them with what kinds of CPU times they are.
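A sketch of the renamed version; the unknown-propagation mirrors the "???" behavior shown in the PR description, but the exact body is my assumption, not the PR's code:

```java
public class CpuSum {
    static final long UNKNOWN_CPU_TIME = -1;

    // Named parameters instead of a/b/c; if any component is unknown,
    // the total is unknown too (rendered as "???" in the summary line).
    static long sumCpuTimes(long userMillis, long systemMillis, long bazelMillis) {
        if (userMillis == UNKNOWN_CPU_TIME
            || systemMillis == UNKNOWN_CPU_TIME
            || bazelMillis == UNKNOWN_CPU_TIME) {
            return UNKNOWN_CPU_TIME;
        }
        return userMillis + systemMillis + bazelMillis;
    }
}
```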

when(result.cumulativeCommandExecutionUserTime()).thenReturn(Optional.empty());
}
else{
when(result.cumulativeCommandExecutionUserTime()).thenReturn(Optional.of(userTime));
Contributor:

You can simplify these with Optional.ofNullable().
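The simplification works because `Optional.ofNullable` maps null to an empty Optional and non-null to a present one, collapsing the if/else in the mock setup (a minimal illustration, not the test's actual code):

```java
import java.time.Duration;
import java.util.Optional;

public class OfNullableDemo {
    // Replaces: if (userTime == null) Optional.empty() else Optional.of(userTime)
    public static Optional<Duration> wrap(Duration userTime) {
        return Optional.ofNullable(userTime);
    }
}
```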

field1.setLong(buildSummaryStatsModule, 11000);
buildSummaryStatsModule.buildComplete(createBuildEvent());
if(Comma) {
verify(reporterMock).handle(Event.info("CPU time 88,00s (user 55,00s, system 22,00s, bazel 11,00s)"));
Contributor:

I think the comma problem would be easier to understand if you use String.format here instead of having the Comma variable. A note on why it's done that way would be good, though.

Contributor Author:

For this comment, I think you want it to be changed like this:
verify(reporterMock).handle(Event.info(String.format("CPU time %.2fs (user %.2fs, system %.2fs, bazel %.2fs)", 88.00, 55.00, 22.00, 11.00)));
Is that right?

Contributor:

Yes, that should do it. Plus a comment on why it's not just a fixed string.

Contributor Author:

I hope that, this way, the system determines whether to use '.' or ',' to express the floating-point number.

@ckolli5 ckolli5 added the awaiting-review PR is awaiting review from an assigned reviewer label Apr 26, 2022
@ckolli5 commented Apr 27, 2022

Hello @ecngtng, could you please confirm that all the review comments are taken care of? Thanks.

@ckolli5 ckolli5 added awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Apr 27, 2022
@ecngtng (Contributor Author) commented Apr 27, 2022

@ckolli5 Yes, all the comments are taken care of.

@ckolli5 ckolli5 added awaiting-review PR is awaiting review from an assigned reviewer and removed awaiting-user-response Awaiting a response from the author labels May 5, 2022
@ecngtng (Contributor Author) commented Feb 22, 2023

@larsrc-google The code has been changed according to your comments. Could you please take a look? Thanks!

@@ -206,6 +229,34 @@ public void buildComplete(BuildCompleteEvent event) {
criticalPathComputer = null;
}
profilePath = null;
cpuUserTimeForActions = Duration.ofMillis(0);
Contributor:

Zeroing them on startup would also make it clearer that they are properly initialized.

@larsrc-google (Contributor):

Thank you. You missed one comment from before, and I added a couple small things.

@ecngtng (Contributor Author) commented Feb 23, 2023

@larsrc-google Thanks for your good comments. All of them have been addressed except the one where I'm not sure I understood your point. Regarding "Zeroing them on startup would also make it clearer that they are properly initialized.": the zeroing is done in both executorInit() and buildComplete(). Is that not OK?

@larsrc-google (Contributor):

I don't see the zeroing happening in executorInit(), only in buildComplete(). If you zeroed them in executorInit(), that would be sufficient.

@ecngtng (Contributor Author) commented Feb 23, 2023

@larsrc-google You're right. After checking, I found they really were not in the last commits, so I have added them back now. (Interestingly, I had checked the code earlier and confirmed they were merged several days ago.)

@ecngtng (Contributor Author) commented Feb 24, 2023

@larsrc-google The code has been changed and all checks have passed. Could you please take a look? Thanks!

@larsrc-google (Contributor):

Looks nice. I'll do some extra checks internally, and then I think we can merge it.

@larsrc-google (Contributor):

Getting bogged down in all the technical bits, I forgot about @meisterT's comments above. Sorry. We want to reduce the terminal output, not increase it. Putting it in the log (as indicated in the original comment) would be fine, though.

@sgowroji (Member):

Hi @wilwell, Since I can see that this PR has been approved, please let me know whether I should proceed with importing it.

@larsrc-google (Contributor):

No, not yet. As per my last comment adding this info to the output from every build is not desirable.

Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Performance Issues for Performance teams

9 participants