
SAMZA-2762: new CPU usage metric that counts child processes' usage #1636

Merged: 6 commits merged into apache:master on Nov 7, 2022


alnzng (Contributor) commented on Oct 26, 2022

Symptom

We have observed that some use cases use quasar (a TensorFlow framework) for model inference, and this framework spawns child processes (non-JVM) to run TensorFlow Serving. These child processes were consuming a lot of CPU (200%), but their CPU usage is not captured by the existing CPU usage metric, process-cpu-usage.

Cause

The existing process-cpu-usage metric was designed to capture the CPU usage of the JVM process only; it cannot count the usage of child processes (especially non-JVM processes).
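To illustrate the gap, here is a minimal sketch that compares the CPU time of the JVM process alone with the CPU time of the JVM plus all of its descendant processes. It deliberately uses the JDK's own `ProcessHandle` API (Java 9+) rather than OSHI, and it is not Samza's implementation; the class name `CpuTimeComparison` is hypothetical. `ProcessHandle.Info.totalCpuDuration()` may be empty on some platforms or for processes the JVM cannot inspect, and this sketch assumes such processes count as zero.

```java
import java.time.Duration;
import java.util.stream.Stream;

// Hypothetical illustration: JVM-only CPU time vs. JVM + descendant processes.
// Uses the JDK ProcessHandle API, not OSHI and not Samza's metric code.
public class CpuTimeComparison {

  // Cumulative CPU time of one process, or zero if the OS does not report it.
  static Duration cpuTime(ProcessHandle process) {
    return process.info().totalCpuDuration().orElse(Duration.ZERO);
  }

  // CPU time of the JVM process only: the scope a JVM-only metric can see.
  static Duration jvmOnlyCpuTime() {
    return cpuTime(ProcessHandle.current());
  }

  // CPU time of the JVM plus every descendant (children, grandchildren, ...).
  static Duration jvmAndDescendantsCpuTime() {
    ProcessHandle self = ProcessHandle.current();
    return Stream.concat(Stream.of(self), self.descendants())
        .map(CpuTimeComparison::cpuTime)
        .reduce(Duration.ZERO, Duration::plus);
  }

  public static void main(String[] args) {
    System.out.println("JVM only:          " + jvmOnlyCpuTime().toMillis() + " ms");
    System.out.println("JVM + descendants: " + jvmAndDescendantsCpuTime().toMillis() + " ms");
  }
}
```

When a non-JVM child such as a model-serving process is busy, only the second figure grows with it.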

Changes

  • Rely on the OSHI framework to capture the CPU usage of the JVM process and all of its child processes, and create a new metric that reports the total CPU usage.
  • The CPU usage percentage is calculated relative to the number of logical CPUs on the system.
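The calculation described above can be sketched as follows, assuming the metric is derived from two samples of cumulative CPU time per process (the kind of per-process snapshot OSHI provides) taken a known wall-clock interval apart. The class `TotalCpuUsageCalculator`, its method, and the policies for new, exited, and reused PIDs are assumptions for illustration, not Samza's or OSHI's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: total CPU usage (%) of a process tree between two samples,
// normalized by the number of logical CPUs. Not Samza's or OSHI's actual code.
public class TotalCpuUsageCalculator {

  /**
   * @param previousCpuMillis cumulative CPU time (user + system, ms) per PID at the first sample
   * @param currentCpuMillis  cumulative CPU time (user + system, ms) per PID at the second sample
   * @param elapsedMillis     wall-clock time between the two samples, in ms
   * @param logicalCpuCount   number of logical CPUs on the host
   * @return percent of total host capacity; 100% means every logical CPU was busy all interval
   */
  static double totalCpuUsagePercent(Map<Long, Long> previousCpuMillis,
                                     Map<Long, Long> currentCpuMillis,
                                     long elapsedMillis,
                                     int logicalCpuCount) {
    if (elapsedMillis <= 0 || logicalCpuCount <= 0) {
      throw new IllegalArgumentException("elapsedMillis and logicalCpuCount must be positive");
    }
    long busyMillis = 0;
    for (Map.Entry<Long, Long> entry : currentCpuMillis.entrySet()) {
      // Assumption: a PID missing from the first sample started during the interval,
      // so all of its CPU time counts. PIDs that exited before the second sample are ignored.
      long before = previousCpuMillis.getOrDefault(entry.getKey(), 0L);
      // Negative deltas (e.g. a reused PID) are clamped to zero.
      busyMillis += Math.max(0L, entry.getValue() - before);
    }
    return 100.0 * busyMillis / ((double) elapsedMillis * logicalCpuCount);
  }

  public static void main(String[] args) {
    Map<Long, Long> before = new HashMap<>();
    before.put(100L, 5_000L); // JVM process
    before.put(101L, 1_000L); // child process, e.g. a model-serving process
    Map<Long, Long> after = new HashMap<>();
    after.put(100L, 5_500L);  // JVM used 0.5 s of CPU during the interval
    after.put(101L, 3_000L);  // child kept two cores busy: 2 s of CPU in 1 s
    // 2.5 s of CPU over 1 s of wall time on 8 logical CPUs -> 31.25
    System.out.println(totalCpuUsagePercent(before, after, 1_000L, 8));
  }
}
```

Dividing by the logical CPU count keeps the value within 0 to 100% of the whole host; in per-core accounting (where 100% means one fully busy core) the same workload would read as 250%.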

API Changes

Tests

  • Unit tests
  • Tested with samza-hello-samza and verified the metric data points
    [Screenshot from 2022-10-25 showing the metric data points]

Signed-off-by: Alan Zhang <shuai.xyz@gmail.com>
mynameborat (Contributor)

Looks mostly good to me. Can you close out the conversations that have been resolved and respond to the pending comments?

Also, please fix the latest checks so that I can merge it.

alnzng (Contributor, Author) commented on Nov 7, 2022

@mynameborat I have fixed the failing check (a Checkstyle issue) and resolved all the comments. Can you please take another look? Thanks.

mynameborat merged commit dd8ecf1 into apache:master on Nov 7, 2022
alnzng deleted the SAMZA-2762 branch on November 8, 2022 21:48
ehoner pushed a commit to ehoner/samza that referenced this pull request on Apr 11, 2023
…pache#1636)

API Changes
Added a new metric, total-process-cpu-usage, to SamzaContainerMetrics, similar to how the physical-memory-mb metric is provided.
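Conceptually, a container-level gauge like this is refreshed by a sampler that periodically computes a value and overwrites the gauge. The sketch below models that with plain JDK types only; `SimpleGauge`, `CpuUsageSampler`, and the `DoubleSupplier` source are hypothetical stand-ins, not Samza's `MetricsRegistry` or `Gauge` classes, and the sampling period is an arbitrary choice.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical stand-in for how a container-level gauge could be refreshed periodically.
// SimpleGauge is NOT Samza's Gauge class; it only mirrors the idea of a named, overwritable value.
public class CpuUsageSampler implements AutoCloseable {

  static final class SimpleGauge {
    final String name;
    private volatile double value; // last published sample; readers see the latest write

    SimpleGauge(String name) {
      this.name = name;
    }

    void set(double newValue) {
      value = newValue;
    }

    double get() {
      return value;
    }
  }

  final SimpleGauge totalProcessCpuUsage = new SimpleGauge("total-process-cpu-usage");
  private final DoubleSupplier cpuPercentSource; // e.g. a calculator over JVM + child processes
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  CpuUsageSampler(DoubleSupplier cpuPercentSource) {
    this.cpuPercentSource = cpuPercentSource;
  }

  // Takes one sample and overwrites the gauge; the scheduler calls this on every tick.
  void refresh() {
    totalProcessCpuUsage.set(cpuPercentSource.getAsDouble());
  }

  // Starts periodic sampling; the period is an arbitrary choice for this sketch.
  void start(long periodSeconds) {
    scheduler.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Separating sampling from the gauge lets a metrics reporter read the latest value at any time without triggering a new scan of the process tree.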