Running async profiler on AWS EC2 virtual machine #417

Closed
jumarko opened this issue Apr 15, 2021 · 6 comments

Comments

@jumarko

jumarko commented Apr 15, 2021

I'm trying to run Async Profiler on one of our AWS Beanstalk machines running Java 8 and it fails very early in the process.

sudo ./profiler.sh -d 10 6934
Failed to inject profiler into 6934

No more information is provided.
If I try to run it under the user running the target java process it just fails silently (no error message):

sudo -u webapp ./profiler.sh -d 10 6934
echo $?
1

Here's the system information from dmesg:

[    0.000000] Linux version 4.14.225-121.362.amzn1.x86_64 (mockbuild@koji-pdx-corp-builder-60005) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Tue Mar 23 00:29:14 UTC 2021

Java version:

openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

Should this work or am I hitting some limitations of the virtualized AWS environment?

Note: I tried to run it with strace but couldn't figure out what's wrong - the last few lines are:

sudo strace ./profiler.sh -d 10 6934
...
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x43e410, [], SA_RESTORER, 0x7fa3cb079420}, {SIG_DFL, [], SA_RESTORER, 0x7fa3cb079420}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 30036
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30036, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7fff3d502918, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fa3cb079420}, {0x43e410, [], SA_RESTORER, 0x7fa3cb079420}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
stat("/tmp/async-profiler-log.30030.6934", 0x7fff3d503070) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(255)                         = ?
+++ exited with 255 +++

@apangin
Collaborator

apangin commented Apr 15, 2021

The "Failed to inject profiler" message is documented in the Troubleshooting section of the README:

The connection with the target JVM has been established, but JVM is unable to load profiler shared library. Make sure the user of JVM process has permissions to access libasyncProfiler.so by exactly the same absolute path. For more information see #78.

Apparently the filesystem permissions do not allow the webapp account to access the profiler binary.
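
One way to verify this is to test, as the JVM's user, that the library file is readable and that every parent directory can be traversed. The sketch below is only an illustration: `check_path_access` is a hypothetical helper name, not part of async-profiler, and it insists on an absolute path because the README asks for exactly the same absolute path.

```shell
# Walk from a file up to / and report the first path component
# that the current user cannot read or traverse.
check_path_access() {
    target=$1
    # The JVM loads the library by absolute path, so require one here.
    case $target in
        /*) ;;
        *) echo "need an absolute path: $target"; return 2 ;;
    esac
    if [ ! -r "$target" ]; then
        echo "cannot read: $target"
        return 1
    fi
    dir=$(dirname "$target")
    while :; do
        # Execute permission on a directory means permission to traverse it.
        if [ ! -x "$dir" ]; then
            echo "cannot traverse: $dir"
            return 1
        fi
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
    echo "ok: $target"
}
```

To run it as the target account, one option is to save the function to a file (say `check_access.sh`, a made-up name) with a final line `check_path_access "$1"`, then call `sudo -u webapp sh check_access.sh` followed by the absolute path of libasyncProfiler.so.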

@jumarko
Author

jumarko commented Apr 15, 2021

Thanks for the tip!
I changed permissions for all the files in the folder:

sudo chown -R webapp:webapp async-profiler-2.0-linux-x64

But the problem is that it still doesn't work - fails silently again:

[ec2-user@...]$ sudo -u webapp ./profiler.sh -d 10 1325
[ec2-user@...]$ echo $?
1

As root, I still get the same message as before:

$ sudo ./profiler.sh -d 10 1325
Failed to inject profiler into 1325

The process indeed runs under the webapp user:

$ ps aux | grep 1325
webapp    1325  0.9 52.1 2483960 526384 ?      Sl   07:25   1:46 java ...

Looking closer at #78 and https://github.com/jvm-profiling-tools/async-profiler#troubleshooting I started from scratch and downloaded async profiler into /home/webapp.
Still wasn't able to run profiler as the webapp user (perf_event_open failed error happens for root too):

[webapp@...]$ ./profiler.sh  -d 10 1324
[WARN] perf_event_open failed: Permission denied

... with --all-user I got an empty output:

[webapp@...]$ ./profiler.sh  --all-user -d 10 1324
Failed to change credentials to match the target process: Operation not permitted

Here I realized that it might be a problem with the process's group - indeed, my java app runs under another group (nginx).
Since I couldn't find a way to run profiler.sh with that group (ec2-user on AWS doesn't seem to have privileges to run sudo -g), I tried with root again, which finally worked:

$ sudo  /home/webapp/async-profiler-2.0-linux-x64/profiler.sh --all-user -d 10 1325
...
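
The "Failed to change credentials" error above is consistent with the caller's user or group differing from the target's. A small sketch for comparing them on Linux follows. It reads the effective IDs from /proc/<pid>/status, and the function names `proc_ids` and `same_credentials` are hypothetical helpers, not async-profiler commands.

```shell
# Print "<effective uid> <effective gid>" of a process, read from /proc.
# In /proc/<pid>/status the Uid: and Gid: lines list the
# real, effective, saved and filesystem IDs, in that order.
proc_ids() {
    awk '/^Uid:/ {uid = $3} /^Gid:/ {gid = $3} END {print uid, gid}' "/proc/$1/status"
}

# Succeed only if the current shell's effective uid and gid match the target's.
same_credentials() {
    [ "$(proc_ids "$1")" = "$(id -u) $(id -g)" ]
}
```

For example, `same_credentials 1325 || echo "user or group differs from the target"` run as webapp would flag the nginx group mismatch described here.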

I guess I'm good now but one question: every now and then I get an empty output - is that normal?

sudo  /home/webapp/async-profiler-2.0-linux-x64/profiler.sh --all-user -d 10 1325
Profiling for 10 seconds
Done
--- Execution profile ---
Total samples       : 0

          ns  percent  samples  top
  ----------  -------  -------  ---

@apangin
Collaborator

apangin commented Apr 16, 2021

Yeah, profiling in a container is tricky. I'll think about how to improve the user experience.

Anyway, the output should not be empty, unless the application is completely idle.
Try itimer mode - does it make a difference? sudo profiler.sh -e itimer -d 10 <pid>

Are there any warnings/errors in the VM logs?
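
For background on the earlier `perf_event_open failed: Permission denied` warning: on Linux, unprivileged access to perf_events is governed by the `kernel.perf_event_paranoid` sysctl, and itimer mode does not depend on perf_events. A minimal sketch for inspecting the setting follows. The helper names are made up for illustration, and the threshold of 2 reflects the mainline kernel's documented levels; some distributions and container runtimes add further restrictions that this check does not detect.

```shell
# Read the live setting (the file is present on Linux kernels with perf support).
current_paranoid() {
    cat /proc/sys/kernel/perf_event_paranoid
}

# Turn a perf_event_paranoid level into a short hint.
# At level 2 and above, kernel profiling is off limits to unprivileged users.
paranoid_hint() {
    level=$1
    if [ "$level" -ge 2 ]; then
        echo "level $level: restricted, consider --all-user or -e itimer"
    else
        echo "level $level: kernel profiling allowed for unprivileged users"
    fi
}
```

A typical check would be `paranoid_hint "$(current_paranoid)"`, run on the machine that hosts the JVM.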

@jumarko
Author

jumarko commented Apr 16, 2021

No warnings or errors in the logs. It's a web application running on our staging server so I guess there might be periods when it's "idle" although that was surprising.
Anyway, when I explicitly do a request against the app the profiler seems to capture some data.
I think I'm good right now - thanks for your help and all the effort you've put into this great tool!

@jumarko
Author

jumarko commented Apr 16, 2021

(feel free to close the issue - I would do that but wasn't sure if you want to keep it open, perhaps for docs improvement or something like that)

@apangin
Collaborator

apangin commented May 16, 2021

#413 should slightly improve profiling experience in a container: the host will now display error logs and the output file from the target process that runs in a different mount namespace.

@apangin apangin closed this as completed May 16, 2021