Entire system freezes on Google Compute Engine VM #114

Open
jacobsa opened this issue Jun 5, 2015 · 3 comments

Comments

@jacobsa

jacobsa commented Jun 5, 2015

I'm trying to get dtrace working on a Linux VM running on Google Compute Engine. I can successfully load the kernel module, but the moment I run dtrace my SSH session stops responding (and eventually fails with "Broken pipe"). I can't SSH in again until I reboot the VM.

For example, I tried to do this as a hello world:

% sudo dtrace -n 'syscall::rmdir:entry { @num[pid,execname] = count(); }'
dtrace: description 'syscall::rmdir:entry ' matched 2 probes
Write failed: Broken pipe
ERROR: (gcloud.compute.ssh) [/usr/local/bin/ssh] exited with return code [255].

I don't know the first thing about dtrace, so maybe I'm doing something dumb here. Is this expected?

Here's my version:

% uname -a
Linux ubuntu 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:09:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

@cjdelisle

GCE is a KVM instance. I recall testing dtrace in a KVM instance on my machine, and it caused repeatable kernel panics, but I didn't investigate it much further. According to Paul, it's something about Linux behaving differently because it is in a VM, but I was under the impression that KVM is full hardware virtualization, so the guest should not even really be aware that it is virtualized...

@dtrace4linux
Owner

It's possibly Xen causing the issue. The best thing is to try it on an Ubuntu KVM running on top of a Xen host (I think Ubuntu defaults to this).

When a panic or freeze occurs, the kernel has likely double-faulted. It can be helpful to try some other syscall you can control (like chdir) and run tail -f /var/log/kern.log in the background, or run on the VM's console. The panic messages will hit the console, but if you are in an X session or solely ssh'd in, you will never see the console.

If you are lucky, after a reboot /var/log/kern.log will have the tail end of the death of the VM before the reboot.

It would be helpful to see /proc/dtrace/trace before doing any dtrace activity; it might highlight what is missing in the kernel.
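
The steps above could be scripted roughly as follows. This is only a sketch: the helper names (snapshot_trace, log_tail) and the snapshot file path are made up for illustration, and the log and trace paths are simply the ones mentioned in this thread, so adjust them for your system.

```shell
#!/bin/sh
# Sketch of the diagnostic steps suggested in this thread.
# Helper names and the snapshot path are hypothetical.

KERN_LOG="${KERN_LOG:-/var/log/kern.log}"
DTRACE_TRACE="${DTRACE_TRACE:-/proc/dtrace/trace}"

# Save the driver's trace output before any dtrace activity.
# If the file is not readable, record that fact instead.
snapshot_trace() {
    out="$1"
    if [ -r "$DTRACE_TRACE" ]; then
        cat "$DTRACE_TRACE" > "$out"
    else
        echo "cannot read $DTRACE_TRACE (is the module loaded?)" > "$out"
    fi
}

# After a reboot, print the last N lines of the kernel log (default 50)
# to look for the tail end of the freeze.
log_tail() {
    tail -n "${1:-50}" "$KERN_LOG"
}

# Intended use, run by hand on the VM (may need sudo to read the log):
#   snapshot_trace /tmp/dtrace-trace.before
#   tail -f /var/log/kern.log &
#   sudo dtrace -n 'syscall::chdir:entry { @num[pid,execname] = count(); }'
# and, after a reboot:
#   log_tail 100
```

Running the tail in a second SSH session instead of the background keeps its output visible even if the session running dtrace freezes.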

@jacobsa
Author

jacobsa commented Jun 8, 2015

Thanks for the tips. I ran this:

sudo dtrace -n 'syscall::chdir:entry { @num[pid,execname] = count(); }'

and the SSH session on which I ran it immediately froze up, with dtrace not responding to Ctrl-C, and eventually timed out. Other sessions continued to work, but the system would no longer respond to new SSH connections. I got this NMI watchdog message in /var/log/kern.log.
