system hang during linuxki #17
Comments
Thank you for reporting this issue. To find the root cause, I'll have to set up a system to duplicate it. I haven't used Open vSwitch before, so it may take me some time. In the meantime, I can update the runki script to print a warning and quit if Open vSwitch is present.
I have not been able to duplicate the issue so far. I thought there might be a conflict between the LinuxKI kernel module (likit.ko) and the Open vSwitch kernel module (openvswitch.ko). However, the 2.5.X version will not build the kernel module if installed on OS versions greater than 4.3. Could you provide more details on how openvswitch was built (i.e. ./configure; make; make install), how it was started, and whether I need to actually configure a VLAN as well or if just starting the service was sufficient?
Sorry it's taken me so long to reply. I was starting to get Ubuntu Xenial installed on a server and I noticed the hang occurred when running lsof, rather than during the actual trace collection. You can try to execute "runki -p" to omit the lsof and some collections from /proc. You can also try to duplicate the hang by just running lsof in the same manner that the runki script does and see if that hangs as well.
Can you download and try the latest version, LinuxKI 5.9? I did fix an issue that probably occurred right after the likit.ko module was unloaded.
Already tested. Nothing new... I think I am now facing a different problem, because it now hangs at the stage "spooling trace data to disk".
Thanks for trying. Unless I can duplicate the issue or get a memory dump of the crash/hang, it's very tough to figure out. If you are interested, there are a few things you can try to narrow down the issue. You could capture only certain events or subsystems with runki using the "-e" or "-s" options. For example, "runki -e hardclock" would capture only the hardclock trace events. It would be interesting to know if the issue happens only when capturing certain subsystems or events.
OK, we will try to run the proposed commands and check the result. I collected a crash dump, but I'm not a guru in dump analysis. So if you are interested in analysing our hang, you can get the dump (kernel with debug symbols and dump) from this link: https://webdav.digitalenergy.online/runki-crashdump.tar.gz Thanks for the help.
I am unfortunately having issues loading the crash dump, as crash gives me the following error: crash: vmlinux-4.13.0-31-generic and dump.201904241450 do not match! I'll try to pull the Ubuntu bits from their site for 16.04.1 and try to duplicate on a physical server.
I have not been able to duplicate this issue yet. However, another customer reported a problem with LinuxKI on a version modified for Power servers. The problem was due to a change in the perf_callchain_entry structure. Prior to Linux version 4.7, it was defined as follows: struct perf_callchain_entry { With Linux version 4.7, the definition was changed to: struct perf_callchain_entry { The LiKI module used this structure to store stack trace information. But on version 4.7 or later, the structure is smaller, so writing the stack trace into it corrupted whatever followed it in memory. None of my own testing ever showed an issue; it had only shown up on Power servers. LinuxKI version 6.0 has fixed this. I hope this is related to the Open vSwitch issue.
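To make the size mismatch concrete, here is a minimal user-space sketch of how a structure that shrinks can lead to an overrun. It is not the LiKI code or the kernel's definition: the struct names, the helper functions, the field layout, and the depth of 127 entries are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed maximum stack depth, for illustration only. */
#define MAX_STACK_DEPTH 127

/* Hypothetical stand-in for the older layout: the array lives inside the struct. */
struct callchain_fixed {
    uint64_t nr;                   /* number of valid entries */
    uint64_t ip[MAX_STACK_DEPTH];  /* instruction pointers */
};

/* Hypothetical stand-in for the newer layout: only the header has a fixed size. */
struct callchain_flexible {
    uint64_t nr;
    uint64_t ip[];                 /* flexible array member, storage allocated elsewhere */
};

/* Bytes needed to hold the header plus `depth` stack entries. */
size_t callchain_bytes_needed(size_t depth)
{
    return offsetof(struct callchain_flexible, ip) + depth * sizeof(uint64_t);
}

/* Bytes that land past the end of a `buf_size`-byte buffer when `depth`
 * entries are stored in it; 0 means the data fits. */
size_t callchain_overflow_bytes(size_t buf_size, size_t depth)
{
    size_t needed = callchain_bytes_needed(depth);
    return needed > buf_size ? needed - buf_size : 0;
}
```

Under these assumptions, a buffer reserved with sizeof(struct callchain_fixed) holds a full-depth trace, while one reserved with sizeof(struct callchain_flexible) only has room for the nr field, so every stored entry overwrites whatever follows the buffer.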
Thanks for keeping me posted. I will check whether the problem is fixed as soon as possible.
It seems the problem is really solved. We tried LinuxKI 6.0-1 from the deb package on a node running openvswitch 2.11.1-2, which we packaged into a deb from vanilla sources. Thanks a lot! Now we can use runki to analyse problems on our nodes! =)
I'm glad to hear that everything is working now! If you have any questions, feel free to contact me at mark.ray@hpe.com.
When I try to start runki, the system hangs. I have to restart the system from IPMI. There is no output on the console. I found that the problem is related to Open vSwitch. When I stop openvswitch-switch.service, runki works without issues.