Many machines need process clean-up #770
Comments
@smlambert Do you have anything in place to mitigate this? We could use a multi-configuration Jenkins job that runs periodically over all the machines and kills anything that looks like it's hung. With only one Jenkins executor per machine it should be fairly easy to determine what has been left around (i.e. just about everything running as the …
We should likely put a clean-up step in the setup stage of testing to look for and kill test-related processes that may have hung after previous test jobs exited in a bad state. I also agree that a separate job that cleans up stray processes and files would be good and more thorough, as it can search for a broader range of processes to terminate. There is such a job in use at the OpenJ9 project; we can employ the same or a similar approach.
https://github.com/eclipse/openj9/blob/master/buildenv/jenkins/jobs/infrastructure/Cleanup-Nodes Note that these are band-aid solutions, which I don't like applying. The job has options to run cleanup, sanitize, or both. The Jenkins agent is also killed in the sanitize path, which we didn't like doing, but we found issues properly killing processes without also killing the agent. On Windows it does a full reboot instead, because of Cygwin issues.
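A minimal sketch of the filtering such a cleanup step could use (the `remoting.jar` agent pattern and the canned `ps` lines are illustrative assumptions, not taken from the actual Cleanup-Nodes job):

```shell
#!/bin/sh
# Sketch of a cleanup filter: keep java processes, drop the Jenkins
# agent (assumed here to be launched via remoting.jar).
filter_leftovers() {
  grep 'java' | grep -v 'remoting.jar'
}

# Demo against canned ps output; a real job would pipe `ps -ef` into
# filter_leftovers and feed the surviving PIDs to kill.
printf '%s\n' \
  'jenkins  101 java -jar remoting.jar' \
  'jenkins  202 /home/jenkins/openjdkbinary/jdk/bin/java TestJDI' \
  | filter_leftovers
```

In the real job the surviving lines would have their PIDs extracted and passed to `kill`, escalating to `kill -9` only for processes that linger.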
@smlambert Quick and dirty check to identify the scope of the problem - the test machines shown in red here appear to have rogue java processes left around: https://ci.adoptopenjdk.net/view/work%20in%20progress/job/SXA-processCheck/
Thanks @sxa555. Your shell script greps for java - does this exclude the Jenkins agent itself, which runs on all nodes? I will edit your script to print the actual processes so we can better understand the root of the problem. I'm curious whether the processes are all left over from openjdk tests or from other types of testing as well. If they come from openjdk test jobs, I will say we should additionally be looking at why the underlying framework cannot or does not kill/clean up at the end of the run (and raise an issue against it if so).
No - that one is explicitly grepped out - so in principle it would be safe to have an option on the job to kill all the processes it has detected ...
The job as-is will already show them in the console logs :-)
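A hedged sketch of what that kill option might do: select the PIDs of non-agent java processes and send them SIGTERM. The `PID ARGS` field layout and the `remoting.jar` pattern are assumptions, and the kill is left as a no-op here:

```shell
#!/bin/sh
# Select PIDs of java processes that are not the Jenkins agent.
# Demo uses canned "PID ARGS" lines; a real run would use: ps -eo pid=,args=
pids=$(printf '%s\n' \
  ' 101 java -jar remoting.jar' \
  ' 202 /opt/openjdkbinary/bin/java JDITest' \
  | awk '/java/ && !/remoting\.jar/ {print $1}')
echo "$pids"      # -> 202 in this canned example
for pid in $pids; do
  : kill "$pid"   # ":" makes this a no-op; remove it to actually send SIGTERM
done
```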
Have removed the offline openlab CentOS machines to allow the jobs to complete - killing off runs 1 & 2 which are not completing because of this :-)
Ya, I have looked more closely at the hung processes and they are very specifically the jdi tests on OpenJ9; we should disable those tests for now, as there appear to be multiple problems. One major problem is that some of those tests expect to query a HotSpotDiagnosticMXBean. It would be good for an OpenJDK contributor to review those tests to see what would be required to make them applicable to more than just HotSpot implementations.
@adamfarley Can you adjust the title of this issue please now that we've seen that it does not appear to be specific to any one machine |
Have done a cleanup of most machines over the last couple of days, and we should try to keep the process check job clean now. Might move it into "production" state soon instead of work in progress.
This is being maintained reasonably well through the process cleanup job at https://ci.adoptopenjdk.net/view/Tooling/job/SXA-processCheck/configure therefore closing for now. |
test-osuosl-ppc64le-ubuntu-16-04-1 had dozens of jdi processes left over from tests that didn't clean up properly after themselves.
An issue has been raised for this, and test-osuosl-ppc64le-ubuntu-16-04-1 is now clean, but I suspect test-osuosl-ppc64le-ubuntu-16-04-2 needs the same treatment.
I found this command cleaned up most of the issues: pkill -f "openjdkbinary"
I recommend pausing the machine on Jenkins, and running that command once there's nothing else running on it.
You'll know it worked if running "ps aux | grep java" fills the screen beforehand and does not fill the screen afterwards.
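The pause-kill-verify routine above can be sketched like this (run only with the node paused in Jenkins; the `openjdkbinary` pattern comes from the comment, the rest is an assumed wrapper around it):

```shell
#!/bin/sh
# Count java processes before, kill leftover test binaries, count after.
before=$(ps aux | grep '[j]ava' | wc -l)   # [j]ava excludes this grep itself
pkill -f 'openjdkbinary' || true           # || true: fine if nothing matched
sleep 2                                    # give processes a moment to exit
after=$(ps aux | grep '[j]ava' | wc -l)
echo "java processes: before=$before after=$after"
```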
Happy hunting!