Track corrupt workspaces on Windows Machines #1396
Comments
@karianna was attempting to run a build on
Both Admin and Jenkins didn't have permissions to delete or access the workspace as it apparently had
The following processes were using those filesystems: Once those processes were stopped,
Ref: adoptium/temurin-build#1892
^ This comment takes a more in-depth look at what is stopping Jenkins from deleting a workspace. It appears to be caused by a Java process that keeps writing to a log file after a test and isn't being stopped properly by Jenkins.
At first I thought this may have been due to an aborted run on the affected machine (which usually causes problems like this), but that doesn't appear to be the case, as no grinders and only two unsuccessful test runs have been executed on the machine recently.
See the console log for more instances of this occurring. At the end of the build, it also fails to clean up (probably due to the above failures):
The troublesome Java process that wasn't cleaned up was created in the first run: the file path in the test_ouput matches the one reported by @Willsparker in #1410 (comment), and I see the exact same final words in the javacore file of the first run as I do in #1410. The only thing that confuses me is why the second run didn't fail with the error reported in #1410. Why did it take another build before we saw this error? A temporary solution could be to always force a cleanup of the workspaces on the test machines PRIOR to running any tests, but if the file cannot be deleted by Jenkins or Admin then I'm not sure how we would go about doing that. Hope this helps! :)
So we do actually clean up prior to running tests, but it's not forced and I don't think there is a way for
I think the way this needs to be resolved is to ensure all the Java processes have stopped after any given test run - no process, no locked file and The problem is that this job only runs once every 7 days, which is evidently not enough to catch the rogue processes. I suppose we could just cause every test job to trigger a
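For illustration only, here is a minimal Python sketch of the "no process, no locked file" idea: list the Java processes still alive after a test run and build the commands that would force-stop them. It is not the job referred to above. It assumes the leftovers run under the image name java.exe and relies on the standard Windows `tasklist` and `taskkill` commands. The function names and the `exclude` parameter are hypothetical. Note that a Jenkins agent is itself normally a Java process, so its PID would have to be excluded.

```python
import csv
import io
import subprocess

# Assumption: leftover test processes run under this image name.
IMAGE_NAME = "java.exe"


def parse_tasklist_csv(output):
    """Return the PIDs found in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Data rows look like: "java.exe","1234","Console","1","123,456 K"
        # Informational lines (e.g. "INFO: No tasks are running...") have
        # no numeric second column and are skipped.
        if len(row) >= 2 and row[1].strip().isdigit():
            pids.append(int(row[1]))
    return pids


def kill_commands(pids, exclude=()):
    """Build one forced `taskkill` command per PID, skipping excluded PIDs."""
    return [
        ["taskkill", "/F", "/T", "/PID", str(pid)]
        for pid in pids
        if pid not in exclude
    ]


def kill_leftover_java(exclude=(), dry_run=True):
    """List java.exe processes and (unless dry_run) force-stop them."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {IMAGE_NAME}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    commands = kill_commands(parse_tasklist_csv(result.stdout), exclude)
    for cmd in commands:
        if not dry_run:
            # check=False: the process may already have exited by now.
            subprocess.run(cmd, check=False)
    return commands
```

Calling `kill_leftover_java(exclude={agent_pid})` with the default `dry_run=True` only returns the commands it would run, which seems a safer first step on a shared machine.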
Ref: #1415
Ref: https://ci.adoptopenjdk.net/view/Failing%20Builds/job/build-scripts/job/jobs/job/jdk/job/jdk-windows-x64-openj9/69/console (from the issue @M-Davies opened above) Affected
@Willsparker https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk8_j9_sanity.openjdk_x86-32_windows/277/console sees
@Willsparker
This is a new one - there's 10
Highly useful 😁
Ref: adoptium/TKG#45
Ref: adoptium/temurin-build#2076 (comment) My initial thought is that the run was trying to delete a directory that was in use by a different run. However, there's only one executor on the machine, so that couldn't happen unless someone was running a build locally on the machine.
Error comes from here: https://github.com/ibmruntimes/openj9-openjdk-jdk8/blob/f9b33244a76dfd671907872ad6800f3e26e6a4a7/common/src/fixpath.c#L254
I would suggest putting some extra debug into: https://github.com/ibmruntimes/openj9-openjdk-jdk8/blob/f9b33244a76dfd671907872ad6800f3e26e6a4a7/common/src/fixpath.c#L254
We did some experimentation on a call earlier today and the atfile issue seems to have been caused by a lot of leftover The changes we are implementing for #1573 may well mean we can close this soon (Fingers crossed...)
Okay, with #1573 & adoptium/aqa-tests#2059, I haven't seen any more leftover Java process issues on Windows boxes (thank god). Closing :-)
Ref: adoptium/temurin-build#1855
These have been regularly cropping up in 2 forms:
1. A file such as the one referred to in the issue referenced above (a file called ..the_ ) cannot be deleted. When attempting this manually through File Explorer, it errors saying the file can't be found and to verify its location. It can be cleared by removing it from the Cygwin terminal.
2. A file is locked by a process, and often neither Administrator nor Jenkins can delete it because they lack permissions. To rectify this, search for the file path in the Associated Handles section, under the CPU tab, in Resource Monitor. Once the processes using those files are found, they can be right-clicked and forcibly ended. The file can then be deleted by the Administrator user.

This issue is going to act as a way of tracking those sorts of issues to determine if there's a pattern / cause and a way of fixing it so a cleanup isn't required.
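
To help with the tracking, something like the following Python sketch could run before a build. It tries to delete a workspace and reports every path it could not remove, for example files still locked by a process. This is only an assumption about how such a check might look, not what the Jenkins jobs currently do, and `force_remove_tree` is a hypothetical name.

```python
import os
import stat


def force_remove_tree(root):
    """Delete everything under root; return (path, error) for each failure."""
    failed = []

    def try_remove(func, path):
        try:
            func(path)
        except OSError as exc:
            # On Windows a file held open by another process typically
            # ends up here as a PermissionError.
            failed.append((path, str(exc)))

    # Walk bottom-up so files are removed before their parent directories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.chmod(path, stat.S_IWRITE)  # clear a read-only flag, if set
            except OSError:
                pass  # the removal attempt below records any real failure
            try_remove(os.remove, path)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Links to directories are unlinked rather than rmdir'd.
            try_remove(os.remove if os.path.islink(path) else os.rmdir, path)
    try_remove(os.rmdir, root)
    return failed
```

An empty return value means the workspace is gone. Otherwise the listed paths are the ones to look up under Associated Handles in Resource Monitor, as described in the second form above.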