System unavailable: test-azure-win2012r2-x64-1: ERROR: Cannot delete workspace #2209

Closed
lumpfish opened this issue Jun 10, 2021 · 27 comments

@lumpfish

lumpfish commented Jun 10, 2021

Jobs are failing with:

Running on test-azure-win2012r2-x64-1 in C:\Users\jenkins\workspace\Grinder
[Pipeline] {
[Pipeline] cleanWs
[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...
ERROR: Cannot delete workspace :Unable to delete 'C:\Users\jenkins\workspace\Grinder'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. (Discarded 6 additional exceptions)

e.g. https://ci.adoptopenjdk.net/job/Grinder/804/console
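For anyone trying to reproduce this outside Jenkins, here is a minimal Python sketch of the retry behaviour the log describes (3 attempts, 0.1 sec apart). It is not the workspace cleanup plugin's code; `delete_workspace` and its parameters are hypothetical names used only for illustration.

```python
import shutil
import time


def delete_workspace(path, attempts=3, wait_s=0.1):
    """Remove `path`, retrying on failure.

    Returns True if the directory is gone afterwards, False if every
    attempt failed (for example because a process still holds a file open).
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # nothing left to delete
        except OSError:
            if attempt < attempts:
                time.sleep(wait_s)  # brief pause before the next attempt
    return False
```

Retrying only helps with transient failures. If a leftover process keeps a file or directory in the workspace open, every attempt is likely to fail in the same way.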

@sophia-guo

This may be related to the test cases. The job was cancelled by timeout and a test thread hung.

@lumpfish
Author

This will stop some hanging tests from running: adoptium/aqa-tests#2639

@karianna added this to TODO in infrastructure via automation Jun 11, 2021
@sophia-guo

Same issue with test-ibmcloud-win2012r2-x64-1 , which failed job https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_sanity.openjdk_x86-64_windows/508/

adoptium/aqa-tests#2639 will stop new hangs from happening. We will still need some process to clean all Windows workspaces so that this error does not recur and no job fails because of it.

@smlambert
Contributor

I was hoping that the processCheck (see adoptium/aqa-tests#2059) would cover us for being able to delete the workspace. I wonder if this means that not all processes are being killed, and that is why the workspace cannot be deleted?

@sophia-guo

The kill-process step is in a try/catch/finally block: https://github.com/adoptium/aqa-tests/blob/master/buildenv/jenkins/JenkinsfileBase#L736. My guess is that if the job terminates due to a timeout, the kill-process step will not run at all.
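If that guess is right, one general approach is to run the cleanup at the level where the timeout is caught, so it happens whether or not the test step finished. The Python sketch below only illustrates that shape. It does not show how Jenkins handles a timeout and it is not the JenkinsfileBase code; `run_with_cleanup`, `cleanup` and `SLEEPER` are hypothetical names.

```python
import subprocess
import sys

# Stand-in for a hanging test: a child process that sleeps far past the timeout.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_cleanup(cmd, timeout_s, cleanup):
    """Run `cmd`; always call `cleanup()`, even when the timeout fires.

    Returns True if the command timed out, False otherwise.
    """
    timed_out = False
    try:
        # subprocess.run kills the direct child itself when the timeout expires.
        subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        timed_out = True
    finally:
        cleanup()  # e.g. kill leftover processes before workspace deletion
    return timed_out
```

For example, `run_with_cleanup(SLEEPER, 0.5, kill_leftovers)` would return True and still call the (hypothetical) `kill_leftovers` function. Note that `subprocess.run` only kills the direct child on timeout, so any processes that child started would still need to be handled by the cleanup step.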

@Haroon-Khel self-assigned this Jul 22, 2021
@Haroon-Khel
Contributor

The 3 machines, test-azure-win2012r2-x64-1 and -3, and test-ibmcloud-win2012r2-x64-1, have hanging processes which I can't kill as the jenkins user. The processes are:

/Users/jenkins/workspace/Test_openjdk8_dragonwell_extended.openjdk_x86-64_windows_testList_0/openjdkbinary/j2sdk-image/bin/java

/Users/jenkins/workspace/Test_openjdk11_j9_extended.functional_x86-64_windows/openjdkbinary/j2sdk-image/bin/java

/e/workspace/Test_openjdk8_hs_sanity.openjdk_x86-64_windows/openjdkbinary/j2sdk-image/bin/jdb

respectively.

I get a permissions error when trying to kill them as the jenkins user.
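To check which leftover processes belong to a given workspace, something like the following Python sketch could be used. It assumes the process list (pid and executable path) has already been collected by some other tool, and that the paths are native Windows paths rather than the Cygwin-style ones quoted above. `processes_under` is a hypothetical helper, not part of any existing script.

```python
from pathlib import PureWindowsPath


def processes_under(workspace, processes):
    """Return the (pid, exe) pairs whose executable lives below `workspace`.

    `processes` is an iterable of (pid, exe_path) pairs obtained elsewhere.
    PureWindowsPath compares case-insensitively and accepts both / and \\,
    which matches how Windows treats paths.
    """
    root = PureWindowsPath(workspace)
    return [
        (pid, exe)
        for pid, exe in processes
        if root in PureWindowsPath(exe).parents
    ]
```

Any process this returns has its executable inside the workspace, so it is a likely candidate for blocking the delete. It will not catch a process that merely has its working directory there.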

@Haroon-Khel
Contributor

I was able to kill the process running on test-ibmcloud-win2012r2-x64-1

Running the failed Jenkins job on the machine:
https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_sanity.openjdk_x86-64_windows/512/console

@Haroon-Khel
Contributor

The hanging processes on the Azure machines were ended too.

@Haroon-Khel
Contributor

Haroon-Khel commented Jul 23, 2021

https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_extended.functional_x86-64_windows/140/console failed on test-azure-win2012r2-x64-3 because it couldn't delete the workspace, which is strange:

ERROR: Cannot delete workspace :Unable to delete 'C:\Users\jenkins\workspace\Test_openjdk11_j9_extended.functional_x86-64_windows'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. (Discarded 60 additional exceptions)
[Pipeline] }
[Pipeline] // timeout
[Pipeline] echo
Exception: hudson.AbortException: Cannot delete workspace: Unable to delete 'C:\Users\jenkins\workspace\Test_openjdk11_j9_extended.functional_x86-64_windows'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. (Discarded 60 additional exceptions)

@Haroon-Khel
Contributor

On test-azure-win2012r2-x64-3, I've deleted the problem workspace, ended the hanging process and kicked off another job
https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_extended.functional_x86-64_windows/141/console

@Haroon-Khel
Contributor

The job on test-ibmcloud-win2012r2-x64-1 ran without a workspace error or hanging processes. Running another job on the same machine to observe any 'failure to delete workspace' errors

https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_sanity.openjdk_x86-64_windows/513/console

@Haroon-Khel
Contributor

test-ibmcloud-win2012r2-x64-1 looks to be fine. Two sanity jobs ran back to back without a workspace complaint or a hanging process

@Haroon-Khel
Contributor

Test_openjdk8_dragonwell_extended.openjdk_x86-64_windows_testList_0 hung twice on test-azure-win2012r2-x64-1, this being the most recent. I ran the extended openjdk tests on a dragonwell jdk on the machine directly, and this ran to completion without hanging.

I ran https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk8_hs_extended.openjdk_x86-64_windows_testList_0/12/console on the same machine to test. This ran to completion without hanging

@Haroon-Khel
Contributor

The other aborted Test_openjdk8_dragonwell_extended.openjdk_x86-64_windows_testList_0 job on test-azure-win2012r2-x64-1 was https://ci.adoptopenjdk.net/job/Test_openjdk8_dragonwell_extended.openjdk_x86-64_windows_testList_0/9

Both jobs hung for 4 hours during hotspot_jre_0 after which I had to abort

@Haroon-Khel
Contributor

Haroon-Khel commented Jul 28, 2021

hotspot_jre_0 grinder from yesterday on test-azure-win2012r2-x64-1, https://ci.adoptopenjdk.net/job/Grinder/1159/console. Jenkins aborted it after it hung for 10 hours.

@Haroon-Khel
Contributor

Haroon-Khel commented Jul 28, 2021

Rerunning on the same machine without a dragonwell build
https://ci.adoptopenjdk.net/job/Grinder/1161/console

This ran to completion, didn't hang nor did it leave any hanging processes on the machine

@Haroon-Khel
Contributor

hotspot_jre_0 grinder from yesterday on test-azure-win2012r2-x64-1, https://ci.adoptopenjdk.net/job/Grinder/1159/console. Jenkins aborted it after it hung for 10 hours.

Same grinder ran on test-ibmcloud-win2012r2-x64-1, https://ci.adoptopenjdk.net/job/Grinder/1160/console. The test didn't pass, but the grinder did not hang as it does on test-azure-win2012r2-x64-1

@Haroon-Khel
Contributor

Reran the job on test-azure-win2012r2-x64-3: https://ci.adoptopenjdk.net/job/Grinder/1162/console. The test doesn't hang there, so this seems to be a problem with -1 only.

@Haroon-Khel
Contributor

I think the test class that causes the hang is TestNoMinidumpAtFullGC. It appears as the last test before the job hangs, on all 3 occasions. Despite the directory being hotspot\test\serviceability\sa\jmap-minidump\TestNoMinidumpAtFullGC.java, I can't find any mention of this test file in https://github.com/adoptium/jdk8u

@sxa
Member

sxa commented May 30, 2022

Machine currently offline due to low disk space

@sxa added this to the 2022-06 (June) milestone May 30, 2022
@sxa added the os:windows label May 30, 2022
@sxa
Member

sxa commented Jul 15, 2022

test-azure-win2012r2-x64-1
https://ci.adoptopenjdk.net/job/Test_openjdk11_hs_sanity.openjdk_x86-64_windows/647/console

There was a bash shell sitting in that directory running as the administrative user, which was preventing jenkins from deleting it. Now resolved and it's running through ok at https://ci.adoptopenjdk.net/job/Test_openjdk11_hs_sanity.openjdk_x86-64_windows/651/console

@sxa
Member

sxa commented Jul 15, 2022

As mentioned above, the machine is quite low on disk space, with 1GB on C: (although there is around 26GB on the workspace drive, D:).
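A quick way to check this kind of thing from a script is Python's `shutil.disk_usage`. The sketch below is only an illustration; `free_gb`, `has_enough_space` and the threshold are hypothetical and not part of any existing monitoring on these machines.

```python
import shutil


def free_gb(path):
    """Free space, in GiB, on the drive or filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path, min_free_gb):
    """True if at least `min_free_gb` GiB are free where `path` lives."""
    return free_gb(path) >= min_free_gb
```

For example, `has_enough_space("C:\\", 5)` would report whether C: has at least 5 GiB free. The 5 is an arbitrary number picked for the example.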

@Haroon-Khel
Contributor

Machine is back online with enough space. Closing
