Introduce staged timeout system #2150

Merged
merged 12 commits into from Nov 9, 2020

Conversation

@M-Davies M-Davies commented Oct 15, 2020

All timeouts are now applied on a stage-by-stage basis, and the build fails with a helpful message if a timeout is reached:

TOP LEVEL PIPELINES

  • Adopt api requests = 1 hour
  • Copy artifacts = 6 hours
  • Remove artifacts = 2 hours
  • Archive artifacts = 6 hours
  • Publish artifacts = 3 hours

DOWNSTREAM JOBS

  • Clean master node = 1 hour
  • Clean aix node = 1 hour
  • Adopt api requests = 1 hour
  • Clean workspace on node = 1 hour
  • Checkout scm on node = 1 hour
  • Pull docker image = 2 hours
  • Checkout docker image = 1 hour
  • Main build script (make-adopt-build-farm.sh) = 6 hours
  • Archive artifact = 3 hours
  • Sign build job = 2 hours
  • Installer jobs = 3 hours
  • I have added the necessary stubs and libraries to ensure that the try-catch fires correctly when a timeout is reached. Catching the FlowInterruptedException required importing the Jenkins core lib via Gradle (https://stackoverflow.com/a/47353745/14420589).

  • While I was at it, I also added a node timeout to the PR tester (that param was implemented recently in #2118, Make build/test "Queue" phase abort if no nodes online after X mins) and corrected the function that runs it to use the Jenkins advanced sleep, which is more concise and reliable than the normal Groovy sleep. I also added a println for when we move into or out of a node, as that is sometimes hard to tell from the logs.
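
As a rough illustration of the stage-by-stage idea described above, the sketch below runs each stage under its own limit and turns a timeout into a failure that names the stage. It is plain Java, not the Groovy pipeline code from this PR (which is not shown on this page); in a Jenkins pipeline the wrapper would presumably be the built-in `timeout` step, with the resulting FlowInterruptedException caught in a try-catch. The names `StagedTimeoutSketch` and `runStage` are hypothetical, and only the hour values are taken from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: a plain-Java stand-in for per-stage pipeline timeouts.
class StagedTimeoutSketch {

    // A few downstream-job limits in hours, copied from the PR description.
    static final Map<String, Integer> DOWNSTREAM_TIMEOUT_HOURS = new LinkedHashMap<>();
    static {
        DOWNSTREAM_TIMEOUT_HOURS.put("Pull docker image", 2);
        DOWNSTREAM_TIMEOUT_HOURS.put("Main build script", 6);
        DOWNSTREAM_TIMEOUT_HOURS.put("Archive artifact", 3);
        DOWNSTREAM_TIMEOUT_HOURS.put("Sign build job", 2);
    }

    // Hypothetical helper: run one stage under its own limit and, on timeout,
    // fail with a message that says which stage exceeded which limit.
    static <T> T runStage(String stage, long limit, TimeUnit unit, Callable<T> body)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> result = executor.submit(body);
        try {
            return result.get(limit, unit);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the stage body
            throw new IllegalStateException(
                "Stage '" + stage + "' timed out after " + limit + " " + unit
                    + "; failing the build", e);
        } catch (ExecutionException e) {
            // Rethrow the stage's own failure rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A stage that finishes well inside its 2 hour limit.
        String pulled = runStage("Pull docker image",
            DOWNSTREAM_TIMEOUT_HOURS.get("Pull docker image"), TimeUnit.HOURS,
            () -> "pulled");
        System.out.println(pulled);

        // A stage that overruns; the limit is shortened so the demo ends quickly.
        try {
            runStage("Sign build job", 50, TimeUnit.MILLISECONDS, () -> {
                Thread.sleep(5_000);
                return "signed";
            });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the limit next to the stage name means the failure message can say exactly which stage overran, which is the "helpful message" behaviour this PR aims for.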

Closes: #2120
Signed-off-by: Morgan Davies morgandavies2020@gmail.com

@M-Davies M-Davies added the enhancement (Issues that enhance the code or documentation of the repo in any way) and jenkins (Issues that enhance or fix our jenkins server) labels Oct 15, 2020
@M-Davies M-Davies marked this pull request as draft October 15, 2020 18:06
@adoptopenjdk-github-bot
Contributor

🟠 PR TESTER RESULT 🟠

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@M-Davies M-Davies force-pushed the new_timeouts branch 4 times, most recently from ad14a10 to a17a62c Compare October 16, 2020 12:56
@M-Davies M-Davies force-pushed the new_timeouts branch 2 times, most recently from 9748d66 to 6b13d18 Compare October 16, 2020 14:50

@M-Davies
Contributor Author

run tests

@adoptopenjdk-github-bot
Contributor

🟠 PR TESTER RESULT 🟠

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@karianna karianna added this to TODO in temurin-build via automation Oct 18, 2020
@karianna karianna added this to the October 2020 milestone Oct 18, 2020
@karianna karianna moved this from TODO to In Progress in temurin-build Oct 18, 2020
@adoptopenjdk-github-bot
Contributor

🟠 PR TESTER RESULT 🟠

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@M-Davies
Contributor Author

M-Davies commented Nov 5, 2020

Tester failed due to #2212. This is blocked until that PR is merged

@karianna
Contributor

karianna commented Nov 6, 2020

Conflicts as well

@M-Davies
Contributor Author

M-Davies commented Nov 6, 2020

> Conflicts as well

I'm waiting until #2212 is merged before resolving them, since that means I only have to do it once 🙂

@M-Davies
Contributor Author

M-Davies commented Nov 6, 2020

run tests

@andrew-m-leonard
Contributor

> @andrew-m-leonard @karianna Would we prefer to have the timeouts defined on a platform-by-version basis instead of a general timeout for every platform and version? I'm thinking that we could add a Map attribute to each platform's build_configuration_map that contains the timeouts for that specific platform. That way we could have longer timeouts for slower systems like AIX and ARM and shorter ones for others?

I'm wary of overcomplicating things here. Also, a particular AIX machine may be slow now, for example, but a simple H/W upgrade could easily change that. I would stick with a "coarse" set of timeouts that roughly covers the given stage.

@M-Davies
Contributor Author

M-Davies commented Nov 6, 2020

Thanks Andrew. In that case, I'll leave as it is for now and let others add in platform specific timeouts in the future should they prove necessary. This is ready for review again (and merge, pending PR tester result)
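
For anyone who does pick up platform-specific timeouts later, the proposal discussed above (a per-platform map that falls back to the general values) might look roughly like the sketch below. This was not adopted in this PR. The class name, the `timeoutHours` helper, the platform key "aix", and the 10 hour override are all made up for illustration; only the default hour values come from the PR description.

```java
import java.util.Map;

// Illustrative only: per-platform overrides on top of the general timeouts.
class PlatformTimeoutSketch {

    // General per-stage limits in hours, taken from the PR description.
    static final Map<String, Integer> DEFAULT_HOURS = Map.of(
        "Pull docker image", 2,
        "Main build script", 6,
        "Archive artifact", 3);

    // Hypothetical overrides for slower platforms; key and value are invented.
    static final Map<String, Map<String, Integer>> PLATFORM_OVERRIDES = Map.of(
        "aix", Map.of("Main build script", 10));

    // Look up the platform-specific limit, falling back to the general one.
    static int timeoutHours(String platform, String stage) {
        Integer general = DEFAULT_HOURS.get(stage);
        if (general == null) {
            throw new IllegalArgumentException("Unknown stage: " + stage);
        }
        Map<String, Integer> overrides =
            PLATFORM_OVERRIDES.getOrDefault(platform, Map.of());
        return overrides.getOrDefault(stage, general);
    }

    public static void main(String[] args) {
        System.out.println(timeoutHours("aix", "Main build script"));       // prints 10
        System.out.println(timeoutHours("aix", "Pull docker image"));       // prints 2
        System.out.println(timeoutHours("linux-x64", "Main build script")); // prints 6
    }
}
```

The fallback keeps the coarse defaults as the single source of truth, so an override only needs to exist where a platform has proven consistently slower.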

@M-Davies M-Davies marked this pull request as ready for review November 6, 2020 15:55
@M-Davies M-Davies modified the milestones: October 2020, November 2020 Nov 7, 2020
@adoptopenjdk-github-bot
Contributor

🟠 PR TESTER RESULT 🟠

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@M-Davies
Contributor Author

M-Davies commented Nov 7, 2020

run tests

EDIT: Looks like the jdk11 pipeline keeps getting stuck for some reason... This is round 3 of trying, and I'll investigate if it doesn't work.

@adoptopenjdk-github-bot
Contributor

🟠 PR TESTER RESULT 🟠

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@M-Davies
Contributor Author

M-Davies commented Nov 9, 2020

run tests

Linux docker builds can't run with a node timeout in place
@M-Davies
Contributor Author

M-Davies commented Nov 9, 2020

run tests

@adoptopenjdk-github-bot
Contributor

🟠 PR TESTER RESULT 🟠

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@M-Davies
Contributor Author

M-Davies commented Nov 9, 2020

> 🟠 PR TESTER RESULT 🟠
>
> ❎ Some pipelines failed or the job was aborted! ❎
> See the pipeline-build-check below for more information...

Windows failed due to adoptium/infrastructure#1573.
aarch64/JDK16/j9 failed due to an unknown error, but the rebuild passed: https://ci.adoptopenjdk.net/job/build-scripts-pr-tester/job/build-test/job/jobs/job/jdk/job/jdk-linux-aarch64-openj9/384/

13:02:34  CCACHE_COMPRESS=1          CCACHE_SLOPPINESS=pch_defines,time_macros CCACHE_BASEDIR=/home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src /usr/local/gcc/bin/ccache  /usr/local/gcc/bin/gcc-7.5 -O -fgnu89-inline -g -DLINUX -D_REENTRANT -D_FILE_OFFSET_BITS=64 -fpic -DIPv6_FUNCTION_SUPPORT -DJ9AARCH64 -fstack-protector -Wimplicit -Wreturn-type -Werror -I. -I../include -I../oti -I../gc_include -I../omr/gc/include -I../gc_glue_java -I../nls -I../omr/include_core    -DTR_HOST_ARM64   -E vmcheck.c | sed -n -e '/^DDRFILE_BEGIN /,/^DDRFILE_END /s/^/@/' -e '/^@./p' > vmcheck.i
13:02:35  gmake[6]: Leaving directory '/home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src/build/linux-aarch64-server-release/vm/vmchk'
13:02:35  gmake[5]: Leaving directory '/home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src/build/linux-aarch64-server-release/vm'
13:02:35  [2020-11-09 13:02:34] Scraping anotations from preprocessed code ...
13:02:41  [2020-11-09 13:02:40] Restoring annotated files ...
13:02:41  [2020-11-09 13:02:41] All done.
13:02:41  Running ddrgen to generate j9ddr.dat and superset.dat
13:03:54  Blob written to file: ../j9ddr.dat
13:03:54  Superset written to file: ../superset.dat
13:03:54  gmake[4]: Leaving directory '/home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src/build/linux-aarch64-server-release/vm/ddr'
13:03:54  
13:03:54  ERROR: Build failed for targets 'product-images legacy-jre-image test-image debug-image' in configuration 'linux-aarch64-server-release' (exit code 2) 
13:03:54  Stopping sjavac server
13:03:54  
13:03:54  === Output from failing command(s) repeated here ===
13:03:54  * For target jdk_modules_jdk.jshell__the.jdk.jshell_batch:
13:03:54  * For target jdk_modules_jdk.security.auth__the.jdk.security.auth_batch:
13:03:54  
13:03:54  * All command lines available in /home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src/build/linux-aarch64-server-release/make-support/failure-logs.
13:03:54  === End of repeated output ===
13:03:54  
13:03:54  === Make failed targets repeated here ===
13:03:54  CompileJavaModules.gmk:609: recipe for target '/home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src/build/linux-aarch64-server-release/jdk/modules/jdk.security.auth/_the.jdk.security.auth_batch' failed
13:03:54  make/Main.gmk:197: recipe for target 'jdk.security.auth-java' failed
13:03:54  CompileJavaModules.gmk:609: recipe for target '/home/ubuntu/workspace/build-scripts-pr-tester/build-test/jobs/jdk/jdk-linux-aarch64-openj9/workspace/build/src/build/linux-aarch64-server-release/jdk/modules/jdk.jshell/_the.jdk.jshell_batch' failed
13:03:54  make/Main.gmk:197: recipe for target 'jdk.jshell-java' failed
13:03:54  === End of repeated output ===

@M-Davies
Contributor Author

M-Davies commented Nov 9, 2020

@karianna Assuming @smlambert is happy with these changes, this should be good to merge

@smlambert smlambert left a comment

thanks @M-Davies !

temurin-build automation moved this from In Progress to Review/QA Nov 9, 2020
@karianna karianna merged commit 7c1deec into adoptium:master Nov 9, 2020
temurin-build automation moved this from Review/QA to Done Nov 9, 2020
@M-Davies M-Davies deleted the new_timeouts branch April 30, 2021 12:13
Labels
enhancement (Issues that enhance the code or documentation of the repo in any way), jenkins (Issues that enhance or fix our jenkins server)
Projects
No open projects
temurin-build
  
Done
Development

Successfully merging this pull request may close these issues.

Revamp how timeouts are used in build pipelines
7 participants