Ability to Interleave Performance Runs for Baseline & Test Builds #850

Open

piyush286 opened this issue Feb 4, 2019 · 7 comments

piyush286 (Contributor) commented Feb 4, 2019

TLDR

To get more reliable results, we should be able to run 2 builds (baseline and test) by interleaving their iterations. Currently, we can only run one build at a time, so comparing 2 builds means launching them one after the other, resulting in purely sequential runs.

I've outlined some details and proposals below. These are just suggestions, so there may well be better ways of doing this.

Details

Background About Interleaved Runs (For more details, please refer to a similar issue that I opened in another repo: adoptium/aqa-test-tools#24)

If T = Test Build, B = Baseline Build, # = Iteration Number

Interleaved Run Pattern:
Alternate iterations of the test and baseline builds in a ping-pong fashion.
T1, B1, T2, B2, T3, B3

Non-interleaved Run Pattern:
Run all iterations of one build, then all iterations of the other.
T1, T2, T3, B1, B2, B3

Possible Solutions

The screenshot below shows some of the relevant files that might require changes.

[screenshot of the relevant files]

Option 1: Design

One possible way to enable interleaving would be to exploit the ITERATIONS param and use the loop in runTest() in JenkinsfileBase to launch the runs in the desired order shown above. We could add a new parameter for passing a second (baseline) SDK URL, or reuse the existing CUSTOMIZED_SDK_URL parameter if it can carry multiple URLs.
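
A minimal sketch of what that loop might look like, assuming a new BASELINE_SDK_URL parameter and a hypothetical runBenchmarkIteration() helper that wraps the existing per-iteration run logic (neither exists in JenkinsfileBase today):

// Hypothetical sketch: BASELINE_SDK_URL and runBenchmarkIteration() do not exist
// in JenkinsfileBase today; they only illustrate how the ITERATIONS loop could ping-pong.
def testSdkUrl = params.CUSTOMIZED_SDK_URL
def baselineSdkUrl = params.BASELINE_SDK_URL        // proposed second-SDK parameter
int iterations = params.ITERATIONS.toInteger()

for (int i = 0; i < iterations; i++) {
    runBenchmarkIteration(testSdkUrl)       // iteration with the test SDK
    runBenchmarkIteration(baselineSdkUrl)   // iteration with the baseline SDK
}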

Option 1: Constraints

If we want to launch Jenkins child builds from the parent build running JenkinsfileBase, we can't use the same machine for both parent and child jobs, since only one job can be scheduled on a performance machine at a time. For example, machine X running the parent build with JenkinsfileBase can't also run the child builds; that would result in a deadlock. One very inefficient workaround would be to use machine X to push all the material to a second machine Y, which ties up an extra machine and adds the time needed to transfer everything. So if we go with this option, we could not create Jenkins child builds, and the output of all iterations for the baseline and test builds would end up in one Jenkins build, which would be extremely lengthy and harder to parse for benchmark results and to debug.

Option 2: Design

Use a parent build to launch child builds that run JenkinsfileBase, changing the CUSTOMIZED_SDK_URL for each iteration in a ping-pong fashion as shown below. We would first call the child build with the test SDK URL, then with the baseline SDK URL, then with the test SDK URL, and so on. Hence, this design would produce one parent build output containing links to all the child builds. This option would be a lot cleaner, resulting in smaller and separate job outputs for each iteration and making it easier for TRSS (https://github.com/AdoptOpenJDK/openjdk-test-tools/tree/master/TestResultSummaryService) to parse the benchmark results.

Parent Build Pipeline Code:

// Number of interleaved iterations requested via the Iterations build parameter
int iterations = "${Iterations}".toInteger()

for (int i = 0; i < iterations; i++) {
    // Run one iteration with the test SDK ...
    build job: "ODM_x86-64_linux", parameters: [
        string(name: 'TARGET', value: "perf"),
        string(name: 'JVM_VERSION', value: "openjdk8-openj9"),
        string(name: 'CUSTOMIZED_SDK_URL', value: "${Test_Build_Link}"),
        string(name: 'LABEL', value: "${Machine_Label}"),
        string(name: 'PERF_CREDENTIALS_ID', value: "45375d80-7180-4cdc-8052-71a9510fbde3"),
        string(name: 'BUILD_LIST', value: "performance/odm"),
        string(name: 'JAVA_VERSION', value: "SE80")
    ]

    // ... then one iteration with the baseline SDK (ping-pong)
    build job: "ODM_x86-64_linux", parameters: [
        string(name: 'TARGET', value: "perf"),
        string(name: 'JVM_VERSION', value: "openjdk8-openj9"),
        string(name: 'CUSTOMIZED_SDK_URL', value: "${Baseline_Build_Link}"),
        string(name: 'LABEL', value: "${Machine_Label}"),
        string(name: 'PERF_CREDENTIALS_ID', value: "45375d80-7180-4cdc-8052-71a9510fbde3"),
        string(name: 'BUILD_LIST', value: "performance/odm"),
        string(name: 'JAVA_VERSION', value: "SE80")
    ]
}

Child Build Pipeline Code:
Run https://github.com/AdoptOpenJDK/openjdk-tests/buildenv/jenkins/, which calls JenkinsfileBase.

Option 2: Constraints

The JenkinsfileBase pipeline script downloads all the material (SDK, benchmark package, and git repos with the relevant test material) every time it's called, which would be extremely redundant and time consuming if the only thing that changes between iterations is the SDK. We would need to explore or add some capability in JenkinsfileBase so that we can avoid this.

piyush286 (Contributor Author) commented:

@llxia @ShelleyLambert We are trying to decide on the best way to enable interleaving. Do you have any suggestions?

smlambert (Contributor) commented:

Yes, I have some thoughts on this. The main points are:

  1. set up 2 SDKs

  2. supply the number of iterations (how many times to run T and B); I will assume that if 4 is passed in, it will run T 4x and B 4x in some sort of pattern

  3. supply a pattern by which to run them on the same machine. The default pattern would be alternating T B T B... Are any other patterns ever used or considered useful (TT BB TT BB)?

For 1), since we can already pass multiple zipped files via the CUSTOMIZED_SDK_URL parameter (space-separated URLs), I think we just use that mechanism for setting up the SDKs. Minor changes would have to be made in get.sh so that we set up 2 dirs, openjdkbinary/test and openjdkbinary/baseline (where BUILD_LIST is not performance, only openjdkbinary/test is needed).

Adding some sort of logic to check if the SDKs are already present on the machine is 'possible', though in open build farms we clean up machines after each run and do not leave workspaces lying around.
I prefer this approach as it's simpler, but we can measure how much time the setup takes (I do not think it is large as a percentage of an entire job run). If we decide it is worth doing, it can be done under a separate issue and PR.
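
A minimal sketch of the kind of change meant here for get.sh, assuming CUSTOMIZED_SDK_URL carries two space-separated URLs; the function and variable names below are illustrative, not the script's actual ones:

#!/usr/bin/env bash
# Illustrative sketch only; get.sh's real variable names and download logic may differ.
# CUSTOMIZED_SDK_URL is assumed to contain "<test-sdk-url> <baseline-sdk-url>".
read -r TEST_SDK_URL BASELINE_SDK_URL <<< "$CUSTOMIZED_SDK_URL"

download_sdk() {
    local url="$1" dir="$2"
    if [ -d "$dir" ]; then
        echo "$dir already present, skipping download"   # optional caching check
        return
    fi
    mkdir -p "$dir"
    curl -L -o "$dir/sdk.tar.gz" "$url"
    tar -xzf "$dir/sdk.tar.gz" -C "$dir"
}

download_sdk "$TEST_SDK_URL" "openjdkbinary/test"
# Only needed when BUILD_LIST is a performance target
[ -n "$BASELINE_SDK_URL" ] && download_sdk "$BASELINE_SDK_URL" "openjdkbinary/baseline"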

For 2), we can use the ITERATIONS param as is.

For 3), I'm wondering whether we can just use the playlist.xml <command> for each of the test targets that are needed, where you pass the JDK dir in to the benchmark script as a parameter.

An example (a variation of the current ODM test):
<test>
   <testCaseName>ODM_interleaved</testCaseName>
   <command>cd $(TEST_RESROOT)/ilog_wodm881; \
      cp $(TEST_RESROOT)/../../../openjdk-tests/performance/odm/scripts/benchmark.sh .; \
      chmod a+x benchmark.sh; \
      bash benchmark.sh openjdkbinary/test; \
      bash benchmark.sh openjdkbinary/baseline; \
      ${TEST_STATUS}
   </command>
   <!-- scripts are only tested on linux -->
   <platformRequirements>os.linux,arch.x86,bits.64</platformRequirements>
   <levels>
      <level>extended</level>
   </levels>
   <groups>
      <group>perf</group>
   </groups>
</test>

So if I ran that test with ITERATIONS=1, it would run T B;
if I ran it with ITERATIONS=2, it would run T B T B;
etc.
With this approach, your playlist can contain a regular version of the test, which only runs a T,
and an interleaved version, which runs a T B.
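
For reference, the benchmark.sh called above would then just take the JDK directory as its first argument. A minimal sketch, assuming a placeholder odm-benchmark.jar invocation rather than the real ODM benchmark command:

#!/usr/bin/env bash
# Sketch of a benchmark.sh that takes the JDK directory as its first argument;
# the real ODM benchmark invocation will differ.
JDK_DIR="${1:?usage: benchmark.sh <jdk-dir>}"

"$JDK_DIR"/bin/java -version
# Run the benchmark with the selected JDK (placeholder invocation)
"$JDK_DIR"/bin/java -jar odm-benchmark.jar > "results_$(basename "$JDK_DIR").log"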

smlambert (Contributor) commented:

With the approach I suggested above, the benefits are:

  • you are able to interleave via command-line testing (not just Jenkins-based testing), so developers get the interleave feature if they want to run on their laptops

  • you minimize changes to JenkinsfileBase (which is a good thing, as it's the base for all testing)

  • you can add other patterns in the playlist file, if you want or need them

Ultimately, I think there is some cleanup that can and should be done to the benchmark.sh scripts that would exist in each of the performance/ subdirs (as we talked about in the last meeting).

It would be worth giving this approach a try, as it's fairly easy to try in a branch and see how it will run (and what pieces of the story may be missing).

@karianna karianna added this to TODO in aqa-tests via automation Feb 5, 2019
piyush286 (Contributor Author) commented Feb 5, 2019

@ShelleyLambert Thank you so much for your suggestions. This discussion will help us all decide on and work towards a more pragmatic solution.

ShelleyLambert: are any other patterns ever used or considered useful (TT BB TT BB)

Besides the interleaving pattern (T B T B), the only other pattern we care about is a Cold/Warm variant of it. We use the Cold/Warm variant only for Liberty and WAS, where we destroy the shared class cache only for the cold iteration. The pattern would look like this:

T Cold, B Cold, T Warm, B Warm
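
A minimal sketch of that Cold/Warm ordering, assuming a hypothetical run_liberty_benchmark helper and an OpenJ9 SDK (so the shared class cache can be destroyed with -Xshareclasses:destroyAll):

#!/usr/bin/env bash
# Illustrative cold/warm interleaving sketch; run_liberty_benchmark is a hypothetical helper.
TEST_JDK="openjdkbinary/test"
BASELINE_JDK="openjdkbinary/baseline"

for JDK in "$TEST_JDK" "$BASELINE_JDK"; do
    # Cold iteration: destroy the OpenJ9 shared class cache first
    "$JDK"/bin/java -Xshareclasses:destroyAll || true
    run_liberty_benchmark "$JDK" cold
done

for JDK in "$TEST_JDK" "$BASELINE_JDK"; do
    # Warm iteration: reuse the shared class cache populated by the cold run
    run_liberty_benchmark "$JDK" warm
done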

ShelleyLambert: you are able to interleave via commandline testing (not just in Jenkins-based testing), so developers get interleave feature if they want to run on their laptops

From your thoughts above, I feel you might be suggesting we go with Option 1, i.e. enabling interleaving inside the Openjdk-tests Framework.

While having the interleaving feature built into the Openjdk-tests Framework (i.e. doing the interleaving in JenkinsfileBase) would certainly be nice to have, I feel it would really complicate looking at the Jenkins output and parsing it for multiple iterations, as I mentioned earlier. I've quoted that concern below for convenience.

piyush286: So if we go with this option, we could not create Jenkins child builds, and the output of all iterations for the baseline and test builds would end up in one Jenkins build, which would be extremely lengthy and harder to parse for benchmark results and to debug.

Having built-in interleaving (i.e. Option 1) would certainly be great if we could create child jobs or separate outputs for each iteration. Even if a developer wants to run interleaved builds from the command line, that can easily be done with a for loop in a bash script; maybe we can provide such a script to make it easier, for example along the lines sketched below.
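
A minimal sketch of such a wrapper script, assuming the benchmark is driven through a make target (the _ODM target name and the JDK paths below are placeholders):

#!/usr/bin/env bash
# Sketch of a developer-side interleaving wrapper; the exact make target name
# (_ODM here) and TEST_JDK_HOME usage are assumptions about the local setup.
TEST_JDK="$HOME/jdks/test-jdk"
BASELINE_JDK="$HOME/jdks/baseline-jdk"
ITERATIONS=3

for ((i = 1; i <= ITERATIONS; i++)); do
    for JDK in "$TEST_JDK" "$BASELINE_JDK"; do
        export TEST_JDK_HOME="$JDK"
        make _ODM   # placeholder target; substitute the real benchmark target
    done
done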

I feel that option 2 might be slightly easier and cleaner to implement for these reasons:

piyush286: This option would be a lot cleaner, resulting in smaller and separate job outputs for each iteration and making it easier for TRSS (https://github.com/AdoptOpenJDK/openjdk-test-tools/tree/master/TestResultSummaryService) to parse the benchmark results.

In order to address the constraint of Option 2,

piyush286: The JenkinsfileBase pipeline script downloads all the material (SDK, benchmark package, and git repos with the relevant test material) every time it's called, which would be extremely redundant and time consuming if the only thing that changes between iterations is the SDK. We would need to explore or add some capability in JenkinsfileBase so that we can avoid this.

we can add a parameter to the Openjdk-tests Framework. In the Jenkins-based interleaving, if the current iteration is not the last one, we set KEEP_WORKSPACE (or a similarly named parameter) to true so that we avoid re-downloading all the material (SDK, benchmark package, and git repos with the relevant test material) every time JenkinsfileBase is called. If it's the last iteration, we set that parameter to false, which would delete all the material in that workspace.
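
A rough sketch of how the Option 2 parent pipeline could drive this, assuming KEEP_WORKSPACE is the new parameter name (most child-build parameters are omitted for brevity):

// Sketch only: KEEP_WORKSPACE is the proposed parameter, not an existing one.
int iterations = "${Iterations}".toInteger()

for (int i = 0; i < iterations; i++) {
    def sdkUrls = ["${Test_Build_Link}", "${Baseline_Build_Link}"]
    for (int j = 0; j < sdkUrls.size(); j++) {
        // Keep the workspace for every child build except the very last one
        boolean lastChild = (i == iterations - 1) && (j == sdkUrls.size() - 1)
        build job: "ODM_x86-64_linux", parameters: [
            string(name: 'CUSTOMIZED_SDK_URL', value: sdkUrls[j]),
            booleanParam(name: 'KEEP_WORKSPACE', value: !lastChild)
            // ... remaining parameters as in the Option 2 example above
        ]
    }
}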

Please let me know what you think :)

piyush286 (Contributor Author) commented Feb 5, 2019

Forgot to address these points:

ShelleyLambert: you minimize changes to JenkinsfileBase (which is a good thing, as it's the base for all testing)
you can add other patterns in the playlist file, if you want or need them

I totally agree that it would be good to limit the changes in JenkinsfileBase in order to avoid breaking other tests by mistake. Even if we go with Option 1, which is to have the interleaving logic inside the Openjdk-tests Framework, we would need the interleaving code in some common file such as JenkinsfileBase, since that code would be common to all benchmark runs. If we moved that code to the benchmark level (i.e. odm/playlist.xml), we would need to duplicate it for the other benchmarks as well.

I've also opened an issue for discussing the design for using benchmark configs with Openjdk-tests Framework: #853

smlambert (Contributor) commented:

I'd like to have a web conference to discuss in detail. We can extend next week's meetup to allow enough time.

piyush286 (Contributor Author) commented:

Jotting down some more thoughts that came up so that I don't forget. We can discuss these thoughts as well in our meeting.

Another Issue with Option 1 & Benefit of Option 2:

With Option 1, the running time of the Jenkins build would be extremely high. For example, each SPECjbb2015 iteration takes 2.5 hours, so if we interleave with a baseline and run 4 iterations, the Jenkins build would run for 20 hours (2.5 hours/iteration * 4 iterations/SDK * 2 SDKs). If that perf machine were urgently needed for higher-priority runs, we would not be able to use it without killing our 20-hour Jenkins build, wasting machine time.

Option 2 easily solves this problem: having builds with shorter run times would allow us to book the machine and use it for higher-priority work if needed, as mentioned in this issue: #889
