Tabular View for Comparing Baseline and Test Builds for Selected Platforms & Metrics #37

piyush286 · 2019-02-08T22:49:56Z

Background About Benchmarking

For benchmarking, we always launch several iterations of a benchmark with a specific build to get performance results for various metrics such as throughput and startup time. On their own, these absolute numbers are not very useful since they could change when the benchmark is run on another platform, when the machine state isn't identical or when the configs are slightly different. Hence, we always use a baseline to gauge the performance of a newer test build.

While comparing baseline and test builds, it's important to use a relative number (Build 1 Score/Build 2 Score) instead of an absolute difference (Build 1 Score - Build 2 Score) to look at the performance gap. The absolute difference doesn't mean much by itself: it changes with the platform and configs, and its range varies significantly from one metric to another.

We usually use these formulas for comparison:

Scenario	Example of Metrics	Comparison Formula
Higher is better	Throughput	Test Build/Baseline Build
Lower is better	Startup time, Footprint	Baseline Build/Test Build
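
As a minimal sketch of the two formulas above (the function name and the sample scores are made up for illustration), both scenarios can be folded into one helper so that a result above 100% always means the test build is better than the baseline:

```python
def relative_score(baseline: float, test: float, higher_is_better: bool) -> float:
    """Return the test build's score relative to the baseline, in percent.

    Values above 100 mean the test build is better than the baseline,
    regardless of which direction the raw metric runs in.
    """
    if baseline <= 0 or test <= 0:
        raise ValueError("scores must be positive")
    # Higher is better: Test Build / Baseline Build
    # Lower is better:  Baseline Build / Test Build
    ratio = test / baseline if higher_is_better else baseline / test
    return ratio * 100.0


# Throughput (higher is better): made-up 950 ops/s against a baseline of 1000 ops/s.
throughput_pct = relative_score(1000.0, 950.0, higher_is_better=True)

# Startup time (lower is better): made-up 2.2 s against a baseline of 2.0 s.
startup_pct = relative_score(2.0, 2.2, higher_is_better=False)
```

Both sample values come out below 100%, i.e. both would count as regressions of the test build.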

Details about the Proposed Feature

Test Result Summary (TRS) should have the ability to create and show tabular views for comparing baseline and test builds. Each view should show, in each result cell, the relative comparison between the baseline and test build as a percentage for one specific metric and platform. These result cells should be painted with different colors to classify the performance according to the table shown below.

Color Scheme for Result Cells
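
A rough sketch of how a result cell could be mapped to a color. The exact thresholds belong to the color scheme table, so the bands and color names below are placeholder values chosen only for illustration:

```python
# Hypothetical (lower bound in percent, color) bands, best first.
# These numbers are NOT the thresholds from the color scheme table.
COLOR_BANDS = [
    (98.0, "green"),
    (95.0, "yellow"),
    (90.0, "orange"),
]
WORST_COLOR = "red"


def cell_color(relative_pct: float) -> str:
    """Return the color of the first band whose lower bound the score reaches."""
    for lower_bound, color in COLOR_BANDS:
        if relative_pct >= lower_bound:
            return color
    return WORST_COLOR
```

Keeping the bands in a single list would make it easy to swap in the real thresholds or to let each view configure its own.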

These tabular views would be extremely helpful in finding regressions. I'm going to show the benefits of these tabular views with 2 examples.

Example 1:

SPEC Benchmarks

The tabular view above shows the results of all the SPEC benchmarks run on different platforms. From the results above, we can identify the following regressions easily:

x64 Regression (OS: Linux, Windows, macOS) for SPECjEnterprise & SPECjbb2015
Multi-benchmark Regression (SPECjEnterprise & SPECjbb2015) for x64 (Same as the 1st)
Single Platform Regression (Linux s390x) for SPECjbb2005

Example 2:

Micro Benchmarks

The tabular view above shows the results of all the micro benchmarks run on different platforms. From the results above, we can identify the following regressions easily:

Cross-platform Linux Regression (HW: x64, ppcle64 & s390x) for ILOG ODM
Power Regression (OS: Linux & AIX) for HiBench

Requirements for Tabular Views

Basic requirements of this tabular comparison view:

Should show the relative comparison percent between baseline and test build for all platforms and metrics selected for that specific view.
Should show the results of the latest test build against the latest runs for the baseline.
Should identify the score ranges with different colors.
Hovering over a result cell should show some basic information such as Java versions, confidence intervals and average scores.
Clicking a result cell should open a new window with a unique URL that shows the full details of the runs, such as the score of each iteration for all metrics from those runs for both the baseline and test build. This detailed view should be the same as the one shown when one clicks on the detailed view URL from the graph view, being developed for issue Performance Analysis Tools (Proposal from Developer JumpStart Tech Challenge) #28.
Should be configurable to show different platforms and metrics that are selected by the user for that specific view.
Ability to show the historic data for all previous weeks. This ability would help in finding the first build that showed a regression.
Ability to use one baseline build with different test builds of the same platform, even though that baseline build may not have been interleaved with any of those test builds (more details about interleaving here: Ability to Interleave Performance Runs for Baseline & Test Builds aqa-tests#850 & Ability to Interleave Performance Runs for Baseline & Test Builds #24). Let's say you want to have 2 table views: one for comparing OpenJDK8-OpenJ9 GA vs OpenJDK8-Hotspot Latest and another for OpenJDK8-OpenJ9 GA vs OpenJDK8-OpenJ9 Latest. Both views use the same baseline, OpenJDK8-OpenJ9 GA. While running these 3 builds, we could have interleaved the baseline build with one of the two test builds (i.e. OpenJDK8-OpenJ9 Latest), so we wouldn't want to run the baseline again with the second test build (OpenJDK8-Hotspot Latest) since the baseline would essentially give the same score. Reusing the baseline runs would save significant machine time.
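
To make a few of these requirements concrete, here is a hedged sketch of how one baseline summary could feed several views and carry the hover information. The class and function names are hypothetical, the scores are made up, and the 95% interval uses a normal approximation (z = 1.96) as a simplifying assumption; a t-distribution would suit a handful of iterations better:

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class RunSummary:
    build_name: str
    average: float
    ci_percent: float  # half-width of the interval, as a percent of the average


def summarize(build_name: str, scores: list[float]) -> RunSummary:
    """Collapse the iterations of one build into an average and a CI."""
    avg = mean(scores)
    # Assumption: normal approximation for a 95% confidence interval.
    half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return RunSummary(build_name, avg, half_width / avg * 100.0)


def build_cell(baseline: RunSummary, test: RunSummary, higher_is_better: bool) -> dict:
    """Return the relative score plus the data a hover tooltip would need."""
    if higher_is_better:
        ratio = test.average / baseline.average
    else:
        ratio = baseline.average / test.average
    return {"relative_pct": ratio * 100.0, "hover": {"baseline": baseline, "test": test}}


# The baseline is summarized once and reused by both views (made-up scores).
baseline = summarize("OpenJDK8-OpenJ9 GA", [100.0, 102.0, 98.0])
views = {
    "vs OpenJDK8-OpenJ9 Latest": build_cell(
        baseline, summarize("OpenJDK8-OpenJ9 Latest", [105.0, 103.0, 107.0]), True
    ),
    "vs OpenJDK8-Hotspot Latest": build_cell(
        baseline, summarize("OpenJDK8-Hotspot Latest", [95.0, 97.0, 93.0]), True
    ),
}
```

The point of the sketch is only that the baseline runs are looked up independently of any test build, so no extra baseline run is needed for the second view.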

Advanced Requirements for Tabular Views

Ability to show the "Best So Far" build from all the data (To be included in Graph Timeline view as well)
Ability to monitor a specific cell
Ability to show the difference between the current and previous week for all cells
Ability to show only the result cells that have changed since last week
Ability to check and uncheck a specific cell to monitor for possible regression
Ability to link a GitHub issue to one or more cells
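
The week-over-week items could be sketched roughly as follows. The cell key shape, the function names, the tolerance and the sample numbers are all assumptions made for illustration:

```python
Cell = tuple[str, str]  # (benchmark, platform)


def weekly_delta(current: dict[Cell, float], previous: dict[Cell, float]) -> dict[Cell, float]:
    """Change in relative score (percentage points) for cells present in both weeks."""
    return {cell: current[cell] - previous[cell] for cell in current if cell in previous}


def changed_cells(deltas: dict[Cell, float], tolerance: float = 1.0) -> dict[Cell, float]:
    """Keep only the cells that moved by more than the tolerance."""
    return {cell: delta for cell, delta in deltas.items() if abs(delta) > tolerance}


# Made-up relative scores (percent) for two consecutive weeks.
last_week = {("SPECjbb2015", "Linux x64"): 99.0, ("SPECjbb2005", "Linux s390x"): 100.5}
this_week = {("SPECjbb2015", "Linux x64"): 94.0, ("SPECjbb2005", "Linux s390x"): 100.0}

deltas = weekly_delta(this_week, last_week)
changed = changed_cells(deltas)
```

A tolerance is used instead of an exact comparison because run-to-run noise would otherwise mark nearly every cell as changed.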

Assigned Contributors

My team would work on adding this functionality.

piyush286 · 2019-02-19T16:37:52Z

Tabular View to Display Passing % of Tests for Different Targets & Platforms

@smlambert had an excellent suggestion that we could leverage this tabular/matrix view design to display non-performance results as well, such as those of functional and system tests.

I've put down a sample view to show one way we could display the aggregate results of all Jenkins runs for the various targets currently supported by OpenJDK that are mentioned here: https://github.com/AdoptOpenJDK/openjdk-tests. This view is just one of the options and it should certainly be updated to specifically meet the test team's requirements.

Each result cell could be used to show the percentage of tests that passed for a specific platform and target.

Summary of all Runs for Different Targets & Platforms

The tabular view above shows the results of various build lists run on different platforms. From the results above, we can identify the following issues easily:

Several "functional" failures on all x64 platforms
Few "thirdparty_containers" failures on Windows x64
Several "openjdk_regression" failures on all Linux platforms
Few "jck" failures on all x64 platforms
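
A small sketch of how the passing percentage per cell could be aggregated across Jenkins runs. The input tuple shape, the platform names and the counts are hypothetical:

```python
from collections import defaultdict


def pass_percentages(results):
    """Aggregate (target, platform, passed, total) tuples into a pass % per cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [passed, total]
    for target, platform, passed, total in results:
        totals[(target, platform)][0] += passed
        totals[(target, platform)][1] += total
    # Cells with no executed tests are skipped to avoid dividing by zero.
    return {cell: 100.0 * p / t for cell, (p, t) in totals.items() if t > 0}


# Made-up results from three Jenkins runs.
runs = [
    ("functional", "x64_linux", 90, 100),
    ("functional", "x64_linux", 80, 100),
    ("jck", "x64_windows", 50, 50),
]
cells = pass_percentages(runs)
```

Summing passed and total counts before dividing weights each run by its number of tests, rather than averaging the per-run percentages.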

…ed platforms and metrics
- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database
- Warning sign appears if total CI exceeds percentage difference

Co-Authored-By: Piyush Gupta <piyush286@gmail.com>
Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
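
One possible reading of the warning-sign rule in the commit message above, sketched under assumptions: "total CI" is taken to be the sum of the baseline and test confidence intervals (each as a percent of its average), and "percentage difference" the distance of the relative score from 100%. The function name is hypothetical and the merged code may define both terms differently:

```python
def needs_warning(relative_pct: float, baseline_ci_pct: float, test_ci_pct: float) -> bool:
    """Flag a cell whose measured gap is smaller than the combined noise."""
    total_ci = baseline_ci_pct + test_ci_pct  # assumed meaning of "total CI"
    difference = abs(relative_pct - 100.0)    # assumed meaning of "percentage difference"
    return total_ci > difference
```

Under this reading, a 3% gap with a combined CI of 4.5% would be flagged as possibly noise, while a 10% gap with the same CIs would not.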

Related to adoptium#136 adoptium/aqa-tests#1144 adoptium#37

Tabular View Changes
- Enabled the setting of JDK date (i.e. benchmarkProduct) to be dynamic instead of expecting the launch agents such as PerfNext or TestKitGen to set it.
  ○ JDK date is used on Tabular View to show the data of the latest baseline and test builds before that JDK date.
- Updated the Tabular View query for fetching unique build names, sdk resource and build servers, the options that are displayed for choosing the desired baseline or test builds for comparison
  ○ Query didn't have the correct SDK resource location.
- Resolved the issue of Tabular View incorrectly setting states for dropdown options
- Fixed the Tabular View query for getting the filtered data by updating the SDK resource location

Benchmark Parser Changes
- Enabled TRSS to parse some benchmarks, such as Liberty, under the Adopt openjdk-tests repo. This design is to be extended further in future PRs to allow parsing of other benchmarks as well.
- Simplified perf parser regexes to get various benchmark info such as benchmark name, variant and JDK info
  ○ Removed some constraints so that all info can be parsed without being affected by Jenkins timestamps
- Updated the Java version regex for Open builds
- Added an extra regex check for parent builds in order to prevent TRSS from considering a perf build from Adopt as a test build. Example of an Adopt perf pipeline name (https://ci.adoptopenjdk.net/view/Test_perf/): Test_openjdk8_j9_sanity.perf_x86-64_linux.
- Enabled parsing for ODM 300 Ruleset

Signed-off-by: Piyush Gupta <piyush286@gmail.com>

Related to adoptium#136 adoptium/aqa-tests#1144 adoptium#37

Tabular View Changes
- Enabled the setting of JDK date (i.e. benchmarkProduct) to be dynamic instead of expecting the launch agents such as PerfNext or TestKitGen to set it.
  ○ JDK date is used on Tabular View to show the data of the latest baseline and test builds before that JDK date.
- Updated the Tabular View query for fetching unique build names, sdk resource and build servers, the options that are displayed for choosing the desired baseline or test builds for comparison
  ○ Query didn't have the correct SDK resource location.
- Resolved the issue of Tabular View incorrectly setting states for dropdown options
- Fixed the Tabular View query for getting the filtered data by updating the SDK resource location

Benchmark Parser Changes
- Enabled TRSS to parse some benchmarks, such as Liberty, under the Adopt openjdk-tests repo. This design is to be extended further in future PRs to allow parsing of other benchmarks as well.
- Simplified perf parser regexes to get various benchmark info such as benchmark name, variant and JDK info
  ○ Removed some constraints so that all info can be parsed without being affected by Jenkins timestamps
- Updated the Java version regex to capture all kinds of JDK builds (IBM J9, Open J9, HotSpot, OpenJDK)
- Moved `javaVersion` to a higher common level in the data structure
- Updated Perf graph widgets to use `jdkDate` instead of `jdkBuildDateUnixTime`
- Removed `jdkBuildDateUnixTime` since it's redundant now as we're storing the jdk
- Renamed `benchmarkProduct` to `jdkDate` to reflect the correct data that it's storing and updated the code in Data Manager, Perf Compare and Tabular View
- Enabled parsing for ODM 300 Ruleset

Signed-off-by: Piyush Gupta <piyush286@gmail.com>

karianna added this to To do in aqa-test-tools via automation Feb 10, 2019

karianna added the enhancement New feature or request label Feb 10, 2019

piyush286 mentioned this issue Feb 19, 2019

Aggregate and sub-aggregate tests dashboard #16

Open

piyush286 mentioned this issue Mar 21, 2019

Aggregate Perf Results From Multiple Benchmark Iterations #73

Closed

awsafsakif mentioned this issue Aug 6, 2019

Adding Tabular View for comparing baseline and test builds for selected platforms and metrics #131

Merged

piyush286 mentioned this issue Aug 23, 2019

Optimize Tabular View Code #133

Open

piyush286 changed the title ~~Tabular Views for Comparing Baseline and Test Builds for Selected Platforms & Metrics~~ Tabular View for Comparing Baseline and Test Builds for Selected Platforms & Metrics Aug 23, 2019

llxia closed this as completed in #131 Aug 29, 2019

aqa-test-tools automation moved this from To do to Done Aug 29, 2019

piyush286 mentioned this issue Sep 11, 2019

Enhancements for Tabular View & Benchmark Parser #137

Merged
