Tabular View for Comparing Baseline and Test Builds for Selected Platforms & Metrics #37

Closed
piyush286 opened this issue Feb 8, 2019 · 1 comment · Fixed by #131
Labels
enhancement New feature or request

Comments

piyush286 commented Feb 8, 2019

Background About Benchmarking

For benchmarking, we always launch several iterations of a benchmark with a specific build to get performance results for various metrics such as throughput and startup time. On their own, these raw scores are not very useful since they can change when the benchmark is run on another platform, when the machine state isn't identical, or when the configs are slightly different. Hence, we always use a baseline to gauge the performance of a newer test build.

While comparing baseline and test builds, it's important to look at the performance gap as a relative number (Build 1 Score/Build 2 Score) instead of an absolute number (Build 1 Score - Build 2 Score), since the absolute difference doesn't mean much by itself, can change from run to run, and can have a widely varying range.

We usually use these formulas for comparison:

| Scenario | Example Metrics | Comparison Formula |
| --- | --- | --- |
| Higher is better | Throughput | Test Build / Baseline Build |
| Lower is better | Startup time, Footprint | Baseline Build / Test Build |
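
As a rough illustration of the two formulas, here is a minimal Python sketch. The function name `relative_score` and the sample scores are made up for this example and are not part of TRS.

```python
def relative_score(baseline: float, test: float, higher_is_better: bool) -> float:
    """Return the test build's score relative to the baseline, as a percentage.

    With either formula, a value above 100 means the test build did better
    than the baseline and a value below 100 means it did worse.
    """
    if baseline <= 0 or test <= 0:
        raise ValueError("scores must be positive")
    # Higher is better: Test Build / Baseline Build
    # Lower is better:  Baseline Build / Test Build
    ratio = test / baseline if higher_is_better else baseline / test
    return ratio * 100.0


# Throughput (higher is better): 950 ops/s against a baseline of 1000 ops/s.
print(round(relative_score(1000.0, 950.0, higher_is_better=True), 1))   # 95.0
# Startup time (lower is better): 2.2 s against a baseline of 2.0 s.
print(round(relative_score(2.0, 2.2, higher_is_better=False), 1))       # 90.9
```

Flipping the ratio for "lower is better" metrics keeps the reading consistent: every cell can be interpreted the same way regardless of the metric.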

Details about the Proposed Feature

Test Result Summary (TRS) should have the ability to create and show tabular views for comparing baseline and test builds. In each view, a result cell should show the relative comparison between the baseline and test builds, as a percentage, for one specific metric and platform. These result cells should be painted with different colors to classify the performance according to the table shown below.

Color Scheme for Result Cells
image
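
The classification of a result cell could be sketched roughly as follows. The thresholds and color names here are hypothetical placeholders, since the actual ranges are only defined in the color scheme image and are not reproduced in the text.

```python
# Hypothetical bands: (lower bound in percent, color). The real ranges come
# from the color scheme image and may differ from these values.
BANDS = [
    (98.0, "green"),   # on par with, or better than, the baseline
    (95.0, "yellow"),  # possible regression
    (90.0, "orange"),  # likely regression
]
FALLBACK = "red"       # anything below the lowest band


def cell_color(relative_percent: float) -> str:
    """Map a relative comparison percentage to a cell color."""
    for lower_bound, color in BANDS:
        if relative_percent >= lower_bound:
            return color
    return FALLBACK


print(cell_color(101.2))  # green
print(cell_color(92.0))   # orange
```

Keeping the bands in one table-like list would make it easy to adjust the ranges without touching the lookup logic.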
These tabular views would be extremely helpful in finding regressions. I'm going to show the benefits of these tabular views with two examples.

Example 1:

SPEC Benchmarks
image

The tabular view above shows the results of all the SPEC benchmarks run on different platforms. From the results above, we can identify the following regressions easily:

  1. x64 Regression (OS: Linux, Windows, macOS) for SPECjEnterprise & SPECjbb2015
  2. Multi-benchmark Regression (SPECjEnterprise & SPECjbb2015) for x64 (Same as the 1st)
  3. Single Platform Regression (Linux s390x) for SPECjbb2005

Example 2:

Micro Benchmarks
image

The tabular view above shows the results of all the micro benchmarks run on different platforms. From the results above, we can identify the following regressions easily:

  1. Cross-platform Linux Regression (HW: x64, ppc64le & s390x) for ILOG ODM
  2. Power Regression (OS: Linux & AIX) for HiBench

Requirements for Tabular Views

Basic requirements of this tabular comparison view:

  1. Should show the relative comparison percentage between the baseline and test builds for all platforms and metrics selected for that specific view.
  2. Should show the results of the latest test build against the latest runs for the baseline.
  3. Should identify the score ranges with different colors.
  4. Hovering over a result cell should show some basic information such as Java versions, confidence intervals and average scores.
  5. Should open a new window with a unique URL when a result cell is clicked to show the full details of the runs, such as the score of each iteration for all metrics from those runs, for both the baseline and test builds. This detailed view should be the same as the one shown when one clicks on the detailed view URL from the graph view, being developed for issue Performance Analysis Tools (Proposal from Developer JumpStart Tech Challenge) #28.
  6. Should be configurable to show different platforms and metrics that are selected by the user for that specific view.
  7. Ability to show the historic data for all previous weeks. This ability would help in finding the first build that showed a regression.
  8. Ability to use one baseline build with different test builds on the same platform, even though that baseline build may not have been interleaved with some of those test builds (more details about interleaving here: Ability to Interleave Performance Runs for Baseline & Test Builds aqa-tests#850 & Ability to Interleave Performance Runs for Baseline & Test Builds #24). Let's say you want to have 2 table views: one for comparing OpenJDK8-OpenJ9 GA vs OpenJDK8-Hotspot Latest and another for OpenJDK8-OpenJ9 GA vs OpenJDK8-OpenJ9 Latest. Both views use the same baseline, OpenJDK8-OpenJ9 GA. While running these 3 builds, we could have interleaved the baseline build with one of the two test builds (i.e. OpenJDK8-OpenJ9 Latest). We wouldn't want to run the baseline again with the second test build (OpenJDK8-Hotspot Latest) since the baseline would essentially give the same score, and skipping that rerun would save significant machine time.
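
To make requirements 2 and 8 a bit more concrete, here is a small sketch of how two views could share one set of baseline runs. The `Run` record, its fields, and the sample scores are assumptions made for this example and do not reflect the actual TRS data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    build: str       # e.g. "OpenJDK8-OpenJ9 GA"
    platform: str    # e.g. "x64_linux"
    timestamp: int   # when the run finished (hypothetical field)
    score: float     # aggregate score; higher is better in this sketch


def latest_run(runs, build, platform):
    """Return the most recent run of a build on a platform, or None."""
    candidates = [r for r in runs if r.build == build and r.platform == platform]
    return max(candidates, key=lambda r: r.timestamp) if candidates else None


def compare(runs, baseline_build, test_build, platform):
    """Relative percent of the latest test run against the latest baseline run."""
    baseline = latest_run(runs, baseline_build, platform)
    test = latest_run(runs, test_build, platform)
    if baseline is None or test is None:
        return None
    return round(test.score / baseline.score * 100.0, 1)


runs = [
    # The baseline was only run once, interleaved with OpenJ9 Latest.
    Run("OpenJDK8-OpenJ9 GA", "x64_linux", 100, 1000.0),
    Run("OpenJDK8-OpenJ9 Latest", "x64_linux", 101, 980.0),
    Run("OpenJDK8-Hotspot Latest", "x64_linux", 200, 1020.0),
]

# Both views reuse the same baseline run; no second baseline run is needed.
print(compare(runs, "OpenJDK8-OpenJ9 GA", "OpenJDK8-OpenJ9 Latest", "x64_linux"))   # 98.0
print(compare(runs, "OpenJDK8-OpenJ9 GA", "OpenJDK8-Hotspot Latest", "x64_linux"))  # 102.0
```

Because each view looks up the latest baseline run independently of how it was launched, the baseline does not have to be rerun for every test build it is compared against.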

Advanced Requirements for Tabular Views

  1. Ability to show the "Best So Far" build from all the data (to be included in the Graph Timeline view as well)
  2. Ability to monitor a specific cell
  3. Ability to show the difference between the current and previous week for all cells
  4. Ability to show only the result cells that have changed since last week
  5. Ability to check and uncheck a specific cell to monitor for possible regression
  6. Ability to link a GitHub issue to one or more cells
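
Items 3 and 4 of this list could be sketched along these lines. The cell keys, the `min_change` cut-off of one percentage point, and the sample values are all hypothetical.

```python
def weekly_delta(current, previous):
    """Both arguments map a cell key to its relative percent.

    Returns the change in percentage points for cells present in both weeks.
    """
    return {
        key: round(current[key] - previous[key], 1)
        for key in current
        if key in previous
    }


def changed_cells(current, previous, min_change=1.0):
    """Keep only cells that moved by at least min_change points (assumed cut-off)."""
    deltas = weekly_delta(current, previous)
    return {key: delta for key, delta in deltas.items() if abs(delta) >= min_change}


last_week = {("SPECjbb2015", "x64_linux"): 99.0, ("SPECjbb2005", "s390x_linux"): 100.0}
this_week = {("SPECjbb2015", "x64_linux"): 94.5, ("SPECjbb2005", "s390x_linux"): 100.2}

print(weekly_delta(this_week, last_week))
print(changed_cells(this_week, last_week))
```

With the sample data, only the SPECjbb2015 cell would remain in the "changed since last week" view, since the other cell moved by just 0.2 points.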

Assigned Contributors

My team would work on adding this functionality.

@karianna karianna added this to To do in aqa-test-tools via automation Feb 10, 2019
@karianna karianna added the enhancement New feature or request label Feb 10, 2019

piyush286 commented Feb 19, 2019

Tabular View to Display Passing % of Tests for Different Targets & Platforms

@smlambert had an excellent suggestion that we could leverage this tabular/matrix view design to display non-performance results as well, such as results from functional and system tests.

I've put together a sample view to show one way we could display the aggregate results of all Jenkins runs for the various targets currently supported by OpenJDK that are mentioned here: https://github.com/AdoptOpenJDK/openjdk-tests. This view is just one of the options and it should certainly be updated to specifically meet the test team's requirements.

Each result cell could be used to show the percentage of tests that passed for a specific platform and target.
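
As a rough sketch of how such a cell value could be computed, the snippet below aggregates pass counts per platform and target. The tuple layout and the sample counts are invented for illustration and are not taken from real Jenkins runs.

```python
def pass_percentage(results):
    """results is an iterable of (platform, target, passed, total) tuples.

    Multiple runs for the same platform and target are summed before the
    percentage is computed. Cells with no executed tests are left out.
    """
    totals = {}
    for platform, target, passed, total in results:
        prev_passed, prev_total = totals.get((platform, target), (0, 0))
        totals[(platform, target)] = (prev_passed + passed, prev_total + total)
    return {
        key: round(100.0 * passed / total, 1)
        for key, (passed, total) in totals.items()
        if total > 0
    }


sample_runs = [
    ("x64_linux", "functional", 180, 200),
    ("x64_linux", "functional", 190, 200),
    ("x64_windows", "thirdparty_containers", 9, 10),
]
print(pass_percentage(sample_runs))
```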

Summary of all Runs for Different Targets & Platforms
image

The tabular view above shows the results of various build lists run on different platforms. From the results above, we can identify the following issues easily:

  1. Several "functional" failures on all x64 platforms
  2. A few "thirdparty_containers" failures on Windows x64
  3. Several "openjdk_regression" failures on all Linux platforms
  4. A few "jck" failures on all x64 platforms

awsafsakif added a commit to awsafsakif/openjdk-test-tools that referenced this issue Aug 6, 2019
…ed platforms and metrics

- Closes adoptium#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database
awsafsakif added a commit to awsafsakif/openjdk-test-tools that referenced this issue Aug 6, 2019
…ed platforms and metrics

- Closes adoptium#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database

Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
awsafsakif referenced this issue in awsafsakif/openjdk-test-tools Aug 6, 2019
…ed platforms and metrics

- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database

Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
awsafsakif referenced this issue in awsafsakif/openjdk-test-tools Aug 8, 2019
…ed platforms and metrics

- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database

Co-authored-by: Piyush Gupta piyush286@gmail.com

Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
awsafsakif referenced this issue in awsafsakif/openjdk-test-tools Aug 12, 2019
…ed platforms and metrics

- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database

Co-authored-by: Piyush Gupta <piyush286@gmail.com>

Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
awsafsakif referenced this issue in awsafsakif/openjdk-test-tools Aug 19, 2019
…ed platforms and metrics

- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database

Co-Authored-By: Piyush Gupta <piyush286@gmail.com>
Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
awsafsakif referenced this issue in awsafsakif/openjdk-test-tools Aug 20, 2019
…ed platforms and metrics

- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database
- Warning sign appears if total CI exceeds percentage difference

Co-Authored-By: Piyush Gupta <piyush286@gmail.com>
Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
@piyush286 piyush286 changed the title Tabular Views for Comparing Baseline and Test Builds for Selected Platforms & Metrics Tabular View for Comparing Baseline and Test Builds for Selected Platforms & Metrics Aug 23, 2019
awsafsakif referenced this issue in awsafsakif/openjdk-test-tools Aug 28, 2019
…ed platforms and metrics

- Closes AdoptOpenJDK#37
- Current filters: benchmark, platform, cell color
- Allows comparison between different jdk versions and types
- Cell on click redirects to Perf Compare
- Perf Compare changed to fill in values from URL on load
- Added sdkResource to parser, will be added as field to database
- Warning sign appears if total CI exceeds percentage difference

Co-Authored-By: Piyush Gupta <piyush286@gmail.com>
Signed-off-by: Awsaf Arefin Sakif <awsaf.sakif@ibm.com>
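
One possible reading of the "Warning sign appears if total CI exceeds percentage difference" item in the commit message above is sketched below. This is only an interpretation: it assumes "total CI" means the sum of the baseline and test confidence intervals (each expressed as a percentage), and that "percentage difference" means the distance of the relative score from 100%. The function name is hypothetical.

```python
def needs_warning(baseline_ci_percent, test_ci_percent, relative_percent):
    """Flag a cell whose measured gap could be explained by run-to-run noise.

    Assumption: the combined CI is the simple sum of both CIs, and the gap is
    how far the relative score is from 100%.
    """
    total_ci = baseline_ci_percent + test_ci_percent
    difference = abs(relative_percent - 100.0)
    return total_ci > difference


# A 3% gap with 3.5% of combined CI is not trustworthy on its own.
print(needs_warning(1.5, 2.0, 97.0))   # True
# A 5% gap with only 1% of combined CI is likely a real difference.
print(needs_warning(0.5, 0.5, 95.0))   # False
```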
aqa-test-tools automation moved this from To do to Done Aug 29, 2019
piyush286 added a commit to piyush286/openjdk-test-tools that referenced this issue Sep 11, 2019
Related to adoptium#136 adoptium/aqa-tests#1144 adoptium#37

Tabular View Changes
	- Enabled the setting of JDK date (i.e. benchmarkProduct) to be dynamic instead of expecting the launch agents such as PerfNext or TestKitGen to set it.
		○ JDK date is used on Tabular View to show the data of latest baseline and test builds before that JDK date.
	- Updated the Tabular View query for fetching unique build names, sdk resource and build servers, options that are displayed for choosing desired baseline or test builds for comparison
		○ Query didn't have the correct SDK resource location.
	- Resolved the issue of Tabular View incorrectly setting states for dropdown options
	- Fixed the Tabular View query for getting the filtered data by updating the SDK resource location

Benchmark Parser Changes
	- Enabled TRSS to parse some benchmarks, such as Liberty, under the Adopt openjdk-tests repo. This design is to be extended in future PRs to allow parsing of other benchmarks as well.
	- Simplified perf parser regexes to get various benchmark info such as benchmark name, variant and JDK info
		○ Removed some constraints so that all info can be parsed without being affected by Jenkins timestamps
	- Updated the Java version regex for Open builds
	- Added an extra regex check for parent builds to prevent TRSS from considering a perf build from Adopt as a test build. Example of an Adopt perf pipeline name (https://ci.adoptopenjdk.net/view/Test_perf/): Test_openjdk8_j9_sanity.perf_x86-64_linux.
	- Enabled parsing for ODM 300 Ruleset

Signed-off-by: Piyush Gupta <piyush286@gmail.com>
piyush286 added a commit to piyush286/openjdk-test-tools that referenced this issue Sep 16, 2019
Related to adoptium#136 adoptium/aqa-tests#1144 adoptium#37

Tabular View Changes
	- Enabled the setting of JDK date (i.e. benchmarkProduct) to be dynamic instead of expecting the launch agents such as PerfNext or TestKitGen to set it.
		○ JDK date is used on Tabular View to show the data of latest baseline and test builds before that JDK date.
	- Updated the Tabular View query for fetching unique build names, sdk resource and build servers, options that are displayed for choosing desired baseline or test builds for comparison
		○ Query didn't have the correct SDK resource location.
	- Resolved the issue of Tabular View incorrectly setting states for dropdown options
	- Fixed the Tabular View query for getting the filtered data by updating the SDK resource location

Benchmark Parser Changes
	- Enabled TRSS to parse some benchmarks, such as Liberty, under the Adopt openjdk-tests repo. This design is to be extended in future PRs to allow parsing of other benchmarks as well.
	- Simplified perf parser regexes to get various benchmark info such as benchmark name, variant and JDK info
		○ Removed some constraints so that all info can be parsed without being affected by Jenkins timestamps
	- Updated the Java version regex for Open builds
	- Enabled parsing for ODM 300 Ruleset

Signed-off-by: Piyush Gupta <piyush286@gmail.com>
piyush286 added a commit to piyush286/openjdk-test-tools that referenced this issue Sep 18, 2019
Related to adoptium#136 adoptium/aqa-tests#1144 adoptium#37

Tabular View Changes
	- Enabled the setting of JDK date (i.e. benchmarkProduct) to be dynamic instead of expecting the launch agents such as PerfNext or TestKitGen to set it.
		○ JDK date is used on Tabular View to show the data of latest baseline and test builds before that JDK date.
	- Updated the Tabular View query for fetching unique build names, sdk resource and build servers, options that are displayed for choosing desired baseline or test builds for comparison
		○ Query didn't have the correct SDK resource location.
	- Resolved the issue of Tabular View incorrectly setting states for dropdown options
	- Fixed the Tabular View query for getting the filtered data by updating the SDK resource location

Benchmark Parser Changes
	- Enabled TRSS to parse some benchmarks, such as Liberty, under the Adopt openjdk-tests repo. This design is to be extended in future PRs to allow parsing of other benchmarks as well.
	- Simplified perf parser regexes to get various benchmark info such as benchmark name, variant and JDK info
		○ Removed some constraints so that all info can be parsed without being affected by Jenkins timestamps
	- Updated the Java version regex to capture all kinds of JDK builds (IBM J9, Open J9, HotSpot, OpenJDK)
	- Moved `javaVersion` to higher common level in data structure
	- Updated Perf graph widgets to use `jdkDate` instead of `jdkBuildDateUnixTime`
	- Removed  `jdkBuildDateUnixTime` since it's redundant now as we're storing the jdk
	- Renamed `benchmarkProduct` to `jdkDate` to reflect the correct data that it's storing and updated the code in Data Manager, Perf Compare and Tabular View
	- Enabled parsing for ODM 300 Ruleset

Signed-off-by: Piyush Gupta <piyush286@gmail.com>
piyush286 added a commit to piyush286/openjdk-test-tools that referenced this issue Sep 19, 2019
Related to adoptium#136 adoptium/aqa-tests#1144 adoptium#37

Tabular View Changes
	- Enabled the setting of JDK date (i.e. benchmarkProduct) to be dynamic instead of expecting the launch agents such as PerfNext or TestKitGen to set it.
		○ JDK date is used on Tabular View to show the data of latest baseline and test builds before that JDK date.
	- Updated the Tabular View query for fetching unique build names, sdk resource and build servers, options that are displayed for choosing desired baseline or test builds for comparison
		○ Query didn't have the correct SDK resource location.
	- Resolved the issue of Tabular View incorrectly setting states for dropdown options
	- Fixed the Tabular View query for getting the filtered data by updating the SDK resource location

Benchmark Parser Changes
	- Enabled TRSS to parse some benchmarks, such as Liberty, under the Adopt openjdk-tests repo. This design is to be extended in future PRs to allow parsing of other benchmarks as well.
	- Simplified perf parser regexes to get various benchmark info such as benchmark name, variant and JDK info
		○ Removed some constraints so that all info can be parsed without being affected by Jenkins timestamps
	- Updated the Java version regex to capture all kinds of JDK builds (IBM J9, Open J9, HotSpot, OpenJDK)
	- Moved `javaVersion` to higher common level in data structure
	- Updated Perf graph widgets to use `jdkDate` instead of `jdkBuildDateUnixTime`
	- Removed  `jdkBuildDateUnixTime` since it's redundant now as we're storing the jdk
	- Renamed `benchmarkProduct` to `jdkDate` to reflect the correct data that it's storing and updated the code in Data Manager, Perf Compare and Tabular View
	- Enabled parsing for ODM 300 Ruleset

Signed-off-by: Piyush Gupta <piyush286@gmail.com>