
Aggregate Perf Results From Multiple Benchmark Iterations #73

Closed

piyush286 opened this issue Mar 21, 2019 · 6 comments
Labels
enhancement New feature or request
Milestone

Comments

@piyush286 (Contributor)

Problem Description

Currently, we don't aggregate the numbers from multiple benchmark iterations when each Jenkins build is stored in the database. As a result, all statistics such as the average, median, and confidence interval must be calculated on the fly whenever Perf Compare is used to compare 2 builds. This design is not preferred for the following reasons:
1) Generating Perf Reports through Perf Compare takes time.
2) Because aggregated results are not stored, they must be regenerated every time they are needed, even though they don't change.
3) It requires more CPU time and puts unnecessary pressure on the database.

These issues should be resolved by the proposed changes below. This would significantly improve the speed of getting results, which is needed for views such as the Dashboard (#28) and the Tabular View (#37).

Proposed Changes

  1. Move the math library from the frontend to the backend: https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/eb9c0d302759787e15afe96fe8d24b0e2b1f907c/test-result-summary-client/src/PerfCompare/lib/BenchmarkMath.js
  2. Generate all the aggregated results for a master build that may have one or more child jobs, and store them in the parent object in the testResults collection.
  • Additional data that needs to be added to the parent object: benchmarkName, benchmarkVariant, benchmarkProduct, testData.
  • Additional data in testData for the parent object: aggregated numbers for all metrics: mean, median, confidence interval, min, max, stddev.
  • Note: For Liberty startup, there will only be 1 index in testData.metrics[0].value for the parent object.
  3. Instead of Perf Compare generating the perf numbers itself, it should just make a request to the backend to query the database and fetch the stored results.
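The per-metric statistics shown in the design below could be computed along these lines. This is a minimal sketch, not the actual BenchmarkMath.js code: it assumes a sample standard deviation and a t-distribution-based confidence interval expressed as a fraction of the mean, and the `aggregate` / `T_TABLE` names are hypothetical.

```javascript
// Two-sided 97.5% t-values for small samples (index = degrees of freedom).
// Assumed here; the real library may use a different CI definition.
const T_TABLE = [NaN, 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306];

// Aggregate one metric's raw iteration values into summary statistics.
function aggregate(values) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(n / 2);
    const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    // Sample standard deviation (divide by n - 1).
    const stddev = Math.sqrt(
        values.map(v => (v - mean) ** 2).reduce((a, b) => a + b, 0) / (n - 1)
    );
    // Confidence-interval half-width as a fraction of the mean (assumed form).
    const CI = (T_TABLE[n - 1] * stddev) / (Math.sqrt(n) * mean);
    return { mean, max: sorted[n - 1], min: sorted[0], median, stddev, CI, iteration: n };
}
```

Running this over the eight "Footprint in kb" values in the sample below reproduces the mean (168062.5), median (168346), and stddev (~1564.47) shown in the aggregateInfo examples.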

Assigned Contributors

Sophia (@sophiaxu0424) from my team will work on this feature.

@karianna karianna added this to To do in aqa-test-tools via automation Mar 21, 2019
@karianna karianna added the enhancement New feature or request label Mar 21, 2019

piyush286 commented May 3, 2019

Design for Aggregating Perf Data

For perf, we'll have 2 kinds of builds: parent and child builds. We'll do similar aggregation for both of them, with minor differences.

Child Build

For each child build, we'll aggregate the data from all iterations run inside that Jenkins build. We aggregate only the good data and ignore iterations whose values are null.
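Dropping failed iterations before aggregating can be sketched like this (a minimal sketch; `validValues` is a hypothetical name, not the actual implementation):

```javascript
// Keep only valid iteration values before aggregating.
// A failed run can leave null (or a non-finite number) in a metric's value array.
function validValues(values) {
    return values.filter(v => typeof v === 'number' && Number.isFinite(v));
}
```

For example, `validValues([4207, null, 4112])` keeps only `[4207, 4112]`, so the iteration count used for the statistics reflects good data only.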

Raw Data for Child:

            "testData" : {
                "metrics" : [ 
                    {
                        "name" : "Footprint in kb",
                        "value" : [ 
                            167268, 
                            168804, 
                            169748, 
                            168884, 
                            167272, 
                            167888, 
                            169628, 
                            165008
                        ]
                    }, 
                    {
                        "name" : "Startup time in ms",
                        "value" : [ 
                            4207, 
                            4298, 
                            4112, 
                            4161, 
                            4575, 
                            4213, 
                            4606, 
                            4406
                        ]
                    }
                ],
                "javaVersion" : "<javaVersion>",
                "jdkBuildDateUnixTime" : 1554206400000.0
            }

Aggregated Data Structure for Child

    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "Build-20190402_03",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168062.5,
                        "max" : 169748,
                        "min" : 165008,
                        "median" : 168346,
                        "stddev" : 1564.4682346225,
                        "CI" : 0.0077836281768634,
                        "iteration" : 8
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4322.25,
                        "max" : 4606,
                        "min" : 4112,
                        "median" : 4255.5,
                        "stddev" : 188.005888965517,
                        "CI" : 0.0363703702021615,
                        "iteration" : 8
                    }
                }
            ]
        }
    ],

Parent Build

For each parent build, we'll aggregate the "raw" (and not the "aggregated") data of all child builds launched by that parent build. Each child may carry a different weight if it has a different number of valid data points than the other child jobs, so we'll use a weighted average to get the most accurate results.

(An earlier version of this design aggregated the children's "aggregated" data instead, weighting every child job equally even when a child had fewer good values due to unexpected failures, which isn't very common. Equal weighting is attractive when interleaving because of thermal and other run-to-run factors, but it skews the statistics, as discussed in the comments below.)
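The weighted-average idea can be sketched as follows, assuming each child's aggregateInfo carries its mean and its count of valid iterations (`parentMean` is a hypothetical name; weighting each child's mean by its iteration count is equivalent to a pooled mean over all valid raw values):

```javascript
// Weighted parent mean: each child contributes in proportion to its
// number of valid data points, i.e. a pooled mean over all iterations.
function parentMean(children) {
    // children: [{ mean, iteration }] taken from each child's aggregateInfo.
    const totalIterations = children.reduce((a, c) => a + c.iteration, 0);
    const weightedSum = children.reduce((a, c) => a + c.mean * c.iteration, 0);
    return weightedSum / totalIterations;
}
```

With equal iteration counts this reduces to the plain average of the child means; with unequal counts, the child with more valid runs correctly pulls the parent mean toward itself.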

Raw Data for Parent
Doesn't keep any raw data.

Aggregated Data for Parent

    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "<JDKName>",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168801.25,
                        "max" : 169342.5,
                        "min" : 168260,
                        "median" : 168801.25,
                        "stddev" : 765.443090634438,
                        "CI" : 0.0407409453425256,
                        "iteration" : 2
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4296.3125,
                        "max" : 4305.5,
                        "min" : 4287.125,
                        "median" : 4296.3125,
                        "stddev" : 12.9930871043028,
                        "CI" : 0.0271712951513653,
                        "iteration" : 2
                    }
                }
            ]
        }
    ]

@piyush286 (Contributor, Author)

Sample Parent Job

{
    "_id" : ObjectId("5ccb3473865c57c13bbff657"),
    "url" : "<JenkinsURL>",
    "buildName" : "PerfNext-Parent",
    "buildNum" : 16,
    "buildDuration" : 725926,
    "buildResult" : "SUCCESS",
    "timestamp" : 1554733203557.0,
    "type" : "Perf",
    "status" : "Done",
    "artifactory" : null,
    "buildData" : {},
    "buildOutputId" : ObjectId("5ccb347843ed61c13e693338"),
    "buildUrl" : "<JenkinsURL>/job/PerfNext-Parent/16/",
    "hasChildren" : true,
    "machine" : null,
    "parserType" : "ParentBuild",
    "startBy" : "user <email>",
    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "<JDKName>",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168801.25,
                        "max" : 169342.5,
                        "min" : 168260,
                        "median" : 168801.25,
                        "stddev" : 765.443090634438,
                        "CI" : 0.0407409453425256,
                        "iteration" : 2
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4296.3125,
                        "max" : 4305.5,
                        "min" : 4287.125,
                        "median" : 4296.3125,
                        "stddev" : 12.9930871043028,
                        "CI" : 0.0271712951513653,
                        "iteration" : 2
                    }
                }
            ]
        }
    ]
}

@piyush286 (Contributor, Author)

Sample Child Job

{
    "_id" : ObjectId("5ccb347943ed61c13e69333a"),
    "url" : "<JenkinsURL>",
    "buildName" : "PerfNext-Child",
    "buildNameStr" : "PerfNext-Child",
    "buildNum" : 27,
    "parentId" : ObjectId("5ccb3473865c57c13bbff658"),
    "type" : "Perf",
    "status" : "Done",
    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "Build-20190402_03",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168062.5,
                        "max" : 169748,
                        "min" : 165008,
                        "median" : 168346,
                        "stddev" : 1564.4682346225,
                        "CI" : 0.0077836281768634,
                        "iteration" : 8
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4322.25,
                        "max" : 4606,
                        "min" : 4112,
                        "median" : 4255.5,
                        "stddev" : 188.005888965517,
                        "CI" : 0.0363703702021615,
                        "iteration" : 8
                    }
                }
            ]
        }
    ],
    "artifactory" : null,
    "buildDuration" : 1087621,
    "buildResult" : "SUCCESS",
    "buildUrl" : "<JenkinsURL>/job/PerfNext-Child/27/",
    "hasChildren" : false,
    "machine" : "kermit",
    "parserType" : "BenchmarkParser",
    "startBy" : "upstream project \"PerfNext-Master\" build number 15",
    "tests" : [ 
        {
            "_id" : ObjectId("5ccb348c70da0cc141677686"),
            "testOutputId" : ObjectId("5ccb348c70da0cc141677685"),
            "testResult" : "PASSED",
            "testIndex" : 1,
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "<JDKName>",
            "testData" : {
                "metrics" : [ 
                    {
                        "name" : "Footprint in kb",
                        "value" : [ 
                            167268, 
                            168804, 
                            169748, 
                            168884, 
                            167272, 
                            167888, 
                            169628, 
                            165008
                        ]
                    }, 
                    {
                        "name" : "Startup time in ms",
                        "value" : [ 
                            4207, 
                            4298, 
                            4112, 
                            4161, 
                            4575, 
                            4213, 
                            4606, 
                            4406
                        ]
                    }
                ],
                "javaVersion" : "<javaVersion>",
                "jdkBuildDateUnixTime" : 1554206400000.0
            }
        }
    ],
    "timestamp" : 1554322240238.0
}


llxia commented May 3, 2019

  • How is parentBuild aggregateInfo calculated? Are parentBuild max, mean, median, stddev, etc. calculated based on the max, mean, median, stddev, etc. in all childBuilds' aggregateInfo?
    For example,

    • parentBuild max = max(all childBuilds' max values in aggregateInfo)
    • parentBuild mean = mean(all childBuilds' mean values in aggregateInfo)
    • parentBuild stddev = stddev(all childBuilds' stddev values in aggregateInfo)
      ...

    If this is the case, it will produce inaccurate results. For example,

    • childBuild1 has 5 valid runs (childBuild1Mean = sum of 5 runs / 5)
    • childBuild2 has 10 valid runs (childBuild2Mean = sum of 10 runs / 10)
    • aggregating the "aggregated" => parentBuildMean = (childBuild1Mean + childBuild2Mean) / 2
      This parentBuildMean is not the same as (sum of 5 runs + sum of 10 runs) / (5 + 10)
  • We should add detail about what PerfCompare will look like

  • Should Perf Dashboard be updated to reflect this design?
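The discrepancy in the example above can be demonstrated with a small standalone sketch (the run values are hypothetical; 5 valid runs in one child, 10 in the other):

```javascript
// childBuild1: 5 valid runs, childBuild2: 10 valid runs.
const child1 = [10, 10, 10, 10, 10];                       // mean = 10
const child2 = [40, 40, 40, 40, 40, 40, 40, 40, 40, 40];   // mean = 40

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// Aggregating the "aggregated": every child weighted equally.
const equalWeightMean = (mean(child1) + mean(child2)) / 2;  // 25

// Aggregating the "raw": every run weighted equally (pooled mean).
const pooledMean = mean(child1.concat(child2));             // 30
```

The two results (25 vs. 30) differ whenever the children have unequal numbers of valid runs, which is exactly the inaccuracy being pointed out.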

@piyush286 (Contributor, Author)

@llxia It's useful to give all child jobs equal weight when we're interleaving, so that the two interleaved builds for baseline and test carry similar weight for each iteration, since similar factors would affect the same iteration in both.

But you're right! It's more accurate to take weighted averages, so that the sum is divided by the number of valid data points. @sophiaxu0424 Could you please update your changes? Thanks!

I'll create another issue for updating Dashboard & Perf Compare.

@piyush286 (Contributor, Author)

Closing this since all related work to this issue has been completed.

aqa-test-tools automation moved this from To do to Done Feb 28, 2020
@karianna karianna added this to the February 2020 milestone Mar 6, 2020