
Aggregate Perf Results From Multiple Benchmark Iterations #73

Closed

piyush286 opened this issue Mar 21, 2019 · 6 comments
Labels
enhancement New feature or request
Milestone

Comments

@piyush286 (Contributor)

Problem Description

Currently, we don't aggregate the numbers from multiple benchmark iterations when each Jenkins build is stored in the database. As a result, all statistics such as the average, median, and confidence interval must be calculated on the fly whenever Perf Compare is used to compare 2 builds. This design is not preferred for the following reasons:
1) Generating Perf Reports through Perf Compare takes time.
2) Because aggregated results are not stored, they must be regenerated every time they are needed, even though they don't change.
3) It requires more CPU time and puts unnecessary pressure on the database.

These issues should be resolved by the proposed changes below. This would significantly improve the speed of getting results, which is needed for views such as the Dashboard (#28) and the Tabular View (#37).

Proposed Changes

  1. Move the math library from the frontend to the backend: https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/eb9c0d302759787e15afe96fe8d24b0e2b1f907c/test-result-summary-client/src/PerfCompare/lib/BenchmarkMath.js
  2. Generate all the aggregated results for a master build that may have one or more child jobs, and store them in the parent object in the testResults collection.
  • Additional data that needs to be added to the parent object: benchmarkName, benchmarkVariant, benchmarkProduct, testData.
  • Additional data in testData for the parent object: aggregated numbers for all metrics: mean, median, confidence interval, min, max, stddev.
  • Note: For Liberty startup, there will only be 1 index in testData.metrics[0].value for the parent object.
  3. Instead of Perf Compare generating the perf numbers itself, it should just make a request to the backend to query the database and fetch the stored results.
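The per-metric statistics shown in the design below could be computed along these lines. This is a minimal sketch, not the actual BenchmarkMath.js code: it assumes a sample standard deviation and a t-distribution-based confidence interval expressed as a fraction of the mean, and the `aggregate` / `T_TABLE` names are hypothetical.

```javascript
// Two-sided 97.5% t-values for small samples (index = degrees of freedom).
// Assumed here; the real library may use a different CI definition.
const T_TABLE = [NaN, 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306];

// Aggregate one metric's raw iteration values into summary statistics.
function aggregate(values) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(n / 2);
    const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    // Sample standard deviation (divide by n - 1).
    const stddev = Math.sqrt(
        values.map(v => (v - mean) ** 2).reduce((a, b) => a + b, 0) / (n - 1)
    );
    // Confidence-interval half-width as a fraction of the mean (assumed form).
    const CI = (T_TABLE[n - 1] * stddev) / (Math.sqrt(n) * mean);
    return { mean, max: sorted[n - 1], min: sorted[0], median, stddev, CI, iteration: n };
}
```

Running this over the eight "Footprint in kb" values in the sample below reproduces the mean (168062.5), median (168346), and stddev (~1564.47) shown in the aggregateInfo examples.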

Assigned Contributors

Sophia (@sophiaxu0424) from my team will work on this feature.

@karianna karianna added this to To do in aqa-test-tools via automation Mar 21, 2019
@karianna karianna added the enhancement New feature or request label Mar 21, 2019

piyush286 commented May 3, 2019

Design for Aggregating Perf Data

For perf, we'll have 2 kinds of builds: parent and child builds. We'll do similar aggregation for both of them, with minor differences.

Child Build

For each child build, we'll aggregate the data from all iterations run inside that Jenkins build. We aggregate only the good data and ignore iterations whose values are null.
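Dropping failed iterations before aggregating can be sketched like this (a minimal sketch; `validValues` is a hypothetical name, not the actual implementation):

```javascript
// Keep only valid iteration values before aggregating.
// A failed run can leave null (or a non-finite number) in a metric's value array.
function validValues(values) {
    return values.filter(v => typeof v === 'number' && Number.isFinite(v));
}
```

For example, `validValues([4207, null, 4112])` keeps only `[4207, 4112]`, so the iteration count used for the statistics reflects good data only.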

Raw Data for Child:

            "testData" : {
                "metrics" : [ 
                    {
                        "name" : "Footprint in kb",
                        "value" : [ 
                            167268, 
                            168804, 
                            169748, 
                            168884, 
                            167272, 
                            167888, 
                            169628, 
                            165008
                        ]
                    }, 
                    {
                        "name" : "Startup time in ms",
                        "value" : [ 
                            4207, 
                            4298, 
                            4112, 
                            4161, 
                            4575, 
                            4213, 
                            4606, 
                            4406
                        ]
                    }
                ],
                "javaVersion" : "<javaVersion>",
                "jdkBuildDateUnixTime" : 1554206400000.0
            }

Aggregated Data Structure for Child

    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "Build-20190402_03",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168062.5,
                        "max" : 169748,
                        "min" : 165008,
                        "median" : 168346,
                        "stddev" : 1564.4682346225,
                        "CI" : 0.0077836281768634,
                        "iteration" : 8
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4322.25,
                        "max" : 4606,
                        "min" : 4112,
                        "median" : 4255.5,
                        "stddev" : 188.005888965517,
                        "CI" : 0.0363703702021615,
                        "iteration" : 8
                    }
                }
            ]
        }
    ],

Parent Build

For each parent build, we'll aggregate the "raw" (and not the "aggregated") data of all child builds launched by that parent build. Each child may carry a different weight if it has a different number of valid data points than the other child jobs, so we'll use a weighted average to get the most accurate results.

(An earlier version of this design aggregated the children's "aggregated" data instead, weighting every child job equally even when a child had fewer good values due to unexpected failures, which isn't very common. Equal weighting is attractive when interleaving because of thermal and other run-to-run factors, but it skews the statistics, as discussed in the comments below.)
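The weighted-average idea can be sketched as follows, assuming each child's aggregateInfo carries its mean and its count of valid iterations (`parentMean` is a hypothetical name; weighting each child's mean by its iteration count is equivalent to a pooled mean over all valid raw values):

```javascript
// Weighted parent mean: each child contributes in proportion to its
// number of valid data points, i.e. a pooled mean over all iterations.
function parentMean(children) {
    // children: [{ mean, iteration }] taken from each child's aggregateInfo.
    const totalIterations = children.reduce((a, c) => a + c.iteration, 0);
    const weightedSum = children.reduce((a, c) => a + c.mean * c.iteration, 0);
    return weightedSum / totalIterations;
}
```

With equal iteration counts this reduces to the plain average of the child means; with unequal counts, the child with more valid runs correctly pulls the parent mean toward itself.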

Raw Data for Parent
Doesn't keep any raw data.

Aggregated Data for Parent

    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "<JDKName>",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168801.25,
                        "max" : 169342.5,
                        "min" : 168260,
                        "median" : 168801.25,
                        "stddev" : 765.443090634438,
                        "CI" : 0.0407409453425256,
                        "iteration" : 2
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4296.3125,
                        "max" : 4305.5,
                        "min" : 4287.125,
                        "median" : 4296.3125,
                        "stddev" : 12.9930871043028,
                        "CI" : 0.0271712951513653,
                        "iteration" : 2
                    }
                }
            ]
        }
    ]

@piyush286 (Contributor, Author)

Sample Parent Job

{
    "_id" : ObjectId("5ccb3473865c57c13bbff657"),
    "url" : "<JenkinsURL>",
    "buildName" : "PerfNext-Parent",
    "buildNum" : 16,
    "buildDuration" : 725926,
    "buildResult" : "SUCCESS",
    "timestamp" : 1554733203557.0,
    "type" : "Perf",
    "status" : "Done",
    "artifactory" : null,
    "buildData" : {},
    "buildOutputId" : ObjectId("5ccb347843ed61c13e693338"),
    "buildUrl" : "<JenkinsURL>/job/PerfNext-Parent/16/",
    "hasChildren" : true,
    "machine" : null,
    "parserType" : "ParentBuild",
    "startBy" : "user <email>",
    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "<JDKName>",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168801.25,
                        "max" : 169342.5,
                        "min" : 168260,
                        "median" : 168801.25,
                        "stddev" : 765.443090634438,
                        "CI" : 0.0407409453425256,
                        "iteration" : 2
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4296.3125,
                        "max" : 4305.5,
                        "min" : 4287.125,
                        "median" : 4296.3125,
                        "stddev" : 12.9930871043028,
                        "CI" : 0.0271712951513653,
                        "iteration" : 2
                    }
                }
            ]
        }
    ]
}

@piyush286 (Contributor, Author)

Sample Child Job

{
    "_id" : ObjectId("5ccb347943ed61c13e69333a"),
    "url" : "<JenkinsURL>",
    "buildName" : "PerfNext-Child",
    "buildNameStr" : "PerfNext-Child",
    "buildNum" : 27,
    "parentId" : ObjectId("5ccb3473865c57c13bbff658"),
    "type" : "Perf",
    "status" : "Done",
    "aggregateInfo" : [ 
        {
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "Build-20190402_03",
            "metrics" : [ 
                {
                    "name" : "Footprint in kb",
                    "value" : {
                        "mean" : 168062.5,
                        "max" : 169748,
                        "min" : 165008,
                        "median" : 168346,
                        "stddev" : 1564.4682346225,
                        "CI" : 0.0077836281768634,
                        "iteration" : 8
                    }
                }, 
                {
                    "name" : "Startup time in ms",
                    "value" : {
                        "mean" : 4322.25,
                        "max" : 4606,
                        "min" : 4112,
                        "median" : 4255.5,
                        "stddev" : 188.005888965517,
                        "CI" : 0.0363703702021615,
                        "iteration" : 8
                    }
                }
            ]
        }
    ],
    "artifactory" : null,
    "buildDuration" : 1087621,
    "buildResult" : "SUCCESS",
    "buildUrl" : "<JenkinsURL>/job/PerfNext-Child/27/",
    "hasChildren" : false,
    "machine" : "kermit",
    "parserType" : "BenchmarkParser",
    "startBy" : "upstream project \"PerfNext-Master\" build number 15",
    "tests" : [ 
        {
            "_id" : ObjectId("5ccb348c70da0cc141677686"),
            "testOutputId" : ObjectId("5ccb348c70da0cc141677685"),
            "testResult" : "PASSED",
            "testIndex" : 1,
            "benchmarkName" : "LibertyStartupDT",
            "benchmarkVariant" : "17dev-4way-0-256-qs",
            "benchmarkProduct" : "<JDKName>",
            "testData" : {
                "metrics" : [ 
                    {
                        "name" : "Footprint in kb",
                        "value" : [ 
                            167268, 
                            168804, 
                            169748, 
                            168884, 
                            167272, 
                            167888, 
                            169628, 
                            165008
                        ]
                    }, 
                    {
                        "name" : "Startup time in ms",
                        "value" : [ 
                            4207, 
                            4298, 
                            4112, 
                            4161, 
                            4575, 
                            4213, 
                            4606, 
                            4406
                        ]
                    }
                ],
                "javaVersion" : "<javaVersion>",
                "jdkBuildDateUnixTime" : 1554206400000.0
            }
        }
    ],
    "timestamp" : 1554322240238.0
}


llxia commented May 3, 2019

  • How is parentBuild aggregateInfo calculated? Are parentBuild max, mean, median, stddev, etc. calculated based on the max, mean, median, stddev, etc. in all childBuilds' aggregateInfo?
    For example,

    • parentBuild max = max(all childBuilds' max values in aggregateInfo)
    • parentBuild mean = mean(all childBuilds' mean values in aggregateInfo)
    • parentBuild stddev = stddev(all childBuilds' stddev values in aggregateInfo)
      ...

    If this is the case, it will produce inaccurate results. For example,

    • childBuild1 has 5 valid runs (childBuild1Mean = sum of 5 runs / 5)
    • childBuild2 has 10 valid runs (childBuild2Mean = sum of 10 runs / 10)
    • aggregating the "aggregated" => parentBuildMean = (childBuild1Mean + childBuild2Mean) / 2
      This parentBuildMean is not the same as (sum of 5 runs + sum of 10 runs) / (5 + 10)
  • We should add detail about what PerfCompare will look like

  • Should Perf Dashboard be updated to reflect this design?
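The discrepancy in the example above can be demonstrated with a small standalone sketch (the run values are hypothetical; 5 valid runs in one child, 10 in the other):

```javascript
// childBuild1: 5 valid runs, childBuild2: 10 valid runs.
const child1 = [10, 10, 10, 10, 10];                       // mean = 10
const child2 = [40, 40, 40, 40, 40, 40, 40, 40, 40, 40];   // mean = 40

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// Aggregating the "aggregated": every child weighted equally.
const equalWeightMean = (mean(child1) + mean(child2)) / 2;  // 25

// Aggregating the "raw": every run weighted equally (pooled mean).
const pooledMean = mean(child1.concat(child2));             // 30
```

The two results (25 vs. 30) differ whenever the children have unequal numbers of valid runs, which is exactly the inaccuracy being pointed out.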

@piyush286 (Contributor, Author)

@llxia It's useful to give all child jobs equal weight when we're interleaving, so that the two interleaved builds for baseline and test carry similar weight for each iteration, since similar factors would affect the same iteration in both.

But you're right! It's more accurate to take weighted averages, so that the sum is divided by the number of valid data points. @sophiaxu0424 Could you please update your changes? Thanks!

I'll create another issue for updating Dashboard & Perf Compare.

@piyush286 (Contributor, Author)

Closing this since all related work to this issue has been completed.

aqa-test-tools automation moved this from To do to Done Feb 28, 2020
@karianna karianna added this to the February 2020 milestone Mar 6, 2020