Would it be possible to add a two-sample t-test to tell whether two samples are statistically different?
For example, consider the case where you have two long-running branches, master and next, which regularly run the same benchmark, as in https://bencher.dev/perf/git?lower_value=false&upper_value=false&lower_boundary=false&upper_boundary=true&x_axis=date_time&branches=595859eb-071c-48e9-97cf-195e0a3d6ed1%2C37f822b4-6514-4709-988b-b5fd9e713872&testbeds=02dcb8ad-6873-494c-aabc-9a6237601308&benchmarks=5e5c6ae1-ec8e-4c25-b27d-dcf773d33a51&measures=63dafffb-98c4-4c27-ba43-7112cae627fc&tab=branches&plots_search=0d7f6186-f80a-4fbe-9022-75b6caf5164e&key=true&reports_per_page=4&branches_per_page=8&testbeds_per_page=8&benchmarks_per_page=8&plots_per_page=8&reports_page=1&branches_page=1&testbeds_page=1&benchmarks_page=1&plots_page=1&end_time=1745971200000&start_time=1740787200000&utm_medium=share&utm_source=bencher&utm_content=img&utm_campaign=perf%2Bimg&utm_term=git&clear=true&heads=%2C5e4f5a49-5dfa-4aa0-bd0f-d40de2fc7004
I think a question a dev can ask here is whether their feature branch introduces a performance regression compared to the main one.
If I understood them correctly, the current tests are prediction interval tests: they answer whether it is reasonable to accept the new point as part of the same distribution as the previous points.
If you use start_point, it only extends the population of previous points, so you get a comparison of the next population against the master population only for the first point.
Instead, I would like all the points produced by the next branch to be compared with all the points produced by the master branch.
start_point is a nice criterion for telling which points should be flagged as part of the master population and which as part of the next population.
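
To make the request concrete, here is a minimal sketch of the comparison I have in mind, written as Welch's variant of the two-sample t-test (it does not assume equal variances in the two populations). This is not Bencher's code: the function name `welch_t` and the sample values are made up for illustration, and the two lists stand for the points that start_point would assign to each branch.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration only, not part of Bencher.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two points")

    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")

    # Positive t means sample_b has the larger mean.
    t = (mean(sample_b) - mean(sample_a)) / sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up latency measurements, one list per branch population.
master = [10.1, 10.3, 9.9, 10.0, 10.2]
next_branch = [10.8, 11.0, 10.7, 11.1, 10.9]

t, df = welch_t(master, next_branch)
print(f"t = {t:.3f}, df = {df:.1f}")
```

With latency as the measure, a large positive t would suggest that next is slower than master. Turning t and df into a p-value requires the Student t distribution; if SciPy is available, `scipy.stats.ttest_ind(master, next_branch, equal_var=False)` runs the same Welch test and returns a two-sided p-value.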