
⚠️ CI failed ⚠️ #271

Closed
github-actions bot opened this issue Aug 20, 2022 · 11 comments
Labels
ci-failure, h2o-benchmark (work related to h2o benchmark), regression

Comments

@github-actions

Workflow Run URL

@ncclementi
Contributor

The two regressions detected are:
test_download_throughput[s3fs] in 0.0.4-py3.9 shows this bump, but it seems that it went back down afterward.
[Screenshot: Screen Shot 2022-08-22 at 12 08 16 PM]

test_q3 5GB (parquet) (red line) in upstream-py3.9 looks like a legitimate regression; note the increase in duration.
[Screenshot: Screen Shot 2022-08-22 at 12 12 14 PM]

@ian-r-rose Do you think this latest one is related to dask/dask#9397 (comment)?

@ian-r-rose
Contributor

Interesting, it doesn't look to me like the timing lines up (that fix was merged on Aug 19). I'm not sure, but it's worth investigating.

@fjetter
Member

fjetter commented Aug 23, 2022

Timing-wise, all dask/dask changes that might be relevant:

git log --since='2022-08-15 14:15' --until='2022-08-18 14:15' --pretty=oneline
097decd5f4914d7cd85b3babd6ecdd20a633bf8d Shuffle-based groupby aggregation for high-cardinality groups (#9302)
c1bc90aa5149d8f515aadd8bcf11712e9ce7cdee Use `entry_points` utility in `sizeof` (#9390)
a9ee6c2fdf0a3093747e675997143e0dbe584bad Add `entry_points` compatibility utility (#9388)
feb40fdc4a20ddcdde3ea07273dd44d89ba7c59b Clarify that ``bind()`` etc. regenerate the keys (#9385)
0a7b58b134ba38e00c97f715030fe15eb84d0bb2 Implement ma.*_like functions (#9378)
b894f72d740425c252e7896089ce328bc56a71fc Upload environment file artifact for each CI build (#9372)
1fcfb7c93e83908eee4449b473f6e325b79e1f1a Fix `SeriesGroupBy` cumulative functions with `axis=1` (#9377)

All dask/distributed changes that may be relevant:

61fca1cff93acbeaebbfd5b5935556e6424b8d6d Clean up `cluster` process reaping (#6840)
7768f6c4ea12a5f0af908bb41ed197eba1f87b17 Don't use `bokeh` `Figure` in tests (#6721)
1d0701b51c1dc354fb22af9b27b8f722f02daf63 Only close scheduler in SpecCluster if it exists (#6888)
3647cfefd004281dac0ffa92349e23bc2a94d68a Work around incompatibility of crick with setuptools 65 (#6887)

Nothing suspicious jumps out.

Of course, this might also be a Coiled-related change.

cc @ntabris @shughes-uk: is there a way for us to see when there were Coiled deployments?

@FabioRosado

> is there a way for us to see when there were Coiled deployments?

Do you have access to Datadog? If so, you can look at the APM services and check the deployments; here's a link that shows the month's deployments.

This will show the last deployment, and the version contains the timestamp:
[Screenshot: Screenshot 2022-08-23 at 13 10 58]

@ntabris
Member

ntabris commented Aug 23, 2022

The only recent platform change that I'd expect to possibly impact perf was setting the cgroup for memory on the container. This means the machine won't freeze from dask using too much memory, but it also means that dask potentially won't use as much memory (e.g., it will restart the worker before hitting the ceiling it was hitting before).

This change went to prod on the evening of August 16.
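
For anyone wanting to check how this interacts with dask's own memory management, here is a minimal sketch (assuming a stock distributed install) that prints the worker memory fractions driving spilling, pausing, and restarts; the cgroup ceiling lowers the memory limit those fractions are applied to:

import dask
import distributed  # importing distributed registers the distributed.* config defaults

# Fractions of the detected memory limit at which a worker starts spilling,
# pauses, or is restarted; a lower cgroup ceiling means these kick in sooner.
for key in ("target", "spill", "pause", "terminate"):
    value = dask.config.get(f"distributed.worker.memory.{key}")
    print(f"distributed.worker.memory.{key} = {value}")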

@fjetter
Member

fjetter commented Aug 24, 2022

> The only recent platform change that I'd expect to possibly impact perf was setting the cgroup for memory on the container. [...] This change went to prod on the evening of August 16.

I guess this is it then. This change not only protects from OOM but also changes a few internal memory monitoring mechanisms. For instance, if these workloads are operating at high memory pressure, chances are that we're now spilling more data, making the entire thing a bit slower.
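
A quick way to see whether the cgroup ceiling changed the memory limit the workers actually detect (and therefore how early spilling starts) would be something like this sketch; the scheduler address is a placeholder for the benchmark cluster:

from dask.distributed import Client

client = Client("tcp://<scheduler-address>:8786")  # placeholder address
workers = client.scheduler_info()["workers"]
for addr, info in workers.items():
    # memory_limit is the ceiling the spill/pause/terminate fractions apply to
    print(addr, "memory_limit:", info["memory_limit"])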

Edit: I ran the workload myself and strongly doubt that this is related to the changes to cgroups/memory limits. This workload is far away from any limiting/spilling. This is something else.

If I drive the workload myself, it also finishes about 25% faster than when it runs in the benchmark suite. We've seen similar systematic problems in other tests, where we suspect that the scheduler is accumulating some state that in turn slows everything down, e.g. #253

Now I'm wondering what kind of additional tests we started to run during that time period. Indeed, we started to run many more workloads during that time:

82d6b21 (main) Integration tests for spilling (#229)
eeb2b65 Benchmarks for some common workloads (#243)

cc @gjoseph92

@fjetter
Member

fjetter commented Aug 24, 2022

I didn't have a chance to test this, but just from skimming our code base, this change could help with dask/distributed#6944

@ncclementi
Contributor

ncclementi commented Aug 24, 2022

Notice that the regression is in the durations but not as much in the memory. @ian-r-rose and I think that, since this query is a groupby, there is a chance that this PR:
097decd5f4914d7cd85b3babd6ecdd20a633bf8d Shuffle-based groupby aggregation for high-cardinality groups (#9302)
could have caused something, even though it is turned off by default. I haven't taken a deeper look at this yet, but I'm planning on opening an issue on dask/dask and pinging Rick (the author of the PR) to see if he has any input.
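
One way to dig into it would be a small, self-contained reproduction of the default (tree-reduction) groupby path, timed under dask checkouts from before and after 097decd; the column names below only mimic the h2o q3 query and the data is synthetic, so treat it as a sketch rather than the benchmark itself:

import time
import numpy as np
import pandas as pd
import dask.dataframe as dd

# Synthetic stand-in for the 5 GB h2o dataset: a high-cardinality id3 column.
n = 5_000_000
df = pd.DataFrame({
    "id3": np.random.randint(0, 100_000, size=n),
    "v1": np.random.randint(1, 6, size=n),
    "v3": np.random.random(n),
})
ddf = dd.from_pandas(df, npartitions=32)

start = time.perf_counter()
ddf.groupby("id3").agg({"v1": "sum", "v3": "mean"}).compute()  # shuffle path stays off (the default)
print(f"q3-style groupby took {time.perf_counter() - start:.2f}s")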

@ian-r-rose
Contributor

@ncclementi you might be interested in reviewing/testing #269 in this context :)

@ncclementi
Contributor

After doing a git bisect, it looks like the regression comes from the Shuffle PR; see dask/dask#9428 for reference. I pinged Rick, since he wrote the PR, to take a look at it and give some feedback.

@ncclementi ncclementi added the h2o-benchmark and regression labels Aug 25, 2022
@ncclementi
Contributor

Closing this as the regression was exposed after fixing a bug.
