Update package versions #149

Merged: jrbourbeau merged 13 commits into main from update-packages on Jun 16, 2022
Conversation

Contributor

@jrbourbeau jrbourbeau commented May 27, 2022

Updates packages to their latest release

xref #139

Closes #171

Comment thread recipe/meta.yaml Outdated
- pip
- coiled
- nodejs ==17.8.0
- nodejs ==17.9.0
Contributor Author

@ian-r-rose with jupyterlab extensions no longer needing nodejs installed locally, is there any reason Dask users might need this package included?

Contributor

@ian-r-rose pinging you again here, do you have an answer/comment to James' question?

Contributor

Sorry to miss this! We shouldn't need nodejs anymore.

Contributor

@jrbourbeau then I think we can remove it.

Suggested change
- nodejs ==17.9.0

@ncclementi
Contributor

This is more of a general question, but I wonder how we decide to bump versions of packages that are not dask or distributed.

Should we test against the latest release or upstream of every package and, if things look good, update them in the next coiled-runtime release?

Comment thread .github/workflows/tests.yml Outdated
os: ["ubuntu-latest"]
python-version: ["3.9"]
runtime-version: ["latest", "0.0.3"]
run: [1, 2, 3, 4, 5]
Contributor

I think this might not be doing what we think it's doing.

I'm seeing something very weird happening with the clusters generated by this CI https://github.com/coiled/coiled-runtime/runs/6715920312?check_suite_focus=true

We have a fixture that creates the cluster and it has only 10 workers, but I'm seeing in the dashboard that the number of workers requested keeps increasing while the number assigned keeps oscillating. These are the details pages:

https://cloud.coiled.io/dask-engineering/clusters/31592/details
https://cloud.coiled.io/dask-engineering/clusters/31572/details

[Screenshot: Screen Shot 2022-06-02 at 5 24 21 PM 2]

I keep seeing the workers going up and down:
[Screenshot: Screen Shot 2022-06-02 at 5 41 30 PM]

@jrbourbeau jrbourbeau changed the title Update package versions [DNM] Update package versions Jun 3, 2022
@jrbourbeau
Contributor Author

Note that the latest build here has become flaky, in part, due to dask/distributed#6494. I'm off tomorrow, but plan to work on this on Monday

Comment thread conftest.py Outdated
worker_memory="8 GiB",
worker_vm_types=["m5.large"],
scheduler_vm_types=["m5.large"],
scheduler_options={"idle_timeout": "1 hour"},
Contributor

I'm noticing we don't have the scheduler timeout. Do we need or want it? I believe this change is part of resolving a conflict with main, but I'm taking the opportunity to think about whether we need it.

I believe we don't as the clusters, in this case, won't be idle.

Contributor Author

That's correct. I bumped the idle timeout up to an hour as a workaround for dask/distributed#6494, but as that issue has now been closed in main, I think we can revert to the default 20-minute idle timeout.
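
To illustrate the workaround and its revert, here is a minimal sketch of how the timeout override could be toggled. The keyword names (`worker_vm_types`, `scheduler_vm_types`, `scheduler_options`) are the ones visible in the conftest.py snippet above; `cluster_kwargs` is a hypothetical helper, not something in the repository, and the sketch only builds a dictionary rather than creating a cluster.

```python
def cluster_kwargs(idle_timeout=None):
    """Build keyword arguments for a test cluster (hypothetical helper).

    When idle_timeout is None, no scheduler_options entry is added, so the
    scheduler falls back to its default idle timeout.
    """
    kwargs = {
        "worker_vm_types": ["m5.large"],
        "scheduler_vm_types": ["m5.large"],
    }
    if idle_timeout is not None:
        # Temporary override, as used for the dask/distributed#6494 workaround.
        kwargs["scheduler_options"] = {"idle_timeout": idle_timeout}
    return kwargs


workaround = cluster_kwargs(idle_timeout="1 hour")
default = cluster_kwargs()
print(workaround)
print(default)
```

Dropping the argument is then the whole revert: nothing else in the fixture would need to change.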

@jrbourbeau
Contributor Author

Btw, the latest commit here confirmed that @gjoseph92's fix for dask/distributed#6494 gets CI passing here (when running against distributed's main branch). Thanks @gjoseph92!

@ncclementi
Contributor

@jrbourbeau Looks like we are running into an AWS vCPU limit problem. @ntabris Do you know what could be happening here?

https://github.com/coiled/coiled-runtime/runs/6870670549?check_suite_focus=true#step:6:575

E               coiled.errors.ClusterCreationError: Cluster status is error (reason: Scheduler Stopped -> Instance failed: AWS failed to create requested instance.
E               VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 1144 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.) (cluster_id: 34203)
/usr/share/miniconda3/envs/test/lib/python3.9/site-packages/coiled/_beta/cluster.py:370: ClusterCreationError

@ntabris
Member

ntabris commented Jun 13, 2022

> Do you know what could be happening here?

Lots of instances/clusters were running in the oss account.

They're all stopped now but I think there were about 84 clusters recently running, here are some...

dask-engineering-parquet-c2689dda-scheduler
dask-engineering-test_shuffle-e2ea8225-scheduler
dask-engineering-test_h2o_benchmarks-1687c9ef-scheduler
dask-engineering-test_shuffle-7a5c2331-scheduler
dask-engineering-test_shuffle-8d1995bf-scheduler
dask-engineering-test_array-bc24f403-scheduler
dask-engineering-test_array-ff59d0be-scheduler
dask-engineering-test_shuffle-582c65d4-scheduler
dask-engineering-test_h2o_benchmarks-f26b8d61-scheduler
dask-engineering-test_shuffle-d61eb058-scheduler
dask-engineering-test_array-2a49e35b-scheduler
dask-engineering-test_array-0e24864a-scheduler
dask-engineering-test_deadlock-d375df44dd6742fe9a3b14c478b38e17-scheduler
dask-engineering-parquet-d651edc5-scheduler
dask-engineering-test_shuffle-07dac254-scheduler
dask-engineering-test_shuffle-8b03547c-scheduler
dask-engineering-test_array-44d1a078-scheduler
dask-engineering-test_deadlock-6c6c29e53fa749109483a756b29fabb5-scheduler
dask-engineering-test_shuffle-9d595e67-scheduler
dask-engineering-test_deadlock-09f2017187e84e7d80bbf4f59d3b792f-scheduler
dask-engineering-test_deadlock-586d7d9fed864feda66937aef1d75861-scheduler
dask-engineering-test_array-3ca64e86-scheduler
dask-engineering-test_default_cluster_spinup_time-1de549e7-scheduler
dask-engineering-test_array-e98803cf-scheduler
dask-engineering-parquet-3334d672-scheduler
dask-engineering-test_deadlock-bb913227c4624d5b89901b28d0c67fca-scheduler
dask-engineering-test_array-db170c4a-scheduler
dask-engineering-test_array-a8c31253-scheduler
dask-engineering-test_shuffle-e06fb1f6-scheduler
dask-engineering-parquet-ed04e657-scheduler
dask-engineering-test_shuffle-34426a0f-scheduler
dask-engineering-test_shuffle-cc531f4d-scheduler
dask-engineering-test_shuffle-3c62acb6-scheduler
dask-engineering-test_array-5a5972ae-scheduler
dask-engineering-test_deadlock-90eafcc0a3c04785b83f59588fe2300a-scheduler
dask-engineering-test_array-dfefe888-scheduler
dask-engineering-test_h2o_benchmarks-b25f2698-scheduler
dask-engineering-parquet-4a499e8e-scheduler
dask-engineering-test_deadlock-c50f38b5b0f74efca26fb22c843e7e00-scheduler
dask-engineering-test_array-84e58dbc-scheduler
dask-engineering-test_h2o_benchmarks-9f2af024-scheduler
dask-engineering-test_default_cluster_spinup_time-58ac6220-scheduler
dask-engineering-test_shuffle-46795aad-scheduler
dask-engineering-test_array-f6805845-scheduler
dask-engineering-test_shuffle-a19b1eaa-scheduler
dask-engineering-test_coiled-f2042517-scheduler
dask-engineering-test_default_cluster_spinup_time-2a4539c8-scheduler
dask-engineering-test_array-04b63da8-scheduler
dask-engineering-test_array-33d711ef-scheduler
dask-engineering-test_coiled-cf592eb0-scheduler

Is that unexpected?
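
For a rough sense of scale, the numbers in this thread can be combined into a back-of-the-envelope estimate. This is only a sketch under loud assumptions: that all of the roughly 84 clusters ran at the same time, and that each used the 10-worker fixture with m5.large instances for both scheduler and workers. The only figure not taken from the thread is the 2 vCPUs of an m5.large.

```python
# Rough estimate of vCPU demand. Treat the result as an upper bound:
# not every cluster necessarily had 10 workers or ran concurrently.
VCPU_LIMIT = 1144          # limit quoted in the VcpuLimitExceeded error
VCPUS_PER_M5_LARGE = 2     # an m5.large instance has 2 vCPUs
WORKERS_PER_CLUSTER = 10   # assumption: every cluster uses the 10-worker fixture
CLUSTERS_RUNNING = 84      # "about 84 clusters" reported above


def vcpus_needed(n_clusters, workers_per_cluster=WORKERS_PER_CLUSTER):
    """vCPUs for n_clusters, each with one scheduler plus its workers."""
    instances_per_cluster = 1 + workers_per_cluster
    return n_clusters * instances_per_cluster * VCPUS_PER_M5_LARGE


demand = vcpus_needed(CLUSTERS_RUNNING)
max_clusters = VCPU_LIMIT // vcpus_needed(1)
print(f"estimated demand: {demand} vCPUs (limit {VCPU_LIMIT})")
print(f"clusters that fit under the limit: {max_clusters}")
```

Under these assumptions the estimated demand is well above the limit, which is consistent with the error above.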

@jrbourbeau
Contributor Author

Ah, I see. That's because I temporarily increased the number of jobs we were running on this PR by a factor of 5 to see if we could trigger a flaky test that showed up (xref #166). Since then we've been able to confirm the test is flaky, so I've just reverted the extra stress testing here.
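
The factor of 5 follows from how a GitHub Actions matrix expands: one job per combination of values. Here is a small sketch of that expansion, using only the matrix values shown in the tests.yml snippet earlier in this thread. The `expand` helper is hypothetical, and the real workflow may define jobs beyond this matrix.

```python
from itertools import product

# Matrix values as shown in the tests.yml review snippet.
matrix = {
    "os": ["ubuntu-latest"],
    "python-version": ["3.9"],
    "runtime-version": ["latest", "0.0.3"],
    "run": [1, 2, 3, 4, 5],  # temporary stress-test axis
}


def expand(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]


jobs = expand(matrix)
baseline = expand({k: v for k, v in matrix.items() if k != "run"})
print(len(baseline), "jobs without the run axis;", len(jobs), "with it")
```

Since each job's tests create their own clusters, multiplying the job count multiplies the number of clusters requested per push as well.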

@jrbourbeau jrbourbeau changed the title [DNM] Update package versions Update package versions Jun 16, 2022
@jrbourbeau
Contributor Author

All green, so going to merge this in

@jrbourbeau jrbourbeau merged commit b36cd03 into main Jun 16, 2022
@jrbourbeau jrbourbeau deleted the update-packages branch June 16, 2022 17:39
Development

Successfully merging this pull request may close these issues.

Block coiled-runtime until verification of dask patch

4 participants