Update package versions #149
Conversation
```diff
   - pip
   - coiled
-  - nodejs ==17.8.0
+  - nodejs ==17.9.0
```
@ian-r-rose with jupyterlab extensions no longer needing nodejs installed locally, is there any reason a Dask user might need this package included?
@ian-r-rose pinging you again here: do you have an answer/comment for James' question?
Sorry to miss this! We shouldn't need nodejs anymore.
@jrbourbeau then I think we can remove it.
```yaml
  - nodejs ==17.9.0
```
This is more of a general question, but I wonder how we decide to bump versions of packages that are not dask or distributed. Should we be testing against the latest release (or upstream) of every package and, if things look good, update the pins in the next coiled-runtime release?
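Not an answer to the policy question, but a minimal sketch of the mechanical half: comparing our pins against the latest releases on PyPI would at least surface candidates for bumping. The package names and pinned versions below are placeholders, not the actual runtime pins.

```python
# Hypothetical helper: compare pinned versions against the latest
# releases on the PyPI JSON API, so bumps can be proposed at each
# coiled-runtime release. PINS is illustrative, not the real pin set.
import json
import urllib.request

PINS = {"dask": "2022.5.2", "distributed": "2022.5.2", "coiled": "0.0.73"}

def latest_pypi_version(package: str) -> str:
    """Fetch the newest released version of ``package`` from PyPI."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        return json.load(resp)["info"]["version"]

for package, pinned in PINS.items():
    latest = latest_pypi_version(package)
    if latest != pinned:
        print(f"{package}: pinned {pinned}, latest release is {latest}")
```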
```yaml
os: ["ubuntu-latest"]
python-version: ["3.9"]
runtime-version: ["latest", "0.0.3"]
run: [1, 2, 3, 4, 5]
```
I think this might not be doing what we think it's doing.
I'm seeing something very weird happening with the clusters generated by this CI run: https://github.com/coiled/coiled-runtime/runs/6715920312?check_suite_focus=true
We have a fixture that creates the cluster with only 10 workers, but in the dashboard the number of requested workers keeps increasing while the number of assigned workers keeps oscillating. Here are the details pages:
https://cloud.coiled.io/dask-engineering/clusters/31592/details
https://cloud.coiled.io/dask-engineering/clusters/31572/details
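For context, here is a rough sketch of the kind of fixture being described, assuming the coiled `Cluster` API; the `n_workers=10` detail comes from the comment above, while the fixture names and cluster name are illustrative, not the actual coiled-runtime conftest:

```python
# Sketch: a module-scoped Coiled cluster with a fixed worker count,
# shared by the tests in a module via a per-test Client fixture.
import pytest
from coiled import Cluster
from dask.distributed import Client

@pytest.fixture(scope="module")
def small_cluster():
    # One cluster per test module; torn down when the module finishes.
    with Cluster(name="test-cluster", n_workers=10) as cluster:
        yield cluster

@pytest.fixture
def small_client(small_cluster):
    # Fresh client per test, attached to the shared cluster.
    with Client(small_cluster) as client:
        yield client
```

One speculative explanation for the oscillation: if several concurrent matrix jobs attach to clusters under the same name, the requested-worker count could keep climbing while assigned workers fluctuate.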
Note that the
```python
worker_memory="8 GiB",
worker_vm_types=["m5.large"],
scheduler_vm_types=["m5.large"],
scheduler_options={"idle_timeout": "1 hour"},
```
I'm noticing we don't have the scheduler idle timeout. Do we need/want it? I believe this change came from resolving a conflict with main, but it's a good opportunity to think about whether we need it at all.
I believe we don't, since the clusters in this case won't be idle.
That's correct. I bumped the idle timeout up to an hour as a workaround for dask/distributed#6494, but since that issue has now been closed in main, I think we can revert to the default 20 minute idle timeout.
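For concreteness, the revert being discussed would look something like this (keyword arguments copied from the diff above, assuming the coiled `Cluster` constructor):

```python
# Sketch of the revert: drop the idle_timeout override so the
# scheduler falls back to its default (20 minutes).
from coiled import Cluster

cluster = Cluster(
    worker_memory="8 GiB",
    worker_vm_types=["m5.large"],
    scheduler_vm_types=["m5.large"],
    # scheduler_options={"idle_timeout": "1 hour"},  # workaround for
    # dask/distributed#6494; no longer needed now that it's fixed in main
)
```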
Btw, the latest commit here confirmed that @gjoseph92's fix for dask/distributed#6494 gets CI passing here (when running against
@jrbourbeau Looks like we're running into an AWS vCPU limit problem. @ntabris Do you know what could be happening here? https://github.com/coiled/coiled-runtime/runs/6870670549?check_suite_focus=true#step:6:575

```
E   coiled.errors.ClusterCreationError: Cluster status is error (reason: Scheduler Stopped -> Instance failed: AWS failed to create requested instance.
E   VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 1144 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.) (cluster_id: 34203)
/usr/share/miniconda3/envs/test/lib/python3.9/site-packages/coiled/_beta/cluster.py:370: ClusterCreationError
```
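In case it helps when debugging quota errors like this, one way to read the account's current EC2 vCPU quota (the 1144 in the error above) is via the Service Quotas API. This is a sketch assuming boto3 with AWS credentials configured; `us-east-1` is a placeholder region:

```python
# Read the "Running On-Demand Standard instances" vCPU quota
# (quota code L-1216C47A) for the current AWS account.
import boto3

quotas = boto3.client("service-quotas", region_name="us-east-1")
quota = quotas.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")
print("vCPU limit:", quota["Quota"]["Value"])
```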
Lots of instances/clusters were running in the oss account. They're all stopped now, but I think there were about 84 clusters recently running; here are some: `dask-engineering-parquet-c2689dda-scheduler`, ... Is that unexpected?
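A quick way to audit this sort of thing, assuming coiled's `list_clusters` helper from its v0 client; the `account` value and the `status` field name are assumptions about the return shape:

```python
# List clusters in the account and print their status,
# to spot anything left running unexpectedly.
import coiled

for name, info in coiled.list_clusters(account="dask-engineering").items():
    print(name, info.get("status"))
```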
Ah, I see. That's because I temporarily increased the number of jobs we were running on this PR by a factor of 5 to see if we could trigger a flaky test that showed up (xref #166). Since then we've been able to confirm the test is flaky, so I've just reverted the extra stress testing here.
All green, so going to merge this in.


Updates packages to their latest release
xref #139
Closes #171