Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpecCluster implmentation closes the scheduler unexpectedly #7025

Open
BadAsstronaut opened this issue Sep 8, 2022 · 1 comment
Open

SpecCluster implmentation closes the scheduler unexpectedly #7025

BadAsstronaut opened this issue Sep 8, 2022 · 1 comment
Labels
stability Issue or feature related to cluster stability (e.g. deadlock)

Comments

@BadAsstronaut
Copy link

See dask/dask-cloudprovider#375

What happened:
I want to have a scheduler that stays alive and adaptive workers scale-out as needed. This is working well, except when an adaptive scaling process needs to close. Calling cluster.close() results in the remote scheduler getting terminated via a terminate command via the scheduler_comm. See referenced issue for details.

What you expected to happen:
A 'worker cluster' should be able to close independently of the scheduler cluster.

Minimal Complete Verifiable Example:

def main():
    async def run():
        logger.info('-' * 47)
        cluster = FargateCluster(asynchronous=True,
                                 image=image,
                                 fargate_spot=True,
                                 scheduler_address=scheduler_address,
                                 cluster_arn=cluster_arn,
                                 execution_role_arn=execution_role_arn,
                                 task_role_arn=task_role_arn,
                                 cloudwatch_logs_group=cloudwatch_logs_group,
                                 vpc=vpc,
                                 subnets=subnets,
                                 security_groups=security_groups,
                                 fargate_use_private_ip=True)
        cluster.adapt(minimum=1, maximum=5)
        await cluster._start()
        logger.info('-' * 47)
        await asyncio.sleep(120)
        await cluster.close()
    try:
        asyncio.run(run())
    finally:
        logger.info('Adaptive scaler has ended')
@BadAsstronaut
Copy link
Author

@jacobtomlinson Opened this issue per your suggestion. Thanks for your help.

@phobson phobson added the stability Issue or feature related to cluster stability (e.g. deadlock) label Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stability Issue or feature related to cluster stability (e.g. deadlock)
Projects
None yet
Development

No branches or pull requests

2 participants