"Rate exceeded" error on FargateCluster #56
Thanks for raising this @rsignell-usgs. Sounds like we aren't honouring the throttling backoffs well enough. We should be catching this exception and trying again with an exponential backoff. #44 applied this to log retrieval but we should also do something similar in other places.
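For context, the patches proposed in the comments below call a `get_sleep_duration` helper. A minimal sketch of what such a helper might compute, assuming exponential backoff with full jitter (the base and cap values here are illustrative, not the library's actual ones):

```python
import random


def get_sleep_duration(current_try, base=1.0, cap=20.0):
    """Exponential backoff with full jitter: base * 2**try, capped."""
    duration = min(cap, base * (2 ** current_try))
    # Full jitter spreads retries out so that many throttled clients
    # don't all wake up and hit the API at the same moment.
    return random.uniform(0, duration)
```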
I'm having the same issue.
I'm having a similar issue, except for me it's
Would something like this be helpful?

```diff
diff --git a/dask_cloudprovider/providers/aws/ecs.py b/dask_cloudprovider/providers/aws/ecs.py
index d178b3a..f115d08 100644
--- a/dask_cloudprovider/providers/aws/ecs.py
+++ b/dask_cloudprovider/providers/aws/ecs.py
@@ -28,6 +28,30 @@ DEFAULT_TAGS = {
 } # Package tags to apply to all resources
 
+MAX_THROTTLING_TRIES = 10 # arbitrary...
+
+
+async def retry_when_throttled(
+    func, *args, max_tries=MAX_THROTTLING_TRIES, **kwargs,
+):
+    current_try = 0
+
+    while current_try < max_tries:
+        try:
+            return await func(*args, **kwargs)
+        except ClientError as e:
+            if e.response["Error"]["Code"] == "ThrottlingException":
+                warnings.warn(
+                    "get_log_events rate limit exceeded, retrying after delay.",
+                    RuntimeWarning,
+                )
+                backoff_duration = get_sleep_duration(current_try)
+                await asyncio.sleep(backoff_duration)
+                current_try += 1
+            else:
+                raise
+
+
 class Task:
     """ A superclass for managing ECS Tasks
 
     Parameters
@@ -296,7 +320,6 @@ class Task:
     )
 
     async def logs(self, follow=False):
-        current_try = 0
         next_token = None
         read_from = 0
 
@@ -304,13 +327,15 @@
             try:
                 async with self._client("logs") as logs:
                     if next_token:
-                        l = await logs.get_log_events(
+                        l = await retry_when_throttled(
+                            logs.get_log_events,
                             logGroupName=self.log_group,
                             logStreamName=self._log_stream_name,
                             nextToken=next_token,
                         )
                     else:
-                        l = await logs.get_log_events(
+                        l = await retry_when_throttled(
+                            logs.get_log_events,
                             logGroupName=self.log_group,
                             logStreamName=self._log_stream_name,
                             startTime=read_from,
@@ -327,18 +352,6 @@ class Task:
                 for event in l["events"]:
                     read_from = event["timestamp"]
                     yield event["message"]
-            except ClientError as e:
-                if e.response["Error"]["Code"] == "ThrottlingException":
-                    warnings.warn(
-                        "get_log_events rate limit exceeded, retrying after delay.",
-                        RuntimeWarning,
-                    )
-                    backoff_duration = get_sleep_duration(current_try)
-                    await asyncio.sleep(backoff_duration)
-                    current_try += 1
-                else:
-                    raise
-
 
     def __repr__(self):
         return "<ECS Task %s: status=%s>" % (type(self).__name__, self.status)
```
This version is meant to honor `max_tries` (the first version's loop simply falls through and returns `None` once the tries are exhausted, instead of raising):

```diff
diff --git a/dask_cloudprovider/providers/aws/ecs.py b/dask_cloudprovider/providers/aws/ecs.py
index d178b3a..503fbcf 100644
--- a/dask_cloudprovider/providers/aws/ecs.py
+++ b/dask_cloudprovider/providers/aws/ecs.py
@@ -28,6 +28,32 @@ DEFAULT_TAGS = {
 } # Package tags to apply to all resources
 
+MAX_THROTTLING_TRIES = 10 # arbitrary...
+
+
+async def retry_when_throttled(
+    func, *args, max_tries=MAX_THROTTLING_TRIES, **kwargs,
+):
+    current_try = 0
+
+    while True:
+        try:
+            return await func(*args, **kwargs)
+        except ClientError as e:
+            if e.response["Error"]["Code"] == "ThrottlingException":
+                backoff_duration = get_sleep_duration(current_try)
+                current_try += 1
+                if current_try == max_tries:
+                    raise
+                warnings.warn(
+                    "get_log_events rate limit exceeded, retrying after delay.",
+                    RuntimeWarning,
+                )
+                await asyncio.sleep(backoff_duration)
+            else:
+                raise
+
+
 class Task:
     """ A superclass for managing ECS Tasks
 
     Parameters
@@ -296,7 +322,6 @@ class Task:
     )
 
     async def logs(self, follow=False):
-        current_try = 0
         next_token = None
         read_from = 0
 
@@ -304,13 +329,15 @@
             try:
                 async with self._client("logs") as logs:
                     if next_token:
-                        l = await logs.get_log_events(
+                        l = await retry_when_throttled(
+                            logs.get_log_events,
                             logGroupName=self.log_group,
                             logStreamName=self._log_stream_name,
                             nextToken=next_token,
                         )
                     else:
-                        l = await logs.get_log_events(
+                        l = await retry_when_throttled(
+                            logs.get_log_events,
                             logGroupName=self.log_group,
                             logStreamName=self._log_stream_name,
                             startTime=read_from,
@@ -327,18 +354,6 @@ class Task:
                 for event in l["events"]:
                     read_from = event["timestamp"]
                     yield event["message"]
-            except ClientError as e:
-                if e.response["Error"]["Code"] == "ThrottlingException":
-                    warnings.warn(
-                        "get_log_events rate limit exceeded, retrying after delay.",
-                        RuntimeWarning,
-                    )
-                    backoff_duration = get_sleep_duration(current_try)
-                    await asyncio.sleep(backoff_duration)
-                    current_try += 1
-                else:
-                    raise
-
 
     def __repr__(self):
         return "<ECS Task %s: status=%s>" % (type(self).__name__, self.status)
```
@lukeorland yes I think that kind of thing would be a great PR. I have a feeling the exception from AWS may contain information on how long to wait. It's pretty common for APIs that tell you to back off to suggest how long you should wait.
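One way to check whether a throttled response carries such a hint (a sketch; many AWS services do not set a `Retry-After` header, so the fallback is the common case, and the helper name is hypothetical):

```python
from botocore.exceptions import ClientError


def suggested_wait(error: ClientError, default: float = 1.0) -> float:
    """Return the server-suggested wait in seconds, if present, else a default."""
    headers = error.response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    retry_after = headers.get("retry-after")  # often absent for AWS throttling
    return float(retry_after) if retry_after else default
```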
I'm not sure how aiobotocore interacts with normal boto but you probably already have appropriate retry logic from boto and just need to tweak the parameters? See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html and perhaps try using adaptive mode with more retries.
These features are not available in botocore or aiobotocore, and the higher-level boto3 doesn't support aiobotocore. However we could definitely copy that behaviour here.
Hmm, the code appears to exist in botocore: https://github.com/boto/botocore/tree/e0fc11c3785437368435a59c41021c0bcb86275f/botocore/retries
Ah fantastic, thanks for pointing to that @zflamig! In that case yes we should go for it.
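For reference, the configuration that retries guide describes looks roughly like this (a sketch using plain boto3; "adaptive" mode needs a reasonably recent botocore, and whether it behaves identically through aiobotocore would need checking):

```python
import boto3
from botocore.config import Config

# "standard" mode retries with exponential backoff up to max_attempts;
# "adaptive" additionally rate-limits requests on the client side.
config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

logs = boto3.client("logs", config=config)
```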
@jacobtomlinson I have a reproducible example:

```python
import pandas as pd
import numpy as np
import time

from dask.distributed import Client, progress
from dask import compute, delayed
from dask_cloudprovider import FargateCluster


def fun(fn):
    m = int(1e5)
    n = 10
    columns = [f"col_{i+1:02d}" for i in range(n)]
    df = pd.DataFrame(np.random.rand(m, n), columns=columns)
    df = df.astype(str)
    time.sleep(2)
    df.to_csv(fn, index=False)


my_vpc = "vpc-xxxxxxxx"  # your vpc
my_subnets = ["subnet-xxxxxxxx"]  # your subnets
bucket = "your-bucket"  # a bucket you have access to write
fldr = "tests"

# start FargateCluster
cpu = 0.5
ram = 1
cluster = FargateCluster(n_workers=1,
                         image='rpanai/feats-worker:2020-08-24',
                         vpc=my_vpc,
                         subnets=my_subnets,
                         worker_cpu=int(cpu * 1024),
                         worker_mem=int(ram * 1024),
                         cloudwatch_logs_group="my_log_group",
                         task_role_policies=['arn:aws:iam::aws:policy/AmazonS3FullAccess'],
                         scheduler_timeout='20 minutes'
                         # skip_cleanup=True
                         )
cluster.adapt(minimum=1, maximum=100)
client = Client(cluster)
client

fns = [f"s3://{bucket}/{fldr}/{i+1:04d}.csv" for i in range(2000)]
to_process = [delayed(fun)(fn) for fn in fns]
out = compute(to_process)
```

where the image

After a while I start to see these warnings:
Yes, so it looks like ECS is taking a long time to start your container and we are polling the status every second, which is eventually resulting in a throttling error.

It also seems that the boto libraries will automatically retry on their own four times before giving up and raising the exception. In this instance we probably want to retry forever, as we just need to wait until the container has started.

So it would probably make sense for us to also retry with an exponential backoff to a maximum of 20 seconds, but do so forever. @lukeorland seems to have proposed a suitable solution, however I don't think we need to worry about the max retries as that is just for boto to decide when to raise an exception.

We are currently checking the status until the container stops pending or provisioning. So it is likely we will continuously poll, eventually hit an exception, and increase our polling times until the request is successful. However, once the request is successful that doesn't mean the container is running; it may still be provisioning, in which case we will continue polling.

I'm going to raise a quick PR to add a backoff here, but would support follow-up PRs to make a more generic function to handle this.
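A minimal sketch of that "poll forever, but back off to a ceiling when throttled" behaviour (the names, the statuses, and the 20-second cap are illustrative, not the actual dask-cloudprovider code):

```python
import asyncio

from botocore.exceptions import ClientError


async def poll_until_running(describe_task, interval=1.0, max_backoff=20.0):
    """Poll an async describe call until the task leaves PENDING/PROVISIONING.

    Throttling never aborts the wait; it only stretches the delay, which
    resets to the base interval after the next successful call.
    """
    delay = interval
    while True:
        try:
            status = await describe_task()
            if status not in ("PENDING", "PROVISIONING"):
                return status
            delay = interval  # call succeeded: back to normal polling
        except ClientError as e:
            if e.response["Error"]["Code"] != "ThrottlingException":
                raise
            delay = min(max_backoff, delay * 2)  # capped exponential backoff
        await asyncio.sleep(delay)
```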
I tried the PR version and now the output is:
Thanks @rpanai. Does that happen consistently?
I tried with

```python
cluster.adapt(minimum=1,
              maximum=40)  # reducing from 100
client = Client(cluster)
client
```

and I didn't have any warning. In another experiment when I read and write to S3 I get the warning even when

In the example with

```python
client.close()
cluster.close()
```

takes forever and I can still see many workers (~80) running several minutes after the computation is done.
I tried with

```
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x7f8a513e04f0>>, <Task finished name='Task-3002' coro=<SpecCluster._correct_state_internal() done, defined at /home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/distributed/deploy/spec.py:320> exception=RuntimeError({'tasks': [], 'failures': [{'reason': "You've reached the limit on the number of tasks you can run concurrently"}], 'ResponseMetadata': {'RequestId': '79d015b6-6cdc-451d-a6e4-69b081eda0cb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '79d015b6-6cdc-451d-a6e4-69b081eda0cb', 'content-type': 'application/x-amz-json-1.1', 'content-length': '111', 'date': 'Tue, 25 Aug 2020 15:00:02 GMT'}, 'RetryAttempts': 0}})>)
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/distributed/deploy/spec.py", line 355, in _correct_state_internal
    await w  # for tornado gen.coroutine support
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/dask_cloudprovider/providers/aws/ecs.py", line 144, in _
    await self.start()
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/dask_cloudprovider/providers/aws/ecs.py", line 202, in start
    while timeout.run():
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/dask_cloudprovider/utils/timeout.py", line 74, in run
    raise self.exception
  File "/home/ec2-user/SageMaker/kernels/features2/lib/python3.8/site-packages/dask_cloudprovider/providers/aws/ecs.py", line 241, in start
    raise RuntimeError(response)  # print entire response
RuntimeError: {'tasks': [], 'failures': [{'reason': "You've reached the limit on the number of tasks you can run concurrently"}], 'ResponseMetadata': {'RequestId': '79d015b6-6cdc-451d-a6e4-69b081eda0cb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '79d015b6-6cdc-451d-a6e4-69b081eda0cb', 'content-type': 'application/x-amz-json-1.1', 'content-length': '111', 'date': 'Tue, 25 Aug 2020 15:00:02 GMT'}, 'RetryAttempts': 0}}
```
Yes that error is expected. In order to avoid that you will need to request AWS to increase your service limits.
Hi @jacobtomlinson, should this issue be resolved with #124? I'm running

Here's the traceback:

```
Task exception was never retrieved
future: <Task finished name='Task-492' coro=<_wrap_awaitable() done, defined at /usr/local/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('Worker failed to start')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 163, in _
    await self.start()
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 290, in start
    raise RuntimeError("%s failed to start" % type(self).__name__)
RuntimeError: Worker failed to start
Task exception was never retrieved
future: <Task finished name='Task-494' coro=<_wrap_awaitable() done, defined at /usr/local/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('Worker failed to start')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 163, in _
    await self.start()
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 290, in start
    raise RuntimeError("%s failed to start" % type(self).__name__)
RuntimeError: Worker failed to start
Task exception was never retrieved
future: <Task finished name='Task-457' coro=<_wrap_awaitable() done, defined at /usr/local/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('Worker failed to start')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 163, in _
    await self.start()
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 290, in start
    raise RuntimeError("%s failed to start" % type(self).__name__)
RuntimeError: Worker failed to start
Task exception was never retrieved
future: <Task finished name='Task-451' coro=<_wrap_awaitable() done, defined at /usr/local/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('Worker failed to start')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 163, in _
    await self.start()
  File "/usr/local/lib/python3.8/site-packages/dask_cloudprovider/aws/ecs.py", line 290, in start
    raise RuntimeError("%s failed to start" % type(self).__name__)
RuntimeError: Worker failed to start
```

The example code:

```python
from dask_cloudprovider.aws import FargateCluster

cluster = FargateCluster(
    n_workers=60,
    vpc="VPC-ID",
    security_groups=["SG-ID"],
    subnets=["SN-ID"],
    worker_cpu=512,
    worker_mem=1024
)
```

Thanks!
@valpesendorfer can you share any errors from ECS on why they failed to start?
Apologies, I should have checked that myself. Looks like I've run out of public IPs within my subnet - some workers didn't even spin up, with the error:

So it's an AWS issue, not a dask-cloudprovider one. Thanks!
Hi, I'm still having the same problem despite using the latest version of Dask and dask-cloudprovider. The warnings keep appearing
long after all the computation is done. As a consequence my script never ends and, as I'm using Airflow to manage my pipeline, I get stuck.
Hi - I'm getting the same error as a previous user in this thread.

To my knowledge, there isn't anything I can do to increase this from Service Quotas. This only happens when I send off a large Batch array job on Fargate.
@michaellee1 would you mind opening a new issue? This one is closed and hasn't seen any activity for over a year.
worked on my SageMaker instance on us-west-2, but failed on my us-east-1 instance with:

It looks like PR #44 was designed to address these problems, but it seems I'm still having them despite running v0.1.1, which includes that PR. I'm wondering whether others are still experiencing this?