Revise batch jobs with a client config that does not time out #40

Open
dazza-codes opened this issue Sep 8, 2021 · 1 comment
dazza-codes commented Sep 8, 2021

This is related to aio-libs/aiobotocore#864

A batch job monitor can crash hard due to an expired signature in the client, e.g.


  File "/opt/conda/envs/gis/lib/python3.7/site-packages/aio_aws/aio_aws_batch.py", line 1102, in aio_batch_job_manager
    await aio_batch_job_waiter(job, config=config)
  File "/opt/conda/envs/gis/lib/python3.7/site-packages/aio_aws/aio_aws_batch.py", line 1001, in aio_batch_job_waiter
    response = await aio_batch_job_status([job.job_id], config)
  File "/opt/conda/envs/gis/lib/python3.7/site-packages/aio_aws/aio_aws_batch.py", line 824, in aio_batch_job_status
    return await batch_client.describe_jobs(jobs=jobs)
  File "/opt/conda/envs/gis/lib/python3.7/site-packages/aiobotocore/client.py", line 155, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DescribeJobs operation: Signature expired: 20210908T202653Z is now earlier than 20210908T202940Z (20210908T203440Z - 5 min.)
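
The timestamps in that message can be checked by hand. A minimal sketch, assuming the usual reading of this error: the first timestamp is when the request was signed, and the one in parentheses is the service's clock when the request arrived. The variable names are only illustrative.

```python
from datetime import datetime, timedelta

AWS_TIMESTAMP = "%Y%m%dT%H%M%SZ"  # compact timestamp form used in the error message

# timestamps copied from the error message above
signed_at = datetime.strptime("20210908T202653Z", AWS_TIMESTAMP)
server_now = datetime.strptime("20210908T203440Z", AWS_TIMESTAMP)

# assumption: the service tolerates 5 minutes, per the "- 5 min." in the message
allowed_age = timedelta(minutes=5)

request_age = server_now - signed_at
print(request_age)                # 0:07:47
print(request_age > allowed_age)  # True
```

If that reading is right, the request was nearly 8 minutes old when the service saw it, so it was signed well before it was actually sent (or delivered).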

It might be possible to work around this by passing client config options that increase the connection or read timeout, e.g.

    from aiobotocore.config import AioConfig

    client_config = AioConfig(
        connect_timeout=20,
        read_timeout=900,
        max_pool_connections=max_pool_connections,
    )
    async with aio_batch_config.create_client("batch", config=client_config) as batch_client:
        # run batch monitoring for any long-running batch jobs
        pass

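A similar pattern starts from the session's default client config and merges in overrides:

    from aiobotocore.config import AioConfig
    from botocore import UNSIGNED

    # start from the session defaults, then merge in the S3-specific override
    client_config = aio_config.session.get_default_client_config()
    s3_config = AioConfig(signature_version=UNSIGNED)
    s3_config = client_config.merge(s3_config)

    async with aio_config.create_client("s3", config=s3_config) as s3_client:
        # do s3 stuff
        pass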

The monitoring code might need to detect and catch exceptions for invalid signatures. It could replace the client with a new one, or find some way to update the signature for a client.
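
A rough sketch of the "catch it and replace the client" idea, not tested against aio-aws itself. `call_with_fresh_client`, `client_factory` and `call` are hypothetical names; the only thing assumed about the error is that, like `botocore.exceptions.ClientError`, it carries a `response` dict with the error code under `["Error"]["Code"]`.

```python
import asyncio
from typing import Any, Awaitable, Callable

INVALID_SIGNATURE = "InvalidSignatureException"  # code seen in the traceback above


def is_invalid_signature(error: Exception) -> bool:
    """True if the error looks like a ClientError for an invalid or expired signature."""
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == INVALID_SIGNATURE


async def call_with_fresh_client(
    client_factory: Callable[[], Any],
    call: Callable[[Any], Awaitable[Any]],
    max_attempts: int = 3,
) -> Any:
    """Run ``call(client)``; on a signature error, retry with a newly created client.

    ``client_factory`` is a hypothetical zero-argument callable that returns an
    async context manager yielding a client, e.g. a lambda that wraps
    ``create_client("batch", ...)``.
    """
    last_error: Exception = RuntimeError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        # a new client per attempt, so each retry signs its request from scratch
        async with client_factory() as client:
            try:
                return await call(client)
            except Exception as error:
                if not is_invalid_signature(error):
                    raise  # unrelated failures propagate unchanged
                last_error = error
    raise last_error
```

The status check could then look something like `await call_with_fresh_client(make_batch_client, lambda c: c.describe_jobs(jobs=jobs))`, where `make_batch_client` is whatever creates the client today. Whether a new client is enough depends on why the signed request got delayed in the first place.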

@dazza-codes

This might be solved by limiting each config to a single connection in its connection pool, so that nothing can become stale while sitting in the pool, e.g.

    aio_batch_config = AWSBatchConfig(
        aio_batch_db=jobs_db,
        min_pause=20,
        max_pause=40,
        start_pause=60,
        max_pool_connections=1,
        sem=500,
    )
    asyncio.run(aio_batch_monitor_jobs(jobs=jobs, config=aio_batch_config))
