Use presigned URL for delivering logs from worker to central S3 bucket #64254

@rpofuk

Description

Add support for pre-signed URL based log I/O as a RemoteLogIO implementation, where:

  • API server (reader): Uses the built-in S3RemoteLogIO with direct IAM access for reading logs (no change needed).
  • Worker (uploader): Uses a new RemoteLogIO that requests a pre-signed PUT URL from the API server and uploads via plain HTTP. No AWS credentials needed on the worker.

How it works

Worker (after task)          API Server               S3
  |                            |                       |
  |-- POST /presigned-url ---->|                       |
  |                            |-- generate PUT URL -->|
  |<-- { presigned_url } ------|                       |
  |                                                    |
  |------------- HTTP PUT log file ------------------->|
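The worker side of the flow above could be sketched roughly as follows, using only the standard library. This is an illustration, not the proposed implementation: the JSON request body (`{"key": ...}`), the bearer-token header, and the function names are assumptions; only the `/presigned-url` path and the `presigned_url` response field come from the diagram.

```python
import json
import urllib.request


def build_presign_request(api_base, log_key, token):
    """Build the POST that asks the API server for a pre-signed PUT URL."""
    # Assumed request shape: a JSON body naming the log key to upload.
    body = json.dumps({"key": log_key}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/presigned-url",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def build_upload_request(presigned_url, log_bytes):
    """Build the plain HTTP PUT that sends the log to S3 (no AWS credentials)."""
    return urllib.request.Request(url=presigned_url, data=log_bytes, method="PUT")


def upload_log(api_base, log_key, log_bytes, token, send=urllib.request.urlopen):
    """Run both steps; `send` is injectable so the flow can be tried offline."""
    with send(build_presign_request(api_base, log_key, token)) as resp:
        presigned_url = json.load(resp)["presigned_url"]
    with send(build_upload_request(presigned_url, log_bytes)) as resp:
        return resp.status
```

The worker only ever holds its API token and a short-lived URL, which is the point of the design: the S3 credentials stay on the API server.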

The API server endpoint that generates pre-signed URLs can enforce custom authorization rules before issuing the URL - e.g. verifying the worker's service account is only allowed to upload logs for DAGs in its bundle.
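On the API server side, the authorize-then-sign step might look something like the sketch below. The function name, the `allowed_prefixes` argument, and the `LogAccessDenied` exception are hypothetical; the only real API used is boto3's `generate_presigned_url` on an S3 client, which is passed in rather than created here.

```python
class LogAccessDenied(Exception):
    """Raised when a caller asks for a key outside its allowed prefixes."""


def issue_upload_url(s3_client, bucket, key, allowed_prefixes, expires_in=300):
    """Authorize the request, then return a short-lived pre-signed PUT URL."""
    # Hypothetical rule: reject path traversal and any key that is not
    # under one of the prefixes granted to this caller (e.g. its bundle's DAGs).
    if ".." in key.split("/") or not any(key.startswith(p) for p in allowed_prefixes):
        raise LogAccessDenied(f"upload not permitted for key {key!r}")
    # s3_client is expected to behave like boto3.client("s3").
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Because the URL is bound to a single bucket and key and expires after `expires_in` seconds, a compromised worker can at most overwrite the one log object it was authorized for.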

The only change a worker deployment needs is to enable the new functionality:

[logging]
remote_log_io_role = worker

Optional: custom auth hook

By default, the presigned URL endpoints use standard Airflow authentication (the requesting user must be authenticated). For deployments that need additional authorization logic (e.g. bundle-scoped access, tenant isolation), an optional callable can be configured:

[logging]
presigned_url_auth_hook = mypackage.auth.validate_log_access
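As a rough illustration of what such a hook could look like, the sketch below resolves a dotted path to a callable and shows a tenant-isolation check. The hook signature `(user, log_key) -> bool`, the `user` dict shape, and the `tenant=<name>/` key layout are all assumptions; the issue does not define them.

```python
from importlib import import_module


def load_hook(dotted_path):
    """Resolve 'package.module.callable' to the callable it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(import_module(module_name), attr)


def validate_log_access(user, log_key):
    """Example hook: a tenant may only touch keys under its own prefix."""
    # Assumed key layout: every log key starts with "tenant=<name>/".
    return log_key.startswith(f"tenant={user['tenant']}/")
```

With the configuration above, the endpoint would call something like `load_hook("mypackage.auth.validate_log_access")` once at startup and refuse to issue a URL whenever the hook returns `False`.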

I'm deploying this solution to production on our side and would gladly contribute it if it would be accepted (I don't want to go to the trouble of getting approval to open-source it if nobody is interested :))

Use case/motivation

Airflow 3.x introduced RemoteLogIO as the protocol for remote log upload/download from the supervisor process. Currently, the only built-in implementation uses direct S3 access (S3RemoteLogIO), which requires the worker to have S3 credentials.

In multi-account or zero-trust deployments, workers run on a separate AWS account and should not be trusted with S3 write credentials. There is no way to plug in a custom log upload/download mechanism that uses pre-signed URLs instead of direct S3 access.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
