fix: EksPodOperator 401 with cross-account AssumeRole via aws_conn_id #64749
anmolxlight wants to merge 1 commit into apache:main
The kubeconfig exec plugin COMMAND template in EksHook had two critical
fragility points that caused 401 Unauthorized when using cross-account
AssumeRole credentials:
1. stderr was merged into stdout via 2>&1, so any Python warnings,
deprecation notices, or log output from eks_get_token contaminated
the stdout that bash token parsing relies on. This caused the
last_line extraction to grab the wrong line, producing empty/
invalid timestamp and token values.
2. No validation that the token was successfully extracted. If parsing
failed, a malformed ExecCredential JSON with an empty token was
sent to the EKS API server, resulting in 401 with an empty user
identity in the audit logs ("user":{}).
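Failure mode 1 can be reproduced outside the shell. The sketch below is a Python mirror of the kind of pure-bash parameter expansions a template like this uses to split the last line of output: strip everything up to the last newline, then cut around the labels. The exact expressions in COMMAND are not shown in this PR, so treat them as an assumption. The line format `expirationTimestamp: <ts>, token: <tok>` is also assumed (a reviewer notes that eks_get_token prints `token: {access_token}`), and `parse_last_line` is a hypothetical name.

```python
def parse_last_line(output: str) -> tuple[str, str]:
    """Hypothetical Python mirror of bash parsing of eks_get_token output.

    Assumes the token line looks like:
        expirationTimestamp: <timestamp>, token: <token>
    """
    # Keep only the text after the last newline (bash: ${output##*$'\n'}).
    last_line = output.rsplit("\n", 1)[-1]
    # Drop the label, then keep the text before the first comma
    # (bash: ${last_line#expirationTimestamp: } then ${timestamp%%,*}).
    timestamp = last_line.removeprefix("expirationTimestamp: ").split(",", 1)[0]
    # Keep the text after the last ", token: " (bash: ${last_line##*, token: }).
    # As in bash, if the pattern is absent the line is returned unchanged.
    token = last_line.rsplit(", token: ", 1)[-1]
    return timestamp, token


clean = "expirationTimestamp: 2024-01-01T00:15:00Z, token: k8s-aws-v1.abc"
print(parse_last_line(clean))
# → ('2024-01-01T00:15:00Z', 'k8s-aws-v1.abc')

# With 2>&1, a warning emitted after the token line becomes the last line.
noisy = clean + "\nUserWarning: late log message"
print(parse_last_line(noisy))
# → ('UserWarning: late log message', 'UserWarning: late log message')

print(parse_last_line(""))  # → ('', '')
```

Under these assumptions, an unmatched pattern leaves the line unchanged. A contaminated last line therefore yields a garbage token rather than an empty one, and only an empty output produces the empty-token case.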
Same-account usage worked by accident because default MWAA execution
role credentials were already in the environment, so eks_get_token
produced valid output regardless of credential file sourcing.
Changes:
- Redirect stderr to /dev/null (2>/dev/null) instead of merging
with stdout (2>&1) to ensure clean token output for bash parsing
- Add token validation: exit with error if token extraction fails
- Add error messages to stderr for debugging credential issues
- Add unit tests verifying the COMMAND template structure
Fixes apache#64657
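The token-validation change amounts to refusing to emit an ExecCredential when the token is empty. Below is a minimal Python sketch of that idea; the real template does this in bash with printf. `build_exec_credential` is a hypothetical name. The `apiVersion` default is an assumption: the Kubernetes client-authentication API has several versions, and the value should match whatever the kubeconfig's exec section declares.

```python
import json
import sys

# Assumption: match the apiVersion declared in the kubeconfig's exec section.
DEFAULT_API_VERSION = "client.authentication.k8s.io/v1beta1"


def build_exec_credential(
    timestamp: str, token: str, api_version: str = DEFAULT_API_VERSION
) -> str:
    """Return ExecCredential JSON for kubectl, refusing an empty token."""
    if not token:
        # Fail loudly instead of emitting a credential with an empty token,
        # which the EKS API server rejects with 401.
        raise ValueError("Failed to extract token from eks_get_token output.")
    credential = {
        "kind": "ExecCredential",
        "apiVersion": api_version,
        "spec": {},
        "status": {"expirationTimestamp": timestamp, "token": token},
    }
    # json.dumps handles quoting/escaping that printf-built JSON does not.
    return json.dumps(credential)


print(build_exec_credential("2024-01-01T00:15:00Z", "k8s-aws-v1.abc"))
try:
    build_exec_credential("2024-01-01T00:15:00Z", "")
except ValueError as exc:
    # Diagnostics go to stderr; stdout stays reserved for the credential.
    print(f"error: {exc}", file=sys.stderr)
```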
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)

         if [ "$status" -ne 0 ]; then
    -        printf '%s' "$output" >&2
    +        printf 'eks_get_token failed with exit code %s' "$status" >&2

Should we not pipe stderr output above to a durable location (perhaps something in /tmp) instead of /dev/null and then combine it with the stdout here? The status code alone is not very helpful.
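One way to read this suggestion is sketched here in Python rather than bash, so it can be tested. The child's stderr goes to a file that survives the run (via `tempfile.NamedTemporaryFile(delete=False)`), stdout stays clean for parsing, and the saved stderr is folded into the error only when the command fails. The helper name and file prefix are hypothetical. In the bash template the equivalent would be a `2>"$errfile"` redirection.

```python
import subprocess
import sys
import tempfile


def run_with_durable_stderr(cmd: list[str]) -> str:
    """Run cmd and return its stdout; on failure raise with the saved stderr.

    stderr is written to a temp file that is NOT deleted, so it remains
    available for post-mortem debugging (the "/tmp" idea above).
    """
    with tempfile.NamedTemporaryFile(
        mode="w+", prefix="eks_get_token_", suffix=".err", delete=False
    ) as errfile:
        # The child writes stderr straight to the file; stdout is piped back.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=errfile, text=True)
        errfile.seek(0)
        stderr_text = errfile.read()
    if result.returncode != 0:
        # Combine the status code with the captured stderr, as suggested.
        raise RuntimeError(
            f"command failed with exit code {result.returncode}; "
            f"stderr saved to {errfile.name}: {stderr_text.strip()}"
        )
    return result.stdout


child = "import sys; sys.stderr.write('noisy warning'); print('token line')"
print(repr(run_with_durable_stderr([sys.executable, "-c", child])))
# → 'token line\n'
```

The trade-off is that these files accumulate and may need periodic cleanup.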
Pull request overview
Fixes EksPodOperator authorization failures in cross-account AssumeRole setups by making the EKS kubeconfig exec-plugin token generation/parsing more robust.
Changes:
- Adjust the EksHook kubeconfig exec COMMAND to avoid stderr/stdout mixing and add an explicit empty-token failure path.
- Add unit tests asserting the COMMAND template no longer merges stderr into stdout and that it contains token validation logic.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| providers/amazon/src/airflow/providers/amazon/aws/hooks/eks.py | Updates the kubeconfig exec-plugin shell template to avoid stdout contamination and fail fast on empty token extraction. |
| providers/amazon/tests/unit/amazon/aws/hooks/test_eks.py | Adds string-based assertions over the COMMAND template to guard against regressions in redirection and validation logic. |

    +    # Redirect stderr to /dev/null to prevent Python warnings, deprecation
    +    # notices, or other log output from contaminating stdout. The token
    +    # output must be the ONLY thing on stdout for bash parsing to work.
         output=$({python_executable} -m airflow.providers.amazon.aws.utils.eks_get_token \
    -        --cluster-name {eks_cluster_name} --sts-url '{sts_url}' {args} 2>&1)
    +        --cluster-name {eks_cluster_name} --sts-url '{sts_url}' {args} 2>/dev/null)

Redirecting eks_get_token stderr to /dev/null discards useful error output (stack traces, botocore messages) and makes kubeconfig exec failures hard to debug. Since the original parsing issue was caused by merging stderr into stdout (2>&1), consider removing the redirection entirely (let stderr pass through) or capture stderr separately and only surface it on non-zero exit, while keeping stdout clean for parsing.
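The difference between merging and separating the streams can be demonstrated with a Python analogue. Here subprocess pipes stand in for the shell redirections, and the child script is a hypothetical stand-in for eks_get_token that writes its token line and then a late warning. The merged run behaves like 2>&1. The separate run keeps stdout parseable while still retaining stderr, which is only surfaced on a non-zero exit, as the comment suggests.

```python
import subprocess
import sys

# Hypothetical stand-in for eks_get_token: token line on stdout,
# then a late warning on stderr.
CHILD = (
    "import sys\n"
    "print('expirationTimestamp: 2024-01-01T00:15:00Z, token: k8s-aws-v1.abc', flush=True)\n"
    "sys.stderr.write('UserWarning: late log message\\n')\n"
)


def last_line(text: str) -> str:
    # $(...) strips trailing newlines, so do the same before splitting.
    return text.rstrip("\n").rsplit("\n", 1)[-1]


# Equivalent of 2>&1: both streams share one pipe.
merged = subprocess.run(
    [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
# Separate pipes: stdout stays clean, stderr is kept for diagnostics.
separate = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)

print(last_line(merged.stdout))    # the warning, not the token line
print(last_line(separate.stdout))  # the token line
if separate.returncode != 0:
    # Only surface stderr when the command actually failed.
    print(separate.stderr, file=sys.stderr)
```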

         if [ "$status" -ne 0 ]; then
    -        printf '%s' "$output" >&2
    +        printf 'eks_get_token failed with exit code %s' "$status" >&2

On non-zero exit you now only print the exit code, but not the captured stdout from eks_get_token. This is a regression in diagnostics compared to printing the output, and it will make credential/STS issues much harder to troubleshoot. Consider also emitting $output (and/or captured stderr if you keep it) when status != 0.
Suggested change:

    -        printf 'eks_get_token failed with exit code %s' "$status" >&2
    +        printf 'eks_get_token failed with exit code %s' "$status" >&2
    +        if [ -n "$output" ]; then
    +            printf '. Output was: %s' "$output" >&2
    +        fi

        printf 'Failed to extract token from eks_get_token output. ' >&2
        printf 'Output was: %s' "$output" >&2

The empty-token branch prints the full eks_get_token output to stderr. That output includes the EKS bearer token (see eks_get_token.py printing token: {access_token}), so this will leak credentials into task logs when parsing fails. Please redact the token before logging (or omit the output entirely) to avoid exposing bearer tokens.
Suggested change:

    -        printf 'Failed to extract token from eks_get_token output. ' >&2
    -        printf 'Output was: %s' "$output" >&2
    +        printf 'Failed to extract token from eks_get_token output.' >&2
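The two review points on diagnostics (include the output on failure, but never leak the bearer token) can be combined. Below is a Python sketch that redacts anything following a `token: ` label before building the failure message. It assumes, as the review says, that eks_get_token prints `token: {access_token}`. `redact_token` and `failure_message` are hypothetical helpers, and the regex treats the token as a run of non-whitespace characters.

```python
import re

# Assumes the bearer token follows a "token: " label and contains no spaces.
_TOKEN_RE = re.compile(r"(token: )\S+")


def redact_token(output: str) -> str:
    """Replace anything that looks like a bearer token with a placeholder."""
    return _TOKEN_RE.sub(r"\1<redacted>", output)


def failure_message(status: int, output: str) -> str:
    """Build the non-zero-exit message, appending redacted output if any."""
    message = f"eks_get_token failed with exit code {status}"
    if output:
        message += f". Output was: {redact_token(output)}"
    return message


print(redact_token("expirationTimestamp: 2024-01-01T00:15:00Z, token: k8s-aws-v1.abc"))
# → expirationTimestamp: 2024-01-01T00:15:00Z, token: <redacted>
print(failure_message(255, ""))
# → eks_get_token failed with exit code 255
```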
        """Verify COMMAND template redirects stderr to /dev/null to prevent
        Python warnings/log output from contaminating stdout and breaking
        bash token parsing. This is critical for cross-account AssumeRole
        scenarios where the kubeconfig exec plugin must produce a clean token."""
        from airflow.providers.amazon.aws.hooks.eks import COMMAND

        # Verify stderr is redirected to /dev/null, not merged with stdout
        assert "2>/dev/null" in COMMAND, (
            "COMMAND must redirect stderr to /dev/null to prevent output contamination"
        )

This test hard-codes that stderr must be redirected to /dev/null. The actual requirement for correctness is that stderr must not be merged into stdout (i.e., avoid 2>&1) so stdout remains parseable; discarding stderr is an implementation choice and reduces debuggability. Consider relaxing the assertion to only require that 2>&1 is not present, so future changes can keep stderr visible while still fixing the parsing issue.
Suggested change:

    -    """Verify COMMAND template redirects stderr to /dev/null to prevent
    -    Python warnings/log output from contaminating stdout and breaking
    -    bash token parsing. This is critical for cross-account AssumeRole
    -    scenarios where the kubeconfig exec plugin must produce a clean token."""
    -    from airflow.providers.amazon.aws.hooks.eks import COMMAND
    -    # Verify stderr is redirected to /dev/null, not merged with stdout
    -    assert "2>/dev/null" in COMMAND, (
    -        "COMMAND must redirect stderr to /dev/null to prevent output contamination"
    -    )
    +    """Verify COMMAND template keeps stderr separate from stdout so
    +    Python warnings/log output cannot contaminate stdout and break
    +    bash token parsing. This is critical for cross-account AssumeRole
    +    scenarios where the kubeconfig exec plugin must produce a clean token."""
    +    from airflow.providers.amazon.aws.hooks.eks import COMMAND
    +    # Verify stderr is not merged with stdout. Whether stderr is discarded
    +    # or left visible is an implementation choice.
        # Verify it exits with error on empty token
        assert "exit 1" in COMMAND or 'exit "$' in COMMAND, (
            "COMMAND must exit with error when token extraction fails"

assert "exit 1" in COMMAND or 'exit "$' in COMMAND can pass even if the empty-token validation never exits, because 'exit "$' matches the existing exit "$status" earlier in the script. Tighten this assertion to specifically verify that the empty-token block exits (e.g., by checking ordering relative to if [ -z "$token" ] or matching the exit 1 inside that block).
Suggested change:

    -    # Verify it exits with error on empty token
    -    assert "exit 1" in COMMAND or 'exit "$' in COMMAND, (
    -        "COMMAND must exit with error when token extraction fails"
    +    # Verify the empty-token validation block exits with an error
    +    assert 'if [ -z "$token" ]; then\n exit 1\nfi' in COMMAND, (
    +        "COMMAND must exit with error in the empty-token validation block when token extraction fails"
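An exact-substring assertion depends on the template's whitespace and on `exit 1` immediately following `then`. The empty-token branch in this PR prints an error message first, so such a check would likely not match it. A block-aware check is sketched below in Python. It locates the `if [ -z "$token" ]` block and looks for `exit 1` before the next `fi`. The helper name and the sample template are hypothetical, and the regex assumes there is no nested `if`/`fi` inside the block.

```python
import re

# Captures the body of the empty-token check up to the next standalone "fi".
# Assumes no nested if/fi inside the block.
_EMPTY_TOKEN_BLOCK = re.compile(
    r'if \[ -z "\$token" \]; then(?P<body>.*?)\bfi\b', re.DOTALL
)


def empty_token_block_exits(command: str) -> bool:
    """True if the `if [ -z "$token" ]` block contains `exit 1`.

    Tolerates indentation and error-printing lines before the exit,
    unlike an exact-substring match.
    """
    match = _EMPTY_TOKEN_BLOCK.search(command)
    return bool(match and re.search(r"\bexit 1\b", match.group("body")))


# Hypothetical template fragment resembling the PR's empty-token branch.
SAMPLE = '''
if [ -z "$token" ]; then
    printf 'Failed to extract token from eks_get_token output.' >&2
    exit 1
fi
'''
print(empty_token_block_exits(SAMPLE))  # → True
print('if [ -z "$token" ]; then\n exit 1\nfi' in SAMPLE)  # → False
```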
fix: EksPodOperator 401 with cross-account AssumeRole via aws_conn_id
Fixes #64657
Problem
When using EksPodOperator with aws_conn_id pointing to a cross-account IAM role (via AssumeRole), pods fail with 401 Unauthorized:
The audit log shows an empty user identity: "user":{}.
Root Cause
The kubeconfig exec plugin COMMAND template in EksHook had two critical fragility points:
1. stderr merged into stdout via 2>&1: Python warnings, deprecation notices, or log output from eks_get_token contaminated the stdout that bash token parsing relies on. This caused the last_line extraction to grab the wrong line, producing empty/invalid timestamp and token values.
2. No token validation: if parsing failed, a malformed ExecCredential JSON with an empty token was sent to the EKS API server, resulting in 401 with an empty user identity.
Same-account usage worked by accident because default MWAA execution role credentials were already in the environment, so eks_get_token produced valid output regardless of credential file sourcing.
Changes
airflow/providers/amazon/aws/hooks/eks.py
- Redirect stderr to /dev/null (2>/dev/null) instead of merging with stdout (2>&1) to ensure clean token output for bash parsing
tests/unit/amazon/aws/hooks/test_eks.py
- test_command_template_redirects_stderr: verifies stderr is redirected to /dev/null and not merged with stdout
- test_command_template_validates_token: verifies the token validation check and error exit
Testing
Verification
To verify the fix works with cross-account AssumeRole:
EksPodOperator task with aws_conn_id set to the cross-account connection