fix(amazon): Support AWS China region endpoints in RedshiftSQLHook OpenLineage identifier parsing #65483
Merged
vincbeck merged 1 commit into apache:main on Apr 20, 2026
…enLineage identifier parsing

The _get_identifier_from_hostname method in RedshiftSQLHook only handled global AWS endpoints (amazonaws.com) but not AWS China region endpoints (amazonaws.com.cn). This caused the OpenLineage authority part to fall back to the full hostname instead of correctly parsing cluster_identifier.region_name.

Global endpoint format (6 dot-separated parts): my-cluster.id.us-east-1.redshift.amazonaws.com
China endpoint format (7 dot-separated parts): my-cluster.id.cn-north-1.redshift.amazonaws.com.cn

The same issue affects both provisioned clusters and Redshift Serverless workgroups in cn-north-1 and cn-northwest-1.
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
vincbeck approved these changes on Apr 20, 2026
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
What
Fix _get_identifier_from_hostname in RedshiftSQLHook to correctly parse AWS China region endpoints (amazonaws.com.cn).
Why
AWS China regions (cn-north-1, cn-northwest-1) use a different endpoint suffix than global regions:
Global: my-cluster.id.us-east-1.redshift.amazonaws.com
China: my-cluster.id.cn-north-1.redshift.amazonaws.com.cn
The existing code only checks hostname.endswith("amazonaws.com") and len(parts) == 6, which fails for China endpoints. This causes the OpenLineage authority to fall back to the raw hostname instead of the expected cluster_identifier.region_name format.
Affects both provisioned clusters and Redshift Serverless workgroups in China regions.
How
Added a check for amazonaws.com.cn with len(parts) == 7, using the same parsing logic (parts[0].parts[2]) since the cluster identifier and region are at the same positions.
Testing
Added two parametrized test cases covering:
Verified with a real Redshift Serverless endpoint in cn-north-1:
airflow-test.029295324148.cn-north-1.redshift-serverless.amazonaws.com.cn
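For readers who want to see the rule from the How section in one place, here is a minimal standalone sketch. It is not the hook's actual code: the free function name get_identifier_from_hostname and its structure are assumptions made for illustration, and only the two suffix/length checks, the parts[0].parts[2] result, and the fallback to the raw hostname come from the description above.

```python
def get_identifier_from_hostname(hostname: str) -> str:
    """Illustrative stand-in for the hook method (name and shape are assumptions).

    Returns "cluster_identifier.region_name" for recognised Redshift
    endpoints, otherwise the hostname unchanged.
    """
    parts = hostname.split(".")

    # Global endpoints end in amazonaws.com and have 6 dot-separated parts.
    is_global = hostname.endswith("amazonaws.com") and len(parts) == 6
    # China endpoints end in amazonaws.com.cn and have 7 dot-separated parts.
    is_china = hostname.endswith("amazonaws.com.cn") and len(parts) == 7

    if is_global or is_china:
        # The identifier and region sit at the same positions in both formats.
        return f"{parts[0]}.{parts[2]}"

    # Anything else falls back to the raw hostname.
    return hostname


if __name__ == "__main__":
    for host in (
        "my-cluster.id.us-east-1.redshift.amazonaws.com",
        "my-cluster.id.cn-north-1.redshift.amazonaws.com.cn",
        "airflow-test.029295324148.cn-north-1.redshift-serverless.amazonaws.com.cn",
        "localhost",
    ):
        print(host, "->", get_identifier_from_hostname(host))
```

Under these assumptions, the serverless endpoint above splits into 7 parts, so it resolves to airflow-test.cn-north-1, while a hostname that matches neither pattern is returned unchanged.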