Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new SSM hook and use it in the System Test context builder #28755

Merged
merged 5 commits into from
Jan 12, 2023

Conversation

ferruzzi
Copy link
Contributor

@ferruzzi ferruzzi commented Jan 5, 2023

Adds a new hook for AWS Systems Manager (SSM), including unit tests, and incorporates it into existing system tests.

airflow/providers/amazon/aws/hooks/ssm.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/hooks/ssm.py Outdated Show resolved Hide resolved
@ferruzzi
Copy link
Contributor Author

ferruzzi commented Jan 5, 2023

I see some tests are failing, I've got a lot on my plate today, I'll try to get to them tomorrow. Sorry about that.

@ferruzzi
Copy link
Contributor Author

ferruzzi commented Jan 6, 2023

Causing test_should_not_do_database_queries to fail on a whole slew of system tests with the following message:

E           AssertionError: The expected number of db queries is 0 with extra margin: 0. The current number is 1.
E           
E           Recorded query locations:
E           	cached_property.py:__get__:36>base_aws.py:conn_config:513>base.py:get_connection:72>connection.py:get_connection_from_secrets:425>metastore.py:get_connection:39:	1

tests/test_utils/asserts.py:115: AssertionError

I'll try to figure it out.

@ferruzzi
Copy link
Contributor Author

Reverting the change here does pass tests (locally at least) but surely there must e a way to use the hook there. If I can't figure it out by tomorrow I'll just revert that one line so the rest of this can move forward.

@@ -92,15 +93,15 @@ def _fetch_from_ssm(key: str, test_name: str | None = None) -> str:
:return: The value of the provided key from SSM
"""
_test_name: str = test_name if test_name else _get_test_name()
ssm_client: BaseClient = boto3.client("ssm")
hook = SsmHook()
Copy link
Contributor

@Taragolis Taragolis Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ferruzzi I guess the issues with tests here.
Due to the nature of AwsBaseHook by default it use aws_conn_id="aws_default" as result when you try to create SSM Client than hook obtain connection first.

@cached_property
def conn_config(self) -> AwsConnectionWrapper:
"""Get the Airflow Connection object and wrap it in helper (cached)."""
connection = None
if self.aws_conn_id:
try:
connection = self.get_connection(self.aws_conn_id)
except AirflowNotFoundException:
warnings.warn(
f"Unable to find AWS Connection ID '{self.aws_conn_id}', switching to empty. "
"This behaviour is deprecated and will be removed in a future releases. "
"Please provide existed AWS connection ID or if required boto3 credential strategy "
"explicit set AWS Connection ID to None.",
DeprecationWarning,
stacklevel=2,
)
return AwsConnectionWrapper(
conn=connection, region_name=self._region_name, botocore_config=self._config, verify=self._verify
)

I think if you set aws_conn_id to None it should not use Airflow Connection

Suggested change
hook = SsmHook()
hook = SsmHook(aws_conn_id=None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had two solutions I was playing with and that was not one of them. I'll try that out, if it works it would be a much less intrusive fix than the one I decided to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking this through, that would mean that the connection itself is not being tested. Is that something we want to accept in a system test?

I think it's alright in this case. The system tests are supposed to be testing the specific services, not whether the connection to SSM is working.... but that does mean it's a less-complete system test.

Copy link
Contributor

@Taragolis Taragolis Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking this through, that would mean that the connection itself is not being tested.

We tests connections for AwsBaseHook and all other helpers. Actually there is 3 cases and all of them tested:

  1. aws_conn_id = None
  2. aws_conn_id = not-existed-conn, we know only after we tried connect to DB (one in the future we would remove this one)
  3. aws_conn_id = exists-conn

So we don't need test for all Hooks which based on AwsBaseHook and AwsGenericHook and how they obtain connection from Airflow.

The actual error happen in this test

@pytest.mark.parametrize("example", example_dags_except_db_exception(), ids=relative_path)
def test_should_not_do_database_queries(example):
with assert_queries_count(0):
DagBag(
dag_folder=example,
include_examples=False,
)

I'm not familiar with this test, but seems like it test that example DAGs do not perform queries to Airflow Database during import example dags: #7419

I don't know maybe this test less relevant nowadays

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was digging in that code yesterday and one option I worked on was adding AWS system tests as an exception and to assert that AWS tests make exactly one query, but I don't really know how important that restriction is. so I was trying to look into the restriction and why it's there. It looks like your change does fix the test (this PR is passing now) so maybe let's run with this.

@o-nikolas, can you have a look at this when you get a few minutes and let me know what you think? I think I'm inclined to agree with Taragolis that this is the right solution here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love changing the "prod" code in this way to make the tests pass (I think the expectations in that unit test should be updated for AWS examples). But on the other hand the "prod" code here is technically also test code, so it's not so bad and it's a clean change 👍 Happy to merge!

Should bypass the DB query and pass tests

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
@o-nikolas o-nikolas merged commit 870ecd4 into apache:main Jan 12, 2023
@pierrejeambrun pierrejeambrun added this to the Airflow 2.6.0 milestone Feb 27, 2023
@pierrejeambrun pierrejeambrun added the type:new-feature Changelog: New Features label Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants