Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate remaining Always & CLI tests to pytest #28459

Closed

Conversation

Taragolis
Copy link
Contributor

Migrate remaining Always & CLI tests to pytest in selected directories

  • tests/always/*
  • tests/cli/*

As usual add findings, info about significant changes and potential follow up in comments

@@ -30,68 +30,79 @@
)


class TestProjectStructure(unittest.TestCase):
class TestProjectStructure:
Copy link
Contributor Author

@Taragolis Taragolis Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought TestProjectStructure should be re-worked, I miss the main idea of this test so it would be nice if someone could share the idea about this tests


test_deprecated_packages - I thought this test not properly worked since actual files replaced by PEP-562 getattr - #26153


test_providers_modules_should_have_tests - This test broken for a while, generators wrapped into generators and when it actually transformed to set this line broke expected_test_files calculation and it always set([])

modules_files = set(modules_files)

Instead of raise an error in this test I print warning until we fix or rewrite this test logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this module come from our past renaming we've doe in AIP-21 - and some of them can be removed or simplified or moved elsewhere as they never caught up as "common provider" approach:

  • test_deprecated_packages -> we should indeed remove, they are no problem any more
  • Test...ProviderProjectStructure - we should remove them - they bring no value and having then for few providers in here makes littel sense
  • ExampleCoverageTest - similarly
  • AssetsCoverageTest -> I think this is largely used by Google team. Naybe @bhirsz might tell more about it (though in any case I thik if it is useful for Google it should go to google provider tests.
  • test_providers_modules_should_have_tests -> is the only one I find useful from those for now but if it's broken, it might be we are far-off and fixing it would be problematic.

Copy link
Contributor Author

@Taragolis Taragolis Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_providers_modules_should_have_tests after changes by this PR shows this warning

Detect missing tests in providers module.

Modules Files: 566, Current Test Files: 583, Missing Tests Files: 97.

tests/providers/amazon/aws/hooks/test_dms.py
tests/providers/amazon/aws/links/test_base_aws.py
tests/providers/amazon/aws/links/test_batch.py
tests/providers/amazon/aws/links/test_emr.py
tests/providers/amazon/aws/links/test_logs.py
tests/providers/amazon/aws/operators/test_aws_lambda.py
tests/providers/amazon/aws/operators/test_dms.py
tests/providers/amazon/aws/operators/test_emr.py
tests/providers/amazon/aws/operators/test_s3.py
tests/providers/amazon/aws/operators/test_sagemaker.py
tests/providers/amazon/aws/sensors/test_dms.py
tests/providers/amazon/aws/sensors/test_ecs.py
tests/providers/amazon/aws/sensors/test_emr.py
tests/providers/amazon/aws/sensors/test_s3.py
tests/providers/amazon/aws/sensors/test_sagemaker.py
tests/providers/amazon/aws/test_exceptions.py
tests/providers/amazon/aws/utils/test_rds.py
tests/providers/amazon/aws/waiters/test_base_waiter.py
tests/providers/apache/cassandra/hooks/test_cassandra.py
tests/providers/apache/druid/operators/test_druid_check.py
tests/providers/cncf/kubernetes/backcompat/test_backwards_compat_converters.py
tests/providers/cncf/kubernetes/test_python_kubernetes_script.py
tests/providers/cncf/kubernetes/utils/test_xcom_sidecar.py
tests/providers/databricks/hooks/test_databricks_base.py
tests/providers/databricks/utils/test_databricks.py
tests/providers/dbt/cloud/hooks/test_dbt.py
tests/providers/dbt/cloud/operators/test_dbt.py
tests/providers/dbt/cloud/sensors/test_dbt.py
tests/providers/elasticsearch/log/test_es_json_formatter.py
tests/providers/google/cloud/links/test_base.py
tests/providers/google/cloud/links/test_bigquery.py
tests/providers/google/cloud/links/test_bigquery_dts.py
tests/providers/google/cloud/links/test_bigtable.py
tests/providers/google/cloud/links/test_cloud_build.py
tests/providers/google/cloud/links/test_cloud_functions.py
tests/providers/google/cloud/links/test_cloud_memorystore.py
tests/providers/google/cloud/links/test_cloud_sql.py
tests/providers/google/cloud/links/test_cloud_storage_transfer.py
tests/providers/google/cloud/links/test_cloud_tasks.py
tests/providers/google/cloud/links/test_compute.py
tests/providers/google/cloud/links/test_data_loss_prevention.py
tests/providers/google/cloud/links/test_datacatalog.py
tests/providers/google/cloud/links/test_dataflow.py
tests/providers/google/cloud/links/test_dataform.py
tests/providers/google/cloud/links/test_dataplex.py
tests/providers/google/cloud/links/test_dataprep.py
tests/providers/google/cloud/links/test_dataproc.py
tests/providers/google/cloud/links/test_datastore.py
tests/providers/google/cloud/links/test_kubernetes_engine.py
tests/providers/google/cloud/links/test_life_sciences.py
tests/providers/google/cloud/links/test_mlengine.py
tests/providers/google/cloud/links/test_pubsub.py
tests/providers/google/cloud/links/test_spanner.py
tests/providers/google/cloud/links/test_stackdriver.py
tests/providers/google/cloud/links/test_vertex_ai.py
tests/providers/google/cloud/links/test_workflows.py
tests/providers/google/cloud/operators/vertex_ai/test_auto_ml.py
tests/providers/google/cloud/operators/vertex_ai/test_batch_prediction_job.py
tests/providers/google/cloud/operators/vertex_ai/test_custom_job.py
tests/providers/google/cloud/operators/vertex_ai/test_dataset.py
tests/providers/google/cloud/operators/vertex_ai/test_endpoint_service.py
tests/providers/google/cloud/operators/vertex_ai/test_hyperparameter_tuning_job.py
tests/providers/google/cloud/operators/vertex_ai/test_model_service.py
tests/providers/google/cloud/sensors/test_dataform.py
tests/providers/google/cloud/transfers/test_presto_to_gcs.py
tests/providers/google/cloud/transfers/test_trino_to_gcs.py
tests/providers/google/cloud/triggers/test_cloud_composer.py
tests/providers/google/cloud/triggers/test_dataproc.py
tests/providers/google/cloud/utils/test_bigquery.py
tests/providers/google/cloud/utils/test_bigquery_get_data.py
tests/providers/google/cloud/utils/test_dataform.py
tests/providers/google/common/links/test_storage.py
tests/providers/google/common/test_consts.py
tests/providers/google/test_go_module_utils.py
tests/providers/microsoft/azure/hooks/test_batch.py
tests/providers/microsoft/azure/hooks/test_container_instance.py
tests/providers/microsoft/azure/hooks/test_container_registry.py
tests/providers/microsoft/azure/hooks/test_container_volume.py
tests/providers/microsoft/azure/hooks/test_cosmos.py
tests/providers/microsoft/azure/hooks/test_data_factory.py
tests/providers/microsoft/azure/hooks/test_data_lake.py
tests/providers/microsoft/azure/hooks/test_fileshare.py
tests/providers/microsoft/azure/hooks/test_synapse.py
tests/providers/microsoft/azure/operators/test_adls.py
tests/providers/microsoft/azure/operators/test_batch.py
tests/providers/microsoft/azure/operators/test_container_instances.py
tests/providers/microsoft/azure/operators/test_cosmos.py
tests/providers/microsoft/azure/operators/test_data_factory.py
tests/providers/microsoft/azure/operators/test_synapse.py
tests/providers/microsoft/azure/secrets/test_key_vault.py
tests/providers/microsoft/azure/sensors/test_cosmos.py
tests/providers/microsoft/azure/sensors/test_data_factory.py
tests/providers/mongo/sensors/test_mongo.py
tests/providers/presto/transfers/test_gcs_to_presto.py
tests/providers/redis/operators/test_redis_publish.py
tests/providers/redis/sensors/test_redis_key.py
tests/providers/trino/transfers/test_gcs_to_trino.py

Some of them about deprecated modules

Copy link
Contributor Author

@Taragolis Taragolis Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • some of tests moved into tests.integrations
  • Amazon links tested in one go in same module
  • Some of amazon tests split across different test modules, for example: test_s3_bucket.py + test_s3_bucket_tagging.py + test_s3_file_transform.py + test_s3_list.py + test_s3_list_prefixes.py + test_s3_object.py but module named as s3.py (merged version)

So in theory there is no problem to fix some issue, add tests.integration as search path and add temporary exclusion for remaining and turn on this test again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Sounds like a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • test_deprecated_packages -> we should indeed remove, they are no problem any more

Done

  • Test...ProviderProjectStructure - we should remove them - they bring no value and having then for few providers in here makes littel sense

Done, keep only ProjectStructureTest because ProjectStructureTest depends on it

  • ExampleCoverageTest - similarly

Removed

  • AssetsCoverageTest -> I think this is largely used by Google team. Naybe @bhirsz might tell more about it (though in any case I thik if it is useful for Google it should go to google provider tests.

Keep for a while

  • test_providers_modules_should_have_tests -> is the only one I find useful from those for now but if it's broken, it might be we are far-off and fixing it would be problematic.

Will try to fix (make it run again) as follow up

Copy link
Contributor

@dimberman dimberman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see if we could write this in a way that's more generatable. Pytest has some pretty powerful features that should prevent us from having to write out our parameters one by one.

Comment on lines +56 to +79
"scheme://user:password@host%2Flocation:1234/schema?extra1=a%20value&extra2=%2Fpath%2F",
{
"conn_type": "scheme",
"host": "host/location",
"schema": "schema",
"login": "user",
"password": "password",
"port": 1234,
"extra_dejson": {"extra1": "a value", "extra2": "/path/"},
},
id="with extras",
),
pytest.param(
"scheme://user:password@host%2Flocation:1234/schema?__extra__=single+value",
{
"conn_type": "scheme",
"host": "host/location",
"schema": "schema",
"login": "user",
"password": "password",
"port": 1234,
"extra": "single value",
},
id="with extras single value",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a lot of repeat code here. Is there any way of compressing this into a generator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is any problem to generate tests cases:

Right now general idea migrate all tests to pytest it is sill remaining about 100-120 modules (most of them sits into Google Provider).

Also all code generation could contain some bugs and complicated logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always reorganize and refactor those tests later - l;et's not do both changing the type of tests and refactoring the way how they are defined in one PR

This is also a question of DRY vs DAMP - I think in many cases for tests DRY does not work that well. if you have to look at several places (inevitable with DRY) to understand what is the expected test input and output, it's worse than if you have the inputs and outputs together (and no common code that you also have to look-up). It's really nice to see all the cases with input/output clearly grouped together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we can find a good way also having similar properties and less code - that's cool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimberman - BTW. Do you have some propoal in mind ? Maybe some example that you can show how it could look like ? I would be interested to see how this could be done nicer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About DRY, right now a lot of providers tests usually do the the same things:

  • Mock Airflow Connections
  • Mock their clients
  • Create task and execute them

Most of this stuff could be achived by couple fixtures or context managers. Unfortunetly right now structure of internal framework split across tests package. My suggestion move all generic stuff into separate package module and inject them as pytest plugin.

In this case we could also test our fixtures/test context managers and helpers functions if we want.

If we talk about this part it is a bit tricky generate some stuff because within this test cases we tests our Connection model and how it transform URI -> Connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About DRY, right now a lot of providers tests usually do the the same things:

  • Mock Airflow Connections
  • Mock their clients
  • Create task and execute them

Most of this stuff could be achived by couple fixtures or context managers. Unfortunetly right now structure of internal framework split across tests package. My suggestion move all generic stuff into separate package module and inject them as pytest plugin.

Yep. It would be good if we can do some generalisation. But we have to take into account that we plan at some point of time to split providers out - possibly even to a separate repo per provider - and for sure we want to restructure the providers to become independent standalone "sub-folders" as an interim step - https://lists.apache.org/thread/3s5tn1wnvo0cw9vofwmbjl0rkyvhrtbx and there any common code might become challenging. Pytest plugins might actually help with it.

If we talk about this part it is a bit tricky generate some stuff because within this test cases we tests our Connection model and how it transform URI -> Connection.

Yeah I also find it a but challenging how to do it "dry-er" vs loosing some of hte DAMPness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've also suggest in dev-list about separate generic tests parts into package this should also help split providers into separate sub-folders and even in separate repository because in this case all of them would have access to this package (we just nee install them) and all tests configurations which live inside tests/conftest.py could be moved into it.

Of course it is in theory and I plan to create PoC

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah. I missed that :( ... sorry, been a bit swamed recently :)

@potiuk
Copy link
Member

potiuk commented Jan 17, 2023

@dimberman ? What do you think?

@potiuk potiuk force-pushed the migrate-always-cli-to-pytest branch from 7e91f8f to 9730195 Compare January 17, 2023 12:20
@Taragolis Taragolis added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) type:misc/internal Changelog: Misc changes that should appear in change log labels Jan 19, 2023
@potiuk
Copy link
Member

potiuk commented Jan 19, 2023

LGTM. @dimberman -> do you still have doubts? Or should we merge this one?

@Taragolis
Copy link
Contributor Author

👀 😢 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants