Adding MSGraphOperator in Microsoft Azure provider #38111

Merged
merged 172 commits into from
Apr 14, 2024

Contributor

@dabla dabla commented Mar 13, 2024

As discussed earlier, I proposed adding the MS Graph Operator. After discussion on the Airflow dev list, we agreed that the operator could be added as part of the Microsoft Azure provider. This is my initial commit, so any suggestions are welcome. I still need to add examples and probably need to update some docstrings, but this PR already gives an idea of what will be added. I opened the PR now so I can see what the Airflow QA checks report.

In the meantime, our article on how we use the operator at Infrabel has been published on the Apache Airflow Medium publication:

https://medium.com/apache-airflow/optimizing-integration-with-the-ms-graph-api-and-power-bi-with-the-msgraphasyncoperator-in-airflow-d1071f7c1b62



David Blain and others added 27 commits March 13, 2024 17:19
…w connection and request adapter + make multiple patches into one context manager Python 3.8 compatible
Member

potiuk commented Apr 12, 2024

You should remove your fix and rebase. In the change you applied, you removed the markers on the tests, but it needs pytestmark at the module level instead.

Contributor Author

dabla commented Apr 12, 2024

> You should remove your fix and rebase. In the change you applied, you removed the markers on the tests, but it needs pytestmark at the module level instead.

So I need to do it like this, then? I don't completely understand what you meant (I reverted my fix that put the db markers on the test methods):

@pytest.mark.db_test
class TestMSGraphAsyncOperator(Base):
    def test_execute(self):
        ...

Member

potiuk commented Apr 12, 2024

See how it's done in main (and you rebased onto it): `pytestmark` is assigned at the module level, so all those tests are now `db_tests`. Your previous attempt removed all of the existing markers, making them more susceptible to the pytest-xdist/async bug we seem to be hitting.
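For reference, a module-level marker of the kind described here might look like the minimal sketch below. It assumes pytest is installed and that `db_test` is a custom marker registered in the project's pytest configuration (the marker name is taken from this thread). The class body is a placeholder and leaves out the `Base` helper class used in the PR.

```python
import pytest

# Module-level marker: pytest applies it to every test collected from this file,
# so individual classes and methods do not need their own @pytest.mark.db_test.
pytestmark = pytest.mark.db_test


class TestMSGraphAsyncOperator:
    # Placeholder test; the real class in the PR extends a shared Base helper.
    def test_execute(self):
        ...
```

With this in place, a decorator on the class or on each test method would be redundant for the same marker.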

Contributor Author

dabla commented Apr 12, 2024

> See how it's done in main (and you rebased onto it): `pytestmark` is assigned at the module level, so all those tests are now `db_tests`. Your previous attempt removed all of the existing markers, making them more susceptible to the pytest-xdist/async bug we seem to be hitting.

OK, then I don't understand why it's still failing sometimes. It's not always the case: on some Python versions it succeeds, but on others it doesn't.

Member

potiuk commented Apr 12, 2024

Those are new tests you added that are now failing. It's a different error you get:

You also need to mark the test as db_test per the instructions:

FAILED tests/providers/microsoft/azure/operators/test_msgraph.py::TestMSGraphAsyncOperator::test_execute - airflow.exceptions.AirflowInternalRuntimeError: Your test accessed the DB but _AIRFLOW_SKIP_DB_TESTS is set.
Either make sure your test does not use database or mark the test with @pytest.mark.db_test
See https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#best-practices-for-db-tests on how to deal with it and consult examples.

Why they are not failing on 3.10+, I am not sure. Maybe your tests follow a different path on those Python versions without trying to access the DB?

Contributor Author

dabla commented Apr 12, 2024

> Those are new tests you added that are now failing. It's a different error you get:
>
> You also need to mark the test as db_test per the instructions:
>
> FAILED tests/providers/microsoft/azure/operators/test_msgraph.py::TestMSGraphAsyncOperator::test_execute - airflow.exceptions.AirflowInternalRuntimeError: Your test accessed the DB but _AIRFLOW_SKIP_DB_TESTS is set.
> Either make sure your test does not use database or mark the test with @pytest.mark.db_test
> See https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#best-practices-for-db-tests on how to deal with it and consult examples.
>
> Why they are not failing on 3.10+, I am not sure. Maybe your tests follow a different path on those Python versions without trying to access the DB?

Hmm, those tests are not new; the only new one is TestMSGraphSensor, and that one apparently doesn't fail. Weird...

Member

potiuk commented Apr 12, 2024

Some errors in main are fixed already; you will need to rebase.

Member

potiuk commented Apr 14, 2024

The docker example error has been worked around in main. Merging.

@potiuk potiuk merged commit 1c9a660 into apache:main Apr 14, 2024
91 of 92 checks passed
Contributor Author

dabla commented Apr 15, 2024

Thank you @potiuk :)

utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
* refactor: Initial commit contains the new MSGraphOperator

* refactor: Extracted common method into Base class for patching airflow connection and request adapter + make multiple patches into one context manager Python 3.8 compatible

* refactor: Refactored some typing issues related to msgraph

* refactor: Added some docstrings and fixed additional typing issues

* refactor: Fixed more static checks

* refactor: Added license on top of test serializer and fixed import

* Revert "refactor: Added license on top of test serializer and fixed import"

This reverts commit 04d6b85.

* refactor: Added license on top of serializer files and fixed additional static checks

* refactor: Added new line at end of json test files

* refactor: Try fixing docstrings on operator and serializer

* refactor: Replaced NoneType with None

* refactor: Made type unions Python 3.8 compatible

* refactor: Reformatted some files to comply with static checks formatting

* refactor: Reformatted base to comply with static checks formatting

* refactor: Added msgraph-core dependency to provider.yaml

* refactor: Added msgraph integration info to provider.yaml

* refactor: Added init in resources

* fix: Fixed typing of response_handler

* refactor: Added assertions on conn_id, tenant_id, client_id and client_secret

* refactor: Fixed some static checks

* Revert "refactor: Added assertions on conn_id, tenant_id, client_id and client_secret"

This reverts commit 88aa7dc.

* refactor: Changed imports in hook as we don't use mockito anymore we don't need the module before constructor

* refactor: Renamed test methods

* refactor: Replace List type with list

* refactor: Moved docstring as one line

* refactor: Fixed typing for tests and added test for response_handler

* refactor: Refactored tests

* fix: Fixed MS Graph logo filename

* refactor: Fixed additional static checks remarks

* refactor: Added white line in type checking block

* refactor: Added msgraph-core dependency to provider_dependencies.json

* refactor: Updated docstring on response handler

* refactor: Moved ResponseHandler and Serializer to triggers module

* docs: Added documentation on how to use the MSGraphAsyncOperator

* docs: Fixed END tag in examples

* refactor: Removed docstring from CallableResponseHandler

* refactor: Ignore UP031 Use format specifiers instead of percent format as this is not possible here the way the DAG is evaluated in Airflow (due to XCom's)

* Revert "refactor: Removed docstring from CallableResponseHandler"

This reverts commit 6a14ebe.

* refactor: Simplified docstring on CallableResponseHandler

* refactor: Updated provider.yaml to add reference of msgraph to how-to-guide

* refactor: Updated docstrings on operator and trigger

* refactor: Fixed additional static checks

* refactor: Ignore UP031 Use format specifiers instead of percent format as this is not possible here the way the DAG is evaluated in Airflow (due to XCom's)

* refactor: Added param to docstring ResponseHandler

* refactor: Updated pyproject.toml as main

* refactor: Reformatted docstrings in trigger

* refactor: Removed unused serialization module

* fix: Fixed execution of consecutive tasks in execute_operator method

* refactor: Added customizable pagination_function parameter to Operator and made operator PowerBI compatible

* refactor: Reformatted operator and trigger

* refactor: Added check if query_parameters is not None

* refactor: Removed typing of top and odata_count

* refactor: Ignore type for tenant_id (this is an issue in the ClientSecretCredential class)

* refactor: Changed docstring on MSGraphTrigger

* refactor: Changed docstring on MSGraphTrigger

* refactor: Added docstring to handle_response_async method

* refactor: Fixed docstring to imperative for handle_response_async method

* refactor: Try quoting Sharepoint so it doesn't get spell checked

* refactor: Try double quoting Sharepoint so it doesn't get spell checked

* refactor: Always get a new event loop and close it after test is done

* refactor: Reordered imports from contextlib

* refactor: Added Sharepoint to spelling_wordlist.txt

* refactor: Removed connection-type for KiotaRequestAdapterHook

* refactor: Refactored encoded_query_parameters

* refactor: Suppress ImportError

* refactor: Added return type to paginate method

* refactor: Updated paging_function type in MSGraphAsyncOperator

* refactor: Pass the method name from method reference instead of hard coded string which is re-factor friendly

* refactor: Changed return type of paginate method

* refactor: Added MSGraphSensor which easily allows us to poll PowerBI statuses

* refactor: Moved BytesIO and Context to type checking block for MSGraphSensor

* refactor: Added noqa check on pull_execute_complete method of MSGraphOperator

* fix: Fixed test_serialize of TestMSGraphTrigger

* refactor: Added docstring to MSGraphSensor and updated the docstring of the MSGraphAsyncOperator

* refactor: Reformatted docstring of MSGraphSensor

* refactor: Added white line at end of status.json file to keep static check happy

* refactor: Removed timeout parameter from constructor MSGraphSensor as it is already defined in the BaseSensorOperator

* fix: Added missing return for async_poke in MSGraphSensor

* Revert "refactor: Added noqa check on pull_execute_complete method of MSGraphOperator"

This reverts commit ca6f92c.

* refactor: Reorganised imports on MSGraphSensor

* refactor: Reformatted TestMSGraphSensor

* refactor: Added MSGraph sensor integration name in provider.yaml

* refactor: Updated apache-airflow version to at least 2.7.0 in provider.yaml of microsoft-azure provider

* refactor: Exclude microsoft-azure from compatibility check with airflow 2.6.0 as version 2.7.0 will at least be required

* refactor: Also updated the apache-airflow dependency version from 2.6.0 to 2.7.0 for microsoft-azure provider in provider_dependencies.json

* refactor: Reformatted global_constants.py

* refactor: Add logging statements for proxies and authority related stuff

* fix: Fixed exclusion of microsoft.azure dependency in global_constants.py

* refactor: Some Azure related imports should be ignored when running Airflow 2.6.0 or lower

* refactor: Import of ADLSListOperator should be ignored when running Airflow 2.6.0 or lower

* refactor: Moved optional provider imports that should be ignored when running Airflow 2.6.0 or lower at top of file

* refactor: Fixed the event loop closed issue when executing long running tests on the MSGraphOperator

* refactor: Extracted reusable mock_context method

* refactor: Moved import of Session into type checking block

* refactor: Updated the TestMSGraphSensor

* refactor: Reformatted the mock_context method

* refactor: Try implementing cached connections on MSGraphTrigger

* docs: Added example for the MSGraphSensor and additional examples on how you can use the operator for PowerBI

* Revert "refactor: Try implementing cached connections on MSGraphTrigger"

This reverts commit 693975e.

* fix: Fixed serialization of event payload as xcom_value for the MSGraphSensor

* refactor: TestMSGraphAsyncOperator should be allowed to run as a db test

* Revert "refactor: TestMSGraphAsyncOperator should be allowed to run as a db test"

This reverts commit c7a06db.

* refactor: TestMSGraphAsyncOperator should be allowed to run as a db test

* refactor: Also added result_processor to MSGraphSensor

* refactor: Fixed template_fields in operator, trigger and sensor

---------

Co-authored-by: David Blain <david.blain@infrabel.be>
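
Two of the test-related commits above ("make multiple patches into one context manager Python 3.8 compatible" and "Always get a new event loop and close it after test is done") describe patterns that can be sketched with the standard library alone. The example below is only an illustration of those ideas, not the code from this PR: `FakeHook`, `patch_hook` and `run_in_fresh_loop` are hypothetical names, and `contextlib.ExitStack` is one possible way to combine patches without the parenthesized multi-item `with` syntax that only became official in Python 3.10.

```python
import asyncio
from contextlib import ExitStack, contextmanager
from unittest import mock


class FakeHook:
    """Hypothetical stand-in for a hook; not part of the provider."""

    def get_connection(self):
        return "real-connection"

    async def send(self):
        return "real-response"


@contextmanager
def patch_hook(connection, response):
    """Apply several patches through a single context manager."""
    with ExitStack() as stack:
        # Each patch is registered on the stack and undone when the stack exits.
        stack.enter_context(
            mock.patch.object(FakeHook, "get_connection", return_value=connection)
        )
        stack.enter_context(
            mock.patch.object(FakeHook, "send", new=mock.AsyncMock(return_value=response))
        )
        yield


def run_in_fresh_loop(coro):
    """Run a coroutine on a new event loop and always close the loop afterwards."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


with patch_hook("mock-connection", "mock-response"):
    hook = FakeHook()
    patched = (hook.get_connection(), run_in_fresh_loop(hook.send()))

# Outside the context manager the original methods are restored.
restored = FakeHook().get_connection()
```

Creating and closing a loop per call keeps one test from leaving a closed or still-running loop behind for the next test, which is the kind of "event loop closed" problem the commit messages mention.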
@utkarsharma2 utkarsharma2 added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jun 3, 2024
Labels: area:providers; changelog:skip (changes that should be skipped from the changelog: CI, tests, etc.); provider:microsoft-azure (Azure-related issues)

3 participants