Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Providers facebook hook multiple account #19377

Merged

Conversation

bugraoz93
Copy link
Collaborator

closes: #19269
Following changes have made in this PR.

  • Multiple Facebook Ads account_id for Facebook Ads Hook support added.
  • To provide a better use of multiple account_id, FacebookAdsReportToGcsOperator has now supported file export for each account_id. (upload_as_account option presented to control this feature. The default value is False which mean all the account_id data will be exported to the same file)

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

bozturk added 3 commits November 2, 2021 02:50
Action Based Export Implemented
* Combined Export for all Account IDs
* An Export per Account ID
Lint Checks
Comments Added
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Nov 3, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Nov 3, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@potiuk potiuk closed this Nov 3, 2021
@potiuk potiuk reopened this Nov 3, 2021
@potiuk
Copy link
Member

potiuk commented Nov 3, 2021

There were some static code cancelled so I reopened it, but it looks pretty good

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Nov 3, 2021
@github-actions
Copy link

github-actions bot commented Nov 3, 2021

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@bugraoz93
Copy link
Collaborator Author

Hey @potiuk, thank you very much for your review and your comment. It is strange that static checks give errors for the flake and the black. I have run for all my changes before opening the PR. I have checked again the static code problem in the operator and rerun it. It should be fine now.

@@ -119,25 +120,30 @@ def bulk_facebook_report(
:type sleep_time: int

:return: Facebook Ads API response, converted to Facebook Ads Row objects
:rtype: List[AdsInsights]
:rtype: Dict[str, List[AdsInsights]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a breaking change. What can we do to prevent this from happening?

Copy link
Collaborator Author

@bugraoz93 bugraoz93 Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mik-laj, I did this due to we are getting the account_id from the connection extras. We can get the connection from a hook and cannot in operator as I understand from my tracing the code. On the other hand, I think that if we are supporting multiple account_id at once, we need to separate those files while we are exporting to provide distinct files (optional parameter -> upload_as_account). To do this, I need to pass the account_id or correlate it with the list of Ad Insight data.
I read from Facebook documentation, the Ad Insight data hold the account_id information, we can get from there but if we get account_id from the data most probably the data transformation part in the operator will be more complex. This would cause consuming more resources than now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not necessarily - you could easily make it backwards-compatible. You could return List in case just one account is passed, but if you send more than one, you would return a Dict. That sounds rather easy to implement and rather logical.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @potiuk, I have not thought that way. For sure, I could easily support that with a simple implementation. Since this hook is only used by FacebookAdsReportToGcsOperator, I thought that while I am changing the return type of the hook only affects this operator. For that reason, since I have changed that operator, I thought the return type could be a single type and not affect anywhere else. I will make that change to provide backward compatibility as soon as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are VERY strict about backwards compatibility. And Hooks are equally part of the "public" API of providers as operators. Airflow is a very specific project in the sense that users are supposed to write code and use our Hooks and Operators in the way they think is the best. Due to their nature operators are very limited in what they can do but Hooks provide a much better interface for the users especially if they want to build some kind of transfer between two completely different services. I'd even say that in "modern" Airflow, Hoooks are WAY more important than Operators.

Especially with the TaskFlow Api, where you can do something like that:

@task 
def trasnfer_from_fb_to_whatever():
   fbHook = FBHook()
   local_data = fbhook.read_data()
   transform_the_data( local_data)
   whatever_hook = WhateverHook()
   whatever_hook.send_data(local_data)

This is fully-fledged Python task that uses Hooks and no operators and is the way modern tasks should be written in Airflow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Your comment about returning two different data structures made me realize some of these outcomes and these use cases. I was thinking of a platform that uses operators to run a certain task, in the end, it is not just a tool (It is also a set of useful decoupled tools and libraries). Also, we are using some of those hooks in our stack. From now on, I will be more careful about backward compatibility even if that means a method return two different data structure.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed - it needs backwards compatibiltty - so i am changing it to "request changes" so that it does not get merged accidentally.

bozturk added 2 commits November 9, 2021 03:49
Adjustment on facebook_ads_to_gcs.py regarding backward compatibility
@bugraoz93
Copy link
Collaborator Author

bugraoz93 commented Nov 9, 2021

As discussed - it needs backwards compatibiltty - so i am changing it to "request changes" so that it does not get merged accidentally.

Hi @potiuk, I have made the changes regarding backward compatibility. I have made little changes in the hook and operator.
Hook:
I added a return type of Union which returns as follows.

  • If account_id is string (old format), returns List[AdsInsights]
  • If account_id is list (new format), returns Dict[str, List[AdsInsights]]
    Operator:
    I add a check for the return type of hook.
  • if List[AdsInsights] returned, the operator exports still in the same format and upload_as_account parameter useless.
  • if Dict[str, List[AdsInsights]] returned, user could choose export format with upload_as_account parameter.

@bugraoz93 bugraoz93 requested a review from potiuk November 9, 2021 02:58
@@ -98,14 +99,17 @@ def facebook_ads_config(self) -> Dict:
if missing_keys:
message = f"{missing_keys} fields are missing"
raise AirflowException(message)
# Solve a way to check in bulk_facebook_report to handle list and dict return there
if type(config["account_id"]) is not list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if type(config["account_id"]) is not list:
if isinstance(config["account_id"], list):

This is_backward logic is nebulous to me though. Is this “backward if account_id is specified” behaviour specified in Facebook’s API documentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, the account_id was given as a string in the Facebook Ads Connection in the Airflow, with this check it is controlling rather the connection is the old version or is a new version which is an array of account_id. Also, I have updated this part regarding your suggestion.

airflow/providers/facebook/ads/hooks/ads.py Outdated Show resolved Hide resolved
Interfering hook object from a cached_property removed
airflow/providers/facebook/ads/hooks/ads.py Show resolved Hide resolved
Comment on lines 168 to 169
message = "Facebook Ads Hook returned different type than expected"
raise AirflowException(message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea to log what exactly was returned to help debugging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the message and added the type of the returned object from the hook.

Pulls data from the Facebook Ads API regarding Account ID with matching return type

.. seealso::
If the type of Account ID is a List account_id -> list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This -> syntax is a bit difficult to understand to me. It’s probably better to write this is prose, something like

The return type and value depends on the ``account_id`` configuration. If the
configuration is a str representing a single Account ID, the return value is the
list of reports for that ID. If the configuration is a list of str representing
multiple Account IDs, the return value is a dict of Account IDs and their
respective list of reports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this part does not belong in seealso, which is generally used to reference external content.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am grateful to you for your time to review too many changes and your help. I have changed the comment same as your comment since you have already described it very well.

@potiuk potiuk merged commit ed8b63b into apache:main Dec 8, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Dec 8, 2021

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for multiple AdAccount when getting Bulk Facebook Report from Facebook Ads Hook
4 participants