-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Providers facebook hook multiple account #19377
Providers facebook hook multiple account #19377
Conversation
Action Based Export Implemented * Combined Export for all Account IDs * An Export per Account ID
Lint Checks Comments Added
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
There were some static code cancelled so I reopened it, but it looks pretty good |
The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease. |
Hey @potiuk, thank you very much for your review and your comment. It is strange that static checks give errors for the flake and the black. I have run for all my changes before opening the PR. I have checked again the static code problem in the operator and rerun it. It should be fine now. |
@@ -119,25 +120,30 @@ def bulk_facebook_report( | |||
:type sleep_time: int | |||
|
|||
:return: Facebook Ads API response, converted to Facebook Ads Row objects | |||
:rtype: List[AdsInsights] | |||
:rtype: Dict[str, List[AdsInsights]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a breaking change. What can we do to prevent this from happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mik-laj, I did this due to we are getting the account_id from the connection extras. We can get the connection from a hook and cannot in operator as I understand from my tracing the code. On the other hand, I think that if we are supporting multiple account_id at once, we need to separate those files while we are exporting to provide distinct files (optional parameter -> upload_as_account). To do this, I need to pass the account_id or correlate it with the list of Ad Insight data.
I read from Facebook documentation, the Ad Insight data hold the account_id information, we can get from there but if we get account_id from the data most probably the data transformation part in the operator will be more complex. This would cause consuming more resources than now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not necessarily - you could easily make it backwards-compatible. You could return List in case just one account is passed, but if you send more than one, you would return a Dict. That sounds rather easy to implement and rather logical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @potiuk, I have not thought that way. For sure, I could easily support that with a simple implementation. Since this hook is only used by FacebookAdsReportToGcsOperator, I thought that while I am changing the return type of the hook only affects this operator. For that reason, since I have changed that operator, I thought the return type could be a single type and not affect anywhere else. I will make that change to provide backward compatibility as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are VERY strict about backwards compatibility. And Hooks are equally part of the "public" API of providers as operators. Airflow is a very specific project in the sense that users are supposed to write code and use our Hooks and Operators in the way they think is the best. Due to their nature operators are very limited in what they can do but Hooks provide a much better interface for the users especially if they want to build some kind of transfer between two completely different services. I'd even say that in "modern" Airflow, Hoooks are WAY more important than Operators.
Especially with the TaskFlow Api, where you can do something like that:
@task
def trasnfer_from_fb_to_whatever():
fbHook = FBHook()
local_data = fbhook.read_data()
transform_the_data( local_data)
whatever_hook = WhateverHook()
whatever_hook.send_data(local_data)
This is fully-fledged Python task that uses Hooks and no operators and is the way modern tasks should be written in Airflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Your comment about returning two different data structures made me realize some of these outcomes and these use cases. I was thinking of a platform that uses operators to run a certain task, in the end, it is not just a tool (It is also a set of useful decoupled tools and libraries). Also, we are using some of those hooks in our stack. From now on, I will be more careful about backward compatibility even if that means a method return two different data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed - it needs backwards compatibiltty - so i am changing it to "request changes" so that it does not get merged accidentally.
Adjustment on facebook_ads_to_gcs.py regarding backward compatibility
Hi @potiuk, I have made the changes regarding backward compatibility. I have made little changes in the hook and operator.
|
@@ -98,14 +99,17 @@ def facebook_ads_config(self) -> Dict: | |||
if missing_keys: | |||
message = f"{missing_keys} fields are missing" | |||
raise AirflowException(message) | |||
# Solve a way to check in bulk_facebook_report to handle list and dict return there | |||
if type(config["account_id"]) is not list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if type(config["account_id"]) is not list: | |
if isinstance(config["account_id"], list): |
This is_backward
logic is nebulous to me though. Is this “backward if account_id
is specified” behaviour specified in Facebook’s API documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, the account_id
was given as a string in the Facebook Ads Connection in the Airflow, with this check it is controlling rather the connection is the old version or is a new version which is an array of account_id
. Also, I have updated this part regarding your suggestion.
Interfering hook object from a cached_property removed
message = "Facebook Ads Hook returned different type than expected" | ||
raise AirflowException(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good idea to log what exactly was returned to help debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the message and added the type of the returned object from the hook.
airflow/providers/google/cloud/transfers/facebook_ads_to_gcs.py
Outdated
Show resolved
Hide resolved
Operator exception messages updated to give more detailed explanation
Pulls data from the Facebook Ads API regarding Account ID with matching return type | ||
|
||
.. seealso:: | ||
If the type of Account ID is a List account_id -> list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ->
syntax is a bit difficult to understand to me. It’s probably better to write this is prose, something like
The return type and value depends on the ``account_id`` configuration. If the
configuration is a str representing a single Account ID, the return value is the
list of reports for that ID. If the configuration is a list of str representing
multiple Account IDs, the return value is a dict of Account IDs and their
respective list of reports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this part does not belong in seealso
, which is generally used to reference external content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am grateful to you for your time to review too many changes and your help. I have changed the comment same as your comment since you have already described it very well.
Awesome work, congrats on your first merged pull request! |
closes: #19269
Following changes have made in this PR.
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.