🎉 Source Facebook: migrate to CDK #3743

keu · 2021-05-30T02:19:35Z

What

This is quite an old PR. Because effectively fixing all the issues and testing the fixes was difficult, I decided to migrate to the CDK and SAT at the same time as fixing the issues above (#3525).
The PR contains:

Improve error handling
Improve async job performance (insights)
Add new configuration parameter insights_days_per_job
Rename stream adsets to ad_sets
Refactor schema logic for insights, so that any possible insight stream can be configured
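
As a rough illustration of what insights_days_per_job could control, the sketch below splits a sync window into inclusive date ranges, one per async job. The function name `date_ranges`, its signature, and the inclusive-range convention are assumptions made for illustration; the connector's actual `_date_ranges` logic may differ.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_ranges(start: date, end: date, days_per_job: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (since, until) pairs covering start..end.

    Hypothetical helper: each pair spans at most `days_per_job` days and
    would become the time range of one async insights job.
    """
    if days_per_job < 1:
        raise ValueError("days_per_job must be at least 1")
    one_day = timedelta(days=1)
    since = start
    while since <= end:
        # Clamp the last range so it never runs past the requested end date.
        until = min(since + timedelta(days=days_per_job) - one_day, end)
        yield since, until
        since = until + one_day
```

A smaller value means more, shorter jobs; a larger value means fewer jobs that each cover more days.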

How

Describe the solution

Pre-merge Checklist

Run integration tests
Publish Docker images

Recommended reading order

test.java
component.ts
the rest

keu · 2021-06-19T23:35:12Z

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953366514
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953366514

keu · 2021-06-20T00:05:27Z

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953407890
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953407890

keu · 2021-06-21T18:58:30Z

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958167703
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958167703

keu · 2021-06-21T19:07:40Z

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958196504
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958196504

keu · 2021-06-21T20:59:42Z

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958477701
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958477701

sherifnada

Some questions, but mostly looks good.

Taking a step back, this feels like a good opportunity to explore an async read pattern in the CDK, e.g. having an AsyncStream class where read_records is async. Definitely out of scope here, but this might be a candidate for implementation when we get to it.
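
To make the idea concrete, here is a minimal sketch of what such a pattern might look like. Everything in it is hypothetical: `AsyncStream`, `FakeInsights`, and `read_all` are illustrative names, not existing CDK APIs, and the sleep stands in for waiting on a remote job.

```python
import abc
import asyncio
from typing import Any, AsyncIterator, Dict, List


class AsyncStream(abc.ABC):
    """Hypothetical base class, purely illustrative; not an existing CDK API."""

    @abc.abstractmethod
    def read_records(self) -> AsyncIterator[Dict[str, Any]]:
        """Yield records asynchronously."""


class FakeInsights(AsyncStream):
    """Stand-in stream that pretends to wait on a remote job before yielding."""

    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay

    async def read_records(self) -> AsyncIterator[Dict[str, Any]]:
        await asyncio.sleep(self.delay)  # simulates waiting for the job result
        yield {"stream": self.name}


async def read_all(streams: List[AsyncStream]) -> List[Dict[str, Any]]:
    """Drain all streams concurrently; results keep the order of `streams`."""

    async def drain(stream: AsyncStream) -> List[Dict[str, Any]]:
        return [record async for record in stream.read_records()]

    batches = await asyncio.gather(*(drain(s) for s in streams))
    return [record for batch in batches for record in batch]
```

With this shape, `asyncio.run(read_all([...]))` waits on all streams at once rather than one after another.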

sherifnada · 2021-06-22T05:36:46Z

airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/source.py

+        config = ConnectorConfig.parse_obj(config)  # FIXME: this will be not need after we fix CDK
+        api = API(account_id=config.account_id, access_token=config.access_token)
+
+        try:


is the whole code block supposed to be inside try?

sherifnada · 2021-06-22T05:49:21Z

airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/streams.py

+            }
+
+    @backoff_policy
+    def _get_insights(self, params) -> AdReportRun:


can we rename it to _create_insights_job or something that indicates it is creating a job?

sherifnada · 2021-06-22T05:50:29Z

airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/streams.py

+        date_ranges = list(self._date_ranges(stream_state=stream_state))
+
+        # accumulate MAX_ASYNC_JOBS jobs in the buffer to schedule them all before trying to wait
+        for params in date_ranges[: self.MAX_ASYNC_JOBS]:


Shouldn't we process MAX_ASYNC_JOBS at a time? The current implementation seems to process MAX_ASYNC_JOBS the first time, then creates as many jobs as are left, which may be more than the max.

Because we yield each job, we don't advance to the next job before we read the result of the previous one, so each yield is a wait for a job.

So this launches MAX_ASYNC_JOBS in parallel, waits for all of them to complete, then runs jobs one by one after that?
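
One way to keep at most MAX_ASYNC_JOBS in flight for the whole sync is a sliding window over the job parameters. The sketch below is an illustration of that idea, not the code in this PR: `schedule_with_window` and `start_job` are hypothetical names, and it relies on the consumer waiting on each yielded job before asking for the next one.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, TypeVar

P = TypeVar("P")  # parameters for one job, e.g. a date range
J = TypeVar("J")  # handle of a started job


def schedule_with_window(
    param_sets: Iterable[P], start_job: Callable[[P], J], max_jobs: int
) -> Iterator[J]:
    """Start jobs lazily so that no more than `max_jobs` are pending at once.

    A new job is started only after the consumer has taken (and, by
    assumption, finished reading) the oldest pending one.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    pending: Deque[J] = deque()
    for params in param_sets:
        pending.append(start_job(params))
        if len(pending) >= max_jobs:
            # Window is full: hand out the oldest job before starting another.
            yield pending.popleft()
    # All jobs have been started; hand out whatever is still pending.
    while pending:
        yield pending.popleft()
```

Because the function is a generator, the next job is not started until the caller resumes it, which is what bounds the number of concurrent jobs.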

sherifnada · 2021-06-22T05:53:14Z

airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/streams.py

+            if pendulum.parse(obj[self.cursor_field]) >= min_cursor:
+                yield obj.export_all_data()
+
+    def stream_slices(self, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Optional[Mapping[str, Any]]]:


Is the retry functionality working correctly? We never retry a job if it fails. Is that the desired behavior?

Usually there is a very good reason for a job to fail, although the response doesn't contain any useful information about that reason. In my experience, a failed job appears only when something is wrong with the query itself (unsupported fields or breakdowns), so here we retry only the check of the job status. We save state for every async job, so if the next job fails we have a checkpoint.

https://developers.facebook.com/docs/marketing-api/insights/best-practices/
from the docs

Job Failed | Job has failed. Review your query and try again.

Facebook is leading by counter-example on how to give a good error message
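
The behavior discussed in this thread (retry the status check, but do not retry a job that reports failure) could be sketched roughly as follows. The names `wait_for_job`, `JobFailed`, and `get_status` are hypothetical, the "Job Completed" status string and the use of `ConnectionError` for transient errors are assumptions, and the connector itself uses a backoff decorator rather than this hand-written loop.

```python
import time
from typing import Callable


class JobFailed(Exception):
    """Raised when the API reports that the async job itself has failed."""


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 1.0,
    max_status_errors: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll a job until it completes.

    Transient errors while *checking* the status are retried; a failed job
    is not, since it usually points to a problem with the query itself.
    """
    errors = 0
    while True:
        try:
            status = get_status()
        except ConnectionError:
            errors += 1
            if errors > max_status_errors:
                raise  # too many consecutive failed status checks
            sleep(poll_interval)
            continue
        errors = 0  # a successful check resets the error counter
        if status == "Job Completed":  # assumed status string
            return
        if status == "Job Failed":
            raise JobFailed("Job has failed. Review your query and try again.")
        sleep(poll_interval)
```

Since state is saved after each completed job, a `JobFailed` raised here would leave the previous job's checkpoint intact.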

keu · 2021-06-22T07:52:08Z

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/959843489
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/959843489

sherifnada

LGTM once build errors are fixed

…ate-v2

keu · 2021-06-22T22:34:14Z

/publish connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/962341974
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/962341974

zestyping and others added 13 commits May 27, 2021 00:20

Add all the missing fields except AdAsset and AdInsightsResult fields.

7ffa945

Sort and format ad_insights.json with json.dumps(..., indent=2, sort_keys=True).

c202e21

Remove fields from ads_insights.json that have not been externally tested.

5c61eaf

Facebook source: Remove the offending "unique_conversions" and "conversions" fields and add other available fields that have been tested and confirmed to work.

69b1fb9

Update all the other ads_insights schemas for the various breakdowns.

4d19cb2

format

b9cea63

bump version, update changelog

590f52c

move to CDK implementation

c98a6b0

proper spec dump

26e49f5

fix read_records and api calls

2bec98f

fix call rate handling

f9011da

format

8d4712c

fix insight breakdowns

cdcd72d

auto-assign bot requested review from cgardens and davinchia May 30, 2021 02:19

keu mentioned this pull request May 30, 2021

Add the new Facebook API v10.0 fields to the ads_insights schema. #3646 #3693

Merged

2 tasks

keu linked an issue May 31, 2021 that may be closed by this pull request

Source Facebook: rate limit not always handled #3525

Closed

keu removed request for cgardens and davinchia June 1, 2021 18:14

eugene-kulak added 11 commits June 8, 2021 09:34

temp

5f0f3e6

tmp

b270f29

Merge remote-tracking branch 'origin/master' into keu/facebook-call-rate-v2

b1a610b

Merge remote-tracking branch 'origin/master' into keu/facebook-call-rate-v2

b4b34d3

Merge remote-tracking branch 'origin/master' into keu/facebook-call-rate-v2

cca7def

tmp

5f8edd3

make insights streams configurable

c384190

improve Async implementation using slices

ce11d1a

fix tests

67b3568

fix tests

6467f12

fix tests and reading

f9d7729

update docs and move changelog

a3cb28f

github-actions bot added the area/documentation Improvements or additions to documentation label Jun 19, 2021

fix date_range

77c003f

fix state handling, filter out records that were not updated (insights)

b4f2c8f

typos and SAT install

9439e6d

keu force-pushed the keu/facebook-call-rate-v2 branch from 11912b0 to 9439e6d Compare June 22, 2021 00:36

sherifnada suggested changes Jun 22, 2021

keu requested a review from sherifnada June 22, 2021 07:18

apply suggestions from @sherifnada

403afd2

sherifnada approved these changes Jun 22, 2021

eugene-kulak added 6 commits June 22, 2021 22:15

improve async jobs handling even more

9f3d535

Merge remote-tracking branch 'origin/master' into keu/facebook-call-rate-v2

ba9c575

fix dependencies

025daea

fix filtration logic and improve logging

31a3645

fix transient error with creatives stream

9774108

bump version

499e849

keu merged commit 2f7c15a into master Jun 22, 2021

keu deleted the keu/facebook-call-rate-v2 branch June 22, 2021 22:48

keu changed the title ~~Source Facebook: migrate to CDK~~ :tada Source Facebook: migrate to CDK Jun 22, 2021

keu changed the title ~~:tada Source Facebook: migrate to CDK~~ 🎉 Source Facebook: migrate to CDK Jun 22, 2021

mlavoie-sm360 mentioned this pull request Nov 8, 2021

[Source FB Marketing] Concurrent synchronization #1671

Closed

karinakuz added connectors/sources-api connectors/source/facebook-marketing labels Jan 5, 2022
