
🎉 Source Facebook: migrate to CDK #3743

Merged: @keu merged 42 commits into master from keu/facebook-call-rate-v2 on Jun 22, 2021


@keu (Contributor) commented May 30, 2021

What

closes #3525

This is a fairly old PR. Because fixing all the issues effectively and testing the fixes was difficult, I decided to migrate the connector to the CDK and SAT (Source Acceptance Tests) while fixing the issues above (#3525).
The PR contains:

  • Improve error handling
  • Improve async job performance (insights)
  • Add new configuration parameter insights_days_per_job
  • Rename stream adsets to ad_sets
  • Refactor schema logic for insights, allowing any possible insights stream to be configured
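The new insights_days_per_job parameter suggests that the insights sync period is split into fixed-size date windows, with one async job per window. The PR text does not show how the split is done, so the following is only a minimal sketch under stated assumptions: date_ranges is a hypothetical standalone helper (not the connector's own _date_ranges), and each window is assumed to be inclusive on both ends.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_ranges(start: date, end: date, days_per_job: int) -> Iterator[Tuple[date, date]]:
    """Hypothetical helper: split the inclusive range [start, end] into
    consecutive windows of at most days_per_job days (one async job each)."""
    if days_per_job < 1:
        raise ValueError("days_per_job must be >= 1")
    current = start
    while current <= end:
        # Last day of this window, clipped so we never run past `end`.
        window_end = min(current + timedelta(days=days_per_job - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)
```

For example, with days_per_job=3 the period June 1 to June 7, 2021 becomes three windows: June 1 to 3, June 4 to 6, and June 7 alone. Larger windows mean fewer jobs but heavier individual reports, which is presumably why the value was made configurable.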


Pre-merge Checklist

  • Run integration tests
  • Publish Docker images


@github-actions bot added the area/documentation label (Improvements or additions to documentation) Jun 19, 2021
@keu (Contributor Author) commented Jun 19, 2021

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953366514
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953366514

@keu (Contributor Author) commented Jun 20, 2021

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953407890
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/953407890

@keu (Contributor Author) commented Jun 21, 2021

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958167703
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958167703

@keu (Contributor Author) commented Jun 21, 2021

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958196504
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958196504

@keu (Contributor Author) commented Jun 21, 2021

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958477701
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/958477701

@keu force-pushed the keu/facebook-call-rate-v2 branch from 11912b0 to 9439e6d June 22, 2021 00:36

@sherifnada (Contributor) left a comment

Some questions, but mostly looks good.

Taking a step back, this feels like a good opportunity to explore an async read pattern in the CDK, e.g. an AsyncStream class where read_records is async. Definitely out of scope here, but this might be a candidate for implementation when we get to it.
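To make the suggestion concrete, here is a hypothetical sketch of what such an AsyncStream could look like. AsyncStream, read, max_concurrency and FakeInsightsStream are invented for illustration and are not part of the Airbyte CDK; the semaphore plays the role of a limit such as MAX_ASYNC_JOBS.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator, Iterable, List, Mapping

Record = Mapping[str, Any]


class AsyncStream(ABC):
    """Hypothetical stream whose read_records is an async generator (not a CDK class)."""

    @abstractmethod
    def read_records(self, stream_slice: Mapping[str, Any]) -> AsyncIterator[Record]:
        """Implement as `async def read_records(...)` containing `yield`."""

    def read(self, slices: Iterable[Mapping[str, Any]], max_concurrency: int = 10) -> List[Record]:
        """Read all slices concurrently, at most max_concurrency at a time, keeping slice order."""

        async def collect(stream_slice: Mapping[str, Any], semaphore: asyncio.Semaphore) -> List[Record]:
            async with semaphore:  # bounds how many slices are being read at once
                return [record async for record in self.read_records(stream_slice)]

        async def gather_all() -> List[Record]:
            semaphore = asyncio.Semaphore(max_concurrency)
            # gather returns results in the order the coroutines were passed in
            chunks = await asyncio.gather(*(collect(s, semaphore) for s in slices))
            return [record for chunk in chunks for record in chunk]

        return asyncio.run(gather_all())


class FakeInsightsStream(AsyncStream):
    """Toy subclass: each slice 'waits' for a report, then emits one record."""

    async def read_records(self, stream_slice: Mapping[str, Any]) -> AsyncIterator[Record]:
        await asyncio.sleep(0.01)  # stands in for polling an async report job
        yield {"date": stream_slice["date"]}
```

Calling FakeInsightsStream().read([{"date": "2021-06-01"}, {"date": "2021-06-02"}], max_concurrency=2) returns the two records in slice order, while the simulated waits overlap instead of running back to back.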

config = ConnectorConfig.parse_obj(config)  # FIXME: this will not be needed once we fix the CDK
api = API(account_id=config.account_id, access_token=config.access_token)

try:
@sherifnada (Contributor):

Is the whole code block supposed to be inside the try?


@backoff_policy
def _get_insights(self, params) -> AdReportRun:
@sherifnada (Contributor):

Can we rename it to _create_insights_job or something that indicates it creates a job?
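The method above is wrapped in @backoff_policy, whose definition is not shown in this thread. As an illustration of the general technique only (retrying with exponential backoff when the API rate-limits a call), here is a stdlib-only decorator. exponential_backoff and RateLimitError are hypothetical names; the real connector may rely on a library such as backoff and may retry on different conditions.

```python
import functools
import time
from typing import Any, Callable, Tuple, Type, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class RateLimitError(Exception):
    """Hypothetical stand-in for the API's 'too many calls' error."""


def exponential_backoff(
    max_tries: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (RateLimitError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[F], F]:
    """Retry the wrapped call on retry_on errors, sleeping base_delay * 2**n between attempts."""
    if max_tries < 1:
        raise ValueError("max_tries must be >= 1")

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_tries - 1:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay

        return wrapper  # type: ignore[return-value]

    return decorator
```

It would be applied as @exponential_backoff(max_tries=3) above a method such as the suggested _create_insights_job. The sleep parameter is injectable only so the delays can be checked without actually waiting.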

date_ranges = list(self._date_ranges(stream_state=stream_state))

# accumulate MAX_ASYNC_JOBS jobs in the buffer to schedule them all before trying to wait
for params in date_ranges[: self.MAX_ASYNC_JOBS]:
@sherifnada (Contributor):

Shouldn't we process MAX_ASYNC_JOBS at a time? The current implementation seems to process MAX_ASYNC_JOBS jobs the first time and then create as many jobs as are left, which may be greater than the max.

@keu (Contributor Author):

Because we yield each job, we don't advance to the next job before we have read the result of the previous one, so each yield is effectively a wait for one job.

@sherifnada (Contributor):

So this launches MAX_ASYNC_JOBS jobs in parallel, waits for all of them to complete, and then runs the remaining jobs one by one?
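The thread does not show the full loop, so the exact scheduling is ambiguous from the snippet alone. One way to get the behavior keu describes is a generator with a bounded buffer: the first MAX_ASYNC_JOBS jobs are created up front, and afterwards one new job is created each time the consumer finishes with a yielded job, so the window slides instead of dropping to one job at a time. This is a minimal sketch assuming create_job starts a job and the consumer blocks on each yielded job until it completes; bounded_jobs and create_job are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, TypeVar

P = TypeVar("P")  # parameters for one job (e.g. a date range)
J = TypeVar("J")  # handle returned when a job is created


def bounded_jobs(params_list: Iterable[P], create_job: Callable[[P], J], max_async_jobs: int) -> Iterator[J]:
    """Yield jobs in submission order, keeping at most max_async_jobs outstanding
    (created but not yet finished by the consumer)."""
    if max_async_jobs < 1:
        raise ValueError("max_async_jobs must be >= 1")
    buffer: Deque[J] = deque()
    for params in params_list:
        if len(buffer) >= max_async_jobs:
            # Window is full: hand the oldest job to the consumer. Execution only
            # resumes here after the consumer has finished with that job.
            yield buffer.popleft()
        buffer.append(create_job(params))
    # No more parameters: drain whatever is still scheduled.
    while buffer:
        yield buffer.popleft()
```

With 10 date ranges and max_async_jobs=3, the first three jobs are created before anything is yielded; after that, each job the consumer finishes is replaced by exactly one new job, so three stay outstanding until the tail drains.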

if pendulum.parse(obj[self.cursor_field]) >= min_cursor:
yield obj.export_all_data()

def stream_slices(self, stream_state: Mapping[str, Any] = None, **kwargs) -> Iterable[Optional[Mapping[str, Any]]]:
@sherifnada (Contributor):

Is the retry functionality working correctly? We never retry a job if it fails. Is that the desired behavior?

@keu (Contributor Author), Jun 22, 2021:

Usually there is a very good reason for a job to fail. The response doesn't contain any useful information about the reason, but in my experience a job fails only when something is wrong with the query itself (unsupported fields or breakdowns), so here we only retry checking the job's status. We save state for every async job, so if the next job fails we still have a checkpoint.

@keu (Contributor Author):

From the docs (https://developers.facebook.com/docs/marketing-api/insights/best-practices/):

Job Failed | Job has failed. Review your query and try again.
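keu's point is that only the status check is retried, while a job that reports Job Failed is treated as a problem with the query and is not resubmitted. A minimal sketch of that split, assuming a get_status callable that returns the job's status string and may raise ConnectionError on transient network problems. wait_for_job and JobFailedError are hypothetical, and apart from "Job Failed" (quoted above), the "Job Completed" status string is an assumption about the API's values.

```python
import time
from typing import Callable


class JobFailedError(Exception):
    """The async job itself failed; resubmitting the same query is not expected to help."""


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    max_check_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the job completes. Transient errors while checking are retried; job failure is not."""
    consecutive_errors = 0
    while True:
        try:
            status = get_status()
        except ConnectionError:
            consecutive_errors += 1
            if consecutive_errors > max_check_retries:
                raise  # the status check keeps failing: give up
            sleep(poll_interval)
            continue
        consecutive_errors = 0
        if status == "Job Completed":  # assumed status string
            return
        if status == "Job Failed":
            # Typically unsupported fields or breakdowns in the query itself.
            raise JobFailedError("Job has failed. Review your query and try again.")
        sleep(poll_interval)  # still running: check again later
```

The sketch has no overall deadline, so a production version would probably also cap the total wait time per job.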

@sherifnada (Contributor):

Facebook is leading by counter-example on how to give a good error message

@keu requested a review from @sherifnada June 22, 2021 07:18
@keu (Contributor Author) commented Jun 22, 2021

/test connector=source-facebook-marketing

🕑 source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/959843489
❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/959843489

@sherifnada (Contributor) left a comment

LGTM once build errors are fixed

@keu (Contributor Author) commented Jun 22, 2021

/publish connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/962341974
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/962341974

@keu merged commit 2f7c15a into master Jun 22, 2021
@keu deleted the keu/facebook-call-rate-v2 branch June 22, 2021 22:48
@keu changed the title from "Source Facebook: migrate to CDK" to ":tada Source Facebook: migrate to CDK" Jun 22, 2021
@keu changed the title from ":tada Source Facebook: migrate to CDK" to "🎉 Source Facebook: migrate to CDK" Jun 22, 2021
Labels: area/documentation
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Source Facebook: rate limit not always handled
5 participants