🎉 🎉 Source FB Marketing: performance and reliability fixes #9805

Merged
merged 73 commits into from
Feb 17, 2022

@keu keu commented Jan 26, 2022

What

This PR is based on #8385.

It focuses on several improvements.
Stability:

  • Use campaign-level insights when account-level insights start to fail.
  • Retry HTTP requests at a much lower level, which avoids unhandled failures during batch requests and pagination.
  • Respect throttling and call rate limits.
  • Save the cursor for each slice individually, so the sync makes progress even if one job fails; we will eventually sync that job.
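
As an illustration of the low-level retry idea above, here is a minimal sketch and not the connector's actual code. `TransientApiError` and `call_with_retries` are hypothetical names, and the exponential backoff schedule is an assumption; the PR description does not say which schedule is used.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientApiError(Exception):
    """Hypothetical stand-in for a retryable failure (throttling, 5xx, ...)."""


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `request`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientApiError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Assumed schedule: 1x, 2x, 4x, ... of base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Wrapping every single request this way (rather than a whole batch or a whole page iterator) is what keeps one throttled call from failing an entire pagination loop.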

Performance:

  • Check the status of running jobs with a single batch request.
  • Deprecate the lookback_window param because it was used in the wrong way before and is not really useful: FB updates insights up to 28 days after the fact. Make it constant and re-read the last 28 days (counted from the now() datetime, not from the current cursor).
  • Track individual slices in the state, so we process jobs ASAP and no longer wait while stuck on a single job. This is probably the biggest improvement.
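
A rough sketch of how the constant 28-day window and per-slice state could combine when choosing which days to sync. The function name `slices_to_sync`, the daily slice granularity, and the representation of state as a set of completed dates are all assumptions made for illustration; whether the window boundary is inclusive is also a guess.

```python
from datetime import date, timedelta
from typing import Iterator, Set

# Constant taken from the description above: insights can change for 28 days.
INSIGHTS_LOOKBACK = timedelta(days=28)


def slices_to_sync(start: date, today: date, completed: Set[date]) -> Iterator[date]:
    """Yield the daily slices that still need a job.

    A day is skipped only if it is already completed AND older than the
    refresh window. Days inside the window are always re-read, because
    the window is anchored at `today`, not at the saved cursor.
    """
    refresh_from = today - INSIGHTS_LOOKBACK
    day = start
    while day <= today:
        if day >= refresh_from or day not in completed:
            yield day
        day += timedelta(days=1)
```

With this shape, a slice that failed in an earlier run (and is therefore missing from `completed`) is picked up again without blocking the slices around it.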

Other fixes:

Screenshot 2022-01-27 at 01 39 16

How

AsyncJob is now responsible for syncing a fixed time interval. I introduced a new class, ParentAsyncJob, to group smaller AsyncJobs. When an AsyncJob fails for the first time, it is restarted; if it fails again, InsightAsyncJobManager splits it into a group of smaller AsyncJobs, one for each campaign active in that interval. We use a 28-day window to fetch the IDs of these campaigns. Failures are quite rare, so this degrades performance only slightly while improving reliability. The job could potentially be split further, using AdSets or even Ads to fetch data in smaller batches.
This is especially important for date intervals bigger than 1 day (there are already feature requests for this).
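
The restart-then-split policy described above can be sketched as follows. This is a simplified, synchronous illustration and not the classes from this PR: `Job`, `run_with_split`, `execute` and `active_campaigns` are hypothetical names, and the real jobs are asynchronous and polled for status.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    """One insights job for a fixed interval; campaign_id=None means account level."""
    interval: str
    campaign_id: Optional[str] = None
    attempts: int = 0


def run_with_split(
    job: Job,
    execute: Callable[[Job], bool],
    active_campaigns: Callable[[str], List[str]],
) -> List[Job]:
    """Return the jobs that finished successfully for `job.interval`.

    Policy: first failure restarts the same job; a second failure splits
    an account-level job into one job per active campaign.
    """
    for _ in range(2):  # initial run plus one restart
        job.attempts += 1
        if execute(job):
            return [job]
    if job.campaign_id is not None:
        # This sketch stops at campaign level; splitting further
        # (AdSets, Ads) is only mentioned as a possibility.
        raise RuntimeError(f"campaign job failed twice: {job}")
    finished: List[Job] = []
    for campaign_id in active_campaigns(job.interval):
        child = Job(job.interval, campaign_id)
        finished.extend(run_with_split(child, execute, active_campaigns))
    return finished
```

The extra cost is paid only on the failure path: a job that succeeds on the first or second attempt never triggers the campaign lookup.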

Recommended reading order

  1. streams/*
  2. source.py
  3. spec.py
  4. tests

🚨 User Impact 🚨

We deprecate the day_per_job parameter, as it is no longer needed after the introduction of the split algorithm.
We deprecate the lookback_window parameter because it was used in the wrong way before, and FB has a constant value for it.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to GitHub CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to GitHub CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

keu commented Feb 12, 2022

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1832417116
❌ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1832417116
🐛 https://gradle.com/s/iypgu5ude5lqa
Python short test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestSpec::test_match_expected[inputs0] - AssertionError:...
FAILED test_incremental.py::TestIncremental::test_two_sequential_reads[inputs0]
=================== 2 failed, 20 passed in 521.00s (0:08:40) ===================

keu commented Feb 13, 2022

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1838093837
❌ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1838093837
🐛 https://gradle.com/s/utg5ortl7m64w
Python short test summary info:

=========================== short test summary info ============================
FAILED test_incremental.py::TestIncremental::test_two_sequential_reads[inputs0]
=================== 1 failed, 21 passed in 434.52s (0:07:14) ===================

keu commented Feb 13, 2022

Incremental behaviour of Images is broken until #9746.

"""Connection check to validate that the user-provided config can be used to connect to the underlying API

:param config: the user-input config object conforming to the connector's spec.json
:param logger: logger object
:param _logger: logger object

Seems like we don't use it anywhere. Shouldn't we also update line 46?

keu commented Feb 16, 2022

/publish connector=source-facebook-marketing

❌ source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1855080691

keu commented Feb 17, 2022

/publish connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1856646105
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1856646105

…rmance-8282

# Conflicts:
#	airbyte-config/init/src/main/resources/config/STANDARD_SOURCE_DEFINITION/e7778cfc-e97c-4458-9ecb-b4f2bba8946c.json
@keu keu merged commit a3aae80 into master Feb 17, 2022
@keu keu deleted the keu/source-fb-performance-8282 branch February 17, 2022 04:35
@vladimir-remar

@keu the ability to set up an attribution window was really helpful for us. We understand why you made it a constant, but our use case is different. We only use data that never changes regardless of the attribution window (e.g., spend) and none of the data that could change (e.g., conversions). For this reason, an incremental sync with a window of 3 days was more than enough for us. With the new approach, we are forced to retrieve a lot of data we don't need, and with the constraints of the FB API this connection takes 5 hours to run every day, whereas with a 3-day attribution window it would take only 20-30 minutes. Would you be open to adding a variable incremental_lookback_window with a default of 28 days? It would help our use case. I can do the PR.

@sherifnada

@vladimir-remar could you propose a draft PR for what the spec would look like?

@vladimir-remar

> @vladimir-remar could you propose a draft PR for what the spec would look like?

@sherifnada Thanks for the answer, I will do it ASAP.

Labels
area/connectors: Connector related issues
area/documentation: Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Source FB Marketing: improve insights jobs reliability & runtime
8 participants