Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roll out AvailabilityStrategy - slowly for GA connectors, as default for others #20401

Closed
erohmensing opened this issue Dec 12, 2022 · 3 comments · Fixed by #21924
Closed

Roll out AvailabilityStrategy - slowly for GA connectors, as default for others #20401

erohmensing opened this issue Dec 12, 2022 · 3 comments · Fixed by #21924
Assignees

Comments

@erohmensing
Copy link
Contributor

erohmensing commented Dec 12, 2022

After having implemented #19977, in order to better handle 403 errors that cause stream failures to turn into sync failures, we should make sure that GA connectors have the latest CDK installed with the default HttpAvailabilityStrategy available to streams that inherit the HttpStream class. This is an important step that we can do before #17853, which is likely to take more time to implement.

P0: GA connectors that have given us problems in the past: Grab from this document

P1. Other GA connectors

P2. Beta/Alpha connectors

Open questions

  • if/when do we want to enforce that connectors have this dependency updated? Shouldn't be an issue in the future, but for now we want to get the default behavior in. When moving to GA? When moving to beta?
    • Idea - maybe must have the default HttpAvailabilityStrategy for beta, must implement own AvailabilityStrategy for GA?
  • Most of these will have build fails due to low strictness. How do we handle this, especially if potentially bumping the CDK by a lot? (depends on both dependency and when the source was built last - docker run pip list)
@erohmensing erohmensing changed the title Update dependency of connectors with HttpStreams to ~0.13 for HttpAvailabilityStrategy Update CDK dependency of connectors with HttpStreams to ~0.13 for HttpAvailabilityStrategy Dec 12, 2022
@evantahler
Copy link
Contributor

evantahler commented Dec 13, 2022

Grooming:

  • Publish 1 Beta connector to test that everything works
  • Publish all the GA connectors with the new CDK after this work is complete (8 from 2022 from Metabase)
  • leave the rest of the connectors unpublished.
  • document the new features and socialize with GL

@erohmensing
Copy link
Contributor Author

Notes on rollout:

The edge case we weren't handling but now are appeared due to issues in pinterest and github's stream_slices methods for substreams. While I can fix those specifically before enabling the default for those sources, I do think it makes sense to turn things on gradually in case there are more strange edge cases we need to address.

Here's what I'm thinking for rollout:

  • In the initial PR that introduces AvailabilityStrategy, don't set a default AvailabilityStrategy for HttpStreams.
  • In a separate PR, enable the default for HttpStreams and set the default for GA connectors (and Gitlab, since we know it is already not going to work due to per-record permission issues) to None . Create issues to implement an AvailabilityStrategy (default or otherwise) for GA connectors. GL already has doing that for the 10 most impacted connectors in their Q1a plans, so those get worked on first.
  • For beta and alpha connectors other than gitlab, they will get the default next time they're published - if there are issues, address them as they come (I'll be on call next week - convenient!)

@erohmensing erohmensing changed the title Update CDK dependency of connectors with HttpStreams to ~0.13 for HttpAvailabilityStrategy Re-roll out AvailabilityStrategy - slowly for GA connectors, as default for others Jan 13, 2023
@erohmensing erohmensing changed the title Re-roll out AvailabilityStrategy - slowly for GA connectors, as default for others Roll out AvailabilityStrategy - slowly for GA connectors, as default for others Jan 13, 2023
@erohmensing
Copy link
Contributor Author

erohmensing commented Jan 23, 2023

Sources with setup.py files (python sources) that are GA:

source-amazon-ads
source-amplitude
source-facebook-marketing
source-freshdesk
source-github
source-google-analytics-v4
source-google-search-console
source-harvest
source-hubspot
source-intercom
source-iterable
source-klaviyo
source-linkedin-ads
source-mailchimp
source-marketo
source-mixpanel
source-notion
source-paypal-transaction
source-pinterest
source-recharge
source-salesforce
source-sentry
source-slack
source-snapchat-marketing
source-stripe
source-surveymonkey
source-tiktok-marketing
source-twilio
source-zendesk-chat
source-zendesk-support
source-zendesk-talk

Python sources not affected:

source-bing-ads (doesnt use http streams)
source-chargebee (doesn't use http streams)
source-file (not AbstractSource)
source-google-ads (doesnt use HttpStreams)
source-google-sheets (not AbstractSource)
source-greenhouse (is a declarative source)
source-instagram (doesn't use httpstreams)
source-s3 (not using httpstreams)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants