Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚨 🚨 Source Stripe: fix multiple issues regarding Refunds, CheckoutSessions and CheckoutSessionsLineItems + fix stream schemas #32286

Merged
merged 42 commits into from
Nov 16, 2023

Conversation

davydov-d
Copy link
Collaborator

@davydov-d davydov-d commented Nov 8, 2023

What

This PR addresses multiple issues described in https://github.com/airbytehq/oncall/issues/3428 that are related to incremental syncs of the Refunds, CheckoutSessions and CheckoutSessionsLineItems streams + fixes stream schemas:

  • Refunds missing data in the incremental sync mode
  • CheckoutSessions missing data for one day after a reset
  • CheckoutSessionLineItems potential data loss
  • Incremental streams with created cursor field produce duplicated data in edge cases
  • Fix the default_tax_rates schema in the Invoices, Subscriptions, SubscriptionSchedule streams

How

  • Do not use event-based API for the Refunds stream
  • Change the cursor field for the CheckoutSessions stream
  • Re-implement the CheckoutSessionsLineItems stream
  • Introduce two new fields to the CheckoutSessionsLineItems stream
  • Increase start datetime of a first slice by one second to avoid duplicate data
  • Remove parent_id and add_parent_id of the StripeLazySubStream class
  • Introduce new callable param to the stream for post-processing the records given the slice as a context
  • Update stream schemas
  • Update docs
  • Cover these changes with unit tests

🚨 User Impact 🚨

  • The CheckoutSessionLineItems and Refunds streams will have their cursor fields changed what is reflected in the migration guide

@davydov-d davydov-d added the breaking-change Don't merge me unless you are ready. label Nov 8, 2023
@davydov-d davydov-d requested a review from a team November 8, 2023 08:48
Copy link

vercel bot commented Nov 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Nov 16, 2023 3:54pm

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/stripe labels Nov 8, 2023
Copy link
Contributor

github-actions bot commented Nov 8, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@maxi297
Copy link
Contributor

maxi297 commented Nov 13, 2023

Hey Denys! To speed up the review process, can you describe in which cases could these cases occur?

  • Refunds missing data in the incremental sync mode
    • Are "refund.created", "refund.updated" events not enough? In which case aren’t they?
  • CheckoutSessions missing data for one day after a reset
    • legacy_cursor_field has been changed from expires_at to created. My guess is that sessions expires after 24 hours and we would lose them by using expires_at as a cursor field. Can you confirm that?
  • CheckoutSessionLineItems potential data loss
    • Before, the parent stream would be checkout_sessions and we would use it as full incremental. How does moving to only using checkout_sessions as incremental could prevent data loss?

@davydov-d
Copy link
Collaborator Author

@maxi297

  • Refunds missing data in the incremental sync mode

    • Are "refund.created", "refund.updated" events not enough? In which case aren’t they?

We don't know the reason but the Stripe API does not return events of types refund.* and this is not a transient issue. Therefore we decided to use traditional endpoint for both sync modes of this stream. To always have the up-to-date data, customers can make use of the lookback window

  • CheckoutSessions missing data for one day after a reset

    • legacy_cursor_field has been changed from expires_at to created. My guess is that sessions expires after 24 hours and we would lose them by using expires_at as a cursor field. Can you confirm that?

Yes, we would lose sessions that would be created between the time of the sync and the most recent expires_at value which is ~24 hours

  • CheckoutSessionLineItems potential data loss

    • Before, the parent stream would be checkout_sessions and we would use it as full incremental. How does moving to only using checkout_sessions as incremental could prevent data loss?

First of all, having changed the parent stream's legacy cursor field, we need to change this stream's cursor field as well -- this is how we prevent the data loss. Also, we're not moving to using checkout_sessions only in incremental mode - we're using it in the same mode that checkout_sessions_line_items is used.
Second - I got rid of the additional lookback window of 1 day for this stream which was also tied to the previous cursor field.
And the last one - now we do not have to use the parent stream in semi-incremental mode when syncing line items in incremental - this must be a performance boost.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes LGTM. Thanks for providing more information and testing exhaustively.

I would like us to take actions on what you mentioned We don't know the reason but the Stripe API does not return events of types refund.* and this is not a transient issue:

  • Can we document this in our code? It feels weird that this stream does not leverage the Events API and it feels like it deserve an explanation
  • Can we open a ticket with Stripe to see why the refunds event would not work? Based on their documentation, it should and it feels like we could get more information on this API that might affect other streams we have

Once both are done and my comments are addressed, I'll approve the PR

…github.com:airbytehq/airbyte into ddavydov/3428-source-stripe-fix-incremental-issues
@davydov-d
Copy link
Collaborator Author

@maxi297 the complaint to Stripe has been filed. I do not have any link, however I think they should reply in an email shortly.
Comment added, migration guide and test updated.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks Denys

@davydov-d davydov-d changed the title 🚨 🚨 Source Stripe: fix multiple issues regarding Refunds, CheckoutSessions and CheckoutSessionsLineItems 🚨 🚨 Source Stripe: fix multiple issues regarding Refunds, CheckoutSessions and CheckoutSessionsLineItems + fix stream schemas Nov 14, 2023
@davydov-d
Copy link
Collaborator Author

@katmarkham a kind reminder to review this PR

@@ -33,6 +33,14 @@ data:
schema refresh of all effected streams is required to use the new cursor
format.
upgradeDeadline: "2023-09-14"
5.0.0:
message:
Version 5.0.0 introduces fixes for the `CheckoutSessions`, `CheckoutSessionsLineItems` and `Refunds` streams.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version 5.0.0 introduces fixes for the CheckoutSessions, CheckoutSessionsLineItems and Refunds streams. The cursor field is changed for the CheckoutSessionsLineItems and Refunds streams. This will prevent data loss during incremental syncs.
Also, the Invoices, Subscriptions and SubscriptionSchedule stream schemas have been updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

],
**args,
),
# The refunds stream does not utilize the Events API as we faced issues with data loss during the incremental syncs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   # The Refunds stream does not utilize the Events API as it created issues with data loss during the incremental syncs.
        # Therefore, we're using the regular API with the `created` cursor field. A bug has been filed with Stripe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

after creation. Applies only to streams that do not support event-based incremental syncs: CheckoutSessionLineItems,
Events, SetupAttempts, ShippingRates, BalanceTransactions, Files, FileLinks. More info <a
after creation. Applies only to streams that do not support event-based incremental syncs: Events,
SetupAttempts, ShippingRates, BalanceTransactions, Files, FileLinks, Refunds. More info <a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Lookback Window only applies to streams that do not support event-based incremental syncs: Events,
SetupAttempts, ShippingRates, BalanceTransactions, Files, FileLinks, Refunds.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied


class ParentIncrementalStipeSubStream(StripeSubStream):
"""
This stream differs from others in that it runs parent stream in exactly same sync mode it is run itself to generate stream slices.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stream differs from others in that it runs parent stream in exactly same sync mode it is run itself to generate stream slices.
(this is unclear to me, perhaps we can talk live?)
It also uses regular /v1 API endpoints to sync data no matter what the sync mode is. This means that the event-based API may only
be utilized by the parent stream.
(should may be "can"?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion applied: "may" changed to "can". However, this is a docstring, so, cloud users won't even see it, so I don't think this is a significant change

@@ -1,5 +1,19 @@
# Stripe Migration Guide

## Upgrading to 5.0.0

This change fixes multiple issues for the `Refunds`, `Checkout Sessions` and `Checkout Sessions Line Items` streams:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes multiple incremental sync issues with the Refunds, Checkout Sessions and Checkout Sessions Line Items streams:

  • Refunds stream was not syncing data in the incremental sync mode. Cursor field has been updated to "created" to allow for incremental syncs. Because of the changed cursor field of the Refunds stream, incremental syncs will not reflect every update of the records that have been previously replicated. Only newly created records will be synced. To always have the up-to-date data, users are encouraged to make use of the lookback window.

  • CheckoutSessions stream had been missing data for one day when using the incremental sync mode after a reset; this has been resolved.

  • CheckoutSessionsLineItems previously had potential data loss. It has been updated to use a new cursor field ___.

  • Incremental streams with the created cursor had been duplicating some data; this has been fixed.

Stream schema update is a breaking change as well as changing the cursor field for the Refunds and the CheckoutSessionsLineItems stream. A schema refresh and data reset of all effected streams is required after the update is applied.

Also, this update affects three more streams: Invoices, Subscriptions, SubscriptionSchedule. Schemas are changed in this update so that the declared data types would match the actual data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion applied

@octavia-squidington-iv octavia-squidington-iv requested a review from a team November 15, 2023 14:42
@davydov-d davydov-d merged commit f5cbe29 into master Nov 16, 2023
25 of 26 checks passed
@davydov-d davydov-d deleted the ddavydov/3428-source-stripe-fix-incremental-issues branch November 16, 2023 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation breaking-change Don't merge me unless you are ready. checklist-action-run connectors/source/stripe
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants