Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[airbyte-cdk] Updates Low Code CDK ErrorHandlers and BackoffStrategies to align with Python CDK Interfaces #38743

Merged
merged 51 commits into from
Jun 18, 2024

Conversation

pnilan
Copy link
Contributor

@pnilan pnilan commented May 28, 2024

What

  • Updates Low Code CDK ErrorHandler and implementations to align with Python CDK interface
  • Updates Low Code CDK BackoffStrategy and implementations to align with Python CDK interface
  • Removes ResponseStatus class in favor of ErrorResolution dataclass, which encapsulates the response action, the failure type, and an associated error message.
  • Updates the HttpResponseFilter to act as the error mapping / matching mechanism for the low code CDK -> continues to map http status codes, predicates to declared action, but will default to the HttpStatusErrorHandler mapping from the Python CDK, when no mapping is provided. Returns an ErrorResolution. Default failure type is system_failure unless otherwise mapped.
  • Adds DefaultHttpResponseFilter to default to after iterating through the underlying HttpResponseFilters of a given DefaultErrorHandler.
  • Updates HttpRequester to follow new request management flow and response action execution.
  • Updates AbstractSource to emit control messages when an AirbyteTracedException is raised during a read operation.

Why

Update is to align the low-code CDK interfaces with common Python CDK interfaces and to provide sensible error defaults for response actions and failure types for all low-code connectors.

Recommended Reading Order

  1. ErrorHandler classes
  2. BackoffStrategy classes
  3. HttpResponseFilter
  4. DefaultHttpResponseFilter
  5. HttpRequester (note: ErrorStatus class has been removed)
  6. abstract_source.py
  7. model_to_component_factory.py

Other Notes

  • This PR does not integrate the HttpClient into the HttpRequester, that will be included in a subsequent PR.
  • This PR does not deprecate the CompositeErrorHandler class, that will be in a subsequent PR.
  • While neither the ErrorHandler or BackoffStrategy interfaces include the max_retries and max_time properties by default, I have maintained inclusion of these within the low-code CDK implementations of the ErrorHandler interfaces.

@pnilan pnilan requested a review from a team as a code owner May 28, 2024 23:41
Copy link

vercel bot commented May 28, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jun 11, 2024 8:32pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label May 28, 2024
@pnilan pnilan marked this pull request as draft May 29, 2024 15:52
@pnilan pnilan marked this pull request as ready for review May 29, 2024 22:41
@pnilan pnilan force-pushed the pnilan/airbyte-cdk-low-code-http-error-handler branch from 139d532 to fec77c3 Compare May 31, 2024 18:20
Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments around some nits in code style and we also already talked about adjusting the relevant docstrings so won't add comments to those. Overall though I like this incremental approach to releasing this. This definitely feels like the trickiest part and hopefully actually using the http client to make the send ends up being easier.

I think the main thing to talk about is the impact of these changes on existing connectors that either implement the BackoffStrategy interface or override existing backoff strategy components. With some of the naming changes and parameters change, those will no longer work. I think it would be good to get a sense for the impact of how many are affected.

Have you tested these against some existing connectors? And in addition to documenting the path to migrating a component to the new interface, we should at least test that process against at least one connector that implements a custom backoff strategy/

@girarda girarda requested a review from tolik0 June 6, 2024 03:21
@pnilan pnilan requested a review from brianjlai June 7, 2024 22:24
Comment on lines +58 to +70
if not isinstance(matched_error_resolution, ErrorResolution):
continue

if matched_error_resolution.response_action == ResponseAction.SUCCESS:
return matched_error_resolution

if (
matched_error_resolution.response_action == ResponseAction.RETRY
or matched_error_resolution.response_action == ResponseAction.IGNORE
):
return matched_error_resolution
if matched_error_resolution:
return matched_error_resolution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not isinstance(matched_error_resolution, ErrorResolution):
continue
if matched_error_resolution.response_action == ResponseAction.SUCCESS:
return matched_error_resolution
if (
matched_error_resolution.response_action == ResponseAction.RETRY
or matched_error_resolution.response_action == ResponseAction.IGNORE
):
return matched_error_resolution
if matched_error_resolution:
return matched_error_resolution
if not isinstance(matched_error_resolution, ErrorResolution):
continue
else:
return matched_error_resolution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought behind the current indentation levels/response hierarchies is:

  1. Iterate over each error handler to determine:
    a. If response is SUCCESS, return success resolution
    b. If response is RETRY or IGNORE, return resolution
    c. If response is FAIL, iterate to next error handler to see if there is a "better" specified action for the given response
  2. After iterating over all error handlers, if there is a matched error resolution, (it would presumably be a FAIL response action, otherwise it would have already been returned), then return the resolution
  3. Otherwise, return a default error resolution (RETRY w/ system_error)

The thought here is we don't want to prematurely fail if there is an error handler with a response filter that has a "better" outcome. IMO this makes a lot of assumptions about the connector dev's intentions, but while the CompositeErrorHandler still exists, I'm trying to maintain its behavior as much as possible.

Copy link
Contributor

@tolik0 tolik0 Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then should it be like this?

    def interpret_response(self, response_or_exception: Optional[Union[requests.Response, Exception]]) -> ErrorResolution:
        best_resolution = None

        for error_handler in self.error_handlers:
            error_resolution = error_handler.interpret_response(response_or_exception)

            # Check if error_resolution is valid and prioritize based on action
            if isinstance(error_resolution, ErrorResolution):
                if error_resolution.response_action == ResponseAction.SUCCESS:
                    return error_resolution  # Immediate return on success
                elif error_resolution.response_action in (ResponseAction.RETRY, ResponseAction.IGNORE):
                    best_resolution = error_resolution  # Store RETRY or IGNORE as potential best outcomes
                elif error_resolution.response_action == ResponseAction.FAIL and not best_resolution:
                    best_resolution = error_resolution  # Store FAIL only if no better alternative has been found

        # Return the best resolution found, default to RETRY if none provided
        return best_resolution if best_resolution else ErrorResolution(ResponseAction.RETRY)

Why do we not have the same implementation for DefaultErrorHandler?

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few small comments to address, but nothing that needs to block approval. nice work!

A few follow up questions:

Were you able to trigger any live test runs using local CDK to verify the changes on real connector that you can include in a comment?

One question about the release, we had originally discussed either committing to a temp branch that the last PR would go into or alternatively releasing this w/o a publish. Which did you decide on?

And did you create follow up tickets for the other work that we didn't scope in. I know the http client work is tracked, but are we tracking the work to deprecate CompositeErrorHandler?

if should_retry == response_status.IGNORE or should_retry.action == ResponseAction.RETRY:
return should_retry
return should_retry
return max([error_handler.max_time or 0 for error_handler in self.error_handlers]) # type: ignore # property not defined in ErrorHandler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. What is the exact error that mypy raises when we don't have this ignore annotation. The ErrorHandler interface does have a max_time property so I'm surprised this gets raised. Not a blocker, but it is curious to know if we can avoid the annotation because the code itself looks fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

airbyte_cdk/sources/declarative/requesters/error_handlers/composite_error_handler.py:51: error: "ErrorHandler" has no attribute "max_time" [attr-defined]

This is because the new parent class is the Python CDK ErrorHandler which does not have the max_time/max_retries, etc.

@pnilan pnilan merged commit 7a63966 into master Jun 18, 2024
29 checks passed
@pnilan pnilan deleted the pnilan/airbyte-cdk-low-code-http-error-handler branch June 18, 2024 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants