Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: provide better user-friendly error messages #12944

Merged
merged 12 commits into from
May 19, 2022

Conversation

pedroslopez
Copy link
Contributor

@pedroslopez pedroslopez commented May 17, 2022

What

  • Adds a way for developers to provide user-friendly display messages for errors when reading, to be emitted as AirbyteTraceMessages.
  • Uses the above to implement default error message retrieval logic for API errors coming from HTTPStreams.

closes #12537

How

  • Stream gains a get_error_display_message() method, that takes in the exception and should return a string with a user-friendly error message for the exception (or None)
  • HTTPStream implements get_error_display_message by handling HTTPErrors and passing it to a new parse_response_error_message method.
  • HTTPStream.parse_error_message contains logic to pull an error message from common API patterns. Developers can override this method if they want to.

Recommended reading order

  1. airbyte_cdk/sources/abstract_source.py - error handling actually happens here
  2. airbyte_cdk/sources/streams/core.py - defines the get_error_display_message on all Streams
  3. airbyte_cdk/sources/streams/http/http.py - overrides the default get_error_display_message on HTTPStreams to handle HTTPErrors. Defines parse_response_error_message method to allow overriding parsing logic.
  4. unit_tests/sources/streams/http/test_http.py- has examples of some responses that are handled

🚨 User Impact 🚨

Users will start to see error messages as provided by the APIs on HTTPStreams automatically if the API adheres to common practices. Connector developers can tweak this behavior as needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Tests

Unit

Results (4.00s):
     410 passed

@github-actions github-actions bot added the CDK Connector Development Kit label May 17, 2022
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pedroslopez the approach makes sense to me!

return None

try:
body = response.json()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't assume the response is JSON

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might need to expose another method decode_response or something which defaults to json that can be overridden 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, at the moment I'm returning None if a JSONDecodeError is encountered... Breaking it out into a separate method for decoding feels like it may cause confusion (this would only be used for parsing error responses but parsing regular responses is left up to you?). I think in this case you would override the method as needed to fit your use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

touche


def get_secrets(source: Source, config: Mapping[str, Any], logger: logging.Logger) -> List[Any]:

def get_secrets(source: "Source", config: Mapping[str, Any], logger: logging.Logger) -> List[Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jc why was this needed?

Copy link
Contributor Author

@pedroslopez pedroslopez May 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of Source here for typing caused a circular dependency (this is used by AirbyteTraceMessage). Could also move this get_secrets method out into another file since AirbyteTraceMessage doesn't need this specific method (it uses other things defined in this file to mask secrets)

@pedroslopez pedroslopez marked this pull request as ready for review May 18, 2022 23:51
@@ -297,7 +296,7 @@ def _send(self, request: requests.PreparedRequest, request_kwargs: Mapping[str,
# Raise any HTTP exceptions that happened in case there were unexpected ones
try:
response.raise_for_status()
except HTTPError as exc:
except requests.HTTPError as exc:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrong error type was being used here (urlib's HTTPError instead of requests's HTTPError).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pedroslopez this should not be from requests.exceptions.HTTPError?
See one example of this throws the error:

2022-05-26 04:38:32 e[44msourcee[0m > Syncing stream: feedback_submissions 
2022-05-26 04:38:32 e[44msourcee[0m > Encountered an exception while reading stream SourceHubspot
Traceback (most recent call last):
  File "/airbyte/integration_code/source_hubspot/streams.py", line 340, in read_records
    stream_records, response = self._read_stream_records(
  File "/airbyte/integration_code/source_hubspot/streams.py", line 311, in _read_stream_records
    response = self.handle_request(
  File "/airbyte/integration_code/source_hubspot/streams.py", line 288, in handle_request
    response = self._send_request(request, request_kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 337, in _send_request
    return backoff_handler(user_backoff_handler)(request, request_kwargs)
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 299, in _send
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.hubapi.com/crm/v3/objects/feedback_submissions?hapikey=*****&limit=100&startTimestamp=1485302400000&endTimestamp=1653539911984&archived=false&associations=contacts&properties=hs_agent_email%2Chs_agent_id%2Chs_agent_name%2Chs_all_accessible_team_ids%2Chs_all_assigned_business_unit_ids%2Chs_all_owner_ids%2Chs_all_team_ids%2Chs_chatflow_name%2Chs_chatflow_object_id%2Chs_contact_id%2Chs_conversation_thread_id%2Chs_created_by_user_id%2Chs_createdate%2Chs_lastmodifieddate%2Chs_merged_object_ids%2Chs_object_id%2Chs_unique_creation_key%2Chs_updated_by_user_id%2Chs_user_ids_of_all_notification_followers%2Chs_user_ids_of_all_notification_unfollowers%2Chs_user_ids_of_all_owners%2Chubspot_owner_assigneddate%2Chubspot_owner_id%2Chubspot_team_id%2Chs_industry_standard_question_type%2Chs_sentiment%2Chs_survey_id%2Chs_survey_type%2Chs_survey_channel%2Chs_submission_timestamp%2Chs_value%2Chs_response_group%2Chs_content%2Chs_ingestion_id%2Chs_knowledge_article_id%2Chs_visitor_id%2Chs_engagement_id%2Chs_submission_url%2Chs_survey_name%2Chs_form_guid%2Chs_contact_email_rollup%2Chs_submission_name

Looks is not capturing the correct Error Exception and showing the response from the API.

return None

try:
body = response.json()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, at the moment I'm returning None if a JSONDecodeError is encountered... Breaking it out into a separate method for decoding feels like it may cause confusion (this would only be used for parsing error responses but parsing regular responses is left up to you?). I think in this case you would override the method as needed to fit your use case.

Comment on lines 448 to 462
"api_response, expected_message",
[
({"error": "something broke"}, "something broke"),
({"error": {"message": "something broke"}}, "something broke"),
({"error": "err-001", "message": "something broke"}, "something broke"),
({"errors": ["one", "two", "three"]}, "one, two, three"),
({"messages": ["one", "two", "three"]}, "one, two, three"),
({"errors": [{"message": "one"}, {"message": "two"}, {"message": "three"}]}, "one, two, three"),
({"errors": [{"error": "one"}, {"error": "two"}, {"error": "three"}]}, "one, two, three"),
(["one", "two", "three"], "one, two, three"),
({"error": True}, None),
({"something_else": "hi"}, None),
({}, None),
],
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this test-case based development

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a missing case would be:

({"error": {"code": "foo-123"}}, "foo-123"),

Showing that we prefer to show /something/ from an error object.

What about synonyms for "error"?

({"failure": {"message": "foo-123"}}, "foo-123"),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more cases in 28c355d! I left out code for now since I'm not sure how useful it will be for end users to see. We can always add it later though if it does end up being useful.

[
({"error": "something broke"}, "something broke"),
({"error": {"message": "something broke"}}, "something broke"),
({"error": "err-001", "message": "something broke"}, "something broke"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I disagree with this one - I think the error should be err-001 here. The message key is not part of the error object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I got from brian's comment on #12537:

{
    "error": "auth-0001",
    "message": "Incorrect username and password",
    "detail": "Ensure that the username and password included in the request are correct",
    "help": "https://example.com/help/error/auth-0001"
}

Where I believe the "user-friendly error message" would be the message key. They're theoretical examples though, so I could go either way...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I agree with that, assuming there isn't an error object with additional fields. I don't feel too strongly either way - we'll be revising this a lot, I'm sure.

return None

try:
body = response.json()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

touche

Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved to move this forward, but I still think some additional error-cases / synonyms would be nice

@pedroslopez
Copy link
Contributor Author

pedroslopez commented May 19, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/2354568318
https://github.com/airbytehq/airbyte/actions/runs/2354568318

@pedroslopez
Copy link
Contributor Author

pedroslopez commented May 19, 2022

/publish-cdk dry-run=false

https://github.com/airbytehq/airbyte/actions/runs/2354707685

@pedroslopez pedroslopez merged commit 41dc82f into master May 19, 2022
@pedroslopez pedroslopez deleted the pedroslopez/cdk-error-messages branch May 19, 2022 21:28
@pedroslopez pedroslopez restored the pedroslopez/cdk-error-messages branch May 19, 2022 22:40
@pedroslopez
Copy link
Contributor Author

pedroslopez commented May 19, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/2354999571
https://github.com/airbytehq/airbyte/actions/runs/2354999571

@pedroslopez pedroslopez deleted the pedroslopez/cdk-error-messages branch May 19, 2022 23:15
ganpatagarwal pushed a commit to ganpatagarwal/airbyte that referenced this pull request May 20, 2022
* initial implementation

* add parse_response_error_message tests

* move error handler to existing try/catch in AbstractSource

* formatting

* var rename

* use isinstance for type checking

* add docstrings

* add more abstract_source and httpstream tests

* fix wrong httperror usage

* more test cases

* bump version, update changelog
suhomud pushed a commit that referenced this pull request May 23, 2022
* initial implementation

* add parse_response_error_message tests

* move error handler to existing try/catch in AbstractSource

* formatting

* var rename

* use isinstance for type checking

* add docstrings

* add more abstract_source and httpstream tests

* fix wrong httperror usage

* more test cases

* bump version, update changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDK: Connectors can define a method to get error message
4 participants