-
Notifications
You must be signed in to change notification settings - Fork 30
feat: (Airbyte Python CDK) - extend the error_message
info for the ResponseAction.FAIL
request scenario
#87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
339-361
: Great addition! A few suggestions to make it even better, wdyt?The method is well-documented and handles different response formats gracefully. However, I have a few suggestions:
- Consider returning the raw JSON object instead of converting it to string immediately:
- return str(response.json()) + return response.json()
- Maybe we could be more specific about the exception type on line 359?
- Would it be helpful to log when both parsing attempts fail?
390-393
: Consider response size impact on performance?The enhanced error message with filtered response details is a great addition! However, for very large response bodies, including the entire response might impact performance. Would it make sense to add a size limit or truncation for large responses, wdyt?
Example approach:
- filtered_response_message = filter_secrets( - f"Request (body): '{str(request.body)}'. Response (body): '{self._get_response_body(response)}'. Response (headers): '{response.headers}'." - ) + response_body = self._get_response_body(response) + if response_body and len(response_body) > 1000: # configurable limit + response_body = f"{response_body[:1000]}... (truncated)" + filtered_response_message = filter_secrets( + f"Request (body): '{str(request.body)}'. Response (body): '{response_body}'. Response (headers): '{response.headers}'." + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/http/http_client.py
(4 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
5-5
: LGTM! Clean import additions.
The new imports for filter_secrets
and JsonType
are well-organized and align with the PR's objective of enhancing error messages while protecting sensitive data.
Also applies to: 54-54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you should me the output of the console to confirm that we can see there error message? I fear that it would not work when running this in the platform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
338-360
: Consider enhancing the response body handling implementation.The implementation is solid, but here are some suggestions for improvement:
The return type hint
JsonType
might not be the most accurate since we're returning string representations. What do you think about usingstr
instead?The
response.json()
already returns a parsed object, but we're converting it back to string. Maybe we could optimize this? Here's a suggestion, wdyt?- def _get_response_body(self, response: requests.Response) -> Optional[JsonType]: + def _get_response_body(self, response: requests.Response) -> str: """ Extracts and returns the body of an HTTP response. This method attempts to parse the response body as JSON. If the response body is not valid JSON, it falls back to decoding the response content as a UTF-8 string. If both attempts fail, it returns None. Args: response (requests.Response): The HTTP response object. Returns: - Optional[JsonType]: The parsed JSON object as a string, the decoded - response content as a string, or None if both parsing attempts fail. + str: A string representation of the response body or an error message + if decoding fails. """ try: - return str(response.json()) + return orjson.dumps(response.json()).decode('utf-8') except requests.exceptions.JSONDecodeError: try: return response.content.decode("utf-8") except Exception: return "The Content of the Response couldn't be decoded."
389-407
: LGTM! Good improvements to error handling.The changes effectively enhance error messages while ensuring sensitive data is filtered. The addition of
emit_message()
before raising the exception is a nice touch for visibility.One tiny suggestion: Would it make sense to extract the error message construction into a separate private method to improve readability? Something like
_construct_error_message(request, response, filtered_response_message)
, wdyt?+ def _construct_error_message( + self, + request: requests.PreparedRequest, + response: requests.Response, + filtered_response_message: str + ) -> str: + return ( + f"'{request.method}' request to '{request.url}' failed with " + f"status code '{response.status_code}' and error message: " + f"'{self._error_message_parser.parse_response_error_message(response)}'. " + f"{filtered_response_message}" + ) def _handle_error_resolution(self, ...): if error_resolution.response_action == ResponseAction.FAIL: if response is not None: filtered_response_message = filter_secrets( f"Request (body): '{str(request.body)}'. Response (body): '{self._get_response_body(response)}'. Response (headers): '{response.headers}'." ) - error_message = f"'{request.method}' request to '{request.url}' failed with status code '{response.status_code}' and error message: '{self._error_message_parser.parse_response_error_message(response)}'. {filtered_response_message}" + error_message = self._construct_error_message( + request, response, filtered_response_message + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/http/http_client.py
(4 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
5-5
: LGTM! Clean import additions.
The new imports are well-organized and necessary for the enhanced error handling functionality.
Also applies to: 54-54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
339-361
: Consider improving JSON response handling, wdyt?The method is well-documented and handles errors gracefully. However,
str(response.json())
might not be the best way to handle JSON responses as it could:
- Lose the JSON structure
- Not handle complex objects consistently
What do you think about this alternative approach?
- return str(response.json()) + json_response = response.json() + return orjson.dumps(json_response).decode() if json_response else NoneThis would:
- Preserve JSON structure
- Handle None values gracefully
- Use the same serialization method (orjson) that's already imported
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/http/http_client.py
(4 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/streams/http/http_client.py (3)
5-6
: LGTM! Clean import additions.
The new imports are well-organized and necessary for the enhanced error handling functionality.
Also applies to: 54-54
390-393
: Great enhancement to error handling! 🎉
The changes effectively:
- Filter sensitive information from responses
- Provide more detailed error messages for debugging
- Ensure error messages are visible in logs through
emit_message()
This addresses the visibility concerns raised in previous reviews while maintaining security through secret filtering.
Also applies to: 399-409
390-392
: Verify comprehensive secret filtering
Let's ensure the secret filtering catches all potential sensitive data in responses.
✅ Verification successful
Secret filtering looks good, but should we add a note about maximal matching? 🤔
The secret filtering implementation properly handles sensitive data in headers and responses by replacing them with "****". The filter_secrets function is used consistently across the codebase, including for response headers. However, there's a TODO comment about maximal matching that might be worth addressing:
- The current implementation might have edge cases with overlapping secrets (e.g., "x" and "xk")
- This could potentially lead to partial secret exposure in some cases
What do you think about adding a note to track this as a future improvement? The current implementation is safe for the common case, but being explicit about the edge case might help future maintainers. wdyt?
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential sensitive data patterns in error handling
# Look for patterns that might indicate sensitive data in tests or examples
# Check for sensitive patterns in test files
rg -i '(password|secret|token|key|auth).*response' test_http.py -A 5
# Check for examples of response headers containing sensitive data
rg -i 'authorization.*header' -g '!*.pyc'
Length of output: 3912
Script:
#!/bin/bash
# Let's check the implementation of the filter_secrets function to ensure it handles headers properly
ast-grep --pattern 'def filter_secrets'
# Also check for any test cases related to secret filtering
rg -i 'filter.*secret' -g '*.py' -A 5
# Look for any other sensitive data patterns in headers
rg -i '(api[-_]?key|token|secret|password|auth).*header' -g '*.py' -A 2
Length of output: 51046
Script:
#!/bin/bash
# Let's check the implementation of filter_secrets to understand how it handles sensitive data
ast-grep --pattern 'def filter_secrets'
# Also check for any global secrets or configuration that might be used in filtering
rg -i '__SECRETS_FROM_CONFIG' -g '*.py' -A 5
# Look for any tests that verify secret filtering behavior
rg -i 'test.*filter.*secret' -g '*test*.py' -A 5
Length of output: 2454
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
339-361
: Consider optimizing the response parsing logicThe implementation is solid, but I have a couple of suggestions that might improve it further. What do you think about:
- Returning the raw JSON instead of converting it to string?
str(response.json())
might lose precision for numbers and booleans.- Adding debug logging when parsing fails to help with troubleshooting?
Here's a possible improvement:
def _get_response_body(self, response: requests.Response) -> Optional[JsonType]: try: - return str(response.json()) + return response.json() except requests.exceptions.JSONDecodeError: + self._logger.debug("Failed to parse response as JSON, attempting UTF-8 decode") try: return response.content.decode("utf-8") except Exception: + self._logger.debug("Failed to decode response content as UTF-8") return "The Content of the Response couldn't be decoded."
390-401
: Great improvements to error handling! One tiny suggestionThe error handling looks solid! I particularly like how you've addressed the platform logging visibility issue and added comprehensive secret filtering.
Quick thought: Would it be clearer to structure the error message with newlines for better readability in logs? wdyt?
Here's an optional suggestion to improve log readability:
- filtered_response_message = filter_secrets(f"Request (body): '{str(request.body)}'. Response (body): '{self._get_response_body(response)}'. Response (headers): '{response.headers}'.") + filtered_response_message = filter_secrets( + "Details:\n" + f" Request body: '{str(request.body)}'\n" + f" Response body: '{self._get_response_body(response)}'\n" + f" Response headers: '{response.headers}'" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/http/http_client.py
(4 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
5-5
: LGTM! Clean import addition
The import of filter_secrets
is well-placed and aligns with the PR objective of filtering sensitive information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks Baz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
55-55
: Duplicate import detected
AuthBase
is already imported earlier in the file. Would you mind removing this duplicate import? wdyt?
341-363
: Consider enhancing error handling and type annotationsA few suggestions to make this more robust:
- The return type annotation suggests
JsonType
but we're returningstr
. Should we update the type tostr
?- Would it be helpful to log the actual decoding error for debugging purposes?
- The hardcoded message could be more descriptive, maybe include the encoding that failed?
Here's a potential improvement, wdyt?
- def _get_response_body(self, response: requests.Response) -> Optional[JsonType]: + def _get_response_body(self, response: requests.Response) -> str: try: return str(response.json()) except requests.exceptions.JSONDecodeError: try: return response.content.decode("utf-8") - except Exception: - return "The Content of the Response couldn't be decoded." + except Exception as e: + self._logger.debug(f"Failed to decode response content: {str(e)}") + return "Failed to decode response content (both JSON and UTF-8 attempts failed)"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
airbyte_cdk/sources/streams/http/http_client.py
(4 hunks)unit_tests/sources/streams/http/test_http.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/streams/http/test_http.py
🔇 Additional comments (1)
airbyte_cdk/sources/streams/http/http_client.py (1)
392-403
: LGTM! Just a quick verification needed
The implementation looks solid! The error message now includes comprehensive debug information while filtering secrets.
One small thing: shall we verify that all potential sensitive data is being filtered? For instance, would you like me to help check if there are any other fields in the request/response that might contain sensitive information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
339-361
: Consider enhancing the response body handling, wdyt?A few suggestions to make this more robust:
- The JSON response could be returned as-is instead of converting to string, preserving the original structure.
- We might want to catch specific exceptions for better error handling.
- The docstring could be more precise about the actual return types.
Here's a potential improvement:
def _get_response_body(self, response: requests.Response) -> Optional[JsonType]: """ Extracts and returns the body of an HTTP response. Args: response (requests.Response): The HTTP response object. Returns: Optional[JsonType]: Either: - A dict/list for JSON responses - A string for plain text responses - A default message if decoding fails """ try: - return str(response.json()) + return response.json() except requests.exceptions.JSONDecodeError: try: return response.content.decode("utf-8") - except Exception: + except UnicodeDecodeError: return "The Content of the Response couldn't be decoded."
390-393
: Consider optimizing the error message construction, what do you think?The filtered message includes both request and response bodies, which could be quite large. Should we consider truncating these for readability? Also, the structure could be more readable with line breaks.
Here's a suggestion:
- filtered_response_message = filter_secrets( - f"Request (body): '{str(request.body)}'. Response (body): '{self._get_response_body(response)}'. Response (headers): '{response.headers}'." - ) + filtered_response_message = filter_secrets("\n".join([ + "Request body: " + str(request.body)[:1000] + ("..." if len(str(request.body)) > 1000 else ""), + "Response body: " + str(self._get_response_body(response))[:1000] + ("..." if len(str(self._get_response_body(response))) > 1000 else ""), + "Response headers: " + str(response.headers) + ]))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
airbyte_cdk/sources/streams/http/http_client.py
(3 hunks)unit_tests/sources/streams/http/test_http.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/streams/http/test_http.py
🔇 Additional comments (2)
airbyte_cdk/sources/streams/http/http_client.py (2)
48-49
: LGTM! Clean import addition.
The import of filter_secrets
is well-placed and aligns with the security requirements.
399-401
: Consider the duplicate logging behavior
The error message is logged here and will also be logged when the exception is printed. Should we consider removing this direct logging since the exception will handle it? Or is this intentional for immediate visibility?
Let's check how the exception logging is handled:
✅ Verification successful
Let me check how the exception handling and logging is implemented in this class.
The double logging is actually intentional here - what do you think about adding a quick comment to explain why? 🤔
Looking at the code, the explicit logging before raising the exception serves a specific purpose - it ensures the error is immediately visible in logs before any exception handling occurs. This is particularly useful in distributed systems where exception handling might happen at a different layer or service. The comment "ensure the exception message is emitted before raised" hints at this, but we could make it more explicit.
Maybe something like:
# Log error immediately for visibility in distributed systems, before exception handling occurs
self._logger.error(error_message)
wdyt about making this design choice more clear in the code? 🙂
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if MessageRepresentationAirbyteTracedErrors logging is handled elsewhere
rg -A 5 "MessageRepresentationAirbyteTracedErrors" --type py
Length of output: 4508
Script:
#!/bin/bash
# Check the implementation of the error handling in HttpClient
ast-grep --pattern 'class HttpClient {
$$$
def _handle_error_response($$$) {
$$$
}
$$$
}'
Length of output: 103
Script:
#!/bin/bash
# Let's check the context around the error logging
rg -B 10 -A 5 "_logger.error\(error_message\)" --type py
Length of output: 1839
What
Resolving:
How
HttpClient._handle_error_resolution
>error_message
withresponse
info for debugging purposesfilter_secrets
to omit theuser-provided secret values
once appear suddenlytest_error_codes_http_stream_error_resolution_with_response_secrets_filtered
unit testUser Impact
No impact is expected, this is not a breaking change.
Summary by CodeRabbit
New Features
ruff
tool.Bug Fixes
Tests