-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(redshift): Improve redshift error handling with new structured reporting system #10870
refactor(redshift): Improve redshift error handling with new structured reporting system #10870
Conversation
…into jj--add-error-handling-redshift
WalkthroughThe recent updates to Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant GEDataProfiler
participant Logger
User ->> GEDataProfiler: Call _generate_single_profile()
GEDataProfiler ->> GEDataProfiler: Try profiling
alt Permission Denied Exception
GEDataProfiler ->> Logger: Log specific unauthorized access warning
else Other Exceptions
GEDataProfiler ->> Logger: Log general warning
end
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/exception.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/lineage_v2.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/profile.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/query.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/redshift_schema.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/sql/sql_generic_profiler.py (1 hunks)
Files skipped from review due to trivial changes (1)
- metadata-ingestion/src/datahub/ingestion/source/redshift/query.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/redshift/redshift_schema.py
276-277: Use a single
if
statement instead of nestedif
statements(SIM102)
Additional comments not posted (17)
metadata-ingestion/src/datahub/ingestion/source/redshift/exception.py (3)
10-18
: LGTM!The
handle_redshift_exceptions
function correctly handles Redshift-specific exceptions and reports failures. The use of type hinting is appropriate.
20-27
: LGTM!The
handle_redshift_exceptions_yield
function correctly handles exceptions for callables that return iterables. The use ofyield from
is appropriate, and error reporting is consistent.
29-57
: LGTM!The
report_redshift_failure
function provides detailed failure reports based on the error content. The categorization of errors and the corresponding messages are well-structured.metadata-ingestion/src/datahub/ingestion/source/redshift/profile.py (1)
51-69
: LGTM!The logic for handling profiling of external tables is well-implemented. The conditional checks and corresponding logging and reporting provide clear feedback to the user.
metadata-ingestion/src/datahub/ingestion/source/sql/sql_generic_profiler.py (1)
95-113
: LGTM!The logic for skipping profile generation based on missing table attributes is well-implemented. The logging provides clear feedback to the user.
metadata-ingestion/src/datahub/ingestion/source/redshift/lineage_v2.py (2)
243-246
: LGTM!The error reporting in
_populate_lineage_agg
is well-implemented. The detailed context and exception information will aid in debugging and understanding the failure.
413-418
: LGTM!The error reporting in
generate
is well-implemented. The detailed context and exception information will aid in debugging and understanding the failure.metadata-ingestion/src/datahub/ingestion/source/redshift/redshift_schema.py (3)
172-173
: LGTM!The warning comment about table enrichment behavior is informative and does not affect functionality.
Line range hint
212-221
:
LGTM!Logging the number of fetched tables/views is a good practice. The enriched tables are correctly utilized.
315-349
: LGTM!Handling tables with no profiles and logging the message when returning 0 improves clarity and correctness.
metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py (4)
415-431
: LGTM!The inclusion of
_try_get_redshift_connection
and exception handling makes the method more robust.
432-439
: LGTM!Wrapping
_extract_metadata
withhandle_redshift_exceptions_yield
ensures better error handling and reporting.
Line range hint
574-628
:
LGTM!Including column count for tables/views and reporting informational messages when no tables/views are found improves clarity and completeness.
1117-1155
: LGTM!Handling various connection errors and reporting failures with specific messages improves robustness and user feedback.
metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py (3)
1219-1219
: LGTM! Improved error handling.The addition of specific error handling for "permission denied" errors enhances the robustness of the profiling process.
1220-1227
: Specific handling for "permission denied" errors is good.The code correctly identifies "permission denied" errors and logs a specific warning message, providing clear feedback to the user.
1228-1234
: General exception handling is well-implemented.The code handles other exceptions by logging a general warning message, ensuring that unexpected errors are reported.
metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py
Outdated
Show resolved
Hide resolved
elif "svl_user_info" in error_message: | ||
report.report_failure( | ||
title="Permission denied", | ||
message="Failed to extract metadata due to insufficient permission to access 'svl_user_info' table. Please ensure the provided database user has access.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a huge fan of this pattern - ideally error handling code should be close to the code that causes the errors, especially for these sorts of cases where the error message is extremely custom
For the generic permission denied / failed to extract metadata, they're fine here. But the svv and svl table ones feel like they should be caught closer to the cause of the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These happen in like 3 places across lineage, usage, and just normal table stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the code is a mess in there so I think this fallback is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree though that this is too much of a fallback, but in this case it would be a pretty significant undertaking to cover each place we are issuing queries via the client object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And this covered the cases I tested with a new sample user who I progressively added privileges for)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also see I'm just catching the "redshift" connector errors here as of now, also. So it's fairly targeted to mean that the exception came from a redshift connector call.
metadata-ingestion/src/datahub/ingestion/source/redshift/exception.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py
Outdated
Show resolved
Hide resolved
): | ||
logger.warning( | ||
f"Table {request.pretty_name} has no column count, rows count, or size in bytes. Skipping emitting table level profile." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a hard blocker, but ideally we'd have some tests for this logic to avoid regressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically this is skipping emitting a null profile. will check if there are any existing tests i can easily modify
…er.py Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py
…ift.py Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py (7 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- metadata-ingestion/src/datahub/ingestion/source/redshift/exception.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/profile.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py (7 hunks)
Files skipped from review as they are similar to previous changes (3)
- metadata-ingestion/src/datahub/ingestion/source/redshift/exception.py
- metadata-ingestion/src/datahub/ingestion/source/redshift/profile.py
- metadata-ingestion/src/datahub/ingestion/source/redshift/redshift.py
…ed reporting system (#10870) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
…ed reporting system (datahub-project#10870) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Scenarios Covered
Also some small fixes
Screenshots
QA
QA'd locally by attempting to replicate every scenario. Only one I couldn't was lineage parsing.
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores