New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: SHOW FUNCTIONS for Databricks #14252
Conversation
Codecov Report
@@ Coverage Diff @@
## master #14252 +/- ##
==========================================
- Coverage 76.96% 76.89% -0.08%
==========================================
Files 952 952
Lines 48043 48052 +9
Branches 5978 5978
==========================================
- Hits 36977 36949 -28
- Misses 10864 10901 +37
Partials 202 202
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
return df[cls._show_functions_column].tolist() | ||
|
||
columns = df.columns.values.tolist() | ||
logger.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this function is called in core.py the exceptions are caught and logged there. What's the philosophy in this case? Should we catch/log early or let errors bubble up and be caught in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I wanted to log here where we have more context, so that we can quickly fix the problem when we see it in the logs. Also, even if the result is not as expected we can still try our best and use the results if they have a single column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
SUMMARY
The Databricks DB engine is based on the Hive DB engine spec, but the result from
SHOW FUNCTIONS
is slightly different. In Hive it returns a single column calledtab_name
, while in Databricks it's calledfunction
.I changed the
get_function_names
method to allow derived classes to specify a different column name, and also made it more resilient by accepting any column name when only one column is returned.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TEST PLAN
I tested the response from both Hive and Databricks, and also tested that it works when the results have a single column with a name different than the one expected (in that case we still log an error, but return the names).
Response from Databricks with this PR:
Response from Hive:
ADDITIONAL INFORMATION