-
Notifications
You must be signed in to change notification settings - Fork 132
feat(csharp/src/Drivers/Databricks): Optimize GetColumnsExtendedAsync via DESC TABLE EXTENDED #2953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…xtended to implement GetColumnsExtendedAsync
81b326f
to
7d83b75
Compare
49b3f6e
to
8c73c0b
Compare
* limitations under the License. | ||
*/ | ||
|
||
using Apache.Arrow.Adbc.Drivers.Databricks.Result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mocify/improve the existing GetColumnExtended tests to verify that:
With the flag on/off, they get same number of columns, same schema, and same data?
4d4af63
to
4a0d05a
Compare
4a0d05a
to
bb93562
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good! I've just got a few nits about formatting and casing of method names.
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra blank line
/// </summary> | ||
internal bool CanUseDescTableExtended => _useDescTableExtended && ServerProtocolVersion != null && ServerProtocolVersion >= TProtocolVersion.SPARK_CLI_SERVICE_PROTOCOL_V7; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra blank line
return string.Join(".", parts); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra blank line
return CreateExtendedColumnsResult(columnMetadataSchema,result); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra blank line
]; | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra blank line
using System.Text.RegularExpressions; | ||
using static Apache.Arrow.Adbc.Drivers.Apache.Hive2.HiveServer2Connection; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra blank lines here and at line 35
} | ||
} | ||
|
||
private int getIntervalSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private int getIntervalSize() | |
private int GetIntervalSize() |
} | ||
} | ||
|
||
private void parseConstraints() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private void parseConstraints() | |
private void ParseConstraints() |
// Constraints string format example: | ||
// "[ (pk_constraint, PRIMARY KEY (`col1`, `col2`)), (fk_constraint, FOREIGN KEY (`col3`) REFERENCES `catalog`.`schema`.`table` (`refcol1`, `refcol2`)) ]" | ||
|
||
var constraintString = TableConstraints.Trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when not null, we're trimming twice. Consider instead doing
var constraintString = TableConstraints?.Trim();
if (constraintString == null || constraintString.Length == 0)
Optimize GetColumnsExtendedAsync in Databricks via
DESC TABLE EXTENDED
Motivation
Currently,
HiveServer2Statement.GetColumnsExtendedAsync
will make 3 thrift callsGetColumns
,GetPrimaryKeys
andGetCrossReference
to get all the information and then join them into one result set. Now Databricks introduces a SQLDESC TABLE EXTENDED <table> AS JSON
that will return all of these information in one query, this can save 2 extra roundtrips and improve the performance.Description
Override
HiveServer2Statement.GetColumnsExtendedAsync
inDatabricksStatement
by executing single SQLDESC TABLE EXTENDED <table> AS JSON
to get all the required column info forGetColumnsExtendedAsync
then combine and join these info into the result. As this SQLDESC TABLE EXTENDED <table> AS JSON
is only available in Databricks Runtime 16.2 or above, it will check theServerProtocolVersion
. if it does not meet the requirement, it will fallback the implementation of base class.Change
DescTableExtendedResult
that represents the result ofDESC TABLE EXTENDED <table> AS JSON
(see format here). It alsotable_constraints
to the structured primary key and foreign key constraintsDataType
,ColumnSize
,FullTypeName
which are calculated from the column type and type specific propertiesHiveServer2Statement
to protected so that they can be used/overridden in subclassDatabricksStatement
PrimaryKeyFields
ForeignKeyFields
PrimaryKeyPrefix
ForeignKeyPrefix
GetColumnsExtendedAsync
CreateEmptyExtendedColumnsResult
DatabricksStatement
with the changes belowGetColumnsExtendedAsync
inDatabricksStatement
, it executes queryDESC TABLE EXTENDED <table> AS JSON
to get all the required info and then json them into the QueryResultGetColumnsAsync
to a reusable methodCreateColumnMetadataSchema
GetColumnsAsync
to a reusable methodCreateColumnMetadataEmptyArray
adbc.databricks.use_desc_table_extended
CanUseDescTableExtended
inDatabricksConnection
,DatabricksStatement
will call it to decide if it will overrideGetColumnsExtendedAsync
StatementTest:CanGetColumnsExtended
Testing
DescTableExtendedResultTest
to cover all the deserialization and parsing cases from raw result ofDESC TABLE EXTENDED <table> AS JSON
ColumnsExtended
relevant test cases inDatabricks/StatementTest