HIVE-24343: Table partition operations (create, drop, select) fail wh…#1640
HIVE-24343: Table partition operations (create, drop, select) fail wh…#1640vnhive wants to merge 2 commits intoapache:masterfrom
Conversation
| // parameters being set in a executed statement as a 2 byte signed integer to the servers. This indirectly | ||
| // sets s limit on the number of parameters that can be set to 32767. This is also the number of parameters | ||
| // that can be passed to the JDO executeArray call. | ||
| final int JDBC_STMT_PARAM_LIMIT = 32767; |
There was a problem hiding this comment.
2 thoughts on this
- Can we move this outside this method to the top of the class?
- I understand that based on the research all DBs support atleast 32767 parameters. But do we really know that it works? We dont have a test case for this that tests across all databases nor have we tested this, afaik. Also we are starting to support a pluggable model for DB support. So users can use HMS with other DBs as well for example DB2 (all they have to do is provide the schema scripts). So we don't know what their limits would be.
Would it make sense to use a smaller number? what is the downside of doing so?
There was a problem hiding this comment.
- I am OK with this. Will do this and resubmit the patch.
- But do we really know that it works? We dont have a test case for this that tests across all databases nor have we
tested this, afaik.
Agreed. I will look into creating a test. Insofar as the issue itself is concerned they seem to be documented limitations of the JDBC driver and the solution is to break the original query into multiple queries. Please do take a look at - https://www.postgresql.org/message-id/16832734.post%40talk.nabble.com
We dont have a test case for this that tests across all databases nor have we tested this, afaik.
Agreed. I can work on this.
Also we are starting to support a pluggable model for DB support. So users can use HMS with other DBs as well for
example DB2 (all they have to do is provide the schema scripts). So we don't know what their limits would be.
Would it make sense to use a smaller number? what is the downside of doing so?
Thank you for explaining about the pluggable model for DB support. We can consider a smaller size for the parameters that are passed to the query.
The downside of reducing the number of parameters that are passed would be that it can potentially result in increases the number of iterations required to get the result of the query for the same number of parameters.
for example if there are 30000 parameters, with a size of 32767, it can be executed in a single query. But if we set the size to 1000, it will result in 30000/1000 = 30 iterations.
This will result in increased latency.
It is important to note that this part of the code is executed for all DDL, DML and queries that involved partitioned tables. Hence it has potential to increase the latency across the spectrum.
| for (int i = 0; i < paramsForFilter.size(); ++i) { | ||
| params[i + 3] = paramsForFilter.get(i); | ||
| } | ||
| "select " + PARTITIONS + ".\"PART_ID\" from " + PARTITIONS + "" |
There was a problem hiding this comment.
re-indent this code to original spacing
There was a problem hiding this comment.
Will do ! Will make the change and resubmit.
| // If this method has been called by default we need to query the tables once. Hence set the minimum numbner of | ||
| // iterations to 1. | ||
| int iterations = 1; | ||
|
|
There was a problem hiding this comment.
remove extra blank lines. max of 1 blank line between code blocks.
There was a problem hiding this comment.
Will do ! Will make the change and resubmit.
| // Find the number of iterations required from the number of input parameters and sets of 32764 parameters | ||
| // that can be created from it. | ||
| iterations = parametersToProcess / (JDBC_STMT_PARAM_LIMIT - 3); | ||
| iterations += ((parametersToProcess % (JDBC_STMT_PARAM_LIMIT - 3) != 0) ? 1 : 0); |
There was a problem hiding this comment.
we should be able to merge these 2 statements. iterations is initialized to 1. Within this conditional block, if we just added the result of integer division to the value of iterations, should be good right? This code is only executed when the params is greater than the limit.
There was a problem hiding this comment.
I used his code to distinguish the case when the number of parameters is a multiple of JDBC_STMT_PARAM_LIMIT - 3, vs when it is not.
For example,
when parametersToProcess = 32764 * 2 => iterations = 2
However
when parametersToProcess = 32764 * 2 + 11 => iterations = 3
32764 parameters in the first iteration
32764 parameters in the second iteration
11 parameters in the third iteration
| } | ||
| result.add(MetastoreDirectSqlUtils.extractSqlLong(fields)); | ||
| } | ||
|
|
There was a problem hiding this comment.
Will do ! Will make the change and resubmit.
| break; | ||
| } | ||
| } | ||
| result.add(MetastoreDirectSqlUtils.extractSqlLong(fields)); |
There was a problem hiding this comment.
I dont think so but I am asking it anyways. Is it possible to have duplicates across sub-queries?
There was a problem hiding this comment.
This is the query that is executed,
select "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" = ? and ( (("FILTER0"."PART_KEY_VAL" = ?) or (("FILTER0"."PART_KEY_VAL" = ?) or (("FILTER0"."PART_KEY_VAL" = ?) or ("FILTER0"."PART_KEY_VAL" = ?)) ) ) )
We pass in unique values to PART_KEY_VAL, hence this ideally should not result in duplicates.
…en the number of partitions is greater than 32767 (signed int)
…en the number of partitions is greater than 32767 (signed int) Addressed comments
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
…en the number of partitions is greater than 32767 (signed int)
What changes were proposed in this pull request?
The implementation of getPartitionIdsViaSqlFilter should be extended to account for the limit of 32767
on the number of parameters to the underlying JDBC driver.
Why are the changes needed?
The table partition operations - create, drop, select access the underlying relation database using JDO,
which internally routes the operations through the JDBC driver. Most of the underlying JDBC driver
implementations place a limit on the number of parameters that can be passed through a statement
implementation. These limits should be taken into account when querying the underlying metastore.
Does this PR introduce any user-facing change?
The patch will ensure that if more than 32767 operations are being operated on (added, dropped or queried)
at the same time, it will not fail.
How was this patch tested?