HIVE-29482: Direct SQL fails when number of query parameters exceeds … #6347

dengzhhu653 wants to merge 5 commits into apache:master
Conversation
```java
  batchSize = dbType.needsInBatching() ? 1000 : NO_BATCHING;
}
this.batchSize = batchSize;
this.directSqlInsertPart = new DirectSqlInsertPart(pm, dbType, batchSize);
```
This makes Oracle, SQL Server and Postgres limit the batch size to 1000 now. You would probably want to look at `DatabaseProduct#getMaxRows()`?
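A minimal sketch of the clamping this comment hints at, assuming some per-DB cap (for instance the value returned by `DatabaseProduct#getMaxRows()` mentioned above). The class and method names here are illustrative, not Hive's actual API:

```java
// Illustrative helper (not Hive's actual API): clamp a configured batch size
// by a DB-specific cap, treating non-positive values as "no batching configured".
public class BatchSizing {
    static int effectiveBatchSize(int configured, int dbMax) {
        if (dbMax <= 0) {
            return configured;   // no DB-specific cap known
        }
        if (configured <= 0) {
            return dbMax;        // NO_BATCHING: fall back to the cap
        }
        return Math.min(configured, dbMax);
    }

    public static void main(String[] args) {
        System.out.println(effectiveBatchSize(1000, 2000)); // configured value wins
        System.out.println(effectiveBatchSize(-1, 2000));   // cap wins for NO_BATCHING
    }
}
```

This avoids hard-coding 1000 for every database that needs IN-clause batching.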
Thanks for the comment. I added the enforcement check in some other places as well.
Pull request overview
This PR aims to prevent metastore direct-SQL/JDO operations from failing when the number of bound query parameters exceeds backend limits (notably PostgreSQL), by adjusting batching behavior based on the database product.
Changes:
- Introduces `DatabaseProduct.getMaxBatch` and applies it to many `Batchable.runBatched(...)` call sites.
- Updates several direct-SQL/JDO partition-related queries to run in (intended) safer batches.
- Refactors partition-by-name direct SQL to use parameter placeholders instead of inlined quoted strings.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/MetaToolObjectStore.java | Switches metadata-summary queries to use db-specific batching limits. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java | Uses db-specific batching for JDO partition operations and column stats retrieval. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java | Applies db-specific batching to multiple direct-SQL partition operations and parameterizes partition-name queries. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlAggrStats.java | Adds a dbType field and attempts to batch aggregate stats queries by DB parameter limit. |
| standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java | Adds PostgreSQL parameter-limit logic and a new getMaxBatch helper. |
Comments suppressed due to low confidence (2)
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:2310
In `dropAllPartitionsAndGetLocations`, `batch` is now passed as the `Batchable.runBatched` batch size, but the progress message treats it as the total number of batches (`batchIdx/batch`, and remaining-time estimation using `batch - batchIdx`). This makes progress reporting incorrect and can produce negative/meaningless remaining-time estimates. Compute `totalBatches` separately from the chosen batch size (e.g., `ceil(partIds.size()/batchSize)`).

```java
int batch = dbType.getMaxBatch(batchSize, partIds.size());
AtomicLong batchIdx = new AtomicLong(1);
AtomicLong timeSpent = new AtomicLong(0);
Batchable.runBatched(batch, partIds, new Batchable<Long, Void>() {
  @Override
  public List<Void> run(List<Long> input) throws Exception {
    StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
    long start = System.currentTimeMillis();
    progress.append(batchIdx.get()).append("/").append(batch);
    if (batchIdx.get() > 1) {
      long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
      progress.append(", time left: ").append(leftTime).append("ms");
    }
```
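The fix this comment asks for can be sketched as follows. The helper names are hypothetical; the ceiling division mirrors the suggested `ceil(partIds.size()/batchSize)`:

```java
// Sketch: derive the progress denominator from the item count and the chosen
// batch size, instead of reusing the batch size itself as "total batches".
public class ProgressMath {
    static long totalBatches(long itemCount, long batchSize) {
        if (batchSize <= 0) {
            return 1;                                    // NO_BATCHING: one query
        }
        return (itemCount + batchSize - 1) / batchSize;  // ceil division
    }

    static long timeLeftMs(long totalBatches, long batchIdx, long timeSpentMs) {
        // Remaining batches times the average time per completed batch.
        return (totalBatches - batchIdx) * timeSpentMs / batchIdx;
    }

    public static void main(String[] args) {
        System.out.println(totalBatches(2500, 1000)); // 3
        System.out.println(timeLeftMs(3, 2, 4000));   // 2000
    }
}
```

The progress message would then print `batchIdx + "/" + totalBatches`, and the time-left estimate stays non-negative because `batchIdx` never exceeds `totalBatches`.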
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:3011
`batch` is now used as the `Batchable.runBatched` batch size, but the progress message treats it as the total number of batches (e.g., `batchIdx/batch` and remaining-time estimation using `batch - batchIdx`). This will make progress reporting incorrect and can produce negative/meaningless remaining time. Track `batchSize` and `totalBatches` separately (compute `totalBatches` from `partNames.size()` and the chosen batch size).

```java
int batch = dbType.getMaxBatch(batchSize, partNames.size() + 3);
AtomicLong batchIdx = new AtomicLong(1);
AtomicLong timeSpent = new AtomicLong(0);
try {
  Batchable.runBatched(batch, partNames, new Batchable<String, Void>() {
    @Override
    public List<Void> run(List<String> input) throws MetaException {
      StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
      long start = System.currentTimeMillis();
      progress.append(batchIdx.get()).append("/").append(batch);
      if (batchIdx.get() > 1) {
        long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
        progress.append(", time left: ").append(leftTime).append("ms");
      }
```
```java
int minBatch = batch;
if (isSQLSERVER()) {
  minBatch = (totalParameters + 2100) / 2100;
} else if (isPOSTGRES()) {
  minBatch = (totalParameters + 32767) / 32767;
}
return batch <= 0 ? minBatch : Math.max(batch, minBatch);
```
`getMaxBatch` currently computes a value that looks like a number of batches (`ceil(totalParameters/limit)`) and is then used as the `Batchable.runBatched` batch size throughout the PR. For `batchSize == NO_BATCHING` (-1) this will often return 1, turning what used to be a single query into N single-item queries (a major performance regression) while still not enforcing the per-statement parameter limit. Rework this helper to return a safe batch size (max items per batch) based on the DB parameter limit and the fixed/variable parameter counts, and ensure it never returns 0.
Suggested change (replacing the lines above):

```java
// totalParameters is treated as the per-item parameter count. Compute the maximum
// number of items we can safely include in a single batch without exceeding the
// database's per-statement parameter limit.
int maxItemsPerBatch;
if (totalParameters <= 0) {
  // If we cannot determine per-item parameter usage, fall back to the configured
  // batch size without applying DB-specific limits.
  return batch;
}
if (isSQLSERVER()) {
  // SQL Server supports a maximum of 2100 parameters in a request.
  maxItemsPerBatch = 2100 / totalParameters;
} else if (isPOSTGRES()) {
  // PostgreSQL supports a maximum of 32767 parameters in a request.
  maxItemsPerBatch = 32767 / totalParameters;
} else {
  // No known hard parameter limit; use the configured batch size as-is.
  return batch;
}
// Ensure we never return 0; at least one item per batch is always allowed.
if (maxItemsPerBatch <= 0) {
  maxItemsPerBatch = 1;
}
// If batch <= 0 (e.g., NO_BATCHING), choose the largest safe batch size that
// respects the DB parameter limit. Otherwise, cap the requested batch size at
// the safe maximum.
if (batch <= 0) {
  return maxItemsPerBatch;
}
return Math.min(batch, maxItemsPerBatch);
```
This is to calculate the minimum number of batches needed given the total query parameters, not a per-row parameter count.
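Under that reading, a hedged sketch of the two-step computation: first the minimum number of batches implied by the total parameter count and the per-statement limit (2100 for SQL Server, 32767 for PostgreSQL, as in the hunk above), then the items-per-batch size that `Batchable.runBatched` actually expects. Method names here are illustrative, not the PR's actual code:

```java
public class ParamBudget {
    // Minimum number of batches so that no single statement exceeds the limit.
    static int minBatches(int totalParameters, int paramLimit) {
        return Math.max(1, (totalParameters + paramLimit - 1) / paramLimit); // ceil
    }

    // Convert a batch count back into the items-per-batch size runBatched expects.
    static int itemsPerBatch(int itemCount, int numBatches) {
        return Math.max(1, (itemCount + numBatches - 1) / numBatches);       // ceil
    }

    public static void main(String[] args) {
        int batches = minBatches(70000, 32767);            // 3 batches needed
        System.out.println(batches);
        System.out.println(itemsPerBatch(70000, batches)); // 23334 items per batch
    }
}
```

Converting the batch count into an items-per-batch value is the missing step the review comment points at: `runBatched` consumes items per batch, not a batch count.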
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:3009
`batch` is now passed as the `batchSize` to `Batchable.runBatched`, but the same variable is also used as the "total batches" denominator in the progress message (`batchIdx/.../batch`) and in the time-left estimate. If `batch` is meant to be items-per-batch (as `runBatched` expects), the progress output and time-left math become incorrect.
Compute and use two separate values: an effective `batchSize` for `runBatched`, and a `totalBatches` value derived from `partNames.size()` and that batch size for progress reporting.
```java
int batch = dbType.getMaxBatch(batchSize, partNames.size() + 3);
AtomicLong batchIdx = new AtomicLong(1);
AtomicLong timeSpent = new AtomicLong(0);
try {
  Batchable.runBatched(batch, partNames, new Batchable<String, Void>() {
    @Override
    public List<Void> run(List<String> input) throws MetaException {
      StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
      long start = System.currentTimeMillis();
      progress.append(batchIdx.get()).append("/").append(batch);
      if (batchIdx.get() > 1) {
        long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
```
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:2310
`batch` is used both as the `Batchable.runBatched` batch size and as the "total batches" value in the progress message/time-left calculation (`batchIdx/.../batch`). If `batch` represents items-per-batch, the progress output and time-left estimate will be incorrect.
Use an effective `batchSize` for `runBatched`, and compute `totalBatches` from `partIds.size()` and that batch size for the progress denominator.
```java
int batch = dbType.getMaxBatch(batchSize, partIds.size());
AtomicLong batchIdx = new AtomicLong(1);
AtomicLong timeSpent = new AtomicLong(0);
Batchable.runBatched(batch, partIds, new Batchable<Long, Void>() {
  @Override
  public List<Void> run(List<Long> input) throws Exception {
    StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
    long start = System.currentTimeMillis();
    progress.append(batchIdx.get()).append("/").append(batch);
    if (batchIdx.get() > 1) {
      long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
      progress.append(", time left: ").append(leftTime).append("ms");
    }
```
```java
int batch = dbType.getMaxBatch(batchSize, colNames.size() + partNames.size() + 4);
return Batchable.runBatched(batch, colNames, new Batchable<String, ColumnStatisticsObj>() {
  @Override
  public List<ColumnStatisticsObj> run(final List<String> inputColNames) throws MetaException {
    return columnStatisticsObjForPartitionsBatch(catName, dbName, tableName,
        partNames, inputColNames, engine, areAllPartsFound,
        useDensityFunctionForNDVEstimation, ndvTuner, enableBitVector, enableKll);
  }
});
}

/**
 * Should be called with the list short enough to not trip up Oracle/etc.
 */
private List<ColumnStatisticsObj> columnStatisticsObjForPartitionsBatch(
    String catName, String dbName, String tableName,
    List<String> partNames, List<String> colNames, String engine,
    boolean areAllPartsFound, boolean useDensityFunctionForNDVEstimation,
    double ndvTuner, boolean enableBitVector, boolean enableKll) throws MetaException {
  if (enableBitVector || enableKll) {
    return aggrStatsUseJava(catName, dbName, tableName, partNames,
        colNames, engine, areAllPartsFound, useDensityFunctionForNDVEstimation,
        ndvTuner, enableBitVector, enableKll);
  } else {
    return aggrStatsUseDB(catName, dbName, tableName, partNames, colNames, engine,
        useDensityFunctionForNDVEstimation, ndvTuner);
  }
}
```
`columnStatisticsObjForPartitions` computes a DB-specific batch size but only applies it to batching over `colNames`. The SQL built in `aggrStatsUseDB` uses both `inputColNames` and the full `partNames` list as bind parameters (`... in (%1$s) ... in (%2$s)` plus 4 fixed params). If `batchSize` is `NO_BATCHING` (the default for Postgres today), the unbatched `partNames` can still exceed PostgreSQL's 32767-parameter limit, so this change doesn't fully address the failure mode.
To actually enforce the parameter limit, apply batching to the `partNames` dimension as well (or derive `inputColNames` and `inputPartNames` batch sizes from the same per-statement parameter budget, accounting for the 4 fixed params).
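One way to read this suggestion, sketched with the 32767 PostgreSQL limit and the 4 fixed parameters mentioned above (the helper name and the split policy are illustrative assumptions): once a column-name batch size is fixed, whatever parameter budget remains bounds the partition-name batch for the same statement.

```java
public class AggrStatsBudget {
    // Given a per-statement parameter limit, the fixed parameter count, and the
    // chosen column-name batch size, return how many partition names can safely
    // be bound in the same statement.
    static int partNameBatch(int paramLimit, int fixedParams, int colBatch) {
        int remaining = paramLimit - fixedParams - colBatch;
        return Math.max(1, remaining);   // never return 0: one item per batch minimum
    }

    public static void main(String[] args) {
        // e.g. PostgreSQL: 32767 total, 4 fixed, 100 column names per batch
        System.out.println(partNameBatch(32767, 4, 100)); // 32663
    }
}
```

Batching both IN-lists against a shared budget keeps every generated statement under the backend's hard parameter limit regardless of the table's partition count.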



What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?