
HIVE-29482: Direct SQL fails when number of query parameters exceeds PostgreSQL limit #6347

Open
dengzhhu653 wants to merge 5 commits into apache:master from dengzhhu653:HIVE-29482

Conversation

@dengzhhu653
Member

…PostgreSQL limit

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@sonarqubecloud

sonarqubecloud bot commented Mar 4, 2026

      batchSize = dbType.needsInBatching() ? 1000 : NO_BATCHING;
    }
    this.batchSize = batchSize;
    this.directSqlInsertPart = new DirectSqlInsertPart(pm, dbType, batchSize);
Contributor

@VenuReddy2103 Mar 4, 2026


This makes Oracle, SQL Server, and Postgres limit the batch size to 1000 now. You would probably want to look at DatabaseProduct#getMaxRows()?

Member Author


Thanks for the comment. I added the enforcement check in some other places as well.


Copilot AI left a comment


Pull request overview

This PR aims to prevent metastore direct-SQL/JDO operations from failing when the number of bound query parameters exceeds backend limits (notably PostgreSQL), by adjusting batching behavior based on the database product.

Changes:

  • Introduces DatabaseProduct.getMaxBatch and applies it to many Batchable.runBatched(...) call sites.
  • Updates several direct-SQL/JDO partition-related queries to run in (intended) safer batches.
  • Refactors partition-by-name direct SQL to use parameter placeholders instead of inlined quoted strings.
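The third bullet's idea — binding values as placeholders rather than concatenating quoted strings into the SQL text — can be sketched as follows. This is an illustrative contrast, not code from the PR; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

public class PlaceholderDemo {
    // Builds an IN clause with one '?' placeholder per partition name,
    // so the values are bound as JDBC parameters instead of being
    // inlined into the SQL string (which is what counts against the
    // backend's per-statement parameter limit).
    static String inClause(List<String> partNames) {
        String placeholders = partNames.stream()
            .map(n -> "?")
            .collect(Collectors.joining(", "));
        return "\"PART_NAME\" in (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(inClause(List.of("ds=2026-03-04", "ds=2026-03-05")));
    }
}
```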

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

  • standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/MetaToolObjectStore.java: Switches metadata-summary queries to use db-specific batching limits.
  • standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java: Uses db-specific batching for JDO partition operations and column stats retrieval.
  • standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: Applies db-specific batching to multiple direct-SQL partition operations and parameterizes partition-name queries.
  • standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlAggrStats.java: Adds a dbType field and attempts to batch aggregate stats queries by the DB parameter limit.
  • standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java: Adds PostgreSQL parameter-limit logic and a new getMaxBatch helper.
Comments suppressed due to low confidence (2)

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:2310

  • In dropAllPartitionsAndGetLocations, batch is now passed as the Batchable.runBatched batch size, but the progress message treats it as the total number of batches (batchIdx/batch and remaining-time estimation using batch - batchIdx). This makes progress reporting incorrect and can produce negative/meaningless remaining-time estimates. Compute totalBatches separately from the chosen batch size (e.g., ceil(partIds.size()/batchSize)).
    int batch = dbType.getMaxBatch(batchSize, partIds.size());
    AtomicLong batchIdx = new AtomicLong(1);
    AtomicLong timeSpent = new AtomicLong(0);
    Batchable.runBatched(batch, partIds, new Batchable<Long, Void>() {
      @Override
      public List<Void> run(List<Long> input) throws Exception {
        StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
        long start = System.currentTimeMillis();
        progress.append(batchIdx.get()).append("/").append(batch);
        if (batchIdx.get() > 1) {
          long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
          progress.append(", time left: ").append(leftTime).append("ms");
        }
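The fix the reviewer describes — deriving the progress denominator from the list size and the chosen batch size instead of reusing the batch size itself — can be sketched as follows. The class and method names are illustrative, not from the PR:

```java
public class BatchProgress {
    // ceil(itemCount / batchSize); a batchSize <= 0 means "no batching",
    // i.e. everything goes into a single batch.
    static int totalBatches(int itemCount, int batchSize) {
        if (batchSize <= 0) {
            return 1;
        }
        return (itemCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // 2500 partition ids with a batch size of 1000 -> 3 batches,
        // which is the value the progress message should print after "/".
        System.out.println(totalBatches(2500, 1000));
        // NO_BATCHING (-1): one batch regardless of list size.
        System.out.println(totalBatches(2500, -1));
    }
}
```

The remaining-time estimate would then use `totalBatches - batchIdx` rather than `batch - batchIdx`, which can go negative when `batch` is an items-per-batch value smaller than the batch index.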

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:3011

  • batch is now used as the Batchable.runBatched batch size, but the progress message treats it as the total number of batches (e.g., batchIdx/batch and remaining-time estimation using batch - batchIdx). This will make progress reporting incorrect and can produce negative/meaningless remaining time. Track batchSize and totalBatches separately (compute totalBatches from partNames.size() and the chosen batch size).
    int batch = dbType.getMaxBatch(batchSize, partNames.size() + 3);
    AtomicLong batchIdx = new AtomicLong(1);
    AtomicLong timeSpent = new AtomicLong(0);
    try {
      Batchable.runBatched(batch, partNames, new Batchable<String, Void>() {
        @Override
        public List<Void> run(List<String> input) throws MetaException {
          StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
          long start = System.currentTimeMillis();
          progress.append(batchIdx.get()).append("/").append(batch);
          if (batchIdx.get() > 1) {
            long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
            progress.append(", time left: ").append(leftTime).append("ms");
          }


Comment on lines +863 to +869

    int minBatch = batch;
    if (isSQLSERVER()) {
      minBatch = (totalParameters + 2100) / 2100;
    } else if (isPOSTGRES()) {
      minBatch = (totalParameters + 32767) / 32767;
    }
    return batch <= 0 ? minBatch : Math.max(batch, minBatch);
Copy link

Copilot AI Mar 5, 2026


getMaxBatch currently computes a value that looks like a number of batches (ceil(totalParameters/limit)) and then is used as the Batchable.runBatched batch size throughout the PR. For batchSize == NO_BATCHING (-1) this will often return 1, turning what used to be a single query into N single-item queries (major performance regression) and still not enforcing the per-statement parameter limit. Rework this helper to return a safe batch size (max items per batch) based on the DB parameter limit and fixed/variable parameter counts, and ensure it never returns 0.

Suggested change

    int minBatch = batch;
    if (isSQLSERVER()) {
      minBatch = (totalParameters + 2100) / 2100;
    } else if (isPOSTGRES()) {
      minBatch = (totalParameters + 32767) / 32767;
    }
    return batch <= 0 ? minBatch : Math.max(batch, minBatch);

replaced with:

    // totalParameters is treated as the per-item parameter count. Compute the maximum
    // number of items we can safely include in a single batch without exceeding the
    // database's per-statement parameter limit.
    int maxItemsPerBatch;
    if (totalParameters <= 0) {
      // If we cannot determine per-item parameter usage, fall back to the configured
      // batch size without applying DB-specific limits.
      return batch;
    }
    if (isSQLSERVER()) {
      // SQL Server supports a maximum of 2100 parameters in a request.
      maxItemsPerBatch = 2100 / totalParameters;
    } else if (isPOSTGRES()) {
      // PostgreSQL supports a maximum of 32767 parameters in a request.
      maxItemsPerBatch = 32767 / totalParameters;
    } else {
      // No known hard parameter limit; use the configured batch size as-is.
      return batch;
    }
    // Ensure we never return 0; at least one item per batch is always allowed.
    if (maxItemsPerBatch <= 0) {
      maxItemsPerBatch = 1;
    }
    // If batch <= 0 (e.g., NO_BATCHING), choose the largest safe batch size that
    // respects the DB parameter limit. Otherwise, cap the requested batch size at
    // the safe maximum.
    if (batch <= 0) {
      return maxItemsPerBatch;
    }
    return Math.min(batch, maxItemsPerBatch);

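To make the flagged regression concrete, here is a minimal sketch of the current formula's behavior under NO_BATCHING. The class and method names are illustrative; the constants mirror the diff above:

```java
public class MaxBatchDemo {
    // Mirrors the PR's POSTGRES branch: computes a number of batches
    // (roughly ceil(totalParameters / 32767)), which the call sites then
    // pass to Batchable.runBatched as if it were a batch size.
    static int currentGetMaxBatch(int batch, int totalParameters) {
        int minBatch = (totalParameters + 32767) / 32767;
        return batch <= 0 ? minBatch : Math.max(batch, minBatch);
    }

    public static void main(String[] args) {
        // 500 total parameters with NO_BATCHING (-1): returns 1, so
        // runBatched would issue 500 single-item queries instead of one.
        System.out.println(currentGetMaxBatch(-1, 500));
    }
}
```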
Member Author


This is to calculate the minimum number of batches needed given the total query parameters, not a per-row parameter count.


Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.




Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:3009

  • batch is now passed as the batchSize to Batchable.runBatched, but the same variable is also used as the “total batches” denominator in the progress message (batchIdx/.../batch) and in the time-left estimate. If batch is meant to be items-per-batch (as runBatched expects), the progress output and time-left math become incorrect.

Compute and use two separate values: an effective batchSize for runBatched, and a totalBatches value derived from partNames.size() and that batch size for progress reporting.

    int batch = dbType.getMaxBatch(batchSize, partNames.size() + 3);
    AtomicLong batchIdx = new AtomicLong(1);
    AtomicLong timeSpent = new AtomicLong(0);
    try {
      Batchable.runBatched(batch, partNames, new Batchable<String, Void>() {
        @Override
        public List<Void> run(List<String> input) throws MetaException {
          StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
          long start = System.currentTimeMillis();
          progress.append(batchIdx.get()).append("/").append(batch);
          if (batchIdx.get() > 1) {
            long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java:2310

  • batch is used both as the Batchable.runBatched batch size and as the “total batches” value in the progress message/time-left calculation (batchIdx/.../batch). If batch represents items-per-batch, the progress output and time-left estimate will be incorrect.

Use an effective batchSize for runBatched, and compute totalBatches from partIds.size() and that batch size for the progress denominator.

    int batch = dbType.getMaxBatch(batchSize, partIds.size());
    AtomicLong batchIdx = new AtomicLong(1);
    AtomicLong timeSpent = new AtomicLong(0);
    Batchable.runBatched(batch, partIds, new Batchable<Long, Void>() {
      @Override
      public List<Void> run(List<Long> input) throws Exception {
        StringBuilder progress = new StringBuilder("Dropping partitions, batch: ");
        long start = System.currentTimeMillis();
        progress.append(batchIdx.get()).append("/").append(batch);
        if (batchIdx.get() > 1) {
          long leftTime = (batch - batchIdx.get()) * timeSpent.get() / batchIdx.get();
          progress.append(", time left: ").append(leftTime).append("ms");
        }


Comment on lines +188 to +200

    int batch = dbType.getMaxBatch(batchSize, colNames.size() + partNames.size() + 4);
    return Batchable.runBatched(batch, colNames, new Batchable<String, ColumnStatisticsObj>() {
      @Override
      public List<ColumnStatisticsObj> run(final List<String> inputColNames) throws MetaException {
        return columnStatisticsObjForPartitionsBatch(catName, dbName, tableName,
            partNames, inputColNames, engine, areAllPartsFound,
            useDensityFunctionForNDVEstimation, ndvTuner, enableBitVector, enableKll);
      }
    });
  }

  /**
   * Should be called with the list short enough to not trip up Oracle/etc.
   */
  private List<ColumnStatisticsObj> columnStatisticsObjForPartitionsBatch(
      String catName, String dbName, String tableName,
      List<String> partNames, List<String> colNames, String engine,
      boolean areAllPartsFound, boolean useDensityFunctionForNDVEstimation,
      double ndvTuner, boolean enableBitVector, boolean enableKll) throws MetaException {
    if (enableBitVector || enableKll) {
      return aggrStatsUseJava(catName, dbName, tableName, partNames,
          colNames, engine, areAllPartsFound, useDensityFunctionForNDVEstimation,
          ndvTuner, enableBitVector, enableKll);
    } else {
      return aggrStatsUseDB(catName, dbName, tableName, partNames, colNames, engine,
          useDensityFunctionForNDVEstimation, ndvTuner);
    }
  }

Copilot AI Mar 5, 2026


columnStatisticsObjForPartitions computes a DB-specific batch size but only applies it to batching over colNames. The SQL built in aggrStatsUseDB uses both inputColNames and the full partNames list as bind parameters (... in (%1$s) ... in (%2$s) plus 4 fixed params). If batchSize is NO_BATCHING (the default for Postgres today), the unbatched partNames can still exceed PostgreSQL’s 32767 parameter limit, so this change doesn’t fully address the failure mode.

To actually enforce the parameter limit, apply batching to the partNames dimension as well (or derive inputColNames and inputPartNames batch sizes from the same per-statement parameter budget, accounting for the 4 fixed params).
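A minimal sketch of the parameter-budget idea the reviewer describes, assuming 4 fixed parameters and one bind parameter per name. The class, method, and constant names are illustrative, not from the PR:

```java
public class ParamBudget {
    static final int FIXED_PARAMS = 4;   // fixed bind params in the aggr-stats query
    static final int PG_LIMIT = 32767;   // PostgreSQL per-statement parameter limit

    // Given how many column names are bound in this statement, compute how
    // many partition names can still fit under the shared parameter budget.
    static int maxPartNamesPerBatch(int colBatchSize) {
        int remaining = PG_LIMIT - FIXED_PARAMS - colBatchSize;
        // At least one partition name per batch, even if the budget is blown.
        return Math.max(1, remaining);
    }

    public static void main(String[] args) {
        // With 100 column names bound, 32663 partition names still fit.
        System.out.println(maxPartNamesPerBatch(100));
    }
}
```

Batching over partNames with this bound, in addition to the existing colNames batching, would keep every generated statement under the limit even when batchSize is NO_BATCHING.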
