branch-3.1: [fix](cloud) modify some partition getVisibleVersion to getCachedVisibleVersion (#60072)#61038
branch-3.1: [fix](cloud) modify some partition getVisibleVersion to getCachedVisibleVersion (#60072)#61038mymeiyi wants to merge 1 commit intoapache:branch-3.1from
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
Backports #60072 to branch-3.1 to reduce cloud-mode reliance on live partition visible-version lookups by switching several call sites from Partition#getVisibleVersion() to Partition#getCachedVisibleVersion().
Changes:
- Replace
getVisibleVersion()withgetCachedVisibleVersion()in multiple FE code paths (diagnosis, metadata viewing, row count / sampling, proximate row count). - Simplify
CloudReplica#checkVersionCatchUp()to an unconditionaltrueimplementation (and remove now-unusedPartitionimport). - Remove partition-cache update request building logic from
RowBatchBuilderand aPartitionRange.PartitionSingle#debug()helper.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/system/Diagnoser.java | Uses cached partition version during tablet diagnosis. |
| fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java | Uses cached partition version when estimating tablet rows for sampling. |
| fe/fe-core/src/main/java/org/apache/doris/qe/cache/RowBatchBuilder.java | Removes partition-cache update request builder method. |
| fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionRange.java | Removes PartitionSingle.debug() logging helper. |
| fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/RowCountAction.java | Uses cached partition version when computing row counts. |
| fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java | Makes checkVersionCatchUp() always return true; removes Partition import and redundant logic. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java | Uses cached partition version in proximateRowCount(). |
| fe/fe-core/src/main/java/org/apache/doris/catalog/MetadataViewer.java | Uses cached partition version when building replica status output. |
Comments suppressed due to low confidence (1)
fe/fe-core/src/main/java/org/apache/doris/system/Diagnoser.java:156
- In cloud mode,
CloudPartition.getCachedVisibleVersion()may be-1(unknown) until a meta-service fetch occurs. With this change,visibleVersioncan be-1, and the subsequentreplica.getVersion() != visibleVersioncheck will always report a version mismatch for every replica, makingReplicaVersionStatusmisleading/noisy in cloud. Consider skipping the replica-vs-partition version equality check whenConfig.isCloudMode()(or whenvisibleVersion < Partition.PARTITION_INIT_VERSION), or explicitly fetchingpartition.getVisibleVersion()only for this diagnostic path if accurate partition version is required.
// for local mode, getCachedVisibleVersion return visibleVersion.
// for cloud mode, the replica version is not updated.
long visibleVersion = partition.getCachedVisibleVersion();
for (Replica replica : replicas) {
// backend
do {
Backend be = infoService.getBackend(replica.getBackendIdWithoutException());
if (be == null) {
backendErr.append("Backend "
+ replica.getBackendIdWithoutException() + " does not exist. ");
break;
}
if (!be.isAlive()) {
backendErr.append("Backend " + replica.getBackendIdWithoutException() + " is not alive. ");
break;
}
if (be.isDecommissioned()) {
backendErr.append("Backend " + replica.getBackendIdWithoutException() + " is decommission. ");
break;
}
if (!be.isLoadAvailable()) {
backendErr.append("Backend " + replica.getBackendIdWithoutException() + " is not load available. ");
break;
}
if (!be.isQueryAvailable()) {
backendErr.append("Backend "
+ replica.getBackendIdWithoutException() + " is not query available. ");
break;
}
if (be.diskExceedLimit()) {
backendErr.append("Backend " + replica.getBackendIdWithoutException() + " has no space left. ");
break;
}
} while (false);
// version
if (replica.getVersion() != visibleVersion) {
versionErr.append("Replica on backend " + replica.getBackendIdWithoutException() + "'s version ("
+ replica.getVersion() + ") does not equal"
+ " to partition visible version (" + visibleVersion + ")");
} else if (replica.getLastFailedVersion() != -1) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| } | ||
| } | ||
| return key; | ||
| } | ||
|
|
||
| /** | ||
| * Rowbatch split to Row | ||
| */ | ||
| public InternalService.PUpdateCacheRequest buildPartitionUpdateRequest(String sql) { | ||
| if (updateRequest == null) { | ||
| updateRequest = InternalService.PUpdateCacheRequest.newBuilder() | ||
| .setSqlKey(CacheProxy.getMd5(sql)) | ||
| .setCacheType(InternalService.CacheType.PARTITION_CACHE).build(); | ||
| } | ||
| HashMap<Long, List<byte[]>> partRowMap = new HashMap<>(); | ||
| List<byte[]> partitionRowList; | ||
| PartitionRange.PartitionKeyType cacheKey; | ||
| for (byte[] row : rowList) { | ||
| cacheKey = getKeyFromRow(row, keyIndex, keyType); | ||
| if (!cachePartMap.containsKey(cacheKey.realValue())) { | ||
| LOG.info("cant find partition key {}", cacheKey.realValue()); | ||
| continue; | ||
| } | ||
| if (!partRowMap.containsKey(cacheKey.realValue())) { | ||
| partitionRowList = Lists.newArrayList(); | ||
| partitionRowList.add(row); | ||
| partRowMap.put(cacheKey.realValue(), partitionRowList); | ||
| } else { | ||
| partRowMap.get(cacheKey.realValue()).add(row); | ||
| } | ||
| } | ||
|
|
||
| for (HashMap.Entry<Long, List<byte[]>> entry : partRowMap.entrySet()) { | ||
| Long key = entry.getKey(); | ||
| PartitionRange.PartitionSingle partition = cachePartMap.get(key); | ||
| partitionRowList = entry.getValue(); | ||
| updateRequest = updateRequest.toBuilder() | ||
| .addValues(InternalService.PCacheValue.newBuilder() | ||
| .setParam(InternalService.PCacheParam.newBuilder() | ||
| .setPartitionKey(key) | ||
| .setLastVersion(partition.getPartition().getVisibleVersion()) | ||
| .setLastVersionTime(partition.getPartition().getVisibleVersionTime()) | ||
| .build()).setDataSize(dataSize).addAllRows( | ||
| partitionRowList.stream().map(ByteString::copyFrom) | ||
| .collect(Collectors.toList()))).build(); | ||
| } | ||
| return updateRequest; | ||
| } | ||
| } |
There was a problem hiding this comment.
buildPartitionUpdateRequest() was removed from RowBatchBuilder, but it is still referenced by PartitionCache.updateCache() (see PartitionCache.java calling rowBatchBuilder.buildPartitionUpdateRequest(...)). This will cause a compilation failure and also breaks partition cache update functionality. Either restore this method (updated to use getCachedVisibleVersion() as intended) or change PartitionCache to use the new/alternative update-request builder API.
pick #60072