[improvement](maxcompute) Simplify FE block ID requests for MaxCompute writes#62880
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found two issues that should be addressed before this PR merges.
- The new
max_compute_write_max_block_countsetting turns the previously hard FE-side block-id guard into an unconstrained runtime knob. If this is set above MaxCompute's actual per-session block cap, FE will now allocate out-of-contract block IDs and push the failure to the writer/runtime path. MaxComputeFeClientnow opens a fresh thrift socket for every block-id request. Because block IDs are requested on every writer rotation, this regresses the hot write path and increases FE connection churn compared with the old cached-client path.
Critical checkpoint conclusions:
- Goal of the task: move MaxCompute block-id allocation from the BE JNI native bridge into the Java connector and keep the FE allocation semantics. The refactor mostly achieves that, and
MaxComputeFeClientTestcovers basic success/redirect/error retry behavior, but the two issues above mean the new path is not yet equivalent/safe enough. - Change scope: the JNI cleanup and direct FE client are reasonably focused, but the config change also widens a previously hard-enforced guard.
- Concurrency: no new cross-thread locking issue was found in the touched code.
MCTransaction.allocateBlockIdRange()still relies onAtomicLong, and the new FE client is used per writer instance. - Lifecycle/static init: no special lifecycle or static-initialization issue found in the reviewed paths.
- Configuration: a new FE config was added. Its current shape is problematic because it makes the external block-id cap operator-configurable without preserving the hard upper bound.
- Compatibility/incompatible change: no new storage-format or FE/BE thrift compatibility issue was found in the touched fields.
- Parallel code paths: I did not find another live MaxCompute block-id allocation path after the JNI method removal.
- Special conditions: NOT_MASTER redirect handling is still present, but the transport implementation changed enough to introduce the connection-churn regression above.
- Test coverage: good unit coverage for the new redirect/retry logic; missing coverage for invalid block-count configuration and repeated block-allocation hot-path behavior.
- Test result updates: none in this PR.
- Observability: existing logs are sufficient for debugging this path.
- Transaction/persistence/data-write correctness: block IDs are still allocated from FE transaction state, but allowing the limit to exceed the external cap can break write correctness.
- Performance: per-request socket creation on block allocation is a hot-path regression.
- Other issues: no additional distinct issues found.
- User focus:
/tmp/review_focus.txtcontained no additional review focus, so there were no extra focus-specific findings.
| */ | ||
| @ConfField(mutable = true, masterOnly = true, description = { | ||
| "Maximum number of MaxCompute Storage API write block IDs that can be allocated in one write session."}) | ||
| public static long max_compute_write_max_block_count = 20000L; |
There was a problem hiding this comment.
PR #62578 made 20000 the FE-side guard for MaxCompute's per-session block-id cap. After turning that into an unconstrained config here, setting max_compute_write_max_block_count > 20000 lets allocateBlockIdRange() hand out block IDs that the previous FE-side guard explicitly refused. That moves the failure from deterministic FE validation to a later write-time error after data has already been produced. Can we keep 20000 as a hard upper bound and only make a lower soft cap configurable (or validate/clamp the config value before it is used)?
|
|
||
| private static <T> T callFe(TNetworkAddress address, int timeoutMs, | ||
| boolean useFramedTransport, FeCall<T> call) throws Exception { | ||
| TSocket socket = new TSocket(address.getHostname(), address.getPort(), timeoutMs); |
There was a problem hiding this comment.
Before this refactor, block-id allocation went through ThriftRpcHelper::rpc<FrontendServiceClient>(), so repeated requests reused Doris' FE client cache. callFe() now creates and closes a brand-new TSocket/TTransport for every requestBlockId(). Since requestBlockId() runs on every writer rotation (rotateCurrentBatchWriter()), small block sizes or many partitions now turn into one TCP connection per block. That is a hot-path regression and increases FE socket churn. Please keep a reusable client/transport per MaxComputeFeClient (or another pooled helper) instead of reconnecting for every allocation.
|
/review |
|
run buildall |
There was a problem hiding this comment.
Besides the two already-open review threads, I found one additional blocking issue.
Critical checkpoints
- Goal / correctness: The refactor mostly achieves the FE-direct block-id path, but it does not preserve availability across FE master failover. The added tests only prove steady-state success /
NOT_MASTERredirect. - Scope: The change is reasonably focused around MaxCompute block-id allocation.
- Concurrency:
MaxComputeFeClientis synchronized and appears single-writer scoped; I did not find a deadlock or lock-order issue in the touched code. - Lifecycle: The new client now lives for the whole writer lifetime, and that lifetime-long FE address pinning is the source of the failure below.
- Config:
max_compute_write_max_block_countis added; I am not duplicating the already-open thread about enforcing the historic20000upper bound. - Parallel code paths: I did not find another MaxCompute writer path in this PR that also needed the same fix.
- Special conditions:
NOT_MASTERis handled, but the old-master-down / no-redirect case is not. - Tests: Added FE unit and Java unit tests cover happy path, redirect, FE error, and retry. They do not cover master crash / heartbeat refresh behavior.
- Test result files: none changed.
- Observability: adequate for most FE-side failures; Java-side RPC failure logs could include
txn_id/write_session_id, but that is secondary. - Transaction / persistence: no EditLog changes; external transaction bookkeeping remains in-memory as before.
- Data writes: FE failover can now break further block-id allocation for an in-flight write.
- FE/BE variables: the new FE address / thrift-server-type params are wired through the BE-to-Java path.
- Performance: the latest revision fixes the already-open connection-churn regression by reusing a transport per writer.
- Other: I did not find further distinct issues beyond the already-open threads and the inline comment below.
|
|
||
| Exception lastException = null; | ||
| for (int retryTimes = 0; retryTimes < FETCH_BLOCK_ID_MAX_RETRY_TIMES; retryTimes++) { | ||
| TNetworkAddress requestAddress = copyAddress(masterAddress); |
There was a problem hiding this comment.
Before this refactor, each block-id allocation started from ExecEnv::GetInstance()->cluster_info()->master_fe_addr, so the next allocation could follow BE heartbeat updates after a master crash. Here the request address always comes from the client's cached masterAddress, which only changes after a successful NOT_MASTER response. If the old master dies before it can return NOT_MASTER, BE can learn the new leader, but this writer never re-reads it and keeps retrying the dead endpoint for the rest of the query. Can we refresh the baseline FE address between independent requestBlockId() calls (or on RPC failures) instead of pinning the whole writer lifetime to the first address?
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
| return new MaxComputeFeClient(new TNetworkAddress(host, port), timeoutMs, serverType); | ||
| } | ||
|
|
||
| MaxComputeFeClient(TNetworkAddress masterAddress, int rpcTimeoutMs, String thriftServerType) { |
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
skip buildall |
|
skip check_coverage |
|
/review |
There was a problem hiding this comment.
I completed a full pass over the PR and did not add duplicate inline comments. The main blocking concerns I found are already present in existing review threads and still appear unresolved in the final patch: the MaxCompute FE client still pins the initial master FE address across RPC failures, so it can keep retrying a dead former master instead of re-reading BE cluster-info after failover; and max_compute_write_max_block_count remains an unconstrained startup config, so values above MaxCompute's 20000 block/session limit can still move the failure from FE validation to later write-time errors.
Critical checkpoint conclusions:
Goal/test coverage: The PR intends to move MaxCompute block-id allocation into the Java connector and add tests for success, redirect, FE error, and RPC retry. The tests cover basic client behavior but not FE failover after a dead old master or config values above 20000.
Scope/focus: The change is mostly focused, but it changes the hot write path and FE/BE interaction behavior.
Concurrency/lifecycle: The Java client serializes access and closes reusable transport on close/errors. No new lock-order issue found, but writer lifetime now owns a persistent FE transport and the already-raised failover lifecycle concern remains.
Configuration: A new FE config is added as non-mutable, but it is still not validated/clamped against the external 20000 hard limit.
Compatibility/protocol: New FE address params are injected by BE and consumed by the Java connector; no additional FE-BE thrift mismatch found. Mixed-version behavior is intentionally rejected for missing txn_id.
Parallel paths: I found no other remaining native call sites after removing requestMaxComputeBlockId.
Error handling/observability: RPC and FE statuses are propagated as IOException, and success logging was moved to debug. Existing FE request INFO logging was outside this PR diff.
Data correctness/transactions: Block allocation remains FE-master backed and session-id checked. The unresolved config cap can still allocate IDs beyond the MaxCompute-supported range if misconfigured.
Performance: Transport reuse was added after the prior review comment, so I did not find a new connection-per-block issue.
User focus: No additional user-provided review focus was supplied.
What problem does this PR solve?
Related PR: #62578
Instead of calling FE through the BE JNI C++ bridge:
MaxCompute connector Java -> BE JNI C++ -> FE
the MaxCompute connector now requests FE directly through thrift:
MaxCompute connector Java -> FE
A new MaxComputeFeClient is added under the MaxCompute connector to handle FE
methods.
MAX_BLOCK_COUNTvariable fromfe/fe-core/src/main/java/org/apache/doris/datasource/maxcompute/MCTransaction.javaand moves it to the FE config
max_compute_write_max_block_countThe default value is still 20000, so the existing behavior is preserved.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)