Skip to content

[chore](distribute plan) add Javadoc to UnassignedJob.computeAssignedJobs and related methods#63743

Open
morrySnow wants to merge 1 commit into
apache:masterfrom
morrySnow:add-job-comment
Open

[chore](distribute plan) add Javadoc to UnassignedJob.computeAssignedJobs and related methods#63743
morrySnow wants to merge 1 commit into
apache:masterfrom
morrySnow:add-job-comment

Conversation

@morrySnow
Copy link
Copy Markdown
Contributor

Summary

  • Add comprehensive Javadoc to UnassignedJob.computeAssignedJobs() and all its overrides across 10 concrete subclasses
  • Document the two-phase parallelization strategy in AbstractUnassignedScanJob (multipleMachinesParallelization + insideMachineParallelization)
  • Add Javadoc to all protected helper methods: degreeOfParallelism, assignLocalShuffleJobs, assignedDefaultJobs, useLocalShuffleToAddParallel, fillUpAssignedJobs, fillUpSingleEmptyInstance
  • Each subclass override explains its specific logic (colocate/bucket join, query cache partition optimization, missing bucket fill-up, etc.)

Test plan

  • Build passes: ./build.sh --fe
  • No behavioral change — comments only

🤖 Generated with Claude Code

…and related methods

Add comprehensive Javadoc comments to the worker job abstraction layer
in the Nereids distributed plan optimizer. The comments document:
- UnassignedJob interface and its computeAssignedJobs contract
- Two-phase parallelization strategy in AbstractUnassignedScanJob
  (cross-machine replica selection + intra-machine instance split)
- Compute logic for each concrete subclass (shuffle, gather, scan, etc.)
- Protected helper methods governing parallelism decisions and fallback

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow morrySnow changed the title [refactor](nereids) add Javadoc to UnassignedJob.computeAssignedJobs and related methods [chore](physcial plan) add Javadoc to UnassignedJob.computeAssignedJobs and related methods May 27, 2026
@morrySnow morrySnow changed the title [chore](physcial plan) add Javadoc to UnassignedJob.computeAssignedJobs and related methods [chore](distribute plan) add Javadoc to UnassignedJob.computeAssignedJobs and related methods May 27, 2026
@924060929
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the documentation-only PR that adds Javadocs for Nereids distributed worker jobs. I found two accuracy issues where the new comments describe behavior that the implementation does not actually provide.

Critical checkpoint conclusions:

  • Goal/test: The goal appears to be documenting UnassignedJob.computeAssignedJobs and related methods. No behavior changed and no tests are required for runtime behavior, but the documentation needs to match the code.
  • Scope: The PR is focused and small in behavior terms, but some descriptions overstate or misstate existing control flow.
  • Concurrency/lifecycle/config/compatibility/persistence/data writes: No code behavior changes in these areas.
  • Parallel paths: I checked the related scan, gather, shuffle, and specified-instance paths; the reported issues are distinct documentation inaccuracies in their respective methods.
  • Test coverage/results: No tests or result files were changed; acceptable for comment-only changes once the comments are accurate.
  • Observability/performance: No runtime changes.
  • User focus: No additional user-provided review focus was supplied.

Please fix the inaccurate Javadocs before merging.


/**
* If no workers could be selected for the metadata scan (e.g. all backends are
* unavailable), create a single empty instance on a random available worker
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is misleading. fillUpAssignedJobs() only reaches fillUpSingleEmptyInstance() when the selected scan ranges produced no assigned jobs; it does not recover from all BEs being unavailable. The fallback still calls workerManager.randomAvailableWorker(catalogId), which throws a NereidsException when no scan backend is available. Please describe this as the empty-scan-range/pruned case instead of an all-backends-unavailable fallback that prevents query failure.

* fragment. When the expected instance count is lower than the child count
* (e.g. due to session variable limits or query cache constraints), workers
* are shuffled to spread instances across different backends for load balancing.
* When more instances are needed, worker assignment follows the child layout.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence does not match the implementation. When expectInstanceNum is greater than or equal to the largest child fragment size, the else branch still calls buildInstances(biggestParallelChildFragment.size(), ...), so this method never creates "more instances" than the child layout. It only preserves the child instance count and worker placement. Please reword this to avoid implying that higher requested exchange parallelism increases the instance count here.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants