Skip to content

[server] Add authorization to Replication Control RPCs (notifyLeaderAndIsr, updateMetadata, stopReplica, adjustIsr)#3299

Open
vaibhavk1992 wants to merge 9 commits into
apache:mainfrom
vaibhavk1992:add-replication-control-authorization
Open

[server] Add authorization to Replication Control RPCs (notifyLeaderAndIsr, updateMetadata, stopReplica, adjustIsr)#3299
vaibhavk1992 wants to merge 9 commits into
apache:mainfrom
vaibhavk1992:add-replication-control-authorization

Conversation

@vaibhavk1992
Copy link
Copy Markdown
Contributor

Summary

Implements authorization checks for internal replication control RPC operations as part of issue #3249.

These are server-to-server RPCs used by the CoordinatorServer to manage replication state across TabletServers. Currently, these critical operations have no authorization checks, allowing any client to potentially call internal cluster management APIs.

Changes

Authorization Added to 4 Internal RPCs:

Method Location Authorization Resource
notifyLeaderAndIsr TabletService CLUSTER/WRITE cluster()
updateMetadata TabletService CLUSTER/WRITE cluster()
stopReplica TabletService CLUSTER/WRITE cluster()
adjustIsr CoordinatorService CLUSTER/WRITE cluster()

Implementation Pattern:

if (authorizer != null) {
    authorizer.authorize(currentSession(), WRITE, Resource.cluster());
}

Files Modified:

  1. TabletService.java

    • Added authorization to: notifyLeaderAndIsr, updateMetadata, stopReplica
  2. CoordinatorService.java

    • Added static import for OperationType.WRITE
    • Added authorization to: adjustIsr
  3. FlussAuthorizationITCase.java

    • Added test method: testInternalReplicationControlAuthorization
    • Added imports for: NotifyLeaderAndIsrRequest, UpdateMetadataRequest, StopReplicaRequest, AdjustIsrRequest

Key Design Decisions

  1. CLUSTER/WRITE Permission: These operations modify cluster replication state, consistent with other cluster control operations like rebalance() and cancelRebalance()

  2. Internal Session Bypass: The AbstractAuthorizer.isAuthorized() method automatically allows session.isInternal() requests, so internal server-to-server calls continue working seamlessly

  3. No Explicit isInternal() Check: No need to add explicit checks - the authorization framework handles it automatically

  4. External Client Protection: External clients attempting to call these internal RPCs will now receive AuthorizationException

Test Coverage

The new test testInternalReplicationControlAuthorization() verifies:

External clients blocked: All 4 methods throw AuthorizationException when called by external clients without CLUSTER/WRITE permission
Internal sessions bypass: Internal server-to-server calls do NOT throw AuthorizationException (they may fail for other reasons like invalid data, but not authorization)
Proper error messages: Authorization failures include the correct principal, operation type (WRITE), and resource (CLUSTER)

Security Impact

Before: External clients could call internal replication control RPCs ❌
After: Only internal servers or authorized clients with CLUSTER/WRITE permission can call these RPCs ✅

Backward Compatibility

  • ✅ Existing clusters with authorization disabled continue working unchanged
  • ✅ Internal server-to-server replication continues working (bypasses authorization)
  • ✅ Minimal code changes - only adding authorization checks, no logic changes

Related Issue

Closes #3249

@wuchong
Copy link
Copy Markdown
Member

wuchong commented May 18, 2026

Hi @vaibhavk1992,

I reviewed the pull request and identified that the failure is caused by the newly introduced test method testInternalReplicationControlAuthorization. I reproduced the same exception locally, and the issue resolves when this method is removed.

The root cause is that testInternalReplicationControlAuthorization sends an updateMetadata request with empty metadata to a TabletServer. Although the RPC succeeds due to proper client permissions, it corrupts the TabletServer's metadata, causing the server to fail. Consequently, subsequent tests, such as testTableExistsAuthorization, report NoneReplica errors.

@wuchong
Copy link
Copy Markdown
Member

wuchong commented May 18, 2026

In addition, these RPCs are intended to be internal (used for intra-cluster communication) rather than public. The newly added validation allows clients with cluster() permissions to access them, which introduces potential security risks to the cluster. A better approach would be to restrict these RPCs to internal sessions only, returning an error for any non-internal sessions.

@vaibhavk1992
Copy link
Copy Markdown
Contributor Author

vaibhavk1992 commented May 22, 2026

Hi @wuchong
Thanks for the review and feedback.
Do we know why when I try to run the test manually they pass but on github CL it is failing. Also When I run an tests for individual module like core the it runs forever. It becomes a bit difficult to test the changes every time on CL. Do we need to set any special config in local to run all the test before I commit? I am using java 11 to build and run test cases but not able to reproduce the same error on local.

vaibhav kumar and others added 9 commits May 22, 2026 15:52
Implements authorization checks for internal replication control RPC operations
as part of issue apache#3249. These are server-to-server RPCs used by the
CoordinatorServer to manage replication state across TabletServers.

Changes:
- Added CLUSTER/WRITE authorization to notifyLeaderAndIsr() in TabletService
- Added CLUSTER/WRITE authorization to updateMetadata() in TabletService
- Added CLUSTER/WRITE authorization to stopReplica() in TabletService
- Added CLUSTER/WRITE authorization to adjustIsr() in CoordinatorService
- Added comprehensive tests in FlussAuthorizationITCase

Authorization Implementation:
All four methods now check for CLUSTER/WRITE permission using the pattern:
  if (authorizer != null) {
      authorizer.authorize(currentSession(), WRITE, Resource.cluster());
  }

The existing AbstractAuthorizer automatically allows internal sessions
(via session.isInternal() check), so internal server-to-server calls
continue working while blocking external clients.

Test Coverage:
- Tests verify external clients are blocked without CLUSTER/WRITE permission
- Tests verify internal sessions bypass authorization checks
- Tests ensure proper AuthorizationException messages

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…n control RPCs

This commit enhances testInternalReplicationControlAuthorization() to include
comprehensive test coverage:

1. Test authorization denial (no permission) - verifies AuthorizationException
2. Grant CLUSTER/WRITE permission via ACL binding
3. Test authorization success (with permission) - verifies operations succeed
4. Test internal session bypass - verifies internal server calls allowed

The authorization success test creates ACL binding granting guestPrincipal
CLUSTER/WRITE permission, waits for ACL sync, then verifies all 4 internal
replication control operations (notifyLeaderAndIsr, updateMetadata, stopReplica,
adjustIsr) do NOT throw AuthorizationException when permission is granted.

This addresses feedback to test "during auth present things are working as expected".

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Previous implementation allowed external clients with CLUSTER/WRITE permission
to call internal replication control RPCs (notifyLeaderAndIsr, updateMetadata,
stopReplica, adjustIsr), which could corrupt cluster metadata.

Changed all 4 RPCs to strictly reject ANY external session, regardless of
permissions. These RPCs are now truly internal-only (session.isInternal()
must be true).

Updated testInternalReplicationControlAuthorization to verify rejection of
external sessions (including super users) without attempting operations that
would corrupt cluster state.

Fixes security vulnerability where external clients could send malformed
metadata updates.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace CompletableFuture.failedFuture() (Java 9+) with Java 8-compatible
completeExceptionally() pattern for internal RPC rejection:
- CoordinatorService.adjustIsr()
- TabletService.notifyLeaderAndIsr()
- TabletService.updateMetadata()
- TabletService.stopReplica()

This fixes the compile-on-jdk8 CI failure while maintaining the same
authorization behavior.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
CoordinatorHighAvailabilityITCase.testTabletServerRejectsStaleCoordinatorEpochAfterLeaderSwitch
was calling updateMetadata() with an external session, which now triggers
authorization check before coordinator epoch validation.

Since updateMetadata is internal-only RPC (commit 742bd61), external calls
are rejected with AuthorizationException regardless of coordinator epoch.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Test 5 of testInternalReplicationControlAuthorization was creating an
RpcClient without authentication configuration, causing AuthenticationException
instead of the expected AuthorizationException.

The test now properly configures SASL authentication (username/password)
when creating the root user RpcClient, matching the pattern used in
@beforeeach setup() and testNoAuthorizer().

This fixes the GitHub Actions CI failure where the test expected
AuthorizationException but received AuthenticationException due to
missing authentication credentials.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove Test 6 from testInternalReplicationControlAuthorization which was
causing TabletServer metadata corruption. The test used an internal session
to call notifyLeaderAndIsr with empty metadata, which passed authorization
but corrupted the server state, causing subsequent tests to fail with
NoneReplica errors.

Tests 1-5 remain and are sufficient to validate that external sessions
(including root/super user) are properly rejected when attempting to call
internal RPCs (notifyLeaderAndIsr, updateMetadata, stopReplica, adjustIsr).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@vaibhavk1992 vaibhavk1992 force-pushed the add-replication-control-authorization branch from 4fee26e to 6c8e92a Compare May 27, 2026 17:12
@vaibhavk1992
Copy link
Copy Markdown
Contributor Author

@wuchong Please review, I have fixed the test case.

vaibhavk1992 pushed a commit to vaibhavk1992/fluss that referenced this pull request May 29, 2026
Update snapshot management authorization to match the internal-only
RPC pattern from PR apache#3299. These RPCs reject ALL external sessions
(including superusers) since they are strictly server-to-server operations.

Changes:
- CoordinatorService: commitKvSnapshot, commitLakeTableSnapshot
- TabletService: notifyKvSnapshotOffset, notifyLakeTableOffset
- Test: Simplified to verify external sessions are rejected

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
vaibhavk1992 pushed a commit to vaibhavk1992/fluss that referenced this pull request May 29, 2026
…tiering RPCs

Update remote log and tiering authorization to match the internal-only
RPC pattern from PR apache#3299. These RPCs reject ALL external sessions
(including superusers) since they are strictly server-to-server operations.

Changes:
- CoordinatorService: commitRemoteLogManifest, lakeTieringHeartbeat
- TabletService: notifyRemoteLogOffsets
- Test: Simplified to verify external sessions are rejected

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[server] Add authorization to Replication Control RPCs (notifyLeaderAndIsr, updateMetadata, stopReplica, adjustIsr)

2 participants