Skip to content

HDDS-13954. Add localLease for followerRead#9320

Merged
adoroszlai merged 13 commits intoapache:masterfrom
symious:HDDS-13954
Dec 8, 2025
Merged

HDDS-13954. Add localLease for followerRead#9320
adoroszlai merged 13 commits intoapache:masterfrom
symious:HDDS-13954

Conversation

@symious
Copy link
Contributor

@symious symious commented Nov 18, 2025

What changes were proposed in this pull request?

This ticket is to add a fast path check for follower read.

In most cases, we don't need so strong consistency.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13954

How was this patch tested?

Test with Freon FollowerRead.

@ivandika3 ivandika3 requested a review from szetszwo November 19, 2025 01:09
Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@symious , thanks for working on this! Please see the comment inlined.

Comment on lines 532 to 538
public static final String OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_LAG_LIMIT =
"ozone.om.follower.read.local.lease.lag.limit";
public static final int OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_LAG_LIMIT_DEFAULT = 30000;

public static final String OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_TIME_MS =
"ozone.om.follower.read.local.lease.time.ms";
public static final long OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_TIME_MS_DEFAULT = 15000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults seem too big. How about setting the lag to 1000 and the time to 3000ms (the same as OZONE_OM_RATIS_SERVER_REQUEST_TIMEOUT_DEFAULT)?

BTW, please move them to OMConfigKeys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo Thank you for the review.

Decrease the values to 10000 and 5000.
Log lag is kind of inflated during writing, better have a large buffer; time is set to the OM's ratis RPC timeout.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

Copy link
Contributor

@ivandika3 ivandika3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. LGTM +1.

Copy link
Member

@peterxcli peterxcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @symious for this improvement — hoping it can push Ozone’s read performance to the next level.

I left a few comments. Beyond those, could we add at least two tests:

  1. Verify the read state machine routes through the follower’s local path when the option is enabled.
  2. With a very small commit-index gap timeout, verify the staleness detector triggers as expected.

public static final String OZONE_OM_GRPC_PORT_KEY =
"ozone.om.grpc.port";

public static final String OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_ENABLED_KEY =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment in #9222 (comment). Could we group the read options so users and developers can understand them more easily?
We already have a strictness order in OzoneManagerProtocolServerSideTranslatorPB, which is part of this change:

if (stale read available) { ... }
else if (follower linearizable read) { ... }
else /* leader-only read */ { ... }

How about adding the follower stale read to this set of read options as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good, I'll update it then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think currently it's already the logic?

if (local lease read) 
else if (linearizable read)
end if

normal check leader and read from leader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks—yes, I was referring to including follower stale reads within the read options. Your comment (#9222 (comment)) reminded me we should also cover fallback behavior. I’m thinking we introduce a clearer, self-contained set of read modes so users can configure intent without scattered flags:

  1. leader-only-read
  2. follower-linearizable-read
  3. follower-stale-read-or-fallback-follower-linearizable-read
    • Prefer follower stale read; if not possible, fall back to follower linearizable read
  4. follower-stale-read-or-fallback-leader-only-read
    • Prefer follower stale read; if not possible, fall back to leader-only read

These modes make the configuration atomic and avoid scattering multiple booleans. We should also add comprehensive documentation for each mode—consistency guarantees, when fallback triggers, and expected performance implications. The trade-off is that we’ll need a small translation layer to map these flexible modes to Ratis server settings and to the branching in OzoneManagerProtocolServerSideTranslatorPB. I’m fine to own that conversion logic so the config remains clean and self-contained.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these flags should make users quite confused.

  1. leader-only-read
  2. follower-linearizable-read
  3. follower-stale-read + fallback option(leader or linear-read)

The above seems more fluent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterxcli For the configuration, I decided to keep the original check to isolate the locallease and linearizable part.

Linearizable read is set by OMs, but locallease configs can be set from OM and also client side for future usage.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @symious for the patch.

In addition to the comment about new configurations, I'd like to understand the state of this PR: it is approved, but there is some discussion afterwards about further changes (e.g. two test cases). Are those required or not?

If this is already good to go, I can address the comment about configuration in a follow-up.

Comment on lines 2269 to 2294
<property>
<name>ozone.om.follower.read.local.lease.enabled</name>
<value>false</value>
<tag>OZONE, OM, RATIS, MANAGEMENT</tag>
<description> If we enabled the local lease for Follower Read.
If enabled, follower OM will decide if return local data directly
based on lag log and time.
</description>
</property>
<property>
<name>ozone.om.follower.read.local.lease.lag.limit</name>
<value>10000</value>
<tag>OZONE, OM, RATIS, MANAGEMENT</tag>
<description> If the lag between leader OM and follower OM is larger
than this number, the follower OM is not up-to-date.
</description>
</property>
<property>
<name>ozone.om.follower.read.local.lease.time.ms</name>
<value>5000</value>
<tag>OZONE, OM, RATIS, MANAGEMENT</tag>
<description> If the lag time Ms between leader OM and follower OM is larger
than this number, the follower OM is not up-to-date.
By default, it's set to Ratis RPC timeout value.
</description>
</property>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be nice to introduce these new settings in OmConfig (see HDDS-12298), to avoid further bloating OzoneManager. Most importantly it would eliminate the boilerplate required for making the new properties reconfigurable to:

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @symious for updating the patch. Two minor items noted in the new test. Can be addressed in follow-up (in itself or as part of other changes) if there are no comments from others.

@peterxcli would you like to take another look?

Comment on lines 87 to 89
Assertions.assertTrue(getCluster().getOzoneManager(1).getMetrics().getNumFollowerReadLocalLeaseSuccess() > 0);
Assertions.assertEquals(0, getCluster().getOzoneManager(2).getMetrics().getNumFollowerReadLocalLeaseSuccess());
Assertions.assertTrue(getCluster().getOzoneManager(2).getMetrics().getNumFollowerReadLocalLeaseFailTime() > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, can be fixed in follow-up

  • please use assertThat(...).isPositive() instead of assertTrue(... > 0)
  • please add import static for assert...

Comment on lines 91 to 92
getCluster().getOzoneManager(1).setConfiguration(oldConf);
getCluster().getOzoneManager(2).setConfiguration(oldConf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, can be fixed in follow-up

Restore original state in finally block.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @symious for updating the patch. There is a checkstyle failure.

Comment on lines 28 to 30
import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
import static org.junit.jupiter.api.Assertions.assertEquals;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these before regular imports.

Also, please import org.assertj.core.api.Assertions.assertThat instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, PTAL.

@adoroszlai adoroszlai merged commit d1aecfd into apache:master Dec 8, 2025
43 checks passed
@adoroszlai
Copy link
Contributor

Thanks @symious for the patch, @ivandika3, @peterxcli, @szetszwo for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants