HDDS-13954. Add localLease for followerRead#9320
Conversation
| public static final String OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_LAG_LIMIT = | ||
| "ozone.om.follower.read.local.lease.lag.limit"; | ||
| public static final int OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_LAG_LIMIT_DEFAULT = 30000; | ||
|
|
||
| public static final String OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_TIME_MS = | ||
| "ozone.om.follower.read.local.lease.time.ms"; | ||
| public static final long OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_TIME_MS_DEFAULT = 15000; |
There was a problem hiding this comment.
The defaults seem too big. How about setting the lag to 1000 and the time to 3000ms (the same as OZONE_OM_RATIS_SERVER_REQUEST_TIMEOUT_DEFAULT)?
BTW, please move them to OMConfigKeys.
There was a problem hiding this comment.
@szetszwo Thank you for the review.
Decrease the values to 10000 and 5000.
Log lag is kind of inflated during writing, better have a large buffer; time is set to the OM's ratis RPC timeout.
szetszwo
left a comment
There was a problem hiding this comment.
+1 the change looks good.
ivandika3
left a comment
There was a problem hiding this comment.
Thanks for the patch. LGTM +1.
peterxcli
left a comment
There was a problem hiding this comment.
Thanks @symious for this improvement — hoping it can push Ozone’s read performance to the next level.
I left a few comments. Beyond those, could we add at least two tests:
- Verify the read state machine routes through the follower’s local path when the option is enabled.
- With a very small commit-index gap timeout, verify the staleness detector triggers as expected.
...main/java/org/apache/hadoop/ozone/protocolPB/OzoneManagerProtocolServerSideTranslatorPB.java
Show resolved
Hide resolved
| public static final String OZONE_OM_GRPC_PORT_KEY = | ||
| "ozone.om.grpc.port"; | ||
|
|
||
| public static final String OZONE_OM_FOLLOWER_READ_LOCAL_LEASE_ENABLED_KEY = |
There was a problem hiding this comment.
I left a comment in #9222 (comment). Could we group the read options so users and developers can understand them more easily?
We already have a strictness order in OzoneManagerProtocolServerSideTranslatorPB, which is part of this change:
if (stale read available) { ... }
else if (follower linearizable read) { ... }
else /* leader-only read */ { ... }How about adding the follower stale read to this set of read options as well?
There was a problem hiding this comment.
This change looks good, I'll update it then.
There was a problem hiding this comment.
I think currently it's already the logic?
if (local lease read)
else if (linearizable read)
end if
normal check leader and read from leader
There was a problem hiding this comment.
Thanks—yes, I was referring to including follower stale reads within the read options. Your comment (#9222 (comment)) reminded me we should also cover fallback behavior. I’m thinking we introduce a clearer, self-contained set of read modes so users can configure intent without scattered flags:
- leader-only-read
- follower-linearizable-read
- follower-stale-read-or-fallback-follower-linearizable-read
- Prefer follower stale read; if not possible, fall back to follower linearizable read
- follower-stale-read-or-fallback-leader-only-read
- Prefer follower stale read; if not possible, fall back to leader-only read
These modes make the configuration atomic and avoid scattering multiple booleans. We should also add comprehensive documentation for each mode—consistency guarantees, when fallback triggers, and expected performance implications. The trade-off is that we’ll need a small translation layer to map these flexible modes to Ratis server settings and to the branching in OzoneManagerProtocolServerSideTranslatorPB. I’m fine to own that conversion logic so the config remains clean and self-contained.
There was a problem hiding this comment.
I think these flags should make users quite confused.
- leader-only-read
- follower-linearizable-read
- follower-stale-read + fallback option(leader or linear-read)
The above seems more fluent.
There was a problem hiding this comment.
@peterxcli For the configuration, I decided to keep the original check to isolate the locallease and linearizable part.
Linearizable read is set by OMs, but locallease configs can be set from OM and also client side for future usage.
adoroszlai
left a comment
There was a problem hiding this comment.
Thanks @symious for the patch.
In addition to the comment about new configurations, I'd like to understand the state of this PR: it is approved, but there is some discussion afterwards about further changes (e.g. two test cases). Are those required or not?
If this is already good to go, I can address the comment about configuration in a follow-up.
| <property> | ||
| <name>ozone.om.follower.read.local.lease.enabled</name> | ||
| <value>false</value> | ||
| <tag>OZONE, OM, RATIS, MANAGEMENT</tag> | ||
| <description> If we enabled the local lease for Follower Read. | ||
| If enabled, follower OM will decide if return local data directly | ||
| based on lag log and time. | ||
| </description> | ||
| </property> | ||
| <property> | ||
| <name>ozone.om.follower.read.local.lease.lag.limit</name> | ||
| <value>10000</value> | ||
| <tag>OZONE, OM, RATIS, MANAGEMENT</tag> | ||
| <description> If the lag between leader OM and follower OM is larger | ||
| than this number, the follower OM is not up-to-date. | ||
| </description> | ||
| </property> | ||
| <property> | ||
| <name>ozone.om.follower.read.local.lease.time.ms</name> | ||
| <value>5000</value> | ||
| <tag>OZONE, OM, RATIS, MANAGEMENT</tag> | ||
| <description> If the lag time Ms between leader OM and follower OM is larger | ||
| than this number, the follower OM is not up-to-date. | ||
| By default, it's set to Ratis RPC timeout value. | ||
| </description> | ||
| </property> |
There was a problem hiding this comment.
I would be nice to introduce these new settings in OmConfig (see HDDS-12298), to avoid further bloating OzoneManager. Most importantly it would eliminate the boilerplate required for making the new properties reconfigurable to:
adoroszlai
left a comment
There was a problem hiding this comment.
Thanks @symious for updating the patch. Two minor items noted in the new test. Can be addressed in follow-up (in itself or as part of other changes) if there are no comments from others.
@peterxcli would you like to take another look?
| Assertions.assertTrue(getCluster().getOzoneManager(1).getMetrics().getNumFollowerReadLocalLeaseSuccess() > 0); | ||
| Assertions.assertEquals(0, getCluster().getOzoneManager(2).getMetrics().getNumFollowerReadLocalLeaseSuccess()); | ||
| Assertions.assertTrue(getCluster().getOzoneManager(2).getMetrics().getNumFollowerReadLocalLeaseFailTime() > 0); |
There was a problem hiding this comment.
nit, can be fixed in follow-up
- please use
assertThat(...).isPositive()instead ofassertTrue(... > 0) - please add
import staticforassert...
| getCluster().getOzoneManager(1).setConfiguration(oldConf); | ||
| getCluster().getOzoneManager(2).setConfiguration(oldConf); |
There was a problem hiding this comment.
nit, can be fixed in follow-up
Restore original state in finally block.
adoroszlai
left a comment
There was a problem hiding this comment.
Thanks @symious for updating the patch. There is a checkstyle failure.
| import static org.assertj.core.api.AssertionsForClassTypes.assertThat; | ||
| import static org.junit.jupiter.api.Assertions.assertEquals; | ||
|
|
There was a problem hiding this comment.
Please move these before regular imports.
Also, please import org.assertj.core.api.Assertions.assertThat instead.
|
Thanks @symious for the patch, @ivandika3, @peterxcli, @szetszwo for the review. |
What changes were proposed in this pull request?
This ticket is to add a fast path check for follower read.
In most cases, we don't need so strong consistency.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13954
How was this patch tested?
Test with Freon FollowerRead.