-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Short Circuit read/write #17243
Conversation
8b14b24
to
977fde5
Compare
8605ed6
to
dc05cb1
Compare
Shall the short-circuit-related metrics be removed in this PR as well? Or metrics cleanup will be an another process? |
253ed31
to
6ecdda5
Compare
dc05cb1
to
8252255
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some hard coded tests are complaining:
2023-07-14T07:29:09.9167401Z 07:29:09.915 [ERROR] Failures:
2023-07-14T07:29:09.9168088Z 07:29:09.915 [ERROR] alluxio.cli.fsadmin.report.MetricsCommandTest.metrics
2023-07-14T07:29:09.9169006Z 07:29:09.915 [ERROR] Run 1: MetricsCommandTest.metrics:62->checkIfOutputValid:145
2023-07-14T07:29:09.9175392Z Expected: iterable containing ["Cluster.BytesReadDomain (Type: GAUGE, Value: 4145.73KB)", "Cluster.BytesReadDomainThroughput (Type: GAUGE, Value: 29.97MB/MIN)", "Cluster.BytesReadLocal (Type: GAUGE, Value: 11.47GB)", "Cluster.BytesReadLocalThroughput (Type: GAUGE, Value: 117.42MB/MIN)", "Cluster.BytesReadRemote (Type: GAUGE, Value: 401.79MB)", "Cluster.BytesReadRemoteThroughput (Type: GAUGE, Value: 518.36MB/MIN)", "Cluster.BytesReadUfsAll (Type: GAUGE, Value: 509.47MB)", "Cluster.BytesReadUfsThroughput (Type: GAUGE, Value: 728.16KB/MIN)", "Cluster.BytesWrittenDomain (Type: GAUGE, Value: 62.43MB)", "Cluster.BytesWrittenDomainThroughput (Type: GAUGE, Value: 1202.37KB/MIN)", "Cluster.BytesWrittenRemote (Type: GAUGE, Value: 22.98KB)", "Cluster.BytesWrittenRemoteThroughput (Type: GAUGE, Value: 8.03MB/MIN)", "Cluster.BytesWrittenUfsAll (Type: GAUGE, Value: 317.70KB)", "Cluster.BytesWrittenUfsThroughput (Type: GAUGE, Value: 33.46KB/MIN)", "Cluster.CapacityTotal (Type: GAUGE, Value: 1,154,531,246,129,122)", "Master.CompleteFileOps (Type: COUNTER, Value: 813)", "Master.UfsSessionCount-Ufs:_alluxio_underFSStorage (Type: COUNTER, Value: 8,535)", "Master.UfsSessionCount-Ufs:file:___Users_alluxio_alluxioMountedFolder (Type: COUNTER, Value: 1,231)", "Master.getMetrics.User:alluxio (Type: TIMER, Value: 4)"]
2023-07-14T07:29:09.9178398Z but: item 2: was "Cluster.BytesReadRemote (Type: GAUGE, Value: 401.79MB)"
2023-07-14T07:29:09.9179129Z 07:29:09.915 [ERROR] Run 2: MetricsCommandTest.metrics:62->checkIfOutputValid:145
2023-07-14T07:29:09.9184592Z Expected: iterable containing ["Cluster.BytesReadDomain (Type: GAUGE, Value: 4145.73KB)", "Cluster.BytesReadDomainThroughput (Type: GAUGE, Value: 29.97MB/MIN)", "Cluster.BytesReadLocal (Type: GAUGE, Value: 11.47GB)", "Cluster.BytesReadLocalThroughput (Type: GAUGE, Value: 117.42MB/MIN)", "Cluster.BytesReadRemote (Type: GAUGE, Value: 401.79MB)", "Cluster.BytesReadRemoteThroughput (Type: GAUGE, Value: 518.36MB/MIN)", "Cluster.BytesReadUfsAll (Type: GAUGE, Value: 509.47MB)", "Cluster.BytesReadUfsThroughput (Type: GAUGE, Value: 728.16KB/MIN)", "Cluster.BytesWrittenDomain (Type: GAUGE, Value: 62.43MB)", "Cluster.BytesWrittenDomainThroughput (Type: GAUGE, Value: 1202.37KB/MIN)", "Cluster.BytesWrittenRemote (Type: GAUGE, Value: 22.98KB)", "Cluster.BytesWrittenRemoteThroughput (Type: GAUGE, Value: 8.03MB/MIN)", "Cluster.BytesWrittenUfsAll (Type: GAUGE, Value: 317.70KB)", "Cluster.BytesWrittenUfsThroughput (Type: GAUGE, Value: 33.46KB/MIN)", "Cluster.CapacityTotal (Type: GAUGE, Value: 1,154,531,246,129,122)", "Master.CompleteFileOps (Type: COUNTER, Value: 813)", "Master.UfsSessionCount-Ufs:_alluxio_underFSStorage (Type: COUNTER, Value: 8,535)", "Master.UfsSessionCount-Ufs:file:___Users_alluxio_alluxioMountedFolder (Type: COUNTER, Value: 1,231)", "Master.getMetrics.User:alluxio (Type: TIMER, Value: 4)"]
2023-07-14T07:29:09.9187204Z but: item 2: was "Cluster.BytesReadRemote (Type: GAUGE, Value: 401.79MB)"
Other than that, I browsed all changes and I think they mostly LGTM. I need to take a closer look tonight. From the scale of this PR (~2000 lines), I figure I can do a full scan of all changed lines.
If a change grows much larger than that (>3000 lines or even much larger than that), I believe it's close to impossible to do a thorough review. And for those PRs I'd recommend:
- Break it into smaller reviewable parts and allow leftovers in between, as long as leftovers do not interfere other code;
- Or try to allow ONLY removing code files as a whole - try to do changes as sparingly as possible because the changes cannot be reviewed or noticed at all. If a change in-line is compulsory, segregate that to one commit and log it in description. That was the approach we collectively took in Remove block interface - worker side #17638 with ~40k lines of change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work!
Found some extra occurrences of the term short circuit, which are not included here. It depends on your intention of this PR whether we want to fix those individual occurrences scattered every where, or you may create a separate one to address those comment-only occurrences.
Some other relevant changes you won't be able to cover in this PR, but might need extra awareness:
- In web UI there are at least some gauges for short circuit related metrics for display
- Some relevant sections in the doc pages
* | ||
* @return the cache hit local | ||
*/ | ||
public String getCacheHitLocal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[no action required in this PR I guess]
Seems it used to be:
local -> short circuit
domain -> domain socket
remote -> remote grpc not including local domain socket traffic
Now we take away "short circuit", then "local" is gone. Suddenly we only have "domain" and "remote" and no "local", although "domain" should theoretically be "local".
My point is, the opposite of "remote" should be "local" instead of "domain". Now the naming becomes a bit misleading because "local" is gone. A user may just ask "Is domain local or not?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with the code change here, in that we are just removing code instead of doing a refactor. It's just we may need to rename the metrics in the next refactor attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. let's revisit the metrics name later. I will leave a TODO here
@@ -115,49 +114,27 @@ public static BlockInStream create(FileSystemContext context, BlockInfo info, | |||
} | |||
|
|||
AlluxioConfiguration alluxioConf = context.getClusterConf(); | |||
boolean shortCircuit = alluxioConf.getBoolean(PropertyKey.USER_SHORT_CIRCUIT_ENABLED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the comment of this method, there's "short-circuit" mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -57,9 +57,6 @@ private Factory() {} // prevent instantiation | |||
public static DataWriter create(FileSystemContext context, long blockId, long blockSize, | |||
WorkerNetAddress address, OutStreamOptions options) throws IOException { | |||
AlluxioConfiguration alluxioConf = context.getClusterConf(); | |||
boolean shortCircuit = alluxioConf.getBoolean(PropertyKey.USER_SHORT_CIRCUIT_ENABLED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In GrpcDataWriter
there are comments mentioning "short-circuit". I found those occurrences by searching "shortcircuit" and "short-circuit" and "short circuit".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other occurrences include Format.java
, package-info.java
, BlockInStream.java
, DecommissionWorkerCommand
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java comment part addressed. PTAL
For webui/metrics, we shall have a separate round of revamp
@@ -115,49 +114,27 @@ public static BlockInStream create(FileSystemContext context, BlockInfo info, | |||
} | |||
|
|||
AlluxioConfiguration alluxioConf = context.getClusterConf(); | |||
boolean shortCircuit = alluxioConf.getBoolean(PropertyKey.USER_SHORT_CIRCUIT_ENABLED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -57,9 +57,6 @@ private Factory() {} // prevent instantiation | |||
public static DataWriter create(FileSystemContext context, long blockId, long blockSize, | |||
WorkerNetAddress address, OutStreamOptions options) throws IOException { | |||
AlluxioConfiguration alluxioConf = context.getClusterConf(); | |||
boolean shortCircuit = alluxioConf.getBoolean(PropertyKey.USER_SHORT_CIRCUIT_ENABLED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the work!
alluxio-bot, merge this please |
merge failed: |
alluxio-bot, merge this please |
What changes are proposed in this pull request?
Remove source code related to Short Circuit read/write
Why are the changes needed?
Does this PR introduce any user facing changes?
No more Short Circuit read/write