opt: reduce identification broadcast from one per farm to one per farmer, also for caches #2945

tediou5 · 2024-07-24T15:18:22Z

This is the first step towards #2900.

Next I'll try to replace the request with a stream (if it's still necessary). I also think we can optimise the IDs in the farmer (and cache), e.g. by making them sequential starting from the first ID, so that an identify message only needs to carry one ID and the rest can be calculated directly from the index. For very large farmers, this should make the identify message much smaller.
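For illustration, here is a minimal Rust sketch of the sequential-ID idea. It is not code from this PR: `CompactIdentify`, its fields, and the choice of `u128` as the ID representation are all assumptions made for the example.

```rust
/// Hypothetical compact identify message: one base ID plus a count,
/// instead of one full ID per farm (or cache).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CompactIdentify {
    /// ID of the first farm (assumed here to fit in a `u128`).
    first_id: u128,
    /// Number of farms managed by this farmer instance.
    count: u32,
}

impl CompactIdentify {
    /// ID of the farm at `index`, or `None` if the index is out of range.
    fn id_at(&self, index: u32) -> Option<u128> {
        if index < self.count {
            // Wrapping addition keeps the mapping defined near `u128::MAX`.
            Some(self.first_id.wrapping_add(u128::from(index)))
        } else {
            None
        }
    }

    /// All IDs, reconstructed on the receiving side from the index alone.
    fn ids(&self) -> impl Iterator<Item = u128> {
        let first = self.first_id;
        (0..self.count).map(move |i| first.wrapping_add(u128::from(i)))
    }
}
```

With this shape the message size stays constant no matter how many farms the farmer has. The trade-off is that IDs must be assigned sequentially at creation time.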

The first commit is just some moving and renaming (hope I got it right this time).

Code contributor checklist:

I have read, understood and followed contributing guide

teor2345

Looks good to me, I just had some questions about changing how much detail we log.

They are all optional changes.

crates/subspace-farmer/src/cluster/cache.rs

crates/subspace-farmer/src/cluster/farmer.rs

teor2345

Thanks for those changes, I'm going to leave this to a more experienced reviewer to do the final approval.

nazar-pc

I see you have tackled the description of #2900 literally by reducing the number of network messages, but that isn't quite the solution I meant.

We still use a similar amount of network bandwidth and we still process every single identification message just like we did before, so this probably improves the situation a bit, but not fundamentally. It also breaks wire messages for those who upgrade components one by one.

What I expected to happen instead is this:

- Farmer and cache application instances generate ephemeral IDs during startup and use them for identification.
- The controller keeps track of ephemeral farmer/cache instances and their internal farms/caches.
- When a new ephemeral farmer/cache instance is discovered, a stream request is made to retrieve the details. Otherwise, details are not sent over the wire at all, which saves both network bandwidth and controller compute, since the controller only needs a single check for the whole bunch of potential farms.
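The controller-side bookkeeping described above could look roughly like the following Rust sketch. All names here (`InstanceId`, `FarmDetails`, `IdentifyAction`, `KnownInstances`) are hypothetical and exist only for this example; the actual cluster types and the stream request itself are not shown.

```rust
use std::collections::HashMap;

/// Hypothetical ephemeral instance ID, generated once per farmer/cache start.
type InstanceId = u64;

/// Hypothetical per-farm details, fetched only on demand.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FarmDetails {
    farm_index: u16,
    total_sectors: u16,
}

/// What the controller should do after seeing an identify broadcast.
#[derive(Debug, PartialEq, Eq)]
enum IdentifyAction {
    /// Instance already known: no details need to go over the wire.
    Nothing,
    /// New instance: make a stream request to fetch its farm details.
    RequestDetails,
}

#[derive(Default)]
struct KnownInstances {
    farms_by_instance: HashMap<InstanceId, Vec<FarmDetails>>,
}

impl KnownInstances {
    /// One lookup per broadcast, regardless of how many farms the instance has.
    fn on_identify(&self, id: InstanceId) -> IdentifyAction {
        if self.farms_by_instance.contains_key(&id) {
            IdentifyAction::Nothing
        } else {
            IdentifyAction::RequestDetails
        }
    }

    /// Record the details received over the stream for a new instance.
    fn on_details(&mut self, id: InstanceId, farms: Vec<FarmDetails>) {
        self.farms_by_instance.insert(id, farms);
    }
}
```

The point of the sketch is that the per-broadcast cost is a single map lookup. Per-farm work happens only once, when an unknown instance ID first appears.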

nazar-pc · 2024-07-29T09:45:22Z

crates/subspace-farmer/src/bin/subspace-farmer/commands/cluster/controller/farms.rs

+    let ClusterFarmerIdentifyFarmBroadcast { details } = identify_message;
+    for detail in details {


I'd extract the loop into a separate function, so that the actual logic keeps the same indentation before and after this change.
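As a sketch of that suggestion: the outer function only destructures the message and loops, while the per-farm logic lives in its own function at its original indentation. The type below reuses the name from the diff but is a reduced stand-in. `FarmDetail`, its field, the `seen` parameter, and both function names are assumptions for the example.

```rust
/// Stand-in for the per-farm payload; the real fields are not shown in this thread.
#[derive(Debug, Clone)]
struct FarmDetail {
    farm_index: u16,
}

/// Stand-in for the broadcast type from the diff, reduced to its `details` field.
struct ClusterFarmerIdentifyFarmBroadcast {
    details: Vec<FarmDetail>,
}

/// Outer function: only destructures the message and loops over its details.
fn process_identify_message(
    identify_message: ClusterFarmerIdentifyFarmBroadcast,
    seen: &mut Vec<u16>,
) {
    let ClusterFarmerIdentifyFarmBroadcast { details } = identify_message;
    for detail in details {
        process_farm_detail(detail, seen);
    }
}

/// Inner function: the pre-existing per-farm logic, kept at its original indentation.
fn process_farm_detail(detail: FarmDetail, seen: &mut Vec<u16>) {
    // Placeholder for the actual logic, which this thread does not show.
    seen.push(detail.farm_index);
}
```

Because the body of `process_farm_detail` is not re-indented, the diff for the existing logic stays small and easier to review.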

tediou5 · 2024-07-29T10:23:59Z

> I see you have tackled the description of #2900 literally by reducing the number of network messages, but that isn't quite the solution I meant.
>
> We still use a similar amount of network bandwidth and we still process every single identification message just like we did before, so this probably improves the situation a bit, but not fundamentally. It also breaks wire messages for those who upgrade components one by one.
>
> What I expected to happen instead is this:
>
> - Farmer and cache application instances generate ephemeral IDs during startup and use them for identification.
> - The controller keeps track of ephemeral farmer/cache instances and their internal farms/caches.
> - When a new ephemeral farmer/cache instance is discovered, a stream request is made to retrieve the details. Otherwise, details are not sent over the wire at all, which saves both network bandwidth and controller compute, since the controller only needs a single check for the whole bunch of potential farms.

@nazar-pc Oh, so you're saying that we're actually only checking their identity once by creating a stream, right? And then it keeps the stream connected, and pushes the new state when it changes.

tediou5 · 2024-07-29T10:29:42Z

Yeah, it's a better way to do it, and it's still forward compatible. Let me switch to that approach.

nazar-pc · 2024-07-29T10:33:51Z

> @nazar-pc Oh, so you're saying that we're actually only checking their identity once by creating a stream, right? And then it keeps the stream connected, and pushes the new state when it changes.

Not really. The stream will only be used to send individual farm details. But as long as the ephemeral farmer ID doesn't change, we know that the farmer application didn't restart and we don't need to worry about the individual farms in it.

In fact, if we derive the ephemeral farmer ID from the fingerprints of individual farms, we don't need to send individual fingerprints in farm details anymore, and we'll be able to support quick farmer restarts without losing state on the controller.

So the current farm notifications will be replaced with farmer notifications, and farm details will be sent in a stream only if/when necessary.
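A minimal sketch of deriving a farmer-level ID from farm fingerprints follows. It makes several assumptions: fingerprints are treated as 32 opaque bytes, the ID is a `u64`, and FNV-1a is used only to keep the example dependency-free (it is not the hash the project uses for farm fingerprints). Sorting the fingerprints so that the ID is independent of farm order is also a design assumption, not something stated in this thread.

```rust
/// Hypothetical farm fingerprint: assumed here to be 32 opaque bytes.
type FarmFingerprint = [u8; 32];

/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Derive a farmer-level ID from the fingerprints of its farms.
/// FNV-1a is a stand-in hash chosen only to avoid external crates.
fn derive_farmer_id(fingerprints: &[FarmFingerprint]) -> u64 {
    // Sort a copy so the ID does not depend on the order farms are listed in.
    let mut sorted = fingerprints.to_vec();
    sorted.sort_unstable();

    let mut hash = FNV_OFFSET;
    for byte in sorted.iter().flatten() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```

Since the ID depends only on the set of farm fingerprints, a farmer that restarts with the same farms reports the same ID, and the controller can keep its existing state.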

tediou5 · 2024-08-04T12:44:49Z

> In fact, if we derive the ephemeral farmer ID from the fingerprints of individual farms, we don't need to send individual fingerprints in farm details anymore, and we'll be able to support quick farmer restarts without losing state on the controller.

@nazar-pc I'm a bit confused here, do you mean that we include farm information (like some hash) in the farmer ID so that the derived id doesn't change every time the farmer is restarted if the configuration and machine don't change?

nazar-pc · 2024-08-05T12:00:59Z

> @nazar-pc I'm a bit confused here, do you mean that we include farm information (like some hash) in the farmer ID so that the derived id doesn't change every time the farmer is restarted if the configuration and machine don't change?

Exactly. See how the fingerprint is derived for farms right now; something similar will need to be done for the whole farmer instead.

tediou5 requested review from nazar-pc, shamil-gadelshin and rg3l3dr as code owners July 24, 2024 15:18

teor2345 reviewed Jul 26, 2024

crates/subspace-farmer/src/cluster/cache.rs (outdated)

crates/subspace-farmer/src/cluster/farmer.rs (outdated)

teor2345 reviewed Jul 29, 2024

nazar-pc reviewed Jul 29, 2024

tediou5 closed this Aug 3, 2024

tediou5 force-pushed the opt/optimize-farm-and-cache-identification branch from 4ac9713 to 5506663 Compare August 3, 2024 18:39

tediou5 deleted the opt/optimize-farm-and-cache-identification branch August 19, 2024 03:20

tediou5 mentioned this pull request Sep 7, 2024

Optimize farm and cache identification #2900

Open
