Opinionated changes to internode tracking #1121

sbtourist · 2024-05-15T18:30:23Z

Removed message count as bytes should really be the most important/accurate.
Tracked both inbound and outbound bytes, as writes take up inbound bytes, while reads take up outbound bytes.
Added internode sensors to custom params in the same place where we add the other sensors.
Added internode sensors for request.

* Removed message count as bytes should really be the most important/accurate. * Tracked both inbound and outbound bytes, as writes take up inbound bytes, while reads take up outbound bytes. * Added internode sensors to custom params in the same place where we add the other sensors. * Added internode sensors for request.

aymkhalil

Added few comments, but in general I'm convinced about the benefits of this approach:

Consistency with other sensors
Simpler implementation when compared to the MessagingService sink approach).

However, the trade-offs are:

Approximation of message size which I think is no big deal
Extra work for size calculation <-- I'm not sure about this - heart says its fine, brain says this is on the critical path of messaging but we can test for regression in performance.

aymkhalil · 2024-05-15T18:54:26Z

src/java/org/apache/cassandra/db/ReadCommandVerbHandler.java

        Message.Builder<ReadResponse> reply = message.responseWithBuilder(response);
-        addReadBytesSensorToResponse(reply, context);
+        int size = reply.currentSize(MessagingService.current_version);


At this point, the sensor custom headers themselves as not added so just confirming you intentionally want to exclude them? On one hand, they are not part of the user data, but at the end of the day they are bytes consuming network BW so I would've thought we should include them.

Yes this is intentional. Rationale is they should be a pale share of the overall message size in prod.

I get it, but bc it doesn't cost us anything to include them (just calculating the size right before reply.build() I think why not just include all sensors do - it is as accurate of a measure of network bytes as it gets. Alternatively, we could just keep the payload and forget the headers altogether.

bc it doesn't cost us anything to include them (just calculating the size right before reply.build() I think why not just include all sensors

It's a chicken and egg problem :) How do you include the internode bytes in the header if you haven't computed them yet?

Alternatively, we could just keep the payload and forget the headers altogether.

I actually like this :)

It's a chicken and egg problem :) How do you include the internode bytes in the header if you haven't computed them yet?

I'm just reference to the other UCU sensors (WRITE/READ/INDEX BYTES) can be included. The minimal bytes footprint to exclude is the one that actually captures INTERNODE BYTES

But payload only works for me too

aymkhalil · 2024-05-15T19:06:40Z

src/java/org/apache/cassandra/net/Message.java

+            return doBuild(hasId ? id : nextId());
+        }
+
+        public int currentSize(int version)


So basically we build the message twice now - once we calculate the size, and another when want to build the actual response, is this ok performance wise? This was one of the reasons to use register to the outbound sink (tradeoff being complexity of the overall approach, just making sure you think the redundant message build is worth it)

Point taken. The most expensive computation is probably the payload size, what if we cache it?

Not sure how caching the payload size would help. It seems you call it only once? I was also thinking building the message to calculate the size has a memory hit.

The size calculation of the message and including it in the headers is a chicken and egg problem. tbh don't see an effective way around it.

It seems you call it only once?

Once per message, but we build the message twice. The reason we cache it in the message is exactly the fact that it's the most expensive size computation.

I was also thinking building the message to calculate the size has a memory hit.

Building the Message is just a bunch of objects assignment, isn't it?

Once per message, but we build the message twice. The reason we cache it in the message is exactly the fact that it's the most expensive size computation.

Exactly - that's why I wasn't sure how caching would help, maybe I misunderstood your suggestion.

Building the Message is just a bunch of objects assignment, isn't it?

Fair point. No more concerns here.

aymkhalil · 2024-05-15T19:09:50Z

src/java/org/apache/cassandra/db/ReadCommandVerbHandler.java

-        RequestSensors sensors = new RequestSensors(keyspace, ImmutableSet.of(command.metadata()));
-        sensors.registerSensor(context, Type.READ_BYTES);
-        ExecutorLocals locals = ExecutorLocals.create(sensors);
+        ImmutableSet<TableMetadata> tables = ImmutableSet.of(command.metadata());


This is always a single table, not sure we need the Set.

aymkhalil · 2024-05-15T19:12:20Z

src/java/org/apache/cassandra/db/MutationVerbHandler.java

+        int perSensorSize = response.currentSize(MessagingService.current_version) / requestSensors.size();
+        requestSensors = RequestTracker.instance.get().getSensors(Type.INTERNODE_BYTES);
+        requestSensors.forEach(sensor -> RequestTracker.instance.get().incrementSensor(sensor.getContext(), sensor.getType(), perSensorSize));
+        RequestTracker.instance.get().syncAllSensors();


I think the syncAllSensors is redundant (and in other handlers) - it is already taken care of when mutation is applied.

But this comes after the mutation is applied right?

Oh right. In that case, the syncAllSensors call in the apply method (or when the row iterator is closed in the read case) is redundant. Should we just keep the call in the verb handler? Now that you made idempotent, I'm less concerned about it but still a small performance hit

aymkhalil · 2024-05-15T19:26:51Z

src/java/org/apache/cassandra/db/MutationVerbHandler.java

+        requestSensors = RequestTracker.instance.get().getSensors(Type.INDEX_WRITE_BYTES);
+        addSensorsToResponse(requestSensors, requestParam, tableParam, response);
+
+        // Add internode bytes sensors to the response after updating each per-table sensor with the current response


I commented in the ReadCommandVerbHandler regarding this - I mean if we go with the approximation, we need to make it consistent - here we are including the sensors values from above (WRITE_BYTES and INDEX_WRITE_BYTES) are already added, but in the read example, READ_BYTES is added after we calculate the size.

Good point, we can refactor this.

* Cache the payload size where possible.

sbtourist · 2024-05-23T16:48:22Z

@aymkhalil This should address the points previously discussed: a14504f

* Collect counter mutation WRITE_BYTES from replicas * Rely on replica count to accommodate for mutation replica sensor values * Empty-Commit * Track internode msg bytes and counts * Track internode sensors at the outbound sink level * Fix NPE for ks only sensors * Fix more NPEs in unit tests * Rebase on top of origin/cc-ucu * Remove ks from headers & track internode messages in MessagingService * Add internode sensors per ks and table * Opinionated changes to internode tracking (#1121) * Opinionated changes to internode tracking: * Removed message count as bytes should really be the most important/accurate. * Tracked both inbound and outbound bytes, as writes take up inbound bytes, while reads take up outbound bytes. * Added internode sensors to custom params in the same place where we add the other sensors. * Added internode sensors for request. * Added SensorsInternodeTest + bug fixes. * * Use only the payload size as internode bytes. * Cache the payload size where possible. * Use payload size for inbound messages --------- Co-authored-by: Sergio Bossa <sergio.bossa@gmail.com>

sbtourist force-pushed the cc-ucu-intenode-sb branch from 8aa88ca to 1bba5f7 Compare May 15, 2024 18:33

aymkhalil reviewed May 15, 2024

View reviewed changes

sbtourist added 2 commits May 16, 2024 11:12

Added SensorsInternodeTest + bug fixes.

0d57897

* Use only the payload size as internode bytes.

a14504f

* Cache the payload size where possible.

aymkhalil approved these changes May 23, 2024

View reviewed changes

aymkhalil merged commit 34f83f3 into cc-ucu-intenode May 24, 2024
434 of 445 checks passed

aymkhalil deleted the cc-ucu-intenode-sb branch May 24, 2024 16:35

aymkhalil mentioned this pull request Aug 17, 2024

CNDB-9665 Track internode sensor bytes for LWT #1152

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Opinionated changes to internode tracking #1121

Opinionated changes to internode tracking #1121

sbtourist commented May 15, 2024

aymkhalil left a comment

aymkhalil May 15, 2024

sbtourist May 16, 2024

aymkhalil May 16, 2024

sbtourist May 16, 2024

aymkhalil May 16, 2024

aymkhalil May 15, 2024

sbtourist May 16, 2024

aymkhalil May 16, 2024

sbtourist May 16, 2024

aymkhalil May 16, 2024

aymkhalil May 15, 2024

aymkhalil May 15, 2024

sbtourist May 16, 2024

aymkhalil May 16, 2024

aymkhalil May 15, 2024

sbtourist May 16, 2024

sbtourist commented May 23, 2024

Opinionated changes to internode tracking #1121

Opinionated changes to internode tracking #1121

Conversation

sbtourist commented May 15, 2024

aymkhalil left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbtourist commented May 23, 2024