refactor(sync): simplify bitwise operations in bounded channel #133

dnut · 2024-04-25T21:36:44Z

This aims to make the bitwise operations in the bounded channel implementation easier to read and understand. The channel logic is basically the same, but it uses the bits in the head, tail, and stamp differently. These usize values still encode the same information, but in a different format.

Existing approach

The current logic dynamically divides the usize into three segments: lap, disconnect, and index. Head, tail, and stamp are each divided in the same way.

The least significant bits are used to represent an index within the buffer. Just enough bits are used to represent the largest possible index. For example, if the buffer has a length of 7, the three least significant bits are used to represent the index.

The next largest bit is a flag to indicate whether the channel is disconnected. This bit is set in the tail when the channel is disconnected.

All larger bits represent the number of laps that have been circled around the ring buffer. If the buffer is 7 items in size, and we've sent/received a total of 15 items, that represents two complete laps plus one more item. Since the 4 least significant bits are used for the disconnect and the index, the fifth and so on bits will represent the number 2 for 2 laps. In other words, this is the number 2 left shifted by 4 bits, or 0b0010_0000. The struct has a field one_lap that represents the fifth bit.

The usize is also used to indicate whether a slot is occupied. There is no individual bit allocated for this purpose. Instead, a slot's stamp has a +1 added to it. We can identify that the slot is occupied when we see that it is one larger than we would expect it to be for that particular slot.

New approach

With the new approach, the regions are no longer dynamic. I use the two most significant bits in the usize as flags. Like before: head, tail, and stamp are each divided in the same way as each other.

The largest bit is the disconnect bit. If this bit is set in the tail, the channel is disconnected

The second largest bit is the occupied bit. If this is set in a slot's stamp, that means the slot is occupied.

All remaining bits represent the cumulative index. This is a monotonically increasing number. If 100 items have been sent, the tail will equal 100. The actual index in the buffer is cumulative_index % capacity. There is no need to calculate the number of laps, but you could do this with: cumulative_index / capacity.

The field one_lap is now just the capacity of the buffer. self.buffer.len could actually be used instead.

Why?

easier to make sense of at first glance

In the current code, +1 or ... + 1 == ... + one_lap is used in various places throughout the code, but it's not entirely clear why. The actual reason for this code is often to check or set a slot as occupied, but you won't know this until you read and understand the entire channel implementation. In the new approach, | occupied_bit is used for this purpose. occupied_bit is documented, so the meaning is clear, even if you haven't read code from other functions yet.

simpler logic

When moving the head or tail forward, the existing code checks whether the index will overflow, so it can restart at zero and increment the lap. With the new code, this is unnecessary. We can just increment the cumulative index with a single operation. The modulo operator handles wraparound when getting the index.

The existing code needs to explicitly separate the index and lap, and then work with them in separate steps. The new code still extracts the index, but doesn't need to extract the lap. The lap is no longer needed for anything.

…er have occupied_bit set). this logic was not needed in the old implementation, and it still isn't, for the same reason.

dnut

Added some comments to explain the things that were changed: how they worked before, and how they work now.

dnut · 2024-05-24T18:32:23Z

src/sync/bounded.zig

@@ -27,13 +27,18 @@ pub fn Bounded(comptime T: type) type {
        buffer: []Slot(T),
        head: Atomic(usize),
        tail: Atomic(usize),
-        disconnect_bit: usize, // if this bit is set on the tail, the channel is disconnected
-        one_lap_bit: usize,
+        one_lap: usize,


I called it "one_lap" instead of "one_lap_bit" because it simply indicates the size of the channel and is not necessarily related to any particular bit.

dnut · 2024-05-24T18:34:21Z

src/sync/bounded.zig


                // Inspect the corresponding slot.
                std.debug.assert(index < self.buffer.len);
                var slot = &self.buffer[index];
                const stamp = slot.stamp.load(.acquire);

                if (tail == stamp) {
-                    const new_tail = if (index + 1 < self.buffer.len) tail + 1 else lap +| self.one_lap_bit;
-                    if (self.tail.cmpxchgWeak(tail, new_tail, .seq_cst, .monotonic)) |current_tail| {
+                    if (self.tail.cmpxchgWeak(tail, tail + 1, .seq_cst, .monotonic)) |current_tail| {


This still increments the tail to indicate that it moved up by one slot. But it can be done more simply than before.

All we need to do is +1 to the tail. A single integer counts the cumulative index, so we don't need special logic. The index can be extracted with %.

dnut · 2024-05-24T18:37:15Z

src/sync/bounded.zig

-                    const new_tail = if (index + 1 < self.buffer.len) tail + 1 else lap +| self.one_lap_bit;
-                    if (self.tail.cmpxchgWeak(tail, new_tail, .seq_cst, .monotonic)) |current_tail| {


This increments the tail to indicate that it moved up by one slot.

The way it works is that it increments the index, unless a lap was completed, in which case it would increment the lap and zero out the index. Lap and index are treated as separate integers so each need their own special logic.

dnut · 2024-05-24T18:51:43Z

src/sync/bounded.zig

                        // failed
                        tail = current_tail;
                        backoff.spin();
                    } else {
                        // succeeded
                        temp_slot.slot = slot;
-                        temp_slot.stamp = tail + 1;


This +1 is used to indicate that the slot is occupied.

The slot's stamp follows the same format as the tail and can be decomposed into lap, disconnect bit, and index.

The value in the variable tail contains an index component that is equal to the index of the current slot. You can see that the index is extracted on line 108, and then used to access the slot at line 113. For example, let's say that the current index is 3.

At this line however, we're adding one to the tail. So the index component of the number being stored in the stamp is actually 4 and not 3. After trySend completes, slot 3's stamp contains a slot index of 4.

The meaning is this: When the index in the slot's stamp is actually one more than the index of the slot, it means that the slot is occupied. When the item is removed, the index is decremented to indicate that it is not occupied. This will become clear once you've seen all the other usages of the stamp throughout this file.

dnut · 2024-05-24T19:00:03Z

src/sync/bounded.zig

                        return;
                    }
-                } else if (tail + 1 == (stamp +| self.one_lap_bit)) {


This checks if the current slot is occupied.

Let's decompose it:

tail + 1: this increments the index component of the tail. so if the tail says "lap 10, index 3", now we have a value that means "lap 10, index 4"

stamp +| self.one_lap_bit: this increments the lap component of the stamp. so if the stamp says "lap 9, index 4", now we have "lap 10, index 4"

If the stamp index is 1 more than the tail index, it means the slot is occupied. Simply detecting this is enough to test for the slot being occupied. For example, if the tail is at a slot index of 3, and the index component of the stamp is 4, then we know it's occupied. We don't need to know anything about the lap to figure this out.

So the first bullet point is really the key component of this conditional. We could just extract the index component of the tail+1 and the index component of the stamp and compare them for equality. If they are equal then slot is occupied.

That could look something like this:

Suggested change

} else if (tail + 1 == (stamp +| self.one_lap_bit)) {

} else if (index + 1 == stamp & ~(self.one_lap_bit - 1)) {

But with the second bullet, we're also checking that the tail's lap is one ahead the stamp's lap. Well if we already know the slot is occupied, then the tail must be one lap ahead of the slot. So, checking the lap does not actually add anything after you've checked the index.

So, adding a lap just makes it so the equality operator will actually return true when the indexes are equal. This is just a way to simplify the code, so we don't need to extract out the index components and compare them in isolation. It ensures that the laps are equal whenever the indexes are equal, so we can easily check the equality of the indexes by just comparing the equality of the entire integer.

dnut · 2024-05-24T19:23:59Z

src/sync/bounded.zig

@@ -254,17 +251,17 @@ pub fn Bounded(comptime T: type) type {
                    } else {
                        // succeeded
                        temp_slot.slot = slot;
-                        temp_slot.stamp = head +| self.one_lap_bit;


This adds one to the stamp component of the lap, and it subtracts 1 from the index component of the lap.

It's clear to see that the lap is being increment. The purpose of incrementing the lap is because the sender will compare the tail the the stamp to see if they are on the correct lap.

The index subtraction is more subtle.

We know that the index component of the stamp is already 1 higher than the index component of the head. We tested for this above in the if-statement. The reason it was 1 higher was to indicate that it was occupied. Well now, it is no longer going to be occupied, because we are receiving the item. So we're subtracting 1 from the index by setting it equal to the index component of the head.

In other words, this line of code is basically the same as the following:

temp_slot.stamp +|= self.one_lap_bit - 1;

dnut · 2024-05-24T19:25:20Z

src/sync/bounded.zig

@@ -254,17 +251,17 @@ pub fn Bounded(comptime T: type) type {
                    } else {
                        // succeeded
                        temp_slot.slot = slot;
-                        temp_slot.stamp = head +| self.one_lap_bit;
+                        temp_slot.stamp = head +| self.one_lap;


The logic here is basically the same. This adds a lap to the stamp, and unsets the occupied bit. We know the head will never have the occupied bit set, because the occupied bit is only for the stamp.

dnut · 2024-05-24T19:32:57Z

src/sync/bounded.zig

                    return error.disconnected;
                }

                // Deconstruct the tail.
-                const index = tail & (self.disconnect_bit - 1);
-                const lap = tail & ~(self.one_lap_bit - 1);
+                const index = tail % self.one_lap;

                // Inspect the corresponding slot.
                std.debug.assert(index < self.buffer.len);
                var slot = &self.buffer[index];
                const stamp = slot.stamp.load(.acquire);

                if (tail == stamp) {


(old logic) This checks that both the lap and index of the stamp are the same as the tail. If the index differs, it means the slot is occupied. If index is the same but the lap differs, it means there is another sender from a previous lap who has acquired the slot but has not yet finished writing to the slot.

(new logic) Now the occupied bit indicates it is occupied, and the lap is built in to the cumulative index.

dnut · 2024-05-24T19:36:09Z

src/sync/bounded.zig

                    @fence(.seq_cst);
                    const head = self.head.load(.unordered);

-                    if (head +| self.one_lap_bit == tail) {


Here we check to see if the head is really a full lap behind the tail. If so, we return full. If not, it means that a reader is in the process of reading the current slot, and we should be able to write to it soon.

dnut · 2024-05-24T20:18:04Z

src/sync/bounded.zig

                    return error.disconnected;
                }

                // Deconstruct the tail.
-                const index = tail & (self.disconnect_bit - 1);
-                const lap = tail & ~(self.one_lap_bit - 1);


Previously the lap was only used to determine how to increment the tail. This is not needed in the new implementation because the tail can always be incremented with +1.

0xNineteen · 2024-07-17T12:49:14Z

this pr has been stale for a while now (merge conflicts will likely be expensive to solve) - gonna close it and move it to this ticket which contains more context and a ref to this pr:

https://github.com/orgs/Syndica/projects/2/views/1?pane=issue&itemId=71252009

dnut added 4 commits April 25, 2024 17:25

refactor(sync): simplify bitwise operations in bounded channel

5225830

Merge branch 'ultd/high-perf-channel-impl' into dnut/channel-bitops

3676b0d

refactor(channel): more intuitive disconnect_bit calculation

4d4da5c

refactor(channel): remove redundant occupied_bit unset (head will nev…

00bca9d

…er have occupied_bit set). this logic was not needed in the old implementation, and it still isn't, for the same reason.

dnut commented May 24, 2024

View reviewed changes

0xNineteen assigned dnut May 30, 2024

0xNineteen closed this Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(sync): simplify bitwise operations in bounded channel #133

refactor(sync): simplify bitwise operations in bounded channel #133

dnut commented Apr 25, 2024 •

edited

Loading

dnut left a comment

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

dnut May 24, 2024

0xNineteen commented Jul 17, 2024

		const new_tail = if (index + 1 < self.buffer.len) tail + 1 else lap +\| self.one_lap_bit;
		if (self.tail.cmpxchgWeak(tail, new_tail, .seq_cst, .monotonic)) \|current_tail\| {

	} else if (tail + 1 == (stamp +\| self.one_lap_bit)) {
	} else if (index + 1 == stamp & ~(self.one_lap_bit - 1)) {

refactor(sync): simplify bitwise operations in bounded channel #133

refactor(sync): simplify bitwise operations in bounded channel #133

Conversation

dnut commented Apr 25, 2024 • edited Loading

Existing approach

New approach

Why?

easier to make sense of at first glance

simpler logic

dnut left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0xNineteen commented Jul 17, 2024

dnut commented Apr 25, 2024 •

edited

Loading