Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(sync): simplify bitwise operations in bounded channel #133

Closed
wants to merge 4 commits into from

Conversation

dnut
Copy link
Contributor

@dnut dnut commented Apr 25, 2024

This aims to make the bitwise operations in the bounded channel implementation easier to read and understand. The channel logic is basically the same, but it uses the bits in the head, tail, and stamp differently. These usize values still encode the same information, but in a different format.

Existing approach

The current logic dynamically divides the usize into three segments: lap, disconnect, and index. Head, tail, and stamp are each divided in the same way.

The least significant bits are used to represent an index within the buffer. Just enough bits are used to represent the largest possible index. For example, if the buffer has a length of 7, the three least significant bits are used to represent the index.

The next largest bit is a flag to indicate whether the channel is disconnected. This bit is set in the tail when the channel is disconnected.

All larger bits represent the number of laps that have been circled around the ring buffer. If the buffer is 7 items in size, and we've sent/received a total of 15 items, that represents two complete laps plus one more item. Since the 4 least significant bits are used for the disconnect and the index, the fifth and so on bits will represent the number 2 for 2 laps. In other words, this is the number 2 left shifted by 4 bits, or 0b0010_0000. The struct has a field one_lap that represents the fifth bit.

The usize is also used to indicate whether a slot is occupied. There is no individual bit allocated for this purpose. Instead, a slot's stamp has a +1 added to it. We can identify that the slot is occupied when we see that it is one larger than we would expect it to be for that particular slot.

New approach

With the new approach, the regions are no longer dynamic. I use the two most significant bits in the usize as flags. Like before: head, tail, and stamp are each divided in the same way as each other.

The largest bit is the disconnect bit. If this bit is set in the tail, the channel is disconnected

The second largest bit is the occupied bit. If this is set in a slot's stamp, that means the slot is occupied.

All remaining bits represent the cumulative index. This is a monotonically increasing number. If 100 items have been sent, the tail will equal 100. The actual index in the buffer is cumulative_index % capacity. There is no need to calculate the number of laps, but you could do this with: cumulative_index / capacity.

The field one_lap is now just the capacity of the buffer. self.buffer.len could actually be used instead.

Why?

easier to make sense of at first glance

In the current code, +1 or ... + 1 == ... + one_lap is used in various places throughout the code, but it's not entirely clear why. The actual reason for this code is often to check or set a slot as occupied, but you won't know this until you read and understand the entire channel implementation. In the new approach, | occupied_bit is used for this purpose. occupied_bit is documented, so the meaning is clear, even if you haven't read code from other functions yet.

simpler logic

When moving the head or tail forward, the existing code checks whether the index will overflow, so it can restart at zero and increment the lap. With the new code, this is unnecessary. We can just increment the cumulative index with a single operation. The modulo operator handles wraparound when getting the index.

The existing code needs to explicitly separate the index and lap, and then work with them in separate steps. The new code still extracts the index, but doesn't need to extract the lap. The lap is no longer needed for anything.

Copy link
Contributor Author

@dnut dnut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments to explain the things that were changed: how they worked before, and how they work now.

@@ -27,13 +27,18 @@ pub fn Bounded(comptime T: type) type {
buffer: []Slot(T),
head: Atomic(usize),
tail: Atomic(usize),
disconnect_bit: usize, // if this bit is set on the tail, the channel is disconnected
one_lap_bit: usize,
one_lap: usize,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called it "one_lap" instead of "one_lap_bit" because it simply indicates the size of the channel and is not necessarily related to any particular bit.


// Inspect the corresponding slot.
std.debug.assert(index < self.buffer.len);
var slot = &self.buffer[index];
const stamp = slot.stamp.load(.acquire);

if (tail == stamp) {
const new_tail = if (index + 1 < self.buffer.len) tail + 1 else lap +| self.one_lap_bit;
if (self.tail.cmpxchgWeak(tail, new_tail, .seq_cst, .monotonic)) |current_tail| {
if (self.tail.cmpxchgWeak(tail, tail + 1, .seq_cst, .monotonic)) |current_tail| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still increments the tail to indicate that it moved up by one slot. But it can be done more simply than before.

All we need to do is +1 to the tail. A single integer counts the cumulative index, so we don't need special logic. The index can be extracted with %.

Comment on lines -117 to -118
const new_tail = if (index + 1 < self.buffer.len) tail + 1 else lap +| self.one_lap_bit;
if (self.tail.cmpxchgWeak(tail, new_tail, .seq_cst, .monotonic)) |current_tail| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This increments the tail to indicate that it moved up by one slot.

The way it works is that it increments the index, unless a lap was completed, in which case it would increment the lap and zero out the index. Lap and index are treated as separate integers so each need their own special logic.

// failed
tail = current_tail;
backoff.spin();
} else {
// succeeded
temp_slot.slot = slot;
temp_slot.stamp = tail + 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This +1 is used to indicate that the slot is occupied.

The slot's stamp follows the same format as the tail and can be decomposed into lap, disconnect bit, and index.

The value in the variable tail contains an index component that is equal to the index of the current slot. You can see that the index is extracted on line 108, and then used to access the slot at line 113. For example, let's say that the current index is 3.

At this line however, we're adding one to the tail. So the index component of the number being stored in the stamp is actually 4 and not 3. After trySend completes, slot 3's stamp contains a slot index of 4.

The meaning is this: When the index in the slot's stamp is actually one more than the index of the slot, it means that the slot is occupied. When the item is removed, the index is decremented to indicate that it is not occupied. This will become clear once you've seen all the other usages of the stamp throughout this file.

return;
}
} else if (tail + 1 == (stamp +| self.one_lap_bit)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks if the current slot is occupied.

Let's decompose it:

  • tail + 1: this increments the index component of the tail. so if the tail says "lap 10, index 3", now we have a value that means "lap 10, index 4"
  • stamp +| self.one_lap_bit: this increments the lap component of the stamp. so if the stamp says "lap 9, index 4", now we have "lap 10, index 4"

If the stamp index is 1 more than the tail index, it means the slot is occupied. Simply detecting this is enough to test for the slot being occupied. For example, if the tail is at a slot index of 3, and the index component of the stamp is 4, then we know it's occupied. We don't need to know anything about the lap to figure this out.

So the first bullet point is really the key component of this conditional. We could just extract the index component of the tail+1 and the index component of the stamp and compare them for equality. If they are equal then slot is occupied.

That could look something like this:

Suggested change
} else if (tail + 1 == (stamp +| self.one_lap_bit)) {
} else if (index + 1 == stamp & ~(self.one_lap_bit - 1)) {

But with the second bullet, we're also checking that the tail's lap is one ahead the stamp's lap. Well if we already know the slot is occupied, then the tail must be one lap ahead of the slot. So, checking the lap does not actually add anything after you've checked the index.

So, adding a lap just makes it so the equality operator will actually return true when the indexes are equal. This is just a way to simplify the code, so we don't need to extract out the index components and compare them in isolation. It ensures that the laps are equal whenever the indexes are equal, so we can easily check the equality of the indexes by just comparing the equality of the entire integer.

@@ -254,17 +251,17 @@ pub fn Bounded(comptime T: type) type {
} else {
// succeeded
temp_slot.slot = slot;
temp_slot.stamp = head +| self.one_lap_bit;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds one to the stamp component of the lap, and it subtracts 1 from the index component of the lap.

It's clear to see that the lap is being increment. The purpose of incrementing the lap is because the sender will compare the tail the the stamp to see if they are on the correct lap.

The index subtraction is more subtle.

We know that the index component of the stamp is already 1 higher than the index component of the head. We tested for this above in the if-statement. The reason it was 1 higher was to indicate that it was occupied. Well now, it is no longer going to be occupied, because we are receiving the item. So we're subtracting 1 from the index by setting it equal to the index component of the head.

In other words, this line of code is basically the same as the following:

temp_slot.stamp +|= self.one_lap_bit - 1;

@@ -254,17 +251,17 @@ pub fn Bounded(comptime T: type) type {
} else {
// succeeded
temp_slot.slot = slot;
temp_slot.stamp = head +| self.one_lap_bit;
temp_slot.stamp = head +| self.one_lap;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is basically the same. This adds a lap to the stamp, and unsets the occupied bit. We know the head will never have the occupied bit set, because the occupied bit is only for the stamp.

return error.disconnected;
}

// Deconstruct the tail.
const index = tail & (self.disconnect_bit - 1);
const lap = tail & ~(self.one_lap_bit - 1);
const index = tail % self.one_lap;

// Inspect the corresponding slot.
std.debug.assert(index < self.buffer.len);
var slot = &self.buffer[index];
const stamp = slot.stamp.load(.acquire);

if (tail == stamp) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(old logic) This checks that both the lap and index of the stamp are the same as the tail. If the index differs, it means the slot is occupied. If index is the same but the lap differs, it means there is another sender from a previous lap who has acquired the slot but has not yet finished writing to the slot.

(new logic) Now the occupied bit indicates it is occupied, and the lap is built in to the cumulative index.

@fence(.seq_cst);
const head = self.head.load(.unordered);

if (head +| self.one_lap_bit == tail) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we check to see if the head is really a full lap behind the tail. If so, we return full. If not, it means that a reader is in the process of reading the current slot, and we should be able to write to it soon.

return error.disconnected;
}

// Deconstruct the tail.
const index = tail & (self.disconnect_bit - 1);
const lap = tail & ~(self.one_lap_bit - 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the lap was only used to determine how to increment the tail. This is not needed in the new implementation because the tail can always be incremented with +1.

@0xNineteen
Copy link
Contributor

this pr has been stale for a while now (merge conflicts will likely be expensive to solve) - gonna close it and move it to this ticket which contains more context and a ref to this pr:

https://github.com/orgs/Syndica/projects/2/views/1?pane=issue&itemId=71252009

@0xNineteen 0xNineteen closed this Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants