Add encoding of consumer group #222

Yangsx-1 · 2024-05-23T03:35:59Z

Add consumer group related data structure.

torwig

Added some ideas. Please double-check the following:

the timestamp of the last consumer interaction
the timestamp of the last successful consumer interaction (actually read or claimed some entries).

Maybe I missed something (I think it's hard to store the last idle time because, by definition, there was no activity :) )

torwig · 2024-05-23T15:36:58Z

community/data-structure-on-rocksdb.md

@@ -265,6 +265,39 @@ key|version|EID MS|EID SEQ => |     encoded value     |
                              +-----------------------+
 ```

+#### stream consumer group metadata
+
+The consumer group metadata contains the basic information of the consumer group used in XINFO GROUPS. The key is composed by the stream key, version, group name and a metadata delimiter. The metadata delimiter is a const string `METADATA` in the implementation. The value is composed by consumer number, pending number, last delivered id, entries read and lag.


=>

The consumer group metadata contains the basic information of the consumer group used in the XINFO GROUPS command.
The key starts with the stream name, version and consumer group name. The next segment is a delimiter: the hardcoded string METADATA.
The value contains:

the number of consumers in the group

the length of the group's PEL (pending entries list)

the ID of the last entry delivered to the group's consumers

the logical "read counter" of the last entry delivered to the group's consumers

the number of entries in the stream that are still waiting to be delivered to the group's consumers.
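
To make the value layout concrete, here is a minimal C++ sketch of how these five items could be serialized. The struct name, the helper names and the choice of 8-byte big-endian integers for every field are assumptions made for illustration (the PR text does not state field widths), so the actual kvrocks code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical in-memory form of the group metadata value.
struct GroupMetadata {
  uint64_t consumer_number;     // number of consumers in the group
  uint64_t pending_number;      // length of the group's PEL
  uint64_t last_delivered_ms;   // last delivered entry ID, MS part
  uint64_t last_delivered_seq;  // last delivered entry ID, SEQ part
  uint64_t entries_read;        // logical read counter
  uint64_t lag;                 // entries still waiting to be delivered
};

// Append a 64-bit integer in big-endian order (assumed width and byte order).
inline void PutFixed64(std::string *dst, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((v >> shift) & 0xFF));
  }
}

// Read a big-endian 64-bit integer starting at the given offset.
inline uint64_t GetFixed64(const std::string &src, size_t offset) {
  assert(offset + 8 <= src.size());
  uint64_t v = 0;
  for (size_t i = 0; i < 8; ++i) {
    v = (v << 8) | static_cast<uint8_t>(src[offset + i]);
  }
  return v;
}

inline std::string EncodeGroupMetadata(const GroupMetadata &m) {
  std::string out;
  PutFixed64(&out, m.consumer_number);
  PutFixed64(&out, m.pending_number);
  PutFixed64(&out, m.last_delivered_ms);
  PutFixed64(&out, m.last_delivered_seq);
  PutFixed64(&out, m.entries_read);
  PutFixed64(&out, m.lag);
  return out;
}

inline GroupMetadata DecodeGroupMetadata(const std::string &value) {
  GroupMetadata m{};
  m.consumer_number = GetFixed64(value, 0);
  m.pending_number = GetFixed64(value, 8);
  m.last_delivered_ms = GetFixed64(value, 16);
  m.last_delivered_seq = GetFixed64(value, 24);
  m.entries_read = GetFixed64(value, 32);
  m.lag = GetFixed64(value, 40);
  return m;
}
```

Under these assumptions the encoded value is always 48 bytes, and decoding an encoded struct returns the same field values.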

torwig · 2024-05-23T15:46:57Z

community/data-structure-on-rocksdb.md

+
+#### stream consumer metadata
+
+A consumer group contains several consumer and each consumer also has a metadata which is used in XINFO CONSUMERS. The key is composed by the stream key, version, group name, consumer name and a metadata delimiter. The metadata delimiter is a const string `METADATA` in the implementation. The value is composed by pending number, last idle time and last active time.


=>

A consumer group contains several consumers, and each consumer also has its own metadata, which is used in the XINFO CONSUMERS command.
The key starts with the stream key, version, consumer group name and consumer name. The next segment is a delimiter: the hardcoded string METADATA.
The value contains:

the number of entries in the PEL

the timestamp of the last consumer interaction

the timestamp of the last successful consumer interaction (actually read or claimed some entries).

Yangsx-1 · 2024-05-24T02:56:46Z

> Added some ideas. Please double-check the following:
> the timestamp of the last consumer interaction
> the timestamp of the last successful consumer interaction (actually read or claimed some entries).
> Maybe I missed something (I think it's hard to store the last idle time because, by definition, there was no activity :) )

Oh, it's a naming problem. The "last idle time" field actually stores the timestamp of the last consumer interaction, and the "last active time" field actually stores the timestamp of the last successful consumer interaction. :)

I'll change the name in kvrocks code.
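
Storing the two timestamps is enough to derive the durations that XINFO CONSUMERS reports (`idle` and `inactive`) at query time. The following is a minimal sketch of that idea; the struct, field and function names are hypothetical and are not taken from the kvrocks code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory form of the consumer metadata value.
struct ConsumerMetadata {
  uint64_t pending_number;                  // entries in the consumer's PEL
  uint64_t last_attempted_interaction_ms;   // what was called "last idle time"
  uint64_t last_successful_interaction_ms;  // what was called "last active time"
};

// Update the timestamps after a command touched the consumer.
// `delivered` is true only if entries were actually read or claimed.
inline void RecordInteraction(ConsumerMetadata *c, uint64_t now_ms, bool delivered) {
  assert(c != nullptr);
  c->last_attempted_interaction_ms = now_ms;
  if (delivered) {
    c->last_successful_interaction_ms = now_ms;
  }
}

// Elapsed time, clamped to zero in case the clock moved backwards.
inline uint64_t ElapsedSince(uint64_t now_ms, uint64_t then_ms) {
  return now_ms > then_ms ? now_ms - then_ms : 0;
}

// Milliseconds since the last attempted interaction.
inline uint64_t IdleMs(const ConsumerMetadata &c, uint64_t now_ms) {
  return ElapsedSince(now_ms, c.last_attempted_interaction_ms);
}

// Milliseconds since the last successful interaction.
inline uint64_t InactiveMs(const ConsumerMetadata &c, uint64_t now_ms) {
  return ElapsedSince(now_ms, c.last_successful_interaction_ms);
}
```

The point of this design is that nothing has to be written while a consumer is idle: both durations are computed from the stored timestamps and the current time.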

PragmaTwice

Several questions regarding the encoding:

1. How do we distinguish between EID MS|EID SEQ and group_name|..? Since EID MS and EID SEQ are encoded as 64-bit binary integers, any value is possible (from 0 to 2^64 - 1).
2. How do we list all group names if we cannot distinguish between the group encoding and the normal subkey encoding?

Yangsx-1 · 2024-05-25T04:12:32Z

> Several questions regarding the encoding:
>
> 1. How do we distinguish between EID MS|EID SEQ and group_name|..? Since EID MS and EID SEQ are encoded as 64-bit binary integers, any value is possible (from 0 to 2^64 - 1).
> 2. How do we list all group names if we cannot distinguish between the group encoding and the normal subkey encoding?

For the first question: there is actually a sizeof(stream_name), sizeof(group_name) or sizeof(consumer_name) length field before the corresponding field, which helps us distinguish the different fields.

For the second question: the group subkey is longer than the normal subkey of a stream entry, which makes it possible to distinguish a group subkey from a normal stream entry subkey. You can refer to https://github.com/apache/kvrocks/blob/9b1ebcd65a111ab8f8aded158d7427b7409ec4cc/src/types/redis_stream.cc#L298

PragmaTwice · 2024-05-25T05:21:16Z

> For the first question: there is actually a sizeof(stream_name), sizeof(group_name) or sizeof(consumer_name) length field before the corresponding field, which helps us distinguish the different fields.

Actually, the group_name length field alone cannot help us distinguish them. We would have to compare the total length of the subkey (always > 16 bytes for group subkeys), which seems very odd to me.

> For the second question: the group subkey is longer than the normal subkey of a stream entry, which makes it possible to distinguish a group subkey from a normal stream entry subkey.

We cannot list all group_names without iterating over all subkeys of the stream, which is inefficient.

AS IS:

key|version|EID MS (8 bytes)|EID SEQ (8 bytes)

key|version|group_name_length (8 bytes)|group_name|METADATA(8 bytes)

key|version|group_name_length (8 bytes)|group_name|consumer_name_length (8 bytes)|consumer_name|METADATA(8 bytes)

key|version|group_name_length (8 bytes)|group_name|EID MS (8 bytes)|EID SEQ (8 bytes)
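
The consequences of the AS IS layout can be sketched in C++ as follows. This is only an illustration: the function names are hypothetical, the subkeys are modeled as the bytes after key|version, and big-endian 8-byte integers are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Append a 64-bit integer in big-endian order (assumed byte order).
inline void PutFixed64(std::string *dst, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((v >> shift) & 0xFF));
  }
}

inline uint64_t GetFixed64(const std::string &src, size_t offset) {
  assert(offset + 8 <= src.size());
  uint64_t v = 0;
  for (size_t i = 0; i < 8; ++i) {
    v = (v << 8) | static_cast<uint8_t>(src[offset + i]);
  }
  return v;
}

// EID MS (8 bytes) | EID SEQ (8 bytes)
inline std::string AsIsEntrySubkey(uint64_t ms, uint64_t seq) {
  std::string out;
  PutFixed64(&out, ms);
  PutFixed64(&out, seq);
  return out;
}

// group_name_length (8 bytes) | group_name | METADATA (8 bytes)
inline std::string AsIsGroupMetaSubkey(const std::string &group) {
  std::string out;
  PutFixed64(&out, group.size());
  out += group;
  out += "METADATA";
  return out;
}

// The only available discriminator: stream entries are exactly 16 bytes.
inline bool IsStreamEntryAsIs(const std::string &sub_key) {
  return sub_key.size() == 16;
}

// Listing group names has to visit every subkey of the stream.
inline std::set<std::string> ListGroupNamesAsIs(const std::vector<std::string> &all_subkeys) {
  std::set<std::string> names;
  for (const std::string &k : all_subkeys) {
    if (IsStreamEntryAsIs(k)) continue;
    const uint64_t len = GetFixed64(k, 0);
    assert(8 + len <= k.size());
    names.insert(k.substr(8, len));
  }
  return names;
}
```

Note that a stream entry with EID MS = 2 starts with exactly the same 8 bytes as the metadata subkey of a two-character group name, so the two kinds of subkeys are interleaved in key order and only the total length tells them apart.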

Instead, suppose we had an encoding like this:

key|version|EID MS (8 bytes)|EID SEQ (8 bytes)

(EID MS can hardly ever reach the maximum value, so (uint64_t)(-1) can serve as a marker prefix for all other subkeys)

key|version|(uint64_t)(-1) (8 bytes)|GROUP_META (1 byte)|group_name_length (8 bytes)|group_name

key|version|(uint64_t)(-1) (8 bytes)|CONSUMER_META (1 byte)|group_name_length (8 bytes)|group_name|consumer_name_length (8 bytes)|consumer_name

key|version|(uint64_t)(-1) (8 bytes)|PEL_ENTRY (1 byte)|group_name_length (8 bytes)|group_name|EID MS (8 bytes)|EID SEQ (8 bytes)

GROUP_META = (uint8_t)1
CONSUMER_META = (uint8_t)2
PEL_ENTRY = (uint8_t)3

Now we can list all group names just by scanning the prefix key|version|(uint64_t)(-1)|GROUP_META.
We can also list all PEL entries very efficiently if we want to.

Most importantly, the lexicographic order of the keys now separates the different subkey types instead of interleaving them.
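
A minimal C++ sketch of the proposed layout and the prefix scan is shown below. It is an illustration under stated assumptions: a `std::map` stands in for the byte-ordered RocksDB keyspace, the subkeys are modeled as the bytes after key|version, integers are assumed to be big-endian, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

constexpr uint8_t kGroupMeta = 1;
constexpr uint8_t kConsumerMeta = 2;
constexpr uint8_t kPelEntry = 3;

// Append a 64-bit integer in big-endian order (assumed byte order).
inline void PutFixed64(std::string *dst, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((v >> shift) & 0xFF));
  }
}

// EID MS (8 bytes) | EID SEQ (8 bytes)
inline std::string EntrySubkey(uint64_t ms, uint64_t seq) {
  std::string out;
  PutFixed64(&out, ms);
  PutFixed64(&out, seq);
  return out;
}

// (uint64_t)(-1) (8 bytes) | type (1 byte)
inline std::string InternalPrefix(uint8_t type) {
  std::string out;
  PutFixed64(&out, UINT64_MAX);
  out.push_back(static_cast<char>(type));
  return out;
}

// marker | GROUP_META | group_name_length (8 bytes) | group_name
inline std::string GroupMetaSubkey(const std::string &group) {
  std::string out = InternalPrefix(kGroupMeta);
  PutFixed64(&out, group.size());
  out += group;
  return out;
}

// marker | PEL_ENTRY | group_name_length | group_name | EID MS | EID SEQ
inline std::string PelEntrySubkey(const std::string &group, uint64_t ms, uint64_t seq) {
  std::string out = InternalPrefix(kPelEntry);
  PutFixed64(&out, group.size());
  out += group;
  PutFixed64(&out, ms);
  PutFixed64(&out, seq);
  return out;
}

// Seek to the GROUP_META prefix and stop at the first key that leaves it.
inline std::vector<std::string> ListGroupNames(const std::map<std::string, std::string> &subkeys) {
  const std::string prefix = InternalPrefix(kGroupMeta);
  std::vector<std::string> names;
  for (auto it = subkeys.lower_bound(prefix);
       it != subkeys.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
    assert(it->first.size() >= prefix.size() + 8);
    // Skip the marker, the type byte and the 8-byte name length.
    names.push_back(it->first.substr(prefix.size() + 8));
  }
  return names;
}
```

With this layout the scan touches only group metadata keys: stream entries sort before the marker, and consumer metadata and PEL entries sort after the GROUP_META range.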

Yangsx-1 · 2024-05-25T05:46:26Z


Makes sense. I'll change the encoding later.

PragmaTwice · 2024-05-25T06:00:59Z

Also, the group/consumer name length can be stored in 4 bytes instead of 8 bytes.

PragmaTwice · 2024-06-15T07:21:02Z

@Yangsx-1 Hi, have you had a chance to implement the encoding change recently?

Yangsx-1 · 2024-06-15T08:20:22Z

> @Yangsx-1 Hi, have you had a chance to implement the encoding change recently?

I'm a little busy these days, but I'll try to finish this work and the XPENDING command this month.

PragmaTwice · 2024-07-20T11:11:31Z

Hi @Yangsx-1 , could you update the document according to the changes when you have some time?

Add encoding of consumer group

74a4175

Yangsx-1 mentioned this pull request May 23, 2024

Data structure desgin documentation of STREAM GROUP apache/kvrocks#2268

Open

2 tasks

torwig reviewed May 23, 2024


Update data-structure-on-rocksdb.md

841fa9e

PragmaTwice reviewed May 25, 2024


Yangsx-1 mentioned this pull request Jun 29, 2024

refactor(stream): change the encoding of stream consumer group apache/kvrocks#2384

Merged
