
Support compression for Keeper protocol #49507

Closed
alesapin opened this issue May 4, 2023 · 4 comments · Fixed by #54957
alesapin commented May 4, 2023

[Zoo]Keeper Client and [Zoo]Keeper Server communicate using a custom TCP protocol:

/** ZooKeeper wire protocol.
Debugging example:
strace -t -f -e trace=network -s1000 -x ./clickhouse-zookeeper-cli localhost:2181
All numbers are in network byte order (big endian). Sizes are 32 bit. Numbers are signed.
zxid - incremental transaction number at server side.
xid - unique request number at client side.
Client connects to one of the specified hosts.
Client sends:
int32_t sizeof_connect_req; \x00\x00\x00\x2c (44 bytes)
struct connect_req
{
int32_t protocolVersion; \x00\x00\x00\x00 (Currently zero)
int64_t lastZxidSeen; \x00\x00\x00\x00\x00\x00\x00\x00 (Zero at first connect)
int32_t timeOut; \x00\x00\x75\x30 (Session timeout in milliseconds: 30000)
int64_t sessionId; \x00\x00\x00\x00\x00\x00\x00\x00 (Zero at first connect)
int32_t passwd_len; \x00\x00\x00\x10 (16)
char passwd[16]; \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 (Zero at first connect)
};
Server replies:
struct prime_struct
{
int32_t len; \x00\x00\x00\x24 (36 bytes)
int32_t protocolVersion; \x00\x00\x00\x00
int32_t timeOut; \x00\x00\x75\x30
int64_t sessionId; \x01\x62\x2c\x3d\x82\x43\x00\x27
int32_t passwd_len; \x00\x00\x00\x10
char passwd[16]; \x3b\x8c\xe0\xd4\x1f\x34\xbc\x88\x9c\xa7\x68\x69\x78\x64\x98\xe9
};
Client remembers session id and session password.
Client may send authentication request (optional).
Each one third of timeout, client sends heartbeat:
int32_t length_of_heartbeat_request \x00\x00\x00\x08 (8)
int32_t ping_xid \xff\xff\xff\xfe (-2, constant)
int32_t ping_op \x00\x00\x00\x0b ZOO_PING_OP 11
Server replies:
int32_t length_of_heartbeat_response \x00\x00\x00\x10
int32_t ping_xid \xff\xff\xff\xfe
int64 zxid \x00\x00\x00\x00\x00\x01\x87\x98 (incremental server generated number)
int32_t err \x00\x00\x00\x00
Client sends requests. For example, create persistent node '/hello' with value 'world'.
int32_t request_length \x00\x00\x00\x3a
int32_t xid \x5a\xad\x72\x3f Arbitrary number. Used for identification of requests/responses.
libzookeeper uses unix timestamp for first xid and then autoincrement to that value.
int32_t op_num \x00\x00\x00\x01 ZOO_CREATE_OP 1
int32_t path_length \x00\x00\x00\x06
path \x2f\x68\x65\x6c\x6c\x6f /hello
int32_t data_length \x00\x00\x00\x05
data \x77\x6f\x72\x6c\x64 world
ACLs:
int32_t num_acls \x00\x00\x00\x01
ACL:
int32_t permissions \x00\x00\x00\x1f
string scheme \x00\x00\x00\x05
\x77\x6f\x72\x6c\x64 world
string id \x00\x00\x00\x06
\x61\x6e\x79\x6f\x6e\x65 anyone
int32_t flags \x00\x00\x00\x00
Server replies:
int32_t response_length \x00\x00\x00\x1a
int32_t xid \x5a\xad\x72\x3f
int64 zxid \x00\x00\x00\x00\x00\x01\x87\x99
int32_t err \x00\x00\x00\x00
string path_created \x00\x00\x00\x06
\x2f\x68\x65\x6c\x6c\x6f /hello - may differ to original path in case of sequential nodes.
Client may place a watch in their request.
For example, client sends "exists" request with watch:
request length \x00\x00\x00\x12
xid \x5a\xae\xb2\x0d
op_num \x00\x00\x00\x03
path \x00\x00\x00\x05
\x2f\x74\x65\x73\x74 /test
bool watch \x01
Server will send response as usual.
And later, server may send special watch event.
struct WatcherEvent
{
int32_t type;
int32_t state;
char * path;
};
response length \x00\x00\x00\x21
special watch xid \xff\xff\xff\xff
special watch zxid \xff\xff\xff\xff\xff\xff\xff\xff
err \x00\x00\x00\x00
type \x00\x00\x00\x02 DELETED_EVENT_DEF 2
state \x00\x00\x00\x03 CONNECTED_STATE_DEF 3
path \x00\x00\x00\x05
\x2f\x74\x65\x73\x74 /test
Example of multi request:
request length \x00\x00\x00\x82 130
xid \x5a\xae\xd6\x16
op_num \x00\x00\x00\x0e 14
for every command:
int32_t type; \x00\x00\x00\x01 create
bool done; \x00 false
int32_t err; \xff\xff\xff\xff -1
path \x00\x00\x00\x05
\x2f\x74\x65\x73\x74 /test
data \x00\x00\x00\x06
\x6d\x75\x6c\x74\x69\x31 multi1
acl \x00\x00\x00\x01
\x00\x00\x00\x1f
\x00\x00\x00\x05
\x77\x6f\x72\x6c\x64 world
\x00\x00\x00\x06
\x61\x6e\x79\x6f\x6e\x65 anyone
flags \x00\x00\x00\x00
int32_t type; \x00\x00\x00\x05 set
bool done \x00 false
int32_t err; \xff\xff\xff\xff -1
path \x00\x00\x00\x05
\x2f\x74\x65\x73\x74
data \x00\x00\x00\x06
\x6d\x75\x6c\x74\x69\x32 multi2
version \xff\xff\xff\xff
int32_t type \x00\x00\x00\x02 remove
bool done \x00
int32_t err \xff\xff\xff\xff -1
path \x00\x00\x00\x05
\x2f\x74\x65\x73\x74
version \xff\xff\xff\xff
after commands:
int32_t type \xff\xff\xff\xff -1
bool done \x01 true
int32_t err \xff\xff\xff\xff
Example of multi response:
response length \x00\x00\x00\x81 129
xid \x5a\xae\xd6\x16
zxid \x00\x00\x00\x00\x00\x01\x87\xe1
err \x00\x00\x00\x00
in a loop:
type \x00\x00\x00\x01 create
done \x00
err \x00\x00\x00\x00
path_created \x00\x00\x00\x05
\x2f\x74\x65\x73\x74
type \x00\x00\x00\x05 set
done \x00
err \x00\x00\x00\x00
stat \x00\x00\x00\x00\x00\x01\x87\xe1
\x00\x00\x00\x00\x00\x01\x87\xe1
\x00\x00\x01\x62\x3a\xf4\x35\x0c
\x00\x00\x01\x62\x3a\xf4\x35\x0c
\x00\x00\x00\x01
\x00\x00\x00\x00
\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x06
\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x01\x87\xe1
type \x00\x00\x00\x02 remove
done \x00
err \x00\x00\x00\x00
after:
type \xff\xff\xff\xff
done \x01
err \xff\xff\xff\xff
*/
ZooKeeper has quite strict (but configurable) limits on request/response size, but Keeper doesn't impose them. For large multi-requests and big responses (for example, getChildren responses), compression can bring significant benefits, because both requests and responses should compress well.

We already support compression in the client-server TCP protocol (https://github.com/ClickHouse/clickhouse-private/blob/c496ff8dc14f1a5f115c257e96c9b1d16e628e32/src/Server/TCPHandler.cpp#L1745-L1748) and in the client-server HTTP protocol (https://github.com/ClickHouse/clickhouse-private/blob/c496ff8dc14f1a5f115c257e96c9b1d16e628e32/src/Server/HTTPHandler.cpp#L575-L581).

The same technique can be applied: wrap our TCP socket with a compressed buffer on both sides.

  1. Server:
    in = std::make_shared<ReadBufferFromPocoSocket>(socket());
    out = std::make_shared<WriteBufferFromPocoSocket>(socket());
  2. Client:
    in.emplace(socket);
    out.emplace(socket);
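Concretely, the wrapping could look like the sketch below. This is a non-compilable sketch, not the actual implementation: it assumes ClickHouse's `CompressedReadBuffer`/`CompressedWriteBuffer` classes (as used by `TCPHandler`), and the `use_compression` flag and `compressed_in`/`compressed_out` member names are hypothetical:

```cpp
// Server side (sketch): raw socket buffers as before.
in = std::make_shared<ReadBufferFromPocoSocket>(socket());
out = std::make_shared<WriteBufferFromPocoSocket>(socket());

if (use_compression)  // hypothetical flag set during the handshake
{
    // Wrap the raw socket buffers; subsequent request/response
    // (de)serialization goes through these instead of in/out.
    compressed_in = std::make_shared<CompressedReadBuffer>(*in);
    compressed_out = std::make_shared<CompressedWriteBuffer>(*out);
}
```

The client side would mirror this after `in.emplace(socket)` / `out.emplace(socket)`, wrapping both directions only once both peers have agreed on compression during the handshake.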

To determine whether compression should be used, the user can specify a special tag in the client's configuration (https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml#LL1C1-L10C1), e.g. <compressed_protocol>true</compressed_protocol>. I think the best way for the Keeper client to tell the server about compression is the protocol version constant:

static constexpr int32_t ZOOKEEPER_PROTOCOL_VERSION = 0;
If the client is configured to use compression, tries to send ZOOKEEPER_PROTOCOL_VERSION_WITH_COMPRESSION (1), but receives an exception, it should just retry without compression.

cc @antonio2368, @alexey-milovidov


zhanglistar commented May 8, 2023

@alesapin What's the performance of Keeper compared with Apache ZooKeeper nowadays?


helifu commented May 30, 2023

It seems only a single thread processes the requests, so there is no chance of a big performance improvement. Am I right?

Here is the code:
https://github.com/ClickHouse/ClickHouse/blob/bb2acb50d2d65226e5ceb51d0d83a70f4283ce60/src/Coordination/KeeperDispatcher.cpp#LL188C40-L188C40

antonio2368 (Member) commented:
@helifu
Requests are processed in a single thread because of the ordering guarantees, and because the operations are simple, multithreading could have negative effects (based on some tests I did).

Receiving requests and sending responses is done in different threads. What's more, each connection has its own thread for receiving requests. Decompression would be done in a multithreaded way, so we expect some improvements for bigger requests.


helifu commented May 31, 2023

Thanks @antonio2368 ~
