
Implement a proxy module with manual failover #2625

Open
kostja opened this issue Jul 21, 2017 · 16 comments
@kostja
Contributor

kostja commented Jul 21, 2017

Implement a Tarantool module which would proxy traffic to a master-master configuration and fail over automatically. Right now this is done in https://github.com/tarantool/sentinel; we should take it on board.

The proxy has to be built-in.

API

proxy = require('proxy').new()
proxy:set_rw(uri)
proxy:set_ro(uri)
proxy:listen(uri)

Failover, manual:

proxy:fail_over_to(uri)

How should we forward authentication?
How should we detect which node is a master and which is a replica?
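
For illustration, a minimal sketch of how the proposed API could be used; the URIs are placeholders and the module does not exist yet:

proxy = require('proxy').new()
proxy:set_rw('storage1:3301')   -- current master, takes writes
proxy:set_ro('storage2:3301')   -- replica, read-only
proxy:listen('0.0.0.0:3300')    -- clients connect here instead of the storages

-- Manual failover: the operator promotes the replica out of band,
-- then repoints the proxy to it.
proxy:fail_over_to('storage2:3301')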

@kostja kostja added the feature (A new functionality) label Jul 21, 2017
@kostja kostja added this to the 1.7.6 milestone Jul 21, 2017
@kostja kostja added the prio1 label Aug 15, 2017
@rtsisyk rtsisyk modified the milestones: 1.7.6, 1.7.7 Oct 21, 2017
@kostja kostja modified the milestones: 1.8.5, 1.7.7 Oct 27, 2017
@kostja
Contributor Author

kostja commented Jan 19, 2018

Requested by ICQ

@rtsisyk
Contributor

rtsisyk commented Jan 22, 2018

Please move to 1.7.8

@rtsisyk rtsisyk assigned rtsisyk and unassigned GeorgyKirichenko Jan 22, 2018
@rtsisyk rtsisyk added the needs feedback (Something is unclear with the issue) label Jan 22, 2018
@Gerold103 Gerold103 assigned Gerold103 and unassigned rtsisyk Feb 9, 2018
@kostja kostja modified the milestones: 1.8.0, 1.9.0 Feb 11, 2018
@kostja kostja added the replication and in design (Requires design document) labels and removed the needs feedback (Something is unclear with the issue) and in design (Requires design document) labels Feb 11, 2018
@Gerold103
Collaborator

Gerold103 commented Mar 1, 2018

In progress, but already opened for comments and discussion.

Why?

  1. A client must not know which instance is the master and which is a replica - these
    details must be hidden behind a proxy.

  2. When multiple clients exist, their client-to-proxy connections could be
    multiplexed into fewer, 'fatter' proxy-to-storage connections.

Main solved problems

  1. Connections cannot be multiplexed if they contain different credentials.
    Each 'fat' connection must serve only one Tarantool user (literally a user from
    space._user).

  2. The proxy must use its own sync values, because multiple client-to-proxy
    connections multiplexed into a single proxy-to-storage connection can send
    conflicting syncs. So the proxy must replace these syncs with its own when
    sending requests to a storage, and translate them back when sending a response
    to a client. This mechanism, similar in manner to NAT, must work per
    proxy-to-storage connection.

  3. If a proxy runs in the same process as the replicaset master, then all its
    requests must be sent to the master directly via cbus, with no socket I/O
    overhead. This is the killer feature of the Tarantool proxy.

API

Proxy is a module, and can be configured like this:

proxy = require('proxy')
proxy.cfg{
	replicaset = {
		{ uri = <instance URI>, is_master = <boolean> },
		...
	},
	-- Options kept the same as in the storage (box.cfg) configuration --
	listen,
	readahead,
	log,
	log_nonblock,
	log_format
}

On proxy.cfg with listen set, the proxy starts an IProto thread, or joins an existing
one if box.cfg has already been called and the master in the proxy configuration is
exactly this instance. All options are static in the first implementation.
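
For illustration, a concrete configuration under this proposed scheme might look as follows; all URIs and option values are hypothetical:

proxy = require('proxy')
proxy.cfg{
	replicaset = {
		{ uri = 'storage1:3301', is_master = true },
		{ uri = 'storage2:3301', is_master = false },
	},
	listen = 3300,
	readahead = 16320,
	log = 'proxy.log',
}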

Architecture

client, user1 ------*
 ...                 \           proxy              master
client, user1 --------*----------* - *----------------*
                         ---- SYNC -> proXYNC ---->
                         <--- SYNC <- proXYNC -----

Connecting, authorization

All proxy internals can be implemented inside the IProto thread. Only
configuration/reconfiguration details must stay in the TX thread.

When started, the IProto thread knows which port the proxy listens on and to which
storages it must proxy incoming connections. The IProto thread establishes connections
to all storages under the guest user, with no IPROTO_AUTH request. Even though the
proxy sends all requests to the master, it must be able to fail over quickly to one
of the replicas, so it must connect to the slaves too.

On connect, for each proxy-to-storage connection the proxy saves the salt - call it
the storage salt from now on. When a client connects, the proxy sends it its
own proxy salt, different from the storage salt.
Then the client sends an IPROTO_AUTH request. The proxy extracts the user name from
the request, looks up the password hash in its local credentials storage, salts it
with the proxy salt and compares the results. If they mismatch, an authorization
error is sent to the client. If they match, the proxy salts the password hash with
the storage salt.
Then it searches for a proxy-to-storage connection under the same user. If one is
found, the modified IPROTO_AUTH request is sent to it. From then on
all requests are proxied with no changes.

If there is no connection under this user, a new one is created.

Note: if the proxy has out-of-date passwords, authorization can succeed on the proxy
but fail on the storage; this is why each new client-to-proxy connection must have
its credentials checked again.
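
A sketch of the salt translation described above, assuming the storages use Tarantool's CHAP-SHA1 scheme: hash2 below is sha1(sha1(password)) as cached by the proxy from space._user, both salts are assumed to be already base64-decoded, and all function names are illustrative:

local digest = require('digest')
local bit = require('bit')

-- XOR two 20-byte SHA-1 digests.
local function xor20(a, b)
	local r = {}
	for i = 1, 20 do
		r[i] = string.char(bit.bxor(a:byte(i), b:byte(i)))
	end
	return table.concat(r)
end

-- Check the client's scramble against the proxy salt and, on success,
-- return the scramble re-salted for the proxy-to-storage connection.
local function translate_scramble(client_scramble, proxy_salt, storage_salt, hash2)
	local step1 = xor20(client_scramble, digest.sha1(proxy_salt:sub(1, 20) .. hash2))
	if digest.sha1(step1) ~= hash2 then
		return nil -- mismatch: an authorization error is sent to the client
	end
	return xor20(step1, digest.sha1(storage_salt:sub(1, 20) .. hash2))
end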

Sync translation

If a proxy-to-storage connection serves one client-to-proxy connection, then
sync translation is not needed - there are no conflicts.

When a proxy-to-storage connection serves multiple client-to-proxy connections,
it stores and maintains an increasing sync counter. Consider the
communication steps (a short sketch follows the list):

  1. A client sends a request to the proxy with sync = s1;
  2. The proxy remembers this sync, changes it to sync = s2, and sends the request
    to the storage;
  3. A response with sync = s2 is received from the storage. The proxy replaces
    s2 back with s1 and sends the response to the client.
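
A minimal sketch of this NAT-like translation; the data structure and names are illustrative, not an actual implementation:

-- One translator per proxy-to-storage connection.
local function new_translator()
	return { next_sync = 0, pending = {} }
end

-- Request path: remember (client connection, s1), return the generated s2.
local function translate_request(tr, client_conn, client_sync)
	tr.next_sync = tr.next_sync + 1
	tr.pending[tr.next_sync] = { conn = client_conn, sync = client_sync }
	return tr.next_sync
end

-- Response path: map s2 back to the client connection and its original s1.
local function translate_response(tr, storage_sync)
	local origin = tr.pending[storage_sync]
	tr.pending[storage_sync] = nil
	return origin.conn, origin.sync
end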

Queuing

Consider one proxy-to-storage connection. To prevent mixing parts of multiple
requests from different client-to-proxy connections, the proxy must forward
requests one by one. To do this fairly, each proxy-to-storage connection has a
queue which stores the client-to-proxy connections whose sockets are available
for reading.

When a client socket with no available data becomes readable, it is placed at the
end of the queue. The first client in the queue is removed from the queue after
forwarding ONE request. If it has more requests, it is placed at the end of the
queue again to send them. This guarantees fairness even if one client is always
available for reading.
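
A sketch of that round-robin queue, with hypothetical helper methods (read_one_request, has_more_requests) standing in for the real IProto machinery:

local queue = {}   -- readable client-to-proxy connections, FIFO order

local function on_client_readable(conn)
	table.insert(queue, conn)
end

local function forward_next(storage_conn)
	local conn = table.remove(queue, 1)
	if conn == nil then
		return
	end
	-- Forward exactly ONE request, then give other clients a turn.
	storage_conn:write(conn:read_one_request())
	if conn:has_more_requests() then
		table.insert(queue, conn)   -- back to the tail: fairness
	end
end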

To speed up sync translation, it can be done right after a request is received
from a client, without waiting until the proxy-to-storage connection is available
for writing. This avoids dawdling with syncs when a client reaches the front of
the queue.

Storage and proxy alliance

The main feature of the proxy is the ability to merge a proxy and a storage. If a
master storage with no box.cfg.listen and a proxy are started in the same process,
then the proxy can send requests to the master with no sockets,
directly via cbus. To do this, the proxy on start must check that the UUID of the
master in its proxy.cfg.replicaset is the same as the local UUID. If it is, then
all requests are sent directly to tx_pipe.
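
A sketch of that check, assuming the proxy learns each configured instance's UUID when it connects (the uuid field below is hypothetical):

-- Return the master entry from proxy.cfg.replicaset if it is this very
-- instance; in that case requests go straight to the TX thread via cbus
-- (tx_pipe) instead of a socket.
local function find_local_master(replicaset)
	for _, instance in ipairs(replicaset) do
		if instance.is_master and instance.uuid == box.info.uuid then
			return instance
		end
	end
	return nil
end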

@Gerold103
Collaborator

Gerold103 commented Mar 1, 2018

@kostja comments:

  • the proxy must not do anything automatic - neither detect a master change nor initiate one;
  • some remarks about authorization that I cannot understand right now;
  • the proxy must be a fully separate module;
  • look at other proxies: MariaDB proxy, haproxy, MySQL Router, proxysql, redis-proxy, redis-sentinel;
  • quote:
Ah, almost forgot: the proxy configuration should make it possible
to forward some requests to the same tarantool which is operating
the proxy, without a round trip to the operating system/socket
interface. In other words, if one of the instances in proxy configuration
is a loopback, the proxy should not need to perform any network
I/O to deliver data to its local node, but push it right into tx
thread.

This would make it possible to entirely replace box.cfg{listen=}
with require('proxy').listen() and make box.cfg{listen=}
obsolete.

@Gerold103 Gerold103 added the in design (Requires design document) label Mar 1, 2018
@Gerold103 Gerold103 changed the title Implement a proxy module with automatic failover Implement a proxy module with manual failover Mar 5, 2018
@Mons
Contributor

Mons commented Apr 10, 2020

https://github.com/moonlibs/apex already implements a proxy with failover, role-aware request balancing, health ping checks and lua-namespace autoload proxying.

@kyukhin
Contributor

kyukhin commented Apr 10, 2020

We might want to implement override of IPROTO_(SELECT|INSERT|...) instead.

@knazarov

After thinking about it for some time, I don't believe this will be useful. For sharded Tarantool, vshard already takes care of this. Even if we put it in front of (multiple) vshard routers, this proxy will fail to handle all the traffic unless it is multi-threaded.

@kostja
Contributor Author

kostja commented Apr 26, 2020

This is supposed to be present in every Tarantool instance, so there will be as many threads as there are iproto threads across all the instances you have. I don't see how it's possible to make RAFT work without it.

@knazarov

@kostja I mean that it will be mostly useless for clients if designed as written above. Proxying authentication and iproto is only useful when the client works with only one instance of Tarantool. And working with a single instance makes it impossible to scale horizontally.

If we subtract the actual proxying and only implement it for regular net.box calls between a router and a storage, it may be useful for RAFT. But still, RAFT is not on the horizon anytime soon, as far as I know. As for vshard, it can already implement such logic in "callrw".

@kostja
Contributor Author

kostja commented Apr 27, 2020

@knazarov the idea with this patch is that eventually you will add vshard router functions into iproto. What you're suggesting is that since you can't do it yet, it's useless. It would be a pity to stop this work only because, as is, it doesn't meet your present-day needs.

@knazarov

The description and design by @Gerold103 don't say anything about it being needed for vshard, or about plans to add vshard calls to iproto. So I'm only making assumptions from what I read, or can deduce from my current knowledge. The original description only talks about client -> storage connections.

As for vshard, I doubt the usefulness of proxying it for a few reasons:

  • you will add an additional network hop: client -> router -> storage -> another storage, instead of the current client -> router -> storage;
  • it will mean that client libraries get a low-level vshard API and have to implement their own merging facilities and a high-level querying API to be generally useful;
  • Tarantool libraries haven't followed a "fat client" design since their inception, and I haven't heard anywhere that this approach will change.

So, in general I see exposing the current vshard API as useless, and would instead like to see a high-level API for client libraries to work with (an API that has at least CRUD-level querying capabilities).

If you think otherwise, please share your reasoning.

@kostja
Contributor Author

kostja commented Apr 27, 2020

Let's talk about read/write queries for elastic, clustered storage for a minute. The target scheme we need to come to is client -> correct storage instance. For this, the client needs to be sharding-topology aware. However, not all clients are sharding-topology aware, and there are times when the topology changes and buckets are being moved around. For these clients and times, iproto should be able to bounce the sharded write/read on the client's behalf.
The "router" function is artificial - a perfect Tarantool cluster should only have application nodes and storage nodes, or even be homogeneous (a true data grid architecture), where every app node is also a storage node.
To transition (eventually) to this kind of topology, all routing functions have to be handled by iproto.

@kostja
Contributor Author

kostja commented Apr 27, 2020

When I was mentioning adding vshard router functions to iproto, I didn't mean exposing a new API. I meant iproto should be able to route a query to the right bucket automatically if it arrives at the wrong storage instance.

@kostja
Contributor Author

kostja commented Apr 27, 2020

And yes, the goal is to move all routing logic to iproto, be it for RAFT or for sharding. This logic won't be isolated in iproto alone, that's impossible: imagine you run a stored procedure on a storage node - eventually we'll want to select from a remote bucket from it. And as you rightfully point out, it is not the best idea to route a request to a random node, so the client will have to be topology-aware as well. That makes at least 3 places where we will do the routing. The way I view the iproto routing function is:

  • enforcing correctness (i.e. doing the right thing in 100% of cases; the client and in-tx net.box can be only 99% right);
  • supporting dumb clients efficiently, so that anyone can write a simple key/value app using a Tarantool cluster without messing with heterogeneous roles or complex drivers.

@kostja
Contributor Author

kostja commented Apr 27, 2020

Imagine operating a heterogeneous cluster, i.e. a cluster you're trying to upgrade from 3.0 to 4.0. In 4.0 your routing logic may be different from 3.0, so 4.0 nodes will need to behave differently depending on whether the bucket is on an old or a new node. There is no way you'll be able to fix old clients and 3.0 net.box to handle that.

@kostja
Contributor Author

kostja commented Apr 27, 2020

BTW, @Gerold103's write-up plus comments about "some of kostja's comments which I did not understand" doesn't pass as a complete spec for me :) Why not do yet another round of RFC discussion on this item and iron out all the wrinkles?
