All kernel resources consumed by userspace must be accounted for and limited in some way. In bus1, all resources are accounted per user. This page outlines how resource accounting and quotas work, and the rationale behind them.
The quota logic is designed to stay out of the way as much as possible and not to be triggered unless something is really wrong. Like the out-of-memory killer, it should be thought of as a last resort, not as something that is ever hit during regular operation.
An operation that fails because a resource limit has been exhausted simply fails and has no effect. As such it is perfectly harmless, but on the other hand there is no support for recovering gracefully, such as polling for resources to become available again.
Each user has a static, but configurable, limit on how much of each kind of resource it may consume, and this limit is shared between all of its peers. Resources fully owned by a peer are accounted on the peer's user, and resources consumed by in-flight messages are accounted on the receiving peer's user. Later we outline how a user's limits are divided between all of its peers and all the other users that may be sending it messages.
Accounting in-flight resources on the receiving user, rather than on the sending user, was a design decision; the converse would also have been possible. One reason for accounting on the receiver is that it lets us guarantee (with one exception, see file descriptor accounting below) that if a message was sent successfully, it can also be received successfully. Resources that remain pinned after the message has been received are already charged to the receiver, so nothing needs to be reaccounted from the sender to the receiver at that point.
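The following is a minimal userspace sketch of per-user accounting, not bus1's kernel code; all names (`struct user`, `user_charge()`, `message_send()`, `message_release()`, the `RES_*` kinds) are hypothetical. It charges in-flight bytes to the receiving user and does no accounting at all when a message is dequeued, which is why, in this model, a successfully sent message can always be received.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource kinds, mirroring the ones described on this page. */
enum res_kind { RES_BYTES, RES_SLICES, RES_HANDLES, RES_FDS, RES_N };

/* Hypothetical per-user state: a static (but configurable) limit and the
 * current usage for each kind, shared by all peers of that user.
 * Invariant: used[k] <= limit[k]. */
struct user {
	size_t limit[RES_N];
	size_t used[RES_N];
};

/* Charge @n units of @kind to @u; fails without side effects if the limit
 * would be exceeded. */
static bool user_charge(struct user *u, enum res_kind kind, size_t n)
{
	if (n > u->limit[kind] - u->used[kind])
		return false;
	u->used[kind] += n;
	return true;
}

static void user_uncharge(struct user *u, enum res_kind kind, size_t n)
{
	assert(u->used[kind] >= n);
	u->used[kind] -= n;
}

/* Hypothetical in-flight message carrying only a byte count. */
struct message {
	struct user *charged; /* user the message's bytes are charged to */
	size_t bytes;
};

/* Sending charges the *receiving* user; on failure nothing has changed. */
static bool message_send(struct message *m, struct user *receiver, size_t bytes)
{
	if (!user_charge(receiver, RES_BYTES, bytes))
		return false;
	m->charged = receiver;
	m->bytes = bytes;
	return true;
}

/* Dequeuing needs no accounting step: the charge already sits on the
 * receiver. It is only dropped once the pool memory is released. */
static void message_release(struct message *m)
{
	user_uncharge(m->charged, RES_BYTES, m->bytes);
	m->charged = NULL;
	m->bytes = 0;
}
```

For example, with a byte limit of 100, sending a 60-byte message succeeds, a subsequent 50-byte message to the same user is refused without side effects, and releasing the first message makes room again.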
The number of bytes of pool memory used is accounted, both for in-flight and for received messages. Note that the kernel also accounts this memory, but does so on the sender rather than on the receiver. Resolving this apparent conflict is still a work in progress.
The number of slices is accounted in the same way as the number of bytes, counting both in-flight and dequeued slices.
The number of handles is accounted, but with a twist. While a message is in flight, an upper bound on the number of handles it carries is accounted; once the message has been installed, this is adjusted down to the actual number of handles. The reason is that handles are reference counted: even if a message is sent with a given number of handles, some of them may already be installed in the destination, in which case only a reference is taken.
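A sketch of the two-step handle accounting, again with hypothetical names (`struct handle_account`, `handles_send()`, `handles_install()`): the send path charges the upper bound, and the install path refunds the handles that turned out to be extra references to handles already present in the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user handle counter. Invariant: used <= limit. */
struct handle_account {
	size_t limit;
	size_t used;
};

/* On send: charge the upper bound, i.e. assume every handle in the message
 * will need a new handle in the destination. */
static bool handles_send(struct handle_account *a, size_t n_handles)
{
	if (n_handles > a->limit - a->used)
		return false;
	a->used += n_handles;
	return true;
}

/* On install: only @n_new of the @n_handles needed a new handle; the rest
 * already existed in the destination and merely gained a reference, so the
 * over-charged part is given back. */
static void handles_install(struct handle_account *a, size_t n_handles,
			    size_t n_new)
{
	assert(n_new <= n_handles);
	assert(a->used >= n_handles - n_new);
	a->used -= n_handles - n_new;
}
```

With a limit of 10, a message carrying 8 handles is charged 8; if only 5 of them are new in the destination, the charge drops to 5 on install, freeing room for further messages.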
Only the number of in-flight file descriptors is accounted; once they are received, the regular file descriptor accounting takes over. This means that a message may be sent successfully, but receiving it may fail because no more file descriptors are available.
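A sketch of this file descriptor exception, with hypothetical names; `fd_slots_free` stands in for the regular file descriptor limit check, which this sketch does not model. It shows how a send can succeed while the matching receive fails. What bus1 does with a message whose receive fails is not described here; the sketch simply leaves the in-flight charge in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter for a user's in-flight file descriptors.
 * Invariant: in_flight <= limit. */
struct fd_account {
	size_t limit;
	size_t in_flight;
};

/* Sending: fds travelling in a message are charged to the receiving user. */
static bool fds_send(struct fd_account *a, size_t n)
{
	if (n > a->limit - a->in_flight)
		return false;
	a->in_flight += n;
	return true;
}

/* Receiving: the fds must be installed into the receiver's fd table, which
 * is subject to the regular fd limits (modelled by @fd_slots_free). Only on
 * success is the in-flight charge dropped. */
static bool fds_receive(struct fd_account *a, size_t n, size_t fd_slots_free)
{
	if (n > fd_slots_free)
		return false; /* sent successfully, but cannot be received now */
	assert(a->in_flight >= n);
	a->in_flight -= n;
	return true;
}
```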
As in-flight messages are accounted on the receiver, quotas must be applied to prevent any set of senders from consuming all the resources of a receiver. The available resources are divided among the sending users by the quota logic outlined below.
If a given peer does not dequeue its incoming messages (fast enough), it may end up pinning the whole quota of the sending user. To avoid this, the quota of a sending user is in turn divided among the receiving peers of the receiving user, using the same quota logic described below.
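A sketch of how the two quota levels might compose, under stated assumptions: each level applies the rule described at the end of this page (a consumer may hold at most half of what nobody else is consuming), and the sending user's quota at the second level is assumed to be the most it could hold at the first level. The function and parameter names are hypothetical, and bus1 may combine the levels differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The rule from the end of this page (as interpreted here): a consumer
 * holding @own may take @charge more only if it then holds at most half of
 * what nobody else (@others) is consuming out of @total. */
static bool share_ok(size_t total, size_t others, size_t own, size_t charge)
{
	if (others > total)
		return false;
	return own + charge <= (total - others) / 2;
}

/*
 * Hypothetical two-level check for a sender charging @charge to one peer of
 * the receiving user.
 *
 * Level 1: @user_limit is shared among sending users; @other_senders is what
 * all other senders hold on this user, @sender_on_user what this sender holds.
 *
 * Level 2: the sender's quota is shared among the receiving user's peers;
 * @sender_on_peer is what this sender holds on the target peer, and its
 * charges on all other peers count as "others".
 */
static bool charge_ok(size_t user_limit, size_t other_senders,
		      size_t sender_on_user, size_t sender_on_peer,
		      size_t charge)
{
	size_t sender_quota;

	assert(sender_on_peer <= sender_on_user);

	if (!share_ok(user_limit, other_senders, sender_on_user, charge))
		return false;

	/* Assumption: the sender's quota is the most it may hold at level 1. */
	sender_quota = (user_limit - other_senders) / 2;

	return share_ok(sender_quota, sender_on_user - sender_on_peer,
			sender_on_peer, charge);
}
```

With a limit of 1024 and no other senders, a sender can pin at most 256 in a single peer that never dequeues, leaving the rest of its first-level allowance for the receiving user's other peers.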
The two kinds of quotas outlined above are of the same kind and are implemented in the same way: a given amount of resources is to be divided dynamically among an unbounded number of consumers. We cannot know up front how many consumers will request resources, nor how much each of them will require. For each request we have to decide whether to grant it, without exhausting the resources too eagerly (and then having to limit some consumers too much later on), and without denying requests too eagerly (and hence leaving too much of the available resources unused).
The resources are divided as follows: for any given request, the resources available to the requesting consumer are half of the resources not currently consumed by anybody else.
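Stated as an inequality: a request of size c from a consumer currently holding u is granted if u + c <= (T - O) / 2, where T is the total amount of the resource and O is what all other consumers currently hold. This reads "available" as the most the consumer may hold in total, including what it already holds. The sketch below implements that reading over a fixed-size array of consumers (a simplification; the real set is unbounded); `struct pool`, `pool_charge()` and `pool_available()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSUMERS 8

/* Hypothetical shared pool. Invariant: used == sum of own[] <= total. */
struct pool {
	size_t total;
	size_t used;
	size_t own[MAX_CONSUMERS];
};

/* Most that consumer @c may hold: half of what nobody else consumes. */
static size_t pool_allowed(const struct pool *p, size_t c)
{
	size_t others;

	assert(p->used >= p->own[c]);
	others = p->used - p->own[c];
	return (p->total - others) / 2;
}

/* Grant @charge to consumer @c only if it then stays within its allowance. */
static bool pool_charge(struct pool *p, size_t c, size_t charge)
{
	if (p->own[c] + charge > pool_allowed(p, c))
		return false;
	p->own[c] += charge;
	p->used += charge;
	return true;
}

/* Largest request consumer @c could currently be granted. */
static size_t pool_available(const struct pool *p, size_t c)
{
	size_t allowed = pool_allowed(p, c);

	return allowed > p->own[c] ? allowed - p->own[c] : 0;
}
```

Starting from an empty pool of 1024 units, three greedy consumers are granted 512, 256 and 128 units in turn, and the first consumer can then get nothing more, since half of the 640 units the others leave (320) is less than the 512 it already holds. Each grant leaves at least half of the previously free resources unclaimed, so the free amount shrinks geometrically rather than being exhausted by a single consumer, at the cost of some resources always remaining unused.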