To reset the conversation a bit in preparation for 1.9, this is an updated proposal for gRPC-based components. This proposal and its underlying implementation do not deviate much from the previous proposal and work from @patrickhuber, but add a bit more clarity around the design and the proposed packaging of components.
Dapr Pluggable Components
Proposal
To build a general-purpose mechanism for implementing Dapr components outside of the daprd binary itself. The goal is both to improve the overall health of the Dapr sidecar and to increase flexibility when building new component implementations.
Terminology
Building block -- an abstract, reusable description of a set of operations, including their expected behavior, inputs, and outputs. For example, a "state store" building block can store state, fetch state, delete state, or update state.
Component -- an implementation of a Dapr building block that provides the desired functionality. For example, a Redis state store would implement the "state store" building block operations by using Redis as a storage engine.
Design goals
- Create a mechanism for building Dapr components in a way that does not require modifying the dapr source code and recompiling the binaries.
- Minimize the performance penalty incurred by such a system.
- Allow components to be developed in languages other than Go (specifically, those supported by gRPC).
- Increase the flexibility of the dapr sidecar.
- Improve the long-term health of the dapr project.
Non-goals
- Support components running on machines separate from where the dapr sidecar or daemon is being run (i.e., via remote TCP/IP sockets).
- Support programming languages that do not have existing protobuf compilers.
Background
Today, building a new component requires that the code for the component be built into the Dapr binary itself. This mechanism presents a few hurdles for Dapr (though this also applies to any other generally extensible system) that make development of new features and long-term maintenance of the platform challenging. In addition to project health-related challenges, it increases the friction of adopting Dapr when customers need to integrate with systems that are not otherwise open source. As a result, Dapr will benefit, in both maintainability and adoption, from a framework that enables the development of components that are not tightly coupled to the Dapr source code.
Challenges related to project health
As Dapr grows, and along with it the number of components that exist, there are several technical challenges that the project will face, including development, deployment, and security-related problems. These include (but are not necessarily limited to):
- New components increase the size of the dependency graph, leading to a higher likelihood of conflicts or forced upgrades.
- Larger application footprint, as all modules are built into the binary distribution whether they are used or not.
- Increased surface area for security vulnerabilities due to the inclusion of unused code and dependencies.
- All components are required to be built using Go (which is not a problem per se, but can make adding new components challenging if the technology being used does not already have a Go SDK or implementation).
- Bugs in components can cause instability across the platform, rendering the system inoperable.
Challenges related to adoption of Dapr
As much as we may want all things to be open source the truth is that users cannot always open source the things that they build. This means that, if a customer needs to extend Dapr for integration with their own internal systems, they currently face a few challenges that makes it harder for them to adopt and maintain Dapr internally. For example:
- Adopters must maintain an internal fork of Dapr (or apply patches to Dapr) to build and manage their own custom extensions that are not open source.
- As mentioned before, all extensions to Dapr must be written in Go, which may or may not be feasible in any given organization for a variety of reasons; this means that a team may not be able to adopt Dapr.
- Adding support for features that require CGO or other exotic implementations presents additional challenges and changes to the build process / toolchain.
Proposed Solution
This document proposes a system by which components can be plugged in to Dapr without being compiled into the core binary. To achieve this, the Dapr platform will implement each building block / component type in the dapr binary as a stub and then communicate with the implementation (the component) using inter-process communication (IPC). In this proposal, we will look at using gRPC as the transport mechanism (see below for alternatives considered). Components can then be built externally to the dapr binary and implement the APIs for these building blocks much the same way they do now, but communicate with the Dapr runtime over gRPC rather than operating in-process within the dapr binary. This presents several benefits, such as a much smaller core codebase for dapr, smaller binaries, and a more flexible system for building components in any language that supports gRPC (which includes most commonly used languages today). However, this model is not without its drawbacks, including increased latency (see the bottom of this proposal for a brief initial analysis), the need to build a system to package and deploy these extensions, and challenges around compatibility guarantees as the components change at different rates.
Rather than compile all the component implementations into the Dapr sidecar binary, this solution proposes that the Dapr sidecar instead implement each component as two parts:
- A gRPC stub (component client) that has no concrete functionality.
- A gRPC daemon (component service) that is responsible for implementing the actual component logic (e.g., a Redis state store or a RabbitMQ pub/sub bus).
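To make the split concrete, the sketch below shows what a building block contract might look like in Go; the StateStore interface is a simplified assumption for illustration, not Dapr's actual API:

```go
package components

import "context"

// StateStore is an illustrative, simplified "state store" building
// block contract; Dapr's real interfaces carry more operations,
// options, and metadata.
type StateStore interface {
	Init(ctx context.Context, metadata map[string]string) error
	Get(ctx context.Context, key string) ([]byte, error)
	Set(ctx context.Context, key string, value []byte) error
	Delete(ctx context.Context, key string) error
}
```

Under this proposal, the component client would satisfy this contract by forwarding each call over gRPC, while the component service would satisfy it with the concrete logic (e.g., issuing Redis commands).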
In addition to these, there is a "pluggable component" registry in the Dapr runtime that integrates with the existing component registry but performs the following additional behavior:
- Specify a UNIX domain socket that both the client and the server can connect to.
- Register an instance of the component in the Dapr runtime, which is a gRPC client implementing the underlying building block's API (e.g., a "state store").
- Create a new instance of the pluggable component (container or process).
- Wait for the external component to be ready and listening on the UNIX domain socket.
- Signal that the component is ready (or block on dependencies).
- Communicate with the service to provide the component to the Dapr runtime.
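As a minimal sketch of the client side of this registration, connecting over the shared socket might look like the following; the socket path is an assumption, while the unix:// target scheme and plaintext credentials are standard gRPC-Go usage:

```go
package components

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialComponent connects the runtime's component client to a pluggable
// component listening on a UNIX domain socket. socketPath must be
// absolute (e.g. /tmp/dapr-components/redis.sock).
func dialComponent(socketPath string) (*grpc.ClientConn, error) {
	// gRPC resolves the "unix://" scheme natively, so no TCP port is
	// involved; plaintext credentials are acceptable here because access
	// to the socket is guarded by file permissions on the local host.
	return grpc.Dial(
		"unix://"+socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
}
```

The resulting connection would back a generated stub (e.g., a hypothetical proto.NewStateStoreClient(conn)), and that client is what gets registered with the runtime's component registry as the "state store" instance.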
The gRPC daemon (service) external to the Dapr runtime performs the following:
- Initialize the component implementation (connect to Redis, RabbitMQ, etc.).
- Listen for gRPC connections on the specified UNIX domain socket.
- Serve the component API to the Dapr runtime and perform the underlying operations.
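A minimal sketch of that service side in Go, assuming a hypothetical generated RegisterStateStoreServer function and a redisStateStore implementation (shown commented out so the skeleton stands alone):

```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
)

func main() {
	// Illustrative socket path; in practice the runtime and component
	// agree on this location (e.g. via a shared, mounted directory).
	const socketPath = "/tmp/dapr-components/redis.sock"

	// Remove any stale socket left over from a previous run, then listen.
	if err := os.RemoveAll(socketPath); err != nil {
		log.Fatal(err)
	}
	lis, err := net.Listen("unix", socketPath)
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer()
	// Register the concrete implementation against the generated service
	// stub; the registration function and server type are hypothetical:
	// pb.RegisterStateStoreServer(srv, &redisStateStore{})

	log.Printf("component listening on %s", socketPath)
	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```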
In this model, the components themselves are processes that run alongside the Dapr runtime; the Dapr runtime is responsible for communicating with these components, not implementing the functionality. As such, the Dapr runtime behaves more like an abstraction layer than a concrete implementation, leaving the implementation details to the component processes.
Side-effects of this model include reduced dependencies in the runtime, which has several benefits of its own (including a reduced surface area for security vulnerabilities and a lower chance of dependency conflicts), and isolation of the component processes, which means that if a single component has a bug, the runtime can degrade gracefully rather than crash entirely.
Packaging / Runtime
One of the challenges of implementing this approach is the other side of one of its benefits – operating components as separate processes means that we need a way to deploy, execute, discover, and manage those components. As Dapr has two primary modes of operation, standalone and cluster mode, there are two different scenarios under which this model needs to function. We propose that this be solved using containers as the default deployment model, though not necessarily the required one. To connect components with the runtime, both sides need access to a shared UNIX domain socket on the local host, so it is possible to run components without containers as long as they can access these shared sockets.
In this implementation it would be possible to have daprd spawn and manage processes for pluggable components (as the only requirement is being able to communicate with the component via gRPC) but this is not currently part of this proposal for a number of reasons:
- Dependency management inside a single container / host, including runtimes, for all external components is challenging
- Based on feedback, users would like to be able to run "official" dapr release artifacts (containers, binaries, etc.) with minimal addition to those artifacts and would prefer external components to be packaged and managed independently
- This model implicitly requires that Dapr re-implement lifecycle management for external components (which Kubernetes / Docker already support)
- Distribution of pluggable components is much simpler in this model as they can be vended as a container, complete with runtime, and be more easily audited and managed independently.
Kubernetes Mode
When run alongside an application in a Kubernetes cluster, components would be deployed as additional sidecars within a pod. Because all the containers in a pod execute on the same host, it is simple enough for a deployment to map a shared volume of some sort, ephemeral or otherwise, between the runtime and the components being used.
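For example, a rough sketch of such a pod might look like the manifest below; the image names and socket mount path are assumptions for illustration, not something this proposal fixes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest
    - name: daprd
      image: daprio/daprd:latest
      volumeMounts:
        - name: component-sockets
          mountPath: /tmp/dapr-components
    - name: redis-state-store   # pluggable component sidecar
      image: example/redis-state-store:latest
      volumeMounts:
        - name: component-sockets
          mountPath: /tmp/dapr-components
  volumes:
    - name: component-sockets
      emptyDir: {}   # ephemeral volume holding the UNIX domain sockets
```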
Standalone Mode
When running Dapr in standalone mode, containers can still be leveraged as a tool for deploying and executing components (assuming that a container runtime is available and able to interact with the Dapr runtime); this guarantees a consistent runtime environment for each component. If a container runtime is not available, it is entirely possible to deploy components as standalone services running on the host, but it is then up to the operator to ensure that a compatible runtime (and any dependencies) is available on the host for the component(s) to operate.
Alternatives Considered
There are many ways to achieve inter-process communication (IPC), as it is prevalent in modern software (for example, Google Chrome uses IPC to transfer data between internal processes). As a result, there are many different approaches available to us beyond gRPC. In fact, there are several reasons why gRPC is not (technically) the best choice, including its large library overhead, its use of a non-standard HTTP library (in Go), and its support for HTTP/2 only as the communication protocol. In order to be comprehensive, we looked at several alternatives before deciding on this approach, including:
gRPC + protobuf
Pros:
- Reuse existing protobuf mechanisms
- Auto-generates client / server stubs
- Used already by dapr runtime and SDKs for API communication (making it easy to write components using existing SDK infrastructure)
- Relatively minimal latency (~50µs measured)
Cons:
- Latency is higher than in-process
- Only HTTP/2 supported
- Non-standard HTTP library (in Go)
ZeroMQ + protobuf
Pros:
- Lightweight
- Multiple languages
- Easy to implement
- Uses existing protobuf mechanism
Cons:
- Requires a new library / way of doing things
- Depending on Go library, may require C bindings / FFI
- No auto-implemented client/service components
Twirp (gRPC-like framework from Twitch)
Pros:
- HTTP/1.1 as well as HTTP/2
- Multiple serializers
- Uses standard HTTP library in Go
Cons:
- Not in use by runtime currently (new model)
Custom-written IPC library
Pros:
- None per se, other than raw performance
Cons:
- You are likely to be eaten by a Grue... 😅
Go binary plugins (using Go's built-in plugin library)
Pros:
- Low overhead, likely the same, or very similar, latency as built-ins
- No requirement to build a gRPC service + implementation
Cons:
- Only works on FreeBSD, macOS and Linux (no Windows support)
- Still requires that all components be written in Go
Future work
There are several follow-on items to be evaluated, including (but not necessarily limited to):
- Potentially leveraging CNAB (Cloud-Native Application Bundles) as a packaging tool
- Cross-host component discovery and hosting over TCP or other network protocol
- Post-launch attachment / dynamic registration of components
Notes on latency
This is by no means a complete analysis, but rather a quick sanity check of the performance overhead introduced by a gRPC plugin. In this case, I have written a Redis-based .NET state store and compared its performance with the built-in Redis state store by using the .NET "bank transaction" API service to exercise each of them. Each was run with the same user profiles / behavior over time using Gatling (gatling.io), and the results are plotted here. Before launch we will do strenuous load testing of the various building blocks to determine where users might encounter issues with regard to load, throughput, and overhead.

(Gatling latency plots comparing the built-in Redis state store with the gRPC-based .NET state store.)