Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
b15be92
commit 9360205
Showing
1 changed file
with
113 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
--- | ||
markup: md | ||
title: Digging Into etcd | ||
date: "2019-12-01" | ||
draft: true | ||
categories: | ||
- OSS | ||
- <span class="emoji" style="background-image:url(/images/kubernetes_logo.svg)" title=":kubernetes:"/>:kubernetes:</span> | ||
- <span class="emoji" style="background-image:url(/images/emoji/emoji_u1f525.png)" title=":git:"/>:fire:</span> | ||
--- | ||
|
||
## What Is etcd? | ||
|
||
[etcd] per the official site is: | ||
|
||
> A distributed, reliable key-value store for the most critical data of a distributed system | ||
I presume `etcd` is a play on `/etc` and the long history of nameing [daemons] | ||
with a `d` suffix, a daemon for your `/etc` config, though I've not yet found | ||
proof of this. | ||
|
||
[Kubernetes] uses etcd as the [backing store for cluster data][k8s-uses-etcd], | ||
which drove my own interest in collecting the information in this post. | ||
|
||
Clearly a lot of clusters out there are using etcd for critical data storage, | ||
but how does it work? | ||
|
||
## History | ||
|
||
For a history of etcd see: https://coreos.com/blog/history-etcd | ||
|
||
For bonus points: https://www.wired.com/2013/08/coreos-the-new-linux/ | ||
|
||
Roughly etcd was created out of a desire for a distributed data store addressing | ||
the following issues: | ||
|
||
- Google's [Chubby Paper] was public, but Chubby itself was / is not. | ||
|
||
- Zookeeper was expensive to run, didn't scale down, and couldn't be interacted | ||
with via common simple tools like [curl]. | ||
|
||
Initially etcd was used by coreOS' `fleet` container orchestration system, | ||
but it was quickly adopted for other uses and later donated to the [CNCF]. | ||
|
||
|
||
## Architecture | ||
|
||
### Overview | ||
|
||
- `etcd` is a Go binary with a seperate CLI (`etcdctl`). | ||
|
||
- etcd exposes a [gRPC] service along with an HTTP JSON API. | ||
|
||
- Data is persisted in multiversion key-value format, stored on disk. | ||
|
||
- Typically one, three, or five replicas are used. | ||
|
||
- Each replica stores the full dataset, following the leader. | ||
|
||
### Data Model | ||
|
||
etcd's upstream documentation is instructive here: [github.com/etcd-io/etcd/blob/master/Documentation/learning/data_model.md](https://github.com/etcd-io/etcd/blob/master/Documentation/learning/data_model.md) | ||
|
||
### Consensus | ||
|
||
Leader election is used to maintain a single leader replica, all requests | ||
are routed to the leader internally and comitted only after acheiving consensus | ||
on the request. | ||
|
||
[Raft] is the [consensus algorithm] used both requests and for leader elections. | ||
The official [raft site][Raft] is a a good reference for understanding how this works. | ||
Another great resource linked from the official site is [thesecretlivesofdata.com/raft/]. | ||
|
||
etcd's [raft implementation][etcd-raft-implementation] is widely used and | ||
contains some useful documentation. | ||
|
||
### Storage | ||
|
||
Data is stored with a [memory-mapped][memory-mapped] [B+ tree] using [bbolt], a fork of [bolt], inspired by [LMDB]. | ||
|
||
## TODO | ||
|
||
- elaborate on Kubernetes's usage | ||
- talk more aboult multiversion and revisions | ||
- talk more about data model | ||
- talk more about supported operations | ||
|
||
## Additional Resources | ||
|
||
The Carnegie Mellon Database Group "Database of Databases" site has a great page | ||
on etcd at [dbdb.io/db/etcd][dbdb-etcd] | ||
|
||
|
||
[etcd]: https://etcd.io/ | ||
[daemons]: https://en.wikipedia.org/wiki/Daemon_(computing) | ||
[Kubernetes]: https://kubernetes.io/ | ||
[k8s-uses-etcd]: https://kubernetes.io/docs/concepts/overview/components/#etcd | ||
[Chubby Paper]: https://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf | ||
[curl]: https://curl.haxx.se/ | ||
[Raft]: https://raft.github.io/ | ||
[consensus algorithm]: https://en.wikipedia.org/wiki/Consensus_algorithm | ||
[CNCF]: https://www.cncf.io/ | ||
[write-ahead logging]: https://en.wikipedia.org/wiki/Write-ahead_logging | ||
[dbdb-etcd]: https://dbdb.io/db/etcd | ||
[gRPC]: https://grpc.io/ | ||
[thesecretlivesofdata.com/raft/]: http://thesecretlivesofdata.com/raft/ | ||
[etcd-raft-implementation]: https://github.com/etcd-io/etcd/tree/4b755e8935e626d35e9ab9ee2b25906a658846c2/raft#raft-library | ||
[LMDB]: https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database | ||
[bolt]: https://github.com/boltdb/bolt | ||
[bbolt]: https://github.com/etcd-io/bbolt | ||
[memory-mapped]: https://en.wikipedia.org/wiki/Memory-mapped_file | ||
[B+ tree]: https://en.wikipedia.org/wiki/B%2B_tree | ||
|