Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Newer
Older
100644 115 lines (68 sloc) 6.858 kB
cec4493 @d2fn add readme and test harness
d2fn authored
1 # Flake: A decentralized, k-ordered id generation service in Erlang
2
3
85a9338 @d2fn add faq and deployment sections
d2fn authored
4 Flake produces 128-bit, k-ordered ids (read time-ordered lexically). Run one on each node in your infrastructure and they will generate conflict-free ids on-demand without coordination.
5
6 Read the original [post](http://blog.boundary.com/2012/01/12/Flake-A-decentralized-k-ordered-id-generation-service-in-Erlang.html) on the Boundary blog.
cec4493 @d2fn add readme and test harness
d2fn authored
7
8 To get started
9
10 git clone git://github.com/boundary/flake.git
11
12 Then edit <tt>rel/files/sys.config</tt> to fit your environment.
13
14 * <tt>interface</tt> - set to an available network interface from which to pull a 48-bit mac address as the worker id.
15 * <tt>timestamp_path</tt> - set to a location where flake can periodically save the current time. If flake detects on startup that this file contains timestamps in the future or the distant past, it will refuse to startup. This is to prevent problematic ids from being distributed.
16 * <tt>allowable_downtime</tt> - an added safeguard to prevent flake from starting up if it sees it hasn't run in a long period of time according to the system clock since this might be an indication that the clock has been skewed far into the future.
17
18 Example configuration:
19
20 [
21 {flake, [
22 {interface, "en0"},
23 {timestamp_path, "/srv/flake/timestamp-dets"},
24 {allowable_downtime, 2592000000}
25 ]}
26 ].
27
28 Then simply run the server inline
29
30 ./start.sh
31
32 And use the embedded test harness to ensure that you are able to generate ids.
33
34 Generate 1 id and receive the erlang binary
35
36 (flake@localhost)1> flake_harness:generate(1).
37
38 [<<0,0,1,52,212,33,45,67,16,154,221,94,14,143,0,0>>]
39
40 Generate 10 base-62 encoded ids
41
42 (flake@localhost)2> flake_harness:generate(10,62).
43
44 ["8HFaR8qWtRlGDHnO57","8HFaR8qWtRlGDHnO56",
45 "8HFaR8qWtRlGDHnO55","8HFaR8qWtRlGDHnO54",
46 "8HFaR8qWtRlGDHnO53","8HFaR8qWtRlGDHnO52",
47 "8HFaR8qAulTgCBd6Wp","8HFaR8qAulTgCBd6Wo",
48 "8HFaR8qAulTgCBd6Wn","8HFaR8qAulTgCBd6Wm"]
49
50 Time how long it takes to generate 100,000 ids
51
52 (flake@localhost)3> flake_harness:timed_generate(100000).
53
54 src/flake_harness.erl:33: generating ids: 0.402 s
55
85a9338 @d2fn add faq and deployment sections
d2fn authored
56
57 These last steps simple ensure that a flake application is up and running. Next we'll talk more about operational use.
58
59
60 # Deployment
61
62 Flake is a standalone application. Request ids with a <tt>gen_server:call</tt> from another Erlang VM (or application that speaks Erlang distribution a la [Scalang](https://github.com/boundary/scalang)).
63
64 Example usage from your application.
cec4493 @d2fn add readme and test harness
d2fn authored
65
66 flake() ->
67 Node = {flake, flake@localhost},
68 {ok, FlakeId} = gen_server:call(Node, get),
69 {ok, FlakeIdBase62} = gen_server:call(Node, {get,62}),
70 %% example id decomposition for demonstration only
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
71 <<_Time:64/integer,_WorkerId:48/integer,_Sequence:16/integer>> = FlakeId,
cec4493 @d2fn add readme and test harness
d2fn authored
72 FlakeId.
73
85a9338 @d2fn add faq and deployment sections
d2fn authored
74 # Anatomy
75
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
76 Flake ids are 128-bits wide described here from most significant to least significant bits.
77
78 64-bit timestamp - milliseconds since the epoch (Jan 1 1970)
79 48-bit worker id - MAC address from a configurable device
80 16-bit sequence # - usually 0, incremented when more than one id is requested in the same millisecond and reset to 0 when the clock ticks forward
81
82
85a9338 @d2fn add faq and deployment sections
d2fn authored
83 # Roadmap
84
85 * Bulk id generation
86 * HTTP interface
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
87 * Client library (Erlang, possibly others)
85a9338 @d2fn add faq and deployment sections
d2fn authored
88 * Provides an abstraction for the messaging format
89 * Will generate a notional flake id (with an empty worker id) given a timestamp. This is useful for generating a lexical range of values that could have been generated in a span of time. At Boundary we store data in Riak keyed by flake ids and use keyfilters to page out blocks of data using this technique.
90
91
92 # Frequently Asked Questions
93
94 ######What if I'm not using Erlang?
95 An HTTP interface is on the roadmap. For applications written in Scala, [Scalang](https://github.com/boundary/scalang) is an option.
96
97 ######How does this differ from snowflake developed at Twitter?
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
98 The differences stem primarily from the fact that Twitter snowflake ids are 64-bits wide. This means that additional coordination is required to pick a worker id (twitter does this via a ZooKeeper ensemble) and a new epoch must be defined so that timestamps can fit into a small number of bits. Their scheme works great when your ids must fit into 64-bits. However this comes at the cost of additional coordination among nodes and a system that is generally a little more difficult to reason about. It is a fine system though and we were able to learn from it in our efforts.
85a9338 @d2fn add faq and deployment sections
d2fn authored
99
100 ######How is flake different from rearranging the bits of a UUID-1 id?
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
101 First, successive UUID versions aim to make ids increasingly *opaque* in nature through various means. We have actually found a great deal of utility in structurally transparent unique ids and that has motivated much of this work. The most transparent variant is UUID-1 (for which it has received a fair bit of criticism) and thus the nearest relative to a flake id. There are some important differences though.
85a9338 @d2fn add faq and deployment sections
d2fn authored
102
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
103 UUID-1 is an odd beast. First, the timestamp is based on the number of 100 nanosecond intervals since October 15, 1582. This is not how most of us familiar with Unix timestamps reason about time. If that isn't bad enough, the timestamp is an odd 60-bits in length with the most significant bits shifted to the least significant bits of the UUID. This property makes lexical ordering essentially meaningless. The remaining bits contain a clock id (initially set to a random number) and a node id (usually the MAC address).
85a9338 @d2fn add faq and deployment sections
d2fn authored
104
105 The first problem is the timestamp. We could rearrange the bits to get some k-ordering love, but reasoning on timestamps of this nature makes reasoning about these ids themselves more complex than it needs to be. That's why flake uses a standard 64-bit Unix timestamp, unaltered, as the most significant bits.
106
3a97111 @d2fn more readme / faq updates, include flake anatomy
d2fn authored
107 The next problem is clock skew and protection against replaying ids for a time in the past. The UUID-1 spec dictates that the clock id be incremented in such an event, but this behavior is implementation-specific and we aren't aware of any Erlang implementations that met our safety goals. Flake durably writes a timestamp to a dets table periodically while running. Following a restart, flake will refuse to startup if the timestamp written there is from the future. Furthermore, flake will refuse to generate ids if it detects that the system clock is running backwards.
85a9338 @d2fn add faq and deployment sections
d2fn authored
108
109 ######When are flake ids *not* appropriate?
110 Flake ids are predictable by design. Don't use use flake to generate ids that you'd rather be unpredictable. Don't use flake ids to generate passwords, security tokens, or anything else you wouldn't want someone to be able to guess.
111
112 Don't do modulo 2 arithmetic on flake ids with the expectation of random distribution. The least significant 16-bits are usually going to be 0 on a machine that is generating an average of one or fewer ids per millisecond.
cec4493 @d2fn add readme and test harness
d2fn authored
113
114
Something went wrong with that request. Please try again.