Switch auto-generated IDs to Flake IDs from random UUIDs #7531
Using SecureRandom as a UUID generator is slow and doesn't allow us to take advantage of some Lucene optimizations around ids with common prefixes. This commit will allow us to use a timestamp64bit-macAddr-counter UUID. Since the macAddr may be shared among several nodes running on the same hardware, we use an XOR of the macAddr with a SecureRandom number generated on startup. See elastic#5941
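The XOR step described above could be sketched as follows. This is only an illustration under stated assumptions: `NodeIdSketch`, `mungeAddress`, and `newMask` are hypothetical names, not the classes or methods from this PR, and the sketch assumes the mask is drawn once at startup and reused for every generated id.

```java
import java.security.SecureRandom;

/** Hypothetical helper for illustration; not the class from this PR. */
final class NodeIdSketch {

    /** Draws a random mask once, e.g. at node startup. */
    static byte[] newMask(int length) {
        byte[] mask = new byte[length];
        new SecureRandom().nextBytes(mask);
        return mask;
    }

    /** XORs a hardware address with the per-process mask, byte by byte. */
    static byte[] mungeAddress(byte[] macAddress, byte[] mask) {
        if (macAddress.length != mask.length) {
            throw new IllegalArgumentException("address and mask must have equal length");
        }
        byte[] munged = new byte[macAddress.length];
        for (int i = 0; i < macAddress.length; i++) {
            munged[i] = (byte) (macAddress[i] ^ mask[i]);
        }
        return munged;
    }
}
```

Two nodes on the same hardware would each draw their own mask, so their munged addresses would almost certainly differ even though the underlying MAC address is identical.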
…hout an incoming id. Wire up the timestampUUID generator to indexing. See elastic#5941
Incorporate some of the changes from @kimchy and @s1monw. Move the UUID generators into their own classes and provide a common interface as a first step to moving them under a singleton. Use a better method of getting the mac address and fall back to a secure random address if it fails. Add tests to test concurrency and shared prefix integrity of UUIDs. Use PaddedAtomicLongs to hold the sequence number and lasttime. Check in a CAS loop whether a time slip has occurred, as described by @s1monw. Next step is to move the impls under a singleton. See elastic#5941
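A CAS loop that guards against the clock slipping backwards might look roughly like this. It is a minimal sketch, not the code from this PR: `MonotonicClockSketch` is a hypothetical name, it uses a plain `AtomicLong` rather than the padded variant mentioned above, and it leaves out the sequence number that distinguishes ids within the same millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a slip-resistant timestamp source. */
final class MonotonicClockSketch {

    private final AtomicLong lastTimestamp = new AtomicLong(0);

    /**
     * Returns a timestamp that is never smaller than any value returned
     * earlier, even if the supplied wall-clock reading has moved backwards.
     */
    long next(long wallClockMillis) {
        while (true) {
            long last = lastTimestamp.get();
            // If the clock slipped back, keep using the last known time.
            long candidate = Math.max(last, wallClockMillis);
            if (lastTimestamp.compareAndSet(last, candidate)) {
                return candidate;
            }
            // Another thread updated lastTimestamp in between; retry.
        }
    }
}
```

A caller would typically pass `System.currentTimeMillis()` as the argument; taking it as a parameter here just keeps the sketch easy to exercise.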
Reduce the number of time bytes to 6, reducing the total number of bytes to 20. Validate that we have a mac address that contains data, to avoid getting addresses that are just 00:00:00:00:00:00, which can happen on virtualized machines. Remove use of ByteBuffer on puts to reduce overhead. Add code to attempt to prevent time slips. See elastic#5941
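The all-zero check and the random fallback could be sketched like this. The names `MacAddressCheckSketch`, `isUsable`, and `orRandom` are hypothetical and not taken from this PR; in practice the candidate address would likely come from `java.net.NetworkInterface#getHardwareAddress()`, which may return `null`.

```java
import java.security.SecureRandom;

/** Hypothetical validation helper; not the routine from this PR. */
final class MacAddressCheckSketch {

    /** A usable address is 6 bytes long and has at least one non-zero byte. */
    static boolean isUsable(byte[] address) {
        if (address == null || address.length != 6) {
            return false;
        }
        for (byte b : address) {
            if (b != 0) {
                return true;
            }
        }
        return false; // 00:00:00:00:00:00, as seen on some virtualized machines
    }

    /** Returns the address if usable, otherwise 6 secure random bytes. */
    static byte[] orRandom(byte[] address) {
        if (isUsable(address)) {
            return address;
        }
        byte[] fallback = new byte[6];
        new SecureRandom().nextBytes(fallback);
        return fallback;
    }
}
```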
Simplify mac address validation routine and remove unneeded variable.
/** These are essentially flake ids (http://boundary.com/blog/2012/01/12/flake-a-decentralized-k-ordered-unique-id-generator-in-erlang) but
 * we use 6 (not 8) bytes for timestamp, and use 3 (not 2) bytes for sequence number. */

class TimeBasedUUID implements UUIDGenerator {
Odd that it does not end in Generator, which makes it seem like a generated UUID (same for RandomBasedUUID).
@mikemccand LGTM. Just minor fluff.
Thanks @pickpg I pushed a new commit...

public class MacAddressProvider {

    private static final ESLogger logger = Loggers.getLogger("MacAddressProvider");
Should it be Loggers.getLogger(MacAddressProvider.class) for consistency with other classes?
Thanks @jpount, I pushed a new commit folding in your feedback...
LGTM
Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).
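
To illustrate why a leading timestamp produces shared prefixes, here is a rough sketch of one possible byte layout. The field sizes for timestamp (6 bytes) and sequence (3 bytes) come from the code comment in this PR; the 6-byte node field, the field order, and the big-endian encoding are assumptions made for the example, and `FlakeLayoutSketch` is a hypothetical name rather than the PR's implementation.

```java
/** Hypothetical layout: 6 bytes time, 6 bytes node, 3 bytes sequence. */
final class FlakeLayoutSketch {

    /** Packs the three fields big-endian, timestamp first (assumed order). */
    static byte[] encode(long timestampMillis, byte[] nodeId, int sequence) {
        byte[] id = new byte[15];
        int i = 0;
        // Low 6 bytes of the timestamp, most significant byte first.
        for (int shift = 40; shift >= 0; shift -= 8) {
            id[i++] = (byte) (timestampMillis >>> shift);
        }
        // 6-byte node identifier (e.g. the munged MAC address).
        System.arraycopy(nodeId, 0, id, i, 6);
        i += 6;
        // Low 3 bytes of the sequence number, most significant byte first.
        for (int shift = 16; shift >= 0; shift -= 8) {
            id[i++] = (byte) (sequence >>> shift);
        }
        return id;
    }

    /** Number of leading bytes two ids have in common. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }
}
```

Under this layout, ids generated within the same millisecond on the same node differ only in their trailing sequence bytes, and ids generated a millisecond apart still share most of the timestamp bytes. Random UUIDs, by contrast, share no predictable prefix.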
Closes #5941
This PR starts from @GaelTadh's original PR (#6004) and just folds in the last round of feedback ... I think it's ready?