New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time based UUIDs #6004
Add time based UUIDs #6004
Conversation
Using SecureRandom as a UUID generator is slow and doesn't allow us to take adavantage of some lucene optimizations around ids with common prefixes. This commit will allow us to use a timestamp64bit-macAddr-counter UUID. Since the macAddr may be shared among several nodes running on the same hardware we use an xor of the macaddr with a SecureRandom number generated on startup. See elastic#5941
…hout an incoming id. Wire up the timestampUUID generator to indexing. See elastic#5941
can we add a unit test that verifies the shared prefix behavior of this new UUID? |
} | ||
} | ||
|
||
} catch (Exception uhe){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we catch Throwable
here to be safe? Also, maybe do a simple debug log it so we can see why it fails if we really want to?
LAST_TIMESTAMP handling is not thread safe, but at the end, an 8 bytes incremental sequence number with salted mac address should protect from time shifts and restarts. I wonder if we can remove it completely? Need to think about this. Also, now we are moving to 22 byte UID, compared to previously having 16 byte UID, potentially, the sequence number can be smaller, but then, we might need to make sure we generate / make it incremental within the same timestamp as the seq generation scope is smaller which entails synchronization. |
also, the mac address code is flaky, its not only on OSX that we can potentially fail, but also on systems that are virtualized, ... . Check https://github.com/cowtowncoder/java-uuid-generator/blob/3.0/src/main/java/com/fasterxml/uuid/EthernetAddress.java for inspiration. |
@@ -1546,6 +1581,36 @@ public static String randomBase64UUID(Random random) { | |||
} | |||
} | |||
|
|||
public static String timestampBase64UUID() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of things:
- I think this should be it's own class that can be a singleton. We should have a TimeBasedUUID.java and RandomBasedUUID.java that way we can put in the right impl on initialisation time and don't need the if here.
- if we have dedicated impls then we can simply use member variables rather than static variable.s
- we should use a PaddedAtomicLong for the timestamp and update in a
CAS
loop like:
do {
long last = lastTimestamp.get();
if (lastTimeStamp.compareAndSet(last, timestamp) {
break;
}
} while(last < timestamp);
- I wonder if 32bit sequence IDs are enough?
- do we really need the ByteBuffer instance of can we maybe just use a simple copy method for the bytes, the long should be simple as well just a static method with 4 shifts?
Incorporate some of the changes from @kimchy and @s1monw. Move the UUID generators into their own classes and provide a common interface as a first step to moving them under a singleton. Use a better method of getting the mac address and fall back to a secure random address if it fails. Add tests to test conccurency and shared prefix integrity of UUIDs. Use PaddedAtomicLongs to hold the sequence number and lasttime. Check to see if a time slip has occured as described by @s1monw in a CAS loop. Next step is to move the impls under a singleton. See elastic#5941
I've pushed some new changes including the unit tests. @s1monw I can remove the ByteBuffer stuff if we really want to and add our own. I'm pulling things into a singleton now. |
long last; | ||
|
||
final byte[] uuidBytes = new byte[22]; | ||
do { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing the time is going backwards
checks here? Essentially we should:
- check the current timestamp and if it goes backwards add one
- try update the timestamp
- if success step out
- else read the current timestamp again and retry
Reduce number of time bytes to 6 reducing total number of bytes to 20. Validate that we have a mac address that contains data to avoid getting addresses that are just 00:00:00:00:00:00 which can happen on virtualized machines. Remove use of ByteBuffer on puts to reduce overhead. Add code to attempt to prevent time slips. See elastic#5941
Simplify mac address validation routing and remove unneed variable.
private PaddedAtomicLong lastTimestamp = new PaddedAtomicLong(); | ||
|
||
byte[] secureMungedAddress = MacAddressProvider.getSecureMungedAddress(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these members can be final?
address = getMacAddress(); | ||
} catch( SocketException se ) { | ||
logger.warn("Unable to get mac address, will use a dummy address",se); | ||
address = constructDummyMulticastAddress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens regardless in the next if
-statement since address
remains null
.
Thanks @pickypg, I started from @GaelTadh patch and then folded in The one thing I didn't do is move the singleton to UUIDGenerator I removed the PaddedAtomicLong: I think it's overkill here, and a I left the CAS loop to ensure lastTimestamp grows monotonically, but I removed APIs to parse out the time-based UUID: it's supposed to be I broke out SecureRandomHolder as its own package private class. I dropped sequence number from 8 bytes to 3, and now increment the |
@mikemccand LGTM |
See #5941.
Using SecureRandom as a UUID generator is slow and doesn't allow us
to take adavantage of some lucene optimizations around ids with common
prefixes. This commit will allow us to use a
timestamp64bit-macAddr-counter UUID. Since the macAddr may be shared
among several nodes running on the same hardware we use an xor of the
macaddr with a SecureRandom number generated on startup.