Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time based UUIDs #6004

Closed
wants to merge 5 commits into from
Closed

Conversation

GaelTadh
Copy link
Contributor

@GaelTadh GaelTadh commented May 1, 2014

See #5941.

Using SecureRandom as a UUID generator is slow and doesn't allow us
to take adavantage of some lucene optimizations around ids with common
prefixes. This commit will allow us to use a
timestamp64bit-macAddr-counter UUID. Since the macAddr may be shared
among several nodes running on the same hardware we use an xor of the
macaddr with a SecureRandom number generated on startup.

Using SecureRandom as a UUID generator is slow and doesn't allow us
to take adavantage of some lucene optimizations around ids with common
prefixes. This commit will allow us to use a
timestamp64bit-macAddr-counter UUID. Since the macAddr may be shared
among several nodes running on the same hardware we use an xor of the
macaddr with a SecureRandom number generated on startup.

See elastic#5941
…hout an incoming id.

Wire up the timestampUUID generator to indexing.

See elastic#5941
@kimchy
Copy link
Member

kimchy commented May 1, 2014

can we add a unit test that verifies the shared prefix behavior of this new UUID?

}
}

} catch (Exception uhe){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we catch Throwable here to be safe? Also, maybe do a simple debug log it so we can see why it fails if we really want to?

@kimchy
Copy link
Member

kimchy commented May 1, 2014

LAST_TIMESTAMP handling is not thread safe, but at the end, an 8 bytes incremental sequence number with salted mac address should protect from time shifts and restarts. I wonder if we can remove it completely? Need to think about this.

Also, now we are moving to 22 byte UID, compared to previously having 16 byte UID, potentially, the sequence number can be smaller, but then, we might need to make sure we generate / make it incremental within the same timestamp as the seq generation scope is smaller which entails synchronization.

@kimchy
Copy link
Member

kimchy commented May 1, 2014

also, the mac address code is flaky, its not only on OSX that we can potentially fail, but also on systems that are virtualized, ... . Check https://github.com/cowtowncoder/java-uuid-generator/blob/3.0/src/main/java/com/fasterxml/uuid/EthernetAddress.java for inspiration.

@@ -1546,6 +1581,36 @@ public static String randomBase64UUID(Random random) {
}
}

public static String timestampBase64UUID() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of things:

  • I think this should be it's own class that can be a singleton. We should have a TimeBasedUUID.java and RandomBasedUUID.java that way we can put in the right impl on initialisation time and don't need the if here.
  • if we have dedicated impls then we can simply use member variables rather than static variable.s
  • we should use a PaddedAtomicLong for the timestamp and update in a CAS loop like:
do {
  long last = lastTimestamp.get();
  if (lastTimeStamp.compareAndSet(last, timestamp) {
    break;
  } 
} while(last < timestamp);
  • I wonder if 32bit sequence IDs are enough?
  • do we really need the ByteBuffer instance of can we maybe just use a simple copy method for the bytes, the long should be simple as well just a static method with 4 shifts?

Incorporate some of the changes from @kimchy and @s1monw.
Move the UUID generators into their own classes and provide a common
interface as a first step to moving them under a singleton. Use a better
method of getting the mac address and fall back to a secure random address
if it fails. Add tests to test conccurency and shared prefix integrity of
UUIDs. Use PaddedAtomicLongs to hold the sequence number and lasttime.
Check to see if a time slip has occured as described by @s1monw in a CAS loop.
Next step is to move the impls under a singleton.

See elastic#5941
@GaelTadh
Copy link
Contributor Author

GaelTadh commented May 2, 2014

I've pushed some new changes including the unit tests. @s1monw I can remove the ByteBuffer stuff if we really want to and add our own. I'm pulling things into a singleton now.

long last;

final byte[] uuidBytes = new byte[22];
do {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing the time is going backwards checks here? Essentially we should:

  • check the current timestamp and if it goes backwards add one
  • try update the timestamp
    • if success step out
    • else read the current timestamp again and retry

Reduce number of time bytes to 6 reducing total number of bytes to 20.
Validate that we have a mac address that contains data to avoid getting
addresses that are just 00:00:00:00:00:00 which can happen on virtualized
machines. Remove use of ByteBuffer on puts to reduce overhead. Add code to
attempt to prevent time slips.

See elastic#5941
Simplify mac address validation routing and remove unneed variable.
@kevinkluge kevinkluge added v1.2.0 and removed v1.2.0 labels May 8, 2014
@clintongormley clintongormley changed the title Enhancement/5941 Add time based UUIDs Jul 11, 2014
private PaddedAtomicLong lastTimestamp = new PaddedAtomicLong();

byte[] secureMungedAddress = MacAddressProvider.getSecureMungedAddress();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these members can be final?

address = getMacAddress();
} catch( SocketException se ) {
logger.warn("Unable to get mac address, will use a dummy address",se);
address = constructDummyMulticastAddress();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens regardless in the next if-statement since address remains null.

@mikemccand
Copy link
Contributor

Thanks @pickypg, I started from @GaelTadh patch and then folded in
most of your & my feedback into a new branch here:

mikemccand@e5e21b7

The one thing I didn't do is move the singleton to UUIDGenerator
interface; I think Strings.* APIs is good place to expose access to
these different generators?

I removed the PaddedAtomicLong: I think it's overkill here, and a
distraction.

I left the CAS loop to ensure lastTimestamp grows monotonically, but
tried to simplify it.

I removed APIs to parse out the time-based UUID: it's supposed to be
opaque to callers.

I broke out SecureRandomHolder as its own package private class.

I dropped sequence number from 8 bytes to 3, and now increment the
timestamp when the bottom two bytes are 0, to deal with a biggish
clock slip backwards while JVM is up. I also randomized the sequence
id on init for best effort protection of backwards clock-slip while
JVM is down.

@pickypg
Copy link
Member

pickypg commented Aug 31, 2014

@mikemccand LGTM

@mikemccand mikemccand removed the review label Sep 1, 2014
@mikemccand mikemccand closed this in 9c1ac95 Sep 2, 2014
@mikemccand mikemccand removed the v1.4.0 label Sep 2, 2014
mikemccand added a commit that referenced this pull request Sep 2, 2014
Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).

Closes #7531

Closes #6004

Closes #5941
mikemccand added a commit that referenced this pull request Sep 8, 2014
Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).

Closes #7531

Closes #6004

Closes #5941
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants