Generate a local UUID for requests with a parent UUID for debug purposes #60
There was a suggestion a year ago that we derive new request uuids from passed-in ones, such that they become unique but it is still possible to search on the parent. The intention was to append to the string of the parent uuid, but that could conceivably be considered an implementation choice of this request.
I think that it's better to generate a new uuid and log that as a new field.

On Tuesday, January 7, 2014, Simon Matic Langford wrote:
Won't you have to use regex or some other configuration to extract the uuid anyway? If we introduce a separate uuid, then bear in mind that it will only be logged in one place alongside the parent (likely the request log, since it's common to all transports). From the PoV of something like Zipkin, which is automagically stitching things together and grabbing all available data, this is simple; for a human, you will always need to join across this single log to trace something, making scripts more complex. The appending mechanism is actually quite an elegant solution since it helps in a number of styles of usage:
It's also not dissimilar to a mechanism we have for tracing events through the system - perhaps we should/could consider joining them. If we change the appending style from what I'd originally considered (just appending -1, -2, -3 to the current uuid when making a request) to including the hostname like we do for new uuids, then we know the exact location of the source (at least where it's something sharing the uuid scheme - and if it's not, it will at least look different).
Fair enough.. But I don't know how we could append the -1, -2, -... on the client side. Most of the time, there are requests to the same endpoints in parallel.
And yet they all get passed the same RequestUUID - I see 2 choices here:
2014/1/8 Simon Matic Langford notifications@github.com
I think you got the gist of it in 1.. The 2 choices here weren't really choices on re-reading, just different parts of the same thing. I do think we should follow my suggestion to add the host as well as the -1, -2, but only if the uuid that we're creating from was not generated by this host (because there wasn't one on the request). E.g. I'm making a new request on host abc001:
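A minimal sketch of how such derived sub-uuids could work (the separator characters, the hostname check, and the class name here are illustrative assumptions, not the actual Cougar implementation):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of deriving per-call sub-uuids from a parent uuid.
public class SubUuidGenerator {
    private final String localHost;
    private final AtomicLong counter = new AtomicLong();

    public SubUuidGenerator(String localHost) {
        this.localHost = localHost;
    }

    // Append a sequence number to the parent uuid; if the parent was minted
    // on another host, also record the local hostname so the source of the
    // sub-uuid is identifiable, as suggested above.
    public String next(String parentUuid) {
        long seq = counter.incrementAndGet();
        if (parentUuid.startsWith(localHost + "-")) {
            return parentUuid + "-" + seq;          // locally minted parent
        }
        return parentUuid + "/" + localHost + "-" + seq; // foreign parent
    }
}
```

Under this sketch, a locally minted parent abc001-5c1a would yield abc001-5c1a-1, abc001-5c1a-2, ..., while a parent minted elsewhere, say xyz009-9f2e, would yield xyz009-9f2e/abc001-&lt;n&gt;, making the originating host of each sub-call visible.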
Sounds like a plan.. :)
So I started looking at this this morning and there are some probs..
Suggested solutions:
I wonder if the extended uuid should be sent in a new header alongside the legacy one, so that we don't have to have a config option - will have to consider the impact on the binary client..
Thinking about the 2nd suggested solution, in fact, it's what we need for …

Just to clarify the origin of this issue: I think that it was created …

So, maybe we can implement this issue and solve the debug problem and at …

2014-07-02 11:56 GMT+01:00 Simon Matic Langford notifications@github.com:
That is precisely why I'm working on this now ;) Also hence the comments regarding the need for ExecutionContext changes for Zipkin integration. The only thing that seems a little rough around the edges is how to ensure the client doesn't send out the same uuid for multiple calls - at the moment it's up to the developer to pass a sub-uuid to the client when constructing the execution context..
I think that we can delegate the generation of the sub-uuid to the cougar …

After that, when a cougar service receives a call with the 3 values …

Using this approach, the UUID that should be logged to represent the …

This is a little bit difficult to explain by email, but I hope that you can …

2014-07-02 14:32 GMT+01:00 Simon Matic Langford notifications@github.com:
All understood, it's exactly what I was thinking, except I think we want to log the full compound uuid in all cases; the Zipkin log parsing can extract the bits it needs from that field, and humans still get to see the full uuid (it also means no breaking change to the log format). With regard to delegating to the client, that seems the preferred approach, but I wonder if this could cause problems where existing apps already pass in a newly generated request uuid. However, I wonder if we can detect and deal with this. I will think some more on the situations which could be problematic.
When I was investigating how we could integrate Zipkin into Cougar, I wasn't planning to use log files because I think we don't really need them. At least in Betfair, they're using 0MQ to emit metrics to their own metrics aggregation system (something like statsd). So, I was planning to emit Zipkin events to 0MQ and then have that system pick those events up and send them to the Zipkin collector.

Regarding your concern: are you talking about the cases where developers are creating a new ExecutionContext object and setting the UUID of that ExecutionContext? Sorry if this doesn't make sense, but my memory isn't fresh enough to remember those "strange" situations. Could you post some snippets of code (if you have some)?
The metrics aggregation system is called statse, and whilst a client has been released and it is integrated into Cougar via Tornjak, it's now 8 months down the line and the server has not been released, hence the feature raised to integrate statsd into Tornjak. My concern about using 0MQ to integrate Cougar with Zipkin would be whether the other end of 0MQ (the collection daemon) would also be released in any sensible timescale. So it may well be that #59 is the shared abstraction and the Zipkin integration for Cougar uses something off the shelf and already available (for example Brave's Zipkin span collector). It rather depends on Betfair's capacity/appetite for open-sourcing more components. I would still emit the full uuid in the log files to support the other use cases surrounding uuids.

Regarding my concern - it's rubbish, I realised that now, so ignore me on that.
If by chance you're talking about Cougar applications that receive a …

On Wed, Jul 2, 2014 at 8:16 PM, Simon Matic Langford <…
Ah, I'd rather assumed it wasn't. I think with the changes we're suggesting, that would be no worse, and if tracing across distributed calls was desirable then perhaps service owners could be persuaded to fix their apps - especially if all they need to do is pass the RequestUUID from the EC passed into the service code into the client calls. I suspect in most cases this is a case of people not realising the benefits they can achieve by just passing this item on. |
I'd have thought people were self-motivated to fix this, given the …

On 2 July 2014 20:23, Simon Matic Langford notifications@github.com wrote:
I think, in the applications I've seen, it was often the case that by the …

Another case I seem to remember was where a popular internal service …

On Wed, Jul 2, 2014 at 8:39 PM, richardqd notifications@github.com wrote:
The distributed client library of course comes with its own issues, like generated-code incompatibility between different Cougar versions, but the first is the situation I had imagined originally.
Can anyone think of a better mechanism for ensuring backwards compatibility (new clients talking to old servers) when introducing this? So far I have 2 ideas; I think I prefer the second:
I've just realized that Pedro earlier said: "this issue.. was created after an internal discussion in betfair to define …"

That was me representing service B. I definitely did not want multiple …

I'd even be in favor of an inbuilt mechanism that actively tries to prevent …

Another thing I'm not keen on regarding UUIDs is that people can use …

Simon, idea 2 sounds better to me because it doesn't create an 'old way vs …

On Wed, Jul 2, 2014 at 9:07 PM, Simon Matic Langford <…
I'd wondered about client 'memory' to try to resolve that issue, although if we change the clients so they always generate a sub-uuid this wouldn't have much effect. It also doesn't deal with callers which are not using Cougar-generated clients. The UUID naming I totally agree on, although it is hampered by the ability to plug in a custom generator, which seems valid if you wish some form of service identifier but your hostname is not sufficient. I wonder if it's valid for a server to reject a client uuid, or regenerate part of it, if it disagrees with the formatting? Probably not; this sounds like a governance issue to me, although I definitely agree with the sentiment.
From a Zipkin point of view, considering it follows Google's Dapper paper, we will need to support not 2 but 3 data fields (headers in the case of HTTP):
These fields should all be probabilistically unique 64-bit integers (i.e. longs), which would imply a conflict with the current X-UUID format. This would be especially problematic if we use something like Brave (as referred to above), as its API expects longs, and depending on whether the Zipkin collector/query/web components enforce this format constraint, it might even be impossible to use any other format. I think using hash functions here to convert the UUID string format to a long would imply unnecessary work per request/span and defeat the purpose of using the entire 64-bit range. I would suggest we add those 3 new fields to the transports while keeping the current X-UUID field format for backward compatibility.
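A sketch of carrying the three Dapper-style fields as separate transport headers. The header names follow Zipkin's B3 propagation convention; whether Cougar would adopt exactly these names is an assumption:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TraceHeaders {
    // Build the tracing headers for an outgoing call: the trace id is
    // propagated unchanged, the caller's span id becomes the parent, and a
    // fresh 64-bit span id is drawn for this call.
    public static Map<String, String> forOutgoingCall(long traceId, long callerSpanId) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("X-B3-TraceId", Long.toHexString(traceId));
        headers.put("X-B3-ParentSpanId", Long.toHexString(callerSpanId));
        headers.put("X-B3-SpanId", Long.toHexString(UUID.randomUUID().getMostSignificantBits()));
        return headers;
    }
}
```

The legacy X-UUID header would be sent unchanged alongside these, preserving backward compatibility as suggested.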
Is there a suggested mechanism that makes them probabilistically unique enough? It seems odd to restrict the ids to longs when it's relatively easy to generate GUIDs if you allow strings. I'd almost argue that's rather a large failing, but I'm sure they have their reasons. Frustratingly, I'd rather assumed strings were OK; I should have read the docs better. Given the amount of work we do everywhere else, hashing might not be unreasonable and would allow dual use of the X-UUID data..

On 3 Jul 2014, at 00:07, André Pinto notifications@github.com wrote:
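One hedged way to hash the existing X-UUID string into the full 64-bit range, as floated above, is a 64-bit FNV-1a. The choice of FNV-1a here is an assumption for illustration; any well-distributed 64-bit string hash would serve:

```java
public class Uuid64Hash {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // FNV-1a over the uuid's characters: deterministic, cheap per request,
    // and spreads values across the whole long range.
    public static long hash(String uuid) {
        long h = FNV_OFFSET_BASIS;
        for (int i = 0; i < uuid.length(); i++) {
            h ^= uuid.charAt(i);
            h *= FNV_PRIME;
        }
        return h;
    }
}
```

The caveat that comes up later in the thread applies: any non-Cougar node exchanging these ids would need to implement the same, well-published function.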
On Dapper's paper? No. I'm using ThreadLocalRandom.nextLong(Long.MIN_VALUE, Long.MAX_VALUE) (Java 7+) in Mantis in order to avoid contention. Pedro Vilaça also suggested the use of UUID.getMostSignificantBits() although I think that would probably be a little more expensive (haven't tested though). |
Can it handle clashes if they do occur? I can't find any reference in the docs.
Are you referring to Zipkin id clashes? I think they don't do any magic here. If you have duplicated trace ids (which is very unlikely with a uniform probabilistic function over a 64-bit range - a 5.4210109e-20 probability per pair), then it will probably overwrite the old one:
From a browser extension plugin, but still on the twitter/zipkin repo:
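The quoted probability checks out: for one specific pair of ids drawn uniformly from the 64-bit range, the collision chance is 2^-64 ≈ 5.42e-20, though by the birthday bound the chance of some collision among many ids grows with the square of their count:

```java
public class CollisionOdds {
    public static void main(String[] args) {
        // Chance that two independently drawn 64-bit ids are equal.
        double pairwise = Math.pow(2, -64);
        System.out.println(pairwise); // ≈ 5.42e-20

        // Birthday approximation: chance of any collision among n ids,
        // n*(n-1)/2 pairs each colliding with probability 2^-64.
        double n = 1e6;
        double anyCollision = n * (n - 1) / 2 * pairwise;
        System.out.println(anyCollision); // still tiny for a million ids
    }
}
```

Even with a million concurrent traces, the any-collision probability stays around 10^-8, which supports the "no magic needed" reading above.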
ok, cool thanks. Am going to explore hashing a little to see if there's any legs there in terms of low duplicate rates and full use of the 64bit spectrum, otherwise will consider fleshing out the isTracingEnabled on ExecutionContext into a TracingInfo interface which contains the isTracingEnabled and enables tracing plugins to hook in the data they need. |
So I was thinking on this some more overnight... This feature request sits on its own if you ignore the Zipkin/tracing requirements, for the purposes of debugging, and so we should continue to make the changes on the basis I was already working to. It would be nice to be able to leverage this for Zipkin/tracing, but it's not mandatory. If Zipkin needs the same info in a different format, and is unable to derive it from this info (i.e. hashing doesn't work), then I think it's reasonable to state that the Zipkin ids are also request uuids, just in a different format, and thus it's reasonable for the Zipkin integration to provide a RequestUUID impl which provides the standard Cougar uuids as well as those required for Zipkin. Then we do not need changes to the EC interface (which, as @richardqd says, is painful) and we only then need to provide the tracing hooks in the client and transports. This also suggests that the header name I proposed is sensible, and if we need extra info for tracing we can use an appropriate separate header such as X-Trace-Info.
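A rough sketch of that idea: the Zipkin integration supplies its own RequestUUID implementation exposing the standard Cougar uuid plus the extra tracing ids. The interface and names here are simplified assumptions, not Cougar's actual API:

```java
// Simplified stand-in for Cougar's RequestUUID abstraction.
interface RequestUUID {
    String getUUID();
}

// Zipkin-aware implementation: the standard uuid string is still what gets
// logged, while tracing hooks can pull the 64-bit ids they need from it.
final class ZipkinRequestUUID implements RequestUUID {
    private final String cougarUuid;
    private final long traceId;
    private final long spanId;
    private final Long parentSpanId; // null at the root of a trace

    ZipkinRequestUUID(String cougarUuid, long traceId, long spanId, Long parentSpanId) {
        this.cougarUuid = cougarUuid;
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    public String getUUID() { return cougarUuid; }
    public long getTraceId() { return traceId; }
    public long getSpanId() { return spanId; }
    public Long getParentSpanId() { return parentSpanId; }
}
```

This keeps the ExecutionContext interface untouched: code that only knows RequestUUID sees the familiar string uuid, and only the tracing plumbing downcasts for the extra fields.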
Hashing would also imply that every non-Cougar node of the ecosystem would have to be aware of the way Cougar's hash function works as they need to pass and receive the fields to/from them. |
Yes it would. It would have to be a well-published algorithm. I appreciate …

On 4 July 2014 09:06, André Pinto notifications@github.com wrote:
ThreadLocalRandom.nextLong(Long.MIN_VALUE, Long.MAX_VALUE) is not an option. Looking at the method definition, and considering the documentation mentions no restriction, I thought it would allow the entire long range, but after testing and looking at the implementation you can see that it only covers a 32-bit range, as the kinds of operations it performs internally introduce overflow problems for larger ranges. I'm now using Vilaça's suggestion
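The working alternatives can be sketched as follows, assuming Java 7+. The bounded `nextLong(origin, bound)` overload cannot span the full long range (the internal `bound - origin` arithmetic overflows), but both of these cover all 64 bits:

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class IdGenerator {
    // Vilaça's suggestion: take the top 64 bits of a random UUID.
    public static long fromUuid() {
        return UUID.randomUUID().getMostSignificantBits();
    }

    // The unbounded nextLong() overload draws from the entire long range
    // and, being thread-local, avoids the contention that motivated using
    // ThreadLocalRandom in the first place.
    public static long fromThreadLocalRandom() {
        return ThreadLocalRandom.current().nextLong();
    }
}
```

Note that `getMostSignificantBits()` of a version-4 UUID has 6 fixed version/variant bits, so strictly fewer than 64 bits are random there; the unbounded `nextLong()` does not have that caveat.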
…new socket protocol version (5) to ensure backwards compatibility. Includes unit and integration tests following the new style started in #73
This issue is now complete. Any further discussion regarding Zipkin or tracing should occur on the appropriate issues.
At the moment, if service A needs to make 2 independent calls to service B, then according to best practices service A should pass the same UUID to service B.
That is useful for identifying a global transaction, but isn't so useful if we need to identify a single request. We can track the requests by looking at the logs, timestamps, etc., but that could be a problem if we have alerts based on that field.