-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hazelcast Instrumentation #2658
Conversation
From version 3.6 onwards - Initial revision - Refactoring and refinement
I am not sure what the build failure is but it's happening in the Hibernate instrumentation and appears to be intermittent. |
There is an intermittent issue with building Hibernate (and Elasticsearch) on circleci, so don't worry about it if you see that. All the failures I looked at just now were related to Hazelcast though. |
...cast-4.0/src/main/java/datadog/trace/instrumentation/v4/ClientInvocationInstrumentation.java
Outdated
Show resolved
Hide resolved
...ast-3.6/src/main/java/datadog/trace/instrumentation/v3/DistributedObjectInstrumentation.java
Outdated
Show resolved
Hide resolved
...ast-3.6/src/main/java/datadog/trace/instrumentation/v3/HazelcastInstanceInstrumentation.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very welcome contribution, but the implementation a very complicated. My concerns surround the propagation of the instance name to the invocation, which involves transforming DistributedObject
to hold a hard reference to HazelcastInstance
, so that the instance name is available when the span is started in the client proxy.
Field injection can have unanticipated consequences like preventing garbage collection of the context value, increases the size of the host object, and can lead to locking of the host object.
The instance name is actually available to ClientInvocation
via the HazelcastClientInstanceImpl
and can be read when the invocation proceeds, so this should be simple to fix. This means we can just get rid of the HazelcastInstanceInstrumentation
and its associated context stores.
I haven't figured out how propagate the method name from the proxy down to the ClientInvocation
but there are a few attributes to rule out extracting similar information from in the ClientInvocation
. The ClientMessage
and object name look like good candidates for enriching the ClientInvocationInstrumentation
. Please investigate extracting attributes similar to the proxy's method name from the context available within ClientInvocation
so that we can remove DistributedObjectInstrumentation
. If that's not possible, then I would see justification for DistributedObjectInstrumentation
's existence.
Yes, the instrumentation for 3.9 and beyond is a lot simpler based on the In theory, we could create our own operation name with a mapping table of message types, but that's a lot of manual labour and still leaves us without an object name. The object name is like the table name, so it's important information to have. We really have no good way to get it, since it doesn't appear consistent how it's represented in the message. This is why I ended up instrumenting the entire public interface for 3.6.
Yes, this would work for the 3.6 instrumentation but maybe not as well for the 3.9+ since we instrument when invoking the
Yes, this is what I did for 3.9+ but as I mention above, the ClientInvocation just doesn't have the information needed until 3.9. |
I think we need to consolidate this in to two instrumentations:
Does this sound like a reasonable approach? If simplifying the 3.9+ instrumentation is more work than it's worth to you given that you're on 3.8, perhaps we could limit the scope of this PR to 3.8, so you get tracing for your applications in return for the effort you've already put in, and we don't have to take on the risk. We could then do a 3.9+ instrumentation in house. Specifically regarding:
We could always field inject the name (i.e. a plain old string value) into |
This is actually exactly what I've done. The 3.6 instrumentation is the only one that operates on the |
Would this be done using the ContextStore or a different approach? Is the main risk to stay away from linking to a complex entity with this approach? |
I apologise for mixing things up - instrumentation is hard to review because both the framework and the code in the change needs to be considered. There is a lot of code to review here, let alone the surrounding context of the client library, and it would be easier to review if the changes were made in smaller units - either in separate PRs or in separate commits for each version of the library. As things stand, we have the following:
If the instance name were captured and injected in to |
Yes, I don't really have any concerns about injecting a string or an instrumentation object as in the Ignite instrumentation, but if we field inject complex services with lifecycles, we mutate the object graph in ways library developers won't be able to anticipate, and we can prevent the injected service being garbage collected, for example. |
Instrument ClientInvocation to capture Hazelcast client instance instead of having to use context store
Sorry about that. It was about 2 days of work that I was working on in parallel, so I created a single unit to support from past to present. I'll aim for smaller PR's in the future.
I have removed the use of the contextStore from this one based on your suggestions. I will call it
Actually, the lack of instance name in 3.9-3.10 is an accidental omission. I'll see what I can do. The main reason these were split was because Hazelcast moved around some of the classes I was depending on in 3.11. Is there a good way to support this scenario in the same instrumentation?
Yes, this should be doable.
See above. |
- Renamed the 3.6 instrumentation to hazelcast_legacy, so it is enabled independently of the more streamlined mainstream hazelcast instrumentations - Set to always be disabled by default since this will never be a mainstream instrumentation
It's not a big deal but would be appreciated in the future (thanks for the contributions!).
Which class names changed? |
The main reason for the 3.11 version was that |
The reason for the |
Since it implements |
Yes, but it would be complicated to do within dd-trace-java and what you have done feels like a reasonable trade-off to me. Thanks for explaining the rationale. |
Hmm, that's weird. I already had the namedOneOf() in there. I thought I had tried and it didn't liked it when I used HazelcastInstance as the parameter in the Advice, but seems to be working now, so maybe we can kill off the 3.11 instrumentation after all. |
- Was able to get around the renaming of the HazelcastClientInstanceImpl by acting upon it as a HazelcastInstance, which thus allowed us eliminate the need for the 3.11 instrumentation
...hazelcast-3.6/src/main/java/datadog/trace/instrumentation/v3/DistributedObjectDecorator.java
Outdated
Show resolved
Hide resolved
.../hazelcast-4.0/src/main/java/datadog/trace/instrumentation/v4/ClientInvocationDecorator.java
Outdated
Show resolved
Hide resolved
.../hazelcast-4.0/src/main/java/datadog/trace/instrumentation/v4/ClientInvocationDecorator.java
Outdated
Show resolved
Hide resolved
...azelcast-3.9/src/main/java/datadog/trace/instrumentation/v3_9/ClientInvocationDecorator.java
Outdated
Show resolved
Hide resolved
...elcast-4.0/src/main/java/datadog/trace/instrumentation/v4/ClientListenerInstrumentation.java
Outdated
Show resolved
Hide resolved
This is looking good. There are a few minor comments about logging. Also please consider inlining any decorator methods which only do one thing to the span into the advice. |
You can do a few different things:
|
- Updates based on review comments - Backport improved tests from 4.0 to 3.9
- Ensure that parallel runs of the hazelcast tests don't discover each other
- configServer() method was missing to allow custom server config from the test
- Enable reporting of Client.createProxy operations for consistency with 4.0 - Update tests to use random names to avoid any ordering dependency on tests -- the createProxy call only happens the first time a particular resource is requested
It works now for all supported versions. I removed a few tests that were particularly prone to this issue, for example the Reliable Topic one, which is implemented on top of the ring buffer. |
Please pay particular attention to the enhancement I made to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main problem is the sourceCompatibility
. I think it would have been obvious if our build was actually working.
dependencies { | ||
compile project(':dd-java-agent:instrumentation:hazelcast') | ||
|
||
compileOnly group: 'com.hazelcast', name: 'hazelcast-all', version: '3.8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, this version should match the version labeled in the project name.
Also, please add a comment why testCompile
isn't using 3.6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything specific to async in your instrumentation advice classes (just tests and instrumentation matchers), so maybe this could be declared as compile for 3.6, but test with 3.8?
...ast-3.6/src/main/java/datadog/trace/instrumentation/v3/DistributedObjectInstrumentation.java
Outdated
Show resolved
Hide resolved
...tion/hazelcast/src/main/java/datadog/trace/instrumentation/hazelcast/HazelcastConstants.java
Outdated
Show resolved
Hide resolved
...st-3.9/src/main/java/datadog/trace/instrumentation/v3_9/ClientInvocationInstrumentation.java
Outdated
Show resolved
Hide resolved
dd-java-agent/instrumentation/hazelcast/hazelcast-4.0/hazelcast-4.0.gradle
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor changes then I think we're good to merge. Nice work.
...cast-3.6/src/main/java/datadog/trace/instrumentation/v3/ClientInvocationInstrumentation.java
Outdated
Show resolved
Hide resolved
...azelcast-3.9/src/main/java/datadog/trace/instrumentation/v3_9/ClientInvocationDecorator.java
Outdated
Show resolved
Hide resolved
...elcast-4.0/src/main/java/datadog/trace/instrumentation/v4/ClientListenerInstrumentation.java
Outdated
Show resolved
Hide resolved
Also linking to this discussion regarding wire propagation for future reference: https://groups.google.com/g/hazelcast/c/i6FAz4LL1H8 |
- Address potential random test failures due to timing issues
I moved the instrumentations to the root level dd-java-agent/instrumentation directory since Gradle was complaining when they resided inside a hazelcast directory, which wasn't itself a Gradle module. |
oh, right... I forgot about that. sounds good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Nice work.
From Hazelcast version 3.6 onwards (tested up to 4.2.x, which is the latest at the time of writing).
Client-side instrumentation only.