Hazelcast Instrumentation #2658

Merged
merged 16 commits into from
May 4, 2021

darylrobbins
Contributor

@darylrobbins darylrobbins commented Apr 25, 2021

From Hazelcast version 3.6 onwards (tested up to 4.2.x, which is the latest at the time of writing).

Client-side instrumentation only.

  • Resource names and other related tags differ in 3.9 and beyond with Hazelcast's introduction of an operation name.
  • For 3.6, the instrumentation is at the public interface level because the internal machinery does not provide sufficient visibility to do it at a lower level (but this is the version I am currently using)
  • For 3.9 and beyond, the instrumentation is carried out at the ClientInvocation level, so it will automatically pick up new operations. All later versions use the same general approach, but the classes have moved around a few times and new capabilities become available from time to time.
  • I have tested it with Hazelcast Jet but the invocations vary so much from one run to another that I couldn't include it as a test.

@darylrobbins darylrobbins requested a review from a team as a code owner April 25, 2021 02:53
@darylrobbins darylrobbins marked this pull request as draft April 25, 2021 13:43
From version 3.6 onwards

- Initial revision
- Refactoring and refinement
@darylrobbins darylrobbins marked this pull request as ready for review April 25, 2021 21:20
@darylrobbins
Contributor Author

I am not sure what the build failure is but it's happening in the Hibernate instrumentation and appears to be intermittent.

@richardstartin
Member

There is an intermittent issue with building Hibernate (and Elasticsearch) on CircleCI, so don't worry about it if you see that. All the failures I looked at just now were related to Hazelcast though.

Member

@richardstartin richardstartin left a comment

This is a very welcome contribution, but the implementation is very complicated. My concerns surround the propagation of the instance name to the invocation, which involves transforming DistributedObject to hold a hard reference to HazelcastInstance, so that the instance name is available when the span is started in the client proxy.

Field injection can have unanticipated consequences: it can prevent garbage collection of the context value, it increases the size of the host object, and it can lead to locking of the host object.

The instance name is actually available to ClientInvocation via the HazelcastClientInstanceImpl and can be read when the invocation proceeds, so this should be simple to fix. This means we can just get rid of the HazelcastInstanceInstrumentation and its associated context stores.

I haven't figured out how to propagate the method name from the proxy down to the ClientInvocation, but there are a few attributes in the ClientInvocation that we should first rule out as sources of similar information. The ClientMessage and object name look like good candidates for enriching the ClientInvocationInstrumentation. Please investigate extracting attributes similar to the proxy's method name from the context available within ClientInvocation so that we can remove DistributedObjectInstrumentation. If that's not possible, then I would see justification for DistributedObjectInstrumentation's existence.

@darylrobbins
Contributor Author

darylrobbins commented Apr 26, 2021

This is a very welcome contribution, but the implementation is very complicated.

Yes, the instrumentation for 3.9 and beyond is a lot simpler based on the ClientInvocation. The problem before that is the ClientInvocation doesn't have the objectName and the ClientMessage doesn't have the operationName. The easiest answer would be to support only 3.9+ but the version I care about is 3.8 at the moment.

In theory, we could create our own operation name with a mapping table of message types, but that's a lot of manual labour and still leaves us without an object name. The object name is like the table name, so it's important information to have. We really have no good way to get it, since it doesn't appear consistent how it's represented in the message.

This is why I ended up instrumenting the entire public interface for 3.6.

The instance name is actually available to ClientInvocation via the HazelcastClientInstanceImpl and can be read when the invocation proceeds, so this should be simple to fix. This means we can just get rid of the HazelcastInstanceInstrumentation and its associated context stores.

Yes, this would work for the 3.6 instrumentation but maybe not as well for the 3.9+ since we instrument when invoking the ClientInvocation and it doesn't store the reference to the client even though it gets it in the constructor.

I haven't figured out how to propagate the method name from the proxy down to the ClientInvocation, but there are a few attributes in the ClientInvocation that we should first rule out as sources of similar information. The ClientMessage and object name look like good candidates for enriching the ClientInvocationInstrumentation. Please investigate extracting attributes similar to the proxy's method name from the context available within ClientInvocation so that we can remove DistributedObjectInstrumentation. If that's not possible, then I would see justification for DistributedObjectInstrumentation's existence.

Yes, this is what I did for 3.9+ but as I mention above, the ClientInvocation just doesn't have the information needed until 3.9.

@richardstartin
Member

I think we need to consolidate this in to two instrumentations:

  1. A simplified 3.9+ instrumentation which focuses on ClientInvocation along the lines I mentioned, particularly removing the field injection of DistributedObject. hazelcast-client 3.9.0 was released in October 2017 so this instrumentation would carry the most risk (and value) to our customers if we merge it, so we need it to be as simple as possible.
  2. A 3.8 instrumentation where we can tolerate more complexity, which we would disable by default. This way you would get the tracing you need for your applications, and we don't need to worry about exposing our customers to unanticipated edge cases. I don't see a good reason to provide support for 3.6 or 3.7 in 2021.

Does this sound like a reasonable approach? If simplifying the 3.9+ instrumentation is more work than it's worth to you given that you're on 3.8, perhaps we could limit the scope of this PR to 3.8, so you get tracing for your applications in return for the effort you've already put in, and we don't have to take on the risk. We could then do a 3.9+ instrumentation in house.

Specifically regarding:

Yes, this would work for the 3.6 instrumentation but maybe not as well for the 3.9+ since we instrument when invoking the ClientInvocation and it doesn't store the reference to the client even though it gets it in the constructor.

We could always field inject the name (i.e. a plain old string value) into ClientInvocation on exit from its constructor.

@darylrobbins
Contributor Author

darylrobbins commented Apr 26, 2021

Does this sound like a reasonable approach? If simplifying the 3.9+ instrumentation is more work than it's worth to you given that you're on 3.8, perhaps we could limit the scope of this PR to 3.8, so you get tracing for your applications in return for the effort you've already put in, and we don't have to take on the risk. We could then do a 3.9+ instrumentation in house.

This is actually exactly what I've done. The 3.6 instrumentation is the only one that operates on the DistributedObject public interface. From the 3.9 instrumentation and beyond, they operate entirely on the ClientInvocation. I migrated to the simpler approach as soon as it was possible. The 3.11 instrumentation is mostly the same but they moved around the classes a bit. The 4.0 instrumentation has a few additional capabilities but it's the same general approach.

@darylrobbins
Contributor Author

We could always field inject the name (i.e. a plain old string value) into ClientInvocation on exit from its constructor.

Would this be done using the ContextStore or a different approach? Is the main risk to stay away from linking to a complex entity with this approach?

@richardstartin
Member

This is actually exactly what I've done. The 3.6 instrumentation is the only one that operates on the DistributedObject public interface. From the 3.9 instrumentation and beyond, they operate entirely on the ClientInvocation. I migrated to the simpler approach as soon as it was possible. The 3.11 instrumentation is mostly the same but they moved around the classes a bit. The 4.0 instrumentation has a few additional capabilities but it's the same general approach.

I apologise for mixing things up - instrumentation is hard to review because both the framework and the code in the change need to be considered. There is a lot of code to review here, let alone the surrounding context of the client library, and it would be easier to review if the changes were made in smaller units - either in separate PRs or in separate commits for each version of the library.

As things stand, we have the following:

  1. A complex 3.6-3.8 instrumentation which is the one you need to trace your own applications. I would be ok with accepting this one as is, but we need this to be disabled by default and independently of other versions of the instrumentation because of its complexity.
  2. A much simpler ClientInvocation based instrumentation for 3.9-3.10. I don't have any concerns about this one.
  3. A 3.11 instrumentation which instruments ClientInvocation and ClientMessage, and includes a context store from ClientInvocation to HazelcastInstance which makes me uneasy. I think this could be merged with 3.9 if the ClientInvocationInstrumentation added a context store to field inject the instance name into ClientInvocation, even if it's not necessary for 3.9 and 3.10.
  4. A 4.0 instrumentation which again injects HazelcastInstance into ClientInvocation, where I think we could just inject the instance name instead, but doesn't look like it can be merged into a 3.9 instrumentation because the library has changed too much.

If the instance name were captured and injected into ClientInvocation in the ClientInvocation constructor, is there any reason not to merge the 3.9 and 3.11 instrumentations?

@richardstartin
Member

Would this be done using the ContextStore or a different approach? Is the main risk to stay away from linking to a complex entity with this approach?

Yes, I don't really have any concerns about injecting a string or an instrumentation object as in the Ignite instrumentation, but if we field inject complex services with lifecycles, we mutate the object graph in ways library developers won't be able to anticipate, and we can prevent the injected service being garbage collected, for example.
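
To illustrate the distinction, here is a minimal sketch using hypothetical stand-in classes (`FakeInstance`, `FakeInvocation`, `NameCaptureDemo`) rather than Hazelcast's or dd-trace-java's real types. The tag key `hazelcast.instance` is also an assumption made for the example. The invocation copies the name out when it is constructed and never stores the instance itself, so it adds no edge to the instance in the object graph.

```java
// Hypothetical stand-in for a client instance: a service with a lifecycle.
final class FakeInstance {
    private final String name;

    FakeInstance(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical stand-in for an invocation. It keeps only the instance's name
// (an immutable String), so it never keeps the instance itself reachable.
final class FakeInvocation {
    private final String instanceName;

    FakeInvocation(FakeInstance instance) {
        // Copy the value out at construction time; do not store `instance`.
        this.instanceName = instance.getName();
    }

    String instanceName() {
        return instanceName;
    }
}

final class NameCaptureDemo {
    // Builds a tag-like string from the captured name (the key is illustrative).
    static String describe(FakeInvocation invocation) {
        return "hazelcast.instance=" + invocation.instanceName();
    }
}
```

In a real instrumentation the copy would happen in advice running on constructor exit, but the trade-off is the same: a string is cheap to hold and has no lifecycle of its own.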

@richardstartin richardstartin added tag: community Community contribution inst: others All other instrumentations labels Apr 26, 2021
Instrument ClientInvocation to capture Hazelcast client instance instead of having to use context store
@darylrobbins
Contributor Author

I apologise for mixing things up - instrumentation is hard to review because both the framework and the code in the change need to be considered. There is a lot of code to review here, let alone the surrounding context of the client library, and it would be easier to review if the changes were made in smaller units - either in separate PRs or in separate commits for each version of the library.

Sorry about that. It was about 2 days of work that I was working on in parallel, so I created a single unit to support from past to present. I'll aim for smaller PRs in the future.

As things stand, we have the following:

  1. A complex 3.6-3.8 instrumentation which is the one you need to trace your own applications. I would be ok with accepting this one as is, but we need this to be disabled by default and independently of other versions of the instrumentation because of its complexity.

I have removed the use of the contextStore from this one based on your suggestions. I will call it hazelcast_legacy to separate it from the more modern integration. I'd suggest we make this an undocumented feature so it can easily be killed off when we no longer need to support it.

  2. A much simpler ClientInvocation based instrumentation for 3.9-3.10. I don't have any concerns about this one.
  3. A 3.11 instrumentation which instruments ClientInvocation and ClientMessage, and includes a context store from ClientInvocation to HazelcastInstance which makes me uneasy. I think this could be merged with 3.9 if the ClientInvocationInstrumentation added a context store to field inject the instance name into ClientInvocation, even if it's not necessary for 3.9 and 3.10.

Actually, the lack of instance name in 3.9-3.10 is an accidental omission. I'll see what I can do. The main reason these were split was because Hazelcast moved around some of the classes I was depending on in 3.11. Is there a good way to support this scenario in the same instrumentation?

  4. A 4.0 instrumentation which again injects HazelcastInstance into ClientInvocation, where I think we could just inject the instance name instead, but doesn't look like it can be merged into a 3.9 instrumentation because the library has changed too much.

Yes, this should be doable.

If the instance name were captured and injected into ClientInvocation in the ClientInvocation constructor, is there any reason not to merge the 3.9 and 3.11 instrumentations?

See above.

- Renamed the 3.6 instrumentation to hazelcast_legacy, so it is enabled independently of the more streamlined mainstream hazelcast instrumentations
- Set to always be disabled by default since this will never be a mainstream instrumentation
@richardstartin
Member

Sorry about that. It was about 2 days of work that I was working on in parallel, so I created a single unit to support from past to present. I'll aim for smaller PRs in the future.

It's not a big deal but would be appreciated in the future (thanks for the contributions!).

The main reason these were split was because Hazelcast moved around some of the classes I was depending on in 3.11. Is there a good way to support this scenario in the same instrumentation?

Which class names changed? com.hazelcast.client.spi.impl.ClientInvocation, com.hazelcast.client.impl.protocol.ClientMessage, com.hazelcast.core.HazelcastInstance, and com.hazelcast.client.impl.clientside.HazelcastClientInstanceImpl don't seem to have changed.

@darylrobbins
Contributor Author

The main reason for the 3.11 version was that com.hazelcast.client.impl.HazelcastClientInstanceImpl moved to com.hazelcast.client.impl.clientside.HazelcastClientInstanceImpl. Is there a good way to handle this in the same instrumentation?

@darylrobbins
Contributor Author

The reason for the ClientMessageInstrumentation is that the operationName field of the ClientMessage is not exposed as a setter until 4.0. Is there any way to access a private member of a field from an instrumentation?

@richardstartin
Member

The main reason for the 3.11 version was that com.hazelcast.client.impl.HazelcastClientInstanceImpl moved to com.hazelcast.client.impl.clientside.HazelcastClientInstanceImpl. Is there a good way to handle this in the same instrumentation?

Since it implements com.hazelcast.core.HazelcastInstance, you could match it with namedOneOf("com.hazelcast.client.impl.clientside.HazelcastClientInstanceImpl", "com.hazelcast.client.impl.HazelcastClientInstanceImpl") and reference it as com.hazelcast.core.HazelcastInstance wherever it's required in Java code. If any implementation methods not on that interface are relied on, forget about it and keep the instrumentations separate.
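
The idea can be sketched in isolation as follows. This is an illustrative stand-in, not the dd-trace-java matcher: `NameMatchers.namedOneOf` here is a hypothetical helper over plain class-name strings, and `Instance` is a hypothetical interface playing the role of com.hazelcast.core.HazelcastInstance. Matching is done by name so either package location is accepted, while the consuming code is written against the shared interface only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical interface standing in for com.hazelcast.core.HazelcastInstance.
interface Instance {
    String getName();
}

final class NameMatchers {
    // Illustrative "named one of" matcher: true when the class name equals
    // any of the given names. Not the dd-trace-java implementation.
    static Predicate<String> namedOneOf(String... names) {
        Set<String> allowed = new HashSet<>(Arrays.asList(names));
        return allowed::contains;
    }

    // Code that consumes the matched object needs only the shared interface,
    // so it does not care which package the implementation class lives in.
    static String instanceName(Instance instance) {
        return instance.getName();
    }
}
```

For example, `NameMatchers.namedOneOf("com.hazelcast.client.impl.clientside.HazelcastClientInstanceImpl", "com.hazelcast.client.impl.HazelcastClientInstanceImpl")` accepts both the old and the relocated class name, and rejects the interface name itself.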

@richardstartin
Member

The reason for the ClientMessageInstrumentation is that the operationName field of the ClientMessage is not exposed as a setter until 4.0. Is there any way to access a private member of a field from an instrumentation?

Yes, but it would be complicated to do within dd-trace-java and what you have done feels like a reasonable trade-off to me. Thanks for explaining the rationale.

@darylrobbins
Contributor Author

Since it implements com.hazelcast.core.HazelcastInstance, you could match it with namedOneOf("com.hazelcast.client.impl.clientside.HazelcastClientInstanceImpl", "com.hazelcast.client.impl.HazelcastClientInstanceImpl") and reference it as com.hazelcast.core.HazelcastInstance wherever it's required in Java code. If any implementation methods not on that interface are relied on, forget about it and keep the instrumentations separate.

Hmm, that's weird. I already had the namedOneOf() in there. I thought I had tried it and it didn't like it when I used HazelcastInstance as the parameter in the Advice, but it seems to be working now, so maybe we can kill off the 3.11 instrumentation after all.

- Was able to get around the renaming of the HazelcastClientInstanceImpl by acting upon it as a HazelcastInstance, which allowed us to eliminate the need for the 3.11 instrumentation
@richardstartin
Member

This is looking good. There are a few minor comments about logging. Also please consider inlining any decorator methods which only do one thing to the span into the advice.

@tylerbenson
Contributor

tylerbenson commented Apr 26, 2021

Do you have a good pattern for ignoring certain traces in the test?

You can do a few different things:

  1. Split up the earliest test from the latest. This is usually best if the result is significantly different. (Remove the block containing the dirName from the latestDepTest declaration.)
  2. Configure the test to dynamically detect which library version is being used and modify the test behavior accordingly. (Better if difference is minimal.)
  3. Make the assert section more flexible via removing/filtering out specific spans or making certain tags optional.

- Updates based on review comments
- Backport improved tests from 4.0 to 3.9
- Ensure that parallel runs of the hazelcast tests don't discover each other
- configServer() method was missing to allow custom server config from the test
- Enable reporting of Client.createProxy operations for consistency with 4.0
- Update tests to use random names to avoid any ordering dependency on tests -- the createProxy call only happens the first time a particular resource is requested
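
A minimal sketch of the random-name approach from the last commit message, assuming a hypothetical `TestNames` helper (not part of the PR): each test asks for a fresh resource name, so the first-use createProxy call happens in every test regardless of execution order.

```java
import java.util.UUID;

// Hypothetical test helper: generates a unique resource name per call.
final class TestNames {
    // Appends a random UUID to the prefix so no two tests share a resource.
    static String randomName(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }
}
```

A test would then request, say, a map named `TestNames.randomName("map")` instead of a fixed literal.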
@darylrobbins
Contributor Author

Do you have a good pattern for ignoring certain traces in the test?

You can do a few different things:

It works now for all supported versions. I removed a few tests that were particularly prone to this issue, for example the Reliable Topic one, which is implemented on top of the ring buffer.

@darylrobbins
Contributor Author

Please pay particular attention to the enhancement I made to ListWriter for filtering out traces before they're written. This allowed me to filter out the noisy Client.* operations which can sometimes happen seemingly randomly in the tests.
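
The filtering idea can be sketched roughly as below. This is not the project's ListWriter: `FilteringTraceCollector` is a hypothetical class, a trace is modelled simply as a list of operation names, and `Map.put` in the usage note is an assumed operation name. Only `Client.createProxy` comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory trace collector for tests. A trace is modelled as a
// list of operation names.
final class FilteringTraceCollector {
    private final List<List<String>> traces = new ArrayList<>();
    private final Predicate<List<String>> keep;

    FilteringTraceCollector(Predicate<List<String>> keep) {
        this.keep = keep;
    }

    // Records the trace only if the filter accepts it.
    void write(List<String> trace) {
        if (keep.test(trace)) {
            traces.add(new ArrayList<>(trace));
        }
    }

    List<List<String>> traces() {
        return traces;
    }

    // Example filter: rejects traces made up entirely of "Client.*"
    // operations (an empty trace is rejected as well).
    static boolean notClientNoise(List<String> trace) {
        return !trace.stream().allMatch(op -> op.startsWith("Client."));
    }
}
```

With `new FilteringTraceCollector(FilteringTraceCollector::notClientNoise)`, a trace containing only `Client.createProxy` is discarded before it is stored, while a trace containing `Map.put` is kept, so assertions never see the noisy client traces.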

Member

@richardstartin richardstartin left a comment

This looks very good, thanks

Contributor

@tylerbenson tylerbenson left a comment

The main problem is the sourceCompatibility. I think it would have been obvious if our build was actually working.

dependencies {
compile project(':dd-java-agent:instrumentation:hazelcast')

compileOnly group: 'com.hazelcast', name: 'hazelcast-all', version: '3.8'
Contributor

Generally speaking, this version should match the version labeled in the project name.

Also, please add a comment why testCompile isn't using 3.6.

Contributor

I don't see anything specific to async in your instrumentation advice classes (just tests and instrumentation matchers), so maybe this could be declared as compile for 3.6, but test with 3.8?

Contributor

@tylerbenson tylerbenson left a comment

A few minor changes then I think we're good to merge. Nice work.

@tylerbenson
Contributor

Also linking to this discussion regarding wire propagation for future reference: https://groups.google.com/g/hazelcast/c/i6FAz4LL1H8

- Address potential random test failures due to timing issues
@darylrobbins
Contributor Author

I moved the instrumentations to the root level dd-java-agent/instrumentation directory since Gradle was complaining when they resided inside a hazelcast directory, which wasn't itself a Gradle module.

@tylerbenson
Contributor

oh, right... I forgot about that. sounds good.

Contributor

@tylerbenson tylerbenson left a comment

Looks good! Nice work.

@tylerbenson tylerbenson added this to the 0.80.0 milestone May 4, 2021
@tylerbenson tylerbenson merged commit 3efbb9c into DataDog:master May 4, 2021
@tylerbenson tylerbenson deleted the hazelcast branch May 4, 2021 17:22
Labels
inst: others All other instrumentations tag: community Community contribution

3 participants