
Zipkin receiver in SkyWalking collector #1273

Merged
merged 39 commits into from Jun 18, 2018

Conversation

wu-sheng
Member

Dear @peng-yongsheng @ascrutae @candyleer @liuhaoyang @adriancole @basvanbeek @jcchavezs,

I am working on this pull request, which is about receiving and analyzing Zipkin traces. I paste the README doc of this module here, to give everyone a brief overview:

Zipkin receiver

The Zipkin receiver provides the feature to receive span data from Zipkin-instrumented applications. The SkyWalking backend provides the analysis, aggregation and visualization, so users don't need to learn how the SkyWalking auto-instrumentation agents (Java, .NET, Node.js) work, or they may not want to switch for some reason, such as an already completed Zipkin integration.

The Zipkin receiver is only an optional feature in SkyWalking, and for now it is an incubating feature.

Limits

As an incubating feature, it is a prototype, so it has the following limits:

  1. Don't try to use SkyWalking native agents and Zipkin's libs in the same distributed system. Since the headers of Zipkin and SkyWalking aren't shared/interoperable, the two will not propagate context for each other, and the trace will not continue.
  2. Cluster mode is not supported.
  3. Trace-based analysis must finish within a certain, given duration; the default assumption is 2 minutes at most. SkyWalking's native agents use a more complex header and context to avoid this in the analysis stage.

Right now (May 28th), I have just finished my code with the following features:

  1. Open and listen on the /api/v2/spans service endpoint.
  2. Receive and deserialize spans from the existing openzipkin/sleuth-webmvc-example.
  3. Use a Caffeine cache implementation to organize all spans into a trace.
  4. Use the cache expiration mechanism, assuming a trace can be analyzed x (configurable) minutes after the last span of a given trace ID is reported (see the sketch after this list).
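
Roughly, the cache idea in items 3 and 4 looks like the sketch below. This is my own illustration assuming the Caffeine and zipkin2 libraries; the class and method names are hypothetical, not the PR's code.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

import zipkin2.Span;

// Spans are grouped by trace ID; when no new span arrives for a trace within the
// configured window, the entry expires and the whole trace is handed to analysis.
class ExpiringSpanCacheSketch {
    private final Cache<String, List<Span>> traces = Caffeine.newBuilder()
        .expireAfterWrite(2, TimeUnit.MINUTES) // the "2 min most" assumption from the limits above
        .removalListener((String traceId, List<Span> spans, RemovalCause cause) -> {
            if (cause == RemovalCause.EXPIRED && spans != null) {
                analyze(traceId, spans); // trace considered complete; start analysis
            }
        })
        .build();

    void addSpan(Span span) {
        List<Span> spans = traces.getIfPresent(span.traceId());
        if (spans == null) {
            spans = new ArrayList<>();
        }
        spans.add(span);
        traces.put(span.traceId(), spans); // re-put so the expiration window restarts with each span
    }

    private void analyze(String traceId, List<Span> spans) {
        System.out.println("trace " + traceId + " finished with " + spans.size() + " spans");
    }
}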

I want to ask anyone who has the time and interest and is familiar with the Zipkin format, especially @adriancole @basvanbeek @jcchavezs, to check whether I missed anything for the Zipkin v2 JSON format.

What I am going to do next:

After a trace finishes, analyze the whole trace (its spans) and transform them into TraceSegments based on the tree structure, using localEndpoint/serviceName as the application code.
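
A rough sketch of that tree walk is below; it is purely illustrative (built on zipkin2.Span), not the actual transform code in this PR.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import zipkin2.Span;

// Index spans by parent ID, then walk the trace from the root spans downward.
// In the real transform, each subtree that stays within one localEndpoint
// serviceName would become one TraceSegment for that application.
class TraceTreeSketch {
    static Map<String, List<Span>> childrenByParent(List<Span> trace) {
        Map<String, List<Span>> children = new LinkedHashMap<>();
        for (Span span : trace) {
            String parent = span.parentId() == null ? "" : span.parentId();
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(span);
        }
        return children;
    }

    static void walk(Span span, Map<String, List<Span>> children, int depth) {
        String service = span.localServiceName() == null ? "unknown" : span.localServiceName();
        System.out.println(depth + " " + service + " : " + span.name());
        for (Span child : children.getOrDefault(span.id(), new ArrayList<>())) {
            walk(child, children, depth + 1);
        }
    }
}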

For the beta2 milestone, I will definitely consider this an incubating feature only. The local cache implementation may be changed to a Redis (cluster) based one in the 5.1.x series, by me or someone else.

@wu-sheng wu-sheng added backend OAP backend related. feature New feature labels May 28, 2018
@wu-sheng wu-sheng added this to the 5.0.0-beta2 milestone May 28, 2018
@coveralls

coveralls commented May 28, 2018

Coverage Status

Coverage increased (+0.5%) to 24.463% when pulling 8798e2e on zipkin/receiver-v1 into da78420 on master.

Member

@peng-yongsheng peng-yongsheng left a comment


Where is the dequeue code for the span cache?

*
*/

/*
Member


Duplicate Apache license header.

Member Author


Got it


List<ZipkinSpan> spans = gson.fromJson(br, spanListType);
spans.forEach(span ->
CacheFactory.INSTANCE.get(config).addSpan(span)
Member


The INSTANCE variable is synchronized -> the addSpan method is single-threaded -> SpanJettyHandler is single-threaded.

Is my understanding correct?

@wu-sheng
Member Author

@peng-yongsheng This PR is a work in progress; I opened it to keep you posted. The transform code is still ongoing.

When you process data on the stack, you don't have to worry about concurrency because you are working with local variables. For the cached traces and spans, the #addSpan method is protected by a ReentrantLock.
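
For reference, the locking pattern described above looks roughly like this. It is a hypothetical illustration, not the actual cache code from this PR.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// The shared span buffer is mutated only while holding a ReentrantLock, so
// concurrent Jetty handler threads calling addSpan cannot corrupt the per-trace lists.
class LockedSpanBuffer {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, List<String>> spansByTraceId = new HashMap<>();

    void addSpan(String traceId, String spanJson) {
        lock.lock();
        try {
            spansByTraceId.computeIfAbsent(traceId, id -> new ArrayList<>()).add(spanJson);
        } finally {
            lock.unlock();
        }
    }
}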


@codefromthecrypt codefromthecrypt left a comment


thanks for scoping this out. main comment is maybe re-use the zipkin main jar as it is less than 200KiB and has no dependencies. This can provide the model and codec operations. Maybe this can help? https://github.com/openzipkin/zipkin/blob/master/zipkin-junit/src/main/java/zipkin/junit/ZipkinDispatcher.java#L88

try {
BufferedReader br = req.getReader();

List<ZipkinSpan> spans = gson.fromJson(br, spanListType);


Curious.. instead of defining the same model here and also doing gson decoding.. why not depend on io.zipkin.zipkin2:zipkin and use SpanBytesDecoder? You will get the proto3 format for free this way.
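
A minimal sketch of what that swap could look like, assuming the io.zipkin.zipkin2:zipkin dependency (the surrounding class and helper are illustrative, not this PR's handler):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import zipkin2.Span;
import zipkin2.codec.SpanBytesDecoder;

class SpanDecodingSketch {
    // Decode a raw /api/v2/spans request body into zipkin2 model objects.
    // SpanBytesDecoder.PROTO3 would handle the protobuf encoding the same way.
    static List<Span> decodeJsonV2(InputStream body) throws IOException {
        return SpanBytesDecoder.JSON_V2.decodeList(readAll(body));
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}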

Member Author


@adriancole I will try to switch to it today. I just didn't know about it, haha. Sorry.

@wu-sheng
Member Author

thanks for scoping this out. main comment is maybe re-use the zipkin main jar as it is less than 200KiB and has no dependencies. This can provide the model and codec operations. Maybe this can help? https://github.com/openzipkin/zipkin/blob/master/zipkin-junit/src/main/java/zipkin/junit/ZipkinDispatcher.java#L88

@adriancole Done. I am moving on.

@wu-sheng
Member Author

Where is the dequeue code for the span cache?

I am not using any explicit or manual dequeue; I am using the expiration mechanism. See CaffeineSpanCache#onRemoval. When we move the in-memory buffer to Redis, this mechanism could stay, but the in-cache/remove-from-cache would only be activated by the root span, which will make cluster mode work.

…nse text. All Apache 2.0 LICENSE files will be removed before beta2 release.

@codefromthecrypt codefromthecrypt left a comment


so far so good

thermodynamicCountOfResponseTimeSteps: 40
# TODO: receiver_zipkin need to remove before merge, and only provide this in document.
receiver_zipkin:


Is it necessary to pass this information as structured fields? Can't you just pass the endpoint (e.g. endpoint: http://localhost:9411/api/v2/spans) as we usually do for Zipkin? cc @adriancole @wu-sheng

Member Author


Which parts do you refer to? receiver_zipkin?

Member Author


@jcchavezs I guess you mean the IP, port, context path, servlet mapping path, etc. That is because I don't use a URL utility to parse the endpoint string. Does Zipkin do this by providing a URL endpoint only? If this is the convention, I can follow it.

@codefromthecrypt

codefromthecrypt commented May 29, 2018 via email

@wu-sheng
Member Author

wu-sheng commented Jun 15, 2018

I have finished the prototype of this feature. Here is how to test it:

  1. Download and build the project.
  2. Find application.yml in the apm-collector-boot module, and enable the receiver_zipkin section in the YAML.
  3. Install Elasticsearch 5.x (5.5 or 5.6), and change elasticsearch.yml like this: https://gist.github.com/wu-sheng/4f869fb465bbd15f0852fa923357b4a6
  4. Find CollectorBootStartUp#main, also in the apm-collector-boot module, and start it up.
  5. Follow the build document to package the source code in this branch.
  6. Find the release in the dist folder and enter the bin folder to start up the webapp via webappService.sh.
  7. Access the Sleuth example.

Right now, I use Caffeine as an expiration checker, and it doesn't work well, so just ignore the following errors. Just access the Spring Sleuth example more than once; I usually set a timer in the browser. I intend to replace Caffeine.

java.lang.NullPointerException: null
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.transform.SegmentBuilder.buildRef(SegmentBuilder.java:235) ~[classes/:?]
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.transform.SegmentBuilder.initSpan(SegmentBuilder.java:166) ~[classes/:?]
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.transform.SegmentBuilder.scanSpansFromRoot(SegmentBuilder.java:126) ~[classes/:?]
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.transform.SegmentBuilder.build(SegmentBuilder.java:89) ~[classes/:?]
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.transform.Zipkin2SkyWalkingTransfer.transfer(Zipkin2SkyWalkingTransfer.java:59) ~[classes/:?]
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.cache.caffeine.CaffeineSpanCache.onRemoval(CaffeineSpanCache.java:77) ~[classes/:?]
	at org.apache.skywalking.apm.collector.receiver.zipkin.provider.cache.caffeine.CaffeineSpanCache.onRemoval(CaffeineSpanCache.java:45) ~[classes/:?]
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$notifyRemoval$1(BoundedLocalCache.java:286) ~[caffeine-2.6.2.jar:?]
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_91]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_91]
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_91]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_91]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_91]
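
Caffeine only evicts expired entries lazily, during other cache activity, which is likely why nothing happens until the example is accessed again. A hypothetical workaround (not part of this PR) is to force maintenance on a timer:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;

class CacheCleanupScheduler {
    static void scheduleCleanup(Cache<?, ?> cache) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // cleanUp() runs any pending maintenance, including expiration-based removal,
        // even when the cache receives no reads or writes in the meantime
        scheduler.scheduleAtFixedRate(cache::cleanUp, 10, 10, TimeUnit.SECONDS);
    }
}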

@wu-sheng
Member Author

I tested this, and only tested it, with the basic Sleuth example: https://github.com/openzipkin/sleuth-webmvc-example

So, I intend to merge this if it passes. Here are my screenshots.

(four screenshots attached)

Then I will work more on the new expiration mechanism, cluster mode, and more Zipkin examples.

@wu-sheng wu-sheng changed the title [WIP]Zipkin receiver in SkyWalking collector Zipkin receiver in SkyWalking collector Jun 15, 2018
@codefromthecrypt

first cut works.. well done! I'll review closely more easily later as my workspace is all sorted etc

@wu-sheng
Member Author

first cut works.. well done! I'll review closely more easily later as my workspace is all sorted etc

Yes. I make a lot of assumptions in the transfer, which the Zipkin community needs to confirm. Of course, we can improve them in further testing. I will try more cases :)

if (applicationCode != null) {
int applicationId = registerServices.getApplicationIDService().getOrCreateForApplicationCode(applicationCode);
if (applicationId != 0) {
registerServices.getOrCreateApplicationInstanceId(applicationId, applicationCode);
Member


Using applicationCode instead of a UUID will make all instances in the same application appear as a single one.

Member Author


Yes, but do you have a better idea? Zipkin has no specific tag for an instance; there is no such concept.

ascrutae
ascrutae previously approved these changes Jun 16, 2018
@wu-sheng wu-sheng merged commit ecc0f94 into master Jun 18, 2018
@wu-sheng wu-sheng deleted the zipkin/receiver-v1 branch June 18, 2018 15:36
@codefromthecrypt

codefromthecrypt commented Jun 19, 2018 via email
