Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zipkin receiver wrongly deduced whether the spans are compressed (gzipped) or not #2641

Closed
3 tasks
tommic81 opened this issue May 9, 2019 · 12 comments
Closed
3 tasks
Assignees
Labels
question End user question and discussion.
Milestone

Comments

@tommic81
Copy link

tommic81 commented May 9, 2019

Please answer these questions before submitting your issue.

  • Why do you submit this issue?
  • Question or discussion
  • [*] Bug
  • Requirement
  • Feature or performance improvement

Question

  • What do you want to know?

Bug

  • Which version of SkyWalking, OS and JRE?

SkyWalking: 6.1.0
JRE: Jdk8
OS: win10

zipkin client libs:
io.zipkin.zipkin2:zipkin:2.11.10
io.zipkin.reporter2:zipkin-reporter:2.7.13

  • Which company or project?
    ING Bank Śląski SA

  • What happen?

Zipkin allows to decide whether we want to send compressed or uncopressed spans.

It turned out that zipkin-receiver-plugin was not able to figure out that the spans were compressed. In other words compressed data was interpreted as uncompressed which
led to exception.

 java.lang.IllegalArgumentException: Malformed reading List<Span> from json
	at zipkin2.internal.JsonCodec.exceptionReading(JsonCodec.java:240) ~[zipkin-2.9.1.jar:?]
	at zipkin2.internal.JsonCodec.readList(JsonCodec.java:155) ~[zipkin-2.9.1.jar:?]
	at zipkin2.codec.SpanBytesDecoder$3.decodeList(SpanBytesDecoder.java:107) ~[zipkin-2.9.1.jar:?]
	at zipkin2.codec.SpanBytesDecoder.decodeList(SpanBytesDecoder.java:151) ~[zipkin-2.9.1.jar:?]
	at zipkin2.codec.SpanBytesDecoder$3.decodeList(SpanBytesDecoder.java:118) ~[zipkin-2.9.1.jar:?]
	at org.apache.skywalking.oap.server.receiver.zipkin.handler.SpanProcessor.convert(SpanProcessor.java:58) ~[zipkin-receiver-plugin-6.1.0.jar:6.1.0]
	at org.apache.skywalking.oap.server.receiver.zipkin.handler.SpanV2JettyHandler.doPost(SpanV2JettyHandler.java:70) [zipkin-receiver-plugin-6.1.0.jar:6.1.0]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) [javax.servlet-api-3.1.0.jar:3.1.0]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841) [jetty-servlet-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:543) [jetty-servlet-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1239) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:481) [jetty-servlet-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1141) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.Server.handle(Server.java:564) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279) [jetty-io-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110) [jetty-io-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124) [jetty-io-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672) [jetty-util-9.4.2.v20170220.jar:9.4.2.v20170220]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590) [jetty-util-9.4.2.v20170220.jar:9.4.2.v20170220]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: zipkin2.internal.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 1 path $
	at zipkin2.internal.gson.stream.JsonReader.syntaxError(JsonReader.java:1568) ~[zipkin-2.9.1.jar:?]
	at zipkin2.internal.gson.stream.JsonReader.checkLenient(JsonReader.java:1409) ~[zipkin-2.9.1.jar:?]
	at zipkin2.internal.gson.stream.JsonReader.doPeek(JsonReader.java:593) ~[zipkin-2.9.1.jar:?]
	at zipkin2.internal.gson.stream.JsonReader.beginArray(JsonReader.java:343) ~[zipkin-2.9.1.jar:?]
	at zipkin2.internal.JsonCodec$JsonReader.beginArray(JsonCodec.java:57) ~[zipkin-2.9.1.jar:?]
	at zipkin2.internal.JsonCodec.readList(JsonCodec.java:149) ~[zipkin-2.9.1.jar:?]
	... 26 more 

I found in org.apache.skywalking.oap.server.receiver.zipkin.handler.SpanProcessor
what probably causes the problem.

private InputStream getInputStream(HttpServletRequest request) throws IOException {
        InputStream requestInStream;

        String headEncoding = request.getHeader("accept-encoding");
        if (headEncoding != null && (headEncoding.indexOf("gzip") != -1)) {
            requestInStream = new GZIPInputStream(request.getInputStream());
        } else {
            requestInStream = request.getInputStream();
        }

        return requestInStream;
    }

Header "accept-encoding" is tested whilst zipkin sends Content-Encoding: gzip in the request.


Requirement or improvement

  • Please describe about your requirements or improvement suggestions.

I think that checking "Content-Encoding" instead of "accept-encoding" should fix the issue.

@wu-sheng
Copy link
Member

wu-sheng commented May 9, 2019

Hi read #2525 (comment) . Adrian explain why this happens.

Btw, for curiousness, SkyWalking just receives Zipkin trace and analysis is not for production env. What do you expect for the next step? Maybe make it ready for production?

@wu-sheng wu-sheng added the question End user question and discussion. label May 9, 2019
@wu-sheng wu-sheng added this to the 6.2.0 milestone May 9, 2019
@wu-sheng wu-sheng self-assigned this May 9, 2019
@tommic81
Copy link
Author

tommic81 commented May 9, 2019

Hi ,

I had read adriancole's comments before created this ticket. I do not think that is the case.
I tested and debugged this issue and in my case, spans are not empty.
The problem occures when we are trying to read compressed data from "normal" input stream instead of GZIPInputStream.
I fixed it locally and seams to work.

"SkyWalking just receives Zipkin trace and analysis is not for production env. What do you expect for the next step? Maybe make it ready for production?"

We wanted to give Skywalking a try as it seams to be more advanced/rich solution than native zipkin-server.

Obviously, before going to production we want to make sure whether it works and meets our needs or not.

You mentioned "SkyWalking just receives Zipkin trace and analysis is not for production env".
Can you please share more info about the status of Skywalking<->Zipkin integration?
I am aware that analysis is not working yet (I switched needAnalysis: false).

So which features are available right now? What data can we review from gui perspective (dashboard, trace, alarm, topology)?

Maybe we should rather wait for further releases?

Tom

@wu-sheng
Copy link
Member

wu-sheng commented May 9, 2019

Can you please share more info about the status of Skywalking<->Zipkin integration?
So which features are available right now? What data can we review from gui perspective (dashboard, trace, alarm, topology)?

The status is, when you turn needAnalysis: false, there is no topology, dashboard, alarm. In our roadmap, we want to make the zipkin integration works in production env, but currently, no core maintainer is working on that, because they all decide to use our java(or other language) agent to monitor their system.

I keep waiting for some active contributor to work with us, to improve the experimental feature(zipkin trace analysis), make it product ready.

@wu-sheng
Copy link
Member

wu-sheng commented May 9, 2019

The problem occures when we are trying to read compressed data from "normal" input stream instead of GZIPInputStream.
I fixed it locally and seams to work.

Could you send a pull request to fix it directly?

@tommic81
Copy link
Author

Hi,

Do you know how far on the roadmap is Zipkin support? Is there really so much work left to do?
Maybe you can share some expected dates of releases.
We believe that Zipkin support would bring a great value to the project as Zipkin implements OpenTracing which is world wide tracing standard.

Have you got any idea how could we enable basic Skywalking features anyway?
Any possible workarounds?
Mayby we might develop some code that would transform/enrich zipkin messages into accepted by Skywalking format.

I would really appreciate your answers.

Tom

@wu-sheng
Copy link
Member

wu-sheng commented May 11, 2019

Hi @tmichnik1981

Sorry for late. I was busy on SkyWalking DevCon.

Right now, the major issues of Zipkin receiver, not production ready are

  1. Cluster mode is not supported. Because Zipkin sends trace through span by span. But the whole analysis needs the whole trace to build SkyWalking segments. One segment means all spans of one trace in one process(or thread). But in ZipkinSkyWalkingTransfer#doTransfer, it uses CacheFactory.INSTANCE, which use CaffeineSpanCache to cache the span with timeout mechanism. The CaffeineSpanCache use memory cache. In production and cluster mode, this cache should be a distributed cache env, and each trace timeout should be controlled by one OAP instance. Such as always let the OAP instance, who received the root span, to process the timeout and analysis the whole trace.
  2. CaffeineSpanCache#onRemoval is the entrance based on timeout to assume the whole trace has been received, which could begin to analysis in Zipkin2SkyWalkingTransfer. The transfer process in SegmentBuilder#build including the complex steps, which I am not sure ready and suitable for everyone.

I think you have read https://github.com/SkyAPM/zipkin-skywalking , actually, that document is wrote by me and @adriancole back to the back we were building this feature as step one.

So, you need to fix the above two issues.

Also, when we came into SkyWalking 6.x, we have a better solution than analysis of the whole trace.

The following suggestion should have better performance in trace analysis, but you need to understand SkyWalking more.
When needAnalysis=false today, the trace/span have been saved in storage already, and query has been supported. So, the things we left are topology and metrics, these two actually are the same thing. Build scopes. You could read MultiScopesSpanListener#parseEntry, MultiScopesSpanListener#parseExit and MultiScopesSpanListener#build. Especially the #build method. If you have read SkyWalking OAL document, you should know, SkyWalking analysis core receives the scopes.
In Zipkin receiver, we could only process span having remoteEndpoint, matching the client span <-> server span, instead of building the whole trace. Build the service, service instance, endpoint and relationship scopes. This could significantly reduce the memory cost of distributed cache in issue(1). But you still need that the cache.

@wu-sheng
Copy link
Member

Today's analysis process, in the easy scenario, really could transfer spans to SkyWalking trace. We used to demo it. Such as

I am welcome to have a deeper discussion on improving this feature. But please notice, as an APM, we have less flexible in span tags and the spans of each trace. Meaning, if you want to have the SkyWalking topology, metrics, and alarm, you need to have logically the same raw data.

@codefromthecrypt
Copy link

codefromthecrypt commented May 12, 2019 via email

@codefromthecrypt
Copy link

FYI I suggest creating a new topic for analysis related stuff as the gzip related concern is quite not the same :P

@wu-sheng
Copy link
Member

Yes, make sense. Created #2653 . Let's move the further discussion there. @tmichnik1981 @adriancole

@wu-sheng
Copy link
Member

@tmichnik1981 Any pull request from this issue?

@wu-sheng wu-sheng removed this from the 6.2.0 milestone May 25, 2019
@wu-sheng
Copy link
Member

wu-sheng commented Jul 2, 2019

Close as no update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question End user question and discussion.
Projects
None yet
Development

No branches or pull requests

3 participants