
jaeger and istio setup. #3447

Closed
1 of 4 tasks
dbones opened this issue Sep 10, 2019 · 14 comments
Assignees
Labels
question End user question and discussion.
Milestone

Comments

@dbones

dbones commented Sep 10, 2019

Please answer these questions before submitting your issue.

  • Why do you submit this issue?
      • Question or discussion
      • Bug
      • Requirement
      • Feature or performance improvement

Question

Do you have an example of setting up SkyWalking with Jaeger (for tracing) and Istio (for metrics)?

If possible, can you show this using plain k8s Deployments? I am new to Helm (sorry).

My current progress:

I believe I have set up Elasticsearch correctly, based on the docker-compose and Helm setups.

I do not think I have understood the Istio instructions. They seem to increase the count of endpoints on the main dashboard, as shown below, but I cannot see any metrics:

[Screenshot: main dashboard showing the endpoint count]

Finally, I'm not quite sure what to do with the Jaeger agent.

My attempt so far:
https://gist.github.com/dbones/4d87efc1dfa43e2cad38cfd17f219f4f

Reason for my interest

As an FYI, I am preparing a demo on how to embrace CNCF projects (OpenTracing and Istio) and get a great APM experience.

@wu-sheng wu-sheng self-assigned this Sep 10, 2019
@wu-sheng wu-sheng added the question End user question and discussion. label Sep 10, 2019
@wu-sheng wu-sheng added this to the 6.5.0 milestone Sep 10, 2019
@wu-sheng
Member

Hi, from your screenshot, I think the most likely reason is that you haven't changed the timezone in the bottom-right corner of the page. In Docker/k8s the timezone defaults to UTC+0, but the UI defaults to your local timezone, so you can only see endpoints, no services, and therefore no other metrics.

As an FYI, I am preparing a demo on how to embrace CNCF projects (OpenTracing and Istio) and get a great APM experience.

Interesting, where are you planning to present it?

@dbones
Author

dbones commented Sep 10, 2019

I want to present this at an internal company summit (we have people from around the world attending).

Ah, I see about the timezone. The UI is set to UTC+1 (I changed the visible time span to show data from yesterday), and I left SkyWalking running for a few hours.

From what I can see, it has:

  • recognised that I have several services, but there is no data
  • not picked up data from my separate Jaeger instance (I do not think I have set this up correctly)
  • not recognised the RabbitMQ, Postgres and Redis instances
  • shown data for the Istio tracing component, but no other dashboard is populated.

Here is a screenshot:

[Screenshot]

@dbones
Author

dbones commented Sep 10, 2019

If it helps, the app I am running is a simple shop:

[Diagram of the shop application]

I want to use the following setup to show how observability requires metrics, tracing and logging:

[Diagram of the metrics, tracing and logging setup]

@wu-sheng
Member

recognised that I have several services, but we have no data

I think you didn't set up Mixer correctly. From what I saw in the UI, the only thing reporting traffic today is Mixer itself. You could turn on the debug log; the OAP log should then show which services' metrics are being sent.
At least I believe you have set up the OAP correctly.

not picked up data from my separate Jaeger instance (I do not think I have set this up correctly)

We use the Jaeger gRPC service, so the Jaeger agent is required. Have you deployed it and got it working?

has not recognised the RabbitMQ, Postgres and Redis instances

That is because Istio Mixer only reports HTTP and HTTPS traffic.

shows data for the Istio tracing component, but no other dashboard is populated.

What is the Istio tracing component?

@wu-sheng
Member

wu-sheng commented Sep 10, 2019

Our .NET Core agent might capture more information and traces. If you're interested, you could check it, though I am not sure whether it has all the plugins you require.

But if you only want observability through the service mesh, you could skip the agent approach.

@dbones
Author

dbones commented Sep 10, 2019

Thanks for the quick responses :)

I have looked at the following:

We use the Jaeger gRPC service, so the Jaeger agent is required.

I replaced the Jaeger collector with the agent, as follows:

[Screenshot of the agent setup]

This is not working. Do I need to configure something on the OAP server?

"msg":"Could not create collector proxy","error":"could not create collector proxy, address is missing"

If so, how do I configure this in the deployment YAML? (Can I pass it in as an env var?)

I think you didn't set up Mixer correctly. From what I saw in the UI, the only thing reporting traffic today is Mixer itself. You could turn on the debug log; the OAP log should then show which services' metrics are being sent. At least I believe you have set up the OAP correctly.

I installed Istio via the Helm chart, installed Elasticsearch, the OAP and the UI (v6.1), and then applied the Istio YAML files mentioned above (from the docs).

I am getting a warning in the OAP:

graphql.execution.SimpleDataFetcherExceptionHandler -10192306 [qtp583015088-91] WARN [] - Exception while fetching data (/cpmC) : IDs can't be null

It is now showing the app components along with 2 Istio components (is there anything I can do to filter these out?)

[Screenshot]

This is what my SkyWalking namespace looks like:

[Screenshot of the SkyWalking namespace]

Note that I have added just one agent for now, and the collector has been turned off.

Our .NET Core agent might capture more information and traces. If you're interested, you could check it, though I am not sure whether it has all the plugins you require.

Correct, it looks like some plugins are missing.

@dbones
Author

dbones commented Sep 10, 2019

I noticed that I had configured the agent incorrectly; I have now updated it to point at oap:14250 via the args.

My latest error from the Jaeger agent is:

{"level":"error","ts":1568135502.7668867,"caller":"grpc/reporter.go:70","msg":"Could not send spans over gRPC","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.43.28.245:14250: connect: connection refused\"","stacktrace":"github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc.(*Reporter).send\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc/reporter.go:70\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc.(*Reporter).EmitBatch\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/grpc/reporter.go:50\ngithub.com/jaegertracing/jaeger/cmd/agent/app/reporter.(*MetricsReporter).EmitBatch\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/metrics.go:77\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:138\ngithub.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:112\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:115"}

The connection is refused, and I am not sure whether I am missing some setup somewhere:

  • the agent runs with the --reporter.grpc.host-port=oap:14250 arg
  • I added an entry for port 14250 (TCP) to the OAP's ClusterIP Service
  • I have not configured the main OAP app in any way; I am double-checking the docs to see if I need to set anything to enable this module
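A minimal sketch of the wiring described in the list above, written as plain k8s manifests. The Service name oap matches the oap:14250 address from the agent arg, but the app: oap label, the image tag and the port names are hypothetical and not taken from the gist. Port names follow Istio's <protocol>[-<suffix>] naming convention. Note that this wiring alone cannot fix a refused connection: if the OAP's Jaeger receiver is not enabled, nothing in the OAP pod listens on 14250, which is consistent with the "connection refused" error above.

```yaml
# Sketch only: labels, image tag and port names are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: oap                  # resolves as "oap" from pods in the same namespace
spec:
  type: ClusterIP
  selector:
    app: oap                 # hypothetical label on the OAP pods
  ports:
    - name: grpc             # SkyWalking agent gRPC (default port)
      port: 11800
      protocol: TCP
    - name: http-rest        # REST/GraphQL used by the UI (default port)
      port: 12800
      protocol: TCP
    - name: grpc-jaeger      # Jaeger receiver; only listening once the receiver is enabled
      port: 14250
      protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger-agent
  template:
    metadata:
      labels:
        app: jaeger-agent
    spec:
      containers:
        - name: jaeger-agent
          image: jaegertracing/jaeger-agent:1.13   # hypothetical tag
          args:
            # Forward spans over gRPC to the OAP's Jaeger receiver
            - --reporter.grpc.host-port=oap:14250
          ports:
            - containerPort: 6831    # jaeger.thrift compact over UDP, used by client libraries
              protocol: UDP
```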

This is the latest setup I have for the SW deployment:
https://gist.github.com/dbones/4d87efc1dfa43e2cad38cfd17f219f4f

@wu-sheng
Member

This is not working. Do I need to configure something on the OAP server?
"msg":"Could not create collector proxy","error":"could not create collector proxy, address is missing"

Ignore this; it is just a UI bug that queries data when it should not. It is harmless, and it has been removed in the latest release.

It is now showing the app components along with 2 Istio components (is there anything I can do to filter these out?)

That is based on what Istio Mixer sends. We don't support filtering on the OAP side.


From the screenshot, you should have metrics and topology now, right?

For Jaeger, have you enabled the Jaeger receiver in the OAP's application.yml?

receiver_jaeger:
  default:
    gRPCHost: ${SW_RECEIVER_JAEGER_HOST:0.0.0.0}
    gRPCPort: ${SW_RECEIVER_JAEGER_PORT:14250}

We used to have this in the YAML-based deployment, but I think it is missing from the Helm chart.
https://github.com/apache/skywalking-kubernetes/blob/master/archive/6/6.0.0-GA/oap/01-config.yml#L23

@wu-sheng
Member

wu-sheng commented Sep 10, 2019

I found this in the Docker image config: https://github.com/apache/skywalking-docker/tree/master/6/6.3/oap#xxx_enabled

Please enable SW_RECEIVER_JAEGER_ENABLED.
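As a rough illustration of passing this switch as an env var in a plain k8s Deployment (the env-var question asked earlier), here is a minimal sketch. It assumes the apache/skywalking-oap-server:6.3.0 image and that its entrypoint treats the literal string "true" as enabled; the app: oap label is hypothetical, and the storage settings from the gist are omitted. As the next comment points out, the 6.3 entrypoint may not render the receiver section correctly, so it is worth checking the generated config/application.yml inside the running pod.

```yaml
# Sketch only: label is hypothetical and storage env vars are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oap
  template:
    metadata:
      labels:
        app: oap
    spec:
      containers:
        - name: oap
          image: apache/skywalking-oap-server:6.3.0
          env:
            # Asks the image's entrypoint to add receiver_jaeger to application.yml
            # (assumption: enabled by the string "true")
            - name: SW_RECEIVER_JAEGER_ENABLED
              value: "true"
          ports:
            - containerPort: 11800   # SkyWalking agent gRPC
            - containerPort: 12800   # REST/GraphQL for the UI
            - containerPort: 14250   # Jaeger receiver gRPC
```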

@wu-sheng
Copy link
Member

While reading the docs, I opened #3449: the Jaeger receiver will have issues with that Docker entrypoint script.

Please read the issue and the documents referenced there; you may need to build a new Docker image.

@dbones
Author

dbones commented Sep 10, 2019

I have done the following:

  • enabled Jaeger on the 6.1 image; this seems to add some trace info (but under an unknown service)
  • forked the Docker image and hacked the jaeger-elasticsearch part (but I did not take into account that I also needed to refactor the Elasticsearch env vars)
  • updated my setup to 6.3 and loaded all the files in the config folder from a ConfigMap (I updated https://gist.github.com/dbones/4d87efc1dfa43e2cad38cfd17f219f4f to show my current setup); I reconstructed the contents by cat-ing the files from a running instance

My last attempt resulted in the following error:

org.apache.skywalking.oap.server.starter.OAPServerStartUp -13300 [main] ERROR [] - metrics-name can't be null

If the team could do a hotfix, it would be appreciated.

Otherwise, if possible, it would be great to do a video chat to make sure I have not made a silly error. I can give you remote access to my test platform (which I will keep up a little longer before I shut it down).

@wu-sheng
Member

I can have a video chat with you, but I am not a k8s fan :) I can guide you on how SkyWalking should work with its configuration.

@dbones
Author

dbones commented Sep 10, 2019

Epic, I have sent an email to the address on your GitHub profile.

@wu-sheng
Member

Following our video chat, the setup should be good for now.

Projects
None yet
Development

No branches or pull requests

2 participants