skywalking v9.3: no data in k8s service #10230

Closed
2 of 3 tasks
geniuslc11 opened this issue Jan 5, 2023 · 19 comments
Assignees
Labels
backend OAP backend related. bug Something isn't working and you are sure it's a bug! TBD To be decided later, need more discussion or input.

Comments

@geniuslc11

geniuslc11 commented Jan 5, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

Apache SkyWalking Component

OAP server (apache/skywalking)

What happened

Hi skywalking team,
I have recently been evaluating the new features in SkyWalking v9.3. When monitoring k8s nodes and endpoints, I see data at the cluster level but no data at the service level, and the OAP logs contain error messages. Please help fix this issue. Thanks.

I used showcase configurations to monitor it.

k8s version: 1.20.* and 1.21.*
otel collector version: I tried 0.50.0, 0.68.0, and others; all failed.

2023-01-05 08:24:02,037 - org.apache.skywalking.oap.meter.analyzer.dsl.Expression - 89 [grpcServerPool-1-thread-1] ERROR [] - failed to run "(container_memory_working_set_bytes.tagNotEqual('container' , '').tagNotEqual('pod' , '').retagByK8sMeta('service' , K8sRetagType.Pod2Service , 'pod' , 'namespace').tagNotEqual('service' , '').sum(['cluster' , 'service' , 'pod'])).service(['cluster' , 'service'], '::', Layer.K8S_SERVICE)"
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: io.kubernetes.client.openapi.ApiException: Content type "text/html; charset=utf-8" is not supported for type: class io.kubernetes.client.openapi.models.V1Pod
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:547) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:240) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2317) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2283) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache.get(LocalCache.java:3966) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3989) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4950) ~[guava-31.1-jre.jar:?]
        at org.apache.skywalking.oap.meter.analyzer.k8s.K8sInfoRegistry.findServiceName(K8sInfoRegistry.java:122) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.dsl.tagOpt.K8sRetagType$1.lambda$execute$0(K8sRetagType.java:40) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_261]
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_261]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_261]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_261]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546) ~[?:1.8.0_261]
        at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) ~[?:1.8.0_261]
        at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438) ~[?:1.8.0_261]
        at org.apache.skywalking.oap.meter.analyzer.dsl.tagOpt.K8sRetagType$1.execute(K8sRetagType.java:49) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.dsl.SampleFamily.retagByK8sMeta(SampleFamily.java:368) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.dsl.SampleFamily$retagByK8sMeta$9.call(Unknown Source) ~[?:?]
        at Script1.run(Script1.groovy:1) ~[?:?]
        at org.apache.skywalking.oap.meter.analyzer.dsl.Expression.run(Expression.java:78) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.Analyzer.analyse(Analyzer.java:133) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.MetricConvert.toMeter(MetricConvert.java:93) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.prometheus.PrometheusMetricConverter.toMeter(PrometheusMetricConverter.java:84) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.server.receiver.otel.otlp.OpenTelemetryMetricHandler.lambda$null$7(OpenTelemetryMetricHandler.java:128) ~[otel-receiver-plugin-9.3.0.jar:9.3.0]
        at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_261]
        at org.apache.skywalking.oap.server.receiver.otel.otlp.OpenTelemetryMetricHandler.lambda$export$8(OpenTelemetryMetricHandler.java:128) ~[otel-receiver-plugin-9.3.0.jar:9.3.0]
        at java.util.ArrayList.forEach(ArrayList.java:1259) [?:1.8.0_261]
        at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1082) [?:1.8.0_261]
        at org.apache.skywalking.oap.server.receiver.otel.otlp.OpenTelemetryMetricHandler.export(OpenTelemetryMetricHandler.java:110) [otel-receiver-plugin-9.3.0.jar:9.3.0]
        at io.opentelemetry.proto.collector.metrics.v1.MetricsServiceGrpc$MethodHandlers.invoke(MetricsServiceGrpc.java:246) [receiver-proto-9.3.0.jar:9.3.0]
        at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) [grpc-stub-1.49.0.jar:1.49.0]
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:354) [grpc-core-1.49.0.jar:1.49.0]
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) [grpc-core-1.49.0.jar:1.49.0]
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) [grpc-core-1.49.0.jar:1.49.0]
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) [grpc-core-1.49.0.jar:1.49.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
Caused by: java.util.concurrent.ExecutionException: io.kubernetes.client.openapi.ApiException: Content type "text/html; charset=utf-8" is not supported for type: class io.kubernetes.client.openapi.models.V1Pod
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:547) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:240) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2317) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2283) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache.get(LocalCache.java:3966) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3989) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4950) ~[guava-31.1-jre.jar:?]
        at org.apache.skywalking.library.kubernetes.KubernetesPods.findByObjectID(KubernetesPods.java:81) ~[library-kubernetes-support-9.3.0.jar:9.3.0]
        at org.apache.skywalking.oap.meter.analyzer.k8s.K8sInfoRegistry.lambda$new$14(K8sInfoRegistry.java:80) ~[meter-analyzer-9.3.0.jar:9.3.0]
        at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:169) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533) ~[guava-31.1-jre.jar:?]
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282) ~[guava-31.1-jre.jar:?]
        ... 37 more
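
The failing expression in the log above can be read as a small pipeline: filter out samples with empty tags, look up a service for each pod, drop samples with no service, then sum. Below is a minimal Python sketch of that reading. It is not SkyWalking's implementation; the function names, the sample layout, and the `(namespace, pod)` lookup table are all hypothetical, and treating a missing tag as an empty string is an assumption.

```python
from collections import defaultdict


def tag_not_equal(samples, key, value):
    """Keep samples whose tag `key` differs from `value` (missing tag assumed to be '')."""
    return [s for s in samples if s["tags"].get(key, "") != value]


def retag_pod_to_service(samples, pod_to_service):
    """Add a 'service' tag looked up by (namespace, pod); '' when the lookup finds nothing."""
    out = []
    for s in samples:
        tags = dict(s["tags"])
        key = (tags.get("namespace"), tags.get("pod"))
        tags["service"] = pod_to_service.get(key, "")
        out.append({"tags": tags, "value": s["value"]})
    return out


def sum_by(samples, keys):
    """Sum sample values grouped by the given tag keys."""
    totals = defaultdict(float)
    for s in samples:
        totals[tuple(s["tags"].get(k, "") for k in keys)] += s["value"]
    return dict(totals)


def service_memory(samples, pod_to_service):
    """Mirror the order of operations in the logged expression."""
    s = tag_not_equal(samples, "container", "")
    s = tag_not_equal(s, "pod", "")
    s = retag_pod_to_service(s, pod_to_service)  # the step that needs k8s metadata
    s = tag_not_equal(s, "service", "")
    return sum_by(s, ["cluster", "service", "pod"])
```

Under this reading, if the pod-to-service lookup cannot be populated (for example because the k8s API call fails), every sample ends up with an empty `service` tag and is filtered out, which would match the symptom of cluster data being present while service data is empty.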

[Screenshot: cluster]

[Screenshot: service]

What you expected to happen

Data should be collected for k8s services.

How to reproduce

I followed the steps in https://skywalking.apache.org/docs/main/v9.3.0/en/setup/backend/backend-k8s-monitoring/ and https://github.com/apache/skywalking-showcase/tree/main/deploy/platform/kubernetes/feature-kubernetes-monitor

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@geniuslc11 geniuslc11 added the bug Something isn't working and you are sure it's a bug! label Jan 5, 2023
@wu-sheng
Member

wu-sheng commented Jan 5, 2023

demo.skywalking.apache.org runs this showcase without changes, so the showcase itself should generally work.

From the logs, this should be the key line:

Caused by: java.util.concurrent.ExecutionException: io.kubernetes.client.openapi.ApiException: Content type "text/html; charset=utf-8" is not supported for type: class io.kubernetes.client.openapi.models.V1Pod

Maybe your k8s is too old?
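
The exception quoted above says the client received `text/html` where it expected a JSON `V1Pod`. One way to narrow this down is to look at the status code and `Content-Type` of the raw response from the API server. The helper below is a minimal sketch; the function name and the hint strings are hypothetical, and the hints are guesses about where to look, not a confirmed diagnosis of this issue.

```python
def triage(status: int, content_type: str) -> str:
    """Return a rough hint based on an HTTP status code and Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        # Assumption: an HTML body suggests the reply did not come from the
        # JSON API path (wrong base URL, a proxy, or a login redirect).
        return "non-JSON body: check API server URL, proxies, and auth redirects"
    if status in (401, 403):
        return "JSON error: check credentials and RBAC permissions"
    if 200 <= status < 300:
        return "ok"
    return "JSON error: inspect the returned Status object"
```

For example, `triage(200, "text/html; charset=utf-8")` falls into the first branch, which corresponds to the content type reported in the stack trace.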

@wu-sheng wu-sheng added the backend OAP backend related. label Jan 5, 2023
@geniuslc11
Author

Thanks so much for your quick reply.
Our k8s versions are 1.20.6, 1.21.7, and 1.17.4; they all fail with the same errors.
Maybe v1.17.4 is too old, but I don't think versions 1.20.* and above are too old.

@wu-sheng
Member

wu-sheng commented Jan 5, 2023

We are using 1.23.12-gke.1600 on Google Cloud, and everything works. We also tested AWS EKS, and it works well too.

@wu-sheng
Member

wu-sheng commented Jan 5, 2023

If you check the k8s website, 1.22 is the oldest supported version, I am afraid.

https://kubernetes.io/docs/home/supported-doc-versions/

@geniuslc11
Author

geniuslc11 commented Jan 5, 2023

I just tried k8s v1.24.6 and it still fails with the same error, which is very strange. Please check the following screenshots.

1. k8s version
[Screenshot: k8s]

2. kube-state-metrics is collecting data.
[Screenshot: ksm]

3. SkyWalking UI
[Screenshot: sk1]
[Screenshot: sk2]

4. OAP logs
[Screenshot: sk3]

@geniuslc11
Author

This is my otel collector configuration.
[Screenshot: cm]

@wu-sheng wu-sheng added the TBD To be decided later, need more discussion or input. label Jan 5, 2023
@wu-sheng
Member

What is the vendor of your k8s? Maybe you could ask them why it behaves differently from ours.
Or you could use KinD to test with official k8s locally. We did, and it works.

@wu-sheng wu-sheng closed this as not planned Won't fix, can't repro, duplicate, stale Jan 11, 2023
@geniuslc11
Author

We installed the k8s cluster with RKE in Rancher v2.6. I will test KinD in our environment.

@liyongxian

liyongxian commented Feb 21, 2023

I have run into the same problem.
SkyWalking version: 9.3.0
OTel version: opentelemetry-collector:0.71.0
Kubernetes version:
[Screenshot]
K8s was installed with kubeadm.
Deployed following https://github.com/apache/skywalking-showcase/tree/main/deploy/platform/kubernetes/feature-kubernetes-monitor.
[Screenshot]
Thanks.

@liyongxian

@geniuslc11 Did it work with KinD?

@innerpeacez
Member

innerpeacez commented Mar 10, 2023

I also encountered this problem and eventually found that the clusterRole was missing during install. I tried to fix it in apache/skywalking-helm#111, but I still find that OAP often throws this exception, and at the very least the service data is incomplete.

I debugged this code and KubernetesServices.list() returned the complete service list.
I can also see the missing service through kube-state-metrics:8080/metrics, but I don't see it in the SkyWalking UI (nor in ES storage).

I'm now confused about where the problem could be. The strange thing is that some services show up and some don't.

opentelemetry-collector:0.72.0
kube-state-metrics:v2.8.0
skywalking:9.3.0
k8s:v1.22.10
kubernetes-java:1.16.0 https://github.com/kubernetes-client/java/wiki/2.-Versioning-and-Compatibility#compatibility

Can you give me some ideas so that I can continue investigating this problem? @wu-sheng @kezhenxu94

@wu-sheng
Member

You could try showcase, and compare the version and deployment difference.

@innerpeacez
Member

I also encountered this problem, and finally found out that the clusterRole was missing during install. I tried to fix it. apache/skywalking-kubernetes#111. but I still find that oap often throws this exception. And at least found that the service data is incomplete.

Sorry, a correction: I did not find that it continues to throw this exception. But the service data is indeed incomplete.

You could try showcase, and compare the version and deployment difference.

I'll switch versions and give it a try. thx.

@wu-sheng
Member

That includes the versions of all components; the otel collector has introduced breaking changes many times.

@innerpeacez
Member

The missing-service problem is caused by an incorrect cache and is independent of the opentelemetry-collector and kube-state-metrics versions (at least for the versions I tested). See #10568.

@innerpeacez
Member

innerpeacez commented Mar 21, 2023

Two causes:

  1. The clusterRole is missing during install, so OAP calls the k8s API without permission.
  2. Some k8s services are missing because the service cache is wrong.

@geniuslc11
Author

geniuslc11 commented May 11, 2023

@innerpeacez So do I need to upgrade SkyWalking to v9.5? When will SkyWalking v9.5 be released?
Or is there a workaround for now?

@geniuslc11
Author

@innerpeacez BTW, my SkyWalking OAP is installed from the binary package on CentOS 7, not deployed in K8s.

@innerpeacez
Member

innerpeacez commented May 12, 2023

BTW, my SkyWalking OAP is installed from the binary package on CentOS 7, not deployed in K8s.

@geniuslc11
AFAIK, this requires the kubectl on your CentOS host to have the corresponding permissions to access the k8s API. Ref: https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/ . The resource permissions OAP needs are listed here: https://github.com/apache/skywalking-kubernetes/blob/master/chart/skywalking/templates/oap-clusterrole.yaml
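
One way to check those permissions from the host is `kubectl auth can-i`, run with the same kubeconfig that OAP uses. The sketch below only builds the commands and offers an optional runner. The resource and verb lists are assumptions for illustration; the linked oap-clusterrole.yaml is the authoritative list. The function names are hypothetical.

```python
import subprocess

# Assumed for illustration only; consult oap-clusterrole.yaml for the real list.
ASSUMED_RESOURCES = ["pods", "services", "endpoints", "nodes"]
ASSUMED_VERBS = ["get", "list", "watch"]


def can_i_commands(resources=ASSUMED_RESOURCES, verbs=ASSUMED_VERBS, as_user=None):
    """Build one `kubectl auth can-i <verb> <resource>` command per pair."""
    commands = []
    for resource in resources:
        for verb in verbs:
            cmd = ["kubectl", "auth", "can-i", verb, resource]
            if as_user:
                # Impersonate another identity, e.g. a service account.
                cmd += ["--as", as_user]
            commands.append(cmd)
    return commands


def run_checks(commands):
    """Run each command and map it to kubectl's answer (normally 'yes' or 'no')."""
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.stdout.strip()
    return results
```

Calling `run_checks(can_i_commands())` on the CentOS host would report, per verb and resource, whether the current kubeconfig identity is allowed; any `no` answer points at a permission to add.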

If you want to deploy SkyWalking in k8s, it is also possible to deploy (9.3.0|9.4.0) with the master branch of https://github.com/apache/skywalking-kubernetes. Note that bug #10568 has not been fixed in (9.3.0|9.4.0).

5 participants