
For questions, doubts, guidances please use Discussions. Don't open a new Issue. #91

Closed
carlosedp opened this issue Aug 20, 2020 · 121 comments
Labels
question Further information is requested

Comments

@carlosedp
Owner

carlosedp commented Aug 20, 2020

Since I don't have the resources or time to address every question about the deployments, the Issues section is meant for reporting problems or improvements to the stack.

This issue is the place to add a comment if you have a question, and I or any community member can answer on a best-effort basis.

If you deployed the monitoring stack and some targets are not available or show no metrics in Grafana, make sure you didn't have iptables rules or a firewall enabled on your nodes before deploying Kubernetes.

If you don't want to receive further notifications, click "Unsubscribe" in the right bar, right above the participants list.
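
A quick way to check this on each node (a rough sketch; the exact firewall tooling varies per distro) is:

$ sudo iptables -S | grep -E 'DROP|REJECT'   # look for rules that could block node-to-node or ingress traffic
$ sudo ufw status                            # if ufw is installed, confirm it is inactive or allows the Kubernetes ports
$ sudo systemctl status firewalld            # same check for firewalld-based distros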

@carlosedp carlosedp added the question Further information is requested label Aug 20, 2020
@carlosedp carlosedp pinned this issue Aug 20, 2020
@YushchenkoAndrew

YushchenkoAndrew commented Aug 21, 2020

I faced an issue where I couldn't open the Grafana and Prometheus applications (link: https://grafana.192.168.0.106.nip.io).

 $ curl http://prometheus.192.168.0.106.nip.io
 curl: (7) Failed to connect to prometheus.192.168.0.106.nip.io port 80: Connection refused
 $ curl https://prometheus.192.168.0.106.nip.io
 curl: (7) Failed to connect to prometheus.192.168.0.106.nip.io port 443: Connection refused

In the browser I got the same issue: "Unable to connect".

I'm using k3s and I configured my master IP address.
192.168.0.106 is the local IP address of one of my worker nodes.

I managed to deploy all pods successfully, but I don't know how I'm supposed to connect to the applications.

 $ kubectl get ingress -n monitoring
 NAME                CLASS    HOSTS                               ADDRESS   PORTS     AGE
 alertmanager-main   <none>   alertmanager.192.168.0.106.nip.io             80, 443   54s
 grafana             <none>   grafana.192.168.0.106.nip.io                  80, 443   54s
 prometheus-k8s      <none>   prometheus.192.168.0.106.nip.io               80, 443   53s

 $ kubectl get pods -n monitoring
 NAME                                   READY   STATUS    RESTARTS   AGE
 prometheus-operator-6b8868d698-6xlvg   2/2     Running   0          14m
 arm-exporter-wmm6r                     2/2     Running   0          14m
 arm-exporter-67jpd                     2/2     Running   0          14m
 node-exporter-fbltt                    2/2     Running   0          14m
 alertmanager-main-0                    2/2     Running   0          14m
 arm-exporter-zhd5m                     2/2     Running   0          14m
 node-exporter-pzz6z                    2/2     Running   0          14m
 node-exporter-74fwt                    2/2     Running   0          14m
 grafana-7466bcc7c5-4hvpj               1/1     Running   0          14m
 kube-state-metrics-96bf99844-g9ssn     3/3     Running   0          14m
 prometheus-adapter-f78c4f4ff-kccbq     1/1     Running   0          14m
 prometheus-k8s-0                       3/3     Running   0          14m

Do you have any suggestions?

@carlosedp
Owner Author

You need to troubleshoot access to your K3s cluster's ingress, which bridges outside HTTP/HTTPS traffic to the pods.

Here is a reference: https://rancher.com/docs/k3s/latest/en/networking/

Have you deployed any application that serves HTTP (like NGINX or Apache) and been able to access it from your computer? Accessing Prometheus, Grafana, and Alertmanager works the same way.
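
For example, something along these lines (a rough sketch; the hello deployment name, image, and nip.io host are placeholders, and the Ingress apiVersion should match whatever your cluster version accepts):

$ kubectl create deployment hello --image=nginx
$ kubectl expose deployment hello --port=80
$ cat <<EOF | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello
spec:
  rules:
  - host: hello.192.168.0.106.nip.io
    http:
      paths:
      - backend:
          serviceName: hello
          servicePort: 80
EOF
$ curl http://hello.192.168.0.106.nip.io

If the curl to the test app also fails, the problem is in the ingress controller or the network, not in this stack.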

@YushchenkoAndrew

Yes, I created my own blog site in JS, but I didn't use an ingress; I configured an externalIP on the Service. So I will try to troubleshoot this issue.
Thanks for the reply!

@YushchenkoAndrew

I solved this issue. Thanks for the advice; in the end I just installed NGINX, configured it, and after that I was able to access Prometheus and Grafana.
Thanks a lot!

@johnfried

johnfried commented Sep 8, 2020

Love this project! I am unable to access prometheus.*.nip.io, although I can access both Grafana and Alertmanager. My ingress shows Prometheus and is set up correctly. The one odd thing is that when I look at all my pods in the monitoring namespace, I do not have prometheus-k8s (or something along those lines that I have seen in videos); the pods I have are the Prometheus Adapter and Operator. I re-ran make vendor and deployed again: same thing, and no errors anywhere. Also, prometheus-k8s does have a service, as I just checked. Does this make any sense? TIA

@exArax

exArax commented Sep 15, 2020

Is there a way to deploy the Grafana and Prometheus pods to the master node only? Sometimes they are deployed to workers.

@carlosedp
Owner Author

@exArax You need to set your master nodes as schedulable. Even then, Kubernetes can deploy the pods to other nodes; if you need to pin them to a specific set of nodes, you need pod affinity.
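
On kubeadm-based clusters that usually means removing the NoSchedule taint from the masters (a sketch; the taint key differs between Kubernetes versions, and k3s masters are schedulable by default):

$ kubectl describe node <master-node> | grep Taints
$ kubectl taint node <master-node> node-role.kubernetes.io/master:NoSchedule-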

@carlosedp
Owner Author

Love this project! I am unable to access prometheus.*.nip.io, although I can access both Grafana and Alertmanager. My ingress shows Prometheus and is set up correctly. The one odd thing is that when I look at all my pods in the monitoring namespace, I do not have prometheus-k8s (or something along those lines that I have seen in videos); the pods I have are the Prometheus Adapter and Operator. I re-ran make vendor and deployed again: same thing, and no errors anywhere. Also, prometheus-k8s does have a service, as I just checked. Does this make any sense? TIA

That doesn't make much sense, since the pods are created by the operator. Re-check your cluster and redeploy the stack.
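
Before redeploying, it may also help to check whether the operator actually created the Prometheus object and what its logs say (a sketch; resource and container names are assumptions based on this stack's defaults):

$ kubectl get prometheus -n monitoring
$ kubectl logs -n monitoring deploy/prometheus-operator -c prometheus-operator
$ kubectl get statefulset -n monitoring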

@johnfried

I redeployed and all is well, thank you

@exArax

exArax commented Sep 15, 2020

@exArax You need to set your master nodes as schedulable. Even then, Kubernetes can deploy the pods to other nodes; if you need to pin them to a specific set of nodes, you need pod affinity.

In the case of Grafana, I have to add the node affinity in the grafana-deployment.yaml that is inside the manifests folder, right?

@ClauNav

ClauNav commented Sep 23, 2020

Hello Carlos,
I have the same issue as YushchenkoAndrew.
I'm a noob at Kubernetes (I built this cluster to learn about it).
(screenshot attached)

The same issue on Alertmanager/Prometheus.

Could you please help me?

Thanks.

@carlosedp
Owner Author

@exArax You need to set your master nodes as schedulable. Even then, Kubernetes can deploy the pods to other nodes; if you need to pin them to a specific set of nodes, you need pod affinity.

In the case of Grafana, I have to add the node affinity in the grafana-deployment.yaml that is inside the manifests folder, right?

Yes, since the jsonnet code doesn't include pod affinity for this.
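
For a quick test, a nodeSelector patch achieves a similar pinning (a sketch; the hostname value is a placeholder, and editing manifests/grafana-deployment.yaml is what survives a redeploy):

$ kubectl patch deployment grafana -n monitoring --type merge \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"<master-node>"}}}}}'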

@carlosedp
Owner Author

Hello Carlos,
I have the same issue as YushchenkoAndrew.
I'm a noob at Kubernetes (I built this cluster to learn about it).
(screenshot attached)

The same issue on Alertmanager/Prometheus.

Could you please help me?

Thanks.

You need to make sure your Kubernetes cluster has an Ingress controller and can expose the applications. Check this first with something like an NGINX pod with a simple Hello World web page.

@Nenad13

Nenad13 commented Sep 24, 2020

Hi Carlos,
Very cool project indeed. I am running Kubernetes on Ubuntu 20.04.1 (master) and a few Raspberry Pi 4s (nodes) with Raspbian on them. I installed Kubernetes with an Ansible playbook and it works fine.
I made all the changes in vars.jsonnet as you suggested. The problem is that after make deploy I am getting this error:

root@asus:~/cluster-monitoring# make deploy
echo "Deploying stack setup manifests..."
Deploying stack setup manifests...
kubectl apply -f ./manifests/setup/
The connection to the server localhost:8080 was refused - did you specify the right host or port?
make: *** [Makefile:37: deploy] Error 1

Do you have any suggestions?

This is the configuration:
kubectl config view

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://x.x.x.x:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    password: xxxxxxx
    username: xxxxxxxx

Thank you in advance!

@ClauNav

ClauNav commented Sep 25, 2020

Hello Carlos,
I have the same issue as YushchenkoAndrew.
I'm a noob at Kubernetes (I built this cluster to learn about it).
(screenshot attached)
The same issue on Alertmanager/Prometheus.
Could you please help me?
Thanks.

You need to make sure your Kubernetes cluster has an Ingress controller and can expose the applications. Check this first with something like an NGINX pod with a simple Hello World web page.

Hello Carlos, you're right!
Thanks for taking the time to reply to our newbie questions.

@riolaf05

riolaf05 commented Oct 2, 2020

Hello, I have some problems with the installation on K3s.

After the deploy operation, not all the services are installed:

(screenshot attached)

Also, I am getting this error from the prometheus-adapter container:

(screenshot attached)

Do you have any idea what I can do? Thank you.

@carlosedp
Owner Author

Hello again,

I want to add some authentication and authorization on prometheus.192.168.1.x.nip.io. Is there a way to do something like prometheus.io/docs/guides/tls-encryption on prometheus.192.168.1.x.nip.io?

You need an ingress controller that supports authentication. Look at the "Example external ingress with authentication" snippet in the jsonnet code. It works with Traefik but might need a couple of changes.

@carlosedp
Owner Author

Hello, I have some problems with the installation on K3s.

After the deploy operation, not all the services are installed:

(screenshot attached)

Also, I am getting this error from the prometheus-adapter container:

(screenshot attached)

Do you have any idea what I can do? Thank you.

Sorry, there are so many variables that it's hard to know. Start by deploying a test application, check your node IPs, and so on.

@robmit68

robmit68 commented Oct 7, 2020

Hi Carlos,
I have followed the Cluster Monitoring deployment step by step and it is running successfully. I am trying to use the Prometheus config generator within the node prometheus.192.168.XXX.XXX.nip.io to generate a Cisco SNMP scrape config, and I am not able to access the node via SSH.
How can I access the node to add scrapes/targets to the Prometheus k3s node?
I am a newbie in k3s and looking forward to your response.
Regards,

Robe

@carlosedp
Owner Author

Hi Carlos,
I have followed the Cluster Monitoring deployment step by step and it is running successfully. I am trying to use the Prometheus config generator within the node prometheus.192.168.XXX.XXX.nip.io to generate a Cisco SNMP scrape config, and I am not able to access the node via SSH.
How can I access the node to add scrapes/targets to the Prometheus k3s node?
I am a newbie in k3s and looking forward to your response.
Regards,

Robe

To collect metrics via SNMP you need the snmp_exporter. It's out of the scope of this stack, but take a look at another project I have here: https://github.com/carlosedp/ddwrt-monitoring. It's not on Kubernetes but I use it for SNMP.

@robmit68

robmit68 commented Oct 8, 2020

Thank you Carlos

@exArax

exArax commented Oct 8, 2020

Hello again,

I want to add some authentication on prometheus.192.168.1.x.nip.io. Is there a way to do something like https://prometheus.io/docs/guides/basic-auth/ or https://www.openshift.com/blog/adding-authentication-to-your-kubernetes-web-applications-with-keycloak on prometheus.192.168.1.x.nip.io? I do not know which file I have to edit to add authentication to Prometheus.

@carlosedp
Owner Author

As I mentioned before, the stack doesn't have anything built-in to provide authentication, but you could change the ingresses to use your ingress controller (Traefik, HAProxy, etc.) to add a layer of authentication.

Another option is similar to the post you linked to, but that would require adding the Keycloak sidecar to every pod.
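
As an illustration, with the Traefik 1.x controller that older k3s versions ship, basic auth can be layered onto an existing ingress roughly like this (a sketch; the annotation names are Traefik-specific and the secret and user names are placeholders):

$ htpasswd -cb auth admin <password>
$ kubectl create secret generic basic-auth --from-file=auth -n monitoring
$ kubectl annotate ingress prometheus-k8s -n monitoring \
    ingress.kubernetes.io/auth-type=basic \
    ingress.kubernetes.io/auth-secret=basic-auth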

@justinwagg

Firstly, thanks for all the work you put into this @carlosedp 👏🏻. Prometheus seems to be running into an error (panic: mmap: cannot allocate memory); have you run into this before? Deleting the pod fixes the issue, and I do have memory available. Also, what is the best way to add additional targets? Thanks again.

root@pi-master:/home/pi# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5+k3s1", GitCommit:"58ebdb2a2ec5318ca40649eb7bd31679cb679f71", GitTreeState:"clean", BuildDate:"2020-05-06T23:42:31Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5+k3s1", GitCommit:"58ebdb2a2ec5318ca40649eb7bd31679cb679f71", GitTreeState:"clean", BuildDate:"2020-05-06T23:42:31Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/arm"}
root@pi-master:/home/pi#
root@pi-master:/home/pi# cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
root@pi-master:/home/pi#

@exArax

exArax commented Oct 9, 2020

@carlosedp To change the ingresses, do I have to edit only the ingress-XXXX.yaml files, or are there more files that I have to edit?

@jontg
Contributor

jontg commented Oct 9, 2020

Hey @carlosedp, I was wondering if you have any interest in seeing Loki ("Prometheus, but for logs") added to this stack? I was thinking of taking a stab at it this coming Monday.

@thomazBDRI

Hey @carlosedp, thanks a lot for this stack; I am using it in a few clusters that I have! One question though: how do I add a new job to Prometheus? I didn't find anything describing the jobs!

@urbaned121

Hey @carlosedp, thanks a lot for this stack; I am using it in a few clusters that I have! One question though: how do I add a new job to Prometheus? I didn't find anything describing the jobs!

I came here with the same question...
The prometheus-config-reloader pod has a directory /etc/prometheus/config where the prometheus.yaml.gz file is, but I have no idea how to update it to add a new job.
I cannot find a ConfigMap related to that file.
@carlosedp any advice? :)
Thanks!

@lauchokyip

lauchokyip commented Jun 16, 2021

@carlosedp The Kubernetes maintainers changed Ingress from extensions/v1beta1 to networking.k8s.io/v1.
A quick and dirty way is to open the ingress-*.yaml files and change networking.k8s.io/v1 to extensions/v1beta1.

However, after Kubernetes 1.22 is released, this method will fail.

For a long-term fix:

ingress-alertmanager.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager-main
  namespace: monitoring
spec:
  tls:
  - hosts:
    - alertmanager.192.168.1.15.nip.io
  rules:
  - host: alertmanager.192.168.1.15.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              name: web

ingress-grafana.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  tls:
  - hosts:
    - grafana.192.168.1.15.nip.io
  rules:
  - host: grafana.192.168.1.15.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              name: http

ingress-prometheus.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-k8s
  namespace: monitoring
spec:
  tls:
  - hosts:
    - prometheus.192.168.1.15.nip.io
  rules:
  - host: prometheus.192.168.1.15.nip.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port: 
              name: web

Make sure you change the hosts to match your own address.
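
After editing, re-applying the files and listing the ingresses should confirm the new API version is accepted (assuming the manifests live under manifests/ as in this repo):

$ kubectl apply -f manifests/ingress-alertmanager.yaml -f manifests/ingress-grafana.yaml -f manifests/ingress-prometheus.yaml
$ kubectl get ingress -n monitoring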

@hugobloem

Hi there,

I am learning Kubernetes, so I deployed a small cluster on some Raspberry Pis. However, I cannot reach my Grafana instance (grafana.192.168.1.100.nip.io). I updated the ingress files following the post above, but to no avail.

Does anyone have any suggestions on what to do?

Cheers!

@exArax

exArax commented Jun 21, 2021

Hi,

I configured K3s with MetalLB and for some reason the ingress no longer works. Is there a way to make prometheus-k8s-0 use the hostNetwork: true option? I have added it in the spec section of prometheus-prometheus.yaml but it doesn't seem to work.

@Fred0211

Fred0211 commented Jul 5, 2021

Hello all,
Thank you for making this project, especially for ARM users! I'm learning/running MicroK8s and have managed to get all nodes deployed and running. microk8s.kubectl get ingress --all-namespaces outputs that the hosts should be up and running.

(screenshot attached)

However, I'm not able to connect in the browser. I'm aware MicroK8s isn't officially supported, so I'm unsure if it is an issue with this version of Kubernetes. This has happened both with and without applying the fixes for the ingress-*.yaml files.

Thank you!

@pomcho555

Thank you for this amazing project.

I've followed the tutorial here:
https://kauri.io/#deploy-prometheus-and-grafana-to-monitor-a-kube/186a71b189864b9ebc4ef7c8a9f0a6b5/a

But I've found a fatal error while running make deploy.
I disabled the ingress in vars.jsonnet but I still get the same error:

error validating "manifests/ingress-alertmanager.yaml": error validating data: [ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "serviceName" in io.k8s.api.networking.v1.IngressBackend, ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "servicePort" in io.k8s.api.networking.v1.IngressBackend]; if you choose to ignore these errors, turn validation off with --validate=false
error validating "manifests/ingress-grafana.yaml": error validating data: [ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "serviceName" in io.k8s.api.networking.v1.IngressBackend, ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "servicePort" in io.k8s.api.networking.v1.IngressBackend]; if you choose to ignore these errors, turn validation off with --validate=false
error validating "manifests/ingress-prometheus.yaml": error validating data: [ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "serviceName" in io.k8s.api.networking.v1.IngressBackend, ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "servicePort" in io.k8s.api.networking.v1.IngressBackend]; if you choose to ignore these errors, turn validation off with --validate=false

I have K3s version v1.20.7+k3s1
Thanks

I had the same error on k3s version v1.21.2+k3s1 (5a67e8dc), go version go1.16.4.

There are 3 master nodes (on EC2 and a Jetson Nano, arm64) connected via the VPN network, and the rest are Raspberry Pi arm64 nodes.

$sudo kubectl get node
NAME               STATUS     ROLES                  AGE   VERSION
ip-xxx-xxx-xxx-xxx   Ready      control-plane,master   14d   v1.21.1+k3s1
ip-yyy-yyy-yyy-yyy    Ready      control-plane,master   14d   v1.21.1+k3s1
pi4-node2          Ready      <none>                 33m   v1.21.2+k3s1
jetson-master      Ready      control-plane,master   14d   v1.21.2+k3s1
pi4-node1          Ready      <none>                 37m   v1.21.2+k3s1

Thanks

@onedr0p

onedr0p commented Jul 12, 2021

I don't see how this can work with later versions of k3s, since they disabled metrics listening on any interface other than 127.0.0.1:

k3s-io/k3s#425
k3s-io/k3s@4808c4e
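
A common workaround (an assumption about the setup, not something this repo configures) is to start the k3s server with those components bound back to 0.0.0.0 so Prometheus can reach their metrics endpoints; the exact flags depend on the k3s version:

$ k3s server \
    --kube-controller-manager-arg=bind-address=0.0.0.0 \
    --kube-scheduler-arg=bind-address=0.0.0.0 \
    --kube-proxy-arg=metrics-bind-address=0.0.0.0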

@jevy

jevy commented Jul 13, 2021

All of my Prometheus exporters are working except for these two. I've tried turning the TLS settings on/off in vars.jsonnet with no change. Cluster: 5 RPi 4s. One exporter from the same IP works, but another one doesn't. Any idea what to check?
(screenshot attached)

@ToMe25
Contributor

ToMe25 commented Jul 13, 2021

I think those two were always down when I looked at those stats too; maybe they just don't work with k3s?
I couldn't find anything that breaks with them down, nor any missing stats, though, so maybe it's fine.

@exArax

exArax commented Jul 15, 2021

Hi,

I want to monitor MinIO metrics; which files do I have to edit to make Prometheus scrape data from it?

@carlosedp
Owner Author

Hi,

First of all, thank you for this great project.
I've successfully deployed the monitoring stack and now I would like to add another RPi to Prometheus scraping. This RPi is outside the k8s cluster and already has node_exporter installed.
How do I add this node to Prometheus?

This stack is built to monitor cluster nodes; newly added cluster nodes will be monitored automatically.
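
That works because node-exporter and arm-exporter run as DaemonSets, so a quick way to confirm a new cluster node is covered (a sketch) is:

$ kubectl get daemonset -n monitoring
$ kubectl get pods -n monitoring -o wide | grep node-exporter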

@carlosedp
Owner Author

Hi,
Everything works fine. Thanks a lot for this cool repo!

One question. Where can I add additionalScrapeConfigs?

Best,
Gregor

Check the modules directory where I have additional scrapers that can be enabled in vars.jsonnet.
https://github.com/carlosedp/cluster-monitoring/tree/master/modules
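
Roughly, the workflow is (a sketch; check vars.jsonnet itself for the exact flag names):

$ ls modules/                  # see the available add-on scrapers
$ $EDITOR vars.jsonnet         # enable the module you want
$ make vendor && make deploy   # regenerate the manifests and apply them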

@carlosedp
Owner Author

@DrumSergio @lauchokyip Yes, since the Ingress API changed, the old version was deprecated and becomes unavailable in 1.22. I'd welcome a PR :)

@carlosedp
Owner Author

@ToMe25 @jevy The endpoints changed a bit in the past versions; they might need adjustments in the k3s-overrides.jsonnet that creates the endpoints for them.

@braucktoon

Hi,

Nice work!

Is it possible to add the speedtest-exporter (https://docs.miguelndecarvalho.pt/projects/speedtest-exporter) to this deployment? It seems like I need to add the exporter to prometheus.yml, but this file is buried in the prometheus-k8s-0 pod and not exposed. Any guidance if someone has already done it?

Thanks

@jamessewell

Love the project! I'm wondering how I go about updating, though.

I want to get AlertmanagerConfig, which isn't available in the vendored versions, which are locked (I know you can use the secret).

Basically I just want to know if it's safe to bump versions and what I should be looking out for; I haven't had much luck so far.

@mrimp

mrimp commented Sep 23, 2021

arm-exporter-rhtfh                    0/2   ContainerCreating   0   27m
arm-exporter-h4kr7                    0/2   ContainerCreating   0   27m
arm-exporter-g5hsd                    0/2   ContainerCreating   0   27m
node-exporter-fdxwm                   2/2   Running             0   27m
arm-exporter-b69l4                    2/2   Running             0   27m
arm-exporter-hfbjk                    2/2   Running             0   27m
node-exporter-8rm9g                   2/2   Running             0   27m
prometheus-adapter-585b57857b-b54k9   1/1   Running             0   27m
node-exporter-ck5j9                   2/2   Running             0   27m
arm-exporter-mxhrb                    2/2   Running             0   27m
node-exporter-ct28t                   2/2   Running             0   27m
node-exporter-nh2cr                   2/2   Running             0   27m
node-exporter-7n5jc                   2/2   Running             0   27m
grafana-7bc4784744-q6xzp              1/1   Running             0   27m
prometheus-operator-67755f959-gp44r   1/2   CrashLoopBackOff    9   27m
kube-state-metrics-6cb6df5d4-sv9zv    2/3   CrashLoopBackOff    9   27m

This is on a Raspberry Pi 4 (ARM64); it's a reinstall and was working prior. Thanks.

@assapir

assapir commented Sep 23, 2021

Hey, I am not sure where my problem is.

I am unable to connect to any of the ingresses.
My DNS is a Pi-hole server, which yields no answer for grafana.192.168.1.103.nip.io, so I manually added it as a DNS record, and I think it now returns the right answer:

dig  +short prometheus.192.168.1.103.nip.io
grafana.192.168.1.103.nip.io.
192.168.1.103

but opening the browser on that address fails.
I don't think I have DNS rebinding protection enabled.

All pods seem to be running:

❯ kubectl -n monitoring get pods -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
node-exporter-hx9hs                   2/2     Running   2          49m   192.168.1.103   k3s-master    <none>           <none>
arm-exporter-r8xtd                    2/2     Running   2          50m   10.42.0.9       k3s-master    <none>           <none>
prometheus-operator-67755f959-vrwrk   2/2     Running   4          50m   10.42.1.12      k3s-node-01   <none>           <none>
arm-exporter-vnw7f                    2/2     Running   4          50m   10.42.1.11      k3s-node-01   <none>           <none>
prometheus-k8s-0                      3/3     Running   4          28m   10.42.1.13      k3s-node-01   <none>           <none>
node-exporter-dnrrb                   2/2     Running   4          49m   192.168.1.106   k3s-node-01   <none>           <none>
alertmanager-main-0                   2/2     Running   2          50m   10.42.3.7       k3s-node-03   <none>           <none>
node-exporter-fwk5c                   2/2     Running   2          49m   192.168.1.115   k3s-node-03   <none>           <none>
arm-exporter-9gqxz                    2/2     Running   2          50m   10.42.3.5       k3s-node-03   <none>           <none>
prometheus-adapter-585b57857b-xcp2j   1/1     Running   1          49m   10.42.3.6       k3s-node-03   <none>           <none>
arm-exporter-98npp                    2/2     Running   4          50m   10.42.2.10      k3s-node-02   <none>           <none>
kube-state-metrics-6cb6df5d4-przcx    3/3     Running   6          49m   10.42.2.8       k3s-node-02   <none>           <none>
node-exporter-p2w6r                   2/2     Running   4          49m   192.168.1.111   k3s-node-02   <none>           <none>
grafana-7bc4784744-68nzs              1/1     Running   2          49m   10.42.2.9       k3s-node-02   <none>           <none>

@mrimp

mrimp commented Oct 1, 2021

Hey, I am not sure where my problem is.

I am unable to connect to any of the ingresses. My DNS is a Pi-hole server, which yields no answer for grafana.192.168.1.103.nip.io, so I manually added it as a DNS record, and I think it now returns the right answer:

dig  +short prometheus.192.168.1.103.nip.io
grafana.192.168.1.103.nip.io.
192.168.1.103

but opening the browser on that address fails. I don't think I have DNS rebinding protection enabled.

All pods seem to be running:

❯ kubectl -n monitoring get pods -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
node-exporter-hx9hs                   2/2     Running   2          49m   192.168.1.103   k3s-master    <none>           <none>
arm-exporter-r8xtd                    2/2     Running   2          50m   10.42.0.9       k3s-master    <none>           <none>
prometheus-operator-67755f959-vrwrk   2/2     Running   4          50m   10.42.1.12      k3s-node-01   <none>           <none>
arm-exporter-vnw7f                    2/2     Running   4          50m   10.42.1.11      k3s-node-01   <none>           <none>
prometheus-k8s-0                      3/3     Running   4          28m   10.42.1.13      k3s-node-01   <none>           <none>
node-exporter-dnrrb                   2/2     Running   4          49m   192.168.1.106   k3s-node-01   <none>           <none>
alertmanager-main-0                   2/2     Running   2          50m   10.42.3.7       k3s-node-03   <none>           <none>
node-exporter-fwk5c                   2/2     Running   2          49m   192.168.1.115   k3s-node-03   <none>           <none>
arm-exporter-9gqxz                    2/2     Running   2          50m   10.42.3.5       k3s-node-03   <none>           <none>
prometheus-adapter-585b57857b-xcp2j   1/1     Running   1          49m   10.42.3.6       k3s-node-03   <none>           <none>
arm-exporter-98npp                    2/2     Running   4          50m   10.42.2.10      k3s-node-02   <none>           <none>
kube-state-metrics-6cb6df5d4-przcx    3/3     Running   6          49m   10.42.2.8       k3s-node-02   <none>           <none>
node-exporter-p2w6r                   2/2     Running   4          49m   192.168.1.111   k3s-node-02   <none>           <none>
grafana-7bc4784744-68nzs              1/1     Running   2          49m   10.42.2.9       k3s-node-02   <none>           <none>

Try kubectl get ingress -n monitoring.
If you're getting this error, it will never load:

Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress

This still exists today.

@assapir

assapir commented Oct 4, 2021

How does one create a custom scraper? I have my own metrics endpoint and I want Prometheus to scrape it; how can I add that?

@Cian911
Contributor

Cian911 commented Oct 8, 2021

Hi,

Nice work!

Is it possible to add the speedtest-exporter (https://docs.miguelndecarvalho.pt/projects/speedtest-exporter) to this deployment? It seems like I need to add the exporter to prometheus.yml, but this file is buried in the prometheus-k8s-0 pod and not exposed. Any guidance if someone has already done it?

Thanks

I've created a new PR for just this here: #116 cc: @carlosedp

@Cian911
Contributor

Cian911 commented Oct 8, 2021

It seems that the ksonnet-lib project, which this project heavily utilises, is no longer active and has since been archived.

I came across a newer project which has been kept up to date, https://github.com/jsonnet-libs/k8s-libsonnet, and gave porting this project over to it a shot, but without much success (I'm not very familiar with jsonnet). By the looks of things it might be a bit of work. Wondering what your thoughts are, @carlosedp, and whether you might have the time to update?

@ToMe25
Contributor

ToMe25 commented Oct 12, 2021

@carlosedp I have locally updated all dependencies, once to the latest kube-prometheus version, and once to version 0.8, which is the last one that supports Kubernetes 1.20.
I have tweaked it until it runs and collects some data; however, a lot of data seems to be missing and many of the dashboards aren't working because of it.
Fixing this is unfortunately something I am unable to do.

Should I push this as a draft PR as a starting point for someone else to continue, or discard it since I couldn't get it to work?
Also, if I do push it, which of the two versions should I push?

@radicalgeek

@ToMe25 @jevy The endpoints changed a bit in the past versions; they might need adjustments in the k3s-overrides.jsonnet that creates the endpoints for them.

@carlosedp Any idea what the endpoints need to look like? It seems they now only respond when accessed over localhost/127.0.0.1 and not over the cluster IP for the manager (even locally). I tried to change the IP to 127.0.0.1 in the endpoint manifest for the kube-controller-manager but it won't allow it. Any idea how we can fix this?

@exArax

exArax commented Oct 22, 2021

Hi @carlosedp ,

I want to add to this stack a pod that is actually an exporter for KubeVirt. The procedure requires adding a deployment file and a service file for sure, but how would I configure Prometheus to scrape from kubevirt-prometheus-metrics? Which file should I edit?

Update: I found that I don't need an exporter. Based on https://kubevirt.io/user-guide/operations/component_monitoring/ I can integrate KubeVirt with the Prometheus Operator. The thing is that I am applying to KubeVirt the monitoring namespace and the service account of the Operator (prometheus-operator), and I can't see the metrics that start with kubevirt_vmi.

@ToMe25
Contributor

ToMe25 commented Oct 22, 2021

@ToMe25 @jevy The endpoints changed a bit in the past versions; they might need adjustments in the k3s-overrides.jsonnet that creates the endpoints for them.

@carlosedp Any idea what the endpoints need to look like? It seems they now only respond when accessed over localhost/127.0.0.1 and not over the cluster IP for the manager (even locally). I tried to change the IP to 127.0.0.1 in the endpoint manifest for the kube-controller-manager but it won't allow it. Any idea how we can fix this?

@radicalgeek I actually locally tweaked modules/k3s-overrides.jsonnet until the generated manifests looked perfect to me, and the result was this:
(screenshot attached)
So unfortunately I don't think I know enough about this issue to fix it.

I compared the generated manifests with those for kube-dns, and couldn't find any difference that could explain this.
This is my modified modules/k3s-overrides.jsonnet file in case you want to play around with it:

modules/k3s-overrides.jsonnet

local utils = import '../utils.libsonnet';
local vars = import '../vars.jsonnet';
local k = import 'ksonnet/ksonnet.beta.4/k.libsonnet';
local service = k.core.v1.service;
local servicePort = k.core.v1.service.mixin.spec.portsType;

{
  prometheus+:: {
    kubeControllerManagerPrometheusDiscoveryService:
      service.new('kube-controller-manager-prometheus-discovery', { 'k8s-app': 'kube-controller-manager' }, servicePort.newNamed('http-metrics', 10252, 10252)) +
      service.mixin.metadata.withNamespace('kube-system') +
      service.mixin.metadata.withLabels({ 'k8s-app': 'kube-controller-manager' }) +
      service.mixin.spec.withClusterIp('None'),

    kubeSchedulerPrometheusDiscoveryService:
      service.new('kube-scheduler-prometheus-discovery', { 'k8s-app': 'kube-scheduler' }, servicePort.newNamed('http-metrics', 10251, 10251)) +
      service.mixin.metadata.withNamespace('kube-system') +
      service.mixin.metadata.withLabels({ 'k8s-app': 'kube-scheduler' }) +
      service.mixin.spec.withClusterIp('None'),
  },
}

I also tried modifying the endpoint manifests to reference the right k8s-app instead of removing them, but that didn't fix it either.

@exArax

exArax commented Oct 29, 2021

Hi again,

I am deploying a VM through KubeVirt, which has CirrOS inside with node_exporter. How can I add an endpoint for this exporter to Prometheus?

@carlosedp
Owner Author

Hi all, I've just enabled Discussions on the project so we can have a proper forum for talking about it.

Please head to https://github.com/carlosedp/cluster-monitoring/discussions.

I won't close this issue so it can be a pointer for new people.
Thanks everyone!

Repository owner locked as resolved and limited conversation to collaborators Nov 10, 2021
@carlosedp carlosedp changed the title Questions, doubts, guidances goes into a comment here. Don't open a new Issue. For questions, doubts, guidances please use Discussions. Don't open a new Issue. Nov 10, 2021

This issue was moved to a discussion.

You can continue the conversation there.
