Not able to fetch snapscheduler metrics #116

Closed
prasanjit-enginprogam opened this issue Mar 30, 2021 · 11 comments · Fixed by #117
@prasanjit-enginprogam

Describe the bug
I want to scrape the snapscheduler metrics with Prometheus, but the metrics endpoint does not seem to be working.

Steps to reproduce
I have a vanilla install of snapscheduler.

$ kubectl describe service/snapscheduler-metrics -n abcns
Name:              snapscheduler-metrics
Namespace:         cloudops
Labels:            name=snapscheduler
Annotations:       <none>
Selector:          name=snapscheduler
Type:              ClusterIP
IP Families:       <none>
IP:                172.20.219.85
IPs:               <none>
Port:              http-metrics  8383/TCP
TargetPort:        8383/TCP
Endpoints:         <none>
Port:              cr-metrics  8686/TCP
TargetPort:        8686/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
$
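
The `Endpoints: <none>` lines in the output above are the telling detail: a Service with no endpoints has no ready pods matching its selector, which is consistent with a port-forward to that Service timing out. Below is a minimal sketch for spotting this in saved `kubectl describe service` output, assuming the output layout shown above. The function name `ports_without_endpoints` is hypothetical, not part of kubectl or snapscheduler.

```shell
# Read `kubectl describe service` output on stdin and print the name of
# every port whose Endpoints line is "<none>".
# (Hypothetical helper; assumes the describe layout shown in this issue.)
ports_without_endpoints() {
  awk '
    $1 == "Port:"                        { port = $2 }
    $1 == "Endpoints:" && $2 == "<none>" { print port }
  '
}

# Excerpt of the describe output from this issue.
describe_output='Port:              http-metrics  8383/TCP
TargetPort:        8383/TCP
Endpoints:         <none>
Port:              cr-metrics  8686/TCP
TargetPort:        8686/TCP
Endpoints:         <none>'

printf '%s\n' "$describe_output" | ports_without_endpoints
```

If this prints any port names, a reasonable next step is to compare the Service selector with the pod labels, for example with `kubectl get pods -n <namespace> -l name=snapscheduler`.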

Port-forwarding to either metrics port gives a timeout error.

kubectl --kubeconfig ABC.config port-forward svc/snapscheduler-metrics -n cloudops 9100:8686
kubectl --kubeconfig ABC.config port-forward svc/snapscheduler-metrics -n cloudops 9100:8383

Expected behavior
I should be able to see the metrics.

Actual results
I get the following error:

error: timed out waiting for the condition

Please help

@prasanjit-enginprogam prasanjit-enginprogam added the bug Something isn't working label Mar 30, 2021
@JohnStrunk JohnStrunk self-assigned this Mar 30, 2021
@JohnStrunk
Member

I can confirm the issue, at least w/ the Helm chart. Is that how you deployed?

@prasanjit-enginprogam
Author

@JohnStrunk: Yes, I used Helm to deploy.

@prasanjit-enginprogam
Author

prasanjit-enginprogam commented Mar 30, 2021

@JohnStrunk: Let me know which version/tag I should use after your fix.

@JohnStrunk
Member

Ok. I'll put this into a release in the next couple days.

@JohnStrunk
Member

A new release has been published: https://artifacthub.io/packages/helm/backube-helm-charts/snapscheduler
The metrics port should be accessible.

@prasanjit-enginprogam
Author

@JohnStrunk: Are you going to add a new tag in https://quay.io/repository/cloudops/snapscheduler?tab=tags? Currently I can only see 1.1.1.

@JohnStrunk
Member

cloudops isn't an official source. I don't know what those images are.
The snapscheduler container repo is in the backube org on quay:

$ skopeo list-tags docker://quay.io/backube/snapscheduler
{
    "Repository": "quay.io/backube/snapscheduler",
    "Tags": [
        "1.0.0",
        "1.1.0",
        "1.1.1",
        "latest",
        "1.2.0"
    ]
}

v1.2.0 is from today, as is the helm chart v1.3.0 on artifacthub.
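
Note that `latest` appears before `1.2.0` in the listing above, so list order alone does not identify the newest release. Below is a rough sketch for picking the highest version tag from `skopeo list-tags` output. It assumes GNU `sort` (for `-V`) and tags in plain `X.Y.Z` form; the function name `newest_version_tag` is hypothetical.

```shell
# Read `skopeo list-tags` JSON on stdin and print the highest X.Y.Z tag.
# (Hypothetical helper; non-version tags such as "latest" are ignored.)
newest_version_tag() {
  grep -oE '"[0-9]+\.[0-9]+\.[0-9]+"' | tr -d '"' | sort -V | tail -n 1
}

# The tag listing quoted in this issue.
tags_json='{
    "Repository": "quay.io/backube/snapscheduler",
    "Tags": [
        "1.0.0",
        "1.1.0",
        "1.1.1",
        "latest",
        "1.2.0"
    ]
}'

printf '%s\n' "$tags_json" | newest_version_tag
```

Against the live registry this could be used as `skopeo list-tags docker://quay.io/backube/snapscheduler | newest_version_tag`.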

@prasanjit-enginprogam
Author

Great, thanks.

@prasanjit-enginprogam
Author

prasanjit-enginprogam commented Apr 5, 2021

@JohnStrunk: I was wondering whether backup status could be added to the metrics endpoint, so that we can hook it up to a Grafana dashboard and set up PagerDuty (on-call) alerts based on it. What are your thoughts?

If you feel this makes sense and is doable, I can create a new feature request for it.

@JohnStrunk
Member

If there are particular metrics you're looking for, please open an issue describing them (or a discussion thread).

Right now, snapscheduler doesn't monitor the snapshots that it creates. It just creates the VolumeSnapshot object and walks away. I'm guessing you'd want something to ensure it becomes readyToUse within some timeframe, but I have no idea how to bound that (AWS can take a looong time for big volumes).

Simple metrics about how many snapshots were created are much easier, but I'm not sure how useful that is.

@prasanjit-enginprogam
Author

@JohnStrunk: Yes, correct: have some sort of watcher pod that keeps track of the backups, checks the readyToUse flag, and exposes it as metrics from an endpoint that can then be scraped by Prometheus or other observability tools.
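
As a rough illustration of the watcher idea (not something snapscheduler provides), the sketch below converts a listing of VolumeSnapshots into Prometheus text-format gauge lines. The metric name `volumesnapshot_ready_to_use` and the function name `snapshots_to_metrics` are hypothetical, as is the sample data. The input is assumed to be whitespace-separated namespace, name, and `status.readyToUse` columns, such as `kubectl get volumesnapshot -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,READY:.status.readyToUse` would print.

```shell
# Read "namespace name readyToUse" rows on stdin and print one gauge
# sample per snapshot: 1 if readyToUse is "true", otherwise 0
# (an unset status is treated as not ready).
# (Hypothetical helper and metric name, for illustration only.)
snapshots_to_metrics() {
  echo '# TYPE volumesnapshot_ready_to_use gauge'
  awk '
    NF >= 3 {
      ready = ($3 == "true") ? 1 : 0
      printf "volumesnapshot_ready_to_use{namespace=\"%s\",name=\"%s\"} %d\n", $1, $2, ready
    }
  '
}

# Hypothetical sample rows standing in for real kubectl output.
sample='cloudops   data-hourly-202104050000   true
cloudops   data-hourly-202104050100   false'

printf '%s\n' "$sample" | snapshots_to_metrics
```

This only reports the current flag; it does not address the open question in the thread of how long a snapshot may reasonably take to become ready, which would have to be handled by the alerting rule.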
