sapstartsrv is going [defunc] after sap_host_exporter is active #31

Closed
pirat013 opened this issue Mar 26, 2020 · 1 comment · Fixed by #39
Assignees
Labels
bug Something isn't working

Comments

@pirat013

Hi,

I am seeing a massive impact on my 3-node cluster running 5 SAP SIDs. If all exporters are active before the SAP systems are running, sapstartsrv goes defunct, and this blocks all other start procedures of the cluster.
I was able to reproduce that by stopping sap_host_exporter. It looks to me like we have an issue with sapstartsrv when too many requests happen.
First, I'll get in touch with SAP to verify the situation. Second, we should think about whether to reduce the data we are collecting or to use a different method, maybe the socket directly instead of an HTTP request.
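
To illustrate the "fewer requests" idea, here is a minimal Python sketch of a throttle that runs calls one at a time and keeps them a minimum interval apart. It is only an assumption about how request pressure on sapstartsrv could be reduced, not the exporter's actual code and not a confirmed fix. The class name `SerializedThrottle` and its parameters are hypothetical.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SerializedThrottle:
    """Runs calls one at a time, at least `min_interval` seconds apart.

    Hypothetical helper for illustration only; the clock and sleep
    functions are injectable so the behaviour can be checked offline.
    """

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_call: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        # The lock serializes callers, so only one request is in flight.
        with self._lock:
            if self._last_call is not None:
                wait = self._min_interval - (self._clock() - self._last_call)
                if wait > 0:
                    self._sleep(wait)
            try:
                return fn()
            finally:
                # Record the time even when fn() raises, so failed
                # requests are spaced out as well.
                self._last_call = self._clock()
```

A collector would wrap each request to the start service in `throttle.call(...)`, with the request function itself left to the caller. This only reduces how often the service is hit; it does not address whatever makes it go defunct.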

@stefanotorresi stefanotorresi self-assigned this Apr 14, 2020
@stefanotorresi
Collaborator

stefanotorresi commented Apr 14, 2020

Just to keep track of the developments about this:

This is 99.9% an upstream issue: some SAPControl methods will cause the start service to crash, especially if they return a 500 response.
The problem is very difficult to reproduce on demand. It most probably involves some internal race condition in the start service, and it shows up in many different environments, with or without suse-cluster-connector involved. There is nothing inherently wrong with the exporter, other than the assumption that sending HTTP requests wouldn't cause an entire SAP cluster to go haywire.

#36 partly works around this issue, and implementing #22 should completely avoid it.
