[BUG]: karavi-metrics-powerscale pod gets a segmentation violation error during start #1019

Closed
N1K68 opened this issue Oct 25, 2023 · 9 comments
Labels
area/csm-observability Issue pertains to the CSM Observability module type/bug Something isn't working. This is the default label associated with a bug issue.

@N1K68

N1K68 commented Oct 25, 2023

Bug Description

After installing the CSM CSI driver for PowerScale version 2.8.0, the karavi-metrics-powerscale pod hits a segmentation violation error during startup.

Logs

time="2023-10-25T13:21:52Z" level=warning msg="endpoint port is empty, use default EndpointPort: 8080"
time="2023-10-25T13:21:52Z" level=warning msg="IsiPath is empty, use defaultIsiPath: /ifs/data/csi"
time="2023-10-25T13:21:52Z" level=warning msg="IsiVolumePathPermissions is empty, use defaultIsiPermission: 0777"
time="2023-10-25T13:21:52Z" level=info msg="setting client options" authtype=1 endpoint="https://sust070a-01-325.sebank.se:8080" insecure=true username=d827724601@corp1.ad1.seb.net verbose=0
I1025 13:21:52.572983 1 leaderelection.go:248] attempting to acquire leader lease karavi/karavi-metrics-powerscale...
I1025 13:22:10.053555 1 leaderelection.go:258] successfully acquired lease karavi/karavi-metrics-powerscale
time="2023-10-25T13:22:12Z" level=info msg="function duration" duration="34.859µs" function=gatherClusterPerformanceStatsMetrics
time="2023-10-25T13:22:12Z" level=info msg="function duration" duration="3.668µs" function=pushClusterPerformanceStatsMetrics
time="2023-10-25T13:22:12Z" level=info msg="function duration" duration=112.605258ms function=ExportClusterPerformanceMetrics
time="2023-10-25T13:22:22Z" level=info msg="function duration" duration="10.908µs" function=gatherClusterCapacityStatsMetrics
time="2023-10-25T13:22:22Z" level=info msg="function duration" duration="3.436µs" function=pushClusterCapacityStatsMetrics
time="2023-10-25T13:22:22Z" level=info msg="function duration" duration=105.993103ms function=ExportClusterCapacityMetrics
time="2023-10-25T13:22:22Z" level=info msg="function duration" duration=8.757663ms function=ExportQuotaMetrics
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x125bc3b]

goroutine 1 [running]:
github.com/dell/csm-metrics-powerscale/internal/k8s.VolumeFinder.GetPersistentVolumes({{0x19c80e0, 0xc000011f80}, {0xc00003f780, 0x1, 0x1}, 0xc00045ce80}, {0x0?, 0x0?})
/viewsvn/lovip262/jenkins/workspace/CSM/csm-observability/Release_Activities_Observability/csm-metrics-powerscale/internal/k8s/volume_finder.go:84 +0x51b
github.com/dell/csm-metrics-powerscale/internal/service.(*PowerScaleService).ExportQuotaMetrics(0xc000090600, {0x19de150?, 0x25702e0})
/viewsvn/lovip262/jenkins/workspace/CSM/csm-observability/Release_Activities_Observability/csm-metrics-powerscale/internal/service/service.go:145 +0x142
github.com/dell/csm-metrics-powerscale/internal/entrypoint.Run({0x19de150, 0x25702e0}, 0xc0000905a0, {0x19ccfe0?, 0xc000143da0}, {0x19d2880, 0xc000090600})
/viewsvn/lovip262/jenkins/workspace/CSM/csm-observability/Release_Activities_Observability/csm-metrics-powerscale/internal/entrypoint/run.go:146 +0x722
main.main()
/viewsvn/lovip262/jenkins/workspace/CSM/csm-observability/Release_Activities_Observability/csm-metrics-powerscale/cmd/metrics-powerscale/main.go:141 +0x974

Screenshots

image

Additional Environment Information

The configuration we are using is:
csm-isilon.txt

Steps to Reproduce

Install the CSM CSI driver for PowerScale version 2.8.0 and enable the metrics-powerscale, otel-collector, and topology modules.

Expected Behavior

I expect the karavi-metrics-powerscale pod to be able to start up.

CSM Driver(s)

CSIDRIVERTYPE: isilon CONFIGVERSION : v2.8.0

Installation Type

Dell Container Storage Modules 1.3.0

Container Storage Modules Enabled

isilon v2.8.0
resiliency v1.7.0
observability v1.6.0

Container Orchestrator

OpenShift 4.12.31

Operating System

RHEL 8.6

@N1K68 N1K68 added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Oct 25, 2023
@csmbot
Collaborator

csmbot commented Oct 25, 2023

@N1K68: Thank you for submitting this issue!

The issue is currently awaiting triage. Please make sure you have given us as much context as possible.

If the maintainers determine this is a relevant issue, they will remove the needs-triage label and respond appropriately.


We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at container.storage.modules@dell.com.

@shefali-malhotra shefali-malhotra added the area/csm-observability Issue pertains to the CSM Observability module label Oct 26, 2023
@rajendraindukuri rajendraindukuri removed the needs-triage Issue requires triage. label Oct 26, 2023
@rajendraindukuri
Collaborator

Hi @N1K68, can you please tell us whether there are any unbound PVs in your environment that belong to PowerScale?

@N1K68
Author

N1K68 commented Oct 27, 2023

Hi,

No, there are no unbound or bound PVs in the environment that belong to PowerScale.

However, if I set the logging level to trace, it seems to complain about a PV that is "not provisioned by a CSI driver":

2023-10-27T14:32:05.223568701Z time="2023-10-27T14:32:05Z" level=info msg="function duration" duration=112.869542ms function=ExportClusterPerformanceMetrics
2023-10-27T14:32:05.236332314Z time="2023-10-27T14:32:05Z" level=debug msg="The PV, nfs.isilon.registry , is not provisioned by a CSI driver\n"
2023-10-27T14:32:05.236380996Z time="2023-10-27T14:32:05Z" level=info msg="function duration" duration=12.777632ms function=ExportQuotaMetrics
2023-10-27T14:32:05.239458779Z panic: runtime error: invalid memory address or nil pointer dereference
2023-10-27T14:32:05.239458779Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x125bc3b]
2023-10-27T14:32:05.239458779Z

@rajendraindukuri
Collaborator

rajendraindukuri commented Oct 31, 2023

@N1K68 The "not provisioned by a CSI driver" message is fine. The code checks all volumes and ignores the PVs that are not provisioned by a CSI driver.

We tried to reproduce the scenario by setting the ReclaimPolicy to "Retain" and deleting the PVC after creation, but we could not: even after deletion, the PV output still shows the PVC claim, and the metrics pod runs as expected. See the screenshot below:
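
The filtering behaviour described in this comment can be illustrated with a minimal Go sketch. It is not the csm-metrics-powerscale source: the `persistentVolume` and `csiSource` types are hypothetical stand-ins for the Kubernetes PV object, `filterCSIVolumes` is an invented helper, and the driver name `csi-isilon.dellemc.com` is an assumption.

```go
package main

import "fmt"

// csiSource is a hypothetical stand-in for the CSI section of a PV spec.
type csiSource struct {
	Driver string
}

// persistentVolume is a hypothetical, simplified PV.
// CSI is nil when the PV was not provisioned by a CSI driver
// (for example, a statically defined NFS volume).
type persistentVolume struct {
	Name string
	CSI  *csiSource
}

// filterCSIVolumes keeps only the PVs whose CSI source names the given driver.
// PVs without a CSI source are skipped, mirroring the debug message
// "The PV ... is not provisioned by a CSI driver".
func filterCSIVolumes(pvs []persistentVolume, driver string) []persistentVolume {
	var out []persistentVolume
	for _, pv := range pvs {
		if pv.CSI == nil {
			continue // not provisioned by a CSI driver
		}
		if pv.CSI.Driver != driver {
			continue // provisioned by a different CSI driver
		}
		out = append(out, pv)
	}
	return out
}

func main() {
	// The driver name is an assumption made for this illustration.
	const driver = "csi-isilon.dellemc.com"
	pvs := []persistentVolume{
		{Name: "nfs.isilon.registry"},
		{Name: "k8s-volume-1", CSI: &csiSource{Driver: driver}},
	}
	for _, pv := range filterCSIVolumes(pvs, driver) {
		fmt.Println(pv.Name)
	}
}
```

Under this reading, skipping `nfs.isilon.registry` is harmless, so the debug message on its own would not explain the panic.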

image

Can you please share the output of the following commands? We will then try to reproduce the PV scenarios and find the root cause.

kubectl get pv -A
kubectl get pvc -A

Thanks.

@N1K68
Author

N1K68 commented Nov 3, 2023

@rajendraindukuri

Hi, I have now uploaded the output of the get commands that you asked for.

pv-all.txt
pvc-all.txt

@rajendraindukuri
Collaborator

Hi @N1K68 ,
Thanks for sharing the info. From the details shared, I see a few PVs in "Available" status that are not bound to any PVC.
Can you please check whether deleting these PVs, or binding them to PVCs, resolves the issue?

image

Thanks
Rajendra
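
To find such PVs without reading the listing by eye, one option is to parse the JSON form of the PV list. The sketch below assumes input in the shape produced by `kubectl get pv -o json` (an `items` array with `metadata.name`, `spec.claimRef`, and `status.phase`); the `pvList` type and `unboundPVs` helper are hypothetical names, and the sample data is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pvList models only the fields this sketch needs from a PV list in JSON form.
type pvList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			// ClaimRef stays nil when the PV has no claim reference.
			ClaimRef *struct {
				Namespace string `json:"namespace"`
				Name      string `json:"name"`
			} `json:"claimRef"`
		} `json:"spec"`
		Status struct {
			Phase string `json:"phase"`
		} `json:"status"`
	} `json:"items"`
}

// unboundPVs returns the names of PVs that carry no claim reference.
func unboundPVs(data []byte) ([]string, error) {
	var list pvList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var names []string
	for _, pv := range list.Items {
		if pv.Spec.ClaimRef == nil {
			names = append(names, pv.Metadata.Name)
		}
	}
	return names, nil
}

func main() {
	// Made-up sample; in practice the bytes would come from the cluster.
	sample := []byte(`{"items":[
		{"metadata":{"name":"pv-a"},"spec":{},"status":{"phase":"Available"}},
		{"metadata":{"name":"pv-b"},
		 "spec":{"claimRef":{"namespace":"apps","name":"data"}},
		 "status":{"phase":"Bound"}}
	]}`)
	names, err := unboundPVs(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

The check keys on the missing claim reference rather than on the phase string, because a PV in "Released" phase normally still carries its old claim reference.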

@N1K68
Author

N1K68 commented Nov 7, 2023

Hi @rajendraindukuri,

I have now bound the PVs that were in status "Available" to PVCs, and the karavi-metrics-powerscale pod now seems to be able to start. However, I'm not sure what conclusion to draw from this, since there will be PVs in status "Available" in all of our clusters.

Regards // Niklas

@rajendraindukuri
Collaborator

rajendraindukuri commented Nov 8, 2023

Hi @N1K68 ,
This is a bug. We will create an internal ticket to handle this scenario and fix it.

Thanks
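
For readers following along: the thread does not show the code at volume_finder.go:84, so the following is only a plausible sketch of this class of bug and its guard, not the actual fix. In Kubernetes, a PV's claim reference is a pointer that is nil while the PV is unbound, and a fault address such as `addr=0x10` is typical of reading a field through a nil struct pointer. The types `objectReference`, `persistentVolume`, and `volumeInfo`, and the function `collectVolumeInfo`, are hypothetical stand-ins.

```go
package main

import "fmt"

// objectReference is a hypothetical stand-in for a PV's claim reference.
type objectReference struct {
	Namespace string
	Name      string
}

// persistentVolume is a hypothetical, simplified PV.
// ClaimRef is nil for a PV that is not bound to any PVC ("Available").
type persistentVolume struct {
	Name     string
	ClaimRef *objectReference
}

// volumeInfo is a hypothetical record of what a metrics exporter might collect.
type volumeInfo struct {
	PersistentVolume      string
	Namespace             string
	PersistentVolumeClaim string
}

// collectVolumeInfo builds one record per PV. Reading pv.ClaimRef.Namespace
// without the nil check would panic for an unbound PV; with the guard,
// unbound PVs simply get empty claim fields.
func collectVolumeInfo(pvs []persistentVolume) []volumeInfo {
	infos := make([]volumeInfo, 0, len(pvs))
	for _, pv := range pvs {
		info := volumeInfo{PersistentVolume: pv.Name}
		if pv.ClaimRef != nil {
			info.Namespace = pv.ClaimRef.Namespace
			info.PersistentVolumeClaim = pv.ClaimRef.Name
		}
		infos = append(infos, info)
	}
	return infos
}

func main() {
	pvs := []persistentVolume{
		{Name: "pv-available"}, // unbound: ClaimRef is nil
		{Name: "pv-bound", ClaimRef: &objectReference{Namespace: "apps", Name: "data"}},
	}
	for _, info := range collectVolumeInfo(pvs) {
		fmt.Printf("%+v\n", info)
	}
}
```

Whether unbound PVs should be reported with empty claim fields or skipped entirely is a design choice; the sketch keeps them so that capacity on unclaimed volumes would still be visible.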

@shefali-malhotra
Collaborator

The issue is fixed in the latest code. The fix will be available as part of the CSM 1.9 release.
