Trident CSI iscsi volumes go to read-only at ~50% usage #555

Open
thomasmeeus opened this issue Mar 24, 2021 · 4 comments
@thomasmeeus

Hi,

we're using Trident CSI to provision iSCSI volumes on a NetApp appliance.
After some time a volume goes read-only because it is full, probably due to insufficient space for metadata.

We see that Trident uses the same size for the iSCSI LUN and the volume it is provisioned on, e.g. a 40GB iSCSI LUN on a 40GB volume.
Is there an option we can pass to Trident to automatically provision 5% more than requested and reserve it for metadata?

We already opened a NetApp support ticket for this, but were redirected to the GitHub issues for further assistance, since according to NetApp support it is more likely a Trident issue.

We were pointed in the direction of enabling space allocation on the LUNs. According to #136 this is enabled by default, and we were able to confirm this by looking at the properties of a LUN created by Trident.
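
For reference, the setting can be checked from the ONTAP CLI with something like the following (the SVM name and LUN path are placeholders for the Trident-created FlexVol/LUN):

```sh
# Placeholder names; shows whether space allocation is enabled on the Trident-created LUN
lun show -vserver svm_iscsi -path /vol/trident_pvc_xxxxxxxx/lun0 -fields space-allocation
```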

While creating this issue I also noticed #554. The behaviour there seems to be the same; could we be facing the same issue?

Feedback from our storage team was that Trident should allow specifying a LUN size different from the volume size, e.g. via a percentage setting, but I don't think this is possible at the moment.

For context: I'm an OpenShift admin with some storage knowledge, while our storage team knows NetApp very well but isn't used to working with Trident or OpenShift.
At the moment we're both out of options on how to pinpoint the exact issue and how to fix this behaviour for our customers.

Let me know what kind of debugging info is required. Any help is appreciated.

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: netapp/trident:20.07.0 (CSI)
  • Trident installation flags used: none
  • Kubernetes orchestrator: OKD 4.5.0-0.okd-2020-09-18-202631
  • Kubernetes enabled feature gates: [e.g. CSINodeInfo]
  • NetApp backend types: ontap select

To Reproduce
Create an iSCSI PV and write files to it. At about 50% usage the volume goes read-only. We've seen this behaviour at only one customer and can't pinpoint the actual cause.
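
For completeness, a minimal reproduction sketch (the StorageClass name and size are placeholders, assuming a class backed by the ontap-san driver):

```yaml
# Hypothetical PVC against an iSCSI (ontap-san) StorageClass; names and size are placeholders
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iscsi-repro
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ontap-san
  resources:
    requests:
      storage: 40Gi
```

Mount the claim in a pod and fill it, e.g. with dd; in our case the volume goes read-only around 50% usage (about 20GB of a 40GB volume).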

Expected behavior
Be able to use the full requested persistent volume capacity

@gnarl
Contributor

gnarl commented Mar 24, 2021

Hi @thomasmeeus,

We've talked to ONTAP about this issue previously. There isn't actually a set percentage we could increase the volume by that would guarantee you wouldn't still run into this issue. At one time ONTAP recommended creating a SAN volume 50% larger than the requested size in order to avoid the space allocation issue. That kind of practice isn't acceptable for customers who expect the actual volume size to match the requested volume size in the PVC.

If you are running into this issue at ~50% usage then the most likely problem is that snapshots of that volume are taking up most of the usable space. Every Trident volume has the capability of being expanded by issuing a resize request in Kubernetes. Requested volume sizes should accommodate the amount of data the application needs to write along with an appropriate amount of space for the snapshot policy. You may want to review the snapshot policy being used to make sure that snapshots are being periodically deleted.
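
For illustration, expanding a volume and reviewing the snapshot settings on the backing FlexVol could look roughly like this (names and sizes are placeholders; PVC expansion assumes allowVolumeExpansion: true on the StorageClass):

```sh
# Expand the PVC by patching the requested size (placeholder name and size)
kubectl patch pvc <pvc-name> --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"60Gi"}}}}'

# Review the snapshot policy and snapshot reserve on the backing FlexVol (ONTAP CLI, placeholder names)
volume show -vserver <svm> -volume <trident_volume> -fields snapshot-policy,percent-snapshot-space
```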

Cloud Insights can also be used to monitor volume capacity and volumes that are nearing the volume capacity limit can be expanded.

I hope this helps. Let us know your thoughts.

@thomasmeeus
Author

Hi,

Thanks for the reply,
I forgot to mention that we disabled snapshot policies because we don't use them at the moment, and we were already afraid they were consuming disk space behind our backs.
I double-checked again and no disk space is being consumed by snapshots.
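
For reference, this can be checked on the ONTAP side with something along these lines (names are placeholders):

```sh
# Break down where the space in the backing FlexVol is going (user data, snapshot reserve/spill, metadata, ...)
volume show-space -vserver <svm> -volume <trident_volume>

# List any snapshots that still exist on the volume
snapshot show -vserver <svm> -volume <trident_volume>
```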

We can resize the volume via a resize request, but it seems we're only able to use about 50% of the requested storage.
E.g. a volume was requested in Kubernetes with a size of 40GB; when we write ~20GB of data to that volume it goes offline (because the metadata space is full?).

If I understand the situation correctly, you are saying there is no real fix for this issue then?

@balaramesh
Contributor

balaramesh commented Apr 5, 2021

@thomasmeeus can you share your backend configuration? Have you set the snapshotReserve config option?
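
For reference, in an ontap-san backend file these options live under the defaults section, roughly like this (all values below are placeholders, not a recommended configuration):

```json
{
  "version": 1,
  "storageDriverName": "ontap-san",
  "managementLIF": "10.0.0.1",
  "svm": "svm_iscsi",
  "username": "trident",
  "password": "********",
  "defaults": {
    "spaceAllocation": "true",
    "spaceReserve": "none",
    "snapshotPolicy": "none",
    "snapshotReserve": "0"
  }
}
```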

@benzaidfoued

I'm seeing similar issues on v22.10 running on OCP 4.12.x. @thomasmeeus, do you have any updates on your issue?
