IBM Cloud image variant should be 120GB #931

Closed
rvanderp3 opened this issue Aug 23, 2021 · 18 comments

@rvanderp3

Describe the bug
Requesting the IBM Cloud image variant virtual size be 120GB to match the minimum requirement mentioned in the OpenShift and OKD documentation.

Reproduction steps
Steps to reproduce the behavior:
1.
2.
3.

Expected behavior
The image should be 120GB to match the OpenShift/OKD minimum requirements.

Actual behavior
The image size is 100GB per coreos/coreos-assembler#2041
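
One way to check an image's virtual size against a target minimum is to parse the JSON that `qemu-img info --output=json` prints. The sketch below is illustrative only: the helper name `meets_minimum` and the sample JSON are hypothetical, and it assumes the sizes discussed here are counted in GiB (1024^3 bytes), which is how `qemu-img` interprets a `G` suffix.

```python
import json

GIB = 1024 ** 3  # assumption: "GB" in this thread is treated as GiB


def meets_minimum(qemu_info_json: str, minimum_gib: int) -> bool:
    """Return True if the image's virtual size is at least minimum_gib GiB.

    qemu_info_json is the text printed by `qemu-img info --output=json IMAGE`,
    which reports the virtual size in bytes under the "virtual-size" key.
    """
    info = json.loads(qemu_info_json)
    return info["virtual-size"] >= minimum_gib * GIB


# Hypothetical sample for a 100 GiB image (not real output from any build).
sample = json.dumps({"virtual-size": 100 * GIB, "format": "qcow2"})
print(meets_minimum(sample, 100))  # True
print(meets_minimum(sample, 120))  # False
```

With the 100 GiB sample, the check passes against the 100 GiB platform minimum and fails against the 120 GiB figure requested in this issue.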

System details

  • Fedora CoreOS
  • RH CoreOS

Ignition config

Additional information

@dustymabe added the "meeting" label (topics for meetings) on Aug 23, 2021
@cgwalters
Member

To clarify, is this somehow specific to OpenShift, or is it really "the default IBM Cloud guidance is bumped to 120G" (e.g. traditional Fedora Cloud, Ubuntu etc. should also bump to 120G)?

@rvanderp3
Author

> To clarify, is this somehow specific to OpenShift, or is it really "the default IBM Cloud guidance is bumped to 120G" (e.g. traditional Fedora Cloud, Ubuntu etc. should also bump to 120G)?

This is scoped to OpenShift nodes provisioned in IBM Cloud that must adhere to the OpenShift minimum requirements.

@cgwalters
Member

Hmm. I guess I'd repeat my argument from here then: coreos/coreos-assembler#2041 (comment)

> But I think in the future if someone comes along and says "It really should be 150GB" or whatever we should instead tell them "configure it at the infrastructure side" - the cloud should have that knob and their tooling they use to provision the cloud (Terraform/ansible/openshift-machine-api-operator/etc.) should support it.

(The future is now)

So, I think the better approach is to change openshift-install to configure this.

Ultimately IMO, there is no single "right" disk size, in the same way there's not a single RAM size or number of vCPUs. People will want to run smaller clusters, and tune things down. Or, for larger clusters they want things bigger.

And the right place to configure these sizes is in something like openshift-install's install-config.yaml, or equivalent for UPI. And for non-OpenShift FCOS use cases, whatever they use to provision nodes, e.g. scripting a CLI like aws or Terraform/Ansible/whatever.

@jeffnowicki

jeffnowicki commented Aug 24, 2021

I would assert that the value should at least meet RH OCP minimum storage requirements. Not suggesting that we keep 'bumping' it up. Rather, the value should at least align with RH requirements.

https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/installing-bare-metal.html#minimum-resource-requirements_installing-bare-metal

100% agree that if the user wants a larger boot volume size, that could perhaps be an installer capability. That being said, the 'default' should at least meet RH OCP requirements. IBM Cloud VPC has recently added support to parse that value and provision the boot volume size accordingly.

@bgilbert
Contributor

Fedora CoreOS has use cases beyond OCP/OKD, and those may need far less space than an OpenShift cluster. In general, we ship minimum-size images and encourage users to size their boot disk to meet their needs. IBM Cloud documents a minimum size of 100 GB, so that's what we ship on that platform.

@jeffnowicki

jeffnowicki commented Aug 24, 2021

The motivation behind this is supporting OCP IPI on IBM Cloud. In general, the minimum storage requirement (from RH) is stated to be 120 GB: https://docs.openshift.com/container-platform/4.8/installing/installing_platform_agnostic/installing-platform-agnostic.html#minimum-resource-requirements_installing-platform-agnostic

@bgilbert thanks for the IBM Cloud reference. As you stated, the IBM Cloud reference indicates a different minimum for a generic custom Linux image (which may not be appropriate for RHCOS).

In the end, I'm looking for a 'reconciliation' such that whatever the value is, both RH and IBM 'officially' support it.

I don't want to see an issue arise where RH engineers tell a client that their deployment is not supported (by RH) because the boot volume size does not meet the RH minimum requirement.

@Prashanth684

Honestly, that 120GB number was used way back when we did UPI installations on bare-metal systems. I am not sure where that number came from, but in all our testing I have never seen even half of it being used. I guess it would be justified in cases where logs fill up the disk, etc., but normally I haven't seen a need for 120G.

Also, in the case of bare-metal deploys, the partition is grown dynamically during the install, and the metal image itself is not sized for 120G.

@bgilbert
Contributor

@jeffnowicki It is possible that RHEL CoreOS images should have a different default, since those images are exclusively intended to support OCP. (Though, as @cgwalters said in #931 (comment), it would be better for OCP to size the disk appropriately at provisioning time.) However, IMO that's a separate discussion from what Fedora CoreOS should ship.

@relyt0925

So I think the only confusion here is that we need a statement of support from OpenShift that, at a minimum, 100GB boot disks are supported on IBM Cloud. We already use this today across our OpenShift offerings, so that shouldn't be a problem. To be specific, these doc references need to change:
https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/installing-bare-metal.html#minimum-resource-requirements_installing-bare-metal

If they cannot be changed, it makes sense to me for this image to be baked by default to the size OCP supports. But I think getting the doc updated with a caveat for IBM, at a minimum, will do the trick.

@relyt0925

Note that we already do this across all IBM Cloud OpenShift offerings today, in case there are concerns about potential impacts.

@cgwalters
Member

Hmm. In the IBM Cloud VPC console, it isn't letting me change the 100G size for the default CentOS image. Is that intentional?

It looks to me like the API supports it in https://cloud.ibm.com/apidocs/vpc#create-instance with volume_attachments? Or is that only for secondary volumes?

IOW if automation (e.g. custom Terraform/Ansible or openshift-installer IPI) wanted to change the default image size from 100G, it'd need to create a copy of that disk at the desired size, and then use it for instances?

@relyt0925

relyt0925 commented Aug 25, 2021

It's detected from the custom image that gets imported, @cgwalters. You just need to build a custom image with a 120 GB boot disk
(qemu-img resize PATH 120G)
Then push it to COS
Then create an image pointing to COS
Then boot a machine with that image
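
The resize step above can be sketched as a small Python helper that builds the `qemu-img resize` argument list. This is only an illustration: `resize_command`, `resize_image`, and the file name are hypothetical, and the upload and image-import steps are left out because their exact tooling is not given in this thread.

```python
import subprocess
from typing import List


def resize_command(image_path: str, size_gib: int) -> List[str]:
    """Build the argv for setting an image's virtual size to size_gib GiB."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    # qemu-img takes the size as a single token; the G suffix means GiB.
    return ["qemu-img", "resize", image_path, f"{size_gib}G"]


def resize_image(image_path: str, size_gib: int) -> None:
    """Run the resize; requires qemu-img to be installed and on PATH."""
    subprocess.run(resize_command(image_path, size_gib), check=True)


# Hypothetical file name, for illustration only; nothing is executed here.
print(resize_command("fcos-ibmcloud.qcow2", 120))
```

Keeping the command construction separate from the `subprocess` call makes it possible to inspect the exact arguments before touching an image.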

@travier
Member

travier commented Aug 25, 2021

Fedora CoreOS has use cases beyond OpenShift/OKD, so the default image size is not controlled by the OCP/OKD requirements. I would argue that we should not bump the default image size beyond the minimum supported by the platform, or we would create unnecessary costs for non-OCP/OKD users.

I find it really strange that there would be no feature in the IBM Cloud to specify a different size for the boot disk when importing an image. If this is really the case then this should really be taken to them to discuss as I don't think we will be the only ones impacted by this issue: anyone else using images from other distributions would be impacted too.

@miabbott
Member

The conversation seems to be pointing to an update to the OCP docs that either reduces the documented minimum to 100GB for all platforms or specifically notes that the 100GB minimum applies to IBM Cloud.

I've created an issue in the OCP docs repo (openshift/openshift-docs#35793) for further discussion.

If we don't believe any changes will be made to the disk image for FCOS/RHCOS, I believe we can close this issue.

@cgwalters
Member

> It is possible that RHEL CoreOS images should have a different default, since those images are exclusively intended to support OCP.

I'd personally like to keep FCOS and RHCOS aligned though, since small deltas like this add up to maintenance pain.

@cgwalters
Member

I have a related question, and this is probably a good place to ask.

Basically, would it be fair to say that in IBM Cloud VPC, it's expected that most systems use the boot disk just for "operating system stuff (binaries, journal, etc.)", and e.g. if you want to add something like database storage, then it's effectively required to create a separate block device and set it up as a separate filesystem mount in the OS?

Obviously in other IaaS clouds (GCP/AWS/etc.) this can also be a good pattern, but it's not really required because they make it easy to have a nearly arbitrary size for the root volume. And in some cases, clearly "lifecycle binding" this data and the OS is more ergonomic (e.g. you don't want the data to outlive the instance).

However, there are advantages to such a split because e.g. one can allocate block devices with different levels of performance for such data. In OpenShift unfortunately our support for splitting "OS stuff" into separate block devices is mediocre. It's absolutely supported to do so for e.g. /var/lib/containers, but e.g. openshift/machine-config-operator#1720 makes it more painful than it has to be.

@dustymabe
Member

> The conversation seems to be pointing to an update to the OCP docs that reduces the documented minimums to 100GB for all platforms or perhaps specially note that the 100GB minimum applies to IBM Cloud.
>
> I've created an issue in the OCP docs repo (openshift/openshift-docs#35793) for further discussion.

Fixed in openshift/openshift-docs#36226

> If we don't believe any changes will be made to the disk image for FCOS/RHCOS, I believe we can close this issue.

👍


10 participants