release-25.2: roachprod/roachtest: uniform storage capabilities #158809

Merged
golgeek merged 2 commits into cockroachdb:release-25.2 from golgeek:backport25.2-156821
Dec 18, 2025

golgeek commented Dec 4, 2025

Backport 2/2 commits from #156821.

/cc @cockroachdb/release


Until now, each cloud provider implementation had its own capabilities with regard to storage options. GCE was the only first-class citizen, with the most options exposed in roachtest.

This patch attempts to close the feature parity gap between the cloud providers (up to what each provider exposes), bringing support for the following options in roachprod and roachtest:

  • GCE:
    • local SSD
    • network disk size
    • network disk type (pd-standard, pd-ssd)
    • network disk count
    • RAID0 or multiple stores
  • AWS:
    • local SSD
    • network disk size
    • network disk throughput
    • network disk IOPS
    • NEW: network disk type (gp2, gp3, io1, io2, st1, sc1, standard)
    • NEW: network disk count
    • NEW: RAID0 or multiple stores
  • Azure:
    • local SSD
    • network disk size
    • NEW: network disk IOPS (ultra-disk only)
    • NEW: network disk type (standard-ssd, premium-ssd, premium-ssd-v2,
      ultra-disk)
    • NEW: network disk count
    • NEW: RAID0 or multiple stores
  • IBM:
    • network disk size
    • network disk IOPS
    • network disk type (general-purpose, 5iops-tier, 10iops-tier, custom)
    • network disk count
    • RAID0 or multiple stores
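The list above can be read as a per-provider capability matrix. Here is a minimal sketch of such a matrix with a volume type check, assuming a hypothetical `Capabilities` struct and `ValidateVolumeType` helper; neither name comes from roachprod, and only the values are taken from the list.

```go
package main

import (
	"fmt"
	"sort"
)

// Capabilities describes the storage options a provider exposes.
// Hypothetical type for illustration; not the roachprod API.
type Capabilities struct {
	LocalSSD    bool
	Throughput  bool     // network disk throughput is configurable
	IOPS        bool     // network disk IOPS is configurable (ultra-disk only on Azure)
	VolumeTypes []string // accepted network disk types
}

// capabilities mirrors the option list in the PR description.
var capabilities = map[string]Capabilities{
	"gce": {LocalSSD: true, VolumeTypes: []string{"pd-standard", "pd-ssd"}},
	"aws": {LocalSSD: true, Throughput: true, IOPS: true,
		VolumeTypes: []string{"gp2", "gp3", "io1", "io2", "st1", "sc1", "standard"}},
	"azure": {LocalSSD: true, IOPS: true,
		VolumeTypes: []string{"standard-ssd", "premium-ssd", "premium-ssd-v2", "ultra-disk"}},
	"ibm": {IOPS: true,
		VolumeTypes: []string{"general-purpose", "5iops-tier", "10iops-tier", "custom"}},
}

// ValidateVolumeType returns an error if the provider is unknown or does
// not accept the requested network disk type.
func ValidateVolumeType(provider, volumeType string) error {
	caps, ok := capabilities[provider]
	if !ok {
		return fmt.Errorf("unknown provider %q", provider)
	}
	for _, t := range caps.VolumeTypes {
		if t == volumeType {
			return nil
		}
	}
	return fmt.Errorf("provider %q does not support volume type %q", provider, volumeType)
}

func main() {
	providers := make([]string, 0, len(capabilities))
	for p := range capabilities {
		providers = append(providers, p)
	}
	sort.Strings(providers) // map iteration order is random; sort for stable output
	for _, p := range providers {
		fmt.Printf("%-5s accepts gp3: %v\n", p, ValidateVolumeType(p, "gp3") == nil)
	}
}
```

Keeping the matrix as data rather than per-provider branches would let a test reject an unsupported combination before any VM is created.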

This patch also splits the disk setup startup script snippets, with:

  • a provider-specific way of detecting the attached disks
  • a common logic to mount, format and aggregate the disks
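The split described above could be sketched as follows. This is an illustration in Go rather than the actual startup script snippets: the `DiskDetector` interface, the `PlanStores` function, the device paths and the `/mnt/dataN` mount points are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// DiskDetector is the provider-specific half: it reports the device paths
// of the data disks attached to a VM. Hypothetical interface.
type DiskDetector interface {
	DetectDisks() []string
}

// staticDetector stands in for a real provider probe in this sketch.
type staticDetector struct{ devices []string }

func (d staticDetector) DetectDisks() []string { return d.devices }

// Store is one mount point backed by one or more devices.
type Store struct {
	MountPoint string
	Devices    []string
}

// PlanStores is the provider-agnostic half: it either aggregates all
// detected disks into a single RAID0-style store or creates one store
// per disk. Mount point naming is assumed for illustration.
func PlanStores(d DiskDetector, raid0 bool) []Store {
	disks := d.DetectDisks()
	if len(disks) == 0 {
		return nil
	}
	if raid0 {
		return []Store{{MountPoint: "/mnt/data1", Devices: disks}}
	}
	stores := make([]Store, 0, len(disks))
	for i, dev := range disks {
		stores = append(stores, Store{
			MountPoint: fmt.Sprintf("/mnt/data%d", i+1),
			Devices:    []string{dev},
		})
	}
	return stores
}

func main() {
	det := staticDetector{devices: []string{"/dev/nvme1n1", "/dev/nvme2n1"}}
	for _, raid0 := range []bool{true, false} {
		for _, s := range PlanStores(det, raid0) {
			fmt.Printf("raid0=%v %s <- %s\n", raid0, s.MountPoint, strings.Join(s.Devices, ","))
		}
	}
}
```

With this shape, adding a provider only means supplying a new detector; the aggregation logic stays shared.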

This makes it possible to offer the following filesystems across the board in roachprod (and roachtest):

  • Ext4
  • ZFS
  • XFS
  • F2FS (available for AWS, Azure and GCP, pending newer kernel for IBM)
  • Btrfs
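As a rough sketch, the availability described above could be encoded as a lookup with a single exception for F2FS on IBM. The `FilesystemSupported` helper and the lowercase identifiers are hypothetical and not taken from roachprod.

```go
package main

import "fmt"

// Providers and filesystems named in the PR description.
var providers = []string{"aws", "azure", "gce", "ibm"}
var filesystems = []string{"ext4", "zfs", "xfs", "f2fs", "btrfs"}

// pending lists provider/filesystem pairs the description marks as not
// yet available (F2FS on IBM, pending a newer kernel).
var pending = map[string]map[string]bool{
	"ibm": {"f2fs": true},
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// FilesystemSupported reports whether fs can be requested on provider.
// Unknown providers and unknown filesystems are rejected.
func FilesystemSupported(provider, fs string) bool {
	if !contains(providers, provider) || !contains(filesystems, fs) {
		return false
	}
	// Indexing a missing inner map yields false, so no nil check is needed.
	return !pending[provider][fs]
}

func main() {
	for _, p := range providers {
		for _, fs := range filesystems {
			fmt.Printf("%-5s %-5s %v\n", p, fs, FilesystemSupported(p, fs))
		}
	}
}
```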

Epic: none
Closes: #123775
Informs: #146661, #113869
Release note: None

Release justification: Test only change

Prior to this patch, support for GCE machine families was partial.
Only t2a was considered ARM64, and local SSD was only supported for the n1
and n2 families.

This patch brings machine family parsing and capabilities deduction for
all machine types available in GCE as of today.
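A minimal sketch of family parsing and capability lookup follows, assuming GCE machine type names of the form `family-class-size` (for example `n2-standard-8`). The `FamilyInfo` type and both functions are hypothetical, and the table is deliberately limited to the three families named above; the real patch covers all current GCE families.

```go
package main

import (
	"fmt"
	"strings"
)

// FamilyInfo holds capabilities deduced from a machine family.
// Hypothetical type for illustration.
type FamilyInfo struct {
	Arch     string
	LocalSSD bool
}

// familyTable is illustrative and incomplete: it only contains the
// families mentioned in the commit message.
var familyTable = map[string]FamilyInfo{
	"n1":  {Arch: "amd64", LocalSSD: true},
	"n2":  {Arch: "amd64", LocalSSD: true},
	"t2a": {Arch: "arm64"},
}

// ParseFamily extracts the family prefix from a machine type name.
func ParseFamily(machineType string) (string, error) {
	family, _, ok := strings.Cut(machineType, "-")
	if !ok || family == "" {
		return "", fmt.Errorf("malformed machine type %q", machineType)
	}
	return strings.ToLower(family), nil
}

// Lookup returns the capabilities for a machine type, or an error when
// the family is not in the table.
func Lookup(machineType string) (FamilyInfo, error) {
	family, err := ParseFamily(machineType)
	if err != nil {
		return FamilyInfo{}, err
	}
	info, ok := familyTable[family]
	if !ok {
		return FamilyInfo{}, fmt.Errorf("unknown machine family %q", family)
	}
	return info, nil
}

func main() {
	for _, mt := range []string{"n2-standard-8", "t2a-standard-4", "bogus"} {
		info, err := Lookup(mt)
		if err != nil {
			fmt.Printf("%s: %v\n", mt, err)
			continue
		}
		fmt.Printf("%s: arch=%s localSSD=%v\n", mt, info.Arch, info.LocalSSD)
	}
}
```

Returning an error for an unknown family, rather than silently defaulting to amd64, is one way to avoid the partial coverage the commit message describes.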

Epic: none
Release note: None

blathers-crl bot commented Dec 4, 2025

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes. Includes test-only changes, build system changes, etc.
  • Fixes for serious issues. Defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to.

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

blathers-crl bot added the backport and T-testeng labels Dec 4, 2025

blathers-crl bot commented Dec 4, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.


blathers-crl bot commented Dec 4, 2025

✅ PR #158809 is compliant with backport policy

Confidence: high
Backward compatible: true
Explanation: This pull request is compliant with the CockroachDB backport policy as it introduces changes exclusively in the non-production directory patterns specified for development tools and testing infrastructure. This exemption is identified based on the modifications to files within the pkg/cmd/roachprod/ and pkg/cmd/roachtest/ directories. These directories are explicitly listed under the Non-Production File Patterns in the backport policy as they pertain to development tools and testing environments specifically designed for the enhancement of testing capabilities and infrastructure without affecting the production codebase. No critical bug assessment or feature flag assessment is necessary as the changes do not affect the production environment and there is a release justification provided stating it is a 'Test only change' which aligns with the allowed exemptions for test-only modifications.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

This change is Reviewable

blathers-crl bot added the backport-failed and backport-test-only labels and removed the backport-failed label Dec 4, 2025

golgeek commented Dec 8, 2025

Uniform storage capabilities were requested by the storage team for 25.2.
This required a somewhat involved manual backport because many other changes had not been backported to this branch.

I triggered the following TC runs:

Confidence level seems high.

@golgeek golgeek marked this pull request as ready for review December 9, 2025 21:36
@golgeek golgeek requested a review from a team as a code owner December 9, 2025 21:36
@golgeek golgeek requested review from DarrylWong and srosenberg and removed request for a team December 9, 2025 21:36
nameisbhaskar left a comment

LGTM!

golgeek commented Dec 18, 2025

TFTRs!

@golgeek golgeek merged commit b1cb3b7 into cockroachdb:release-25.2 Dec 18, 2025
26 of 27 checks passed

Labels

backport, backport-test-only, T-testeng, v25.2.11
