✨ Rosa Config implementation #5499

Open
wants to merge 1 commit into
base: main
from

Conversation

PanSpagetka
Contributor

@PanSpagetka PanSpagetka commented May 21, 2025

Based on proposal #5451
Adds the RosaRoleConfig API and its implementation, which should create the account roles, operator roles, and OIDC config/provider necessary to create a ROSA cluster.

We need to move the RosaMachinePoolAutoScaling definition to controlplane, because otherwise there would be a circular dependency.

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:


@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 21, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from Ankitasw and serngawy May 21, 2025 13:37
@k8s-ci-robot k8s-ci-robot added needs-priority size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 21, 2025
@k8s-ci-robot
Contributor

Hi @PanSpagetka. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@PanSpagetka PanSpagetka marked this pull request as draft May 21, 2025 13:37
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2025
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from 11dac0b to 6056618 Compare May 28, 2025 13:43
@serngawy serngawy mentioned this pull request May 29, 2025
5 tasks
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch 6 times, most recently from 1587db4 to 9121ec2 Compare June 9, 2025 13:40
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch 3 times, most recently from 07b73a3 to 097252f Compare June 16, 2025 11:47
Dockerfile Outdated
@@ -28,12 +28,17 @@ WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
COPY ./rosa /workspace/rosa
Contributor

Why are we adding this? What is the rosa file?

Dockerfile Outdated
# Cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN --mount=type=cache,target=/root/.local/share/golang \
--mount=type=cache,target=/go/pkg/mod \
go mod download

# RUN go mod download
Contributor

no need for this line

PROJECT Outdated
Contributor

Can you just add the RosaRoleConfig item without changing the order of the other items?

Contributor

Why do we need this file?

@@ -38,6 +39,7 @@ patchesStrategicMerge:
- patches/webhook_in_awsmanagedcontrolplanes.yaml
- patches/webhook_in_eksconfigs.yaml
- patches/webhook_in_eksconfigtemplates.yaml
#- patches/webhook_in_rosaroleconfigs.yaml
Contributor

I believe you need to uncomment this line

}
}

if scope.RosaRoleConfig.Status.OIDCID == "" {
Contributor

@serngawy serngawy Jun 17, 2025

I believe you should check and set the RosaRoleConfig condition first, then look up the OIDC config with the OCM client and create it only if it does not exist.
The same applies to the account roles and operator roles.

}
}

err = r.deleteOperatorRoles(ocmClient, awsClient, scope.RosaRoleConfig.Spec.AccountRoleConfig.Prefix)
Contributor

Do we need to delete the operator roles before the OIDC provider?

Contributor Author

Nope, I changed it so that deletion follows the reverse of the creation order.

return ocmClient.DeleteOidcConfig(oidcConfigID)
}

type reporter struct {
Contributor

Please move this to another file, preferably under pkg/.../rosa.

@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from 097252f to 0c9fa93 Compare June 18, 2025 11:40
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 19, 2025
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch 3 times, most recently from b3aded3 to 23fa4cb Compare June 24, 2025 11:29
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from 23fa4cb to 2515be2 Compare July 2, 2025 08:09
@k8s-ci-robot k8s-ci-robot requested a review from serngawy July 15, 2025 10:55
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch 2 times, most recently from 5997fac to 3f9bd3a Compare July 21, 2025 07:46
@@ -179,6 +183,24 @@ func (r *ROSAControlPlane) validateExternalAuthProviders() *field.Error {
return nil
}

func (r *ROSAControlPlane) validateRosaRoleConfig() *field.Error {
Contributor

I'm not sure the logic here is valid: hasDirectRoleFields will be true if even one condition is true and the others are false. For example, r.Spec.OIDCID can be set while all the other fields are empty, and in that case hasDirectRoleFields will still be true.

Contributor Author

This is intended, but we need both the && and the || of all values to get the behavior you described in the comment below.

r.Spec.RolesRef.NetworkARN != "" || r.Spec.RolesRef.KubeCloudControllerARN != "" || r.Spec.RolesRef.NodePoolManagementARN != "" ||
r.Spec.RolesRef.ControlPlaneOperatorARN != "" || r.Spec.RolesRef.KMSProviderARN != ""

if hasRosaRoleConfigRef && hasDirectRoleFields {
Contributor

Let's make the logic as follows:
  • If RosaRoleConfigRef is set, we use the roleConfigRef to get all roles (ignoring all other role fields even if they are set) and just log a warning.
  • If RosaRoleConfigRef is not set and all other role fields are set, we use the role fields.
  • If RosaRoleConfigRef is not set and some role fields are missing, we raise an error.
  • If RosaRoleConfigRef is not set and all role fields are missing, we raise an error.
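The rules above can be sketched roughly as follows. This is only an illustration under assumptions: `roleSpec` is a hypothetical, trimmed-down stand-in for the role-related part of the ROSAControlPlane spec (the `rolesRef` sub-fields are left out), and `validateRoles` and `missingRoleFields` are made-up helper names, not functions from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// roleSpec is a hypothetical stand-in for the role-related fields of the
// ROSAControlPlane spec; it is not the real API type.
type roleSpec struct {
	RosaRoleConfigRef *string // nil means "not set"
	OIDCID            string
	InstallerRoleARN  string
	SupportRoleARN    string
	WorkerRoleARN     string
}

// missingRoleFields returns the names of the empty direct role fields and
// the total number of direct role fields that were checked.
func missingRoleFields(s roleSpec) (missing []string, total int) {
	fields := []struct{ name, value string }{
		{"oidcID", s.OIDCID},
		{"installerRoleARN", s.InstallerRoleARN},
		{"supportRoleARN", s.SupportRoleARN},
		{"workerRoleARN", s.WorkerRoleARN},
	}
	for _, f := range fields {
		if f.value == "" {
			missing = append(missing, f.name)
		}
	}
	return missing, len(fields)
}

// validateRoles applies the four rules: the reference wins (with a warning if
// direct fields are also set); without it, every direct field is required.
func validateRoles(s roleSpec) (warning string, err error) {
	missing, total := missingRoleFields(s)
	if s.RosaRoleConfigRef != nil {
		if len(missing) < total { // at least one direct field is set
			warning = "rosaRoleConfigRef is set; direct role fields are ignored"
		}
		return warning, nil
	}
	if len(missing) > 0 {
		return "", fmt.Errorf("rosaRoleConfigRef is not set and role fields are missing: %s",
			strings.Join(missing, ", "))
	}
	return "", nil
}

func main() {
	ref := "my-role-config"
	warning, err := validateRoles(roleSpec{RosaRoleConfigRef: &ref, OIDCID: "abc"})
	fmt.Println("warning:", warning, "err:", err)

	_, err = validateRoles(roleSpec{OIDCID: "abc"})
	fmt.Println("err:", err)
}
```

Listing the missing field names in the error also covers the "some fields missing" and "all fields missing" cases with a single branch.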


conditions.MarkTrue(rosaScope.ControlPlane, rosacontrolplanev1.ROSARoleConfigReadyCondition)

// Update spec fields from RosaRoleConfig
Contributor

This logic is not correct: if the user has already set those fields, we overwrite their values. I believe we should define an internal roleConfig in the ROSAControlPlane scope and then choose which roles to use based on what is available.

OperatorRoleConfig OperatorRoleConfig `json:"operatorRoleConfig"`
OIDCConfig OIDCConfig `json:"oidcConfig"`
IdentityRef *infrav1.AWSIdentityReference `json:"identityRef,omitempty"`
Region string `json:"region,omitempty"`
Contributor

Missing descriptions and kubebuilder tags.

Contributor

As we discussed, there is no need for region.

Comment on lines 118 to 119
Region string `json:"region,omitempty"`
Prefix string `json:"prefix"`
Contributor

Missing descriptions and kubebuilder tags.

// User-defined prefix for generated AWS operator policies.
// +kubebuilder:validation:MaxLength:=4
// +kubebuilder:validation:Required
Prefix string `json:"prefix"`
Contributor

@serngawy serngawy Jul 21, 2025

Prefix must be immutable, because changing the prefix will create a new role.

// User-defined prefix for all generated AWS resources
// +kubebuilder:validation:MaxLength:=4
// +kubebuilder:validation:Required
Prefix string `json:"prefix"`
Contributor

@serngawy serngawy Jul 21, 2025

Prefix must be immutable, because changing the prefix will create new roles.
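One possible way to enforce this, sketched under assumptions: controller-tools supports a CEL-based `XValidation` marker, and the rule `self == oldSelf` rejects updates that change the field (this needs a cluster version with CRD validation rules enabled). The `AccountRoleConfig` type below is a trimmed, hypothetical version of the type under review, and `validatePrefixUpdate` is a made-up helper showing the equivalent check an update webhook could run.

```go
package main

import (
	"errors"
	"fmt"
)

// AccountRoleConfig is a trimmed, hypothetical version of the reviewed type.
type AccountRoleConfig struct {
	// Prefix is the user-defined prefix for all generated AWS resources.
	// +kubebuilder:validation:MaxLength:=4
	// +kubebuilder:validation:Required
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="prefix is immutable"
	Prefix string `json:"prefix"`
}

// validatePrefixUpdate mirrors the markers above for use in an update
// webhook. len counts bytes, so this assumes an ASCII prefix.
func validatePrefixUpdate(oldCfg, newCfg AccountRoleConfig) error {
	if len(newCfg.Prefix) > 4 {
		return errors.New("prefix must be at most 4 characters")
	}
	if oldCfg.Prefix != newCfg.Prefix {
		return fmt.Errorf("prefix is immutable: cannot change %q to %q",
			oldCfg.Prefix, newCfg.Prefix)
	}
	return nil
}

func main() {
	err := validatePrefixUpdate(AccountRoleConfig{Prefix: "dev"}, AccountRoleConfig{Prefix: "prod"})
	fmt.Println(err)
}
```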

)

// SetupWebhookWithManager will setup the webhooks for the ROSARoleConfig.
func (r *ROSARoleConfig) SetupWebhookWithManager(mgr ctrl.Manager) error {
Contributor

This routine is never called from main.go, which is why I'm getting:

$ kubectl apply -f rosa-roleconfig-01.yaml
Error from server (InternalError): error when creating "rosa-roleconfig-01.yaml": Internal error occurred: failed calling webhook "default.rosaroleconfig.infrastructure.cluster.x-k8s.io": failed to call webhook: the server could not find the requested resource

With this change in place the above error goes away:

$ git diff
diff --git a/main.go b/main.go
index c65ff7356..6348fd0de 100644
--- a/main.go
+++ b/main.go
@@ -285,6 +285,11 @@ func main() {
                        setupLog.Error(err, "unable to create webhook", "webhook", "ROSAMachinePool")
                        os.Exit(1)
                }
+
+               if err := (&expinfrav1.ROSARoleConfig{}).SetupWebhookWithManager(mgr); err != nil {
+                       setupLog.Error(err, "unable to create webhook", "webhook", "ROSARoleConfig")
+                       os.Exit(1)
+               }
        }
 
        if err = (&expcontrollers.ROSARoleConfigReconciler{

Contributor

Although I don't quite understand the point of this webhook, since it's currently not doing anything.

err = r.createOIDCConfig(roleConfig, scope, ocmClient)
if err != nil {
conditions.MarkFalse(scope.RosaRoleConfig, expinfrav1.RosaRoleConfigReadyCondition, expinfrav1.RosaRoleConfigReconciliationFailedReason, clusterv1.ConditionSeverityError, "Failed to create OIDC Config: %v", err)
return ctrl.Result{RequeueAfter: time.Second * 60}, fmt.Errorf("failed to OICD Config: %w", err)
Contributor

The above line (and all the other returns with non empty result and non-nil error) will produce the following log entry:

I0722 11:52:38.495821      15 controller.go:345] "Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes requeuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler" controller="rosaroleconfig" 

Comment on lines 40 to 41
// - ocmToken: eyJhbGciOiJIUzI1NiIsI....
// - ocmApiUrl: Optional, defaults to 'https://api.openshift.com'
Contributor

Please remove those two lines. The secret currently holds other information for authentication.

r.Spec.RolesRef.NetworkARN != "" && r.Spec.RolesRef.KubeCloudControllerARN != "" && r.Spec.RolesRef.NodePoolManagementARN != "" &&
r.Spec.RolesRef.ControlPlaneOperatorARN != "" && r.Spec.RolesRef.KMSProviderARN != ""

if hasRosaRoleConfigRef && hasAnyDirectRoleFields {
Contributor

I'm not sure why you are making this check. If the rosaRoleConfig is defined, we use it and ignore the other role fields, so they should not be checked here.

@@ -179,6 +183,29 @@ func (r *ROSAControlPlane) validateExternalAuthProviders() *field.Error {
return nil
}

func (r *ROSAControlPlane) validateRosaRoleConfig() *field.Error {
hasRosaRoleConfigRef := r.Spec.RosaRoleConfigRef != nil
hasAnyDirectRoleFields := r.Spec.OIDCID != "" || r.Spec.InstallerRoleARN != "" || r.Spec.SupportRoleARN != "" || r.Spec.WorkerRoleARN != "" ||
Contributor

No need to check with ||: if the rosaRoleConfig is not set, a single missing field should raise an error.

@@ -179,6 +183,29 @@ func (r *ROSAControlPlane) validateExternalAuthProviders() *field.Error {
return nil
}

func (r *ROSAControlPlane) validateRosaRoleConfig() *field.Error {
hasRosaRoleConfigRef := r.Spec.RosaRoleConfigRef != nil
Contributor

Suggested change
hasRosaRoleConfigRef := r.Spec.RosaRoleConfigRef != nil
if r.Spec.RosaRoleConfigRef != nil {
return nil
}

return field.Invalid(field.NewPath("spec.rosaRoleConfigRef"), r.Spec.RosaRoleConfigRef, "rosaRoleConfigRef and direct role fields (oidcID, installerRoleARN, supportRoleARN, workerRoleARN, rolesRef) are mutually exclusive")
}

if !hasRosaRoleConfigRef && !hasAllDirectRoleFields {
Contributor

Suggested change
if !hasRosaRoleConfigRef && !hasAllDirectRoleFields {
if !hasAllDirectRoleFields {
// raise error here specifying which fields are missing
// OR do check for every field once it is missing return field.invalid
}

Region string `json:"region,omitempty"`
// Prefix is the prefix for the OIDC config.
// +immutable
Prefix string `json:"prefix"`
Contributor

Let's limit all the prefix fields to a maximum length of 4.

IdentityRef *infrav1.AWSIdentityReference `json:"identityRef,omitempty"`
// Region is the AWS region for the OIDC config.
// +immutable
Region string `json:"region,omitempty"`
Contributor

The region is already defined above for accountRole, operatorRole, and oidcConfig.

}

if scope.RosaRoleConfig.Status.OIDCID == "" {
err = r.createOIDCConfig(roleConfig, scope, ocmClient)
Contributor

The logic needs to change as follows.
First, get the oidcConfig using the OCM client:
1- If the oidcConfig exists, we set the oidcConfig status info and condition.
2- If the oidcConfig does not exist, we create it and then set the oidcConfig status info and condition.
The same applies to accountRoles, operatorRoles, and oidcProvider.
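The get-then-create flow described above might look like the following sketch. Everything here is an assumption for illustration: `oidcClient` is a hypothetical interface (the real OCM client methods and signatures may differ), the lookup `key` stands for whatever identifies the config (for example a prefix), and `fakeClient` is an in-memory test double.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel returned when no config exists.
var errNotFound = errors.New("not found")

// oidcClient is a hypothetical interface; the real OCM client may differ.
type oidcClient interface {
	GetOIDCConfig(key string) (string, error)
	CreateOIDCConfig(key string) (string, error)
}

// ensureOIDCConfig looks the config up first and creates it only when it is
// missing. The caller would record the returned ID in status and set the
// condition in both cases.
func ensureOIDCConfig(c oidcClient, key string) (id string, created bool, err error) {
	id, err = c.GetOIDCConfig(key)
	if err == nil {
		return id, false, nil
	}
	if !errors.Is(err, errNotFound) {
		return "", false, fmt.Errorf("looking up OIDC config: %w", err)
	}
	id, err = c.CreateOIDCConfig(key)
	if err != nil {
		return "", false, fmt.Errorf("creating OIDC config: %w", err)
	}
	return id, true, nil
}

// fakeClient is an in-memory test double for oidcClient.
type fakeClient struct {
	configs map[string]string // key -> ID
}

func (f *fakeClient) GetOIDCConfig(key string) (string, error) {
	id, ok := f.configs[key]
	if !ok {
		return "", errNotFound
	}
	return id, nil
}

func (f *fakeClient) CreateOIDCConfig(key string) (string, error) {
	id := fmt.Sprintf("oidc-%d", len(f.configs)+1)
	f.configs[key] = id
	return id, nil
}

func main() {
	c := &fakeClient{configs: map[string]string{}}
	id, created, err := ensureOIDCConfig(c, "dev")
	fmt.Println(id, created, err)
	id, created, err = ensureOIDCConfig(c, "dev")
	fmt.Println(id, created, err)
}
```

Because the lookup always runs first, repeating the reconcile does not create a second config, which makes the step safe to retry. The same shape would apply to account roles, operator roles, and the OIDC provider.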

rosacontrolplanev1.ROSARoleConfigNotReadyReason,
clusterv1.ConditionSeverityWarning,
"RosaRoleConfig %s/%s is not ready", rosaScope.ControlPlane.Namespace, rosaScope.ControlPlane.Spec.RosaRoleConfigRef.Name)
return ctrl.Result{}, fmt.Errorf("RosaRoleConfig %s/%s is not ready", rosaScope.ControlPlane.Namespace, rosaScope.ControlPlane.Spec.RosaRoleConfigRef.Name)
Contributor

I think it would be more suitable to log an info message here and just requeue without returning the error.

@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from b600808 to fedd967 Compare July 29, 2025 09:58
}

// OperatorRoleConfig defines cluster-specific operator IAM roles based on your cluster configuration.
type OperatorRoleConfig struct {
Contributor

Please follow the proposal's definition of OperatorRoleConfig. OperatorRoleConfig must have the option to assign the OIDC ID, and that option should be mutually exclusive with oidcConfig->createManagedOIDC.

Contributor

@serngawy serngawy left a comment

The OidcConfig is missing from the ROSARoleConfig API.

@@ -907,7 +958,7 @@ func validateControlPlaneSpec(ocmClient rosa.OCMClient, rosaScope *scope.ROSACon
return "", nil
}

func buildOCMClusterSpec(controlPlaneSpec rosacontrolplanev1.RosaControlPlaneSpec, creator *rosaaws.Creator) (ocm.Spec, error) {
func buildOCMClusterSpec(controlPlaneSpec rosacontrolplanev1.RosaControlPlaneSpec, roleConfig *expinfrav1.ROSARoleConfig, creator *rosaaws.Creator) (ocm.Spec, error) {
Contributor

ExternalAuth needs to be defined in the ROSARoleConfig as well and assigned to the cluster spec, similar to here.

Contributor Author

We discussed this in the meeting, and we don't need to include ExternalAuth in RosaRoleConfig.

@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from fedd967 to a428b51 Compare August 5, 2025 11:12
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 5, 2025
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from a428b51 to c2e82ca Compare August 5, 2025 11:31
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 5, 2025
@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch 3 times, most recently from 4761247 to fe144c5 Compare August 5, 2025 12:13
@PanSpagetka
Contributor Author

/test pull-cluster-api-provider-aws-e2e-blocking

// +optional
// +immutable
SharedVPCConfig SharedVPCConfig `json:"sharedVPCConfig,omitempty"`
// OIDCID is the ID of the OIDC config that will be used to create the operator roles.
Contributor

Suggested change
// OIDCID is the ID of the OIDC config that will be used to create the operator roles.
// OIDCID is the ID of the OIDC config that will be used to create the operator roles. A managed OIDC-provider will be created if the OIDCID not specified

OperatorRoleConfig OperatorRoleConfig `json:"operatorRoleConfig"`
OIDCConfig OIDCConfig `json:"oidcConfig"`
IdentityRef *infrav1.AWSIdentityReference `json:"identityRef,omitempty"`
Region string `json:"region,omitempty"`
Contributor

As we discussed, there is no need for region.

}

oidcID := scope.RosaRoleConfig.Status.OIDCID
err = r.deleteOIDCProvider(ocmClient, awsClient, oidcID)
Contributor

We should not delete the OIDC provider if the user set it under spec.operatorRole.OIDCConfig.
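A small sketch of a deletion plan that walks the creation order backwards and skips OIDC resources the user supplied. The creation order listed in `creationOrder` is an assumption for illustration (the PR only says deletion matches the reverse of creation), and the resource names and `deletionOrder` helper are hypothetical.

```go
package main

import "fmt"

// creationOrder is an assumed order in which the controller creates
// resources; the real order in the PR may differ.
var creationOrder = []string{"account-roles", "oidc-config", "oidc-provider", "operator-roles"}

// deletionOrder returns the resources to delete, in reverse creation order.
// OIDC resources are skipped when the user supplied them, because the
// controller did not create them and should not remove them.
func deletionOrder(userSuppliedOIDC bool) []string {
	var order []string
	for i := len(creationOrder) - 1; i >= 0; i-- {
		name := creationOrder[i]
		if userSuppliedOIDC && (name == "oidc-config" || name == "oidc-provider") {
			continue
		}
		order = append(order, name)
	}
	return order
}

func main() {
	fmt.Println(deletionOrder(false))
	fmt.Println(deletionOrder(true))
}
```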

@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from fe144c5 to 60be2ee Compare August 7, 2025 12:02
@PanSpagetka
Contributor Author

/retest-required

@PanSpagetka
Contributor Author

/test pull-cluster-api-provider-aws-test

@PanSpagetka PanSpagetka force-pushed the rosa-roles-implementations branch from 60be2ee to bc8e7af Compare August 8, 2025 06:44
@PanSpagetka
Contributor Author

/test pull-cluster-api-provider-aws-apidiff-main

@k8s-ci-robot
Contributor

@PanSpagetka: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-cluster-api-provider-aws-apidiff-main | bc8e7af | link | false | /test pull-cluster-api-provider-aws-apidiff-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants