[Bug]: NodePool's queuedProvisioning field doesn't take effect #570

Closed
1 task done
vfarcic opened this issue Jul 14, 2024 · 13 comments · Fixed by #588
Labels
bug Something isn't working needs:triage

Comments

@vfarcic

vfarcic commented Jul 14, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

container.gcp.upbound.io/v1beta2 - NodePool

Resource MRs required to reproduce the bug

No response

Steps to Reproduce

Create a nodepool attached to a GKE cluster.
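For concreteness, a minimal manifest along these lines should exhibit the behavior (a hedged sketch with placeholder names and locations, not taken from the original report):

```yaml
# Illustrative only: names, location, and cluster reference are placeholders.
apiVersion: container.gcp.upbound.io/v1beta2
kind: NodePool
metadata:
  name: example-nodepool
spec:
  providerConfigRef:
    name: default
  forProvider:
    clusterRef:
      name: example-gke-cluster  # an existing GKE cluster managed by the provider
    nodeLocations:
      - us-central1-a
    autoscaling:
      minNodeCount: 1
      maxNodeCount: 3
```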

What happened?

Since one of the recent releases of the provider, after a nodepool is created, it continuously tries to use queued_provisioning. The same setup worked in older versions of the provider. According to the docs, there is no parameter that can be used to configure queued provisioning.

Relevant Error Output Snippet

From events:


  Warning  CannotUpdateExternalResource     2s (x4 over 12s)     managed/container.gcp.upbound.io/v1beta1, kind=nodepool  async update failed: refuse to update the external resource because the following update requires replacing it: cannot change the value of the argument "queued_provisioning.#" from "" to ""


Crossplane Version

v1.16.0

Provider Version

v1.5.0

Kubernetes Version

No response

Kubernetes Distribution

No response

Additional Info

No response
@vfarcic vfarcic added bug Something isn't working needs:triage labels Jul 14, 2024
@miraai

miraai commented Jul 15, 2024

Hello! I am facing the same issue, using the same versions. I use the v1beta1 version of the resource. In v1beta2 the field actually exists, see here.

But I would like to know how to resolve this in v1beta1. If it's a change related to v1beta2, why would we see this in v1beta1?

My error output for reference:

Warning  CannotUpdateExternalResource  1s (x2 over 1s)  managed/container.gcp.upbound.io/v1beta1, kind=nodepool  async update failed: refuse to update the external resource because the following update requires replacing it: cannot change the value of the argument "queued_provisioning.#" from "" to ""

Update: same thing happening on v1beta2 :/

@vfarcic
Author

vfarcic commented Jul 15, 2024

My bad... I thought that the marketplace docs always show the latest version, which is true for the provider version but false for the API version. The default is v1beta1 even though the latest is v1beta2.

@miraai

miraai commented Jul 16, 2024

Did you manage to make it work? I keep getting the same error even on v1beta2, and I did set queuedProvisioning.enabled: false.
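For reference, the field was set like this (the path is assumed from the generated v1beta2 API):

```yaml
# Sketch: setting queuedProvisioning on the v1beta2 NodePool spec.
spec:
  forProvider:
    queuedProvisioning:
      enabled: false
```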

@mergenci
Collaborator

@vfarcic I believe that the marketplace lists the storage version on the provider page. All versions of a resource can be found under the version selection box on the resource page. It would be clearer, in my humble opinion, to list both the storage version and the latest version on the provider page.

@miraai I haven't tested, but I don't expect queuedProvisioning to work unless you're using the default value. Here's why: queuedProvisioning was a beta field in the Terraform provider v5.19.0, and beta fields in the Terraform provider are excluded from our APIs. We upgraded to Terraform provider v5.28.0, in which queuedProvisioning is no longer a beta field. You can see that the generated API began to include queuedProvisioning at that time. Our storage version is still v1beta1, though. Therefore, unless we implement a custom conversion webhook, the field will be lost when the API server converts the resource from v1beta2 to v1beta1 for storage.
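The versioning situation described above can be sketched with an abridged, illustrative excerpt of the CRD's versions stanza (values assumed for illustration, not copied from the actual CRD):

```yaml
# Illustrative CRD excerpt: v1beta1 is the storage version, so a field that
# exists only in v1beta2 is dropped when the API server converts for storage
# unless a conversion webhook preserves it.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: nodepools.container.gcp.upbound.io
spec:
  versions:
    - name: v1beta1
      served: true
      storage: true   # objects are persisted in this version
    - name: v1beta2
      served: true
      storage: false  # queuedProvisioning exists only here
```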

I don't know how we are supposed to handle cases like this. I'll investigate deeper.

@miraai

miraai commented Jul 16, 2024

@mergenci totally understand, but no matter what I do, whether it's v1beta1 or v1beta2, setting queuedProvisioning or omitting it, I still get the error and my node pool status is always in ReconcileError. :/

Cluster and node pool work properly, everything is connected as it should. Do you have any advice how I can deal with this in the meantime? Using default node pool does not bring this error up, but I need autoscaling abilities for some clusters.

Thank you!

@mergenci
Collaborator

@miraai I see 🤔 At the moment, I don't have anything in mind that might help. Can you post example manifests here, so that it's easier for us to reproduce your issue?

@miraai

miraai commented Jul 16, 2024

Sure thing!

These are the resources, just using this composition for testing:

resources:
  - name: cluster
    base:
      apiVersion: container.gcp.upbound.io/v1beta2
      kind: Cluster
      spec:
        providerConfigRef:
          name: gcp-provider-config
        writeConnectionSecretToRef:
          namespace: crossplane
          name: gkecluster
        forProvider:
          location: us-central1
          initialNodeCount: 1
          removeDefaultNodePool: true
          deletionProtection: false
          enableIntranodeVisibility: true
          network: idp-network
          management:
            autoRepair: true
            autoUpgrade: true
    patches:
    - fromFieldPath: spec.name
      toFieldPath: metadata.name
    - fromFieldPath: spec.name
      toFieldPath: spec.writeConnectionSecretToRef.name
      transforms:
      - type: string
        string:
          fmt: '%s-secrets'
    - fromFieldPath: spec.claimRef.namespace
      toFieldPath: spec.writeConnectionSecretToRef.namespace
    - fromFieldPath: spec.parameters.version
      toFieldPath: spec.forProvider.minMasterVersion
    - type: ToCompositeFieldPath
      fromFieldPath: metadata.name
      toFieldPath: status.clusterName
    - type: ToCompositeFieldPath
      fromFieldPath: status.conditions[0].reason
      toFieldPath: status.controlPlaneStatus
    connectionDetails:
    - fromConnectionSecretKey: kubeconfig
  - name: nodepool
    base:
      apiVersion: container.gcp.upbound.io/v1beta2
      kind: NodePool
      spec:
        providerConfigRef:
          name: gcp-provider-config
        forProvider:
          nodeLocations:
            - us-central1-a
            - us-central1-b
            - us-central1-c
          clusterSelector:
            matchControllerRef: true
          nodeConfig:
            oauthScopes:
              - https://www.googleapis.com/auth/cloud-platform
          autoscaling:
            maxNodeCount: 3
            minNodeCount: 1
          management:
            autoRepair: true
            autoUpgrade: true
    patches:
    - fromFieldPath: spec.name
      toFieldPath: metadata.name
    - fromFieldPath: spec.claimRef.namespace
      toFieldPath: spec.writeConnectionSecretToRef.namespace
    - fromFieldPath: spec.parameters.version
      toFieldPath: spec.forProvider.version
    - fromFieldPath: spec.parameters.minNodeCount
      toFieldPath: spec.forProvider.initialNodeCount
    - fromFieldPath: spec.parameters.minNodeCount
      toFieldPath: spec.forProvider.autoscaling.minNodeCount
    - fromFieldPath: spec.parameters.nodeSize
      toFieldPath: spec.forProvider.nodeConfig.machineType
      transforms:
        - type: map
          map:
            small: n2-standard-4
            medium: n2-standard-8
            large: n2-standard-16
    - type: ToCompositeFieldPath
      fromFieldPath: status.conditions[0].reason
      toFieldPath: status.nodePoolStatus

Composite resource if it matters:

kind: XCompositeCluster
metadata:
  name: np-cluster
spec:
  name: np-cluster
  compositionSelector:
    matchLabels:
      provider: google
  parameters:
    version: 1.29.4-gke.1043004
    nodeSize: small
    minNodeCount: 1

We are still in the early stages of adopting Crossplane for GCP, but it would be great if I did not have to worry about this going forward. It could of course be that I am missing something small that's causing this :D

@mergenci
Collaborator

Regarding my above comment that

beta fields in the Terraform provider are excluded from our APIs

@ulucinar rightfully asked how beta fields are represented in the Terraform schema. It really seems like there's no such thing as a beta field in the schema. The PR that introduced the queuedProvisioning field has no implementation! It certainly wasn't in the resource's schema at the time it was introduced. There might be another mechanism to handle such fields, even though that sounds rather contrived. queuedProvisioning was implemented in a later PR.

@mergenci
Collaborator

@miraai, this issue is similar to #561. We will address it as I explained in my comment there. We might need a special case for this resource, because we have to handle the beta queuedProvisioning field for v1beta1. I'm not sure how we can do so, but I'm rather confident that v1beta2 will work as intended after the fix.
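As a possible interim mitigation (a suggestion not confirmed in this thread), Crossplane's pause annotation can stop reconciliation of the affected managed resource, which halts the failing update attempts without deleting the external node pool:

```yaml
# Assumes Crossplane v1.11 or later; reconciliation of this managed resource
# is paused until the annotation is removed.
metadata:
  annotations:
    crossplane.io/paused: "true"
```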

@mergenci mergenci changed the title [Bug]: [Bug]: NodePool's queuedProvisioning field doesn't take effect Jul 18, 2024
@withnale

Do we know when there is likely to be a fix for this? Upgrading to the latest provider breaks all of our existing nodepools.

@mergenci
Collaborator

@withnale, I'm not able to share a timeline. Conversion webhook issues, like this issue, are our top priority at the moment.

@miraai

miraai commented Jul 22, 2024

@mergenci Thank you for looking into this! I will stick with the default node pool for the PoC phase we are in, so hopefully this gets resolved by the time I get out of PoC. Will we get notified here when it is done, or should I follow the other issue as well?

@jeanduplessis
Collaborator

@miraai you can keep an eye on the releases section for this provider or the #announcements channel in the Crossplane Slack workspace to get notified when the fix is released.
