all controllers - races with vpcRefs, subnetRefs and other references. #1898

Closed
gecube opened this issue Sep 14, 2023 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. service/ec2 Indicates issues or PRs that are related to ec2-controller. service/eks Indicates issues or PRs that are related to eks-controller.

Comments

gecube commented Sep 14, 2023

Good day!

I am experimenting with the FluxCD GitOps approach and the EKS controller.

I tried to create the cluster with the following manifest:

apiVersion: eks.services.k8s.aws/v1alpha1
kind: Cluster
metadata:
  name: production
  namespace: infra-production
spec:
  name: production
  roleARN: arn:aws:iam::966321756598:role/eks-production-role
  logging:
    clusterLogging:
      - enabled: true
        types:
          - api
          - audit
          - authenticator
          - controllerManager
          - scheduler
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: false
    subnetRefs:
      - from:
          name: production-private-eu-west-2a
      - from:
          name: production-private-eu-west-2b
  tags:
    Name: production

The cluster was created successfully, but the object is now in the following state:

apiVersion: eks.services.k8s.aws/v1alpha1
kind: Cluster
metadata:
  resourceVersion: '73958425'
  name: production
  uid: 8290c91e-ae11-4ebb-b068-fef1d21f85a8
  creationTimestamp: '2023-07-25T06:08:08Z'
  generation: 3
  managedFields:
    - apiVersion: eks.services.k8s.aws/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            'f:kustomize.toolkit.fluxcd.io/name': {}
            'f:kustomize.toolkit.fluxcd.io/namespace': {}
        'f:spec':
          'f:logging':
            'f:clusterLogging': {}
          'f:name': {}
          'f:resourcesVPCConfig':
            'f:endpointPrivateAccess': {}
            'f:endpointPublicAccess': {}
            'f:subnetRefs': {}
          'f:roleARN': {}
          'f:tags':
            'f:Name': {}
      manager: kustomize-controller
      operation: Apply
      time: '2023-07-25T06:13:09Z'
    - apiVersion: eks.services.k8s.aws/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:finalizers':
            .: {}
            'v:"finalizers.eks.services.k8s.aws/Cluster"': {}
        'f:spec':
          'f:kubernetesNetworkConfig':
            .: {}
            'f:ipFamily': {}
            'f:serviceIPv4CIDR': {}
          'f:resourcesVPCConfig':
            'f:subnetIDs': {}
          'f:version': {}
      manager: controller
      operation: Update
      time: '2023-07-25T06:08:09Z'
    - apiVersion: eks.services.k8s.aws/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          .: {}
          'f:ackResourceMetadata':
            .: {}
            'f:arn': {}
            'f:ownerAccountID': {}
            'f:region': {}
          'f:certificateAuthority': {}
          'f:conditions': {}
          'f:createdAt': {}
          'f:status': {}
      manager: controller
      operation: Update
      subresource: status
      time: '2023-09-14T08:38:40Z'
  namespace: infra-production
  finalizers:
    - finalizers.eks.services.k8s.aws/Cluster
  labels:
    kustomize.toolkit.fluxcd.io/name: infra-management
    kustomize.toolkit.fluxcd.io/namespace: flux-system
spec:
  kubernetesNetworkConfig:
    ipFamily: ipv4
    serviceIPv4CIDR: 172.20.0.0/16
  logging:
    clusterLogging:
      - enabled: true
        types:
          - api
          - audit
          - authenticator
          - controllerManager
          - scheduler
  name: production
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: false
    subnetIDs:
      - subnet-0f06902b47c880118
      - subnet-0c72af713be937dcc
    subnetRefs:
      - from:
          name: production-private-eu-west-2a
      - from:
          name: production-private-eu-west-2b
  roleARN: 'arn:aws:iam::966321756598:role/eks-production-role'
  tags:
    Name: production
  version: '1.27'
status:
  ackResourceMetadata:
    arn: 'arn:aws:eks:eu-west-2:966321756598:cluster/production'
    ownerAccountID: '966321756598'
    region: eu-west-2
  certificateAuthority: {}
  conditions:
    - lastTransitionTime: '2023-09-14T08:38:40Z'
      message: Reference resolution failed
      reason: >-
        both resource reference wrapper and ID cannot be used together:
        ResourcesVPCConfig.SubnetIDs,ResourcesVPCConfig.SubnetRefs
      status: Unknown
      type: ACK.ReferencesResolved
  createdAt: '2023-07-25T06:08:09Z'
  status: CREATING
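
The managedFields section of the object above already shows two writers on the same part of the spec: kustomize-controller owns `f:subnetRefs` through an Apply operation, while the manager named `controller` owns `f:subnetIDs` through an Update. As a rough illustration, the Python sketch below walks a trimmed copy of those entries and reports who owns a given field. The helper name `field_owners` and the trimmed `MANAGED_FIELDS` literal are made up for this example and are not part of any ACK or Flux API.

```python
def field_owners(managed_fields, path):
    """Return (manager, operation) pairs whose fieldsV1 tree contains `path`.

    `path` is a sequence of plain field names, for example
    ("spec", "resourcesVPCConfig", "subnetRefs").
    """
    owners = []
    for entry in managed_fields:
        node = entry.get("fieldsV1", {})
        for name in path:
            # fieldsV1 prefixes every field name with "f:".
            node = node.get("f:" + name) if isinstance(node, dict) else None
            if node is None:
                break
        else:
            owners.append((entry["manager"], entry["operation"]))
    return owners


# Trimmed copy of two managedFields entries from the object above.
MANAGED_FIELDS = [
    {
        "manager": "kustomize-controller",
        "operation": "Apply",
        "fieldsV1": {"f:spec": {"f:resourcesVPCConfig": {"f:subnetRefs": {}}}},
    },
    {
        "manager": "controller",
        "operation": "Update",
        "fieldsV1": {"f:spec": {"f:resourcesVPCConfig": {"f:subnetIDs": {}}}},
    },
]

VPC = ("spec", "resourcesVPCConfig")
print(field_owners(MANAGED_FIELDS, VPC + ("subnetRefs",)))
print(field_owners(MANAGED_FIELDS, VPC + ("subnetIDs",)))
```

Running the same check against the full object (for example after loading it with a YAML parser) would show whether both fields are still being written by different managers.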

My guess is that the EKS controller (or the EC2 controller; it does not matter at this point) takes the

    subnetRefs:
      - from:
          name: production-private-eu-west-2a
      - from:
          name: production-private-eu-west-2b

block from the YAML, removes it, and adds this one:

    subnetIDs:
      - subnet-0f06902b47c880118
      - subnet-0c72af713be937dcc

after resolving each subnet name to its ID.

But on the next reconciliation cycle, FluxCD sees that the fields were removed and adds them back.

So we end up with the following status condition:

    - lastTransitionTime: '2023-09-14T08:38:40Z'
      message: Reference resolution failed
      reason: >-
        both resource reference wrapper and ID cannot be used together:
        ResourcesVPCConfig.SubnetIDs,ResourcesVPCConfig.SubnetRefs
      status: Unknown
      type: ACK.ReferencesResolved
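
The sequence described above can be replayed with plain dictionaries. This is only a sketch of the suspected race, not the controllers' real code: the function names are hypothetical, the "apply" step is a simplified stand-in for what FluxCD does, and the pairing of subnet names to the IDs from the issue is assumed for illustration.

```python
import copy

# Assumed name-to-ID pairing; the issue does not say which name maps to which ID.
SUBNET_IDS = {
    "production-private-eu-west-2a": "subnet-0f06902b47c880118",
    "production-private-eu-west-2b": "subnet-0c72af713be937dcc",
}


def controller_resolve_in_place(vpc_config):
    """Suspected behavior: replace subnetRefs with the resolved subnetIDs."""
    refs = vpc_config.pop("subnetRefs")
    vpc_config["subnetIDs"] = [SUBNET_IDS[r["from"]["name"]] for r in refs]


def gitops_apply(live, desired):
    """Simplified GitOps apply: re-assert fields from Git, keep everything else."""
    live.update(copy.deepcopy(desired))


def validate(vpc_config):
    """Mimic the check behind the 'Reference resolution failed' condition."""
    if vpc_config.get("subnetIDs") and vpc_config.get("subnetRefs"):
        return "both resource reference wrapper and ID cannot be used together"
    return None


desired = {"subnetRefs": [{"from": {"name": n}} for n in SUBNET_IDS]}
live = copy.deepcopy(desired)

controller_resolve_in_place(live)  # step 1: refs removed, IDs written
gitops_apply(live, desired)        # step 2: refs restored from Git
print(sorted(live))                # ['subnetIDs', 'subnetRefs']
print(validate(live))
```

After the second step both fields are present at once, which is exactly the combination the condition message rejects.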

It looks like no controller from the ACK library should change the source spec of an object; instead, it should record all such changes in the status field, or maintain some internal state in the controller. I have no idea which. Interestingly, the same issue is not observed with VPC, Subnet, and other objects, and I don't know why.
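
One way to read this suggestion, as a minimal sketch: resolve the references on a deep copy that is used only for the AWS API call, and never write the resolved IDs back into the stored spec. The function name `resolve_for_api_call`, the `lookup` table, and the subnet IDs below are all made up for illustration; this is not how the ACK runtime is actually structured.

```python
import copy


def resolve_for_api_call(spec, lookup):
    """Return a resolved copy of `spec`; the original object is left untouched."""
    resolved = copy.deepcopy(spec)
    vpc = resolved["resourcesVPCConfig"]
    refs = vpc.pop("subnetRefs", None)
    if refs:
        vpc["subnetIDs"] = [lookup[r["from"]["name"]] for r in refs]
    return resolved


spec = {
    "resourcesVPCConfig": {
        "subnetRefs": [{"from": {"name": "sub1"}}, {"from": {"name": "sub2"}}],
    }
}
lookup = {"sub1": "subnet-aaa", "sub2": "subnet-bbb"}  # made-up IDs

resolved = resolve_for_api_call(spec, lookup)
print(resolved["resourcesVPCConfig"])  # only subnetIDs, used for the API call
print(spec["resourcesVPCConfig"])      # still only subnetRefs, as in Git
```

Because the stored spec keeps only `subnetRefs`, a GitOps tool re-applying the manifest has nothing to restore and no conflicting `subnetIDs` field appears.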

I also once observed the following YAML:
[Screenshot 2023-08-16 at 8 07 37]

I spent a long time wondering how I could have ended up with it. It may also be linked to IDs being pasted in incorrectly in place of the references.

@a-hilaly a-hilaly added kind/bug Categorizes issue or PR as related to a bug. service/ec2 Indicates issues or PRs that are related to ec2-controller. service/eks Indicates issues or PRs that are related to eks-controller. labels Sep 14, 2023
Collaborator

ack-bot commented Mar 12, 2024

Issues go stale after 180d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 60d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale

@ack-prow ack-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 12, 2024
Author

gecube commented Mar 13, 2024

/remove-lifecycle stale

Member

@gecube we have a PR attempting to fix this issue: aws-controllers-k8s/eks-controller#128

ack-prow bot pushed a commit to aws-controllers-k8s/eks-controller that referenced this issue Sep 19, 2024
Issue [#1898](aws-controllers-k8s/community#1898)

Description of changes:
As in the ec2-controller, the ref stored in a struct was being overwritten
during create/update. This fix ensures the ref is not discarded.

Currently there isn't a way to test this fix, as we can't run two controllers
simultaneously and reference subnet names in the cluster directly.
Instead, I will document an example of the fix working in this PR.

Provided manifest:
```yaml
apiVersion: eks.services.k8s.aws/v1alpha1
kind: Cluster
metadata:
  name: my-clust
spec:
  name: my-clust
  roleARN: <removed>
  version: "1.30"
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: false
    subnetRefs:
      - from:
          name: sub1
      - from:
          name: sub2
```

Creating .spec and status:
```yaml
spec:
  accessConfig:
    authenticationMode: CONFIG_MAP
    bootstrapClusterCreatorAdminPermissions: true
  kubernetesNetworkConfig:
    ipFamily: ipv4
    serviceIPv4CIDR: 172.20.0.0/16
  logging:
    clusterLogging:
    - enabled: false
      types:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler
  name: my-clust
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: true
    publicAccessCIDRs:
    - 0.0.0.0/0
    subnetRefs:
    - from:
        name: sub1
    - from:
        name: sub2
  roleARN: <removed>
  version: "1.30"
status:
  ackResourceMetadata:
    arn: <removed>
    ownerAccountID: <removed>
    region: us-west-2
  certificateAuthority: {}
  conditions:
  - lastTransitionTime: "2024-09-18T20:56:50Z"
    status: "True"
    type: ACK.ReferencesResolved
  - lastTransitionTime: "2024-09-18T20:56:51Z"
    status: "False"
    type: ACK.ResourceSynced
  createdAt: "2024-09-18T20:56:50Z"
  health: {}
  platformVersion: eks.8
  status: CREATING
```

Active:
```yaml
spec:
  accessConfig:
    authenticationMode: CONFIG_MAP
    bootstrapClusterCreatorAdminPermissions: true
  kubernetesNetworkConfig:
    ipFamily: ipv4
    serviceIPv4CIDR: 172.20.0.0/16
  logging:
    clusterLogging:
    - enabled: false
      types:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler
  name: my-clust
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: true
    publicAccessCIDRs:
    - 0.0.0.0/0
    subnetRefs:
    - from:
        name: sub1
    - from:
        name: sub2
  roleARN: <removed>
  version: "1.30"
status:
  ackResourceMetadata:
    arn: <removed>
    ownerAccountID: <removed>
    region: us-west-2
  certificateAuthority:
    data: <removed>
  conditions:
  - lastTransitionTime: "2024-09-18T22:00:27Z"
    status: "True"
    type: ACK.ReferencesResolved
  - lastTransitionTime: "2024-09-18T22:00:28Z"
    status: "True"
    type: ACK.ResourceSynced
  createdAt: "2024-09-18T20:56:50Z"
  endpoint: https://DFD284F7337766E87670192E4EB46565.gr7.us-west-2.eks.amazonaws.com
  health: {}
  identity:
    oidc:
      issuer: https://oidc.eks.us-west-2.amazonaws.com/id/DFD284F7337766E87670192E4EB46565
  platformVersion: eks.8
  status: ACTIVE
```
Updating:
```yaml
spec:
  accessConfig:
    authenticationMode: CONFIG_MAP
    bootstrapClusterCreatorAdminPermissions: true
  kubernetesNetworkConfig:
    ipFamily: ipv4
    serviceIPv4CIDR: 172.20.0.0/16
  logging:
    clusterLogging:
    - enabled: false
      types:
      - api
      - audit
      - authenticator
      - controllerManager
      - scheduler
  name: my-clust
  resourcesVPCConfig:
    endpointPrivateAccess: true
    endpointPublicAccess: false
    publicAccessCIDRs:
    - 0.0.0.0/0
    subnetRefs:
    - from:
        name: sub1
    - from:
        name: sub2
  roleARN: <removed>
  version: "1.30"
status:
  ackResourceMetadata:
    arn: <removed>
    ownerAccountID: <removed>
    region: us-west-2
  certificateAuthority:
    data: <removed>
  conditions:
  - lastTransitionTime: "2024-09-18T22:04:38Z"
    status: "True"
    type: ACK.ReferencesResolved
  - lastTransitionTime: "2024-09-18T22:04:38Z"
    message: Cluster is in 'UPDATING' status
    status: "False"
    type: ACK.ResourceSynced
  - message: cluster in 'UPDATING' state, cannot be modified until 'ACTIVE'
    status: "True"
    type: ACK.Recoverable
  createdAt: "2024-09-18T20:56:50Z"
  endpoint: https://DFD284F7337766E87670192E4EB46565.gr7.us-west-2.eks.amazonaws.com
  health: {}
  identity:
    oidc:
      issuer: https://oidc.eks.us-west-2.amazonaws.com/id/DFD284F7337766E87670192E4EB46565
  platformVersion: eks.8
  status: UPDATING
```
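
The status snapshots above differ mainly in their `ACK.ReferencesResolved` and `ACK.ResourceSynced` conditions. As a small sketch of how one might check them once a Cluster object has been loaded into a dictionary, the helper names below are hypothetical and the sample dictionaries are trimmed copies of the statuses shown in this thread.

```python
def condition_status(status, cond_type):
    """Return the status string of the named condition, or None if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def refs_resolved_and_synced(status):
    """Return (references resolved?, resource synced?) as booleans."""
    return (
        condition_status(status, "ACK.ReferencesResolved") == "True",
        condition_status(status, "ACK.ResourceSynced") == "True",
    )


# Trimmed from the "Creating" snapshot in the commit message.
CREATING = {"conditions": [
    {"type": "ACK.ReferencesResolved", "status": "True"},
    {"type": "ACK.ResourceSynced", "status": "False"},
]}
# Trimmed from the "Active" snapshot in the commit message.
ACTIVE = {"conditions": [
    {"type": "ACK.ReferencesResolved", "status": "True"},
    {"type": "ACK.ResourceSynced", "status": "True"},
]}
# Trimmed from the status in the original report.
REPORTED = {"conditions": [
    {"type": "ACK.ReferencesResolved", "status": "Unknown"},
]}

for name, status in [("creating", CREATING), ("active", ACTIVE), ("reported", REPORTED)]:
    print(name, refs_resolved_and_synced(status))
```

A condition with status `Unknown`, as in the original report, is treated here as not resolved.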

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Member

a-hilaly commented Oct 9, 2024

@gecube This is now fixed in 1.5.0.

Member

a-hilaly commented Oct 9, 2024

Please reopen the GitHub issue if you still see this :)

@a-hilaly a-hilaly closed this as completed Oct 9, 2024