Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: SecurityGroupIngressRule / SecurityGroupEgressRules not reconciling and crashing EC2 Pod #1242

Open
1 task done
vibe opened this issue Mar 28, 2024 · 2 comments
Open
1 task done
Labels

Comments

@vibe
Copy link

vibe commented Mar 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

ec2.aws.upbound.io/v1beta1 - SecurityGroupIngressRule
ec2.aws.upbound.io/v1beta1 - SecurityGroupEgressRule
ec2.aws.upbound.io/v1beta1 - SecurityGroup

Resource MRs required to reproduce the bug

apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroup
metadata:
labels:
id: some-id
region: us-west-2
name: sample-security-group
spec:
deletionPolicy: Delete
forProvider:
vpcIdSelector:
matchControllerRef: true
matchLabels:
region: us-west-2
managementPolicies:

  • '*'

apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroupIngressRule
metadata:
name: some-rule
spec:
deletionPolicy: Delete
forProvider:
fromPort: 8078
ipProtocol: tcp
referencedSecurityGroupIdSelector:
matchControllerRef: true
matchLabels:
region: us-west-2
region: us-east-1
securityGroupId: sg-0eb2d04c577f1db47 #this gets autofilled on cluster after first resolution but never updates if parent security group changes
securityGroupIdRef:
name: sample-security-group ## this gets autofilled on cluster after first resolution but never updates if parent security group changes
securityGroupIdSelector:
matchControllerRef: true
matchLabels:
id: some-id
region: us-west-2
toPort: 8078
initProvider: {}
managementPolicies:

  • '*'

Steps to Reproduce

Create Security Group
Create SecurityGroupIngressRule

  • Recreate Security Group (so security id changes)

  • SecurityGroupIngressRule still points to old id

  • Update policy to resolve always on ingress rule

  • Ec2 pod crashes

  • no way out of this other than to delete security group ingress rules.

What happened?

Security Group Rules should get recreated.

SecurityGroupId is also missing from the securitygroupingress rule spec after changing policy to "always". (suspecting this causes the crash)

Relevant Error Output Snippet

{"level":"info","ts":"2024-03-28T10:05:47Z","logger":"provider-aws","msg":"Beta feature enabled","flag":"EnableBetaManagementPolicies"}
panic: runtime error: index out of range [0] with length 0

goroutine 144506 [running]:
github.com/hashicorp/terraform-provider-aws/internal/service/ec2.(*resourceSecurityGroupIngressRule).createSecurityGroupRule(0xc013192960, {0x148846f8, 0xc00e828600}, 0xc00856a420)
    github.com/hashicorp/terraform-provider-aws@v0.0.0-00010101000000-000000000000/internal/service/ec2/vpc_security_group_ingress_rule.go:66 +0x19c
github.com/hashicorp/terraform-provider-aws/internal/service/ec2.(*resourceSecurityGroupRule).Create(0xc013192960, {0x148846f8, 0xc00e828600}, {{{{0x14899350, 0xc00e5fb800}, {0x100a8a80, 0xc00e5fb230}}, {0x148a4a98, 0xc00a9f4be0}}, {{{0x14899350, ...}, ...}, ...}, ...}, ...)
    github.com/hashicorp/terraform-provider-aws@v0.0.0-00010101000000-000000000000/internal/service/ec2/vpc_security_group_ingress_rule.go:172 +0x298
github.com/hashicorp/terraform-provider-aws/internal/provider/fwprovider.(*wrappedResource).Create.func1({0x148846f8?, 0xc00e828600?}, {{{{0x14899350, 0xc00e5fb800}, {0x100a8a80, 0xc00e5fb230}}, {0x148a4a98, 0xc00a9f4be0}}, {{{0x14899350, 0xc00df7b620}, ...}, ...}, ...}, ...)
    github.com/hashicorp/terraform-provider-aws@v0.0.0-00010101000000-000000000000/internal/provider/fwprovider/intercept.go:222 +0x6b
github.com/hashicorp/terraform-provider-aws/internal/provider/fwprovider.(*wrappedResource).Create.interceptedHandler[...].func3({{{{0x14899350, 0xc00e5fb800}, {0x100a8a80, 0xc00e5fb230}}, {0x148a4a98, 0xc00a9f4be0}}, {{{0x14899350, 0xc00df7b620}, {0x100a8a80, 0xc00df7af90}}, ...}, ...}, ...)
    github.com/hashicorp/terraform-provider-aws@v0.0.0-00010101000000-000000000000/internal/provider/fwprovider/intercept.go:119 +0x231
github.com/hashicorp/terraform-provider-aws/internal/provider/fwprovider.(*wrappedResource).Create(0xc0077661c0, {0x14884730?, 0xc00e85d720?}, {{{{0x14899350, 0xc00e5fb800}, {0x100a8a80, 0xc00e5fb230}}, {0x148a4a98, 0xc00a9f4be0}}, {{{0x14899350, ...}, ...}, ...}, ...}, ...)
    github.com/hashicorp/terraform-provider-aws@v0.0.0-00010101000000-000000000000/internal/provider/fwprovider/intercept.go:226 +0x17c
github.com/hashicorp/terraform-plugin-framework/internal/fwserver.(*Server).CreateResource(0xc013b3be40, {0x14884730, 0xc00e85d720}, 0xc00decfca8, 0xc00decfc48)
    github.com/hashicorp/terraform-plugin-framework@v1.4.2/internal/fwserver/server_createresource.go:101 +0x578
github.com/hashicorp/terraform-plugin-framework/internal/fwserver.(*Server).ApplyResourceChange(0xc00decfe00?, {0x14884730, 0xc00e85d720}, 0xc00e85d770, 0xc00decfe00)
    github.com/hashicorp/terraform-plugin-framework@v1.4.2/internal/fwserver/server_applyresourcechange.go:57 +0x4a5
github.com/hashicorp/terraform-plugin-framework/internal/proto5server.(*Server).ApplyResourceChange(0xc013b3be40, {0x148847a0?, 0xc00ed65420?}, 0xc00e85d6d0)
    github.com/hashicorp/terraform-plugin-framework@v1.4.2/internal/proto5server/server_applyresourcechange.go:55 +0x3e5
github.com/crossplane/upjet/pkg/controller.(*terraformPluginFrameworkExternalClient).Create(0xc0130fca20, {0x148847a0, 0xc00ed65420}, {0x148fe8f0?, 0xc007988900})
    github.com/crossplane/upjet@v1.3.0-rc.0.0.20240314111422-84abd051e678/pkg/controller/external_tfpluginfw.go:408 +0x193
github.com/crossplane/upjet/pkg/controller.(*terraformPluginFrameworkAsyncExternalClient).Create.func1()
    github.com/crossplane/upjet@v1.3.0-rc.0.0.20240314111422-84abd051e678/pkg/controller/external_async_tfpluginfw.go:141 +0xb5
created by github.com/crossplane/upjet/pkg/controller.(*terraformPluginFrameworkAsyncExternalClient).Create in goroutine 6854
    github.com/crossplane/upjet@v1.3.0-rc.0.0.20240314111422-84abd051e678/pkg/controller/external_async_tfpluginfw.go:137 +0x168
Stream closed EOF for crossplane-system/provider-aws-ec2-3d66ea2d7903-864d8c9f6-22hth (package-runtime)

Crossplane Version

1.15.1

Provider Version

1.2.1

Kubernetes Version

1.29

Kubernetes Distribution

EKS

Additional Info

No response

@vibe vibe added the bug Something isn't working label Mar 28, 2024
@vibe
Copy link
Author

vibe commented Mar 28, 2024

I wanted to call out this behavior also shows up for ManagedPrefixLists and Route.
Causing reconciliation errors that are tedious to recover from, since they require manual intervention.

@mbbush
Copy link
Collaborator

mbbush commented Mar 29, 2024

@vibe Thanks for the bug report. Anything that causes the provider pod to crash is concerning.

Could you provide a bit more detail? In particular, it would be useful to be able to see the output of kubectl get -o yaml for the relevant resources at each step. You can replace any ip addresses, aws account ids, or anything else you deem sensitive if you feel that's necessary; we don't need that information for debugging.

I'm particularly interested to know what precisely you are doing with

Update policy to resolve always on ingress rule

and what you mean by

SecurityGroupId is also missing from the securitygroupingress rule spec after changing policy to "always". (suspecting this causes the crash)

I think I know what you mean, but a yaml manifest is the most clear way to explain that without any ambiguity.

Also, I'm curious to know what happens if you add "either wait for 10 minutes or make some edit to any annotation on the SecurityGroupIngressRule" to your STRs in between recreating the security group and setting the policy to required. I think that might trigger the Ref field to resolve to the new SecurityGroup, although that may end up producing a different invalid state. It would be good to know what happens regardless.

Finally, please put your yaml manifests in ``` triple backticks, so that github will preserve the whitespace. Otherwise they become very difficult to read.

My first impression is that the panic is a bug in the terraform aws provider, but perhaps one that we can avoid triggering through better validation logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants