
bug(nodeadm): containerd BaseRuntimeSpec config not merged correctly #1928

Closed
vedantrathore opened this issue Aug 19, 2024 · 3 comments · Fixed by #2003
@vedantrathore
What happened:
To set custom rlimits on containerd containers, as described in the documentation, you can pass a custom NodeConfig in the instance user data to override the defaults. However, we use Karpenter to provision nodes, and Karpenter adds its own NodeConfig to the user data to establish cluster connectivity. When we append our custom NodeConfig to the user data, the baseRuntimeSpec key is not reflected in the merged NodeConfig.

User data script:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

# Karpenter Generated NodeConfig
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
metadata:
  creationTimestamp: null
spec:
  cluster:
    apiServerEndpoint: xxxxxxxxxx
    certificateAuthority: xxxxxxxxxxx
    cidr: xxxxxxxxxx
    name: xxxxxxxxxx
  containerd: {}
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
      - xxxxxxxxxx
      maxPods: xx
    flags:
    - --node-labels="karpenter.sh/capacity-type=on-demand"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  containerd:
    baseRuntimeSpec:
      process:
        rlimits:
          - type: RLIMIT_NOFILE
            soft: 1024
            hard: 1024

--BOUNDARY--

What you expected to happen:
The expected merged NodeConfig:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: xxxxxxxxxx
    certificateAuthority: xxxxxxxxxxx
    cidr: xxxxxxxxxx
    name: xxxxxxxxxx
  instance:
    localStorage: {}
  kubelet:
    config:
      clusterDNS:
      - xxxxxxxxxx
      maxPods: xx
    flags:
    - --node-labels="karpenter.sh/capacity-type=on-demand"
  containerd:
    baseRuntimeSpec:
      process:
        rlimits:
          - type: RLIMIT_NOFILE
            soft: 1024
            hard: 1024

But the actual result is:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: xxxxxxxxxx
    certificateAuthority: xxxxxxxxxxx
    cidr: xxxxxxxxxx
    name: xxxxxxxxxx
  instance:
    localStorage: {}
  containerd: {}
  kubelet:
    config:
      clusterDNS:
      - xxxxxxxxxx
      maxPods: xx
    flags:
    - --node-labels="karpenter.sh/capacity-type=on-demand"

How to reproduce it (as minimally and precisely as possible):

1. Pass the NodeConfig above in the user data of any EC2 instance, SSH in, and run nodeadm init.
2. The merged NodeConfig should be visible in the logs.

Anything else we need to know?:
I dug into the nodeadm source code to understand why this happens. My understanding is that when we call config.Merge(nodeConfig) here, the merger is set up with custom transformers via mergo.WithTransformers(nodeConfigTransformer{}).
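
For context, the merge wiring looks roughly like this (a simplified sketch; the exact mergo options nodeadm uses may differ, and the import path is assumed to be github.com/imdario/mergo):

func (dst *NodeConfig) Merge(src *NodeConfig) error {
	// mergo walks both structs and consults nodeConfigTransformer.Transformer
	// for each field type; a non-nil merge function replaces the default merge.
	return mergo.Merge(dst, src,
		mergo.WithOverride,
		mergo.WithTransformers(nodeConfigTransformer{}),
	)
}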

The transformer registers custom merge functions for the containerd and kubelet options using type reflection, but for containerd it only covers the Config field, not BaseRuntimeSpec. I reused the logic from transformKubeletConfig and tested the following changes to the transformer to confirm that the merge then works as expected. Please let me know if this makes sense; I'd be happy to send a PR with the fix.

const (
	kubeletFlagsName  = "Flags"
	kubeletConfigName = "Config"

	containerdBaseRuntimeSpecName = "BaseRuntimeSpec"
	containerdConfigName = "Config"
)

type nodeConfigTransformer struct{}

func (t nodeConfigTransformer) Transformer(typ reflect.Type) func(dst, src reflect.Value) error {
	if typ == reflect.TypeOf(ContainerdOptions{}) {
		return func(dst, src reflect.Value) error {
			// merge the containerd Config field, as before
			t.transformContainerdConfig(
				dst.FieldByName(containerdConfigName),
				src.FieldByName(containerdConfigName),
			)

			// new: also merge BaseRuntimeSpec, mirroring transformKubeletConfig
			if err := t.transformContainerdBaseRuntimeSpec(
				dst.FieldByName(containerdBaseRuntimeSpecName),
				src.FieldByName(containerdBaseRuntimeSpecName),
			); err != nil {
				return err
			}

			return nil
		}
	} else if typ == reflect.TypeOf(KubeletOptions{}) {
		return func(dst, src reflect.Value) error {
			t.transformKubeletFlags(
				dst.FieldByName(kubeletFlagsName),
				src.FieldByName(kubeletFlagsName),
			)

			if err := t.transformKubeletConfig(
				dst.FieldByName(kubeletConfigName),
				src.FieldByName(kubeletConfigName),
			); err != nil {
				return err
			}

			return nil
		}
	}
	return nil
}
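
If it mirrors transformKubeletConfig, the new merge function could look roughly like the sketch below. This is a hypothetical sketch, assuming BaseRuntimeSpec is an inline-document map (map[string]runtime.RawExtension) like the kubelet config; the top-level key override is my assumption, and as the discussion below notes, the right merge strategy is still an open question.

func (t nodeConfigTransformer) transformContainerdBaseRuntimeSpec(dst, src reflect.Value) error {
	// hypothetical sketch: both reflect.Values are assumed to wrap a
	// map[string]runtime.RawExtension (k8s.io/apimachinery/pkg/runtime)
	if src.Len() == 0 {
		// nothing in the source document; keep dst as-is
		return nil
	}
	if dst.Len() == 0 {
		// destination is empty; take the source value wholesale
		dst.Set(src)
		return nil
	}
	// both sides populated: copy source keys over destination keys
	// (top-level override; a deep merge of nested values may be preferable)
	dstMap := dst.Interface().(map[string]runtime.RawExtension)
	for key, value := range src.Interface().(map[string]runtime.RawExtension) {
		dstMap[key] = value // maps share storage, so this mutates dst in place
	}
	return nil
}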
@cartermckinnon
Member

@vedantrathore makes sense to me, feel free to open a PR!

@vedantrathore
Author

Sounds good. Would there be a different merge strategy for transformContainerdBaseRuntimeSpec? Currently I just reused the same code as transformKubeletConfig.

@cartermckinnon
Copy link
Member

@vedantrathore makes sense to me; you can add a test case to confirm.
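
A test along these lines could confirm the behavior (a hypothetical sketch; the exact type shapes, such as BaseRuntimeSpec being an inline-document map, are assumptions based on the discussion above):

func TestMergeContainerdBaseRuntimeSpec(t *testing.T) {
	dst := &NodeConfig{} // e.g. the Karpenter-generated config, with containerd: {}
	src := &NodeConfig{
		Spec: NodeConfigSpec{
			Containerd: ContainerdOptions{
				BaseRuntimeSpec: map[string]runtime.RawExtension{
					"process": {Raw: []byte(`{"rlimits":[{"type":"RLIMIT_NOFILE","soft":1024,"hard":1024}]}`)},
				},
			},
		},
	}
	if err := dst.Merge(src); err != nil {
		t.Fatal(err)
	}
	if len(dst.Spec.Containerd.BaseRuntimeSpec) == 0 {
		t.Fatal("expected BaseRuntimeSpec to survive the merge")
	}
}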

@cartermckinnon cartermckinnon changed the title Nodeadm does not merge containerd BaseRuntimeSpec config correctly bug(nodeadm): containerd BaseRuntimeSpec config not merged correctly Aug 23, 2024