
[enhancement] ignore yaml_incluster field from kubectl provider #54

Open
ankitcharolia opened this issue Sep 25, 2023 · 11 comments

@ankitcharolia

We are actually getting a lot of noise during terraform plans.

  ~ resource "kubectl_manifest" "argo_application" {
        id                      = "/apis/argoproj.io/v1alpha1/namespaces/xxxxxx/xxxx/xxx"
        name                    = "xxxx"
      ~ yaml_incluster          = (sensitive value)
        # (13 unchanged attributes hidden)
    }

It would be better if we could add yaml_incluster to ignore_fields.

@ankitcharolia ankitcharolia changed the title yaml_incluster from kubectl provider ? [enhancement] ignore yaml_incluster field from kubectl provider Sep 25, 2023
@alekc
Owner

alekc commented Sep 25, 2023

@ankitcharolia I feel your pain! I'm getting the same issue as well.

What I suspect is happening is that most (all?) of the noise in yaml_incluster is caused by the managedFields property on the objects. The scope of ignore_fields is to ignore fields on k8s objects, not the Terraform ones. You should be able to see what's causing the drift in yaml_incluster if you run your plan with TF_LOG=debug (see gavinbunney/terraform-provider-kubectl#266 for reference as well).
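For clarity, a minimal sketch of that scope, assuming an illustrative resource name, file path, and field path; ignore_fields takes paths on the Kubernetes object, not Terraform attribute names:

resource "kubectl_manifest" "argo_application" {
  yaml_body = file("${path.module}/argo-application.yaml")

  # Paths refer to fields on the Kubernetes object itself;
  # yaml_incluster is a Terraform attribute, so it cannot be listed here.
  ignore_fields = ["metadata.managedFields"]
}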

Having said that, I have 2 tasks in front of me:

  1. Evaluate if it's safe to add managedFields to the list of ignored values by default (it should be)
  2. Check the overall logic related to the comparison with the live version on k8s. Potentially fixes gavinbunney/terraform-provider-kubectl#265 (Terraform state is updated when update apply fails)

@alekc alekc added the enhancement New feature or request label Sep 25, 2023
@haiwu

haiwu commented Jan 5, 2024

Any progress?

@alekc
Owner

alekc commented Jan 10, 2024

Not yet. It turns out yaml_incluster is just a hash, so an overall change in how the provider treats differences is required :(

It's a bit of work, I will tackle it once I have some bandwidth available.

@ArronaxKP

ArronaxKP commented Mar 12, 2024

I've been dealing with this for a while, and normally it's fine: kubectl apply results in no changes, so I haven't added it to ignore_fields in TF.

However, today I hit an application that detects whenever kubectl is run against any manifest it depends upon. This means that even if no changes are detected, it still restarts applications. It's an annoying app.

Adding this to ignore_fields should have no impact, as the provider will still apply when it detects changes between the Terraform config and what was applied the last time Terraform ran. However, if this is in ignore_fields, it won't detect whether the YAML in the cluster has drifted for some reason (e.g. manual changes to the manifests directly). I just wanted to check my understanding of this field before I add it to ignore_fields.

@alekc
Owner

alekc commented Mar 13, 2024

Sadly, ignore_fields will not work here: yaml_incluster is basically a hash which the provider uses to detect any changes in the live object. The issue is that any change (i.e. annotation, label, etc.) will trigger such an update.

What needs to be done (and will likely be addressed in version 3 due to the possible breaking changes it would bring) is a check between the desired manifest and its counterpart in the live object. There are some challenges, mainly linked to the scenario where we remove some properties (what to do with them: try to delete them remotely, leave them as they are, etc.). So a bit of a minefield really...

@alekc
Owner

alekc commented Mar 15, 2024

So, I have looked a bit more into the issue. My statement from before

sadly ignore_fields will not work

was partially wrong.

So, this is the code which deals with yaml_incluster:

func getLiveManifestFingerprint(d *schema.ResourceData, userProvided *yaml.Manifest, liveManifest *yaml.Manifest) string {
	var ignoreFields []string = nil
	ignoreFieldsRaw, hasIgnoreFields := d.GetOk("ignore_fields")
	if hasIgnoreFields {
		ignoreFields = expandStringList(ignoreFieldsRaw.([]interface{}))
	}

	fields := getLiveManifestFields_WithIgnoredFields(ignoreFields, userProvided, liveManifest)
	return getFingerprint(fields)
}
func getLiveManifestFields_WithIgnoredFields(ignoredFields []string, userProvided *yaml.Manifest, liveManifest *yaml.Manifest) string {

	// there is a special user case for secrets.
	// If they are defined as manifests with StringData, it will always provide a non-empty plan
	// so we will do a small lifehack here
	if userProvided.GetKind() == "Secret" && userProvided.GetAPIVersion() == "v1" {
		if stringData, found := userProvided.Raw.Object["stringData"]; found {
			// move all stringdata values to the data
			for k, v := range stringData.(map[string]interface{}) {
				encodedString := base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%v", v)))
				meta_v1_unstruct.SetNestedField(userProvided.Raw.Object, encodedString, "data", k)
			}
			// and unset the stringData entirely
			meta_v1_unstruct.RemoveNestedField(userProvided.Raw.Object, "stringData")
		}
	}

	flattenedUser := flatten.Flatten(userProvided.Raw.Object)
	flattenedLive := flatten.Flatten(liveManifest.Raw.Object)

	// remove any fields from the user provided set or control fields that we want to ignore
	fieldsToTrim := append(kubernetesControlFields, ignoredFields...)
	for _, field := range fieldsToTrim {
		delete(flattenedUser, field)

		// check for any nested fields to ignore
		for k, _ := range flattenedUser {
			if strings.HasPrefix(k, field+".") {
				delete(flattenedUser, k)
			}
		}
	}

	// update the user provided flattened string with the live versions of the keys
	// this implicitly excludes anything that the user didn't provide as it was added by kubernetes runtime (annotations/mutations etc)
	var userKeys []string
	for userKey, userValue := range flattenedUser {
		normalizedUserValue := strings.TrimSpace(userValue)

		// only include the value if it exists in the live version
		// that is, don't add to the userKeys array unless the key still exists in the live manifest
		if _, exists := flattenedLive[userKey]; exists {
			userKeys = append(userKeys, userKey)
			normalizedLiveValue := strings.TrimSpace(flattenedLive[userKey])
			flattenedUser[userKey] = normalizedLiveValue
			if normalizedUserValue != normalizedLiveValue {
				log.Printf("[TRACE] yaml drift detected in %s for %s, was: %s now: %s", userProvided.GetSelfLink(), userKey, normalizedUserValue, normalizedLiveValue)
			}
		} else {
			if normalizedUserValue != "" {
				log.Printf("[TRACE] yaml drift detected in %s for %s, was %s now blank", userProvided.GetSelfLink(), userKey, normalizedUserValue)
			}
		}
	}

	sort.Strings(userKeys)
	returnedValues := []string{}
	for _, k := range userKeys {
		returnedValues = append(returnedValues, fmt.Sprintf("%s=%s", k, flattenedUser[k]))
	}

	return strings.Join(returnedValues, ",")
}
func getFingerprint(s string) string {
	fingerprint := sha256.New()
	fingerprint.Write([]byte(s))
	return fmt.Sprintf("%x", fingerprint.Sum(nil))
}

Given that it's unlikely the changes in yaml_incluster are being driven by something in yaml_body, if we were able to add the changing field to the ignored set (fingers crossed it would be only one), that would prevent the yaml_incluster diff from coming up over and over.

I have thought about storing the real data in the state file and performing changes based on it. But that has some challenges:

  • Potentially sensitive data being stored in the Terraform state
  • Dealing with the "sensitive fields" setting (where we mask properties based on the parameters passed to kubectl_manifest)

Any thoughts?

@ArronaxKP

ArronaxKP commented Mar 16, 2024

I will run my plans with trace when I get a chance at work. I was doing the 1.26 to 1.27 K8S upgrades, so that may be playing a factor in one of the fields changing in an odd way. I will run it with trace and we should be able to see which field is causing this noise. I have seen it in a few different manifest types as well, so it should be a useful thing to check.

On the state secrets:

  • The state already contains a lot of sensitive data, so I don't think it's the end of the world to include this, but I'd personally prefer if more wasn't added. If the data were hashed, that would probably be a more acceptable solution than storing the raw data directly.
  • OpenTofu supporting state encryption would make that a more acceptable compromise, but that is not released yet and it's still very early days for that functionality.

Being able to see the drift without trace would be nice. The only option I could think of is getting the live manifest and "hashing" the data values. Although a hash is no guarantee, it is better than the raw secrets. You "could" hash the data values and persist the YAML with hashed values instead of the real values. The overhead of running that on each plan may not be ideal, but it's the best I could come up with. Hashing isn't guaranteed to be non-reversible either, unfortunately.

@alekc
Owner

alekc commented Mar 16, 2024

Good points. I think I will proceed in the following way (and release a beta version to try things out first):

  • yaml_incluster will be changed from a simple hash to key -> value text/yaml, with the individual values being hashed
  • in case of a diff, a DEBUG message will be emitted showing either the difference or the before and after texts (still need to evaluate which).

Hopefully this will make it possible to target and ignore certain repeated entries (potentially declaring them globally at the provider level might not be a bad idea).
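Purely to illustrate that last idea: a hypothetical provider-level ignore list might look like the sketch below. The global_ignore_fields argument does not exist in the provider today; it is only meant to show the shape of the idea.

provider "kubectl" {
  # Hypothetical, not implemented: fields that would be ignored for every
  # kubectl_manifest handled by this provider instance.
  # global_ignore_fields = ["metadata.managedFields"]
}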

@alekc
Owner

alekc commented Mar 16, 2024

Right. So, a beta release has been rolled out:

terraform {
  required_providers {
    kubectl = {
      source = "alekc/kubectl"
      version = "2.0.4-incluster1"
    }
  }
}

What changed: pretty much what I wrote above. The first time you run the provider it will change the yaml_incluster of all kubectl manifests (which will trigger the equivalent of a kubectl apply operation).

From that point on, you can easily see what has triggered the change in yaml_incluster by running the process with either TF_LOG=debug (which will show only the difference):

[DEBUG] DETECTED YAML INCLUSTER STATE DIFFERENCE. Patch diff: @@ -169,71 +169,71 @@
 ar=d
-4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
+29d53701d3c859e29e1b90028eec1ca8e2f29439198b6e036c60951fb458aa1

Or with trace, which will show much more verbose data:

 [TRACE] DETECTED YAML STATE DIFFERENCE apiVersion=3bfc269594ef649228e9a74bab00f042efc91d5acc6fbee31a382e80d42388fe
kind=d677190e0a9990e7d5fa9e4c1bbde44271fb8959c4acb6d43e02ed991128b4bf
metadata.annotations.bar=d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
metadata.annotations.baz=ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d
metadata.annotations.foo=6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
metadata.name=8f4574be43f2c514a034027d8bb05f7ebceaaaf90259d99d98cca51f01714af6
metadata.namespace=37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f
spec.ports.#=6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
spec.ports.0.name=e0603c499aae47eb89343ad0ef3178e044c62e70ae2309b35591d1d49a3211ec
spec.ports.0.port=48449a14a4ff7d79bb7a1b6f3d488eba397c36ef25634c111b49baf362511afc
spec.ports.0.targetPort=c4876de490dcf38b74d6c0d4f120cf01126c3d6a3a49b93ec81caae38ea1497e vs apiVersion=3bfc269594ef649228e9a74bab00f042efc91d5acc6fbee31a382e80d42388fe
kind=d677190e0a9990e7d5fa9e4c1bbde44271fb8959c4acb6d43e02ed991128b4bf
metadata.annotations.bar=d29d53701d3c859e29e1b90028eec1ca8e2f29439198b6e036c60951fb458aa1
metadata.annotations.baz=ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d
metadata.annotations.foo=6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
metadata.name=8f4574be43f2c514a034027d8bb05f7ebceaaaf90259d99d98cca51f01714af6
metadata.namespace=37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f
spec.ports.#=6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
spec.ports.0.name=e0603c499aae47eb89343ad0ef3178e044c62e70ae2309b35591d1d49a3211ec
spec.ports.0.port=48449a14a4ff7d79bb7a1b6f3d488eba397c36ef25634c111b49baf362511afc
spec.ports.0.targetPort=c4876de490dcf38b74d6c0d4f120cf01126c3d6a3a49b93ec81caae38ea1497e: timestamp="2024-03-16T20:27:24.079+0100"

If you manage to isolate the changing field, please let me know (in case it's a sensible one to add to the default ignore values).

Cheers

@alekc alekc self-assigned this Mar 17, 2024
@ArronaxKP

ArronaxKP commented Mar 17, 2024

Updated with more examples. Apologies for the double post; I removed the first one for better readability.

So I found more causes. I will look for the others during the work week. I haven't tested the new beta yet, but I will check that when I get a chance as well.

The memory and CPU resource one has an obvious cause and should be a simple fix on my side.

yaml drift detected in /apis/apps/v1/namespaces/istio-system/deployments/istiod-<version> for spec.template.spec.containers.0.resources.requests.memory, was: 2048Mi now: 2Gi: timestamp=2024-03-17T10:22:12.597Z

The other is cluster-wide resources where I set override_namespace. For readability you can see that we never actually set the metadata namespace directly. However, I am setting override_namespace, and as this resource is not deployed in a namespace, the initial apply includes the namespace in the metadata, but a re-run then detects the drift. I will check next week whether my code allows me to just stop setting override_namespace. The only way to fix this in the provider would be to make override_namespace aware of which resources are cluster-wide and which are namespaced, so it doesn't inject a namespace into cluster-wide resources (see the sketch after the YAML below).

yaml drift detected in /apis/apiextensions.k8s.io/v1/namespaces/<namespace>/customresourcedefinitions/<crd-name> for metadata.namespace, was <namespace> now blank: timestamp=2024-03-17T10:22:10.158Z
yaml drift detected in /apis/rbac.authorization.k8s.io/v1/namespaces/<namespace>/clusterrolebindings/<resource-name> for metadata.namespace, was <namespace> now blank: timestamp=2024-03-17T10:22:09.954Z
yaml drift detected in /apis/rbac.authorization.k8s.io/v1/namespaces/<namespace>/clusterroles/<resource-name> for metadata.namespace, was <namespace> now blank: timestamp=2024-03-17T10:22:09.982Z
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <resource-name>
rules:
- apiGroups:
...
...

The final error is related to the Istio helm deployment. I dug into this, and these updates are happening because K8S is updating these webhooks with cloud-specific field exclusions that are not part of the core Istio deployment, so I will just have to add these to ignore_fields (see the sketch after the log lines below).

yaml drift detected in /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/istio-revision-tag-asm-<version-tag> for webhooks.1.namespaceSelector.matchExpressions.#, was: 2 now: 4: timestamp=2024-03-17T10:22:12.632Z
yaml drift detected in /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/istio-revision-tag-asm-<version-tag> for webhooks.0.namespaceSelector.matchExpressions.#, was: 2 now: 4: timestamp=2024-03-17T10:22:12.632Z
yaml drift detected in /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/istio-sidecar-injector-<version> for webhooks.0.namespaceSelector.matchExpressions.#, was: 2 now: 4: timestamp=2024-03-17T10:22:12.858Z
yaml drift detected in /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/istio-sidecar-injector-<version> for webhooks.1.namespaceSelector.matchExpressions.#, was: 2 now: 4: timestamp=2024-03-17T10:22:12.858Z
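Something like the sketch below should cover those entries. The exact field paths are assumptions derived from the flattened keys in the drift messages above, and the resource and file names are illustrative:

resource "kubectl_manifest" "istio_sidecar_injector" {
  yaml_body = file("${path.module}/istio-sidecar-injector.yaml")

  # Ignore the namespaceSelector entries that the cloud provider keeps
  # appending to the mutating webhooks.
  ignore_fields = [
    "webhooks.0.namespaceSelector.matchExpressions",
    "webhooks.1.namespaceSelector.matchExpressions",
  ]
}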

@ArronaxKP

Updated the above with the cause for Istio. Work has taken a lot of my time this week, so I haven't had a chance to test the beta out yet. I spoke it through with some colleagues and we all came to the same conclusion:

  • Having this drift information in TRACE or DEBUG makes little difference, so the pre-beta release gives us the same visibility.
  • It would be nice if we could identify the drift in a normal plan, instead of with elevated logging; something like below. However, to show this we would need to make the field not secret, which feels like the wrong option longer term.
  ~ resource "kubectl_manifest" "argo_application" {
        id                      = "xxxx"
        name                    = "xxxx"
      ~ yaml_incluster          = <<-EOT
          apiVersion: 3bfc269594ef649228e9a74bab00f042efc91d5acc6fbee31a382e80d42388fe
          kind: 37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f
          metadata: 
            name: d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35 
            namespace: d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35 => ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d
    }

Considering that all the cases in this issue turned out to be specific to my implementation, I don't think the provider needs any changes at the moment for this.
