
Cannot apply CustomResourceDefinitions from namespaced install even though RBAC allows it #5886

Closed
wilmardo opened this issue Mar 26, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@wilmardo

wilmardo commented Mar 26, 2021

Describe the bug

We are using ArgoCD in a managed cluster where we only have rights within our own namespaces, so we use the namespaced ArgoCD install with a cluster secret that has namespaces defined:

---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: cluster-kubernetes.default.svc
  namespace: argocd
  annotations:
    managed-by: argocd.argoproj.io
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  config: '{"tlsClientConfig":{"insecure":false}}'
  name: iks-cluster
  namespaces: argocd,playground,tools
  server: https://kubernetes.default.svc

We also have the right to create CRDs at the cluster scope, but just CRDs, nothing else. This gives us the ability to install applications within our namespaces that define CRDs, like ArgoCD or Gloo.
This causes an issue with Argo: in the namespaced setup it errors when applying CRDs:

Cluster level ClusterIssuer "letsencrypt-production" can not be managed when in namespaced mode

When I set namespaces: * Argo cannot connect to the cluster anymore, since RBAC doesn't allow cluster resources other than the CRDs:

failed to sync cluster https://10.43.0.1:443: failed to load initial state of resource Deployment.apps: deployments.apps is forbidden: User "system:serviceaccount:argocd:argocd-application-controller" cannot list resource "deployments" in API group "apps" in the namespace "*

I traced the change back to this PR:
argoproj/gitops-engine#143

It shipped with ArgoCD 1.7.7, so we are now pinned to 1.7.6. That version applies the CRDs fine; it just has no diff and is permanently OutOfSync.

To Reproduce

  • Set up ArgoCD as a namespaced install with namespaces set for the cluster.
  • Create the RBAC below to give ArgoCD permission to apply CRDs at the cluster scope.
  • Apply a CRD; I included a ClusterIssuer as an example.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
  name: argocd:admin
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - applications
  - appprojects
  verbs:
  - create
  - get
  - list
  - watch
  - update
  - patch
  - delete
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - list
---
apiVersion: cert-manager.io/v1alpha3
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: info@example.org
    privateKeySecretRef:
      name: le-account-key
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
    - http01:
        ingress:
            class: nginx
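
(For reference: the ClusterRole above carries the aggregate-to-admin label and covers the namespaced permissions; a cluster-scoped role limited to CRDs only, of the kind the reproduction steps describe, might look roughly like the sketch below. The names and the binding are illustrative and not part of the original report; the subject assumes the default application controller ServiceAccount in the argocd namespace.)

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argocd-crd-manager   # illustrative name
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-crd-manager   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argocd-crd-manager
subjects:
- kind: ServiceAccount
  name: argocd-application-controller   # assumed default controller ServiceAccount
  namespace: argocd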

Expected behavior

ArgoCD to apply the CRD and have a Healthy status.

Version

argocd: v1.4.2+48cced9
  BuildDate: 2020-01-24T01:04:04Z
  GitCommit: 48cced9d925b5bc94f6aa9fa4a8a19b2a59e128a
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.7.4+f8cbd6b
  BuildDate: 2020-09-05T02:45:44Z
  GitCommit: f8cbd6bf432327cc3b0f70d23b66511bb906a178
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.17.8
@wilmardo wilmardo added the bug Something isn't working label Mar 26, 2021
@alexmt
Collaborator

alexmt commented Oct 1, 2021

As of v2.1 it is possible to enable cluster-level resources in namespaced mode using:

argocd cluster add <context> --cluster-resources=true --namespace my-namespace

@alexmt alexmt closed this as completed Oct 1, 2021
@wilmardo
Author

I just gave it a try, but that feature doesn't cover this issue. It tries to add a new cluster-level service account:

# argocd cluster add staging-1 --cluster-resources=true --namespace test
WARNING: This will create a service account `argocd-manager` on the cluster referenced by context `imax-staging-1` with full cluster level admin privileges. Do you want to continue [y/N]? y
FATA[0002] Failed to create service account "argocd-manager" in namespace "kube-system": serviceaccounts is forbidden: User "u-gglgl5os6k" cannot create resource "serviceaccounts" in API group "" in the namespace "kube-system"

This doesn't work and also isn't needed. The current ServiceAccount for ArgoCD has the RBAC rights to apply some cluster-level resources. It currently still complains with the same error:

Cluster level ClusterIssuer "letsencrypt-staging" can not be managed when in namespaced mode

That isn't true if Argo is allowed (via RBAC) to manage the CRDs.

@nnachefski

I'm getting the same issue....

ComparisonError: Cluster level Namespace "elastic-test" can not be managed when in namespaced mode

The ServiceAccount has a cluster-wide policy. This whole thing worked fine until the auto-update on my ArgoCD cluster/operator that occurred today.

@nnachefski

Ugh, I just checked on my other "prod" clusters for a different customer... same issue. It broke everything.

@magf

magf commented Nov 29, 2021

Ugh, I just checked on my other "prod" clusters for a different customer... same issue. It broke everything.

Have you done this?

Add a label to the namespace your application is deployed in so that the Argo CD instance in the openshift-gitops namespace can manage it:

$ oc label namespace spring-petclinic argocd.argoproj.io/managed-by=openshift-gitops

It was described here.

@Cuball0

Cuball0 commented Nov 29, 2021

The ServiceAccount has a cluster-wide policy. This whole thing worked fine until the auto-update on my ArgoCD cluster/operator that occurred today.

We are experiencing the same issues: after the auto-update of ArgoCD everything is messed up; apparently ArgoCD cannot manage the namespaces anymore.

Namespace "yyyy" for ConfigMap "jenkins-rrrrr" is not managed

Did you find a solution for this? We are running ArgoCD v2.1.6 on OpenShift 4.6.

@jaipalkomakula

Hello, any solution for this issue? Namespace "yyyy" for "resources" "jenkins-rrrrr" is not managed

@gkmahendran09

We were able to fix this issue by adding the below key/value to the argocd-default-cluster-config secret.

clusterResources: true


We are running ArgoCD v2.1.6 on OpenShift 4.7.30.
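
If you prefer applying that change from the CLI rather than through the console, something along these lines should work — a sketch, assuming the operator-created secret name above and a default openshift-gitops namespace. Note Cuball0's and patchon's later comments: the operator may reconcile or delete this secret, so the Subscription env variable is the more durable fix.

# Sketch: patch the operator-managed cluster secret to allow cluster-scoped resources
$ oc patch secret argocd-default-cluster-config -n openshift-gitops \
    --type merge -p '{"stringData": {"clusterResources": "true"}}'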

@Cuball0

Cuball0 commented Nov 30, 2021

We are using the ArgoCD Operator (in OpenShift). I'm not sure, but it could be that the operator reconciles the default secret.

We solved it by adding the following env variable to the Subscription of the Operator (where the value is the namespace of the ArgoCD installation):

config:
    env:
      - name: ARGOCD_CLUSTER_CONFIG_NAMESPACES
        value: argocd

@patchon

patchon commented Nov 30, 2021

We were able to fix this issue by adding the below key/value to the argocd-default-cluster-config secret.

clusterResources: true


We are running ArgoCD v2.1.6 on OpenShift 4.7.30.

Thanks a lot for the tip @gkmahendran09, really saved our day.
We have been struggling with this issue for some time now, and what it boiled down to in our case was actually two parameters.

1) The one you mention here: setting "clusterResources: true" in the argocd-cluster secret. This, to my understanding, "tells" ArgoCD that it's allowed to manage resources outside its own namespace.

However, doing so, we were faced with a new error:

Namespace "foo" for LimitRange "bar" is not managed (type: ComparisonError)

This is because it's still configured, via the operator, to only "manage" resources within the namespace it resides in. So even if it's allowed via the "clusterResources: true" parameter, it doesn't have any knowledge about resources outside its own namespace.

2) To give ArgoCD this knowledge, you have to provide the operator with the environment variable, as stated by user Cuball0 above. I.e., edit the Subscription for the operator like:

$ > oc get subscriptions.operators.coreos.com argocd-operator 

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: argocd
  namespace: argocd
spec:
  channel: alpha
  name: argocd-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    env:
    - name: ARGOCD_CLUSTER_CONFIG_NAMESPACES
      value: <argocd-namespace(s)>

where <argocd-namespace(s)> should be replaced by a comma-separated list of namespaces in which you have ArgoCD instances that should manage cluster-wide resources (i.e. it should not be a list of namespaces you want to manage).

I've had issues finding this in the documentation, but thanks to you we are all good again with our "namespaced ArgoCD which handles cluster-wide resources".

Hope this helps somebody else and please correct me if I'm wrong about something.

😄 👍 💯 🚀 🥇 🍰

@patchon

patchon commented Dec 7, 2021

We were able to fix this issue by adding the below key/value to the argocd-default-cluster-config secret.

clusterResources: true

Hm, I noticed today that this secret actually gets deleted when setting the environment variable in the Subscription.
So it seems you would only need to update the Subscription to get the desired functionality.

Just thought I should share this knowledge.

mbaldessari added a commit to mbaldessari/common that referenced this issue Dec 14, 2021
Make sure that argo can manage cluster-wide resources from the relevant namespaces

Otherwise we can error out with:

  Cluster level ClusterRoleBinding vault-server-binding cannot be managed
  when in namespaced mode

By setting ARGOCD_CLUSTER_CONFIG_NAMESPACES to the namespaces where
argocd instance is installed we allow it to create cluster wide
resources for all namespaces.

Tested this and now I am correctly able to invoke the vault helm chart
from argo without the above error.

References:
- argoproj/argo-cd#5886
- argoproj-labs/argocd-operator#385
claudiol added a commit to hybrid-cloud-patterns/example that referenced this issue Mar 1, 2022
mbaldessari added a commit to validatedpatterns/medical-diagnosis that referenced this issue Mar 22, 2022
@crenshaw-dev
Collaborator

Note that argocd-default-cluster-config is a feature of the Argo CD Operator.

If you're not using the Argo CD Operator but are using declarative cluster configs, use the clusterResources key. I've opened a PR to document that key.
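
For the declarative setup from the top of this issue, that would look roughly like the sketch below (the same cluster secret as in the original report, with the clusterResources field added — a sketch, not taken from the PR):

---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: cluster-kubernetes.default.svc
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: iks-cluster
  server: https://kubernetes.default.svc
  config: '{"tlsClientConfig":{"insecure":false}}'
  namespaces: argocd,playground,tools
  clusterResources: "true"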

@ruibinghao

Thanks for saving my day as well, since I ran into the same issue after standing up ArgoCD using the OpenShift GitOps operator. Just curious: is there a parameter you can set to enable the cluster resources for the built-in cluster while bringing up a brand-new operator for the first time? Or is this something you need to change afterwards?
