
fix: Enable argocd to manage kube-prometheus-stack #1295

Merged: 13 commits, Jan 24, 2023

Conversation

@jaydeland (Contributor)

What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

Motivation

I had to manually set the kube-prometheus-stack enable variable in my ArgoCD App of Apps config. It was defaulting to false and not being updated, since the value wasn't being passed.
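
For context, a minimal sketch of how this add-on could be enabled on the Terraform side so that ArgoCD manages it; only enable_kube_prometheus_stack comes from this PR, while the other variable names and the cluster wiring are illustrative assumptions:

module "eks_blueprints_kubernetes_addons" {
  source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons"

  # illustrative wiring; replace with your own cluster reference
  eks_cluster_id = module.eks_blueprints.eks_cluster_id

  # assumed flags for the ArgoCD App of Apps path
  enable_argocd         = true
  argocd_manage_add_ons = true

  # the flag this PR propagates into the ArgoCD add-on values
  enable_kube_prometheus_stack = true
}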

  • Resolves #

More

  • [YES] Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • [N/A] Yes, I have added a new example under examples to support my PR
  • [NO] Yes, I have created another PR for add-ons under add-ons repo (if applicable)
  • [YES] Yes, I have updated the docs for this feature
  • [NO] Yes, I ran pre-commit run -a with this PR

Note: Not all PRs require a new example and/or doc page. In general:

  • Use an existing example when possible to demonstrate a new addon's usage
  • A new docs page under docs/add-ons/* is required for a new addon

For Moderators

  • [YES] E2E test successfully completed before merge?

Additional Notes

--- PASS: TestEksBlueprintsE2E (0.00s)
--- PASS: TestEksBlueprintsE2E/TF_PLAN_VALIDATION (31.20s)
--- PASS: TestEksBlueprintsE2E/TF_OUTPUTS_VALIDATION (1.86s)
--- PASS: TestEksBlueprintsE2E/TF_OUTPUTS_VALIDATION/vpc_cidr (0.39s)
--- PASS: TestEksBlueprintsE2E/TF_OUTPUTS_VALIDATION/vpc_private_subnet_cidr (0.37s)
--- PASS: TestEksBlueprintsE2E/TF_OUTPUTS_VALIDATION/vpc_public_subnet_cidr (0.37s)
--- PASS: TestEksBlueprintsE2E/TF_OUTPUTS_VALIDATION/eks_cluster_id (0.37s)
--- PASS: TestEksBlueprintsE2E/TF_OUTPUTS_VALIDATION/eks_managed_nodegroup_status (0.37s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION (4.52s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/MATCH_EKS_CLUSTER_NAME (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/MATCH_TOTAL_EKS_WORKER_NODES (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION (0.54s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/MATCH_REPLICAS_VS_READY-REPLICAS/aws-load-balancer-controller (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/UNAVAILABLE_REPLICAS/aws-load-balancer-controller (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/MATCH_REPLICAS_VS_READY-REPLICAS/cluster-autoscaler-aws-cluster-autoscaler (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/UNAVAILABLE_REPLICAS/cluster-autoscaler-aws-cluster-autoscaler (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/MATCH_REPLICAS_VS_READY-REPLICAS/coredns (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/UNAVAILABLE_REPLICAS/coredns (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/MATCH_REPLICAS_VS_READY-REPLICAS/metrics-server (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DEPLOYMENTS_VALIDATION/UNAVAILABLE_REPLICAS/metrics-server (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION (0.39s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION/MATCH_DESIRED_VS_CURRENT_PODS/aws-node (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION/UNAVAILABLE_REPLICAS/aws-node (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION/MATCH_DESIRED_VS_CURRENT_PODS/kube-proxy (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION/UNAVAILABLE_REPLICAS/kube-proxy (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION/MATCH_DESIRED_VS_CURRENT_PODS/aws-cloudwatch-metrics (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_DAEMONSETS_VALIDATION/UNAVAILABLE_REPLICAS/aws-cloudwatch-metrics (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_SERVICES_VALIDATION (0.49s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_SERVICES_VALIDATION/SERVICE_STATUS/cluster-autoscaler-aws-cluster-autoscaler (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_SERVICES_VALIDATION/SERVICE_STATUS/kube-dns (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_SERVICES_VALIDATION/SERVICE_STATUS/kubernetes (0.00s)
--- PASS: TestEksBlueprintsE2E/EKS_ADDON_VALIDATION/EKS_SERVICES_VALIDATION/SERVICE_STATUS/metrics-server (0.00s)
--- PASS: TestEksBlueprintsE2E/eks-cluster-with-new-vpc (2950.38s)
PASS
ok github.com/aws-ia/terraform-aws-eks-blueprints/src 2950.770s

@jaydeland jaydeland requested a review from a team as a code owner January 6, 2023 19:46
@jaydeland jaydeland changed the title from "Enable argocd to manage kube-promtheus-stack" to "fix: Enable argocd to manage kube-promtheus-stack" Jan 6, 2023
@Zvikan (Contributor) commented Jan 12, 2023

Hey @jaydeland, I believe additional changes are required for this to work with our Argo approach.
Please take a look at https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/locals.tf#L8-L54; we should also add a line there. I also don't see any PRs in https://github.com/aws-samples/eks-blueprints-add-ons regarding this. Were you using that repo or your own version for testing?
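
For illustration, the kind of entry being asked for looks roughly like the sketch below; the kubePrometheusStack expression is the one that later surfaces in the error report further down, while the surrounding map name is an assumption:

locals {
  # map of per-add-on GitOps configs handed to the ArgoCD App of Apps (map name assumed)
  argocd_addon_config = {
    # ...entries for the other add-ons...
    kubePrometheusStack = var.enable_kube_prometheus_stack ? module.kube_prometheus_stack[0].argocd_gitops_config : null
  }
}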

The testing that you provided is for the E2E. Can you please share details of your e2e testing showing that kube-prometheus-stack is enabled and deployed via Argo? A picture of how Argo looks will be good enough.

@jaydeland (Contributor, Author)

https://github.com/aws-samples/eks-blueprints-add-ons

@Zvikan - I have updated the locals.tf. Also the samples repo already has kube_prometheus_stack added: https://github.com/aws-samples/eks-blueprints-add-ons/blob/ee022fd998bb8faf035c3798729cc3758e63d9e5/chart/values.yaml#L218

I am using the samples repo as a fork and had to hard-code the enable variable to true since it wasn't being passed by Blueprints.

@jaydeland jaydeland changed the title from "fix: Enable argocd to manage kube-promtheus-stack" to "fix: Enable argocd to manage kube-prometheus-stack" Jan 20, 2023
@jaydeland jaydeland temporarily deployed to EKS Blueprints Test January 24, 2023 17:01 — with GitHub Actions Inactive
@jaydeland (Contributor, Author)

@Zvikan - any update on this PR?

@Zvikan (Contributor) left a comment

LGTM, running CIs

@Zvikan Zvikan merged commit e2cb68c into aws-ia:main Jan 24, 2023
@sondrelg (Contributor)

I think this breaks the module, unfortunately. I'm getting the following error:

│ Error: Invalid index
│ 
│   on .terraform/modules/eks_blueprints_kubernetes_addons/modules/kubernetes-addons/locals.tf line 36, in locals:
│   36:     kubePrometheusStack       = var.enable_kube_prometheus_stack ? module.kube_prometheus_stack[0].argocd_gitops_config : null
│     ├────────────────
│     │ module.kube_prometheus_stack is empty tuple
│ 
│ The given key does not identify an element in this collection value: the collection has no elements.


I'm yet to absorb the context of this PR so sorry if I've jumped the gun here 👍
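
The error occurs because the expression indexes module.kube_prometheus_stack[0] whenever the variable is true, but the module instance can end up with count = 0 (here it is an empty tuple), so the index fails. A minimal sketch of one defensive pattern, not necessarily the actual fix later applied in #1384:

locals {
  # only index the module when it actually has an instance
  kube_prometheus_stack_gitops_config = length(module.kube_prometheus_stack) > 0 ? module.kube_prometheus_stack[0].argocd_gitops_config : null

  # or, equivalently, fall back to null when the index does not exist
  # kube_prometheus_stack_gitops_config = try(module.kube_prometheus_stack[0].argocd_gitops_config, null)
}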

@sondrelg (Contributor) commented Jan 24, 2023

I see this adds a parameter that doesn't really affect us. I already have kube_prometheus_stack enabled as an addon and this breaks terraform apply.

I tried removing the new kube_prometheus_stack line from the file, but that makes terraform want to destroy the resource 🤷‍♂️

Any ideas on how to work around or fix this?


UPDATE: It's of course as simple as reverting to an older version. Updating my source unblocked me:

module "eks_blueprints_kubernetes_addons" {
    source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons?ref=v4.20.0"
    ...

@Zvikan (Contributor) commented Jan 26, 2023

Thank you for reporting this @sondrelg, taking a look and fixing ASAP.

@Zvikan (Contributor) commented Jan 26, 2023

@sondrelg thanks again. #1384 created; testing the example e2e to ensure this is working with GitOps.

@jaydeland FYI, this is a great example of why my original ask for proof of e2e testing is needed for us to ensure everything is working.
I believe you understood the e2e to be the terratest; we need to do a better job of linking and explaining what E2E means so you can run it yourself (in short, it means ensuring the example works with terraform apply & destroy and that you ran minimal checks to confirm everything is running as expected).

I'll take note and see what we can do better besides clarifying what E2E is; in this case it seems a TF plan was not enough (according to this).
Thanks both

@Zvikan (Contributor) commented Jan 26, 2023

@sondrelg @jaydeland I was trying to deploy kube-prometheus-stack but am currently facing an issue where nothing is being deployed, and in Argo I can see "the server could not find the requested resource". I need to spend more time checking why, but previous experience tells me it might be related to CRDs.
Did either of you face this issue before?

@jaydeland (Contributor, Author)

Thank you @Zvikan !

@Zvikan (Contributor) commented Jan 26, 2023

Can confirm that after installing the CRDs manually (kubectl apply...) and then re-syncing the application in the Argo UI, everything is working properly.
I need to understand why Argo is not installing the addon CRDs first.

@jaydeland (Contributor, Author) commented Jan 26, 2023

I am currently getting this error, but I haven't switched to using Blueprints to enable kube-prometheus-stack. My app of apps has the enabled flag set to true.

So I don't think this code change is causing the problem.

@sondrelg (Contributor)

Was the fix merged or is it still branched? I'd be happy to test on my repo 🙂

@jaydeland (Contributor, Author) commented Jan 26, 2023

@sondrelg I believe you can follow this: https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd#custom-resource-definitions

In my case, I also updated the Helm chart version to 5.19.8 and so needed to run:

kubectl apply -k "https://github.com/argoproj/argo-cd/manifests/crds?ref=v2.5.7"

@jaydeland jaydeland deleted the kube-prometheus-argocd branch January 26, 2023 20:58
@Zvikan (Contributor) commented Jan 26, 2023

Was the fix merged or is it still branched? I'd be happy to test on my repo 🙂

I don't have a fix for the CRDs as of now besides suggesting to install the CRDs manually before Argo; the fix for the missing argocd_gitops_config object is part of #1384.

I'll update once I better understand how to make the CRDs be installed by Argo first.

@jaydeland (Contributor, Author)

@Zvikan - Reinstalling the CRDs did not fix the problem for me ...

@Zvikan (Contributor) commented Jan 26, 2023

Quick update: #1384 is merged, so you can pull and test again; the "argocd_gitops_config object doesn't exist" failure is fixed now.
As for the CRDs issue with Argo, I found aws-samples/eks-blueprints-add-ons#97, the same issue I was describing; the answer is pretty much the same: either use Replace for the CRDs or install them manually and then re-sync.

What is the issue you are facing @jaydeland? Please provide a screenshot and details so I can better help you.

@jaydeland (Contributor, Author)

I have upgraded Argo Helm to v5.19.8 (ArgoCD 2.5.8) and reinstalled the CRDs:

kubectl apply -k "https://github.com/argoproj/argo-cd/manifests/crds?ref=v2.5.8"
customresourcedefinition.apiextensions.k8s.io/applications.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/applicationsets.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/appprojects.argoproj.io unchanged

Still getting:

one or more synchronization tasks are not valid (retried 1 times).

@jaydeland (Contributor, Author)

Oh, looks like it's the kube-prometheus-stack CRDs.

@Zvikan (Contributor) commented Jan 26, 2023

Yes, sorry if I wasn't clear; when I said the CRDs, I was talking about this specific addon, kube-prometheus-stack => https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#from-43x-to-44x

@jaydeland (Contributor, Author)

I fixed this by adding the details from this post (https://blog.ediri.io/kube-prometheus-stack-and-argocd-23-how-to-remove-a-workaround) to the eks_blueprints_addons/chart/templates/kube-prometheus-stck.yaml file:

{{- if and (.Values.kubePrometheusStack) (.Values.kubePrometheusStack.enable) -}}
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: {{ .Values.repoUrl }}
    path: add-ons/kube-prometheus-stack
    targetRevision: {{ .Values.targetRevision }}
    helm:
      skipCrds: true
      values: |
        kube-prometheus-stack:
          prometheus:
            prometheusSpec:
              podMonitorSelectorNilUsesHelmValues: false
              serviceMonitorSelectorNilUsesHelmValues: false
          grafana:
            podLabels:
              sidecar.istio.io/inject: true
        {{- toYaml .Values.kubePrometheusStack | nindent 10 }}
  destination:
    server: https://kubernetes.default.svc
    namespace: kube-prometheus-stack
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
    retry:
      limit: 1
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 1m
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack-crds
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  destination:
    name: in-cluster
    namespace: kube-prometheus-stack
  project: default
  source:
    repoURL: https://github.com/prometheus-community/helm-charts.git
    path: charts/kube-prometheus-stack/crds/
    targetRevision: kube-prometheus-stack-44.3.0
    directory:
      recurse: true
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - Replace=true
    automated:
      prune: true
      selfHeal: true
{{- end -}}

@Zvikan (Contributor) commented Jan 27, 2023

I fixed this by adding the details from this post (https://blog.ediri.io/kube-prometheus-stack-and-argocd-23-how-to-remove-a-workaround) to the eks_blueprints_addons/chart/templates/kube-prometheus-stck.yaml file

This looks promising; I am going to give this a try, and if it works well I'll create a PR.
Thank you for sharing this @jaydeland!

vara-bonthu pushed a commit that referenced this pull request Feb 2, 2023
Co-authored-by: Zvika Nadav <zvi8875@gmail.com>
gminiba pushed a commit to gminiba/terraform-aws-eks-blueprints that referenced this pull request Mar 17, 2023
Co-authored-by: Zvika Nadav <zvi8875@gmail.com>