Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argocd app sync/diff --local doesn't account for sidecar CMPs #8145

Open
crenshaw-dev opened this issue Jan 11, 2022 · 18 comments · Fixed by #10019
Open

argocd app sync/diff --local doesn't account for sidecar CMPs #8145

crenshaw-dev opened this issue Jan 11, 2022 · 18 comments · Fixed by #10019
Assignees
Labels
bug Something isn't working component:cmp Config Management Plugin related issues

Comments

@crenshaw-dev
Copy link
Collaborator

Describe the bug

argocd app diff <appname> --local . fails when <appname> is handled by a sidecar CMP. Specifically, it fails with this error:

FATA[0000] config management plugin with name '' is not supported 

argocd app sync <appname> --local . fails with the same error.

To Reproduce

  1. Install a sidecar CMP. The one from the docs works fine.
  2. Add an app which uses that CMP.
  3. Run argocd app diff <appname> --local . from the directory in the git clone where the app is defined.

Expected behavior

I would expect a diff to be generated. Since the example CMP always produces the same manifest, I'd expect the diff to be empty.

Version

argocd: v2.1.6+a346cf9.dirty
  BuildDate: 2021-11-01T02:05:06Z
  GitCommit: a346cf933e10d872eae26bff8e58c5e7ac40db25
  GitTreeState: dirty
  GoVersion: go1.17.2
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.1.6+a346cf9.dirty
  BuildDate: 2021-11-01T02:05:06Z
  GitCommit: a346cf933e10d872eae26bff8e58c5e7ac40db25
  GitTreeState: dirty
  GoVersion: go1.17.2
  Compiler: gc
  Platform: darwin/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v4.4.0 2021-09-27T16:13:36Z
  Helm Version: v3.7.1+g1d11fcb
  Kubectl Version: v0.21.0
  Jsonnet Version: v0.17.0
@crenshaw-dev crenshaw-dev added the bug Something isn't working label Jan 11, 2022
@crenshaw-dev
Copy link
Collaborator Author

Proposal: server-side diffing for local code

Local diff/syncing is currently implemented by running repo-server code on the CLI's host machine. It's the user's responsibility to install the necessary CLIs (like Helm or Kustomize).

We shouldn't expect users to configure their local machine to run the same CMPs as those installed on the repo-server. Instead, we should have the CLI send the user's local files to the repo-server (via the Argo CD API), and have the repo-server send back the generated diff.

Steps

  1. Implement the necessary CLI/API/repo-server components to perform server-side diffing.

    1. CLI: Add a --server-side-generate flag (defaulted to false).

    2. CLI: Print a warning whenever --local is used without --server-side-generate.

      WARNING: Client-side local diffing/syncing does not support sidecar CMPs and is DEPRECATED in vX.Y.Z. Switch to using `--server-side-generate`. Server-side local diffing/syncing will be default behavior in vX.Y+1.Z.
      
      
    3. CLI: Add a --local-include argument accepting a glob pattern (or maybe regex?) to determine which files are sent to the repo-server. If using globs, accept multiple (and union whatever they produce). Use a reasonable default like [*.yaml, *.json, etc.] that will exclude likely-unnecessary files (like .go, .java, etc.).

    4. CLI: Add a warning before sending files for diffing.

      WARNING: The following files will be sent to the Argo CD server for manifest generation.
      <files>
      Continue (use --skip-files-warning to disable this check)? Y/n
      
  2. CLI: Add a --skip-files-warning flag in case the user needs a non-interactive diff/sync.

  3. CLI: Add the code to actually send the files. This will probably involve selecting some archive format and a golang library to perform the archiving.

  4. API: Add/update a diff/sync endpoint to accept user files (as an archive) and send that to repo-server for manifest generation.

    Be careful about what you log. Don't want to inadvertently log a secret that the user accidentally sent with their source files.

  5. API: Add a configurable max size limit for the user's request (if that feature isn't already present). Reject requests that are too large.

  6. repo-server: Add/update a manifest generation endpoint to accept the user's files as an archive. Implement logic to 1) copy the app's current source, 2) unpack the user-provided archive over the source, 3) generate the manifests, and 4) delete the source copy.

    Read the archive library docs carefully the ensure we're not introducing security vulnerabilities (like directory traversal) to the repo-server.

    Be careful about what you log. Don't want to inadvertently log a secret that the user accidentally sent with their source files.

  7. In version Y+1, delete the client-side local diff/sync code and default to server-side.

Security considerations

Noted above, but repeated here:

  1. Don't accept oversize requests.
  2. Empower the user to limit what files they send and inspect the list before actually sending them.
  3. As an extra precaution, be careful not to log the contents of the user's files, in case they accidentally sent something sensitive.
  4. Use the archive library carefully to avoid things like directory traversal.

@alexmt alexmt added this to the v2.4 milestone Jan 31, 2022
@leoluz leoluz self-assigned this Feb 2, 2022
@alexmt alexmt modified the milestones: v2.4, v2.5 Jun 21, 2022
notfromstatefarm added a commit to notfromstatefarm/argo-cd that referenced this issue Jul 17, 2022
Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>
crenshaw-dev pushed a commit that referenced this issue Aug 17, 2022
* feat: server-side manifest generation for diff (#8145)

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix docs, mocks, ineffectual err

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix CMPs, ineffectual err

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* refactor

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* add unit tests

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* handle err

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* add size limits and inclusion filters

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix docs

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix errors, increase defaults

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* use quantity, wrap errors, add security fields to logs, deprecation warning

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* have e2e test use server side generation

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* nits

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* remove unused import

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix merge conflict

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix conflicts

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* fix e2e test

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* add deprecation/breaking change info

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* remove security logging stuff, will be in a separate PR

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* more specific docs

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

* add security logging

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>

Signed-off-by: notfromstatefarm <86763948+notfromstatefarm@users.noreply.github.com>
@crenshaw-dev crenshaw-dev reopened this Oct 13, 2022
@crenshaw-dev
Copy link
Collaborator Author

Not totally fixed. We should do this: #10936

@crenshaw-dev crenshaw-dev modified the milestones: v2.5, v2.6 Oct 13, 2022
@joshuasimon-taulia
Copy link

i'm seeing this error

FATA[0002] rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error: calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent f1b4f206df2ea9a29fa819c5a2c6bdc883cd79433bbf9ad36dd11e600306d4e7

when running argocd app diff jx-preprod --server-side-generate --local "$PWD" from the directory that contains my helmfile.yaml. my app uses this helmfile CMP

this is the argocd application object

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  creationTimestamp: "2022-10-29T22:04:37Z"
  generation: 2847
  name: jx-preprod
  namespace: argocd
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ApplicationSet
    name: dev
    uid: 8f4d596e-8ffd-452c-b2d0-f4a5dedbf2e1
  resourceVersion: "724952059"
  uid: 3ee646fa-67f2-4787-ab73-6e34bdbb1dbc
spec:
  destination:
    namespace: jx
    server: https://kubernetes.default.svc
  ignoreDifferences:
  - group: admissionregistration.k8s.io
    jqPathExpressions:
    - .webhooks[].namespaceSelector.matchExpressions[] | select(.key == "control-plane")
    kind: MutatingWebhookConfiguration
    name: istio-sidecar-injector
  project: dev
  source:
    path: helmfiles/jx
    plugin:
      env:
      - name: HELMFILE_USE_CONTEXT_NAMESPACE
        value: "true"
      - name: HELM_TEMPLATE_OPTIONS
        value: --skip-tests
      name: helmfile
    repoURL: https://github.com/my-org/jenkins-x-dev
    targetRevision: HEAD
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  sourceType: Plugin
  summary:
    externalURLs:
    - https://bucketrepo-jx.dev.my-org.com
    - https://dashboard-jx.dev.my-org.com
    - https://hook-jx.dev.my-org.com/zzhookzz
    - https://lighthouse-jx.dev.my-org.com
    images:
    - chartmuseum/chartmuseum:v0.12.0
    - docker.io/bitnami/external-dns:0.11.1-debian-10-r23
    - ghcr.io/jenkins-x-plugins/jx-preview:0.0.225
    - ghcr.io/jenkins-x/bucketrepo:0.1.67
    - ghcr.io/jenkins-x/jx-boot:3.7.8
    - ghcr.io/jenkins-x/jx-build-controller:0.4.7
    - ghcr.io/jenkins-x/jx-pipelines-visualizer:1.8.2
    - ghcr.io/jenkins-x/jx-slack:0.2.1
    - ghcr.io/jenkins-x/lighthouse-foghorn:1.10.6
    - ghcr.io/jenkins-x/lighthouse-gc-jobs:1.10.6
    - ghcr.io/jenkins-x/lighthouse-keeper:1.10.6
    - ghcr.io/jenkins-x/lighthouse-tekton-controller:1.10.6
    - ghcr.io/jenkins-x/lighthouse-webhooks:1.10.6
    - ghcr.io/jenkins-x/lighthouse-webui-plugin:0.1.7
  sync:
    comparedTo:
      destination:
        namespace: jx
        server: https://kubernetes.default.svc
      source:
        path: helmfiles/jx
        plugin:
          env:
          - name: HELMFILE_USE_CONTEXT_NAMESPACE
            value: "true"
          - name: HELM_TEMPLATE_OPTIONS
            value: --skip-tests
          name: helmfile
        repoURL: https://github.com/my-org/jenkins-x-dev
        targetRevision: HEAD
    revision: a8f2bfc348431499db186a2f1a3ba95d58926fce
    status: OutOfSync
 ✗ argocd version
argocd: v2.5.1+504da42
  BuildDate: 2022-11-01T21:30:52Z
  GitCommit: 504da424c2c9bb91d7fb2ebf3ae72162e7a5a5be
  GitTreeState: clean
  GoVersion: go1.18.7
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.5.0+b895da4
  BuildDate: 2022-10-25T14:40:01Z
  GitCommit: b895da457791d56f01522796a8c3cd0f583d5d91
  GitTreeState: clean
  GoVersion: go1.18.7
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v4.5.7 2022-08-02T16:35:54Z
  Helm Version: v3.10.1+g9f88ccb
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.18.0

@crenshaw-dev crenshaw-dev added the component:cmp Config Management Plugin related issues label Nov 4, 2022
@crenshaw-dev
Copy link
Collaborator Author

You get that consistently? The checksum check is meant to prevent tampering or data corruption. I'd be really surprised to see it consistently.

Can you run a hard refresh to make sure you don't keep getting a cached error?

@joshuasimon-taulia
Copy link

i tried

  • hard refresh in the ui
  • adding --refresh to the cli command
  • narrowing down the files sent with --local-include helmfile.yaml
    same result. the command also fails against a different app
argocd app diff istio-system-preprod --server-side-generate --local "$PWD" --refresh --local-include helmfile.yaml
WARN[0001] spec.plugin.name is set, which means this Application uses a plugin installed in the argocd-cm ConfigMap. Installing plugins via that ConfigMap is deprecated in Argo CD v2.5. Starting in Argo CD v2.6, this Application will fail to sync. Contact your Argo CD admin to make sure an upgrade plan is in place. More info: https://argo-cd.readthedocs.io/en/latest/operator-manual/upgrading/2.4-2.5/
FATA[0005] rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error: calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent c897b354287cb29d7886a775131f3514113e7e1a35e48c87b1ff6b5f09a3ae0e

argocd log logs

argocd-server-86c9cf5874-8p8sw server time="2022-11-04T19:47:13Z" level=info msg="received unary call /cluster.SettingsService/Get" grpc.method=Get grpc.request.content= grpc.service=cluster.SettingsService grpc.start_time="2022-11-04T19:47:13Z" span.kind=server system=grpc
argocd-server-86c9cf5874-8p8sw server time="2022-11-04T19:47:13Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Get grpc.service=cluster.SettingsService grpc.start_time="2022-11-04T19:47:13Z" grpc.time_ms=8.785 span.kind=server system=grpc
argocd-server-86c9cf5874-8p8sw server time="2022-11-04T19:47:13Z" level=info msg="received streaming call /application.ApplicationService/GetManifestsWithFiles" grpc.method=GetManifestsWithFiles grpc.request.content="query:<name:\"istio-system-preprod\" checksum:\"c897b354287cb29d7886a775131f3514113e7e1a35e48c87b1ff6b5f09a3ae0e\" appNamespace:\"\" > " grpc.service=application.ApplicationService grpc.start_time="2022-11-04T19:47:13Z" span.kind=server system=grpc
argocd-server-86c9cf5874-8p8sw server I1104 19:47:14.884336       1 request.go:601] Waited for 1.015823353s due to client-side throttling, not priority and fairness, request: GET:https://172.31.232.1:443/apis/security.istio.io/v1beta1?timeout=32s
argocd-server-86c9cf5874-8p8sw server time="2022-11-04T19:47:15Z" level=error msg="Partial success when performing preferred resource discovery" error="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request"
argocd-repo-server-545b5785b4-jp2ld repo-server time="2022-11-04T19:47:15Z" level=error msg="finished streaming call with code Unknown" error="error receiving manifest file stream: error receiving tgz file: file checksum validation error: calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent c897b354287cb29d7886a775131f3514113e7e1a35e48c87b1ff6b5f09a3ae0e" grpc.code=Unknown grpc.method=GenerateManifestWithFiles grpc.service=repository.RepoServerService grpc.start_time="2022-11-04T19:47:15Z" grpc.time_ms=1.423 span.kind=server system=grpc
argocd-server-86c9cf5874-8p8sw server time="2022-11-04T19:47:15Z" level=error msg="finished streaming call with code Unknown" error="rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error: calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent c897b354287cb29d7886a775131f3514113e7e1a35e48c87b1ff6b5f09a3ae0e" grpc.code=Unknown grpc.method=GetManifestsWithFiles grpc.service=application.ApplicationService grpc.start_time="2022-11-04T19:47:13Z" grpc.time_ms=1540.895 span.kind=server system=grpc

is there an example of the directory context and files to include when using --server-side-generate?

@arturhoo
Copy link
Contributor

Following the original motivation, what do you think of app manifests also accepting --server-side-generate?

@cilindrox
Copy link
Contributor

Also seeing the "file checksum validation error" consistently when running app diff --local apps/foo --server-side-generate both locally and on the CI server (ephemeral containers, git clone -> helm dep up -> argocd app diff)

FATA[0004] rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error

Seems the crc is not being properly computed or there's a difference in the way it's being generated.

@crenshaw-dev
Copy link
Collaborator Author

@cilindrox can you check the server-side log and post the checksum values?

I did a deep dive on this error with @joshuasimon-taulia and found that when the API server receives the streamed file it receives metadata but no actual file contents. So its checksum is always the checksum of an empty array of bytes.

I still have no idea why that is happening.

@cilindrox
Copy link
Contributor

cilindrox commented Nov 22, 2022

Thanks for the quick reply @crenshaw-dev - I'm not seeing any relevant logs on app.kubernetes.io/name=argocd-server - (checked every other instance, just in case and still no dice).

Here's the crc client-side:

calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent cc547b59cb862b3f3b389d4cbbed234d27f0282e3de4eab31833fd2bdb7e1ce8

Should I be running this with a debug log level or similar?

Edit: lack of logs got me thinking - could this be getting filtered at the ingress level? ie: the payload's not reaching the server?

@crenshaw-dev
Copy link
Collaborator Author

Yep, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is the same "calculated" checksum as what @joshuasimon-taulia was getting - zero bytes.

Should I be running this with a debug log level or similar?

Wouldn't hurt, but I don't think there will be much more info. I think the issue is pretty low-level.

could this be getting filtered at the ingress level? ie: the payload's not reaching the server?

Quite possibly. I doubt that the server is receiving the bytes and then just ignoring them. I bet they're not making it over the network.

@joshuasimon-taulia
Copy link

joshuasimon-taulia commented Nov 22, 2022

could this be getting filtered at the ingress level? ie: the payload's not reaching the server?

we tried the same exercise using kubectl port-forward to bypass nginx ingress, and the results were still the same

@wtam2018 wtam2018 removed this from the v2.6 milestone Dec 12, 2022
@james-trousdale-lyb
Copy link

I hit the same issue in a scenario where we are using --port-forward and --plaintext on the argocd CLI - with the same exact checksum for the empty array of bytes.

For now, we'll skirt around this issue in our use case (which is to show the diffs on pull requests pre-merge) by using the --revision flag, but it'd be nice to be able to run it locally, obviously.

@lblazewski
Copy link

Hi!

I also got into the error of checksums not matching today when trying to figure out the diff from local path. I see that the issue referenced in this PR was removed from the v2.6 milestone but the warning about not using --server-side-generate is still present and it clearly says that it will become the default in 2.6.

Does it mean that this issue will be addressed before v2.6 is released? Since otherwise once v2.6 would be released argocd app diff --local would be broken.

@rassie
Copy link

rassie commented May 9, 2023

Having encountered this checksum problem myself, I can contribute to the confusion a bit: in my environment I could narrow this particular problem down to incorrectly setup non-web gRPC ingress. After fixing my setup, I could verify that app diff --local . works (for various stages of "works", I'm having a different problem after that step) without --grpc-web and doesn't (EOF) with --grpc-web.

@cha7ri
Copy link

cha7ri commented May 15, 2023

For my case it works if if I use --port-forward and skip the ingress. @rassie how did you fix your ingress? Are you using nginx ingress controller?

@rassie
Copy link

rassie commented May 15, 2023

@cha7ri I'm on k3d/k3s, so it's a Traefik-based ingress controller. It seem that the Ingresses provided by the Helm chart have been incomplete, so I've bit the bullet and implemented Traefik Ingress as described in the docs.

@thomassandslyst
Copy link

I'm also getting this issue where the file streaming part of diff --server-side-generate through the grpc-web just doesn't work and the end result is a different checksum.

We cannot use pure grpc as our ArgoCD is behind Cloudflare Access which doesn't support any grpc.

@crenshaw-dev
Copy link
Collaborator Author

Spreading the knowledge: grpc-web doesn't support client streaming, except by websocket: #12032 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component:cmp Config Management Plugin related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.