Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Declarative helm repositories with missing secret causes all repositories in ArgoCD to lock (#3492) #5363

Merged
merged 16 commits into from Feb 16, 2021

Conversation

jangraefen
Copy link
Contributor

@jangraefen jangraefen commented Jan 31, 2021

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • My build is green (troubleshooting builds).

Issue details: #3492

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@codecov-io
Copy link

Codecov Report

Merging #5363 (e5fa559) into master (baa0f2e) will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5363      +/-   ##
==========================================
+ Coverage   40.92%   40.98%   +0.06%     
==========================================
  Files         137      137              
  Lines       18562    18566       +4     
==========================================
+ Hits         7596     7610      +14     
+ Misses       9881     9876       -5     
+ Partials     1085     1080       -5     
Impacted Files Coverage Δ
util/db/db.go 84.00% <100.00%> (+20.36%) ⬆️
util/db/repository.go 66.05% <100.00%> (+2.32%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baa0f2e...e5fa559. Read the comment docs.

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@jangraefen jangraefen marked this pull request as ready for review January 31, 2021 20:46
@jangraefen jangraefen marked this pull request as draft January 31, 2021 20:53
Just logging the error will be a bad user
experience, since it provides no direct feedback
as before.

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@jangraefen
Copy link
Contributor Author

I am especially interessted in feedback about my handling in the error case. Currently, if a secret cannot be resolved, the user sees no repositories at all, but gets an error that explains the cause: A missing secret. With my fix, the repository will have a connection state set to failed. I test for unintended site effects but could not find any, but I may be missing something?

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@jangraefen jangraefen marked this pull request as ready for review February 1, 2021 18:37
@jannfis
Copy link
Member

jannfis commented Feb 4, 2021

Hey @jangraefen, thank you for this awesome PR! Nice to see that you are actually increasing the test coverage, that is something much appreciated!

As for feedback on your error handling: I like the fact that it's exposed as connection failure, but I think we should not expose the gory details to the user. That's probably more confusing than helping, and also provides some intimate details about the cluster assets that should not be exposed to prying eyes. So my proposal in this case would be, to set the reason of the connection failure to something along the lines Configuration error - check server logs and actually log the details of it as warning using log.Warnf() or a similar method.

Instead of displaying a technical error message
that might expose critical information about the
cluster, we only display a generic error message.
The actual error is then logged to the server
logged, so that an administrator can further
drill down into the problem

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@jangraefen
Copy link
Contributor Author

Thank you @jannfis for your feedback!

My line of thinking was, that the actual error message was exposed before, so it would be nice to not have less information then before on the actual cause. I do however realize, that this information exposed just because of the very bug I fixed. Good catch! 👍

I changed the message to the one you proposed, with a "please" added, to be more polite :-). The logging is right below.

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
This reverts commit a38ff65

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@jangraefen
Copy link
Contributor Author

@jannfis Can you maybe have a look? For some reason the CodeQL anaylsis is failing and I have no idea why? The code seem completly unrelated to my changed. I still commented out my changed that I might since the analysis is failing, still no luck.

Has CodeQL updated its database and this is not related to my change? I am completely lost 😢

@jangraefen
Copy link
Contributor Author

I confirmed that the CodeQL error also occure on the master branch, by running the checks locally. How do I best proceed with this PR?

@jannfis
Copy link
Member

jannfis commented Feb 6, 2021

@jangraefen Never mind the CodeQL issue, I will take care of it (I had a quick look and it's most likely a false positive anyway).

@jannfis jannfis self-requested a review February 6, 2021 11:11
Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR @jangraefen - looks great so far. I have some minor comments and one topic up for discussion, please see below for more details.

util/db/db.go Outdated Show resolved Hide resolved
util/db/repository.go Outdated Show resolved Hide resolved
util/db/repository.go Outdated Show resolved Hide resolved
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
@jannfis
Copy link
Member

jannfis commented Feb 12, 2021

@jangraefen So I finally came around to test this PR locally, but somehow the detailed error message will still propagate to the client:

[vagrant@argocd-dev argo-cd]$ ./dist/argocd --server 127.0.0.1:8080 --plaintext --insecure repo list
TYPE  NAME  REPO                                            INSECURE  OCI    LFS    CREDS  STATUS  MESSAGE
git         https://github.com/jannfis/argocd-example-apps  false     false  false  false  Failed  Unable to connect to repository: secret "repo-2563674859" did not contain key "username"

@jangraefen
Copy link
Contributor Author

@jangraefen So I finally came around to test this PR locally, but somehow the detailed error message will still propagate to the client

Hey @jannfis,

thank you very much for having a look. I had another close look at the source code to see if I could trace the problem, but came up empty handed. I will try to reproduce it with a running instance this evening.

Just to make sure their is something obvious going wrong: Since it was programmed this way in a previous iteration, did you maybe had the code checked out and just need to give it a pull?

@jannfis
Copy link
Member

jannfis commented Feb 12, 2021

@jangraefen I did test it on the latest commits from the PR (pulled using GitHub CLI), and I validated that I'm at the tip of that branch. I will have a look too on what's going on.

@jangraefen
Copy link
Contributor Author

@jannfis Thank you for clarifying 🙂. I am sure it is something stupid on my part and I will definitly dive into this tonight.

@jangraefen
Copy link
Contributor Author

@jannfis I found the issue. My code is working just fine, but in the repository Server.getConnectionState we assemble a connection state as well. When refreshing the repositories, this is actually the code that is producing the text for the message. This was not visible previously, since listing all repositories would have errored. Sloppy on my part for not noticing that beforehand. Sorry for wasting your time.

Anyway, we cannot employ the same tactic as before. The error we get now could be Kubernetes related (secret missing, secret misconfiged, etc.) or it could be Git related (connection error, etc.). To distinguish here we would need to introduce a custom error type here, just as you suggested in the review.

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for working this through with me, @jangraefen. Much appreciated!

@jannfis jannfis merged commit d342993 into argoproj:master Feb 16, 2021
@jangraefen jangraefen deleted the fix/missing-repocreds-crash branch February 16, 2021 07:32
@jangraefen
Copy link
Contributor Author

@jannfis And thank you for reviewing this PR and being to patient with me 🙂.

shubhamagarwal19 pushed a commit to shubhamagarwal19/argo-cd that referenced this pull request Apr 15, 2021
…ositories in ArgoCD to lock (argoproj#3492) (argoproj#5363)

* Add test for get repository credentials

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Log error on missing repository credentials

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Fix import formatting

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Use connection state instead of logging

Just logging the error will be a bad user
experience, since it provides no direct feedback
as before.

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Fix test to check for connection state

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Do not expose technical message directly

Instead of displaying a technical error message
that might expose critical information about the
cluster, we only display a generic error message.
The actual error is then logged to the server
logged, so that an administrator can further
drill down into the problem

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Adapt tests to new error message

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Retrigger CI pipeline

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* See if I am actually the cause of this error

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Revert changes to evaluate CodeQL result

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Desperate attempt to find the cause of the CodeQL error

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Revert "Desperate attempt to find the cause of the CodeQL error"

This reverts commit a38ff65

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Fix first to review findings

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Propose a better function name and add docu

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Overwrite connection status for refresh as well

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>

* Fix goimports lint issue

Signed-off-by: Jan Graefen <223234+jangraefen@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants