feat: support pod exec terminal logging #9385

Merged
merged 7 commits into from
May 17, 2022

Conversation

smcavallo
Contributor

Signed-off-by: smcavallo smcavallo@hotmail.com

See - #8905
The exec feature is extremely powerful but lacks auditability.
Many orgs will require some auditing and history tracking of who is exec'ing into pods and containers.

This "feature" logs access to the Terminal/Exec feature.
When a terminal session is opened it will generate a log line.
This seemed to be the best place to add this logging and should have enough info for auditing purposes.

Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • [X ] Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • [X ] The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • [X ] I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • [X ] My build is green (troubleshooting builds).

@codecov

codecov bot commented May 12, 2022

Codecov Report

Merging #9385 (09642e7) into master (8cd7d47) will decrease coverage by 0.01%.
The diff coverage is 23.07%.

@@            Coverage Diff             @@
##           master    #9385      +/-   ##
==========================================
- Coverage   45.78%   45.76%   -0.02%     
==========================================
  Files         220      220              
  Lines       26165    26186      +21     
==========================================
+ Hits        11979    11985       +6     
- Misses      12529    12544      +15     
  Partials     1657     1657              
Impacted Files                          Coverage Δ
applicationset/generators/cluster.go    76.56% <ø> (ø)
server/application/application.go       31.83% <ø> (ø)
server/application/terminal.go          7.57% <23.07%> (+3.97%) ⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8cd7d47...09642e7. Read the comment docs.

Collaborator

@crenshaw-dev crenshaw-dev left a comment

I think this is a really good idea! Added a couple thoughts.

@crenshaw-dev crenshaw-dev added the cherry-pick/2.4 Candidate for cherry picking into the 2.4 release branch label May 12, 2022
@smcavallo
Contributor Author

@crenshaw-dev - The security scan flagged the new log lines with:

Log entries created from user input High
This log write receives unsanitized user input from here.

Since we are sending the URL parameters verbatim to both the logs and the Kubernetes API, it is unsafe to write them to logs.
It is probably unsafe to send them directly to the Kubernetes API as well. They are passed along to k8sClient.CoreV1().RESTClient().Post(). I'm not sure whether that client has any security built in, but it would be safer to add some sanitizers before sending them along.

The isValidKubernetesResourceName check adds some of that, but I'm calling it out in this PR since it was flagged by the scanner.
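
To illustrate why validating a name before logging addresses the scanner's concern, here is a minimal sketch. It is not the PR's isValidKubernetesResourceName; the helper isLoggableName, the regular expression, and the sample inputs are assumptions made for illustration, loosely modeled on the Kubernetes DNS subdomain naming rule. Because a valid name cannot contain a newline, a value that passes the check cannot forge an extra log line.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain approximates the Kubernetes rule for object names such as
// pod names: lowercase alphanumerics, '-' and '.', starting and ending with
// an alphanumeric character. (Assumption: written here for illustration.)
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// maxNameLength is the upper bound Kubernetes applies to DNS subdomain names.
const maxNameLength = 253

// isLoggableName is a hypothetical helper: it reports whether a
// user-supplied name is well formed and therefore free of newlines or
// other characters that could be used to inject fake log entries.
func isLoggableName(name string) bool {
	return len(name) > 0 &&
		len(name) <= maxNameLength &&
		dns1123Subdomain.MatchString(name)
}

func main() {
	inputs := []string{
		"my-pod-7d9f",                // ordinary pod name
		"pod\nlevel=info msg=forged", // attempted log injection
	}
	for _, name := range inputs {
		// %q escapes control characters, so even rejected input prints safely.
		fmt.Printf("%q valid=%t\n", name, isLoggableName(name))
	}
}
```

Rejecting the request when the check fails, rather than rewriting the value, keeps the logged name identical to the one sent to the Kubernetes API.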

Signed-off-by: smcavallo <smcavallo@hotmail.com>
Collaborator

@leoluz leoluz left a comment

Great PR. Thanks for improving the logs. Added one suggestion.

         break
     }
 }
-if !findContainer {
+if foundContainerName == "" {
     http.Error(w, "Cannot find container", http.StatusBadRequest)

Collaborator

For auditing it would be interesting to also have this logged (as a warning) with all info: cluster, namespace, pod name and container name.

Collaborator

CodeQL might complain about logging the un-sanitized container input. But since we're validating the container name above, I think it would be safe to log (and to override the CodeQL warning).

Contributor Author

@leoluz - yes, this is the issue we were trying to avoid: "Log entries created from user input (High): This log write receives unsanitized user input from here."
It is unsafe to log verbatim whatever was posted in the URL params, which is why it is not logged.
In theory this should rarely happen, as most requests should come directly from the argocd application itself and only post the namespace + pod + container already found in argocd. I agree it would be useful for debugging, though. If a user really wants to know why the container can't be found, I wonder whether capturing the HTTP request at some level (e.g. wireshark/tcpdump) would work instead?

Collaborator

It is unsafe to log verbatim whatever was posted to URL params which is why it is not logged.

@smcavallo but you are already sanitizing it beforehand, aren't you?

Collaborator

Not sanitizing, but validating. Which I think should be enough.

If we're reaching this point of the code, then we've shown the user is authenticated and authorized to get the application and create on the exec resource. I'm not too worried about this user filling up the disk with this log line.

Contributor Author

@leoluz and @crenshaw-dev - that totally makes sense - we've already validated, so it's OK to log these. I have added the additional info to these logs.

Signed-off-by: smcavallo <smcavallo@hotmail.com>
Signed-off-by: smcavallo <smcavallo@hotmail.com>
@@ -150,7 +153,8 @@ func (s *terminalHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {

     pod, err := kubeClientset.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
     if err != nil {
         http.Error(w, "Cannot find pod: "+podName, http.StatusBadRequest)
+        fieldLog.Warn("Terminal Pod Not Found")

Collaborator

This is an error and should be logged like so:

fieldLog.Errorf("error retrieving pod %s: %s", podName, err)

Contributor Author

@leoluz - thank you for checking again - we are already logging the podName. Because fieldLog is built with WithFields, every call on it writes ALL the fields (user, namespace, pod, container) to the log line. From the WithFields documentation:

"Note that it doesn't log until you call Debug, Print, Info, Warn, Fatal or Panic on the Entry it returns."
I left the message as "Terminal Pod Not Found" to improve searchability and to allow standardized alerts based on that key - we can assume the format of that structured log will stay the same and build alerts on it.

I defer to the argocd folks, but the important distinction here is user error vs application error.
It makes sense that this throws an http error to the user.
However it is not actually an application error - the application is performing normally.
From my perspective it would be better to be Warn instead of Error for application log level.
As an operator I would want to know that this is happening (Warn) but I would not consider this an indicator that argocd is unhealthy, having an issue, or doing something broken or unexpected.

Let me know if I should change it or if the above doesn't make sense.

Collaborator

I agree that we don't need the pod name, as it is already part of the registered fields. However, I believe this still needs to be logged as an error. It is ArgoCD internal code that calls kube-api to retrieve the pod by its name. If this request fails for some reason, it is an internal error and should be logged as such. The call to retrieve the pod can fail for several reasons, and logging it as "terminal pod not found" is misleading.
I suggest changing this log to:

fieldLog.Errorf("error retrieving pod: %s", err)

By the way, log messages should be all lowercase sentences.

Collaborator

@leoluz what if a Pod is in the resource tree but not synced? In that case an error from the k8s API would be a user error rather than an application error.

Contributor Author

@leoluz - good point - I have updated it and also changed the log messages to all lowercase

Collaborator

@leoluz leoluz May 17, 2022

@leoluz what if a Pod is in the resource tree but not synced? In that case an error from the k8s API would be a user error rather than an application error.

@crenshaw-dev My understanding is that (contrary to HTTP return codes) from the log perspective it doesn't matter whether the error was caused by the user or by something internal. Our code invoked a method that returned an error, and in this case an error should be logged. However, there is another "subtle" problem here: client-go returns an error if, for example, the resource isn't found. This isn't an exceptional scenario, and if we want to be precise about whether to log an error (and which HTTP return code to use), we need an extra check:

if err != nil {
    if apimachineryerrors.IsNotFound(err) {
        // don't log error
        // http 404
    } else {
        // log error
        // http 5xx
    }
}

Signed-off-by: smcavallo <smcavallo@hotmail.com>
Signed-off-by: smcavallo <smcavallo@hotmail.com>
Collaborator

@leoluz leoluz left a comment

LGTM

@crenshaw-dev crenshaw-dev merged commit 23d9cf2 into argoproj:master May 17, 2022
crenshaw-dev pushed a commit that referenced this pull request May 31, 2022
* feat: support pod exec terminal logging
Signed-off-by: smcavallo <smcavallo@hotmail.com>

* enhanced validation and logging when resource not found
Signed-off-by: smcavallo <smcavallo@hotmail.com>

* fix lint
Signed-off-by: smcavallo <smcavallo@hotmail.com>

* log warning when pod or container not found
Signed-off-by: smcavallo <smcavallo@hotmail.com>

* go/log-injection fixes
Signed-off-by: smcavallo <smcavallo@hotmail.com>

* log levels and lowercase message
Signed-off-by: smcavallo <smcavallo@hotmail.com>
@crenshaw-dev
Collaborator

Cherry-picked to 2.4.

Labels
cherry-pick/2.4 Candidate for cherry picking into the 2.4 release branch
3 participants