Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display a warning in status when too many log tail #2265

Merged
merged 9 commits into from Sep 18, 2018

Conversation

achntrl
Copy link
Contributor

@achntrl achntrl commented Sep 3, 2018

What does this PR do?

This PR displays a warning in the agent status when the user is trying to tail more files than the logs_config.open_files_limit is allowing

Motivation

For now, this happens almost silently, the only way to find out that not all files are tailed is to read the logs of agent. This PR makes the problem more visible

@tmichelet
Copy link
Contributor

@achntrl thanks for working on this! Someone from the team will review the code shortly, in the meantime, would you mind adding some screenshots of the agent status, to show how it displays? It helps the reviewer to picture how it would look like :)

@achntrl
Copy link
Contributor Author

achntrl commented Sep 3, 2018

Terminal status page :

image

Html status page :

capture d ecran 2018-09-03 a 14 29 12

@achntrl
Copy link
Contributor Author

achntrl commented Sep 3, 2018

I found a small bug: If you have 2 folders with > 100 files, then you delete files and you go under 100 files then you add more than 100 files in the first folder and some files in the second folder, the counter for the second folder doesn't get updated

It seems very specific but it's because of this loop

It shouldn't be too hard to fix, I'll push a fix tomorrow (and hopefully a test!)

@tmichelet
Copy link
Contributor

This is great, adding the number of tailed files per wildcard tailer is more than I hoped we'd do, 👏 .

messages = append(
messages,
fmt.Sprintf(
"The default limit on the maximum number of files in use (%d) has been reached. Increase this limit (thanks to the attribute logs_config.open_files_limit in datadog.yaml) or decrease the number of tailed file.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still reach the limit if we override the files limit -> it's not always the default limit that has been reached

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the number displayed is the actual configured number, I will update the wording

if count == len(inputs) {
return fmt.Sprintf("This path tails %d files", count)
}
return fmt.Sprintf("This path tries to tail %d files but only %d files are tailed", count, len(inputs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General nit: I'd avoid tries to and but only in agent statuses, as those tend to be blurry words.
What about something like %d files tailed out of %d files matching that we could use for both cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path .. -> %s ..." %{path}

Copy link
Contributor

@ajacquemot ajacquemot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this feature, the main logic is here but I'd like to raise any kind of warning from any kind of component, could you address this problem too please ?

@@ -111,3 +119,17 @@ func toDictionary(c *config.LogsConfig) map[string]interface{} {
}
return dictionary
}

func generateOpenFilesLimitReachedWarning(c *config.LogsConfig, inputs []string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should not be here, the status should be agnostic of the type of warning.

Configuration: toDictionary(source.Config),
Status: status,
Inputs: inputs,
WarningMessage: generateOpenFilesLimitReachedWarning(source.Config, inputs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect something like:

WarningMessage: source.Warning.Render(),

if count == len(inputs) {
return fmt.Sprintf("This path tails %d files", count)
}
return fmt.Sprintf("This path tries to tail %d files but only %d files are tailed", count, len(inputs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path .. -> %s ..." %{path}

@@ -34,6 +35,7 @@ func NewFile(path string, source *config.LogSource) *File {
type Provider struct {
filesLimit int
shouldLogErrors bool
reachedLimit bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not used

@@ -71,9 +73,11 @@ func (p *Provider) FilesToTail(sources []*config.LogSource) []*File {

if len(filesToTail) == p.filesLimit {
log.Warn("Reached the limit on the maximum number of files in use: ", p.filesLimit)
status.Warnings.RaisedWarnings[status.OpenFilesLimitReachedType] = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the interface of Warnings was status.Warnings.Raise(key string, message string), we could add any kind of message

return filesToTail
}

status.Warnings.RaisedWarnings[status.OpenFilesLimitReachedType] = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There it would be status.Warnings.Solve(key string)

Configuration map[string]interface{} `json:"configuration"`
Status string `json:"status"`
Inputs []string `json:"inputs"`
WarningMessage string `json:"warning"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should have a Warning struct here that is renderable.


// Warning types
const (
OpenFilesLimitReachedType = "open_files_limit_reached"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Warnings should be aware of the kind of warning.

@@ -4,6 +4,11 @@ NOTE: Changes made to this template should be reflected on the following templat
*/}}==========
Logs Agent
==========
{{- if .warnings }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this block after line 9 because can't raise warnings if the agent is not running

@@ -4,6 +4,11 @@ NOTE: Changes made to this template should be reflected on the following templat
*/}}==========
Logs Agent
==========
{{- if .warnings }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want {{ if .warnings }}

@achntrl achntrl force-pushed the achntrl/tailed-logs-warning-in-status branch 2 times, most recently from 905e242 to 8cf832b Compare September 4, 2018 16:16

// Remove marks a RaisedWarning as solved
func Remove(key string) {
delete(w.raised, key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not safe, you should check if the entry exists first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the entry does not exists, delete is a noop see

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// This product includes software developed at Datadog (https://www.datadoghq.com/).
// Copyright 2018 Datadog, Inc.

package warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package needs to be unit-tested, especially the operations on the map since multiple clients on different threads could call Raise and Get roughly at the same time.
For more details, you can check https://stackoverflow.com/questions/36167200/how-safe-are-golang-maps-for-concurrent-read-write-operations

@ajacquemot
Copy link
Contributor

This looks great and fully reusable, I left two comments but we're heading to the right direction.
Thanks

// This product includes software developed at Datadog (https://www.datadoghq.com/).
// Copyright 2018 Datadog, Inc.

package warning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think we should move this package/file to ./pkg/logs/status

@achntrl achntrl force-pushed the achntrl/tailed-logs-warning-in-status branch from a1de789 to ec760b4 Compare September 5, 2018 08:38
@ajacquemot ajacquemot added this to the 6.6.0 milestone Sep 5, 2018
return "foo"
}

func TestRaise(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not test the inner map of the warnings struct but just that Raise followed by a Get returns what I expect

assert.Equal(t, []Warning{concreteWarning1, concreteWarning2}, GetWarnings())
}

func TestRemove(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd rather do something like Raise, Get, Remove, Get

ajacquemot
ajacquemot previously approved these changes Sep 5, 2018
Copy link
Contributor

@ajacquemot ajacquemot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@mnshdw mnshdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code would be much simpler if we concentrated on having the possibility to display an arbitrary list of messages. It would be much more re-usable and we still can have the file provider use it to register a warning when we are tailing too many files.

Integrations []Integration `json:"integrations"`
IsRunning bool `json:"is_running"`
Integrations []Integration `json:"integrations"`
WarningMessages []string `json:"warnings"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could have been simply generalized into Messages.
If we need a severity we can always add that later.

var w = newWarnings()

// Warning is a generic interface that generate warning messages
type Warning interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a string so I wonder if the abstraction is really necessary.

Config *LogsConfig
Status *LogStatus
inputs map[string]bool
Overview string
Copy link
Contributor

@mnshdw mnshdw Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Messages would be more appropriate here as well as it doesn't convey additional semantics that aren't necessarily true here.

@@ -61,6 +61,7 @@ func Get() Status {
}
// Convert to json
var integrations []Integration
warningsDeduplicator := make(map[string]bool)
Copy link
Contributor

@ajacquemot ajacquemot Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: make(map[string]bool) -> make(map[string]struct{})

})

for _, warning := range source.Messages.GetWarnings() {
warningsDeduplicator[warning] = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warningsDeduplicator[warning] = true -> warningsDeduplicator[warning] = struct{}{}

warningMessages := make([]string, len(warnings))
for i, w := range warnings {
warningMessages[i] = w.Render()
warnings := make([]string, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warnings := make([]string, 0) -> var warnings []string

WarningMessages: warningMessages,
IsRunning: true,
Integrations: integrations,
Messages: warnings,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit misleading, I am not sure the messages that we want to report at the very beginning of the logs status are always related to logs-sources.
We could for example report to the user that he is using a deprecated attribute and the he should use the new one.

@@ -61,6 +61,7 @@ func Get() Status {
}
// Convert to json
var integrations []Integration
warningsDeduplicator := make(map[string]bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a warning/message was not tight to a source, you'd need to do any deduplicate here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the alternative would have been to compute the warning at the status create but this would require computer to identify LogSource for file an count the inputs

I found it messier than just deduplicating warning messages (which is quite a small operation)

@achntrl achntrl force-pushed the achntrl/tailed-logs-warning-in-status branch from c9ba6dd to 2481ddc Compare September 7, 2018 07:31

// AddWarning create a warning
func (m *Messages) AddWarning(key string, warning string) {
m.AddMessage("warning_"+key, warning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you adding a prefix here ? If it's just for filtering out purpose, I'd rather avoid doing that because a string compare is not efficient.
You could create a structure:

type Message struct {
   Value String
   IsWarning bool
}

And use IsWarning for filtering out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to avoid collision between keys and to reuse the same methods so most warning methods are just calling the message one with the prefix warning_

mnshdw
mnshdw previously approved these changes Sep 7, 2018
"sync"
)

// Messages holds a message message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "message" twice normal?

import "sync"
import (
"sync"
)

// LogSource holds a reference to and integration name and a log configuration, and allows to track errors and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: to "an" integration

@achntrl achntrl merged commit debcc81 into master Sep 18, 2018
@achntrl achntrl deleted the achntrl/tailed-logs-warning-in-status branch September 18, 2018 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants