Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add app-id and drain url in the error message #212

Merged
merged 1 commit into from
Jan 27, 2023

Conversation

chombium
Copy link
Contributor

@chombium chombium commented Jan 26, 2023

As discussed in #203 we've seen when an app has multiple syslog drains or one syslog drain was bound to multiple apps in was hard to tell which syslog drain for which app has log drops. We've adjusted the error message which is injected to the applications log, so that this becomes clear.
Now the error messages looks as following:

 2023-01-26T17:08:28.31+0000 [LGR/LGR] ERR 10000 messages lost for application <app-guid> in user provided syslog drain with url <anonymized-syslog-drain-url>

We've adjusted the unit test and did integration test, deployed the new version of the syslog-agent and checked the app logs for an application which produces many logs/s. We've also tested with a syslog url like https://user:pass@example.com?p1=v1 and in the logs we only saw the "base url", https://example.com from the mentioned example. The anonymization that @ctlong implement works properly as well 馃帀

Fixes #203

Description

Please include a summary of the change.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

If you have any questions, or want to get attention for a PR or issue please reach out on the #logging-and-metrics channel in the cloudfoundry slack

- In cases where one application has two or more syslog drains bound, if
there are drops in one of the drains, now, the app owner can see the syslog
url of the drain dropping the logs.
- If one syslog drain is bound to multiple application, and there are
drops in the drain, the app owner will see the app-id in the logs.

Co-authored-by: Felix Hambrecht <Felix.Hambrecht@sap.com>
@Benjamintf1
Copy link
Member

hmm, this seems slightly off to me. When using a app drain, I'd expect to see:
10000 messages lost for application <app-guid> in user provided syslog drain with url <anonymized-syslog-drain-url>
as commented.

However, aggregate drains exist and using some of the same data structures. For aggregate drains, the "appid" is blank. So if you have a aggregate drain, I think you would see
10000 messages lost for application in user provided syslog drain with url <anonymized-syslog-drain-url>
which might be a bit weird/confusing.

Otherwise, love it.

@chombium
Copy link
Contributor Author

chombium commented Jan 26, 2023

@Benjamintf1 good catch. I guess we'll have to adapt the text of the message depending if the app id is blank or not.

I would say that as the aggregate drains are configured by a cf operator and are generally used for central logging platforms, the cf operators would check the state of the drains by adding monitoring and alerting based on the ingress, egress and dropped metrics. This error message would be essential for "external users" who do:

cf cups mydrain -l syslog://mydrain.com
cf bind-service myapp mydrain

@ctlong ctlong moved this from Reviewer Assigned to Review in Progress in DEPRECATED App Platform - Logging and Metrics Jan 26, 2023
@Benjamintf1
Copy link
Member

yeah, you're right. I think maybe ignore my objection lol.

@Benjamintf1
Copy link
Member

If you want to be fancy, maybe switch the code so that it emits the code under the syslog agent itself for aggregate drains and make the app message make sense?

@Benjamintf1
Copy link
Member

Benjamintf1 commented Jan 26, 2023

I'm going to wait for the pr pipeline to run and merge in the morning tomorrow if it succeeds.

@chombium
Copy link
Contributor Author

chombium commented Jan 27, 2023

I've checked what would it look like to move the code which formats the log messages and I don't think it's worth moving anything:

  • The SyslogConnector.Connect() which creates a DiodeWriter based on the binding configuration is called only twice in the Syslog Manager: once for app drains and the second time for aggregate drains. When we change something we'll have to duplicate the code of the function
  • If we define the error messages in the calls in the Binding Manager, the code would be a bit unreadable as the messages will be formatted once with the binding related data and than in the SyslogConnector.Connect() the number of the dropped messages will be added to the message.
  • There is a way how to attach the alerter function to an existing instance DiodeWriter and the OneToOneEnvelopeV2 diode, so we cannot create the DiodeWriter and than attach an alerter function with the properly formatted error messages

It would have been nice if we could've added an alerting function to an existing instance of a DiodeWriter/OneToOneEnvelopeV2 diode, but it seems to me, to be able to to that, we'll need to refactor the code.cloudfoundry.org/go-diodes first.

If we want to emit the error message for the aggregate drains as well, we could emit different messages based on the appID

func (w *SyslogConnector) Connect(ctx context.Context, b Binding) (egress.Writer, error) {
...
	dw := egress.NewDiodeWriter(ctx, writer, diodes.AlertFunc(func(missed int) {
		w.droppedMetric.Add(float64(missed))
		drainDroppedMetric.Add(float64(missed))

		var logMsg string
		if b.AppId == "" {
			// aggregate drain
			logMsg = fmt.Sprintf(
				"Dropped %d %s logs for aggregate drain with url %s",
				missed, urlBinding.Scheme(), anonymousUrl.String(),
			)
		} else {
			// application drain
			logMsg = fmt.Sprintf("%d messages lost for application %s in user provided syslog drain with url %s", missed, b.AppId, anonymousUrl.String())
		}
		w.emitLoggregatorErrorLog(b.AppId, logMsg)

		w.emitStandardOutErrorLog(b.AppId, urlBinding.Scheme(), anonymousUrl.String(), missed)
	}), w.wg)
...

and the emitLoggregatorErrorLog could be:

func (w *SyslogConnector) emitLoggregatorErrorLog(appID, message string) {
	if appID == "" {
		w.logClient.EmitLog(message)
		return
	}

	option := loggregator.WithAppInfo(appID, "LGR", "")
	w.logClient.EmitLog(message, option)

	option = loggregator.WithAppInfo(
		appID,
		"SYS",
		w.sourceIndex,
	)
	w.logClient.EmitLog(message, option)
}

TBH, I'm not sure if w.logClient.EmitLog(message) will work at all. I'm wondering if emitting log messages without AppID will work and if we can attach a log message to an aggregate drain directly without appID. As far as I know, the logs are always associated with a source.

@Benjamintf1 Do you maybe have some other, better idea?

@Benjamintf1 Benjamintf1 merged commit 69dec6f into cloudfoundry:main Jan 27, 2023
DEPRECATED App Platform - Logging and Metrics automation moved this from Review in Progress to Done Jan 27, 2023
@Benjamintf1
Copy link
Member

For aggregate drains, you'd just want to emit it with syslog agent's source ID instead of the application id.

That said, given we don't emit it at all right now, we can handle that with a different issue/pr. That is, IF we want to do that at all, which is maybe unnecessary.

@Benjamintf1
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Syslog Log drop error message for app with multiple syslog drains
2 participants