Internal error reporting

Allow administrators to opt in to providing Datalust with error details when an unhandled error condition or exception occurs within a Seq instance.

Motivation

Telemetry is an important tool for improving the quality of Seq in diverse deployment scenarios. Seq currently writes unhandled error details to its internal log files (C:\ProgramData\Seq\Logs by default), but:

  • Administrators may not be aware of the presence of errors in this file
  • It's laborious to check, collect, and then send these errors to Seq Support, especially because many will be benign/transient conditions
  • In many cases, it's difficult to ascertain whether an error is a once-off issue, or representative of a wider problem

By allowing administrators to opt in to automatic error reporting, we can more efficiently detect and resolve issues with Seq.

Our goal is to enable administrators to turn on error reporting when it is appropriate. We don't expect error reporting to be turned on for all installations: in some high-sensitivity environments, we understand that even reporting of error messages will be undesirable. We want users to make an informed decision to help improve the quality of Seq when it is reasonable to do so.

Detailed design

The proposed method of error reporting is to enable a remote (HTTPS) log server as an additional target for the Error and Fatal-level events normally written to the Seq internal log file, with detailed event properties removed.

Enabling error reporting

By default, no error details will be reported. Seq currently (4.1.x) maintains a strict policy of making no external network calls without an explicit opt-in by an administrator (e.g. for version update checks). Clear and explicit opt-in is a cornerstone of Seq's respect for user privacy and security.

To opt in, an administrative Seq user will open the Settings > Diagnostics page and navigate to the Telemetry heading. Initially, only a single, unchecked box will be presented, with the title:

Enable error reporting

And help-text:

Check this box to automatically report internal errors in the Seq application to the Seq development team. (More >> Error reports will contain a unique identifier for this Seq installation, error messages and contextual information generated by Seq itself, and stack-trace information showing where the error occurred in the Seq program code.)

When the box is checked, an additional text input will appear, with the label and help-text:

Reply email address

If you would like us to contact you about errors generated by this Seq installation, please include an email address where we can reach you.

The first time error reporting is enabled, a 20-character installation id will be generated so that errors from the same instance can be correlated.
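
The proposal does not say how the id is produced. A minimal sketch, assuming a random id drawn from lowercase letters and digits (an alphabet inferred only from the example id in the report sample); the function name is hypothetical:

```python
import secrets
import string

# Assumption: lowercase letters and digits, based on the example id in this RFC.
ID_ALPHABET = string.ascii_lowercase + string.digits
ID_LENGTH = 20


def new_installation_id() -> str:
    """Return a random 20-character id used to correlate reports from one instance."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(ID_LENGTH))
```

The id would be generated once and stored with the other system settings, so that later reports from the same instance carry the same value.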

Error report contents

The example below illustrates the level of detail that will be collected:

Time:      2017-11-15T20:10:19.8362537Z
Version:   4.1.17
Instance:  j00foew4qjofe2ijofp4
Email:     admin@example.com
Event:     Error serving {RequestUrl} (token: {ErrorToken}) 
Exception: System.NullReferenceException: Object reference not set to an instance of an object.
      at Seq.Server.Data.Documents.Signals.Deletions.InstanceSignalDeletedPolicy.<>c__DisplayClass3_0.<CanDelete>b__1()
      at Seq.Server.Web.EntityResourceModule`2.Remove(String id)
      at Seq.Server.Web.Api.Apps.AppsModule.Remove(String id)
      at Nancy.Routing.Route.<>c__DisplayClass4.<Wrap>b__3(Object parameters, CancellationToken context)

(Field names are for descriptive purposes only.)

  • Time - the error timestamp
  • Version - the Seq version that generated the error report
  • Instance - the unique id assigned to the Seq instance reporting the error
  • Email - (if supplied) the reply email address for the administrator of the Seq instance
  • Event - the template describing the error condition, without any parameters substituted
  • Exception - the full exception message and stack trace, or null if no exception is associated with the event

Values of event-specific properties like RequestUrl and ErrorToken will not be added to the report. This is a trade-off: we may wish to mark particular properties as 'safe' in a future iteration of the feature, but for now we can do without this level of detail. (It may be desirable to attach null placeholder values for event-specific properties, so that the event's message template can be considered valid.)
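
One way to picture the property stripping is the sketch below. It assumes a hypothetical in-memory event shape (a dict with Timestamp, MessageTemplate, Exception and Properties keys) and a hypothetical Properties field in the report for the null placeholders; neither is defined by this proposal, and the report field names are descriptive only.

```python
import re
from typing import Any, Dict, Optional

# Matches {Name}, {@Name}, {$Name} and {Name:format}. Escaped braces ("{{")
# are not handled in this simplified sketch.
_PLACEHOLDER = re.compile(r"\{[@$]?(\w+)(?:[,:][^}]*)?\}")


def build_report(
    event: Dict[str, Any],
    version: str,
    instance_id: str,
    email: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a report that keeps the template but none of the property values."""
    template = event["MessageTemplate"]
    names = _PLACEHOLDER.findall(template)
    return {
        "Time": event["Timestamp"],
        "Version": version,
        "Instance": instance_id,
        "Email": email,
        "Event": template,  # template text only, nothing substituted
        "Exception": event.get("Exception"),  # None when no exception is attached
        # Null placeholders keep the template valid without leaking values.
        "Properties": {name: None for name in names},
    }
```

The values in the original event's Properties are never read, so they cannot reach the report by accident.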

Exception messages are the least-controllable data item here; currently, Seq itself avoids including sensitive information in exception messages. By their nature, however, exceptions are unpredictable. Initial implementation of the feature will need to include a review of current exception messages, and we will further need to monitor the content of reported exceptions to limit the amount of user/implementation-specific data that is collected.

Blacklisted components

Known security-sensitive components, such as the authentication providers, will be explicitly excluded from crash reporting (e.g. by pushing Telemetry = false on the internal log context and filtering these events from reporting).

Exceptions generated by plug-in apps will also be excluded, as it's not possible to determine in advance what information might be revealed by a user-created app.

The blacklisted components will be determined by review.
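
The filtering rule could look roughly like the sketch below. The Telemetry property comes from the text above; the event dict shape and the AppInstanceId property used to recognise plug-in app events are assumptions made for illustration only.

```python
from typing import Any, Dict

REPORTED_LEVELS = ("Error", "Fatal")


def is_reportable(event: Dict[str, Any]) -> bool:
    """Return True if an internal log event may be forwarded as an error report."""
    if event.get("Level") not in REPORTED_LEVELS:
        return False
    properties = event.get("Properties", {})
    # Blacklisted components push Telemetry = false onto the log context.
    if properties.get("Telemetry") is False:
        return False
    # Hypothetical marker for events raised by plug-in apps.
    if "AppInstanceId" in properties:
        return False
    return True
```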

Internal crash collection/queueing

To avoid destabilizing the Seq server, or consuming excessive memory or bandwidth, the number of queued/reported exceptions needs to be capped. Starting with very small initial limits seems sensible:

  • Event size limit - 16 kB
  • Reported exceptions - 100/day (therefore a maximum of 100 × 16 kB = 1600 kB of egress per day)
  • Queued exceptions internally awaiting transmission - 10

These values can easily be modified in the future, if desired.
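
A minimal sketch of how the three limits might be enforced together. It assumes that oversized events are dropped rather than truncated, that the daily count is taken when a report is accepted into the queue, and that 16 kB means 16 × 1024 bytes; the proposal does not settle any of these, and the class name is hypothetical.

```python
from collections import deque
from datetime import date
from typing import Deque, Optional


class ReportQueue:
    """Bounded in-memory queue of encoded reports awaiting transmission."""

    def __init__(self, max_event_bytes: int = 16 * 1024,
                 max_per_day: int = 100, max_queued: int = 10) -> None:
        self.max_event_bytes = max_event_bytes
        self.max_per_day = max_per_day
        self.max_queued = max_queued
        self._queue: Deque[bytes] = deque()
        self._day: Optional[date] = None
        self._accepted_today = 0

    def try_enqueue(self, payload: bytes, today: date) -> bool:
        """Accept the payload if every limit allows it; otherwise drop it."""
        if len(payload) > self.max_event_bytes:
            return False
        if today != self._day:
            # A new day starts a fresh daily allowance.
            self._day = today
            self._accepted_today = 0
        if self._accepted_today >= self.max_per_day:
            return False
        if len(self._queue) >= self.max_queued:
            return False
        self._queue.append(payload)
        self._accepted_today += 1
        return True

    def dequeue(self) -> Optional[bytes]:
        """Return the oldest queued payload, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

Dropping instead of blocking means the logging path never waits on the reporter, which fits the goal of not destabilizing the server.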

Transport/protocol

The error reports will be sent in JSON format via HTTPS to a remote collection endpoint.
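
As an illustration only, the transport step could be as small as the sketch below. The collection endpoint is not named in this proposal, so it is passed in by the caller; the function names and the 5-second timeout are assumptions. Any failure is swallowed and reported as False so that an unavailable endpoint cannot affect the server.

```python
import http.client
import json
import urllib.request
from typing import Any, Dict


def encode_report(report: Dict[str, Any]) -> bytes:
    """Serialize a report as compact UTF-8 JSON."""
    return json.dumps(report, separators=(",", ":")).encode("utf-8")


def send_report(endpoint: str, report: Dict[str, Any],
                timeout_seconds: float = 5.0) -> bool:
    """POST the report over HTTPS; return True only on a 2xx response."""
    if not endpoint.lower().startswith("https://"):
        # Refuse to send reports over an unencrypted transport.
        return False
    try:
        request = urllib.request.Request(
            endpoint,
            data=encode_report(report),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False
```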

Data retention

By default, we will retain error reports for 30 days or less. If an error report results in an issue being raised, we will copy non-identifying data to an issue tracker ticket.

Continuity

The feature must not cause any undesirable performance, data transfer, or reliability side-effects if/when the server-side reporting endpoint is switched off or unavailable.

API client update

The system settings for crash reporting will need to be exposed by the Seq.Api C# client.

Drawbacks

Collecting any end-user data introduces privacy and security concerns. The feature design attempts to mitigate these through:

  • User opt-in
  • Limited scope of reporting (unhandled errors only)
  • Exclusion of detailed event properties
  • HTTPS transport
  • Blacklisted components
  • (Server-side) short data retention time

Alternatives

  • Reports could be generated and sent from a separate "monitor" process, particularly so that non-.NET exceptions can be collected and reported; the added implementation cost rules this out in the short-term, but it may be useful as a future enhancement
  • Crash reports could be manually audited by an administrator before being sent; this would create an additional burden on the users' time that may defeat the purpose of the feature, but could also be an option in the future
  • A whitelist could be used, instead of a blacklist, for selecting which components can report crashes; this would be unwieldy, as the number of components we would wish to exclude would be very small

References

Left blank.