Update the README, introduce /rfcs along with the first example RFC (internal error reporting), and include a code of conduct
datalust · Dec 18, 2017 · d9c0e12
1 parent 9e330e6
Showing 4 changed files with 173 additions and 5 deletions.
diff --git a/0000-template.md b/0000-template.md
@@ -0,0 +1,29 @@
+# (Feature Name)
+
+(Summary, one or two sentences describing the proposed feature.)
+
+## Motivation
+
+(Why do we want to do this?)
+
+## Detailed design
+
+(Explain how the feature works, in sufficient detail that an implementation can proceed.
+
+* Examples of usage, _user stories_, sample input/output
+* Interactions with other features
+* Corner cases, and cases determined to be out-of-scope
+* Compatibility, e.g. for existing Seq Apps, consumers of the API
+* Include description of necessary changes to client libraries)
+
+## Drawbacks
+
+(Are there any known drawbacks to implementing the change as proposed?)
+
+## Alternatives
+
+(What alternatives, if any, are available, and why is the proposed change preferred?)
+
+## References
+
+(Source material and further reading.)
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,3 @@
+# Code of conduct
+
+The [Datalust organization](https://github.com/datalust) has adopted the code of conduct defined by the [Contributor Covenant](http://contributor-covenant.org/) [version 1.4](http://contributor-covenant.org/version/1/4/) to clarify expected behavior in our community.
diff --git a/README.md b/README.md
@@ -1,21 +1,33 @@
-# Seq [![Join the chat at https://gitter.im/datalust/seq](https://img.shields.io/gitter/room/datalust/seq.svg)](https://gitter.im/datalust/seq) [![Feature Requests](https://img.shields.io/badge/features-uservoice-orange.svg)](https://seq.uservoice.com) [![Read Documentation](https://img.shields.io/badge/docs-online-blue.svg)](http://docs.getseq.net)
+# Seq [![Join the chat at https://gitter.im/datalust/seq](https://img.shields.io/gitter/room/datalust/seq.svg)](https://gitter.im/datalust/seq) [![Read Documentation](https://img.shields.io/badge/docs-online-blue.svg)](https://docs.getseq.net)
 
-Welcome to the issue tracker for [Seq](https://getseq.net)! You can **[search existing issues](https://github.com/datalust/seq-releases/issues)** or **[report an issue](https://github.com/datalust/seq-releases/issues/new)** here.
+Welcome to the hub for issues, design discussions and the feature roadmap for [Seq](https://getseq.net).
 
 ### What is Seq?
 
 Seq is a log server designed to speed up diagnostics in complex, asynchronous and distributed applications. It has a strong focus on _structured logging_, the idea that log events should carry important information in first-class properties.
 
 ### Bugs and feature requests
 
-If you think you may have found a bug in Seq, this is the place to report it.
+If you think you may have found a bug in Seq, this is the place to report it. You can **[search existing issues](https://github.com/datalust/seq-releases/issues)** or **[raise a new issue](https://github.com/datalust/seq-releases/issues/new)** here.
 
-Please also feel free to suggest features here. We will endeavour to consider and respond to every request we receieve. To keep the volume of items in this tracker manageable, feature requests that we're planning to work on in the very near future are moved to our [UserVoice suggestions](https://seq.uservoice.com).
+Please also feel free to suggest features here. We will endeavour to consider and respond to every request we receive.
 
 ### Documentation and support
 
 The Seq [online documentation](http://docs.getseq.net) has information on all aspects of Seq - it's a great place to start if you are configuring or using Seq for the first time.
 
 If you need help with [Serilog.Sinks.Seq](https://github.com/serilog/serilog-sinks-seq), the [Troubleshooting section](https://github.com/serilog/serilog-sinks-seq#troubleshooting) of the sink documentation has some useful steps for tracking down common issues.
 
-For more information or help with Seq, please feel free to visit our [support forum](http://docs.getseq.net/discuss) or email **support@getseq.net**.
+For more information or help with Seq, please feel free to visit our [support forum](https://docs.getseq.net/discuss) or email **support@getseq.net**.
+
+### Feature roadmap
+
+We maintain [feature milestones](https://github.com/datalust/seq-tickets/milestones?direction=asc&sort=due_date&state=open) to provide an outline of what to expect in upcoming releases.
+
+### Design discussions
+
+For Seq 5.0 onwards, we have adopted an RFC ('request for comment') process so that design discussions for larger features can happen in the open, with full community involvement.
+
+Accepted RFCs are recorded under `/rfcs` in this repository, while new proposals are [discussed via pull requests](https://github.com/datalust/seq-tickets/pulls).
+
+_To propose a feature, please create an issue to discuss it with the Seq development team. We will create RFCs and PRs where necessary._
diff --git a/rfcs/0001-internal-error-reporting.md b/rfcs/0001-internal-error-reporting.md
@@ -0,0 +1,124 @@
+# Internal error reporting
+
+Allow administrators to opt in to providing Datalust with error details when an unhandled error condition or exception occurs within a Seq instance.
+
+## Motivation
+
+Telemetry is an important tool for improving the quality of Seq in diverse deployment scenarios. Seq currently writes unhandled error details to its internal log files (`C:\ProgramData\Seq\Logs` by default), but:
+
+* Administrators may not be aware of the presence of errors in these files
+* It's laborious to check, collect, and then send these errors to Seq Support, especially because many will be benign/transient conditions
+* In many cases, it's difficult to ascertain whether an error is a once-off issue or representative of a wider problem
+
+By allowing administrators to opt in to automatic error reporting, we can more efficiently detect and resolve issues with Seq.
+
+Our goal is to _enable administrators to turn on error reporting when it is appropriate_. We don't expect error reporting to be turned on for all installations: in some high-sensitivity environments, we understand that even reporting of error messages will be undesirable. We want users to make an informed decision to help improve the quality of Seq when it is reasonable to do so.
+
+## Detailed design
+
+The proposed method of error reporting is to enable a remote (HTTPS) log server as an additional target for the `Error` and `Fatal`-level events normally written to the Seq internal log file, with detailed event properties removed.
+
+### Enabling error reporting
+
+By default, no error details will be reported. Seq currently (4.1.x) maintains a strict policy of making no external network calls without an explicit opt-in by an administrator (e.g. for version update checks). Clear and explicit opt-in is a cornerstone of Seq's respect for user privacy and security.
+
+To opt in, an administrative Seq user will open the _Settings_ > _System_ page and navigate to the _Telemetry_ heading. Initially, only a single, unchecked box will be presented, with the title:
+
+> **Enable error reporting**
+
+And help-text:
+
+> Check this box to automatically report internal errors in the Seq application to the Seq development team. (More >> Error reports will contain a unique identifier for this Seq installation, error messages and contextual information generated by Seq itself, and stack-trace information showing where the error occurred in the Seq program code.)
+
+When the box is checked, an additional text input will appear, with the label and help-text:
+
+> **Reply email address**
+> If you would like us to contact you about errors generated by this Seq installation, please include an email address where we can reach you.
+
+The first time error reporting is enabled, a 20-character _installation id_ will be generated so that errors from the same instance can be correlated.
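
As an illustration only, the installation id could be generated along these lines. This is a minimal Python sketch, not the Seq implementation: the alphabet (lowercase letters and digits) is an assumption based on the look of the sample id in the example report, and the `settings` dictionary and function names are hypothetical.

```python
import secrets
import string

# Assumption: lowercase letters and digits; the RFC does not specify an alphabet.
ID_ALPHABET = string.ascii_lowercase + string.digits
ID_LENGTH = 20  # "20-character installation id" per the RFC


def new_installation_id() -> str:
    """Return a random 20-character id drawn from ID_ALPHABET."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(ID_LENGTH))


def ensure_installation_id(settings: dict) -> str:
    """Generate the id the first time reporting is enabled, then reuse it."""
    if not settings.get("installation_id"):
        settings["installation_id"] = new_installation_id()
    return settings["installation_id"]
```

Generating the id once and storing it is what allows errors from the same instance to be correlated across reports.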
+
+### Error report contents
+
+The example below illustrates the level of detail that will be collected:
+
+```
+Time:      2017-11-15T20:10:19.8362537Z
+Version:   4.1.17
+Instance:  j00foew4qjofe2ijofp4
+Email:     admin@example.com
+Event:     Error serving {RequestUrl} (token: {ErrorToken}) 
+Exception: System.NullReferenceException: Object reference not set to an instance of an object.
+      at Seq.Server.Data.Documents.Signals.Deletions.InstanceSignalDeletedPolicy.<>c__DisplayClass3_0.<CanDelete>b__1()
+      at Seq.Server.Web.EntityResourceModule`2.Remove(String id)
+      at Seq.Server.Web.Api.Apps.AppsModule.Remove(String id)
+      at Nancy.Routing.Route.<>c__DisplayClass4.<Wrap>b__3(Object parameters, CancellationToken context)
+```
+
+(Field names are for descriptive purposes only.)
+
+* **Time** - the error timestamp
+* **Version** - the Seq version that generated the error report
+* **Instance** - the unique id assigned to the Seq instance reporting the error
+* **Email** - (if supplied) the reply email address for the administrator of the Seq instance
+* **Event** - the _template_ describing the error condition, without any parameters substituted
+* **Exception** - the full exception message and stack trace, or null if no exception is associated with the event
+
+Event-specific properties like `RequestUrl` and `ErrorToken` will not be added to the report. This is a trade-off: we may wish to mark particular properties as 'safe' in a future iteration of the feature, but for now we can do without this level of detail.
+
+Exception messages are the least-controllable data item here; currently, Seq itself avoids including sensitive information in exception messages. By their nature, however, exceptions are unpredictable. Initial implementation of the feature will need to include a review of current exception messages, and we will further need to monitor the content of reported exceptions to limit the amount of user/implementation-specific data that is collected.
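
The shape of a report described in this section can be sketched as follows. This is a hedged Python illustration, not Seq code: the `LogEvent` class and `build_report` function are hypothetical, and the dictionary keys simply reuse the descriptive field names from the example above, which the RFC says are not final.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogEvent:
    """Hypothetical stand-in for an internal Seq log event."""
    timestamp: str
    level: str
    message_template: str
    properties: dict = field(default_factory=dict)
    exception: Optional[str] = None


def build_report(event: LogEvent, version: str, instance_id: str,
                 email: Optional[str] = None) -> dict:
    """Keep the unrendered template and the exception; drop event properties."""
    report = {
        "Time": event.timestamp,
        "Version": version,
        "Instance": instance_id,
        "Event": event.message_template,  # template only, nothing substituted
        "Exception": event.exception,     # None when no exception is attached
    }
    if email:
        report["Email"] = email           # included only if supplied
    return report
```

Because `event.properties` is never read, values such as the request URL cannot reach the report through the event itself; the exception text remains the only free-form field.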
+
+### Blacklisted components
+
+Known security-sensitive components, e.g. the authentication providers, will be explicitly excluded from crash reporting, for example by pushing `Telemetry = false` onto the internal log context and filtering these events from reporting.
+
+Exceptions generated by plug-in apps will also be excluded, as it's not possible to determine in advance what information might be revealed by a user-created app.
+
+The blacklisted components will be determined by review.
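
One way to picture the filtering rule is the Python sketch below. It is illustrative only: representing the log context as a dictionary, the `from_plugin_app` flag, and the function name are all assumptions; only the `Error`/`Fatal` levels and the `Telemetry = false` marker come from the RFC text.

```python
REPORTED_LEVELS = {"Error", "Fatal"}


def should_report(level: str, context: dict, from_plugin_app: bool = False) -> bool:
    """Decide whether an internal log event may be sent as an error report."""
    if level not in REPORTED_LEVELS:
        return False
    if from_plugin_app:
        # Plug-in apps are always excluded: their content is unpredictable.
        return False
    # Blacklisted components opt out by pushing Telemetry = false
    # onto the log context; everything else is reportable by default.
    return context.get("Telemetry", True) is not False
```

Defaulting to "reportable" unless a component marks itself otherwise matches the blacklist approach chosen here over a whitelist.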
+
+### Internal crash collection/queueing
+
+To avoid destabilizing the Seq server, or consuming excessive memory or bandwidth, the number of queued/reported exceptions needs to be capped. Starting with very small initial limits seems sensible:
+
+* Event size limit - 16 kB
+* Reported exceptions - 100/day (therefore a maximum of 1600 kB per day egress)
+* Queued exceptions internally awaiting transmission - 10
+
+These values can easily be modified in the future, if desired.
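
The three limits above could be enforced with a small bounded queue, sketched here in Python. This is an assumption-laden illustration rather than the Seq design: the class and method names are hypothetical, oversized reports are simply dropped (truncation would be another option), and the daily cap is counted when a report is accepted rather than when it is sent.

```python
import json
from collections import deque
from datetime import date

MAX_EVENT_BYTES = 16 * 1024   # event size limit (16 kB)
MAX_REPORTS_PER_DAY = 100     # reported exceptions per day
MAX_QUEUED = 10               # reports awaiting transmission


class ReportQueue:
    """Bounded queue that rejects reports once any limit is reached."""

    def __init__(self):
        self._queue = deque()
        self._day = None
        self._accepted_today = 0

    def try_enqueue(self, report: dict, today: date) -> bool:
        if today != self._day:
            # A new day resets the daily counter.
            self._day, self._accepted_today = today, 0
        payload = json.dumps(report).encode("utf-8")
        if len(payload) > MAX_EVENT_BYTES:
            return False  # assumption: oversized reports are dropped
        if self._accepted_today >= MAX_REPORTS_PER_DAY:
            return False
        if len(self._queue) >= MAX_QUEUED:
            return False
        self._queue.append(payload)
        self._accepted_today += 1
        return True

    def dequeue(self):
        """Return the oldest queued payload, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

With these values the worst case is 100 reports of 16 kB each, which is the 1600 kB per day figure quoted in the list.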
+
+### Transport/protocol
+
+The error reports will be sent in JSON format via HTTPS to a remote collection endpoint.
+
+### Data retention
+
+By default, we will retain error reports for 30 days or less. If an error report results in an issue being raised, we will copy non-identifying data to an issue tracker ticket.
+
+### Continuity
+
+The feature must not cause any undesirable performance, data transfer, or reliability side-effects if/when the server-side reporting endpoint is switched off or unavailable.
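
A best-effort sender is one way to meet this requirement together with the JSON-over-HTTPS transport described earlier. The Python sketch below is illustrative only: the endpoint URL is a placeholder (the RFC does not name one), the five-second timeout and the no-retry policy are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request

# Placeholder endpoint; the RFC does not name the collection URL.
ENDPOINT = "https://telemetry.example.com/api/reports"
TIMEOUT_SECONDS = 5  # assumption: short timeout so a dead endpoint cannot stall the server


def https_post(url: str, body: bytes) -> None:
    """POST a JSON body; raises on network errors, timeouts, and HTTP errors."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        response.read()


def try_send(report: dict, post=https_post) -> bool:
    """Send one report; never raise and never retry if the endpoint is down."""
    try:
        post(ENDPOINT, json.dumps(report).encode("utf-8"))
        return True
    except Exception:
        # Best-effort delivery: drop the report rather than block or retry.
        return False
```

Passing the transport in as `post` keeps the failure handling testable without a network connection; in a real server the call would also run off the request-handling path.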
+
+### API client update
+
+The system settings for crash reporting will need to be exposed by the _Seq.Api_ C# client.
+
+## Drawbacks
+
+Collecting any end-user data introduces privacy and security concerns. The feature design attempts to mitigate these through:
+
+* User opt-in
+* Limited scope of reporting (unhandled errors only)
+* Exclusion of detailed event properties
+* HTTPS transport
+* Blacklisted components
+* (Server-side) short data retention time
+
+## Alternatives
+
+* Reports could be generated and sent from a separate "monitor" process, particularly so that non-.NET exceptions can be collected and reported; the added implementation cost rules this out in the short-term, but it may be useful as a future enhancement
+* Crash telemetry settings could be enabled under _Settings_ > _Diagnostics_; this would be a more logical grouping, but the layout of the _Diagnostics_ page does not incorporate form-style fields
+* Crash reports could be manually audited by an administrator before being sent; this would create an additional burden on the users' time that may defeat the purpose of the feature, but could also be an option in the future
+* A whitelist could be used, instead of a blacklist, for selecting which components can report crashes; this would be unwieldy, as the number of components we would wish to exclude would be very small
+
+## References
+
+_Left blank._