This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

Event logging - docs and a single event example. (#85)
* documents

* first warning was implemented

* first version of docs complete
SergeyKanzhelev committed Jan 2, 2019
1 parent af9a10e commit d2b4b35
Showing 5 changed files with 209 additions and 2 deletions.
41 changes: 41 additions & 0 deletions docs/error-handling.md
@@ -0,0 +1,41 @@
# Error handling in Open Census C# SDK

Open Census is a library that will, in many cases, run in the context of a
customer app, performing operations that are non-essential from the app's
business-logic perspective. The Open Census SDK can also, and often will, be
enabled via platform extensibility mechanisms, potentially only at runtime. This
makes the use of the SDK non-obvious to the end user and sometimes outside of
the end user's control.

These conditions impose some unique requirements on Open Census error handling
practices.

## Basic error handling principles

Open Census SDK must not throw or leak unhandled or user unhandled exceptions.

1. APIs must not throw or leak unhandled or user unhandled exceptions when the
API is used incorrectly by the developer. Smart defaults should be used so
that the SDK generally works.
2. SDK must not throw or leak unhandled or user unhandled exceptions for
configuration errors.
3. SDK must not throw or leak unhandled or user unhandled exceptions for errors
in its own operations. Examples: telemetry cannot be sent because the
endpoint is down, or location information is not available because the
device owner has disabled it.
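Principle 1 can be illustrated with a minimal sketch. The `SamplerOptions` type and its `Probability` property are hypothetical and not part of the Open Census API; the point is only that an out-of-range argument is coerced to a safe value ("smart default") instead of throwing `ArgumentOutOfRangeException` into the customer's app.

```csharp
using System;

// Hypothetical options type used only to illustrate "smart defaults";
// it is not part of the Open Census API.
public sealed class SamplerOptions
{
    // Value used when the caller supplies something unusable.
    public const double DefaultProbability = 1e-4;

    private double probability = DefaultProbability;

    public double Probability
    {
        get => this.probability;
        set
        {
            if (double.IsNaN(value))
            {
                // NaN carries no usable intent: fall back to the default.
                // A real SDK would also log a Warning event here.
                this.probability = DefaultProbability;
                return;
            }

            // Clamp into [0, 1] instead of throwing ArgumentOutOfRangeException.
            this.probability = Math.Min(1.0, Math.Max(0.0, value));
        }
    }
}
```

With this approach a caller that sets `Probability = 2.5` ends up sampling everything (`1.0`) rather than crashing; a warning log is what tells the developer that the input was corrected.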

## Guidance

1. In .NET 4.0 and above, catching all exceptions will not catch corrupted
   state exceptions (CSEs).
   - We want this behavior: don't catch CSEs.
   - This allows exceptions like stack overflow and access violation to flow
     through.
   - More information: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx
2. Every background operation callback, Task, or Thread method should have a
   top-level `try`/`catch` statement to ensure the reliability of the app.
3. When catching all exceptions in other cases, reduce the scope of the `try`
   block as much as possible.
4. In general, don't catch, filter, and rethrow.
   - Catch all exceptions and log the error.
   - If you must rethrow, use `throw;`, not `throw ex;`, so that the original
     call stack is preserved.
5. Beware of any call to external callbacks or overridable interfaces. Expect
   them to throw.
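A minimal sketch of guidance items 2, 3, and 5, assuming a hypothetical `SafeExecution` helper (not an Open Census API). The `OnError` delegate stands in for the SDK's real logging path, which would be an EventSource call.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating the guidance above; not an Open Census API.
public static class SafeExecution
{
    // Sink for swallowed exceptions. In the SDK this would be an EventSource
    // call; a delegate keeps the sketch small.
    public static Action<Exception> OnError { get; set; } = _ => { };

    // Guidance 2: a background operation gets a top-level try/catch, so an
    // exception never escapes to the thread pool and tears down the process.
    public static Task RunInBackground(Action work)
    {
        return Task.Run(() =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // Per guidance 1, corrupted state exceptions are not delivered
                // to this catch block on .NET 4.0 and later.
                Report(ex);
            }
        });
    }

    // Guidance 5: user callbacks and overridable members are expected to throw.
    // Guidance 3: the try covers only the external call.
    public static bool TryInvokeCallback<T>(Func<T, bool> callback, T item)
    {
        try
        {
            return callback(item);
        }
        catch (Exception ex)
        {
            Report(ex);
            return false; // treat a failing callback as "not handled"
        }
    }

    private static void Report(Exception ex)
    {
        // The error sink is itself external code, so guard it as well.
        try
        {
            OnError(ex);
        }
        catch
        {
            // Nothing more can be done without risking an unhandled exception.
        }
    }
}
```

Note that nothing here rethrows; per item 4, a rethrow (if ever needed) would use `throw;` inside the `catch` block.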
103 changes: 103 additions & 0 deletions docs/error-logging.md
@@ -0,0 +1,103 @@
# Error logging

This document explains how the Open Census SDK logs information about its own
execution.

The SDK manageability scenarios are the following:

1. Send error & warning logs to the back-end for customer self-troubleshooting.
2. Visualize OC SDK health in external tools.
3. Visualize OC SDK health in Z-Pages.
4. Show errors/warnings/information in the Visual Studio F5 debug window.
5. Testing: troubleshooting without a debugger.
6. Customer support – collect verbose logs.

## Definition of verbosity levels

The following severity levels are defined for SDK logs.

### Severity `Error`

A problem in SDK operation resulted in data loss or an inability to collect data.

### Severity `Warning`

A problem in SDK operation that MAY result in data loss if not attended to.
The `Warning` level may also identify a data quality problem.

### Severity `Informational`

Completion of a major, typically infrequent, operation.

### Severity `Verbose`

All other logs. Typically used for troubleshooting hard-to-reproduce issues or
issues that happen only in specific production environments.

## Logging with EventSource

1. Find or create an assembly-specific `internal` class inherited from
   `EventSource`.
2. Prefix the name of the EventSource with `OpenCensus-` using the class
   attribute, like this: `[EventSource(Name = "OpenCensus-Base")]`.
3. Create a new `Event` method with the arguments that need to be logged. Each
   event should have an index, a message, and an event severity (level). It is
   good practice to include the event severity (level) in the method name.
4. Use the following rules to pick the event index:
   1. Do not reorder existing event method indexes. Otherwise, versioning of
      the logs metadata will not work well.
   2. Do not put large gaps between indexes. For example, use sequential
      indexes instead of categorizing events by index (`1X` for one category,
      `2X` for another). Unassigned indexes in the `1X` category will affect
      logging performance.
5. Use the following rules to author the event message:
   1. Make the event description actionable and explain the effect of the
      problem. For instance, instead of *"No span in current context"* use
      something like *"No span in current context. Span name will not be
      updated. It may indicate incorrect usage of Open Census API - please
      ensure span wasn't overridden explicitly in your code or by other
      module."*
6. Pick the event severity using the definitions in the previous section.
7. Follow the performance optimization techniques described in the next
   section.
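The steps above can be sketched as a small EventSource. Each severity level defined earlier corresponds to a member of `System.Diagnostics.Tracing.EventLevel` (`Error`, `Warning`, `Informational`, `Verbose`). The source name `OpenCensus-Example` and both events are hypothetical, chosen only to show sequential indexes, the level in the method name, and an actionable message; `OpenCensusEventSource` in this commit is the real instance of the pattern.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical event source following the rules above. The name
// "OpenCensus-Example" and both events are illustrative, not part of the SDK.
[EventSource(Name = "OpenCensus-Example")]
internal sealed class ExampleEventSource : EventSource
{
    public static readonly ExampleEventSource Log = new ExampleEventSource();

    // Index 1, severity Warning, severity included in the method name.
    [Event(1, Message = "No span in current context. Span name will not be updated. It may indicate incorrect usage of Open Census API - please ensure span wasn't overridden explicitly in your code or by other module.", Level = EventLevel.Warning)]
    public void NoSpanInContextWarning()
    {
        this.WriteEvent(1);
    }

    // Next sequential index (2), no gap left for a "category".
    [Event(2, Message = "Exporter '{0}' started.", Level = EventLevel.Informational)]
    public void ExporterStartedInformational(string exporterName)
    {
        this.WriteEvent(2, exporterName);
    }
}
```

A new event would be appended as index 3; existing indexes stay untouched so that consumers of older log metadata keep working.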

## Minimizing logging performance impact

### Pass object references

EventSource requires event arguments to be primitive types such as `int` or
`string` in `WriteEvent` calls. This limitation requires complex types like
`Exception` to be formatted before the trace statement is called.

Because this formatting happens before the `WriteEvent` call, it runs
unconditionally, whether or not a listener is enabled. To minimize the
performance hit, create `NonEvent` methods in the EventSource that accept
complex types and check `Log.IsEnabled` before serializing them and passing the
result to `Event` methods.
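The committed `OpenCensusEventSource.ExporterThrownExceptionWarning(Exception)` follows this pattern. The sketch below makes the saving observable: `ExportBatch` and `PerfEventSource` are hypothetical types, and the `FormatCount` counter exists only to show that `ToString()` is skipped when nobody listens.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical complex type; FormatCount (not thread-safe, demo only)
// records how often the expensive formatting actually runs.
public sealed class ExportBatch
{
    public static int FormatCount;

    public int SpanCount { get; set; }

    public override string ToString()
    {
        FormatCount++;
        return "ExportBatch(SpanCount=" + this.SpanCount + ")";
    }
}

// Hypothetical event source illustrating the NonEvent + IsEnabled pattern.
[EventSource(Name = "OpenCensus-Example-Perf")]
internal sealed class PerfEventSource : EventSource
{
    public static readonly PerfEventSource Log = new PerfEventSource();

    // Accepts the complex type and serializes it only if a listener
    // is enabled for Warning level.
    [NonEvent]
    public void BatchDroppedWarning(ExportBatch batch)
    {
        if (this.IsEnabled(EventLevel.Warning, EventKeywords.All))
        {
            this.BatchDroppedWarning(batch.ToString());
        }
    }

    // The real event takes only a primitive (string) argument.
    [Event(1, Message = "Export batch dropped: {0}", Level = EventLevel.Warning)]
    public void BatchDroppedWarning(string batch)
    {
        this.WriteEvent(1, batch);
    }
}
```

Call sites simply pass the object, for example `PerfEventSource.Log.BatchDroppedWarning(batch)`, and pay the formatting cost only when logging is on.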

### Diagnostics events throttling

Throttling is required for the following scenarios:

- Minimize the traffic used to report problems to the portal.
- Make sure `*.etl` files are not overloaded with similar errors.

Log subscribers implement throttling logic. However, a log producer may add its
own logic to prevent excessive logging. For instance, if a problem cannot be
resolved at runtime, the producer of the `Error` log may decide to log it only
once, or only once in a while. Use this technique carefully: not every log
subscriber is enabled from process start, so a late subscriber may miss this
important error message.
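A minimal sketch of producer-side throttling, assuming a hypothetical `LogThrottle` type (not part of the SDK). It lets the first call through and then at most one call per interval; passing `TimeSpan.MaxValue` gives "log once" behavior. The current time is a parameter so that callers and tests control the clock.

```csharp
using System;
using System.Threading;

// Hypothetical producer-side throttle (not an Open Census API): the first
// call is allowed, later calls only once per interval. Pass TimeSpan.MaxValue
// to get "log once" behavior.
public sealed class LogThrottle
{
    private readonly long intervalTicks;
    private long nextAllowedTicks = long.MinValue;

    public LogThrottle(TimeSpan interval)
    {
        this.intervalTicks = interval.Ticks;
    }

    public bool ShouldLog(DateTime utcNow)
    {
        long now = utcNow.Ticks;
        while (true)
        {
            long next = Interlocked.Read(ref this.nextAllowedTicks);
            if (now < next)
            {
                return false; // still inside the suppression window
            }

            // Saturate instead of overflowing when the interval is huge.
            long candidate = this.intervalTicks > long.MaxValue - now
                ? long.MaxValue
                : now + this.intervalTicks;

            // Only one racing thread wins the right to log for this window.
            if (Interlocked.CompareExchange(ref this.nextAllowedTicks, candidate, next) == next)
            {
                return true;
            }
        }
    }
}
```

Inside a `NonEvent` method, the producer would guard the `Error` event with `if (throttle.ShouldLog(DateTime.UtcNow))`. Given the caveat above, an interval that still re-logs periodically is usually safer than strict "log once".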

## Subscribing to EventSource

EventSource separates the logic of tracing from the delivery of those traces to
different channels. The default ETW subscriber works out of the box; for all
other channels, in-process subscribers (`EventListener` implementations) can be
used for data delivery.

![event-source-listeners](event-source-listeners.png)
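An in-process subscriber can be sketched with `EventListener`. The listener below enables every source whose name starts with `OpenCensus-` and buffers formatted messages in memory; the queue stands in for a real channel (file, Z-Pages, back-end), and `ListenerDemoEventSource` is a hypothetical source included only so the example is self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;
using System.Globalization;

// Minimal in-process subscriber sketch; the in-memory queue is a placeholder
// for a real delivery channel.
public sealed class OpenCensusEventListener : EventListener
{
    // Field initializers run before the base constructor, so this is ready
    // when EventListener calls OnEventSourceCreated during construction.
    public ConcurrentQueue<string> Messages { get; } = new ConcurrentQueue<string>();

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name.StartsWith("OpenCensus-", StringComparison.Ordinal))
        {
            this.EnableEvents(eventSource, EventLevel.Verbose, EventKeywords.All);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Message holds the [Event(Message = ...)] template; Payload holds its arguments.
        var args = new object[eventData.Payload?.Count ?? 0];
        eventData.Payload?.CopyTo(args, 0);
        string text = eventData.Message == null
            ? eventData.EventId.ToString(CultureInfo.InvariantCulture)
            : string.Format(CultureInfo.InvariantCulture, eventData.Message, args);
        this.Messages.Enqueue(eventData.Level + ": " + text);
    }
}

// Hypothetical source used only to demonstrate the listener.
[EventSource(Name = "OpenCensus-ListenerDemo")]
internal sealed class ListenerDemoEventSource : EventSource
{
    public static readonly ListenerDemoEventSource Log = new ListenerDemoEventSource();

    [Event(1, Message = "Exporter failed to export items. Exception: {0}", Level = EventLevel.Warning)]
    public void ExporterThrownExceptionWarning(string ex) => this.WriteEvent(1, ex);
}
```

Because `EnableEvents` is called per source, the same listener picks up sources created both before and after it, which matters when the SDK is enabled late at runtime.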

## EventSource vs. using SDK itself

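The SDK logs its own execution with EventSource rather than with the SDK
itself for the following reasons:

1. The SDK has no equivalent of `IsEnabled` to check whether an exporter or
   listener exists, which is important for verbose logging.
2. The SDK does not support the ETW channel.
3. The SDK does not support in-process subscription or extensibility.
4. Logging should be more reliable than the SDK itself.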
Binary file added docs/event-source-listeners.png
63 changes: 63 additions & 0 deletions src/OpenCensus/Implementation/OpenCensusEventSource.cs
@@ -0,0 +1,63 @@
// <copyright file="OpenCensusEventSource.cs" company="OpenCensus Authors">
// Copyright 2018, OpenCensus Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// </copyright>

namespace OpenCensus.Implementation
{
    using System;
    using System.Diagnostics.Tracing;
    using System.Globalization;
    using System.Threading;

    [EventSource(Name = "OpenCensus-Base")]
    internal class OpenCensusEventSource : EventSource
    {
        public static readonly OpenCensusEventSource Log = new OpenCensusEventSource();

        [NonEvent]
        public void ExporterThrownExceptionWarning(Exception ex)
        {
            if (Log.IsEnabled(EventLevel.Warning, EventKeywords.All))
            {
                this.ExporterThrownExceptionWarning(ToInvariantString(ex));
            }
        }

        [Event(1, Message = "Exporter failed to export items. Exception: {0}", Level = EventLevel.Warning)]
        public void ExporterThrownExceptionWarning(string ex)
        {
            this.WriteEvent(1, ex);
        }

        /// <summary>
        /// Returns a culture-independent string representation of the given <paramref name="exception"/> object,
        /// appropriate for diagnostics tracing.
        /// </summary>
        private static string ToInvariantString(Exception exception)
        {
            CultureInfo originalUICulture = Thread.CurrentThread.CurrentUICulture;

            try
            {
                Thread.CurrentThread.CurrentUICulture = CultureInfo.InvariantCulture;
                return exception.ToString();
            }
            finally
            {
                Thread.CurrentThread.CurrentUICulture = originalUICulture;
            }
        }
    }
}
4 changes: 2 additions & 2 deletions src/OpenCensus/Trace/Export/SpanExporterWorker.cs
@@ -20,6 +20,7 @@ namespace OpenCensus.Trace.Export
     using System.Collections.Concurrent;
     using System.Collections.Generic;
     using OpenCensus.Common;
+    using OpenCensus.Implementation;
 
     internal class SpanExporterWorker : IDisposable
     {
@@ -139,8 +140,7 @@ private void Export(IEnumerable<ISpanData> export)
             }
             catch (Exception ex)
             {
-                // TODO Log warning
-                Console.WriteLine(ex);
+                OpenCensusEventSource.Log.ExporterThrownExceptionWarning(ex);
             }
         }
     }
