
how is this related to existing systems like riemann? #29

Closed
timofeytt opened this issue Jul 25, 2020 · 2 comments

@timofeytt
What is the overlapping functionality with existing systems like riemann and how can they be used better together?

@timofeytt (Author)

A note about metrics-clojure would be helpful, too.

BrunoBonacci self-assigned this Jul 25, 2020
BrunoBonacci added the question (a request for information or clarification) label Jul 25, 2020
@BrunoBonacci (Owner)

Hi @timofeytt

You are correct that Riemann somewhat overlaps with µ/log: both are event-based systems, although in Riemann the basic event is a metric event (an event that describes or samples a metric).
In µ/log, each event is a free-form, pure-data event. As in Riemann, you have a number of categorical properties (tags) which you can use to "slice & dice" the events and group them the way you want; but, unlike Riemann, µ/log doesn't constrain the user to a single numerical field.
If an event needs multiple numerical properties to describe it fully, in µ/log you can pack all that information into a single event.
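For instance, a single µ/log event can carry several numerical fields alongside the categorical tags (the event name and fields below are hypothetical; the `log` and `start-publisher!` calls are µ/log's actual API):

```clojure
(require '[com.brunobonacci.mulog :as u])

;; print events to the console (many other publishers are available)
(u/start-publisher! {:type :console})

;; one free-form event with several categorical properties (used for
;; slicing & dicing) AND several numerical properties together
(u/log ::http-request
       :method     :get        ;; categorical
       :endpoint   "/orders"   ;; categorical
       :status     200         ;; categorical
       :latency-ms 42          ;; numerical
       :body-size  1834        ;; numerical
       :db-time-ms 12)         ;; numerical
```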

Another difference is that the core of Riemann is a streaming and aggregation engine which lets you turn raw data into high-level (meaningful) insights; µ/log (at this stage) is just a client that produces the raw information.
It is entirely possible to write a µ/log publisher that sends µ/log events to Riemann in its expected format.
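A rough sketch of what such a publisher could look like, assuming µ/log's custom-publisher protocol; the `event->riemann` conversion and `send-to-riemann!` are hypothetical placeholders (the actual delivery would use a Riemann client library):

```clojure
(require '[com.brunobonacci.mulog.buffer :as rb])

(defn event->riemann
  "Hypothetical conversion: one Riemann metric event per numerical field."
  [{:keys [mulog/event-name mulog/timestamp] :as event}]
  (for [[k v] event :when (number? v)]
    {:service (str (name event-name) "." (name k))
     :metric  v
     :time    (quot timestamp 1000)}))

(deftype RiemannPublisher [config buffer]
  com.brunobonacci.mulog.publisher.PPublisher
  (agent-buffer [_] buffer)    ;; where events are queued
  (publish-delay [_] 500)      ;; attempt delivery every 500 ms
  (publish [_ buffer]
    ;; send the buffered events to Riemann, then clear the buffer
    (doseq [event (map second (rb/items buffer))
            riemann-event (event->riemann event)]
      (send-to-riemann! config riemann-event))   ;; hypothetical delivery fn
    (rb/clear buffer)))

(defn riemann-publisher [config]
  (RiemannPublisher. config (rb/agent-buffer 10000)))
```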

Regarding metrics-clojure, the difference is more fundamental. metrics-clojure, like many other libraries, is basically a metering system. Events happen on a remote system and are aggregated at the source; then, from time to time, the metric is sampled and the sample is sent to a collection system. Because the events are aggregated at the source, you are not able to slice & dice the metrics at query time unless you have expressly captured that particular dimension.
I'm very familiar with this approach: I used it for many years and I even wrote a Clojure wrapper for it (TRACKit!). Some tools like Prometheus try to overcome the lack of categorical dimensions by providing a hybrid approach, but it is still not as rich as µ/log.

The benefits of switching to an event-based system are enormous, although not very apparent at the start.
Instrumenting your code with a metrics library to produce a rich set of metrics is very tedious and time-consuming.
For example, if you instrument only your web-service request handlers with µ/log, you can answer questions such as:

  • how many requests have I received in the last week?
  • how many requests by day/hour/minute/second?
  • how many requests by user over time?
  • how many requests by endpoint over time?
  • how many requests were failures (4xx or 5xx)?
  • of the failing requests, how many were for a specific endpoint?
  • which users issued the failing requests?
  • how do the failing requests differ from the successful ones?
  • what's the latency distribution of the successful requests vs the failed ones?
  • which content-type/content-encoding was used?
  • what's the distribution of the failures by host/JVM?
  • what are the JVM metrics (GC/memory/etc.) of the failing hosts during that time?
  • how are the latencies split between internal processing and external calls (db queries, caches, etc.)?

…and much more. All this from one single, good log instrumentation point.

To achieve the same with a metrics system you would need several dozen metrics to be collected and published.
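To make the point concrete, here is a sketch of the single instrumentation point that makes all of the above questions answerable — the middleware, field names, and `x-user-id` header are hypothetical; the µ/log calls are the library's real API:

```clojure
(require '[com.brunobonacci.mulog :as u])

;; host/JVM-level dimensions attached to every event
(u/set-global-context! {:app-name "orders-service" :env "prod"})

(defn wrap-instrumentation
  "Hypothetical ring middleware: log one rich event per request."
  [handler]
  (fn [req]
    (let [start    (System/nanoTime)
          response (handler req)]
      (u/log ::http-request
             :endpoint     (:uri req)
             :method       (:request-method req)
             :content-type (get-in req [:headers "content-type"])
             :user         (get-in req [:headers "x-user-id"]) ;; hypothetical header
             :status       (:status response)
             :latency-ms   (quot (- (System/nanoTime) start) 1000000))
      response)))
```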

µ/log works incredibly well with Elasticsearch, which is an amazing tool to slice and dice the data the way you need.
A side of Elasticsearch which is not very well known is that it also has a very fast and robust aggregation engine.

The final point is that traditional systems consider logs, metrics, and traces (the "three pillars of observability") to be different things; in reality, they are all different forms of events. For example, the same events that you use for logs and to capture metrics can also represent traces. In µ/log, if you add a Zipkin publisher you get the traces collected and visualised as follows:

[screenshot: "disruption traces" visualised in Zipkin]

All of this comes from simple µ/log instrumentation.
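For reference, the tracing above needs nothing more than starting the Zipkin publisher and wrapping the work in µ/log's `trace` (the URL, event name, and `process-order` function below are illustrative):

```clojure
(require '[com.brunobonacci.mulog :as u])

;; send events/traces to a local Zipkin instance (illustrative URL)
(u/start-publisher! {:type :zipkin :url "http://localhost:9411/"})

;; `u/trace` times the body, records the outcome, and propagates the
;; trace context to nested traces -- the very same event also serves
;; as a log line and as a source of metrics.
(u/trace ::process-order
  [:order-id "ord-123"]          ;; extra key/value pairs for the event
  (process-order "ord-123"))     ;; hypothetical function
```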
