| title | perex | date | author | proofreading | preferredLang |
|---|---|---|---|---|---|
| Monitor | evitaDB's monitoring facilities are designed to help you monitor running evitaDB instances as well as to help you optimise your application during development. All monitoring facilities are based on our operational experience and development of e-commerce projects. | 17.1.2023 | Ing. Jan Novotný | done | java |
Work in progress

This article will contain a description of evitaDB's monitoring facilities, whether that ends up being Prometheus directly or OpenTelemetry. It will also cover how to log slow queries and how to diagnose other problems within the application (logging). The functionality is not finalized - see issue #18.
evitaDB uses the SLF4J logging facade for logging both application log messages and access log messages. By default, only application log messages are enabled; access log messages must be explicitly enabled in the configuration.
If the `accessLog` property is set to `true` in the configuration, the server will log access log messages for all APIs using the Slf4j logging facade. These messages are logged at the `INFO` level and contain the `ACCESS_LOG` marker, which you can use to separate standard messages from access log messages.
Access log messages can be further categorized using the `UNDERTOW_ACCESS_LOG` and `GRPC_ACCESS_LOG` markers. This is because evitaDB uses the Undertow web server for the REST and GraphQL APIs and a separate web server for gRPC. It can sometimes be useful to log these separately because, even though both use the same log format, gRPC doesn't support all the properties that Undertow does.
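If you want to route the two kinds of access log messages to separate files, standard Logback marker filtering is enough - no evitaDB-specific classes are needed. A sketch using Logback's `EvaluatorFilter` with `OnMarkerEvaluator` (file path illustrative) might look like this:

```xml
<appender name="GRPC_ACCESS" class="ch.qos.logback.core.FileAppender">
    <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
        <evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
            <marker>GRPC_ACCESS_LOG</marker>
        </evaluator>
        <onMatch>ACCEPT</onMatch>
        <onMismatch>DENY</onMismatch>
    </filter>
    <file>/path/to/grpc-access.log</file>
    <encoder>
        <pattern>%msg%n</pattern>
    </encoder>
</appender>
```

An analogous appender with `UNDERTOW_ACCESS_LOG` as the marker would capture the REST and GraphQL access log messages.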
The evitaDB server comes with several custom utilities that make it easier to configure logging of this data.
Note: These utilities are only available in the evitaDB server, because the rest of the evitaDB codebase doesn't rely on a concrete implementation of the Slf4j logging facade. If evitaDB is used as an embedded instance, the following tools are not available, but they can serve as a reference for a custom implementation in your chosen framework.
The basic utilities are two ready-to-use Logback filters that easily separate access log messages from application log messages. The `io.evitadb.server.log.AccessLogFilter` filter logs only access log messages and can be used as follows:
```xml
<appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <filter class="io.evitadb.server.log.AccessLogFilter"/>
    <file>/path/to/access.log</file>
    <encoder>
        <pattern>%msg%n</pattern>
    </encoder>
</appender>
```
There is also the `io.evitadb.server.log.AppLogFilter` filter, which logs only standard application log messages. This filter can be used as follows:
```xml
<appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <filter class="io.evitadb.server.log.AppLogFilter"/>
    <file>/evita/logs/evita_server.log</file>
    <encoder>
        <pattern>%date %level [%thread] %logger{10} [%file:%line] -%kvp- %msg%n</pattern>
    </encoder>
</appender>
```
This filter exists because, when you enable access logs, log messages with the `ACCESS_LOG` marker aren't filtered out by default.
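Putting the two filters together, an illustrative `logback.xml` that writes access log messages and application log messages to separate files could look like this (the file paths and patterns are examples only):

```xml
<configuration>
    <appender name="ACCESS_LOG" class="ch.qos.logback.core.FileAppender">
        <filter class="io.evitadb.server.log.AccessLogFilter"/>
        <file>/evita/logs/access.log</file>
        <encoder>
            <pattern>%msg%n</pattern>
        </encoder>
    </appender>

    <appender name="APP_LOG" class="ch.qos.logback.core.FileAppender">
        <filter class="io.evitadb.server.log.AppLogFilter"/>
        <file>/evita/logs/evita_server.log</file>
        <encoder>
            <pattern>%date %level [%thread] %logger{10} - %msg%n</pattern>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="ACCESS_LOG"/>
        <appender-ref ref="APP_LOG"/>
    </root>
</configuration>
```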
If a log aggregator is used to consume evitaDB log messages, it is often useful to log application log messages as one-line JSON objects. Therefore, there is a ready-to-use Logback layout that logs application log messages as JSON objects and makes sure that everything is properly escaped, even newline characters in log messages (e.g. stack traces).
The layout is the `io.evitadb.server.log.AppLogJsonLayout` layout, and it can be used as follows:
```xml
<configuration>
    <!-- ... -->
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <target>System.out</target>
        <filter class="io.evitadb.server.log.AppLogFilter" />
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>INFO</level>
        </filter>
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="io.evitadb.server.log.AppLogJsonLayout"/>
        </encoder>
    </appender>
    <!-- ... -->
</configuration>
```
The evitaDB server provides endpoints for Kubernetes readiness and liveness probes. The liveness probe is also configured as a healthcheck by default in our Docker image. By default, the health check waits `30s` before it starts checking the server health; for larger databases you may need to increase this value using the `HEALTHCHECK_START_DELAY` environment variable so that they have enough time to be loaded into memory.
The healthcheck in the Docker image is configured to use the default system API port, which is `5557`. If you change the port, the health check will immediately report an unhealthy container because it won't be able to reach the probe endpoint. You need to specify the new port using the `SYSTEM_API_PORT` environment variable of the Docker container.
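For example, both environment variables can be passed to `docker run` when starting the container. This is an illustrative invocation - the image tag, port values and the exact value format of `HEALTHCHECK_START_DELAY` (assumed here to follow the `30s` default) should be checked against the image documentation:

```shell
docker run -d --name evitadb \
    -e SYSTEM_API_PORT=5558 \
    -e HEALTHCHECK_START_DELAY=120s \
    -p 5558:5558 \
    evitadb/evitadb:latest
```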
Both probes are available in the `system` API and are accessible at the following endpoints:
```shell
curl -k "http://localhost:5557/system/readiness" \
     -H 'Content-Type: application/json'
```
The probe will return `200 OK` if the server is ready to accept traffic; otherwise it will return `503 Service Unavailable`. Internally, the probe calls all enabled APIs via HTTP on the server side to check whether they are ready to serve traffic.
Example response:
```json
{
    "status": "READY",
    "apis": {
        "rest": "ready",
        "system": "ready",
        "graphQL": "ready",
        "lab": "ready",
        "observability": "ready",
        "gRPC": "ready"
    }
}
```
The overall status may be one of the following constants:
- `STARTING` - At least one API is not yet ready.
- `READY` - The server is ready to serve traffic.
- `STALLING` - At least one API that was ready is not ready anymore.
- `SHUTDOWN` - The server is shutting down. None of the APIs are ready.
Each of the enabled APIs has its own status, so you can see which particular API is not ready in the case of a `STARTING` or `STALLING` status.
```shell
curl -k "http://localhost:5557/system/liveness" \
     -H 'Content-Type: application/json'
```
If the server is healthy, the probe will return `200 OK`; otherwise, it will return `503 Service Unavailable`.
Example response:
```json
{
    "status": "healthy",
    "problems": []
}
```
If the server is unhealthy, the response will list the problems.
- `MEMORY_SHORTAGE` - Signaled when the consumed memory never goes below 85% of the maximum heap size and the GC tries to free the old generation at least once. This leads to repeated attempts at expensive old-generation GC and puts pressure on the host CPUs.
- `INPUT_QUEUES_OVERLOADED` - Signaled when the input queues are full and the server is not able to process incoming requests. The problem is reported when the ratio of rejected tasks to accepted tasks is >= 2. This flag is cleared when the rejection ratio decreases below the specified threshold, which signals that the server is able to process incoming requests again.
- `JAVA_INTERNAL_ERRORS` - Signaled when there are occurrences of Java internal errors. These errors are usually caused by the server itself and are not related to the client's requests. Java errors signal fatal problems inside the JVM.
- `EXTERNAL_API_UNAVAILABLE` - Signaled when the readiness probe reports that at least one external API that is configured to be enabled doesn't respond to the internal HTTP check call.
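In a Kubernetes deployment, the two endpoints above can be wired into the container spec. The following is a sketch only - the timing values (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`) are illustrative and should be tuned to your database size, as discussed for the Docker start delay above:

```yaml
# Illustrative probe configuration for an evitaDB container
readinessProbe:
  httpGet:
    path: /system/readiness
    port: 5557
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /system/liveness
    port: 5557
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```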
In order to monitor which requests each client executes against evitaDB, each client and each request can be identified by a unique identifier. In this way, evitaDB calls can be grouped by requests and clients. This may be useful, for example, to see if a particular client is executing queries optimally and not creating unnecessary duplicate queries.
Both identifiers are provided by the client itself. The client identifier is expected to be a constant for a particular client, e.g. `Next.js application`, and will group together all calls to evitaDB from this client. The request identifier is expected to be a `UUID`, but can be any string value, and will group together all evitaDB calls with this request identifier for a particular client.
The request definition (what a request identifier represents) is up to the client to decide; for example, a single request for a JavaScript client may group together all evitaDB calls for a single page render. This mechanism is not part of the evitaQL language. Check the documentation for your specific client for more information.
If you are using the Java remote client, you are advised to provide the `clientId` in `EvitaClientConfiguration` (evita_external_api/evita_external_api_grpc/client/src/main/java/io/evitadb/driver/config/EvitaClientConfiguration.java) for all requests. The `requestId` is then provided by wrapping your code in a lambda passed to the `executeWithRequestId` method on the `EvitaSessionContract` interface (evita_api/src/main/java/io/evitadb/api/EvitaSessionContract.java).
Provide the client and request ids to the server
If you use the embedded variant of the evitaDB server, there is no point in providing a `clientId`, since there is only one client. The `requestId` is then provided in the same way as described above.
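A minimal sketch of both mechanisms in the Java client follows. Note that the builder method names (`clientId(...)`, `host(...)`) and the exact signature of `executeWithRequestId` are assumptions here and should be verified against the Javadoc of `EvitaClientConfiguration` and `EvitaSessionContract` for your evitaDB version:

```java
// Hypothetical sketch - verify method names against the actual API.
EvitaClientConfiguration configuration = EvitaClientConfiguration.builder()
    .clientId("Next.js application")   // constant identifier of this client
    .host("localhost")
    .build();

// Later, with an open EvitaSessionContract `session`, group all calls
// belonging to one logical request under a single requestId:
session.executeWithRequestId(
    UUID.randomUUID().toString(),
    () -> {
        // ... all evitaDB calls for e.g. one page render ...
    }
);
```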
Our GraphQL API uses GraphQL extensions to pass the request identification. Therefore, to pass request identification information to evitaDB, include the following JSON object within the `extensions` property of a GraphQL request:
```json
"clientContext": {
    "clientId": "Next.js application",
    "requestId": "05e620b2-5b40-4932-b585-bf3bb6bde4b3"
}
```
Both identifiers are optional.
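In context, a complete GraphQL request body then looks as follows (the query itself is a placeholder - `__typename` is valid against any GraphQL schema):

```json
{
  "query": "{ __typename }",
  "extensions": {
    "clientContext": {
      "clientId": "Next.js application",
      "requestId": "05e620b2-5b40-4932-b585-bf3bb6bde4b3"
    }
  }
}
```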
Our REST API uses HTTP headers to pass the request identification. Therefore, to pass request identification information to evitaDB, send the following HTTP headers:
```
X-EvitaDB-ClientID: Next.js application
X-EvitaDB-RequestID: 05e620b2-5b40-4932-b585-bf3bb6bde4b3
```
Both headers are optional.
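For example, with `curl` - the URL, port and request body below are illustrative; use the REST endpoint and query of your own catalog:

```shell
curl -k "http://localhost:5555/rest/evita/entity/query" \
     -H 'Content-Type: application/json' \
     -H 'X-EvitaDB-ClientID: Next.js application' \
     -H 'X-EvitaDB-RequestID: 05e620b2-5b40-4932-b585-bf3bb6bde4b3' \
     -d '{ ... }'
```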
These identifiers can also be used to group application log messages by client and request, for easier debugging of errors that occurred during a specific request. This is done using MDC support. evitaDB passes the client and request identifiers to the MDC context under the `clientId` and `requestId` names.
The specific usage depends on the implementation of the SLF4J logging facade in use. For example, in Logback this can be done using the `%X{clientId}` and `%X{requestId}` patterns in the log pattern:
```xml
<encoder>
    <pattern>%d{HH:mm:ss.SSS} %-5level %logger{10} C:%X{clientId} R:%X{requestId} - %msg%n</pattern>
</encoder>
```