Logging Agent Architecture

Dan Bason edited this page Jan 23, 2023 · 2 revisions

Summary:

Log shipping is done using the OpenSearch Data Prepper application. This is configured to write logs to the Opni OpenSearch endpoint using the cluster-specific indexing user. Data Prepper is set up with an HTTP input for receiving logs. Data Prepper and its configuration are managed by a custom resource and controller that ship with the Opni Agent.
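The resulting Data Prepper pipeline might look roughly like the following sketch. The `http` source and `opensearch` sink are standard Data Prepper components; the port, hostname, credentials, and index name here are placeholders, not Opni's actual values:

```yaml
# Sketch of a Data Prepper pipeline with an HTTP input and OpenSearch output.
# Hostname, credentials, and index name are illustrative placeholders.
log-pipeline:
  source:
    http:
      port: 2021            # default Data Prepper HTTP source port
  sink:
    - opensearch:
        hosts: ["https://opensearch.example.com:9200"]
        username: "cluster-indexing-user"   # cluster-specific indexing user
        password: "changeme"                # supplied via the sync request in practice
        index: logs
```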

Log collection is configured by the Banzai Cloud Logging Operator. This is imported as a library and run alongside the other controllers by the Opni Agent. Opni will create a ClusterOutput that points to the Data Prepper HTTP interface, a ClusterFlow that collects all pod logs, and a generic Logging resource. It will also deploy a cluster-specific log scraper for control plane logs (if available).
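A minimal Logging resource of the kind described above could be sketched as follows. `controlNamespace`, `fluentd`, and `fluentbit` are real logging-operator fields; the name and namespace are assumptions for illustration:

```yaml
# Sketch of a generic Banzai Cloud Logging resource.
# Name and namespace are illustrative assumptions.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: opni-logging
spec:
  controlNamespace: opni-system  # namespace the operator manages Fluentd/Fluent Bit in
  fluentd: {}                    # default Fluentd aggregator
  fluentbit: {}                  # default Fluent Bit DaemonSet
```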

When Logging is enabled, the Logging plugin will also begin watching all Kubernetes events, which are sent to the Data Prepper HTTP interface.

If the Opni Agent disconnects from the Gateway for any reason, logging will automatically be disabled until a connection can be re-established.

Architecture:

High Level Architecture

Plugin interface

The plugin interface receives SyncNow requests from the Gateway and passes them to the cluster drivers, which take the appropriate action based on whether the capability is enabled or not.

The Logging status message contains the following data along with the enabled boolean:

  • OpenSearch username
  • OpenSearch password
  • OpenSearch external URL

Kubernetes cluster driver

This is the internal object responsible for managing the state of the Kubernetes resources used for Logging.

Responsibilities

If the Logging capability is enabled, the driver reconciles the Kubernetes resources. If the capability is being disabled, the resources are deleted. The resources it manages are as follows:

LogAdapter

This is a wrapper resource. It creates a Banzai Cloud Logging resource and a Fluent Bit DaemonSet specific to the Kubernetes distribution.
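A LogAdapter might be declared along these lines. This is Opni's own CRD, so the API group, kind, and `provider` field shown here are assumptions based on the description above; consult the installed CRD for the exact schema:

```yaml
# Sketch of an Opni LogAdapter wrapper resource.
# API group and field names are assumptions, not verified against the CRD.
apiVersion: logging.opni.io/v1beta1
kind: LogAdapter
metadata:
  name: opni-logging
spec:
  provider: rke2   # selects the distribution-specific Fluent Bit DaemonSet
```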

ClusterFlow

Manages the log scraping configuration. This will exclude the Opni system namespace from log scraping.
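The namespace exclusion described above can be expressed with the logging-operator's `match` statements, which support `select` and `exclude` rules. The resource name, output reference, and the `opni-system` namespace value are illustrative assumptions:

```yaml
# Sketch of a ClusterFlow that scrapes all pods except the Opni system namespace.
# Names and the excluded namespace are illustrative assumptions.
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: opni-clusterflow
  namespace: opni-system
spec:
  match:
    - exclude:
        namespaces:
          - opni-system   # do not scrape Opni's own components
    - select: {}          # collect logs from everything else
  globalOutputRefs:
    - opni-clusteroutput  # hypothetical name of the matching ClusterOutput
```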

ClusterOutput

Configures the Fluentd output to be the Opni Data Prepper shipper.
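Since Data Prepper exposes an HTTP input, the ClusterOutput can use the logging-operator's `http` output plugin. The service name, port, and path below are illustrative assumptions:

```yaml
# Sketch of a ClusterOutput pointing Fluentd at the Data Prepper HTTP interface.
# Endpoint URL is an illustrative assumption.
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: opni-clusteroutput
  namespace: opni-system
spec:
  http:
    endpoint: http://opni-shipper.opni-system.svc:2021/log/ingest
    content_type: application/json
    buffer:
      flush_interval: 5s   # buffer locally and retry if Data Prepper is unavailable
```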

DataPrepper

Manages the Data Prepper configuration. It is updated with the URL, username, and password sent in the sync request. The password is stored in a Secret, which is referenced in the DataPrepper resource.
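A DataPrepper resource carrying the sync-request values might look as follows. This is an Opni CRD, so every field name here is an assumption inferred from the description above, not the verified schema:

```yaml
# Sketch of an Opni DataPrepper resource referencing a password Secret.
# API group and all field names are assumptions; check the installed CRD.
apiVersion: logging.opni.io/v1beta1
kind: DataPrepper
metadata:
  name: opni-shipper
  namespace: opni-system
spec:
  username: cluster-indexing-user     # from the sync request
  passwordFrom:
    name: opni-shipper-password       # Secret holding the password from the sync request
    key: password
  opensearch:
    endpoint: https://opensearch.example.com:9200  # external URL from the sync request
```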

Events cluster driver

This is the internal object responsible for gathering events in the agent cluster.

Responsibilities

If the Logging capability is enabled, the driver starts a watch on the Kubernetes events API. Events are buffered in an internal queue and sent as JSON documents to Data Prepper. If the capability is disabled, the watch is canceled and the queue is purged.
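Each document sent to Data Prepper would be a serialized Kubernetes Event; a representative sample, using standard `Event` fields (the concrete values are invented for illustration), is roughly:

```json
{
  "type": "Normal",
  "reason": "Scheduled",
  "involvedObject": {
    "kind": "Pod",
    "namespace": "default",
    "name": "nginx"
  },
  "message": "Successfully assigned default/nginx to node-1",
  "firstTimestamp": "2023-01-23T00:00:00Z"
}
```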

Scale and performance:

Currently, agent throughput does not represent a bottleneck, so scaling and performance have not been extensively tested. If this changes in the future, components may be scaled out using the custom resources.

Security:

Usernames and passwords are isolated to an individual cluster. The user account only has write access, so if the credentials are leaked they cannot be used to read any data.

High availability:

The system works on eventual consistency, so HA isn't a design consideration. All components have internal buffering, so if an upstream service becomes unavailable they will retry.

Testing:

All testing for this system is currently manual.
