LogBuddy is a log-ingestion and rule-based alerting system built as three Java microservices:
- `SparkProcessing` reads logs from streaming or file-based sources with Apache Spark Structured Streaming.
- `DataProcessing` receives parsed log batches over gRPC, evaluates configurable rules, and sends alerts to HTTP endpoints.
- `ControlPanel` appears intended to be a small HTTP gateway for controlling the other services, but it is currently incomplete.
The project solves a common operational problem: collecting logs from multiple platforms, normalizing them into a common structure, checking them against detection rules, and triggering alerts when combinations of rule conditions are met.
At a high level:
- Spark handles ingestion and parsing.
- A Spring Boot service handles rule evaluation and alert dispatch.
- A lightweight control layer is intended to expose operational endpoints such as health, reload, sleep/wake, and query listing/stopping.
`SparkProcessing` is the ingestion engine. `DataProcessing` is the decision engine. `ControlPanel` is meant to be the operator-facing entrypoint.
`SparkProcessing` communicates with `DataProcessing` using gRPC client-streaming. `DataProcessing` communicates with external alert receivers using HTTP POST. `ControlPanel` is intended to communicate with the other services using REST over HTTP.
```
+--------------------+
|   External Log     |
|     Sources        |
|  - Kafka           |
|  - Pulsar          |
|  - Files           |
|  - Delta/Iceberg   |
|  - Hudi            |
|  - Socket/Rate     |
+---------+----------+
          |
          | Spark Structured Streaming
          v
+--------------------+       gRPC stream         +----------------------+
|  SparkProcessing   | ------------------------> |   DataProcessing     |
|  - reads streams   |                           |  - evaluates rules   |
|  - parses logs     |                           |  - tracks sessions   |
|  - batches entries |                           |  - triggers alerts   |
+---------+----------+                           +----------+-----------+
          ^                                                 |
          | HTTP control                                    | HTTP POST
          |                                                 v
+---------+----------+                           +----------------------+
|    ControlPanel    |                           |   Alert Endpoints    |
| - intended gateway |                           |  Slack/webhook/API   |
| - proxies commands |                           |     destinations     |
+--------------------+                           +----------------------+
```
`ControlPanel` looks like a planned Spring Boot REST gateway that should expose operational endpoints for the other services. Based on the controllers, it is meant to forward requests to `SparkProcessing` and `DataProcessing`.

Important note: this service is currently not production-ready in the repository. Its `Main.java` is still the IntelliJ template starter, there is no `@SpringBootApplication`, and the controllers do not yet match the `SparkProcessing` service paths exactly.
- Java 21
- Spring Boot 4
- Spring Web
- `RestTemplate`
- Proxy control requests to `SparkProcessing`
- Potentially proxy control requests to `DataProcessing`
- Provide a single operator-facing HTTP API
- `ControlPanel/pom.xml`
- `ControlPanel/src/main/java/com/logbuddy/control/panel/Main.java`
- `ControlPanel/src/main/java/com/logbuddy/control/panel/config/WebConfig.java`
- `ControlPanel/src/main/java/com/logbuddy/control/panel/controller/ControlPanelController.java`
- `ControlPanel/src/main/java/com/logbuddy/control/panel/controller/DataProcessingControlPanelController.java`
- `ControlPanel/src/main/java/com/logbuddy/control/panel/controller/SparkControlPanelController.java`
These endpoints are defined in code, but the service is not currently runnable as-is:
- `GET /spark/reload-settings`
- `GET /spark/stop-query/{queryId}`
- `GET /spark/list-queries`
Observed issues:
- `SPARK_HOST` is just `"localhost"` and does not include a scheme or port.
- The controller expects `/reload-settings` and `/stop-query`, while `SparkProcessing` actually exposes `/reload-config` and `/terminate-query`.
- `DataProcessingControlPanelController` has no implemented endpoints yet.
DataProcessing is the core rule engine. It loads JSON configuration files, maintains per-data-source processing state, accepts parsed log entries over gRPC, applies rules, groups rule completions into alerts, and sends alerts to configured HTTP endpoints.
- Java 21
- Spring Boot 4
- Spring Web
- Spring WebFlux
- `WebClient`
- Spring gRPC
- Protocol Buffers / gRPC
- Log4j2
- Lombok
- Start a gRPC server for log ingestion
- Load application, data source, and rule configuration from disk
- Maintain in-memory rule sessions and alert sessions
- Evaluate rule checks against incoming log entries
- Send alerts asynchronously to configured HTTP endpoints
- Expose operational REST endpoints for health and sleep/wake control
- `DataProcessing/pom.xml`
- `DataProcessing/src/main/resources/application.properties`
- `DataProcessing/src/main/proto/ingest.proto`
- `DataProcessing/src/main/java/com/alexander/processing/Main.java`
- `DataProcessing/src/main/java/com/alexander/processing/ProcessingContext.java`
- `DataProcessing/src/main/java/com/alexander/processing/controller/ControlPanelController.java`
- `DataProcessing/src/main/java/com/alexander/processing/data/config/AppSettingsConfig.java`
- `DataProcessing/src/main/java/com/alexander/processing/data/service/ds/DataSourceIngestService.java`
- `DataProcessing/src/main/java/com/alexander/processing/data/service/ds/DataProcessingService.java`
- `DataProcessing/src/main/java/com/alexander/processing/data/service/rule/RuleProcessingService.java`
- `DataProcessing/src/main/java/com/alexander/processing/data/service/alert/AlertingService.java`
REST endpoints:
- `GET /api/control-panel/health`
- `GET /api/control-panel/status`
- `GET /api/control-panel/sleep`
- `GET /api/control-panel/wake`
- `GET /api/control-panel/restart` returns `501 Not Implemented`
gRPC service:
```proto
service IngestService {
  rpc ingest (stream IngestRequest) returns (IngestResponse);
}
```

`IngestRequest` contains:

- `dsName`: data source name
- `logEntries`: repeated normalized log entries

`IngestResponse` contains:

- `received`: number of streamed request messages received
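Client-streaming means `SparkProcessing` can send many `IngestRequest` messages over one call and receive a single `IngestResponse` at the end. The buffering side of that pattern can be sketched in plain Java; the class below is a hypothetical illustration, not code from the repository, and the real `GrpcWriter` sends protobuf messages rather than string lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a client-streaming writer that batches log
// entries and flushes each full batch as one streamed request message.
class BatchingWriter {
    private final int maxLinesPerReq;
    private final Consumer<List<String>> sendBatch; // stands in for streamObserver.onNext(...)
    private final List<String> buffer = new ArrayList<>();
    private int requestsSent = 0;

    BatchingWriter(int maxLinesPerReq, Consumer<List<String>> sendBatch) {
        this.maxLinesPerReq = maxLinesPerReq;
        this.sendBatch = sendBatch;
    }

    void add(String logEntry) {
        buffer.add(logEntry);
        if (buffer.size() >= maxLinesPerReq) {
            flush();
        }
    }

    // Flush any buffered entries as one request message.
    void flush() {
        if (!buffer.isEmpty()) {
            sendBatch.accept(new ArrayList<>(buffer));
            buffer.clear();
            requestsSent++;
        }
    }

    // Mirrors IngestResponse.received: number of request messages sent.
    int requestsSent() {
        return requestsSent;
    }
}
```

This would explain why `IngestResponse.received` counts request messages, not individual log lines: each flush is one message carrying up to `maxLinesPerReq` entries.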
The rule engine currently supports these check types:
- `LogLevelCheck`
- `DataRegexMatchCheck`
- `MessageLengthCheck`
- `TimestampCheck`
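The repository's exact check classes aren't reproduced here, but conceptually each check type is a predicate over a parsed log entry. A minimal hypothetical sketch (record fields and class names below are assumptions, not the repository's actual types):

```java
// Hypothetical sketch: each rule check is a predicate over a log entry.
record LogEntry(String level, String message, String data, long timestampMillis) {}

interface RuleCheck {
    boolean matches(LogEntry entry);
}

// Rough analogue of LogLevelCheck: match entries at a given level.
class LevelEquals implements RuleCheck {
    private final String level;
    LevelEquals(String level) { this.level = level; }
    public boolean matches(LogEntry e) { return level.equalsIgnoreCase(e.level()); }
}

// Rough analogue of MessageLengthCheck: match messages over a threshold.
class MessageLongerThan implements RuleCheck {
    private final int minLength;
    MessageLongerThan(int minLength) { this.minLength = minLength; }
    public boolean matches(LogEntry e) { return e.message().length() > minLength; }
}
```

`DataRegexMatchCheck` and `TimestampCheck` would follow the same shape, testing the `data` and `timestampMillis` fields respectively.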
The service expects these files to exist:
- `/opt/logbuddy/config/ds.conf`
- `/opt/logbuddy/config/rule.conf`
- `/opt/logbuddy/config/app.conf`
From the settings models, these files appear to contain:
- `ds.conf`: data sources, log formats, required rules, alert definitions, schedules
- `rule.conf`: named rules and their check definitions
- `app.conf`: `controlPanelServerPort` and `grpcSettings`
SparkProcessing is the ingestion microservice. It reads configured streaming sources with Spark Structured Streaming, parses each record into a shared log structure, buffers records into batches, and streams those batches to DataProcessing over gRPC.
It also starts a small embedded HTTP server for runtime controls such as reloading config and stopping or listing active Spark queries.
- Java 21
- Apache Spark 4.1 Structured Streaming
- Scala 2.13 Spark artifacts
- gRPC / Protocol Buffers
- Log4j2
- Lombok
- JDK built-in `HttpServer`
- Load application and data-source configuration from disk
- Start and manage Spark streaming queries
- Support multiple source connectors
- Parse JSON, logfmt, and table-style logs
- Push parsed log batches to `DataProcessing` over gRPC
- Expose simple HTTP control endpoints for query management
- `SparkProcessing/pom.xml`
- `SparkProcessing/src/main/java/com/alexander/spark/Main.java`
- `SparkProcessing/src/main/java/com/alexander/spark/RuntimeContext.java`
- `SparkProcessing/src/main/java/com/alexander/spark/controlpanel/controller/ControlPanelController.java`
- `SparkProcessing/src/main/java/com/alexander/spark/controlpanel/service/ControlPanelService.java`
- `SparkProcessing/src/main/java/com/alexander/spark/job/service/SparkService.java`
- `SparkProcessing/src/main/java/com/alexander/spark/job/service/QueryScheduler.java`
- `SparkProcessing/src/main/java/com/alexander/spark/job/service/GrpcWriter.java`
- `SparkProcessing/src/main/proto/ingest.proto`
HTTP endpoints exposed by the embedded server:
- `GET /control-plane/status`
- `GET /control-plane/reload-config`
- `GET /control-plane/terminate-query` with header `Query-Id: <data-source-name>`
- `GET /control-plane/list-queries`
Behavior notes:
- `terminate-query` uses the request header named `Query-Id`, but the value passed into `ControlPanelService.stopQuery` is actually treated as the data source name, not the Spark query UUID.
- `reload-config` stops all active queries, reloads config from disk, and reschedules all queries.
The connector enum shows support for:
- Kafka
- Pulsar
- File text streams
- Delta Lake
- Apache Iceberg
- Apache Hudi
- Socket streams
- Spark rate source
- JSON
- LOGFMT
- TABLE
- `CUSTOM` is present in the model, but the Spark parser side does not implement a dedicated custom parser yet
This is the intended end-to-end flow for a single log stream:
1. `SparkProcessing` loads `ds.conf` and `app.conf`.
2. For each configured data source, `QueryScheduler` schedules a Spark Structured Streaming query.
3. A connector reads new records from Kafka, Pulsar, files, lakehouse tables, or another configured platform.
4. `SparkService` chooses the appropriate parser based on `logFormat.logType()`.
5. Parsed rows become `LogEntryDTO` objects.
6. `GrpcWriter` buffers entries and sends them to `DataProcessing` using the `IngestService.ingest` client-streaming gRPC API.
7. `DataSourceIngestService` receives each streamed request and forwards the entries to `DataProcessingService`.
8. `DataProcessingService` looks up the data source definition and evaluates each required rule.
9. Rule results are accumulated inside in-memory processing sessions and alert sessions.
10. When all required rule completions for an alert condition are satisfied, `AlertingService` sends the alert to one or more configured HTTP endpoints.
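As a rough illustration of the last two steps, an alert session can be modeled as a set of required rule completions that must all land inside a time window. Everything below is a hypothetical sketch, not the repository's actual session classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an in-memory alert session: the alert condition
// is met once every required rule has completed within the time window.
class AlertSession {
    private final Set<String> requiredRules;
    private final long timeWindowMillis;
    private final Map<String, Long> completions = new HashMap<>();

    AlertSession(Set<String> requiredRules, long timeWindowMillis) {
        this.requiredRules = requiredRules;
        this.timeWindowMillis = timeWindowMillis;
    }

    // Record a rule completion; return true when the alert should fire.
    boolean onRuleCompleted(String ruleName, long nowMillis) {
        if (!requiredRules.contains(ruleName)) return false;
        completions.put(ruleName, nowMillis);
        // Drop completions that have aged out of the window.
        completions.values().removeIf(t -> nowMillis - t > timeWindowMillis);
        return completions.keySet().containsAll(requiredRules);
    }
}
```

With the example `ds.conf` below (one required rule, a 60-second window), a single matching `ERROR` entry would satisfy the condition and trigger the alert POST.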
- Java 21
- Spring Boot 4
- Spring Web
- Spring WebFlux
- Spring gRPC
- Apache Spark 4.1
- Protocol Buffers
- gRPC
- Maven
- Log4j2
- Lombok
- JDK built-in `HttpServer`
- Optional stream and storage integrations:
- Kafka
- Pulsar
- Delta Lake
- Apache Iceberg
- Apache Hudi
- Kinesis connector class exists in source, though the dependency setup should be verified separately
- JDK 21
- Maven 3.9+
- A machine capable of running local Spark jobs
- Access to whatever input platform your data source config references
- Writable config directory matching the hard-coded paths used by the services
Recommended directory:
`/opt/logbuddy/config`
On Windows, you will likely need to adapt the code or create equivalent paths because both SparkProcessing and DataProcessing currently expect Linux-style absolute paths.
- Clone the repository.
- Create the required config directory.
- Create the config files:
  - `app.conf`
  - `ds.conf`
  - `rule.conf`
- Build each service with Maven.
Example build commands:
```shell
cd ControlPanel
mvn clean package

cd ../DataProcessing
mvn clean package

cd ../SparkProcessing
mvn clean package
```

No required environment variables are defined in the codebase.
Instead, the project relies mostly on hard-coded config file paths:
```
/opt/logbuddy/config/app.conf
/opt/logbuddy/config/ds.conf
/opt/logbuddy/config/rule.conf
```
If you want a more portable setup, a good improvement would be to make these paths configurable via environment variables.
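For example, a hypothetical `LOGBUDDY_CONFIG_DIR` environment variable could be resolved with a hard-coded fallback to the current default:

```java
import java.nio.file.Path;

// Hypothetical improvement (not in the repository): resolve the config
// directory from an environment variable, falling back to the current
// hard-coded default path.
class ConfigPaths {
    static Path configDir() {
        String dir = System.getenv("LOGBUDDY_CONFIG_DIR");
        return Path.of(dir != null ? dir : "/opt/logbuddy/config");
    }

    static Path appConf()  { return configDir().resolve("app.conf"); }
    static Path dsConf()   { return configDir().resolve("ds.conf"); }
    static Path ruleConf() { return configDir().resolve("rule.conf"); }
}
```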
There are no sample config files in the repository, so this example is inferred from the Java record classes and should be treated as a starting point, not guaranteed final schema.
```json
{
  "serverPort": 8081,
  "controlPanelServerPort": 8080,
  "grpcSettings": {
    "serverHost": "localhost",
    "serverPort": 9090,
    "maxLinesPerReq": 100
  }
}
```

Notes:

- `SparkProcessing` appears to use `serverPort`.
- `DataProcessing` appears to use `controlPanelServerPort` and `grpcSettings`.
- Sharing one file between both services may require both fields to be present.
Example `ds.conf` (inferred):

```json
{
  "dataSources": {
    "app-logs": {
      "name": "app-logs",
      "path": "/var/log/app.log",
      "pathInfo": {
        "platform": "FILE_TEXT",
        "location": "/var/log/input",
        "options": {
          "maxFilesPerTrigger": "1"
        }
      },
      "logFormat": {
        "logType": "JSON",
        "defaultFields": {
          "timestamp": "timestamp",
          "timestampFormat": "yyyy-MM-dd HH:mm:ss",
          "level": "level",
          "message": "message",
          "source": "source",
          "data": "data",
          "logger": "logger"
        },
        "customFields": {
          "requestId": "STRING"
        }
      },
      "requiredRules": ["error-level-rule"],
      "alertData": {
        "error-alert": {
          "alertName": "error-alert",
          "requiredRules": ["error-level-rule"],
          "timeWindowMillis": 60000,
          "alertEndpoints": ["http://localhost:9000/webhook"],
          "aiOverviewEnabled": false
        }
      },
      "schedule": {
        "delayAfterStartUpMillis": 1000,
        "intervalsMillis": []
      }
    }
  }
}
```

Example `rule.conf` (inferred):

```json
{
  "rules": {
    "error-level-rule": {
      "ruleName": "error-level-rule",
      "check": {
        "level": "ERROR"
      },
      "logTargetCount": 1,
      "maxCompletionsPerAlert": 1
    }
  }
}
```

Important caveat:

- The exact JSON polymorphism for `check` objects is not obvious from the code alone. The repository defines the check classes, but there is no sample configuration showing how Jackson distinguishes between `LogLevelCheck`, `TimestampCheck`, `MessageLengthCheck`, and `DataRegexMatchCheck`.
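One common Jackson pattern for this kind of polymorphism is a type-discriminator property via `@JsonTypeInfo`. If the project uses that approach, `rule.conf` might look like the fragment below; the `type` property name and its values are guesses and must be verified against the actual annotations:

```json
{
  "rules": {
    "error-level-rule": {
      "ruleName": "error-level-rule",
      "check": {
        "type": "LogLevelCheck",
        "level": "ERROR"
      },
      "logTargetCount": 1,
      "maxCompletionsPerAlert": 1
    }
  }
}
```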
```shell
cd DataProcessing
mvn spring-boot:run
```

Expected defaults:
- gRPC server port: `9090` from `application.properties`
- Config files: `/opt/logbuddy/config/ds.conf`, `/opt/logbuddy/config/rule.conf`, `/opt/logbuddy/config/app.conf`
```shell
cd SparkProcessing
mvn clean package
java -jar target/spark-processing.jar
```

Notes:
- The final shaded JAR name should be `target/spark-processing.jar` because `finalName` is `spark-processing`.
- The service starts Spark locally with `local[*]`.
- It also starts an embedded HTTP server on the `serverPort` value from `app.conf`.
The current codebase does not contain a runnable Spring Boot bootstrap class for this service, so these instructions are tentative:
```shell
cd ControlPanel
mvn clean package
```

Before this service can run properly, it likely needs:

- `@SpringBootApplication` in `Main.java`
- a valid server port configuration
- proper downstream base URLs including `http://` and ports
- endpoint path alignment with `SparkProcessing`
Because ControlPanel is incomplete, the most reliable usage examples are against DataProcessing and SparkProcessing directly.
```shell
curl -i http://localhost:8080/api/control-panel/health
```

Expected response:

```
HTTP/1.1 200 OK
```

The exact HTTP port is not explicit in `application.properties`, so unless another config overrides it, the Spring Boot default port `8080` is the most likely assumption.
```shell
curl -i http://localhost:8080/api/control-panel/sleep
```

Expected behavior:

- Service returns `200 OK`
- Incoming gRPC batches are ignored while the sleep flag is enabled
```shell
curl -i http://localhost:8080/api/control-panel/wake
```

```shell
curl -i http://localhost:8081/control-plane/status
```

Expected response:

```
HTTP/1.1 200 OK
```

Assumption: `8081` is just an example. Use the `serverPort` value from `app.conf`.
```shell
curl -i http://localhost:8081/control-plane/list-queries
```

Expected response:

```
["app-logs", "security-stream"]
```

```shell
curl -i http://localhost:8081/control-plane/reload-config
```

Expected behavior:
- Existing queries are stopped
- Config files are reloaded from disk
- Queries are scheduled again
```shell
curl -i -H "Query-Id: app-logs" http://localhost:8081/control-plane/terminate-query
```

Expected behavior:
- The named query is stopped
- The header name says `Query-Id`, but the value used in code is the data source name
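Since alert endpoints are plain HTTP POST targets, a small local receiver makes it easy to watch alerts arrive. The class below is a hypothetical test helper (not part of the repository) built on the JDK's built-in `HttpServer`; the `/webhook` path and port `9000` match the inferred `ds.conf` example above:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

// Hypothetical local webhook receiver for testing alert delivery.
// Prints each POSTed alert body and answers 200 OK.
class WebhookReceiver {
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/webhook", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            System.out.println("Alert received: " + new String(body));
            exchange.sendResponseHeaders(200, -1); // 200 OK, no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(9000);
        System.out.println("Listening on http://localhost:9000/webhook");
    }
}
```

Run it, point an `alertEndpoints` entry at `http://localhost:9000/webhook`, and trigger a rule to see the alert payload on stdout.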
- The project title should remain LogBuddy, based on the root README, package names, Spark app name, and config paths.
- `ControlPanel` is intended to be a Spring Boot API gateway, even though its bootstrap class is unfinished.
- The `DataProcessing` REST API likely runs on the Spring Boot default port `8080` unless overridden elsewhere.
- `SparkProcessing` runs its HTTP API on `app.conf.serverPort`.
- Config files are JSON, even though they use `.conf` extensions.
- `ControlPanel` is incomplete and currently not runnable as a real microservice.
- `ControlPanel` endpoint names do not match the endpoint names implemented by `SparkProcessing`.
- `SparkProcessing` controller path matching uses `getRequestURI().getHost()` in a way that may not behave as intended.
- `DataProcessing` has `controlPanelServerPort` in config, but its REST server port is not directly wired from that field in the visible code.
- `DataSource.path` exists in `DataProcessing`, but `SparkProcessing` uses `pathInfo` instead; this suggests the two services may expect slightly different config models.
- The `rule.conf` polymorphic deserialization format for the `check` field is not obvious without example config.
- `DataProcessing/Dockerfile` looks inconsistent:
  - it copies `build/logBuddyProcessing-exec.jar`
  - it exposes port `6969`
  - it runs `logBuddy-exec.jar`

  These names do not match each other cleanly.
- Add a root parent Maven project for the three services.
- Provide sample `app.conf`, `ds.conf`, and `rule.conf`.
- Finish the `ControlPanel` bootstrap and align its proxied routes with the downstream services.
- Replace hard-coded config paths with environment variables.
- Add OpenAPI or gRPC documentation.
- Add integration tests for Spark-to-gRPC-to-alert flow.
- Add durable state or persistence if alerts must survive restarts.
```
LogBuddy/
|-- ControlPanel/
|   |-- pom.xml
|   `-- src/main/java/com/logbuddy/control/panel/
|       |-- config/
|       `-- controller/
|-- DataProcessing/
|   |-- pom.xml
|   |-- Dockerfile
|   |-- src/main/java/com/alexander/processing/
|   |   |-- controller/
|   |   |-- data/
|   |   |   |-- config/
|   |   |   |-- model/
|   |   |   `-- service/
|   |   |-- error/
|   |   |-- settings/
|   |   `-- util/
|   |-- src/main/proto/
|   `-- src/main/resources/
|-- SparkProcessing/
|   |-- pom.xml
|   |-- src/main/java/com/alexander/spark/
|   |   |-- controlpanel/
|   |   |-- ds/
|   |   |-- grpc/
|   |   |-- job/
|   |   |-- log/
|   |   |-- settings/
|   |   `-- util/
|   |-- src/main/proto/
|   `-- src/main/resources/
`-- logs/
```
- `ControlPanel/`: intended operator-facing proxy service
- `DataProcessing/`: Spring Boot rule engine and alert sender
- `SparkProcessing/`: Spark ingestion and parsing engine
- `logs/`: runtime log outputs already present in the repository
- Make all ports and config paths environment-driven.
- Add Docker support for all three services, not just one partial Dockerfile.
- Add `docker-compose.yml` or Kubernetes manifests for local orchestration.
- Add an example webhook receiver for testing alerts.
- Add retry and dead-letter behavior for failed alert deliveries.
- Add metrics and tracing for Spark ingestion, gRPC throughput, and rule matches.
- Clarify configuration schema with JSON Schema or YAML examples.
- Persist alert and session state if this system needs crash recovery.
- Standardize API naming:
  - `reload-config` vs `reload-settings`
  - `terminate-query` vs `stop-query`
  - `control-plane` vs `control-panel`