Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 21 additions & 153 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,170 +1,38 @@
# Spring Boot Security & Observability Lab

This repository is a hands-on lab designed to demonstrate the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.
This repository is an advanced, hands-on lab demonstrating the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.

---

## Lab Progress: Phase 5 - Correlated Logs & Access Auditing
## Workshop Guide: The Evolutionary Phases

The `main` branch currently represents the completed state of **Phase 5**.
This lab is structured in distinct, self-contained phases. The `main` branch always represents the latest completed phase. To explore a previous phase's code and detailed documentation, use the links below.

* **Git Tag for this Phase:** `v5.0-correlated-logs-auditing`

### Objective

The goal of this phase was to complete the "three pillars of observability" by introducing a centralized, structured logging pipeline. We have also added a critical security layer by implementing a non-invasive, AOP-based audit logging mechanism. The system is now not only fully observable (metrics, traces, and logs), but all three pillars are correlated, allowing for seamless navigation from a distributed trace directly to the logs generated during that specific transaction.

### Key Concepts Demonstrated

* **Centralized Logging:** Introducing Grafana Loki as a scalable, efficient log aggregation system.
* **Unified Telemetry Collection:** Adopting Grafana Alloy as the modern, state-of-the-art agent for collecting **both logs and traces**, replacing older, single-purpose agents.
* **Docker Service Discovery:** Configuring Alloy to use the Docker socket to automatically discover and scrape logs from all running containers, creating a "zero-touch" logging pipeline that scales automatically.
* **Trace-to-Log Correlation:** Configuring Grafana to provide one-click navigation from a trace span in Tempo to the exact logs in Loki that correspond to that trace ID.
* **Structured JSON Logging:** Ensuring all application logs are emitted as single-line, machine-readable JSON, a critical prerequisite for reliable parsing and querying.
* **Aspect-Oriented Programming (AOP):** Creating a shared `lab-aspects` module to implement cross-cutting concerns without modifying business logic.
* **Custom Audit Logs & Metrics:** Building `@Auditable` aspect that generates rich, structured audit logs and corresponding Micrometer metrics (`Counter` and `Timer`) for security monitoring and alerting.
* **Multi-Module Maven Project:** Refactoring the build to support a shared library module and creating robust, multi-module-aware `Dockerfiles`.

### Architecture Overview

Phase 5 enriches our distributed system with a complete, correlated observability pipeline managed by Grafana Alloy.

```mermaid
graph TD
subgraph "User's Machine"
B[Browser]
end

subgraph "Docker Compose Network (lab-net)"
P[Traefik Proxy]

subgraph "Application Services"
WC[Web Client]
RS[Resource Server]
end

subgraph "Identity Services"
KC[Keycloak]
DB[(PostgreSQL)]
end

subgraph "Observability Stack"
A[Alloy Agent]
L[Loki]
T[Tempo]
Prom[Prometheus]
G[Grafana]
end
end

%% User and Service Flows (Unchanged)
B -- "User Interaction" --> P --> WC;
WC -- "Backend API Call" --> RS;
RS -- "Token Validation" --> KC;

%% NEW: Observability Data Flow
subgraph "Telemetry Collection"
RS -- "1a. Emits Traces (OTLP)" --> A;
WC -- "1b. Emits Traces (OTLP)" --> A;
RS -- "2a. Writes Logs (stdout)" --> Docker;
WC -- "2b. Writes Logs (stdout)" --> Docker;
Docker -- "3. Scraped by" --> A;
end

subgraph "Telemetry Processing & Storage"
A -- "4a. Forwards Traces" --> T;
A -- "4b. Forwards Logs" --> L;
RS -- "5. Exposes Metrics" --> Prom;
WC -- "6. Exposes Metrics" --> Prom;
end

subgraph "Visualization"
G -- "Queries Traces" --> T;
G -- "Queries Logs" --> L;
G -- "Queries Metrics" --> Prom;
end
```

1. **[Grafana Loki](config/loki/loki-config.yml):** The new log storage backend. It is configured to run in a simple, single-tenant mode and stores its data in a persistent Docker volume.
2. **[Grafana Alloy](config/alloy/alloy-config.river):** The new heart of our collection pipeline. It performs two critical functions:
* **Log Collection:** It connects to the Docker socket to discover our running application containers, scrapes their `stdout` log streams, and forwards them to Loki.
* **Trace Collection:** It acts as an OTLP endpoint, receiving traces from our applications' Java agents and forwarding them to Tempo.
| Phase | Description & Key Concepts | Code & Docs (at tag) | Key Pull Requests |
|:-----------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **1. The Secure Monolith** | A standalone service that issues and validates its own JWTs. Concepts: `AuthenticationManager`, custom `JwtAuthenticationFilter`, `jjwt` library, and a foundational CI pipeline. | [`v1.0-secure-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v1.0-secure-monolith) | [#2](https://github.com/apenlor/spring-boot-security-observability-lab/pull/2), [#3](https://github.com/apenlor/spring-boot-security-observability-lab/pull/3), [#4](https://github.com/apenlor/spring-boot-security-observability-lab/pull/4) |
| **2. Observing the Monolith** | The service is containerized and orchestrated via `docker-compose`. Concepts: Micrometer, Prometheus, Grafana, custom metrics, and automated dashboard provisioning. | [`v2.0-observable-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v2.0-observable-monolith) | [#6](https://github.com/apenlor/spring-boot-security-observability-lab/pull/6) |
| **3. Evolving to Federated Identity** | The system is refactored into a multi-service architecture with an external IdP. Concepts: Keycloak, OIDC, OAuth2 Client (`web-client`) vs. Resource Server, Traefik reverse proxy, service-to-service security. | [`v3.0-federated-identity`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v3.0-federated-identity) | [#8](https://github.com/apenlor/spring-boot-security-observability-lab/pull/8) |
| **4. Tracing a Distributed System** | Services are instrumented with the OpenTelemetry agent to generate traces. Concepts: Tempo, agent-based instrumentation, W3C Trace Context, Service Graphs, and a hybrid PUSH/PULL metrics architecture. | [`v4.0-distributed-tracing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v4.0-distributed-tracing) | [#10](https://github.com/apenlor/spring-boot-security-observability-lab/pull/10) |
| **5. Correlated Logs & Access Auditing** | The three pillars of observability are complete (metrics, traces, logs). Alloy is the unified collection agent. Concepts: Loki, Grafana Alloy, Docker service discovery, structured JSON logs, AOP-based auditing, trace-to-log correlation, and detailed audit metrics. | [`v5.0-correlated-logs-auditing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v5.0-correlated-logs-auditing) | [#12](https://github.com/apenlor/spring-boot-security-observability-lab/pull/12) |
| **6. Proactive Alerting** | _Upcoming..._ | - | - |
| **7. Continuous Security Integration** | _Upcoming..._ | - | - |
| **8. Advanced Secret Management** | _Upcoming..._ | - | - |

---

### Key Configuration Details
## How to Follow This Lab

#### 1. Grafana Alloy & Docker Service Discovery

To achieve a fully automated logging pipeline the `alloy` service in our `docker-compose.yml` mounts the host's Docker socket (`/var/run/docker.sock`) in read-only mode.

This is a privileged operation, and the decision to use it is a deliberate architectural trade-off, as documented in the `docker-compose.yml`'s security disclaimer.
* **Benefit:** Alloy can query the Docker API to automatically discover every container on our project's network. It gets rich metadata like the `container_name` for free, which it uses to create labels in Loki. This means we can add new services, and our logging pipeline will **automatically start collecting their logs** with zero configuration changes.
* **Mitigation:** The risk is managed by using the official, minimalist Grafana Alloy image and mounting the socket as **read-only**. Anyway, not recommendable for production environments.

The [Alloy configuration](config/alloy/alloy-config.river) is written in the River (`.river`) language and defines a clear pipeline: discover Docker containers, filter them by our project's network, relabel them with a clean `container_name`, and forward their logs to Loki.

#### 2. AOP-based Audit Logging

To handle security auditing as a cross-cutting concern, we introduced a new, shared Maven module: [`lab-aspects`](lab-aspects).
* This module contains a custom `@Auditable` annotation and the `AuditLogAspect`.
* The aspect intercepts any method marked with `@Auditable` and performs two actions:
1. **Logs a Structured Event:** It uses SLF4J's Fluent API to create a rich, nested JSON object containing detailed context about the event (principal, roles, outcome, duration, sanitized request details, and exception info). These are logged to a dedicated `AUDIT` logger.
2. **Emits Metrics:** It records a `Counter` (`app.audit.events.total`) and a `Timer` (`app_audit_events_duration_seconds`) for every audit event. These metrics are tagged with low-cardinality labels (`method`, `outcome`), making them perfect for building high-performance dashboards and alerts.

This implementation is fully tested with its own integration test suite, which validates every feature, including the metric emission and context handling.
1. **Start with the `main` branch** to see the latest state of the project.
2. To go back in time, use the **"Code & Docs" link** for a specific phase. This will show you the `README.md` for that phase, which contains the specific instructions and examples for that version of the code.
3. To understand the *"why"* behind the changes, review the **Key Pull Requests** for each phase.

---

## Local Development & Quick Start

The prerequisites and setup are the same as in previous phases.

1. **Configure Local Hostnames (One-Time Setup, if not already done):**
Edit your local `hosts` file to add:
```
127.0.0.1 keycloak.local
```
2. **Create and Configure Your Environment File:**
```bash
cp .env.example .env
# ...then edit .env to add your WEB_CLIENT_SECRET from Keycloak.
```
3. **Build and run the entire stack:**
```bash
docker-compose up --build -d
```
4. **Access the Services:**
* **Web Client Application:** [http://localhost:8082](http://localhost:8082) (Login with `lab-user`/`lab-user` or
`lab-admin`/`lab-admin`)
* **Keycloak Admin Console:** [http://keycloak.local](http://keycloak.local) (Login with `admin`/`admin`)
* **Traefik Dashboard:** [http://localhost:8080](http://localhost:8080)
* **Prometheus UI:** [http://localhost:9090](http://localhost:9090)
* **Grafana UI:** [http://localhost:3000](http://localhost:3000) (Login with `admin`/`admin`)
---

## Validating the New Observability Features

1. **Generate Traffic:** Log in to the web client as `lab-user`/`lab-user` and click the "Call Secure API" and "Call Admin API" buttons several times.

2. **Validate Audit Logs:**
* In Grafana, go to Explore -> Loki.
* Run the query: `{container_name="resource-server"} | json | logger_name="AUDIT"`
* Inspect the logs. You will see the structured `audit` object with `outcome="SUCCESS"` for successful calls and `outcome="FAILURE"` for the denied admin call.

3. **Validate Audit Metrics:**
* In Grafana, go to Explore -> Prometheus.
* Run the query: `rate(app_audit_events_total{outcome="FAILURE"}[1m])`
* You should see the rate of failed audit events for the `getAdminData` method.
## Running the Project

4. **Validate Trace-to-Log Correlation:**
* Find a trace in Tempo for a `GET /fetch-data` operation.
* Click on the span for the `resource-server`.
* In the span details panel, a **blue "Logs for this span" button** will be visible.
* Clicking it will open Loki and show you the exact logs—including the audit log—for that specific trace.
To run the application and see usage examples for the **current phase**, please refer to the detailed instructions in its tagged `README.md` file.

#### Stop the Environment
**[>> Go to instructions for the current phase: `v5.0-correlated-logs-auditing` <<](https://github.com/apenlor/spring-boot-security-observability-lab/blob/v5.0-correlated-logs-auditing/docs/phase-5-readme.md#local-development--quick-start)**

```bash
docker-compose down -v
```
As the lab progresses, this link will always be updated to point to the latest completed phase.