apenlor · apenlor · Sep 4, 2025 · Sep 4, 2025
diff --git a/README.md b/README.md
@@ -1,122 +1,38 @@
 # Spring Boot Security & Observability Lab
 
-This repository is a hands-on lab designed to demonstrate the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.
+This repository is an advanced, hands-on lab demonstrating the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.
 
 ---
 
-## Lab Progress: Phase 6 - Proactive Alerting with Alertmanager
+## Workshop Guide: The Evolutionary Phases
 
-The `main` branch currently represents the completed state of **Phase 6**.
+This lab is structured in distinct, self-contained phases. The `main` branch always represents the latest completed phase. To explore a previous phase's code and detailed documentation, use the links below.
 
-*   **Git Tag for this Phase:** `v6.0-proactive-alerting`
-
-### Objective
-
-The goal of this phase was to transition our monitoring strategy from passive (dashboards) to **proactive**. We have integrated the Prometheus Alertmanager into our stack to create a system that can automatically detect and route notifications about problems, without requiring a human to be watching a screen. This demonstrates the completion of a production-grade monitoring feedback loop.
-
-### Key Concepts Demonstrated
-
-*   **Prometheus Alerting Pipeline:** Understanding the distinct roles of Prometheus (which evaluates rules and generates alerts) and Alertmanager (which receives, de-duplicates, groups, and routes alerts).
-*   **Declarative Alerting Rules:** Defining alerting conditions as code using PromQL expressions in a version-controlled YAML file.
-*   **Alerting on Technical & Security Metrics:** Creating two distinct types of alerts:
-    1.  A **technical alert** (`ApiServerErrorRateHigh`) that fires on infrastructure-level signals like a spike in 5xx server errors.
-    2.  A **security alert** (`UnauthorizedAdminAccessSpike`) that fires on application-level signals, such as an abnormal rate of `4xx` errors on a privileged endpoint.
-*   **Alert Lifecycle:** Observing the full lifecycle of an alert: `Inactive` -> `Pending` -> `Firing` -> `Resolved`.
-*   **UI-Driven Test Harness:** Building a dedicated "Alerting Test Panel" in our web application to reliably trigger alert conditions on demand, proving the entire pipeline works end-to-end.
-
-### Architecture Overview
-
-Phase 6 introduces Alertmanager and connects it to our existing Prometheus instance. The data flow for alerting is now a core part of our observability stack.
-
-```mermaid
-graph TD
-    subgraph "Application Services"
-        RS[Resource Server]
-        WC[Web Client]
-    end
-
-    subgraph "Observability Stack"
-        Prom[Prometheus] -->|1. Scrapes Metrics| RS
-        Prom -->|1. Scrapes Metrics| WC
-
-        subgraph "Alerting Pipeline"
-            Rules[alerts.yml] -->|2. Evaluates| Prom
-            Prom -->|3. Sends Firing Alerts| AM[Alertmanager]
-        end
-
-        G[Grafana]
-    end
-
-    subgraph "Operators / External Systems"
-        AM -->|4. Routes Notifications| Notif[Email, Slack, etc.]
-        Ops[Operator] -->|Views & Manages Alerts| AM
-        Ops -->|Views Dashboards| G
-    end
-```
-
-1.  **[Prometheus](config/prometheus/prometheus.yml):** Its role is expanded. It is now configured to load a [rule file](config/prometheus/alerts.yml) and to send any alerts that become "Firing" to the Alertmanager service. The `--web.external-url` flag is set to ensure backlinks are generated with a browser-resolvable hostname.
-2.  **[Alertmanager](config/alertmanager/alertmanager.yml):** The new central hub for all alerts. It receives alerts from Prometheus, groups them to reduce noise, and would (in a production setup) route them to configured receivers. For this lab, we use a "null" receiver.
+| Phase                                    | Description & Key Concepts                                                                                                                                                                                                                                               | Code & Docs (at tag)                                                                                                                    | Key Pull Requests                                                                                                                                                                                                                              |
+|:-----------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **1. The Secure Monolith**               | A standalone service that issues and validates its own JWTs. Concepts: `AuthenticationManager`, custom `JwtAuthenticationFilter`, `jjwt` library, and a foundational CI pipeline.                                                                                        | [`v1.0-secure-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v1.0-secure-monolith)                   | [#2](https://github.com/apenlor/spring-boot-security-observability-lab/pull/2), [#3](https://github.com/apenlor/spring-boot-security-observability-lab/pull/3), [#4](https://github.com/apenlor/spring-boot-security-observability-lab/pull/4) |
+| **2. Observing the Monolith**            | The service is containerized and orchestrated via `docker-compose`. Concepts: Micrometer, Prometheus, Grafana, custom metrics, and automated dashboard provisioning.                                                                                                     | [`v2.0-observable-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v2.0-observable-monolith)           | [#6](https://github.com/apenlor/spring-boot-security-observability-lab/pull/6)                                                                                                                                                                 |
+| **3. Evolving to Federated Identity**    | The system is refactored into a multi-service architecture with an external IdP. Concepts: Keycloak, OIDC, OAuth2 Client (`web-client`) vs. Resource Server, Traefik reverse proxy, service-to-service security.                                                         | [`v3.0-federated-identity`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v3.0-federated-identity)             | [#8](https://github.com/apenlor/spring-boot-security-observability-lab/pull/8)                                                                                                                                                                 |
+| **4. Tracing a Distributed System**      | Services are instrumented with the OpenTelemetry agent to generate traces. Concepts: Tempo, agent-based instrumentation, W3C Trace Context, Service Graphs, and a hybrid PUSH/PULL metrics architecture.                                                                 | [`v4.0-distributed-tracing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v4.0-distributed-tracing)           | [#10](https://github.com/apenlor/spring-boot-security-observability-lab/pull/10)                                                                                                                                                               |
+| **5. Correlated Logs & Access Auditing** | The three pillars of observability are complete (metrics, traces, logs). Alloy is the unified collection agent. Concepts: Loki, Grafana Alloy, Docker service discovery, structured JSON logs, AOP-based auditing, trace-to-log correlation, and detailed audit metrics. | [`v5.0-correlated-logs-auditing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v5.0-correlated-logs-auditing) | [#12](https://github.com/apenlor/spring-boot-security-observability-lab/pull/12)                                                                                                                                                               |
+| **6. Proactive Alerting**                | The system transitions from passive to proactive monitoring. Concepts: Alertmanager, declarative PromQL alert rules, alerting on technical vs. security metrics, and a UI-driven test harness.                                                                           | [`v6.0-proactive-alerting`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v6.0-proactive-alerting)             | [#14](https://github.com/apenlor/spring-boot-security-observability-lab/pull/14)                                                                                                                                                               |
+| **7. Continuous Security Integration**   | _Upcoming..._                                                                                                                                                                                                                                                            | -                                                                                                                                       | -                                                                                                                                                                                                                                              |
+| **8. Advanced Secret Management**        | _Upcoming..._                                                                                                                                                                                                                                                            | -                                                                                                                                       | -                                                                                                                                                                                                                                              |
 
 ---
 
-### Key Configuration Details
+## How to Follow This Lab
 
-#### 1. Prometheus Alert Rules
-
-The core of this phase is the [alerts.yml](config/prometheus/alerts.yml) file. We have defined two rules that are specifically tailored for our application and optimized for a lab environment with short `for` durations for rapid testing.
-
-*   **`ApiServerErrorRateHigh`:** This rule fires when the rate of `5xx` status codes from the `resource-server` exceeds 0 for a continuous period. It is designed to be triggered by our `ChaosController`.
-*   **`UnauthorizedAdminAccessSpike`:** This security-focused rule fires when the rate of `4xx` status codes on the specific `/api/secure/admin` endpoint exceeds 0. This is more robust than checking for just `403` as it captures any client-side error on this privileged endpoint, signaling a potential issue.
-
-#### 2. UI-Driven Test Harness
-
-To validate the entire alerting pipeline, we implemented a dedicated "Alerting Test Panel" in the `web-client`.
-*   The `ChaosController` in the `resource-server` was enhanced with a guaranteed-failure endpoint (`/api/chaos/error`).
-*   The `WebController` in the `web-client` was updated with two new `POST` endpoints that call the backend to generate `5xx` and `4xx` errors.
-
----
-
-## Local Development & Quick Start
-
-The prerequisites and setup are the same as in previous phases.
-
-1.  **Configure Local Hostnames (One-Time Setup, if not already done):**
-    Edit your local `hosts` file to add:
-    ```
-    127.0.0.1   keycloak.local
-    ```
-2.  **Create and Configure Your Environment File:**
-    ```bash
-    cp .env.example .env
-    # ...then edit .env to add your WEB_CLIENT_SECRET from Keycloak.
-    ```
-3.  **Build and run the entire stack:**
-    ```bash
-    docker-compose up --build -d
-    ```
-4.  **Access the Services:**
-    *   **Web Client Application:** [http://localhost:8082](http://localhost:8082) (Login with `lab-user`/`lab-user` or `lab-admin`/`lab-admin`)
-    *   **Keycloak Admin Console:** [http://keycloak.local](http://keycloak.local) (Login with `admin`/`admin`)
-    *   **Prometheus UI:** [http://localhost:9090](http://localhost:9090)
-    *   **Alertmanager UI:** [http://localhost:9093](http://localhost:9093)
-    *   **Grafana UI:** [http://localhost:3000](http://localhost:3000)
+1.  **Start with the `main` branch** to see the latest state of the project.
+2.  To go back in time, use the **"Code & Docs" link** for a specific phase. This will show you the `README.md` for that phase, which contains the specific instructions and examples for that version of the code.
+3.  To understand the *"why"* behind the changes, review the **Key Pull Requests** for each phase.
 
 ---
 
-## Validating the New Alerting Features
-
-1.  **Confirm Rules are Loaded:**
-    *   Navigate to the Prometheus UI's "Alerts" tab ([http://localhost:9090/alerts](http://localhost:9090/alerts)).
-    *   Verify that both new alerts are present and in the green "Inactive" state.
+## Running the Project
 
-2.  **Trigger the Alerts via the UI:**
-    *   Log in to the Web Client as **`lab-user` / `lab-user`**.
-    *   In the "Alerting Test Panel", repeatedly click the buttons to generate `403` and `5xx` errors.
-    *   Watch the Prometheus Alerts UI. The alerts will transition from `Inactive` to `Pending` (yellow) and then to `Firing` (red).
-    *   Once firing, the alerts will appear in the Alertmanager UI.
+To run the application and see usage examples for the **current phase**, please refer to the detailed instructions in its tagged `README.md` file.
 
-#### Stop the Environment
+**[>> Go to instructions for the current phase: `v6.0-proactive-alerting` <<](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v6.0-proactive-alerting?tab=readme-ov-file#local-development--quick-start)**
 
-```bash
-docker-compose down -v
-```
+As the lab progresses, this link will always be updated to point to the latest completed phase.