A trade-off of running distributed applications is that more types of downstream provider failures can affect upstream consumers.
Applications may fail in various ways:
- Hardware failures
- Network failures
- OS, virtualization, or container software failures
- Application process failures
In distributed applications, developers are responsible for anticipating and handling liveness issues. This means they must address non-functional concerns in addition to business application concerns.
In cloud orchestration architectures such as Kubernetes and Cloud Foundry, the Platform Operator role is responsible for hardening the underlying platforms, but failures may still occur. This is why we design our applications for disposability.
This example demonstrates failure detection for application instances and how Cloud Foundry can handle it.
This is where Spring Boot Actuator helps. Actuator handles health detection for the backing resources our applications depend on.
We will demonstrate how to use Actuator to expose a custom health check in a sample application. Cloud Foundry will use the Actuator health check to dispose of unhealthy instances and recover them.
- A Cloud Foundry account:
  - Your account set up with the SpaceDeveloper role
  - Sufficient quota for 2 instances at 768M each
  - A Pivotal Web Services account should be sufficient
- JDK 8
- Clone this project
- Log in to your Cloud Foundry account through cf login
- The HealthCheckExampleApplication is a simple Spring Boot application.
- The HelloController will fail if an improper id is passed as part of a GET request. This is obviously not a realistic use case, but we want a deterministic way to induce failure when running on Cloud Foundry.
- The HelloControllerHealthCheck is an Actuator-exposed health check that will detect a HelloController failure.
- Review application.properties. Actuator endpoint security is disabled to keep the example simple to run.
- Review the Cloud Foundry manifest, manifest.yml. Note the HTTP health check and its associated endpoint configuration. Configure a unique route by filling in the {unique id} and one of your foundation domains as {domain}.
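The interaction between HelloController and HelloControllerHealthCheck can be sketched in plain Java. This is a minimal sketch under assumptions, not the project's code: the class and method names are hypothetical, it treats any non-zero id as improper (the walkthrough only exercises /0 succeeding and /1 failing), and it assumes a failure stays latched until the instance is replaced. In the real application the health check would typically be a Spring bean implementing Actuator's HealthIndicator interface; Spring is left out here so the sketch runs on a bare JDK.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the controller plus its health check.
class HelloHealthSketch {
    // Latched once a request fails; never reset, so the instance stays unhealthy.
    private final AtomicBoolean failed = new AtomicBoolean(false);

    // Models the GET handler. Assumption: only id 0 is valid.
    String hello(int id) {
        if (id != 0) {
            failed.set(true);
            throw new IllegalStateException("improper id: " + id);
        }
        return "hello";
    }

    // Models the status reported in the /health payload.
    String status() {
        return failed.get() ? "DOWN" : "UP";
    }

    // Models the HTTP status code of the /health response.
    int httpStatus() {
        return failed.get() ? 503 : 200;
    }

    public static void main(String[] args) {
        HelloHealthSketch app = new HelloHealthSketch();
        System.out.println(app.hello(0) + " " + app.status() + " " + app.httpStatus());
        try {
            app.hello(1);
        } catch (IllegalStateException e) {
            System.out.println("request failed: " + e.getMessage());
        }
        System.out.println(app.status() + " " + app.httpStatus());
    }
}
```

The point of the latch is that one bad request leaves the instance reporting DOWN on every later health probe, which is what gives the platform a deterministic signal to act on.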
- From the project root, build the project:
  ./gradlew build
- Push the application:
  cf push
- Verify both app instances are running:
  cf app hello
- Review the app events:
  cf events hello
- Execute a successful GET request:
  curl -i http://{route from your manifest}/0
- Verify the Actuator health check:
  curl -i http://{route from your manifest}/health
  What do you see? You should see HTTP 200 and a payload with a status of UP.
- Open a separate terminal window, and tail the application logs:
  cf logs hello
- Execute a failure mode GET request:
  curl -i http://{route from your manifest}/1
  You should see an HTTP 500 error.
- Verify the Actuator health check:
  curl -i http://{route from your manifest}/health
  What do you see? You should see HTTP 503 and a payload with a status of DOWN.
- Check the state of the Cloud Foundry hello application:
  cf app hello
  You might need to rerun the command, given Cloud Foundry's 30 second health check interval.
- What states do you see for an instance of the hello application? You might see a crashed state, followed by starting. You should then see Cloud Foundry recover the instance to a running state.
- Review the app logs. If you find the logs verbose, search your terminal window for the HEALTH pattern. What do you see? You should see the following:
  ERR Failed to make HTTP request to '/health' on port 8080: received status code 503
- Review the application events:
  cf events hello
  What do you see? You should see an event for the application crash.
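The crashed, starting, running sequence observed in these steps can be modeled with a toy probe cycle. This is only an illustration of the behavior described above, not Cloud Foundry's actual implementation; the class, enum, and method names are hypothetical, and the restart is assumed to produce a fresh, healthy instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Toy model of a platform-side HTTP health check; names are hypothetical.
class HealthMonitorSketch {
    enum State { RUNNING, CRASHED, STARTING }

    // Runs one probe cycle and returns the instance states observed, in order.
    // healthStatusCode: returns the HTTP status of the instance's /health endpoint.
    // restart: replaces the instance (assumed to yield a healthy one).
    static List<State> probeOnce(IntSupplier healthStatusCode, Runnable restart) {
        List<State> observed = new ArrayList<>();
        if (healthStatusCode.getAsInt() == 200) {
            observed.add(State.RUNNING);
            return observed;
        }
        // In this model, any non-200 response marks the instance as crashed.
        observed.add(State.CRASHED);
        restart.run();
        observed.add(State.STARTING);
        // The replacement counts as running once its health check passes.
        if (healthStatusCode.getAsInt() == 200) {
            observed.add(State.RUNNING);
        }
        return observed;
    }

    public static void main(String[] args) {
        int[] status = {503}; // the instance is currently unhealthy
        List<State> states = probeOnce(() -> status[0], () -> status[0] = 200);
        System.out.println(states);
    }
}
```

In the real platform the probe repeats on an interval, which is why the state change may take a short while to show up in cf app hello.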
This is a simple example of how Cloud Foundry's health check mechanism may be used to dispose of unhealthy instances. Without an application-layer health check, an application instance's process may keep running while unhealthy, causing cascading failures in upstream consumers.
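To make the last point concrete, the sketch below contrasts a check that only looks at whether the process is running with one that also requires a healthy HTTP response. The names are hypothetical and the logic is a simplified model rather than Cloud Foundry's implementation; it only shows why a running but unhealthy instance goes undetected without an application-layer check.

```java
// Simplified comparison of two health check strategies; names are hypothetical.
class CheckTypeSketch {
    enum CheckType { PROCESS, HTTP }

    // Returns true if the modeled platform would consider the instance healthy.
    static boolean looksHealthy(CheckType type, boolean processRunning, int healthStatusCode) {
        if (type == CheckType.PROCESS) {
            // Only process liveness is considered.
            return processRunning;
        }
        // HTTP check: the process must be up and the health endpoint must return 200.
        return processRunning && healthStatusCode == 200;
    }

    public static void main(String[] args) {
        // The process is alive, but the health endpoint reports 503.
        System.out.println(looksHealthy(CheckType.PROCESS, true, 503)); // true: the failure goes unnoticed
        System.out.println(looksHealthy(CheckType.HTTP, true, 503));    // false: the instance can be disposed of
    }
}
```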