SE4 - Simple Standard Service Endpoints

Objective

The objective of this specification is to provide a standard convention for accessing server status, configuration and live health via HTTP (and SPDY if supported by the service). Services within the Beamly environment must implement this specification in order to be deployed into the production environment.

Important: these endpoints are intended for internal consumption only and must not be made available externally; exposing them may leak sensitive information outside your system.

Resources

The following resources must be implemented:

| Name | HTTP Verb | URI Path |
| --- | --- | --- |
| Status | GET | /service/status |
| Healthcheck | GET | /service/healthcheck |
| GTG (Good to Go) | GET | /service/healthcheck/gtg |
| Service Canary | GET | /service/healthcheck/asg |

The following resources are desirable:

| Name | HTTP Verb | URI Path |
| --- | --- | --- |
| Config | GET | /service/config |

Status

The status resource returns information about the service.

Valid Response Codes: 200 OK

Response Media Type: application/json

| Element Path | Required? | Type | Description | Example |
| --- | --- | --- | --- | --- |
| artifact_id | M | String | Artifact name or maven artifact id | "cpt-server" |
| build_number | M | String | The build pipeline number | "1537.1" |
| build_machine | M | String | The machine the artifact was built or verified on | "ip-10-4-1-16 (127.0.1.1)" |
| built_by | M | String | The user that did the build | "go" |
| built_when | M | DateTime | When the build was done | "2014-03-11T08:40:18.877Z" |
| compiler_version | O1 | String | The compiler version | "1.7.0_51" |
| current_time | M | DateTime | The current time (time of request) | "2014-03-12T19:40:18.877Z" |
| git_sha1 | M | String | The git sha1 that can be used to identify the primary material for the build | "d567d2650318f704747204815adedd2396a203f5" |
| group_id | O | String | The maven group id | "beamly.platform" |
| machine_name | M | String | The name of the machine responding to this request | "ip-10-1-11-196 (127.0.1.1)" |
| os_arch | M | String | The architecture of the OS of the machine responding to the request | "amd64" |
| os_avgload | O | String | The average load of the machine responding to the request | "0.0" |
| os_name | M | String | The name of the OS of the machine responding to the request | "Linux" |
| os_numprocessors | O | String | The number of processors of the machine responding to the request | "2" |
| os_version | M | String | The version of the OS of the machine responding to the request | "3.2.0-55-virtual" |
| runbook_uri | M | URI | The URI where the RUNBOOK can be found | "https://XXXXXX/RUNBOOKS/CPT+Runbook" |
| up_duration | M | String | How long the service responding to the request has been up | "730444633 milliseconds" |
| up_since | M | DateTime | The time at which the service was started | "2014-03-04T08:46:13.877Z" |
| version | M | String | The version of the service responding to the request | "1537" |
| vm_name | O2 | String | The name of the VM that the service is running on | "Java HotSpot(TM) 64-Bit Server VM" |
| vm_vendor | O2 | String | The vendor of the VM that the service is running on | "Oracle Corporation" |
| vm_version | O2 | String | The version of the VM that the service is running on | "24.51-b03" |

M: Mandatory

O: Optional

O1: Optional, but mandatory for compiled languages

O2: Optional, but mandatory for virtual machine based languages

Example:

< HTTP/1.1 200 OK
< Server: cpt-server-i/0.0.1/1552 (ip-10-0-1-231 HttpServer2/1552)
< X-Request-Id: req37d516de-cc86-11e3-fedf-ea0b6d27b8ee
< X-Request-Time: 2ms
< Cache-Control: no-cache
< Content-Type: application/json
< Content-Length: 717
<
{
    "artifact_id": "cpt-server",
    "build_number": "1552.1",
    "build_machine": "10-0-10-22",
    "built_by": "go",
    "built_when": "20140417-1342",
    "compiler_version": "1.7.0_51",
    "current_time": "2014-04-25T14:30:58.877Z",
    "git_sha1": "f61f8a375c6a5656a434a011cf93a245815a3e78",
    "group_id": "beamly.platform",
    "machine_name": "ip-10-0-1-231 (127.0.1.1)",
    "os_arch": "amd64",
    "os_avgload": "0.09",
    "os_name": "Linux",
    "os_numprocessors": "1",
    "os_version": "3.2.0-55-virtual",
    "runbook_uri": "https://XXXXXXXXXXXXXXX/XXXXX/RUNBOOKS/CPT+Runbook",
    "up_duration": "283103946 milliseconds",
    "up_since": "2014-04-22T07:52:34.877Z",
    "version": "1552",
    "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
    "vm_vendor": "Oracle Corporation",
    "vm_version": "24.55-b03"
}
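
As an illustration of how the status fields fit together, the following Python sketch assembles a document containing every mandatory field. It is a minimal sketch rather than a reference implementation: the `BUILD_INFO` values, the function names and the choice to read OS details from the `platform` module are all assumptions made for this example, and a real service would take its build metadata from its build pipeline.

```python
import json
import platform
import socket
from datetime import datetime, timedelta, timezone

# Hypothetical build metadata; placeholder values for illustration only.
BUILD_INFO = {
    "artifact_id": "example-service",
    "build_number": "1.1",
    "build_machine": "build-host",
    "built_by": "ci",
    "built_when": "2014-03-11T08:40:18.877Z",
    "git_sha1": "0000000000000000000000000000000000000000",
    "runbook_uri": "https://example.invalid/runbook",
    "version": "1",
}

# Fields marked M in the table above.
MANDATORY_STATUS_FIELDS = [
    "artifact_id", "build_number", "build_machine", "built_by", "built_when",
    "current_time", "git_sha1", "machine_name", "os_arch", "os_name",
    "os_version", "runbook_uri", "up_duration", "up_since", "version",
]


def iso_millis(dt):
    """Format a timezone-aware datetime as UTC ISO 8601 with milliseconds."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def build_status(started_at, now=None):
    """Assemble the status document for a service started at `started_at`."""
    now = now or datetime.now(timezone.utc)
    up_millis = (now - started_at) // timedelta(milliseconds=1)
    status = dict(BUILD_INFO)
    status.update({
        "current_time": iso_millis(now),
        "machine_name": socket.gethostname(),
        "os_arch": platform.machine(),
        "os_name": platform.system(),
        "os_version": platform.release(),
        "up_duration": f"{up_millis} milliseconds",
        "up_since": iso_millis(started_at),
    })
    return status


def missing_mandatory(status):
    """Return the mandatory field names absent from a status document."""
    return [name for name in MANDATORY_STATUS_FIELDS if name not in status]


def render_status(started_at, now=None):
    """Serialise the status document as the JSON response body."""
    return json.dumps(build_status(started_at, now), indent=4, sort_keys=True)
```

A check such as `missing_mandatory` could be run in a unit test so that a service fails its build, rather than its deployment, when a mandatory field is dropped.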

Healthcheck

The healthcheck resource provides information about the service's internal health and the perceived health of its downstream dependencies.

It is up to the implementation of this specification to describe how a given healthcheck result affects the current state of the GTG and/or ASG resources, if at all.

Important: the healthcheck resource must not block waiting for healthcheck probes to execute; it should return the last known status.

Valid response codes: 200 OK

Response Media Type: application/json

| Element Path | Required? | Type | Description | Example |
| --- | --- | --- | --- | --- |
| report_as_of | M | DateTime | The time at which this report was generated (this may not be the current time) | "2014-03-12T20:16:55.447Z" |
| report_duration | M | String | How long it took to generate the report | "0 seconds" |
| tests | M | Array | Array of test results | |
| tests[].duration_millis | M | Float | Number of milliseconds taken to run the test | 1.0 |
| tests[].test_name | M | String | The name of the test, a name that is meaningful to supporting engineers | "CPTCluster" |
| tests[].test_result | M | String (Enum) | The state of the test, may be "not_run", "running", "passed", "failed" | "passed" |
| tests[].tested_at | M | DateTime | The time at which this test was executed | "2014-03-12T20:16:45.013Z" |

Example:

< HTTP/1.1 200 OK
< Server: cpt-server-i/0.0.1/1552 (ip-10-0-1-164 HttpServer2/1552)
< X-Request-Id: req941baf96-cc86-11e3-e4b9-add31c37a536
< X-Request-Time: 11ms
< Cache-Control: no-cache
< Content-Type: application/json
< Content-Length: 418
<
{
    "report_as_of": "2014-04-25T14:33:33.383Z",
    "report_duration": "0 seconds",
    "tests": [
        {
            "duration_millis": 1.0,
            "test_name": "Cassandra (CPT)",
            "test_result": "passed",
            "tested_at": "2014-04-25T14:33:15.229Z"
        }
    ]
}
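
One way to satisfy the requirement that the healthcheck resource never blocks on probes is to run the probes out of band and have the resource read only cached results. The Python sketch below illustrates that idea under stated assumptions: `HealthcheckRegistry`, `start_background_probes` and the 10 second interval are hypothetical names and values chosen for this example, and the specification does not say what `tested_at` should contain before a test has run, so the sketch uses null in that case.

```python
import threading
import time
from datetime import datetime, timezone


def iso_millis(dt):
    """Format a timezone-aware datetime as UTC ISO 8601 with milliseconds."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


class HealthcheckRegistry:
    """Runs probes out of band and serves the last known results."""

    def __init__(self):
        self._lock = threading.Lock()
        self._probes = {}   # test_name -> callable returning True or False
        self._results = {}  # test_name -> last recorded test result

    def register(self, test_name, probe):
        with self._lock:
            self._probes[test_name] = probe
            self._results[test_name] = {
                "duration_millis": 0.0,
                "test_name": test_name,
                "test_result": "not_run",
                "tested_at": None,  # assumption: null until the first run
            }

    def run_all(self):
        """Execute every probe; meant to be called from a background thread."""
        with self._lock:
            probes = list(self._probes.items())
        for test_name, probe in probes:
            tested_at = datetime.now(timezone.utc)
            start = time.monotonic()
            try:
                passed = bool(probe())
            except Exception:
                passed = False  # a probe that raises counts as a failure
            duration = (time.monotonic() - start) * 1000.0
            with self._lock:
                self._results[test_name] = {
                    "duration_millis": duration,
                    "test_name": test_name,
                    "test_result": "passed" if passed else "failed",
                    "tested_at": iso_millis(tested_at),
                }

    def report(self):
        """Build the response document from cached results only."""
        start = time.monotonic()
        with self._lock:
            tests = [dict(result) for result in self._results.values()]
        elapsed = int(time.monotonic() - start)
        return {
            "report_as_of": iso_millis(datetime.now(timezone.utc)),
            "report_duration": f"{elapsed} seconds",
            "tests": tests,
        }


def start_background_probes(registry, interval_seconds=10.0):
    """Refresh the cached results periodically on a daemon thread."""
    def loop():
        while True:
            registry.run_all()
            time.sleep(interval_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Because `report` only copies cached results under a short-lived lock, a slow or hanging downstream dependency delays the next refresh but not the HTTP response.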

GTG - Good to Go

The "Good To Go" (GTG) resource returns a successful response when the service is in an operational state and is able to receive traffic. This resource is used by load balancers and monitoring tools to determine whether traffic should be routed to this service.

Note that GTG is not used to determine if the service is healthy or not, only if it is able to receive traffic. A healthy instance may not be able to accept traffic due to the failure of critical downstream dependencies.

A successful response is a 200 OK whose body is the text "OK" (including the quotes), with a media type of "text/plain".

A failed response is a 5XX response, with either 500 or 503 preferred. Failure to respond within a predetermined timeout (typically 2 seconds) is also treated as a failure.

< HTTP/1.1 200 OK
< Server: cpt-server-i/0.0.1/1552 (ip-10-0-1-164 HttpServer2/1552)
< X-Request-Id: reqd56ff738-cc86-11e3-0144-d52c0ac401f1
< X-Request-Time: 0ms
< Cache-Control: no-cache
< Content-Type: text/plain
< Content-Length: 4
<
"OK"
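
From the point of view of a load balancer or monitoring tool, the success and failure rules above could be evaluated roughly as in the following Python sketch. The function names and the `base_url` parameter are assumptions for this example; note also that the `timeout` argument of `urllib.request.urlopen` bounds individual blocking socket operations rather than the whole request, so it only approximates the 2 second limit.

```python
import urllib.request


def is_success(status_code, content_type, body):
    """Apply the success rule: 200, text/plain, and a body of "OK" in quotes."""
    media_type = content_type.split(";")[0].strip().lower()
    return (
        status_code == 200
        and media_type == "text/plain"
        and body == b'"OK"'
    )


def probe_gtg(base_url, timeout_seconds=2.0):
    """Return True if the service at base_url reports that it is good to go."""
    url = base_url.rstrip("/") + "/service/healthcheck/gtg"
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            content_type = response.headers.get("Content-Type", "")
            return is_success(response.status, content_type, response.read())
    except OSError:
        # urllib raises HTTPError (an OSError subclass) for 5XX responses;
        # timeouts and refused connections are OSError subclasses as well.
        return False
```

Treating every error path as a failure matches the rule that a missing or late response is equivalent to a failed one.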

ASG - Service Canary

The "Service Canary" (ASG) resource returns a successful response when the service is in a healthy state. If a service returns a failure response, or fails to respond within a predefined timeout, it can expect to be terminated and replaced. (Typically this resource is used in auto-scaling group healthchecks.)

A successful response is a 200 OK whose body is the text "OK" (including the quotes), with a media type of "text/plain".

A failed response is a 5XX response, with either 500 or 503 preferred. Failure to respond within a predetermined timeout (typically 2 seconds) is also treated as a failure.

< HTTP/1.1 200 OK
< Server: cpt-server-i/0.0.1/1552 (ip-10-0-11-226 HttpServer2/1552)
< X-Request-Id: reqef5a9cb8-cc86-11e3-c19c-1741c559ab62
< X-Request-Time: 0ms
< Cache-Control: no-cache
< Content-Type: text/plain
< Content-Length: 4
<
"OK"
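
Because GTG and ASG answer different questions (able to receive traffic versus healthy), a service may want to track them as separate flags. The following Python sketch shows one possible shape for the server side using the standard library `http.server` module. `ServiceState`, `check_response`, `CheckHandler` and `serve` are hypothetical names for this example, and the failure body is an assumption, since the specification only defines the body of a successful response.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ServiceState:
    """Hypothetical holder for the two independent flags."""

    def __init__(self):
        self.able_to_receive_traffic = True  # drives the GTG resource
        self.healthy = True                  # drives the ASG resource


STATE = ServiceState()


def check_response(ok):
    """Return (status code, body) for a GTG or ASG check."""
    if ok:
        return 200, b'"OK"'
    # Assumption: the failure body is not specified; 503 is one preferred code.
    return 503, b'"FAILED"'


class CheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/service/healthcheck/gtg":
            status, body = check_response(STATE.able_to_receive_traffic)
        elif self.path == "/service/healthcheck/asg":
            status, body = check_response(STATE.healthy)
        else:
            status, body = 404, b"Not Found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def serve(port=8080):
    """Serve both checks on the loopback interface until interrupted."""
    # Binding to 127.0.0.1 reflects that these endpoints are internal only.
    ThreadingHTTPServer(("127.0.0.1", port), CheckHandler).serve_forever()
```

With this arrangement, setting `STATE.able_to_receive_traffic = False` while a critical dependency is down takes the instance out of rotation without causing the auto-scaling group to terminate it.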

Config

The Config resource returns the configuration that is used by the service. Typically this is a JSON representation of the configuration; however, the exact structure is left to the implementation.

Valid response codes: 200 OK

Response Media Type: application/json