Agents are libraries that act as middleware or helper utilities: they capture incoming request and outgoing response data in order to construct ALF objects, which can be used with the Galileo service.
Agents need to be injected at the appropriate points in the request-response lifecycle of the HTTP server: at the start of the request, before processing any business logic, and at the end of the response, before sending it back to the client.
The Agent should expose the following configurations to the user, with fallback to default values when none are provided:
name | type | values | description | default |
---|---|---|---|---|
`SERVICE_TOKEN` | `String` | - | Required, Galileo Service Token | - |
`ENVIRONMENT` | `String` | - | Galileo Environment Slug | - |
`LOG_BODIES` | `String` | `all`, `none`, `request`, `response` | Capture & send the full bodies of request & response | `none` |
`RETRY_COUNT` | `Integer` | `0-10` | Number of retries in case of failures | `0` |
`CONNECTION_TIMEOUT` | `Integer` | `0-60` | Timeout in seconds before aborting the current connection | `30` |
`FLUSH_TIMEOUT` | `Integer` | `0-60` | Timeout in seconds before flushing the current queue | `2` |
`QUEUE_SIZE` | `Integer` | `0-1000` | Total queue size before flushing | `1000` |
`HOST` | `RFC3986` | - | DNS Host Address of Galileo Collector | `collector.galileo.mashape.com` |
`PORT` | `RFC3986` | - | Port for Galileo Socket Service | `443` |
`FAIL_LOG` | `RFC3986` | - | File system path, storage location for failed requests | `/dev/null` |
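
As an illustration of "fallback to default values", the options above could be modeled as a small validated structure. This is a minimal sketch, not part of the spec: the `AgentConfig` class, its field names, and the `_check_range` helper are hypothetical, and only the defaults and ranges come from the table.

```python
from dataclasses import dataclass
from typing import Optional

LOG_BODIES_VALUES = ("all", "none", "request", "response")


def _check_range(name, value, low, high):
    """Raise ValueError when an integer option is outside its documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container; defaults mirror the configuration table."""
    service_token: str                       # SERVICE_TOKEN (required)
    environment: Optional[str] = None        # ENVIRONMENT (no default)
    log_bodies: str = "none"                 # LOG_BODIES
    retry_count: int = 0                     # RETRY_COUNT
    connection_timeout: int = 30             # CONNECTION_TIMEOUT, seconds
    flush_timeout: int = 2                   # FLUSH_TIMEOUT, seconds
    queue_size: int = 1000                   # QUEUE_SIZE
    host: str = "collector.galileo.mashape.com"  # HOST
    port: int = 443                          # PORT
    fail_log: str = "/dev/null"              # FAIL_LOG

    def __post_init__(self):
        if not self.service_token:
            raise ValueError("SERVICE_TOKEN is required")
        if self.log_bodies not in LOG_BODIES_VALUES:
            raise ValueError(f"LOG_BODIES must be one of {LOG_BODIES_VALUES}")
        _check_range("RETRY_COUNT", self.retry_count, 0, 10)
        _check_range("CONNECTION_TIMEOUT", self.connection_timeout, 0, 60)
        _check_range("FLUSH_TIMEOUT", self.flush_timeout, 0, 60)
        _check_range("QUEUE_SIZE", self.queue_size, 0, 1000)
```

A user would then only pass what differs from the defaults, for example `AgentConfig(service_token="<my service token>", log_bodies="all")`.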
The Galileo Collector accepts `gzip` and `deflate` compression, as well as plain text. `Content-Encoding` must be set correctly when compression is used. Example: `Content-Encoding: gzip`.
The `Content-Type` header should be equal to `application/json`.
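
The header rules above can be sketched as a small helper that serializes the payload and, when compressing, sets `Content-Encoding` to match. The function name `build_request` and its `compress` flag are assumptions made for this example; only the header names and values come from the text.

```python
import gzip
import json


def build_request(alfs, compress=True):
    """Return (headers, body) for a POST to the Collector.

    `alfs` is any JSON-serializable ALF payload (a single object or a list).
    """
    body = json.dumps(alfs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if compress:
        # The encoding header must describe the bytes actually sent.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body
```

With `compress=False` the body is sent as plain text and no `Content-Encoding` header is added.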
The Galileo Collector provides two API endpoints to send data through:
Method: `POST`
Content-Type: `application/json`
Group multiple ALF Objects into an array:
```json
[
  {
    "version": "1.1.0",
    "serviceToken": "<my service token>",
    "environment": "PRODUCTION",
    "har": {
      "log": {
        "creator": {
          "name": "HAR Logger",
          "version": "1.0.0"
        },
        "entries": [{...}]
      }
    }
  },
  {
    "version": "1.1.0",
    "serviceToken": "<my service token>",
    "environment": "PRODUCTION",
    "har": {
      "log": {
        "creator": {
          "name": "HAR Logger",
          "version": "1.0.0"
        },
        "entries": [{...}]
      }
    }
  },
  ...
]
```
Method: `POST`
Content-Type: `application/json`
Construct a single ALF Object with multiple entries:
```json
{
  "version": "1.1.0",
  "serviceToken": "<my service token>",
  "environment": "PRODUCTION",
  "har": {
    "log": {
      "creator": {...},
      "entries": [
        {
          "startedDateTime": "2016-03-13T03:47:16.937Z",
          "serverIPAddress": "10.10.10.10",
          "clientIPAddress": "10.10.10.20",
          "time": 82,
          "request": {...},
          "response": {...},
          "cache": {...},
          "timings": {...}
        },
        {
          "startedDateTime": "2016-03-13T03:47:16.937Z",
          "serverIPAddress": "10.10.10.10",
          "clientIPAddress": "10.10.10.20",
          "time": 82,
          "request": {...},
          "response": {...},
          "cache": {...},
          "timings": {...}
        },
        ...
      ]
    }
  }
}
```
- `errors`: Empty array, there are no errors
- `sent`: Integer, number of received ALF entries
- `saved`: Integer, number of saved ALF entries. This will be the same as `sent`, since there were no errors
```json
{"errors":[], "sent": 5, "saved": 5}
```
- `errors`: Array of strings. Each element is a detailed description of an error that occurred. If possible, the string will begin with `ALF[i]`, where `i` is the index of the ALF containing the error.
- `sent`: Integer, number of received ALF entries
- `saved`: Integer, number of saved ALF entries. This will be different from `sent`, since there were errors
```json
{"errors": ["ALF[2] Quota exceeded"], "sent": 3, "saved": 2}
```
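
An agent may want to know which ALF objects were rejected, for example to write only those to `FAIL_LOG`. The sketch below extracts the indexes from the `ALF[i]` prefix described above; the function name `failed_alf_indexes` is hypothetical, and errors without the prefix are simply skipped because the text only promises the prefix "if possible".

```python
import json
import re

# Matches the "ALF[i]" prefix at the start of an error string.
_ALF_INDEX = re.compile(r"^ALF\[(\d+)\]")


def failed_alf_indexes(response_text):
    """Return the ALF indexes named in the Collector's `errors` array."""
    payload = json.loads(response_text)
    indexes = []
    for message in payload.get("errors", []):
        match = _ALF_INDEX.match(message)
        if match:
            indexes.append(int(match.group(1)))
    return indexes
```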
Something is wrong in your HTTP request itself. Are you using gzip/deflate compression without setting the `Content-Encoding` header? If the issue persists, please contact support.
The Collector will only accept requests smaller than `500 MB`.
An unexpected error occurred. Please contact support@mashape.com if the error continues.
The agent MUST observe the following considerations in its operational logic:
- If `CONNECTION_TIMEOUT` seconds elapse without a response from the server, the agent should terminate the request.
- On failure to send data (whether through a rejection from The Collector, a `CONNECTION_TIMEOUT` event, or any other network failure), the agent should retry up to `RETRY_COUNT` times, then write to `FAIL_LOG`.
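
The retry rule above can be sketched as follows. This is an illustration under stated assumptions: `send` is a hypothetical callable that returns `True` on success, returns `False` when The Collector rejects the data, and raises `OSError` (the base class of Python's timeout and connection errors) on network failure. The sketch also assumes one initial attempt plus up to `RETRY_COUNT` retries, and one JSON document per line in `FAIL_LOG`; the spec does not define either detail.

```python
import json


def send_with_retry(send, payload, retry_count, fail_log):
    """Try to deliver `payload`; append it to `fail_log` if every attempt fails."""
    attempts = 1 + retry_count  # assumption: initial attempt + RETRY_COUNT retries
    for _ in range(attempts):
        try:
            if send(payload):
                return True
        except OSError:
            # Timeouts and connection failures are treated like a rejection.
            pass
    # All attempts failed: persist the payload so it is not silently lost.
    with open(fail_log, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(payload) + "\n")
    return False
```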
TBD
- Collect data and add to a local memory queue before attempting to send to The Collector.
- Flush the queue and send to The Collector:
  - every `FLUSH_TIMEOUT` seconds
  - every time the queue length reaches `QUEUE_SIZE`
  - when the queue data size reaches `500 MB`
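
A minimal sketch of the queueing rules above, assuming a hypothetical `EntryQueue` class and a `flush` callback that performs the actual send. For brevity it covers only the `FLUSH_TIMEOUT` and `QUEUE_SIZE` triggers and omits the `500 MB` data-size trigger; a real agent would also call `flush_if_due()` from a timer so that idle queues are still flushed.

```python
import time


class EntryQueue:
    """In-memory queue that flushes when full or when the timeout has elapsed."""

    def __init__(self, flush, queue_size=1000, flush_timeout=2, clock=time.monotonic):
        self._flush = flush                  # callable receiving a list of entries
        self._queue_size = queue_size        # QUEUE_SIZE
        self._flush_timeout = flush_timeout  # FLUSH_TIMEOUT, seconds
        self._clock = clock                  # injectable for testing
        self._entries = []
        self._last_flush = clock()

    def add(self, entry):
        self._entries.append(entry)
        self.flush_if_due()

    def flush_if_due(self):
        full = len(self._entries) >= self._queue_size
        expired = self._clock() - self._last_flush >= self._flush_timeout
        if self._entries and (full or expired):
            # Swap the list first so entries added during the send are not lost.
            batch, self._entries = self._entries, []
            self._last_flush = self._clock()
            self._flush(batch)
```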
The Agent will use the API Log Format (ALF) to create log entries. Most of the fields in the ALF spec are self-explanatory; check out the ALF spec for additional information.
The following rules are beyond the scope of ALF and MUST be applied to all agents:
- Parse the request headers to obtain the true client IP (see reference table below).
- Fall back to capturing the raw socket client IP if possible.
header | priority | description |
---|---|---|
`Forwarded` | 1 | RFC 7239 Standard |
`X-Real-IP` | 2 | mostly used in proxies |
`X-Forwarded-For` | 3 | common, non-standard |
`Fastly-Client-IP` | 4 | Fastly |
`CF-Connecting-IP` | 4 | CloudFlare |
`X-Cluster-Client-IP` | 4 | Rackspace, X-Ray |
`Z-Forwarded-For` | 5 | Z Scaler |
`WL-Proxy-Client-IP` | 5 | Oracle Web Logic |
`Proxy-Client-IP` | 5 | no-references |
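
The priority table can be applied roughly as shown below. This is a sketch, not a complete parser: the function name `client_ip` is hypothetical, headers sharing a priority are tried in table order, list-style headers are assumed to carry the client first, and the `Forwarded` handling only extracts the first `for=` value (it strips IPv6 brackets but does not strip a port from an IPv4 value or interpret `unknown`/obfuscated identifiers).

```python
import re

# Single-value or comma-separated headers, in the priority order of the table.
SIMPLE_HEADERS = [
    "X-Real-IP",
    "X-Forwarded-For",
    "Fastly-Client-IP",
    "CF-Connecting-IP",
    "X-Cluster-Client-IP",
    "Z-Forwarded-For",
    "WL-Proxy-Client-IP",
    "Proxy-Client-IP",
]

# First for= parameter of an RFC 7239 Forwarded header; quotes/brackets optional.
_FORWARDED_FOR = re.compile(r'for="?\[?([^\]";,\s]+)', re.IGNORECASE)


def client_ip(headers, socket_ip=None):
    """Return the best-guess client IP, falling back to the raw socket IP."""
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("forwarded")
    if forwarded:
        match = _FORWARDED_FOR.search(forwarded)
        if match:
            return match.group(1)
    for name in SIMPLE_HEADERS:
        value = lowered.get(name.lower())
        if value:
            # Comma-separated lists: take the first (left-most) address.
            return value.split(",")[0].strip()
    return socket_ip
```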
- Agents cannot obstruct the application's natural flow.
  - They should not prevent the application from getting the request data for its own processing.
  - This is likely framework dependent. In some environments (e.g. `PHP`, `Node.js`) the input stream can only be read once, so the agent must re-institute the stream so that the application logic can continue uninterrupted.
- Agents should attempt to get the RAW request as early as possible (as soon as the last byte is received and before application business logic).
  - RAW refers to the request body in its original state, before any post-processing by the application. This includes `gzip` encoded data, binary, or any other content type or format that cannot be processed as plain text.
  - This is to ensure all original headers and body state are captured properly.
  - Body capture should be triggered prior to any processing (decompression, modification, normalization, etc.) by the application or application framework.
  - In many languages (especially `PHP`, `Node.js`) reading the input stream is awarded to the first listener; the stream is then flushed, blocking any following listeners from reading.
    - The agent should expect this scenario and provide detailed documentation and instructions for proper installation at the appropriate location for capturing the input stream.
    - If the agent is successful in capturing the stream in those scenarios, it should also attempt to redirect the stream for any listeners afterwards, or provide a raw body property for the application framework to use.
- The Agent should only attempt to process the response object at the time the application is ready to send it (as soon as the last byte is ready to send).
  - Just as with the request scenario, this is to ensure all possible headers and final modifications to the response objects are captured.
  - Some languages (such as `PHP`) terminate as soon as the last byte is sent, so it is important to trigger the agent logic before sending of the response starts, but not before construction of the response is complete.
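
To illustrate re-instituting a single-read input stream, here is a minimal sketch using Python's WSGI interface, chosen only as an example environment (the text above names `PHP` and `Node.js`). The `CaptureBodyMiddleware` class and its `on_body` callback are hypothetical. The sketch relies on `CONTENT_LENGTH` and therefore does not handle chunked request bodies.

```python
import io


class CaptureBodyMiddleware:
    """Read the raw request body once, then hand the application a fresh stream."""

    def __init__(self, app, on_body):
        self.app = app          # the wrapped WSGI application
        self.on_body = on_body  # callable receiving the raw body bytes

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        # Capture the bytes before any framework decoding or decompression.
        raw = environ["wsgi.input"].read(length) if length > 0 else b""
        self.on_body(raw)
        # Re-institute the stream so the application can read the body itself.
        environ["wsgi.input"] = io.BytesIO(raw)
        return self.app(environ, start_response)
```

The middleware must be installed outermost, before any component that reads the body, which matches the requirement to document the proper installation point.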
Agents MUST adhere to the following steps regardless of the `LOG_BODIES` option value:
- Calculate the request & response body size manually (in bytes).
  - Fall back on the `Content-Length` header when manual calculation is not possible.
  - Use `0` when manual calculation is not possible or the response comes from cache (e.g. `304`).
- When not readily available, the agent should attempt to calculate header sizes: `ALF.har.log.entries[].request.headersSize`, `ALF.har.log.entries[].response.headersSize`
  - This can be achieved by reconstructing the HTTP Message from the start of the HTTP request/response message until (and including) the double `CRLF` before the body.
  - This means calculating the length of the headers as they appeared "on the wire", including the colon, space and `CRLF` between headers.
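
The header size reconstruction described above can be sketched as follows. The function name `headers_size` and its arguments are assumptions for this example: `start_line` is the request line or status line without its trailing `CRLF`, and `headers` is a list of `(name, value)` pairs in the order they were received.

```python
def headers_size(start_line, headers):
    """Byte length of the message head, up to and including the double CRLF."""
    message = start_line + "\r\n"
    for name, value in headers:
        # Each header as it appears on the wire: name, colon, space, value, CRLF.
        message += f"{name}: {value}\r\n"
    message += "\r\n"  # empty line separating the head from the body
    # One byte per character; unencodable characters are replaced, not dropped.
    return len(message.encode("iso-8859-1", errors="replace"))
```

For example, `headers_size("GET / HTTP/1.1", [("Host", "example.com")])` counts 16 bytes for the request line, 19 for the `Host` header, and 2 for the final `CRLF`.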
There are 3 mandatory fields that must be calculated manually (where applicable):
- `ALF.har.log.entries[].timings.send`: duration in milliseconds, between the first byte of request received and processing time.
- `ALF.har.log.entries[].timings.wait`: duration in milliseconds, between the processing time and the first byte of response sent time.
- `ALF.har.log.entries[].timings.receive`: duration in milliseconds, between the first byte of the response time and the last byte sent time.
The term "processing time" has a different meaning depending on the agent type:
- Proxy Agents: sending the last byte to the upstream service
- Native Agents: the start of the application business logic
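
Given four timestamps, the three timing fields can be derived as in the sketch below. The function name `compute_timings` and its parameter names are hypothetical; timestamps are assumed to be in seconds (for example from `time.monotonic()`), and negative intervals are clamped to `0` as a defensive choice that the spec does not mandate.

```python
def compute_timings(request_first_byte, processing_start,
                    response_first_byte, response_last_byte):
    """Return the ALF `timings` fields in whole milliseconds."""

    def ms(start, end):
        return max(0, round((end - start) * 1000))

    return {
        "send": ms(request_first_byte, processing_start),
        "wait": ms(processing_start, response_first_byte),
        "receive": ms(response_first_byte, response_last_byte),
    }
```

For a Native Agent, `processing_start` would be taken at the start of the application business logic; for a Proxy Agent, when the last byte has been sent to the upstream service.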
When the request/response bodies are captured and will be transmitted, they MUST be encoded in base64:
- For request bodies: `request.postData = { encoding: 'base64', text: 'BASE64_BODY' }`
- For response bodies: `response.content = { encoding: 'base64', text: 'BASE64_BODY' }`
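
A minimal sketch of producing that structure from raw body bytes; the function name `encode_body` is hypothetical, and the same result can be assigned to either `request.postData` or `response.content`.

```python
import base64


def encode_body(raw):
    """Wrap raw body bytes in the base64 structure expected by ALF."""
    return {"encoding": "base64", "text": base64.b64encode(raw).decode("ascii")}
```

Encoding the raw bytes rather than a decoded string keeps binary and `gzip` encoded bodies intact.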