An Embulk plugin to ingest json records from REST API with transformation by jq
.
- Plugin type: input
- Resume supported: yes
- Cleanup supported: yes
- Guess supported: no
- scheme: URI Scheme for the endpoint (string, default:
"https"
, allows:"https"
,"http"
) - host: Hostname or IP address of the endpoint (string, required)
- port: Port number of the endpoint (integer, optional, allows:
0-65535
) - path: Path of the endpoint (string, optional)
- headers: HTTP Headers (array of map, optional, allows: 1 element can contains 1 key-value.)
- method: HTTP Method (string, default:
"GET"
, allows:"GET"
,"POST"
,"PUT"
,"PATCH"
,"DELETE"
,"GET"
,"HEAD"
,"OPTIONS"
,"TRACE"
,"CONNECT"
) - params: HTTP Request params. This is merged with params for pagenation when the
pager
option is specified. (array of map, optional, allows: 1 element can contains 1 key-value.) - body: HTTP Request body. (json, optional)
- success_condition: jq filter to check whether the response is succeeded or not. You can use
jq
to query for the status code and the response body. (string,".status_code_class == 200"
) - transformer: jq filter to transform the api response json. (string,
"[.response_body]"
) - extract_transformed_json_array: If true, the plugin extracts the transformed json array, and ingest them as records. (boolean, default:
true
) - pager: (the following options are acceptable, default:
{}
)- initial_params: Additional HTTP Request params that is used the first request. (array of map, optional, allows: 1 element can contains 1 key-value.)
- next_params: Additional HTTP Request params that is used the subsequent requests. The value is treated as a
jq
filter to transform the prior response. (array of map, optional, allows: 1 element can contains 1 key-value.) - next_body_transformer: jq filter to transform the prior response to the next request body. (string, default:
".request_body"
) - while: jq filter to check whether the pagination is required or not. You can use
jq
to query for the status code and the response body. (string,"false"
) - interval_millis: Interval in milliseconds between requests. (integer, default:
100
)
- retry: (the following options are acceptable, default:
{}
)- condition: jq filter to check whether the response is retryable or not. This condition will be used when it is determined that the response is not succeeded by
success_condition_jq
. You can usejq
to query for the status code and the response body. (string,"true"
) - max_retries: Maximum retries. (integer, default:
7
) - initial_interval_millis: Initial retry interval in milliseconds. (integer, default:
1000
) - max_interval_millis: Maximum retries interval in milliseconds. (integer, default:
60000
)
- condition: jq filter to check whether the response is retryable or not. This condition will be used when it is determined that the response is not succeeded by
- show_request_body_on_error: Show request body on error. (boolean, default:
true
) - default_timezone: Default timezone. (string, default:
"UTC"
) - default_timestamp_format: Default timestamp format. (string, default:
"%Y-%m-%d %H:%M:%S %z"
) - default_date: Default date. (string, default:
"1970-01-01"
)
About the jq
filter
The following options accept the jq
filter to transform the api response json.
- success_condition
- transformer
- pager/next_params
- pager/next_body_transformer
- retry/condition
All of the jq
filters transform json that has the same format as the following.
{
"request_params": [
{"name": "foo", "value": "bar"}
],
"request_body": {
"foo": "bar"
},
"status_code": 201,
"status_code_class": 200,
"response_body": {
"foo": "bar",
"results": [
{"id": 1, "name": "foo"},
{"id": 2, "name": "bar"}
]
}
}
The response of api is stored as the "response_body"
field, so please note that the jq
filter definition must start with .response_body
in order to perform jq transformations on the API response results.
in:
type: http_json
scheme: http
host: localhost
port: 8080
path: /example
method: GET
transformer: '.response_body.integerValues'
success_condition: '.status_code_class == 200'
out:
type: stdout
Firstly, you need to start the mock server.
$ ./example/run-mock-server.sh
then, you run the example.
$ ./gradlew gem
$ embulk run -Ibuild/gemContents/lib -X min_output_tasks=1 example/config.yml
The requested records are shown on the mock server console.
$ ./gradlew test
$ ./gradlew gem # -t to watch change of files and rebuild continuously
$ ./gradlew dependencies --write-locks
## Just check the format violations
$ ./gradlew spotlessCheck
## Fix the all format violations
$ ./gradlew spotlessApply
A new tag is pushed, then a new gem will be released. See the Github Action CI Setting.
See. Github Releases